DTB parser implementing notes
Ever find yourself needing to implement a device tree blob (aka FDT, flattened device tree) parser and want to save yourself some time? Learn from my mistakes!
If you try to do it in one pass, you will hurt yourself
I charged headlong into writing dtb.zig by starting at the top of the Devicetree Specification page on the “Flattened Devicetree (DTB) Format” and reading down. It looked delightfully simple. Keep in mind, I didn’t yet know what I needed out of it, just that I probably needed to reference the DTB to get it. (I kind of know better now.)
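The tree was taking shape, and then I had to parse the contents of one field by the contents of a prop in its parent (the parent’s #address-cells and #size-cells determine how a child’s reg is laid out). Add some contexts and derive them from their parent, allowing a node to override them for its children.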
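Here is a minimal Zig sketch of that idea, assuming prop values have already been decoded into native u32 cells. Context, forChildren, fold and regEntry are hypothetical names, not taken from dtb.zig. One hedge: the Devicetree Specification says #address-cells and #size-cells aren’t inherited from further up the tree and gives defaults of 2 and 1, so this sketch falls back to those defaults rather than to a grandparent’s values.

```zig
const std = @import("std");

/// Hypothetical context a node hands down to its children: the cell
/// counts used to decode each child's `reg`.
const Context = struct {
    address_cells: usize,
    size_cells: usize,

    /// Hypothetical: build the context for a node's children from the
    /// node's own #address-cells / #size-cells, using the spec defaults
    /// (2 and 1) when a property is absent.
    fn forChildren(address_cells: ?usize, size_cells: ?usize) Context {
        return .{
            .address_cells = address_cells orelse 2,
            .size_cells = size_cells orelse 1,
        };
    }
};

const Reg = struct { address: u64, size: u64 };

/// Fold cells into one value, most significant cell first. More than two
/// cells don't fit in a u64; the extra top bits are shifted out here.
fn fold(cells: []const u32) u64 {
    var value: u64 = 0;
    for (cells) |cell| {
        value = (value << 32) | cell;
    }
    return value;
}

/// Hypothetical: decode entry `n` of a child's `reg` using the context
/// produced by its parent. Returns null past the last complete entry.
fn regEntry(ctx: Context, reg: []const u32, n: usize) ?Reg {
    const stride = ctx.address_cells + ctx.size_cells; // cells per entry
    if (stride == 0) return null;
    const start = n * stride;
    if (start + stride > reg.len) return null;
    const entry = reg[start .. start + stride];
    return Reg{
        .address = fold(entry[0..ctx.address_cells]),
        .size = fold(entry[ctx.address_cells..]),
    };
}
```

The same three cells would decode as a different address and size under a parent with #address-cells of 1, which is why the context has to travel down with the traversal.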
Then I needed to parse interrupts. It turns out the interrupts property of a node has its format defined by the #interrupt-cells of the “binding of the interrupt domain root”. It also turns out that node might be defined later in the file, referenced by its phandle.
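A sketch of a two-pass fix in Zig, assuming the node’s interrupt-parent phandle is already known and points straight at the controller (no interrupt-map nexus in between). PhandleTable and interruptSpecifier are hypothetical names: the first pass records each controller’s #interrupt-cells by phandle, and the second slices interrupts into specifiers of that many cells. The fixed capacity of 64 is an arbitrary assumption that avoids allocating.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity table mapping phandle -> #interrupt-cells,
/// filled during a first pass so boot-time code needn't allocate.
const PhandleTable = struct {
    const capacity = 64;

    phandles: [capacity]u32 = undefined,
    cell_counts: [capacity]u32 = undefined,
    len: usize = 0,

    fn put(self: *PhandleTable, phandle: u32, cell_count: u32) !void {
        if (self.len == capacity) return error.TableFull;
        self.phandles[self.len] = phandle;
        self.cell_counts[self.len] = cell_count;
        self.len += 1;
    }

    fn get(self: *const PhandleTable, phandle: u32) ?u32 {
        var i: usize = 0;
        while (i < self.len) : (i += 1) {
            if (self.phandles[i] == phandle) return self.cell_counts[i];
        }
        return null;
    }
};

/// Hypothetical second pass: return specifier `index` of a node's
/// `interrupts`, split into chunks of the parent's #interrupt-cells.
fn interruptSpecifier(
    table: *const PhandleTable,
    interrupt_parent: u32,
    interrupts: []const u32,
    index: usize,
) ![]const u32 {
    const n: usize = table.get(interrupt_parent) orelse return error.UnknownInterruptParent;
    if (n == 0) return error.NoInterruptCells;
    const start = index * n;
    if (start + n > interrupts.len) return error.IndexOutOfRange;
    return interrupts[start .. start + n];
}
```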
You find out the same thing about clocks, though the documentation is harder to find. A clock provider specifies #clock-cells, which is usually 0 or 1. When another node refers to a clock on that node, it addresses the phandle of the clock provider, followed by #clock-cells worth of cells to index which clock on that provider.
In other words, a clocks property like this:
00000000: 00000085 0000001c 0000002e
could refer to any of:
- one clock specified by phandle 0x85, with a #clock-cells of 2, the index being 0x1c 0x2e;
- two clocks, either:
  - a clock at phandle 0x85 with a #clock-cells of 1, indexed by 0x1c, and a clock at 0x2e with no index, or,
  - a clock at 0x85 with no index, and a clock at 0x1c with a #clock-cells of 1, indexed by 0x2e;
- three clocks, all with no index.
You need to be able to look up the clocks and obtain their properties to interpret this, so you need a second pass, or delayed/on-demand resolution of fields, or whatever. There end up being quite a few props that need a second pass.
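To make the ambiguity concrete, here is a hedged Zig sketch that decodes a clocks property once a first pass has collected each provider’s #clock-cells. Provider, ClockRef, clockCells and decodeClocks are hypothetical names; the output goes into a caller-supplied buffer so nothing is allocated.

```zig
const std = @import("std");

/// Hypothetical first-pass record: a provider's phandle and #clock-cells.
const Provider = struct { phandle: u32, clock_cells: u32 };

/// Hypothetical decoded entry: which provider, plus its index cells.
const ClockRef = struct { phandle: u32, specifier: []const u32 };

fn clockCells(providers: []const Provider, phandle: u32) ?u32 {
    for (providers) |p| {
        if (p.phandle == phandle) return p.clock_cells;
    }
    return null;
}

/// Walk `clocks`: read a phandle, look up how many index cells that
/// provider takes, slice them off, repeat.
fn decodeClocks(providers: []const Provider, clocks: []const u32, out: []ClockRef) ![]ClockRef {
    var i: usize = 0;
    var n: usize = 0;
    while (i < clocks.len) {
        const phandle = clocks[i];
        const cells: usize = clockCells(providers, phandle) orelse return error.UnknownClockProvider;
        if (i + 1 + cells > clocks.len) return error.TruncatedClocks;
        if (n == out.len) return error.OutOfSpace;
        out[n] = .{ .phandle = phandle, .specifier = clocks[i + 1 .. i + 1 + cells] };
        n += 1;
        i += 1 + cells;
    }
    return out[0..n];
}
```

Feeding the 0x85 0x1c 0x2e cells from the hexdump through different provider tables yields one, two or three clocks, matching the list above.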
It’s worth noting all numbers and indexes in DTB are big-endian, unsigned 32-bit integer cells. That makes hexdumps easier, since you can read them byte-by-byte or in groups of 4 and don’t need to rearrange them in your head.
Larger values span multiple cells, so you’ll see an #address-cells of 2 and similar for most 64-bit devices. I saw an #address-cells of 3 once in a PCIe node and it scared me.
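A small Zig sketch of reading cells by hand (cellAt and cellsAt are hypothetical names). std.mem.readInt can also do the big-endian read, but its exact signature has shifted between Zig releases, so this shifts bytes explicitly. Values wider than two cells don’t fit a u64; the PCI bus binding’s three address cells, for instance, start with a cell describing the address space (flags and bus/device/function numbers) rather than address bits, so the sketch simply rejects them.

```zig
const std = @import("std");

/// Hypothetical helper: read the big-endian u32 cell at byte offset `off`.
fn cellAt(bytes: []const u8, off: usize) u32 {
    return (@as(u32, bytes[off]) << 24) |
        (@as(u32, bytes[off + 1]) << 16) |
        (@as(u32, bytes[off + 2]) << 8) |
        @as(u32, bytes[off + 3]);
}

/// Hypothetical helper: read a value spanning `count` cells (e.g.
/// #address-cells = <2>), most significant cell first.
fn cellsAt(bytes: []const u8, off: usize, count: usize) !u64 {
    if (count > 2) return error.TooManyCells;
    var value: u64 = 0;
    var i: usize = 0;
    while (i < count) : (i += 1) {
        value = (value << 32) | cellAt(bytes, off + i * 4);
    }
    return value;
}
```

Because the bytes are already in reading order, the hexdump `00000001 80000000` is literally the 64-bit value 0x1_8000_0000.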
Strings are NUL-terminated, and NUL padded
This tripped me up. Strings are NUL-terminated, and then the field will be padded with NULs (if needed) to align on a u32 (i.e. offset divisible by 4). This is helpful, because a u32 is literally what will always follow, and Arm devices (which DTBs are often used on) don’t like unaligned reads.
So, when you need to read a NUL-terminated string, don’t do what I did first:
It seems reasonable at first blush: count the NULs (there’s a much better way), then advance the index past the name, plus align to advance past padding. (Hack for aligning to a power of two n: add n-1, then bitwise AND with the complement of n-1.)
The problem is that I never advanced past the NUL terminator, which is still part of the string. Here are some example NUL-terminated strings:
00000000: 6100              a.
00000000: 616200            ab.
00000000: 61626300          abc.
00000000: 61626364 00       abcd.
Here are the same strings padded with NULs to align on a u32:
00000000: 61000000          a...
00000000: 61620000          ab..
00000000: 61626300          abc.
00000000: 61626364 00000000 abcd....
Here’s the corrected code:
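A minimal Zig sketch of the fix, not necessarily the code in dtb.zig (readString and alignUp are hypothetical names; alignUp is the power-of-two hack described above). It also shows why the bug is sneaky: without the + 1, alignUp(len, 4) and alignUp(len + 1, 4) agree for every length except multiples of 4, so only strings like abcd, or the empty name of the root node, land on the wrong offset.

```zig
const std = @import("std");

/// Hypothetical helper: round `n` up to a multiple of the power of two
/// `a` by adding a-1, then bitwise ANDing with the complement of a-1.
fn alignUp(n: usize, comptime a: usize) usize {
    comptime std.debug.assert(a != 0 and (a & (a - 1)) == 0);
    return (n + (a - 1)) & ~(a - 1);
}

/// Hypothetical reader: return the NUL-terminated string at `pos.*`
/// (without its NUL) and move `pos.*` past the NUL *and* the padding.
fn readString(buf: []const u8, pos: *usize) ![]const u8 {
    if (pos.* > buf.len) return error.OutOfBounds;
    const rest = buf[pos.*..];
    const len = std.mem.indexOfScalar(u8, rest, 0) orelse return error.Unterminated;
    // The NUL belongs to the string: step over len + 1 bytes, then align.
    pos.* = alignUp(pos.* + len + 1, 4);
    return rest[0..len];
}
```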
That’s all for now
I ended up separating dtb.zig into two parts, given it’s used in boot-time code where allocating memory can mess around with things:
- Traverser, which emits events SAX-style. I tried using Zig’s resume here, and it works pretty well.
- Parser, which uses the Traverser and builds an AST in two passes, parsing props into immediately usable values.
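As a rough sketch of the non-allocating half, here is a pull-style iterator over the structure and strings blocks. Traverser, Event, cstr and the lowercase token constant names are hypothetical; the token values 0x1, 0x2, 0x3, 0x4 and 0x9 are FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP and FDT_END from the spec. It swaps the resume-based approach for a plain next() call, since recent Zig releases don’t support async functions.

```zig
const std = @import("std");

// Structure-block tokens from the Devicetree Specification.
const fdt_begin_node: u32 = 0x1;
const fdt_end_node: u32 = 0x2;
const fdt_prop: u32 = 0x3;
const fdt_nop: u32 = 0x4;
const fdt_end: u32 = 0x9;

/// Hypothetical SAX-style events; slices point into the blob itself.
const Event = union(enum) {
    begin_node: []const u8,
    end_node,
    prop: struct { name: []const u8, value: []const u8 },
    end,
};

fn alignUp4(n: usize) usize {
    return (n + 3) & ~@as(usize, 3);
}

/// Hypothetical helper: NUL-terminated string at `start`, without the NUL.
fn cstr(bytes: []const u8, start: usize) ![]const u8 {
    if (start > bytes.len) return error.Truncated;
    const len = std.mem.indexOfScalar(u8, bytes[start..], 0) orelse return error.Unterminated;
    return bytes[start .. start + len];
}

/// Hypothetical traverser: walks the structure block and never allocates.
const Traverser = struct {
    structs: []const u8, // structure block
    strings: []const u8, // strings block
    pos: usize = 0,

    /// Read the next big-endian u32 cell from the structure block.
    fn cell(self: *Traverser) !u32 {
        if (self.pos + 4 > self.structs.len) return error.Truncated;
        const b = self.structs[self.pos .. self.pos + 4];
        self.pos += 4;
        return (@as(u32, b[0]) << 24) | (@as(u32, b[1]) << 16) |
            (@as(u32, b[2]) << 8) | @as(u32, b[3]);
    }

    fn next(self: *Traverser) !Event {
        while (true) {
            const token = try self.cell();
            switch (token) {
                fdt_begin_node => {
                    const name = try cstr(self.structs, self.pos);
                    self.pos = alignUp4(self.pos + name.len + 1); // skip NUL + padding
                    return Event{ .begin_node = name };
                },
                fdt_end_node => return @as(Event, .end_node),
                fdt_prop => {
                    const len = try self.cell(); // value length in bytes
                    const name_off = try self.cell(); // offset into strings block
                    if (self.pos + len > self.structs.len) return error.Truncated;
                    const value = self.structs[self.pos .. self.pos + len];
                    self.pos = alignUp4(self.pos + len); // skip value + padding
                    return Event{ .prop = .{ .name = try cstr(self.strings, name_off), .value = value } };
                },
                fdt_nop => continue,
                fdt_end => return @as(Event, .end),
                else => return error.BadToken,
            }
        }
    }
};
```

A parser built on top would call next() in a loop, keeping its own stack of open nodes; only that layer needs an allocator.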