Add new copy of 005 for review purposes. by ccoutant · Pull Request #83 · ccoutant/dwarf-locations

ccoutant · 2024-05-21T20:18:06Z

Fresh copy of 005 so that comments can be made on this pull request with the whole document in view.

simark · 2024-05-22T04:04:58Z

005-1-locations-on-stack.md

+use to cases where the location described is final, and not subject to
+some further modification, with two exceptions. First, if the location
+description is a memory location description, it is a simple DWARF
+expression (Section 2.5) that can be modified by further DWARF


it is a simple DWARF expression (Section 2.5) that can be modified by further DWARF expression operators.

Just to confirm my understanding of what you are saying here. It can be modified, because a memory location description in DWARF 5 is just a value / number?

simark · 2024-05-22T04:12:32Z

005-1-locations-on-stack.md

+Also consider the case where a `DW_OP_call*` operator is used to get the
+location of a variable. If the variable happens to be in a register at
+the current PC, the call operator cannot succeed, as it cannot push
+anything but a memory location on the stack.


It's perhaps nitpicking, but I find it a bit misleading to say that a memory location can be pushed on the stack in DWARF 5 (especially with the following paragraph that says that location descriptions can't be pushed on the stack). I would suggest:

"... as it cannot push anything but an address representing the location of the variable in memory"

simark · 2024-05-22T04:24:14Z

005-1-locations-on-stack.md

+Most existing arithmetic and logical operators, defined in Section 2.5.1.4,
+continue to be limited to operating on values only.
+
+The `DW_OP_deref*` and `DW_OP_xderef*` operators are extended to operate on


How does it work for xderef, which consumes a stack element indicating the address space? Because of that, I don't think it makes sense for it to operate on locations. It would make no sense to deref a register location while providing an explicit address space number. Even for memory location descriptions, what if the memory location description to deref has address space number 2, but the address space operand of xderef specifies address space number 3?

In the original proposal [1], it keeps the behavior it had in DWARF 5, and is marked deprecated. It consumes a scalar value representing an address and a scalar value representing the address space. I think it would make sense to define it like that here (the goal being only to keep backwards compat).

[1] https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#a-2-5-4-3-4-special-value-operations
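
As a rough illustration of that DWARF 5 style definition, here is a Python sketch in which `DW_OP_xderef` consumes two plain values and never sees a location. The `xderef` helper, the dictionary of address spaces, the 8-byte generic type, and the little-endian byte order are all assumptions for illustration; DWARF leaves the actual address calculation implementation-defined.

```python
GENERIC_SIZE = 8  # assumed size of the generic type, in bytes


def xderef(stack, address_spaces, size=GENERIC_SIZE):
    """Model DW_OP_xderef operating on two scalar values (hypothetical helper).

    The top entry is the address; the entry below it is the address space
    identifier. Both are popped and the retrieved value is pushed.
    """
    address = stack.pop()
    space_id = stack.pop()
    memory = address_spaces[space_id]  # each space is modeled as a bytes object
    raw = memory[address:address + size]
    if len(raw) != size:
        raise ValueError("read extends past the end of the address space")
    stack.append(int.from_bytes(raw, "little"))


# Address space 3 holds the bytes 0x00..0x0f; read 2 bytes at address 4.
spaces = {3: bytes(range(16))}
stack = [3, 4]
xderef(stack, spaces, size=2)
```

Because both inputs are scalars, the question of a location's own address space conflicting with the explicit one never arises in this model.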

simark · 2024-05-22T04:25:57Z

005-1-locations-on-stack.md

+location of the object as implicitly-pushed elements on the stack. The
+latter element is now allowed to be any location.
+
+Two new operators, `DW_OP_offset` and `DW_OP_bit_offset`, are introduced


Do you want to preemptively answer the question "why not just use DW_OP_plus"?

simark · 2024-05-22T04:27:32Z

005-1-locations-on-stack.md

+or a bit offset.
+
+The composite location operators, `DW_OP_piece` and `DW_OP_bit_piece`,
+are redefined to build up a composite location, which is held in the top


nitpicking: when appending a piece (unless using the implicit behavior to append an undefined piece), the composite location itself is not the topmost element of the stack. Not sure it matters, but I mention it in case you want to find a better formulation.

Looks like a bad edit. I'll fix it.

simark · 2024-05-22T04:57:01Z

005-1-locations-on-stack.md

+> object or other entity in memory. On architectures that support
+> multiple address spaces, a memory location contains a component that
+> identifies the address space (which may be provided by the
+> `DW_OP_xderef` operation). A memory location is considered


That sounds wrong. I don't think that DW_OP_xderef pushes a memory location, I think it pushes a value. Unless you mean that a memory location conceptually briefly exists during the execution of DW_OP_xderef?

Removed the parenthetical remark.

simark · 2024-05-22T05:04:36Z

005-1-locations-on-stack.md

+>     The `DW_OP_piece` operation takes a single operand, which is an unsigned
+>     LEB128 number. The number describes the size `S`, in bytes, of the piece
+>     of the object referenced by the location `A` on the top of the stack. If
+>     the piece is located in a register, but does not occupy the entire


If the piece is located in a register, but does not occupy the entire register, the placement of the piece within that register is defined by the ABI.

Yeah, so in our ideal world where we treat all storages uniformly, just as sequences of bits, we wouldn't have that. Perhaps a point to discuss.

In DWARF 5, register location descriptions just specified the whole register, so I guess a consumer could interpret: "when you say reg 2, I magically know that you mean those specific bytes of reg 2".

But now that register locations have an offset component, I'm not sure how a consumer is supposed to interpret these:

DW_OP_composite DW_OP_reg2 DW_OP_piece 2

vs

DW_OP_composite DW_OP_reg2 DW_OP_offset 1 DW_OP_piece 2

In the first one, the register location has an offset of 0. So I guess that consumers, in that specific case (offset == 0), could with some hand-waving decide to apply the same "placement defined by ABI" rules they applied in DWARF 5.

But then, what about the second example? The producer specified an explicit offset, so it would be strange to just ignore it and pick some other bytes.

And then, what if the producer really wants to point at those bytes at offset 0 in register 2? If some magic "placement defined by ABI" rules kick in, it's just not possible.
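
To make the two readings concrete, here is a minimal Python sketch of the uniform interpretation, where a register is treated as a plain sequence of bytes. Everything in it is an assumption for illustration: the `RegisterLocation` type, the `select_piece` helper, and the contents of register 2 are hypothetical and do not come from the proposal or from any consumer.

```python
from dataclasses import dataclass

# Hypothetical register file: register 2 is 8 bytes of recognizable data.
REGISTERS = {2: bytes([0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7])}


@dataclass(frozen=True)
class RegisterLocation:
    regnum: int
    offset: int = 0  # byte offset into the register's storage


def select_piece(loc: RegisterLocation, size: int) -> bytes:
    """Uniform rule: take `size` bytes starting at the location's offset."""
    storage = REGISTERS[loc.regnum]
    if loc.offset + size > len(storage):
        raise ValueError("piece extends past the end of the register")
    return storage[loc.offset:loc.offset + size]


# DW_OP_composite DW_OP_reg2 DW_OP_piece 2
first = select_piece(RegisterLocation(2), 2)

# DW_OP_composite DW_OP_reg2 DW_OP_offset 1 DW_OP_piece 2
second = select_piece(RegisterLocation(2, offset=1), 2)
```

Under a "placement defined by ABI" rule, a consumer might instead pick different bytes for the first expression, which is where the two interpretations diverge.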

I remember this presentation from Andreas Arnez; if I recall correctly, it's a concrete example of this problem.

https://youtu.be/iQAd5Atlz1s?list=PL_GiHdX17Wtx2Bu1O_bREetZZv4moIaRi&t=2137

And also these threads:

[Dwarf-Discuss] DWARF piece questions
https://www.mail-archive.com/dwarf-discuss@lists.dwarfstd.org/msg00344.html

[gdb] [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register
https://inbox.sourceware.org/gdb/m3vb6wm86q.fsf@oc1027705133.ibm.com/

If time permits, I'd like to go through them again to understand what impact our proposal would have for the problems he presents.

Yes, this is the point I was trying to get at in my "registers really are different" arguments and my Feb. 27 email about "offsets".

The issue is that the piece operations are defined to treat registers differently. So to me it is not that registers are really different, it is that the piece operations were defined to treat them differently.

In the proposal, the piece operations are redefined to treat all storage kinds the same way. That includes registers and implicits. Then registers are not different.

If we do not want to do that, then I advocate adding new piece operations that act like the proposal, and leaving the old ones for legacy reasons. The old ones are not very useful when building expressions incrementally as optimizations are applied that change the storage from one kind to another. They also do not compose when using DWARF procedures.

simark · 2024-05-22T05:28:02Z

005-1-locations-on-stack.md

+> addition of an undefined piece to the existing composite location.
+>
+> - Otherwise, if the top of the stack `A` is a location, or convertible
+> to a location, and the preceding element is not a composite location,


I don't know if you want to be explicit here, that "the preceding element is not a composite location" includes the case where "the preceding element doesn't exist", aka "A is the only element in the stack".

This may need a separate case. I was trying to write it to include the case where A is already the only element on the stack.

I am not in favor of the rule that pops an arbitrary number of entries off the stack. I would much prefer to simply define the expression as invalid. There are many other cases where expressions are stated as being invalid, so why not in this case? It makes for a much simpler formal semantic model that is far easier to reason about.

While discussing this in the meetings, John Del Signore made a point that keeping backwards compatibility with existing expressions was important, to help producers migrate from DWARF 5 to 6 incrementally. That's why we decided to keep those cases in there.

simark · 2024-05-22T05:29:41Z

005-1-locations-on-stack.md

+> one or more elements below `A` are popped and discarded until the
+> preceding element `B` is a composite location, or until `A` is the
+> only element on the stack. If `A` is the only remaining element, a new
+> empty composite is inserted before it (as if `DW_OP_composite


as if DW_OP_composite DW_OP_swap had been processed immediately prior to the piece operation

I understand what you mean, but that might be confusing, because if DW_OP_composite DW_OP_swap had occurred prior to the piece operation, then we wouldn't do all this popping and special case handling.
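
For what it's worth, the popping behavior quoted above can be sketched in Python roughly as follows. This is a simplified model, not the proposal's wording: the `Composite` class and `apply_piece` function are hypothetical names, stack entries are arbitrary Python objects, and it assumes the composite is left on top of the stack once the piece has been appended. The separate case where the top of the stack is itself a composite (an undefined piece is appended) is not modeled.

```python
class Composite:
    """Hypothetical composite location: an ordered list of (location, size)."""

    def __init__(self):
        self.pieces = []


def apply_piece(stack, size):
    """Append the top-of-stack location A to the nearest composite below it.

    Elements between A and that composite are discarded. If no composite is
    found, a new empty one is created to receive the piece.
    """
    a = stack.pop()
    # Discard entries below A until a composite is exposed or nothing is left.
    while stack and not isinstance(stack[-1], Composite):
        stack.pop()
    if not stack:
        stack.append(Composite())
    stack[-1].pieces.append((a, size))


# DWARF 5 style expression with a stray entry and no explicit DW_OP_composite.
legacy = ["stray", "reg2"]
apply_piece(legacy, 4)

# An existing composite with one unrelated entry sitting above it.
existing = Composite()
mixed = [existing, "stray", "reg3"]
apply_piece(mixed, 2)
```

The sketch makes the special-casing visible: the loop and the fallback creation of a composite only run when the expression did not set up the stack explicitly.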

simark · 2024-05-22T05:35:08Z

005-1-locations-on-stack.md

+> DWARF expression stack before the `DW_AT_use_location` description is
+> evaluated. The first value pushed is the value of the pointer to member
+> object itself. The second value pushed is the location of the
+> entire structure or union instance containing the member whose address


whose address -> whose location?

ghost · 2024-06-05T13:23:25Z

005-1-locations-on-stack.md

+> a value, the value is implicitly treated as a memory address in the
+> default address space, and converted to a memory location. If a value
+> is expected, but the result is a memory location in the default
+> address space, the address is implicitly converted to a value.


Given that values are typed, is it worth specifying here something like "... converted to a value of generic type."?
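
A small Python sketch of the two implicit conversions, with the converted value explicitly given the generic type as suggested above. The `Value` and `MemoryLocation` classes, the `"generic"` type tag, and the use of 0 as the default address space identifier are assumptions made for illustration.

```python
from dataclasses import dataclass

DEFAULT_ADDRESS_SPACE = 0  # assumed identifier for the default address space


@dataclass(frozen=True)
class Value:
    type_name: str
    data: int


@dataclass(frozen=True)
class MemoryLocation:
    address_space: int
    address: int


def to_location(entry):
    """A value is treated as an address in the default address space."""
    if isinstance(entry, MemoryLocation):
        return entry
    return MemoryLocation(DEFAULT_ADDRESS_SPACE, entry.data)


def to_value(entry):
    """A default-address-space memory location becomes a generic-type value."""
    if isinstance(entry, Value):
        return entry
    if entry.address_space != DEFAULT_ADDRESS_SPACE:
        raise ValueError("only default address space locations convert to values")
    return Value("generic", entry.address)


loc = to_location(Value("generic", 0x1000))
val = to_value(MemoryLocation(DEFAULT_ADDRESS_SPACE, 0x2000))
```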

ghost · 2024-06-05T13:35:30Z

005-1-locations-on-stack.md

+> The `DW_OP_deref_size` takes a single 1-byte unsigned integral operand
+> that specifies the size `S`, in bytes, of the value to be retrieved. The
+> operation behaves like the `DW_OP_deref` operation: it
+> pops the top stack entry and treats it as a location. The first `S` bytes
+> are retrieved from the location, zero extended to the size of an
+> address on the target machine, and pushed onto the stack as a value of
+> the generic type.


What if S is bigger than the size of an address? DWARF-5 says "whose value may not be larger than the size of the generic type". This information seems to have been lost.
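
A minimal Python sketch of the operation with the DWARF 5 restriction put back. It assumes an 8-byte generic type, a little-endian target, and memory modeled as a `bytes` object; the `deref_size` helper is hypothetical.

```python
GENERIC_SIZE = 8  # assumed size of an address / the generic type, in bytes


def deref_size(memory: bytes, address: int, size: int) -> int:
    """Read `size` bytes at `address` and zero-extend to the generic type."""
    if size > GENERIC_SIZE:
        # The restriction from DWARF 5 that the quoted text no longer states.
        raise ValueError("S may not be larger than the size of the generic type")
    raw = memory[address:address + size]
    if len(raw) != size:
        raise ValueError("read extends past the end of memory")
    # Interpreting the bytes as an unsigned integer is the zero extension.
    return int.from_bytes(raw, "little")


memory = bytes([0xFF, 0xEE, 0x01, 0x02])
result = deref_size(memory, 0, 2)
```

Without the size check, the zero-extension step has no sensible meaning when `S` exceeds the generic type, which is why the lost sentence matters.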

ghost · 2024-06-05T13:40:45Z

005-1-locations-on-stack.md

+> is a 1-byte unsigned integer that specifies the size `S` of the type
+> given by the second operand. The second operand is an unsigned LEB128


DWARF-5 explicitly says that S is the same as the size of type T. Here it is not so clear that they must match.

ghost · 2024-06-05T14:00:33Z

005-1-locations-on-stack.md

+> multiple address spaces, a memory location contains a component that
+> identifies the address space (which may be provided by the
+> `DW_OP_xderef` operation). A memory location is considered
+> _unbounded_, as the size of the location storage is only implied by


IMHO, "location storage" is a confusing term, because the storage is not meant to store locations. Can we just say "storage"?

Agreed, but I think "storage" is too generic and not obvious we're using a term for something specific. I've thought about other names for this, but the best I've come up with so far is "storage bank". What do you think?

I think it's better than "location storage".

ghost · 2024-06-05T14:03:18Z

005-1-locations-on-stack.md

+> identifies the address space (which may be provided by the
+> `DW_OP_xderef` operation). A memory location is considered
+> _unbounded_, as the size of the location storage is only implied by
+> the type of object stored at that location.


Hmm, isn't the size of the storage (in this case, a memory), bounded by the max value of the address? Or do you mean the location (not the underlying storage) implicitly has a size determined by the type of the object stored there?

I've removed this sentence; it was an attempt to define "unbounded" vs. "bounded", but that concept isn't referenced anywhere else in this proposal. I think it's best left to 004-clarifications-mem.

ghost · 2024-06-05T14:11:20Z

005-1-locations-on-stack.md

+Move the contents of Section 2.6.1.1.4 here, replacing the term
+"location description" with "location" throughout.


There is one sentence: "DWARF location descriptions are intended to yield the location of a value rather than the value itself." I think it should not be replaced in this instance.

Don't we need to edit the following text for DW_OP_stack_value more:
"In this form of location description, the DWARF expression
represents the actual value of the object, rather than its location."

to something like:
"In this form of location, the value at the top of the DWARF expression
stack represents the actual value of the object, rather than its location."

There is also
"DW_OP_stack_value operation terminates the expression."
Should this limitation stay? Why not just remove it to have more flexibility? DW_OP_stack_value is essentially the same as DW_OP_implicit_value, except that the contents come from the top of the stack instead of the operand. An implicit value does not say that it should terminate the expression.

ghost · 2024-06-05T14:31:34Z

005-1-locations-on-stack.md

+> optimization). The `DW_OP_undefined` operation pushes an undefined
+> location onto the stack. A DWARF expression containing no operations or


I understand what is meant here by "pushes an undefined location". We are pushing a location of kind "undefined" (similar to the "memory", "register", etc. kinds). But I fear that some readers may think an arbitrary location is pushed. I don't have a suggestion to make it better, though. Just noting.

Add new copy of 005 for review purposes.

45da6a7


3 participants