-
Notifications
You must be signed in to change notification settings - Fork 68
Add EEP for native records #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @bjorng and @garazdawi! The proposal is well written and building on top of records does a beautiful job of keeping the language changes minimal. Removing the undefined as default for fields is another great change, as well as having the optional of having "required fields" (which must be given on creation).
My only criticism to the proposal is that it doesn't discuss maps at all. For the last ~10 years, we have been using maps as replacements for records, and while native records aim to ease the migration between "records -> native records", there are no paths between "maps -> native records". Perhaps this is intentional, you are not expecting anyone to migrate between maps and native records, but then we have to deal with the side-effect that, as native records improve, a lot of code (specially in Elixir) will be forever suboptimal.
If I am allowed to spitball a bit, I'd love if "native records" were implemented as "named maps" or "record maps" behind the scenes. The proposal would stay roughly the same, the only differences are that:
-
Accessing a field in any record could use the existing map syntax and just work:
Expr#Field
-
Updating any record could use the existing map syntax and just work:
Expr#{Field1=Expr1, ..., FieldN=ExprN}
You could also support the proposed #_{...} syntax and it would additionally check it is a record and not a "plain map". is_map/1 would obviously return true for records but you could do a more precise check with is_record.
Regarding the key-destructive map operations, such as maps:put/3 or maps:remove/2, I'd make them fail unless you "untag" the map (or you could allow them to succeed but the tag would be removed at the end, which I find too subtle).
Overall, the goal would be to effectively unify records and maps, removing the decade-old question "records or maps". This would also provide a path for Erlang and Elixir to unify behind the same constructs, so I'd love to hear your opinions.
eeps/eep-0079.md
Outdated
| If no value is provided for a field, and there is a default field | ||
| value in the native record definition, the default value is used. If | ||
| no value is provided for a field and there is no default field value | ||
| then a native record creation fails with a `badrecord` error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth adding more context to the errors, such as {badrecord, {default_missing, key}}. I know format_error can be used (and have additional side-band information attached), but there may be a benefits in being upfront about it too?
This would also allow distinguishing from other errors below, such as {badrecord, not_found}, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More context definitely better, even if it gives only one missing key out of several.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! I think more fine-grained errors would would be greatly beneficial for newcomers especially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the error reason to {novalue,Field} (where novalue is a new error reason).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing a field in any record could use the existing map syntax and just work:
Expr#Field
In erlang there is no existing map syntax for accessing a single field today,
that is still missing, was never implemented I guess because maps can have arbitrary terms as keys.
I guess it could be added for "literal" fields?
And it is really annoying to have to use the add record_name when accessing fields in records today, e.g. Rec#record_name.field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgud yes! I keep forgetting that it was part of EEP but not implemented.
And it is really annoying to have to use the add record_name when accessing fields in records today, e.g. Rec#record_name.field
I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unamed" field access and updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unamed" field access and updates?
Yes, that should be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the discussion in erlang/otp#9174 on a possible x[y] notation - @michalmuskala suggested M#[Field].
essen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
Would it make sense to have finer grained control on who can do what? For example restrict creation to the defining module while still providing access to fields; or read-only fields outside the defining module. Probably doesn't matter for 95% of use cases I reckon.
eeps/eep-0079.md
Outdated
| If no value is provided for a field, and there is a default field | ||
| value in the native record definition, the default value is used. If | ||
| no value is provided for a field and there is no default field value | ||
| then a native record creation fails with a `badrecord` error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More context definitely better, even if it gives only one missing key out of several.
| The following syntax allows accessing field `Field` in any record: | ||
|
|
||
| ```erlang | ||
| Expr#_.Field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a new use case because we could have two records v1 and v2 that share some fields, and use this syntax to access the common fields. It probably needs to be highlighted.
a4bdeec to
f5b4834
Compare
We didn't really think about migrations from maps. Your suggestion seems reasonable. We will discuss this in the OTP team. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a cool EEP! Thank you! I've a handful of questions
Language interop
One aspect of this which the document does not touch on that I think could be highly impactful for the BEAM ecosystem is language interop. Today each major language has a different preference for fixed key-value data structures:
- Erlang: maps and records
- Elixir: maps with a special field containing a module atom
- Gleam: records
This creates some degree of friction when calling code from other BEAM languages. If they all were to largely use native records then this friction go away, making interop between languages would be a much better experience.
I'm not immediately seeing any problems for Gleam as we use records there, but it seems like it would be more challenging for Elixir there we use maps.
Adoption within existing OTP modules
It seems that in an ideal world that native records would be the ubiquitous data structure, once they are available. Would existing Erlang/OTP modules be updated to work with them?
Functions that expect classic records, tagged tuples, and maps could have new function clauses added to handle native records in a backwards compatible way, unless I am mistaken. It seems that due to not being compatible with maps or tuples there would be very little ability to update existing functions to return native records.
Is there something we could do here? Or is the expectation that Erlang/OTP code will use different data structures depending on how old it is?
Construction syntax
Is the only difference between the native and classic record syntaxes the # character in the name of the definition? This seems like it will be very error prone, and also hard for less familiar people to debug as the definition will be accepted by the Erlang compiler, but their attempts to construct the record will fail.
Thank you all!
eeps/eep-0079.md
Outdated
| If no value is provided for a field, and there is a default field | ||
| value in the native record definition, the default value is used. If | ||
| no value is provided for a field and there is no default field value | ||
| then a native record creation fails with a `badrecord` error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! I think more fine-grained errors would would be greatly beneficial for newcomers especially.
| ### Anonymous access of native records | ||
| The following syntax allows accessing field `Field` in any record: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a performance difference when using this syntax compared to using the non-anonymous syntax?
Are there situations in which one cannot use the anonymous syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a performance difference when using this syntax compared to using the non-anonymous syntax?
Yes, certain optimizations we are thinking about implementing will not be applied when using anonymous syntax.
Are there situations in which one cannot use the anonymous syntax?
No.
| 1. Creation of native records cannot be done in guards | ||
| 2. `element/2` will not accept native records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple records can be constructed with field names (using the #blah{a=1,b=2} syntax) or without field names (using the {blah, 1, 2}) syntax. Do native records have a field-nameless #blah{1, 2} construction syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
The nameless syntax for tuple records only works because traditional records are implemented using tuples.
| (tuples). So, their performance characteristics should align with maps | ||
| (with insignificant overhead for runtime validation). Additionally, | ||
| given that native-records are more specialized versions of maps (with | ||
| all keys being atoms), there is potential for optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the expectation that native records will have inferior performance to tuple records?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know yet, and it also depends on exactly which operations we are talking about. For example, updating multiple elements in a small map (no more than 32 elements) is quite efficient because it can be done in one pass, and similarly matching out multiple elements is also efficient because it can be done in one pass. Accessing one element at the time is more expensive for a map than for a tuple record.
In the first implementation to be released, native records will be implemented similar to how small maps are implemented, with similar performance. We have an idea for an optimization that would make it faster than maps.
Yes. If a function returns a tuple record, all we can do is to create a new function that returns a native record.
To some extent, yes. I think that is already the case.
Agreed. |
|
Thank you @bjorng. To confirm: all fields must have names? There are no positional fields? Can one define a record which does not have any fields? -record #none{}.
-record #some{value = term()}.
-type option() :: #none{} | #some{} |
Yes.
Yes. |
We are trying to limit the scope here to try to get it in to 29. Hmm, this would require some additional syntax, 'private' | 'protected' | 'public' (borrowing from ets access types), personally |
|
@lpil wrote:
It is a different format, not only the # character, it is like record creation: -record(foo, {a, b, c}).
%% vs.
-record #foo{a, b, c}.So there is also a comma and parentheses that are different. |
|
I don't see the concern raised that we'll now have record, native records and maps. I know they all serve different purposes, but the differences are slight and new users definitely get confused about which to use. I don't think it should be a blocker to adding a new data structure that can help improve devex or performance when choosing the right one but it does worry me that this can't replace records. Related to not replacing records, I think |
Right it's something that likely doesn't have to be done at runtime, but it is important to consider at least in the documentation part, as internal fields definitely shouldn't be documented. Read-only fields can be marked as such easily in the text. But this depends on what the documentation will look like I suppose. |
997a60f to
d2ab33d
Compare
|
Update: there can now be two distinct errors when |
|
@tsloughter wrote:
"Not replacing records" is really phrased: not "Replacing all tuple record usage scenarios.", and "Replace most tuple-record usages without having to update anything but the declaration.". With that in mind I think overloading -record may be more good than bad. |
|
@RaimoNiskanen unless it outright replaces, 100%, named tuple records I think using I take it there are a million reasons At first I wanted I can concede there is no good alternative to |
|
I do share the concern that the similar syntax is confusing, and the difference being 2 characters of punctuation makes it challenging to differentiate between the two when reviewing code.
-struct #state{
values = [] :: list(number()),
avg = 0.0 :: float()
}.Elixir does already have "defstruct", though that construct does seem very similar to this proposal in design and purpose. |
Question: Behavior when adding a field across distributed nodesConsider this scenario: Node A (old code): -record #state{count = 0, name = "default"}.
State = #state{count = 5, name = "server1"}Node B (new code): -record #state{count = 0, name = "default", version = 1}.When Node A sends a
Issue: This seems to check against the definition on Node B, not the fields in the value from Node A. It's unclear whether the update would:
|
Question: Field renaming across nodesConsider this scenario where a field is renamed: Node A (old code): -record #user{id, username, city}.
User = #user{id = 1, username = "alice", city = "Stockholm"}Node B (new code - username renamed to name): -record #user{id, name, city}.When Node A sends
Issue: Field renames appear to be breaking changes in distributed systems. The EEP states:
This means reading works purely based on the value's fields, not the definition. So:
|
Question: Removing a fieldConsider this scenario: Node A (old code): -record #config{host, port, legacy_timeout}.
Config = #config{host = "localhost", port = 8080, legacy_timeout = 5000}Node B (new code - legacy_timeout removed): -record #config{host, port}.When Node A sends
Issue: The interaction between field access (which ignores definition) and pattern matching (which may or may not check definition) is unclear. |
4024026 to
dc31588
Compare
|
It turns out that I had failed to fully update the EEP regarding when a record definition is consulted (or not). In our internal meetings we had decided that the definition for a native record is only used when creating a record. All update and read operations will only refer to the value of record (which includes the names, whether it was exported at value-creation time, the name of the record, and of course all values). I've now pushed another commit to hopefully make that clearer. |
|
If the definition is never consulted on update, how do you upgrade values to the new definition? Do you have to explicitly deconstruct & reconstruct and keep version of the code doing that for all possible values that are live in the system? |
Yes. Upgrades must be done explicitly.
While the design was simple, we discovered that the actual implementation was far from simple, especially when more than one node is involved (or even when loading from the disk native records written by a previous instance of the runtime system). Also, we are not sure that is easy for the user to understand and handle all the implications of automatic upgrades of native records. |
Apply Rickard's suggestions.
601ecf5 to
376a2d0
Compare
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
|
tl;dr: ❤️❤️❤️ Disclaimer: I haven't read all comments here in tiniest detail, so sorry if I repeat a question that has already been answered 😜 As I understand it, native records, exported or not, can be sent to any other process ( Another thing that AFAICS is missing from the doc is (non-)extensibility of native records. (Note that I'm not saying that we should have that. But IMO it should at least be discussed and forthwith mentioned, Goals, Hard/Soft Not-Goals, whatever). That is to say, should or should we not have a way to define "general" native records which other native records can extend like "record |
| As it can be seen from the next sections, working with native records outside of | ||
| the defining module needs using fully qualified names of the native records: | ||
| `#misc:user{}`, ... This quickly may become too cumbersome and verbose. The | ||
| `-import_record` directive works similar to the `-import` and `-import_type` directives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need explicit directives for each case, instead of folding e.g. import_record into one of the existing ones? Is this an explicit design choice, or a technical restriction?
| not defined (in the structure definition). | ||
|
|
||
| * No value is provided for a field FN and the native-record definition | ||
| has no default value for FN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above it's said that this fails with {novalue,FieldName}. Which is true? Or is this something like a {badrecord, {novalue, FieldName}}?
| * The field `Field` is not defined in the native-record value. | ||
| * The definition of the record was not exported when the native-record | ||
| value was created, and it is now used outside its defining module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the non-exported case also be a badrecord error in the anonymous case? Otherwise that seems oddly inconsistent.
| native-record value. | ||
| When a field from a native-record value is accessed, the current | ||
| native-record definition is **not** consulted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is it consulted at compile-time, or can I write an expression that accesses a field that doesn't exist in my record as per the current code?
| An update operation fails with a `{badfield,FN}` error if: | ||
| * The native-record update expression references the field FN which is | ||
| not defined native-record value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if older code on another node sends me a record, and I want to (blindly) update a field in it that my code knows about but the other node doesn't (yet)?
I'm likening this to Protocol Buffers, which let you always do whatever you want with known (to you) fields, while otherwise preserving the message structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I saw the comment saying upgrades are explicit only. Hm. Protobufs have this nice property of being able to deal with values based on your current knowledge of the structure, but they make the cost of this explicit to the user (fields use numeric IDs, and you are responsible for managing them). I wonder if there's a middle ground, though.
|
Good proposal! My overall thought is that I see this as a step towards "cleaning up" struct-like types, i.e. types with known fields. Strongly agree with @josevalim on the unification of maps and records due to this. In fact, this non-goal from the proposal irks me a bit:
This can be read in two ways:
I think that being better should be a design goal: Native records look, in many ways, like "Records V2" - which sets an expectation (and, imho, a reasonable one) that they should be strictly better than "Records V1". I also think introducing as little new syntax and definitions as possible for this would also be great. If there was a way to just declare a record in essentially "the map type language", maybe literally through the |
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
|
When reading about public | private access of fields in another comment I've got a question regarding native records hierarchy:
|
|
This looks very nice! I would like to suggest a -spec to_map(record()) -> #{atom() => term()}.in the reflection API to make it easy to write generic traversals supporting native records: foo(BaseCase) -> ...;
foo([H|T]) ->
[foo(H)|foo(T)];
foo(M) when is_map(M) ->
maps:from_list(foo(maps:to_list(M));
foo(R) when is_record(R) ->
records:create(records:get_module(R), records:get_name(R), foo(records:to_map(R));
foo(X) -> X. |
This is an implementation of native records as desribed in EEP-79 (erlang/eep#81). Co-authored-by: John Högberg <john@erlang.org> Co-authored-by: Isabell Huang <isabell@erlang.org> Co-authored-by: Ilya Klyuchnikov <ilya.klyuchnikov@gmail.com>
zachallaun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see this! Comments are only related to minor typos found during my read-through.
| Soft Non-Goals | ||
| -------------- | ||
|
|
||
| Here are some features we might consider implementin in a future release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Here are some features we might consider implementin in a future release. | |
| Here are some features we might consider implementing in a future release. |
| * a tuple record | ||
| * a local native record | ||
| * a native record imported using `-import_record()' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * a native record imported using `-import_record()' | |
| * a native record imported using `-import_record()` |
|
|
||
| In OTP 29, the default value can only be an expression that can be | ||
| evaluated to a constant at compile-time. Essentially, the allowed | ||
| expressions are guard expression, with the following additional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| expressions are guard expression, with the following additional | |
| expressions are guard expressions, with the following additional |
| When a native-record value is created, it “captures” key information | ||
| from the current native-record definition, namely: | ||
| * They fully qualified name of the native-record (module name and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * They fully qualified name of the native-record (module name and | |
| * The fully qualified name of the native-record (module name and |
| native-record value. | ||
| When a field from a native-record value is accessed, the current | ||
| native-record definition is **not** consulted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I assume this means that accesses only use information contained in or referenced by the native record value itself, and there is no lookup to the current version of the native record's creating module?
I understand that implementation details are outside the scope of a language proposal, but it would be useful to have (or link to) a discussion about possible implementations. Understanding the performance implications of native records compared to maps and tuple records would be essential before changing code to use them.
|
|
||
| The main advantage of the native records is that more fields can | ||
| be added to the definition without having to recompile all code | ||
| that uses the record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's true then it seems like a fairly weak motivation to me.
Below you briefly write that they could be faster than maps, and that seems a more reasonable motivation.
The rest of the EEP has a lot of stuff about syntax, restrictions, and how-to-do-X, but very little in terms of why I, the user, would want to use these things.
No description provided.