Add ranged properties #452
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,7 @@ OPTIMADE API specification v1.2.0~develop | |
|
|
||
| entry : names of type of resources, served via OPTIMADE, pertaining to data in a database. | ||
| property : data item that belongs to an entry. | ||
| ranged-property : a property that can be returned in pieces and that supports slicing. | ||
| val : value examples that properties can be. | ||
| :val: is ONLY used when referencing values of actual properties, i.e., information that belongs to the database. | ||
| type : data type of values. | ||
|
|
@@ -67,6 +68,8 @@ OPTIMADE API specification v1.2.0~develop | |
|
|
||
| .. role:: property(literal) | ||
|
|
||
| .. role:: ranged-property(literal) | ||
|
|
||
|
|
||
| .. role:: val(literal) | ||
|
|
||
| .. role:: type(literal) | ||
|
|
@@ -442,6 +445,163 @@ For example, the following query can be sent to API implementations `exmpl1` and | |
|
|
||
| :filter:`filter=_exmpl1_band_gap<2.0 OR _exmpl2_band_gap<2.5` | ||
|
|
||
|
|
||
| Ranged Properties | ||
| ----------------- | ||
|
|
||
| - **Description**: Ranged properties support slicing, so the client can request that only some of the values be returned. | ||
| Likewise, the server can use paging to return the property in multiple parts. | ||
| This is useful for properties that are too large to return conveniently in a single response. | ||
| If an entry is too large to be returned in a single response, a link is provided, as described in `JSON Response Schema: Common Fields`_ under the `links.next` field, from which the remainder of the requested data can be retrieved. | ||
| Ranged properties also provide a way to correlate the values of two ranged properties via the :property:`range_ids` field. | ||
| The metadata is returned by default; the data is only returned when specifically requested via the :query-param:`property_ranges` query parameter as described under `Entry Listing URL Query Parameters`_. | ||
|
|
||
| When a client does not use query parameters to select a range for the ranged property, the server returns a dictionary with metadata about the ranged property in the following format: | ||
|
|
||
| - **Type**: dictionary with keys: | ||
|
|
||
|
|
||
| - :property:`serialization_format`: string (REQUIRED) | ||
| - :property:`values`: List of any data type (OPTIONAL) | ||
|
Member
So, do I understand correctly that the present design is for the member So, correlated with the thoughts above about pagination: how about making responses always contain two fields for ranged properties, one metadata field with only metadata, and one for the data. That way, even the initial response that contain a ranged property can pass the first page of data the way it would if this was just a regular (non-ranged) property. This means that if the ranged property happens to be small, the client gets all the data it needs in the first response without having to do anything else. Only in the case all data does not fit in one response does the client need to engage with the specifics of accessing ranged properties. This will actually greatly help collections (#386): I felt we were going to end up with an ugly design to force clients to fiddle with range access for all collections, when in many cases of smaller collections this would be unnecessary. So, I propose:
For the field
The field
Contributor
Author
Yes, the dictionary would also contain data when the property is listed within the
I do like that all the data is grouped together in a single field. We could also make a separate sub dictionary to group the metadata separate from the data. But overall, it is not that important how the fields are organized.
I am not sure I like your proposal to return the first “page” of data of a ranged property by default. If a user wants the behaviour you described, they could simply use the If a client would retrieve only a single property, it would make more sense to already provide some of the data, but having a different behaviour for the single entry endpoint could also be confusing/complicate things.
Another solution could perhaps be to define the collection field flexibly so that you could have a regular and a ranged version of this property. I already wrote that if the part after the ranged prefix is the name of a regular property, it should meet all the requirements of that property.
In the JSON API v1.0 links is a reserved key word, so we would have to give this field a different name. (or drop support for json API 1.0) see section 5.2.3 of the JSON API specification. It is quite a big change you are suggesting. Instead of retrieving the data per entry, you want to retrieve data per property. A bit as if each property is a resource in itself.
I do not think returning errors to the user is very nice when the user can’t know in advance that something is wrong. We should probably also think about creating streaming responses, so sending back large data items is less odd to the client. Now we first prepare the response and then send it. But for large responses this would be odd as the user does not get any data back for a while. If we do this, we can send larger responses, and we would not need ranged properties as often. But this is probably more of an issue for the optimade python tools rather than a change in the optimade specification. Sara and Gian-Marco have asked me to prioritize writing a tutorial for the old trajectory proposal, so it may take a while before I get back to this topic.
Member
Where does the text of this PR presently say this, i.e., that the behavior of
I think there is a point in separating data and metadata, especially when we consider the possibility of searching for data. But I agree that whether the data goes inside or outside the dictionary is not the most crucial design decision. Perhaps we can discuss it on the web meeting.
The first page doesn't need to be very large, it is up to the implementation - it could be, say, 20 values. If the client wants the data it asks for the field (and should then be happy to also get the data), if the client does not want the data it should not ask for the field.
I'm trying to move this in the direction where implementations that do not support random access of the data do not have to implement I hadn't even considered that the way you've written this is meant to support a range selection over every entry that matches a filter. I do not think implementations should be forced to support that even if they do support range selection over a single entry.
Can you cite the part of the JSON:API specification that you read to say one cannot have
No, that is not what I meant, sorry if I was unclear. This was meant to be a small adjustment in behavior - the URL in links is meant to give you a single entry response with the next page of data for every property using the same range id as the one you are paging in. You can implement it by a URL that adds "property_ranges" selecting the next range (but you don't have to). I see now that I had missed how you want to allow multidimensional indexing with multiple range_ids; one would in that case have to work out how paging works in the multidimensional case. (But, do we need to support multidimensional ranges?)
Well, as you also conclude, that cannot generally be done given the degrees of freedom supported by
Contributor
Author
On line 526/527 it is mentioned that the values are only returned when the property is listed in the property_ranges query parameter.
In that case, the client would still need a way to specify which fields it wants to have. So I think you would still need the property_range query parameter.
Most of the code for the single entry endpoint and the multiple entry endpoint is the same. So I do not think this makes the implementation more difficult. It may actually be more work to create different behaviour for the single and multiple entry points.
“Complex data structures involving JSON objects and arrays are allowed as attribute values. However, any object that constitutes or is contained in an attribute MUST NOT contain a relationships or links member, as those members are reserved by this specification for future use.”
So, if I understand it correctly, you want to give each resource/entry its own next link to retrieve the remainder of this entry, and use the standard JSON API next link to point to the next entry that matches the filter.
The current proposal already supports paging in multiple dimensions. (The range_ids have nothing to do with indexing. It only shows that two properties use the same range, so that values with the same index belong to each other.)
Member
I don't get what you mean here, sorry. Which fields to return in a response is requested via
I'm trying to propose to make it so that the single mandatory feature to get all data in a ranged property is via a next-like paging system, which wouldn't require interpreting
Oh, wow, I had missed this completely. It is a really strange rule to have for attribute data at arbitrary depth, and it seems they removed it in v1.1. Well, ok then, you are right that we would have to name that inner field something different than
No, the way you describe it does not sound right. Let me try again. The standard JSON API next link would work as it always has: it pages over a range of returned entries. This is an absolute necessity to adhere to JSON:API. A ranged property would, among its metadata, communicate a "special" next link. Following that next link would give you that entry (that whole entry with all attributes/properties as requested by Maybe the most clear way to express it is that this "feature" is meant to be completely compatible with your design of
Why would this extension belong in
Contributor
Author
In the discussion on the trajectory endpoint, it was concluded that it would be confusing to determine whether the values should be returned based on whether the property was present in the response_fields. If the whole property (metadata + data) is present under a single field, there would be no way to specify with the response_fields parameter that you only want the metadata. I now understand why you wanted to split the data and metadata into two separate fields. That way, you could add the data field to the response_fields to get the data. And not return it otherwise.
Ok, I think I understand you now. I’ll adjust in the proposal.
The cartesian_site_positions field is an example of multidimensional data. |
||
| - :property:`nvalues`: integer (OPTIONAL) | ||
| - :property:`range_ids`: list of strings (OPTIONAL) | ||
| - :property:`n_dim`: integer (REQUIRED) | ||
| - :property:`dim_size`: list of integers (REQUIRED) | ||
| - :property:`offset_linear`: float (OPTIONAL) | ||
| - :property:`step_size_linear`: list of floats (OPTIONAL) | ||
| - :property:`offset_regular`: list of integers (OPTIONAL) | ||
| - :property:`step_size_regular`: list of integers (OPTIONAL) | ||
| - :property:`indexes`: list of lists of integers (OPTIONAL) | ||
|
|
||
| - **Requirements/Conventions**: | ||
|
|
||
| - **Support**: OPTIONAL support in implementations. | ||
| - Ranged properties can be identified by the prefix "_ranged_". If it is a database-specific field, the database prefix comes first. | ||
| - If the part of the property name after the "_ranged_" prefix matches the name of an OPTIMADE field for the endpoint, the values in the :property:`values` list MUST follow the rules of that property. | ||
| - By default, only the metadata SHOULD be returned (i.e. all the fields except :property:`values` and :property:`indexes`). | ||
| The :property:`values` and :property:`indexes` fields SHOULD only be returned when requested via the :query-param:`property_ranges` query parameter as described under `Entry Listing URL Query Parameters`_. | ||
|
|
||
| - **Query**: Queries on the dictionary fields SHOULD be supported, except for the :property:`values` and :property:`indexes` fields for which querying is OPTIONAL. | ||
| - **Keys**: | ||
|
|
||
| - **serialization_format**: To improve the compactness of the data, there are several ways to show to which index a value belongs. | ||
| This is specified by the :property:`serialization_format`. | ||
|
|
||
| - **Type**: string | ||
|
|
||
| - **Requirements/Conventions**: This field MUST be present. | ||
| - **Possible values**: | ||
|
|
||
| - **linear**: The value is a linear function of the indexes. | ||
| This function is defined by :property:`offset_linear` and :property:`step_size_linear`. | ||
| - **regular**: The value is set for one out of every :property:`step_size_regular` indexes, with :property:`offset_regular` indicating the index of the first value. | ||
| - **custom**: A separate list with indexes is defined in the field :property:`indexes` to indicate to which index each value belongs. | ||
|
|
||
| - **values**: The values belonging to this property. | ||
| The format of this field depends on the property for which data is stored. | ||
|
|
||
| - **Type**: List of Any | ||
| - **Requirements/Conventions**: The property :property:`values` MUST be present when :property:`serialization_format` is not set to :val:`"linear"`. | ||
|
|
||
| - **nvalues**: The number of values in the field :property:`values`. | ||
|
|
||
| - **Type**: integer | ||
| - **Requirements/Conventions**: The value MUST be present when :property:`serialization_format` is not set to :val:`"linear"`. | ||
|
|
||
| - **range_ids**: A list with an identifier for each dimension of the range. A shared identifier indicates that the dimension correlates with the dimension of another ranged property that has the same identifier. | ||
| For example, when data of an MD trajectory is shared, it can be used to indicate that the energies and the cartesian_site_positions at the same index in a certain dimension are correlated, i.e. which energy belongs to which set of cartesian_site_positions. | ||
|
|
||
| - **Type**: list of strings | ||
| - **Requirements/Conventions**: OPTIONAL | ||
|
|
||
| - **n_dim**: The number of dimensions this property has. | ||
| - **Type**: integer | ||
| - **Requirements/Conventions**: REQUIRED | ||
| It MUST be equal to the length of the dim_size field. | ||
|
|
||
| - **dim_size**: The size of the range in each dimension. | ||
|
|
||
| - **Type**: list of integers | ||
| - **Requirements/Conventions**: REQUIRED | ||
|
|
||
| - **offset_linear**: If :property:`serialization_format` is set to :val:`"linear"` this property gives the value at the origin, i.e. where the index in all dimensions is 1. | ||
|
|
||
| - **Type**: float | ||
| - **Requirements/Conventions**: The value MAY be present when :property:`serialization_format` is set to :val:`"linear"`, otherwise the value SHOULD NOT be present. | ||
| The default value is 0. | ||
|
|
||
| - **step_size_linear**: If :property:`serialization_format` is set to :val:`"linear"`, this value gives the change in the value of the property per step along each of the dimensions of the range. | ||
| For example, if :property:`offset_linear` is 0.5 and :property:`step_size_linear` is [0.2, 0.3], then at index [3, 4] the value of the property is 0.5 + 0.2 × (3 − 1) + 0.3 × (4 − 1) = 1.8. | ||
|
|
||
| - **Type**: list of floats | ||
| - **Requirements/Conventions**: The value MUST be present when :property:`serialization_format` is set to :val:`"linear"`. | ||
| Otherwise, it SHOULD NOT be present. | ||
|
|
||
| - **offset_regular**: If :property:`serialization_format` is set to :val:`"regular"` this property gives the indexes of the first value. | ||
|
|
||
| - **Type**: list of integers | ||
| - **Requirements/Conventions**: The value MAY be present when :property:`serialization_format` is set to :val:`"regular"`, otherwise the value SHOULD NOT be present. | ||
| The default value is 1 in every dimension. | ||
|
|
||
| - **step_size_regular**: If :property:`serialization_format` is set to :val:`"regular"`, this value indicates that a value is defined one out of every :property:`step_size_regular` steps in each dimension. | ||
|
|
||
| - **Type**: list of integers | ||
| - **Requirements/Conventions**: The value MUST be present when :property:`serialization_format` is set to :val:`"regular"`. | ||
| Otherwise, it SHOULD NOT be present. | ||
|
|
||
| - **indexes**: If :property:`serialization_format` is set to :val:`"custom"`, this field holds the indexes to which the values in the :property:`values` field belong. | ||
|
|
||
| - **Type**: List of lists of integers | ||
| - **Requirements/Conventions**: The value MUST be present when :property:`serialization_format` is set to :val:`"custom"`. | ||
| Otherwise, it SHOULD NOT be present. The order of the indexes MUST be the same as the order of the corresponding values in :property:`values`. | ||
|
|
||
| - **Example**: | ||
|
|
||
| .. code:: jsonc | ||
|
|
||
|
|
||
| { | ||
| "_ranged_cartesian_site_positions": { | ||
| "n_dim": 3, | ||
| "dim_size": [100, 3, 3], | ||
| "range_ids": ["mdsteps","particles","xyz"], | ||
| "serialization_format": "regular", | ||
| "offset_regular": [1, 1, 1], | ||
| "step_size_regular": [1, 1, 1], | ||
| "nvalues": 900, | ||
| "values": [[[2.36, 5.36, 9.56],[7.24, 3.58, 0.56],[8.12, 6.95, 4.56]], | ||
| [[2.38, 5.37, 9.56],[7.24, 3.57, 0.58],[8.11, 6.93, 4.58]], | ||
| [[2.39, 5.38, 9.55],[7.23, 3.57, 0.59],[8.10, 6.93, 4.57]], | ||
| // ... | ||
| ] | ||
|
|
||
| }, | ||
| "_ranged_species_at_sites": { | ||
| "n_dim": 1, | ||
| "dim_size": [3], | ||
| "range_ids": ["particles"], | ||
| "serialization_format": "regular", | ||
| "offset_regular": [1], | ||
| "step_size_regular": [1], | ||
| "nvalues": 3, | ||
| "values": ["He", "Ne", "Ar"] | ||
|
|
||
| }, | ||
| "_exmpl_ranged_time":{ | ||
| "n_dim": 1, | ||
| "dim_size": [100], | ||
| "range_ids": ["mdsteps"], | ||
| "serialization_format": "linear", | ||
| "step_size_linear": [0.2] | ||
|
|
||
| }, | ||
| "_exmpl_ranged_thermostat": { | ||
| "n_dim": 1, | ||
| "dim_size": [100], | ||
| "range_ids": ["mdsteps"], | ||
| "serialization_format": "custom", | ||
| "nvalues": 3, | ||
| "values": [20, 40, 60], | ||
|
|
||
| "indexes": [[0], [20], [80]] | ||
| } | ||
| } | ||
|
|
||
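The :val:`"linear"` serialization format can be illustrated with a minimal Python sketch. It assumes the reading given in the key descriptions above: indexes are 1-based, :property:`offset_linear` is the value where every index is 1, and :property:`step_size_linear` holds one step per dimension. The function name `linear_value` is hypothetical and not part of the proposal.

```python
def linear_value(index, step_size_linear, offset_linear=0.0):
    """Value of a "linear" ranged property at a 1-based multi-dimensional index.

    Assumption: the value at index [1, 1, ...] equals offset_linear, and each
    step along dimension d adds step_size_linear[d].
    """
    if len(index) != len(step_size_linear):
        raise ValueError("index and step_size_linear must have the same length")
    return offset_linear + sum(
        step * (i - 1) for i, step in zip(index, step_size_linear)
    )


# Worked example from the step_size_linear description: approximately 1.8.
value = linear_value([3, 4], [0.2, 0.3], offset_linear=0.5)
print(round(value, 6))
```

With this reading, a client never needs the :property:`values` field for a linear property; the metadata alone is enough to reconstruct any value.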
|
|
||
|
|
||
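As a rough sketch of how a client might work out which indexes actually carry a value under each :property:`serialization_format`, the following Python function expands the metadata into a list of index tuples. It is one possible interpretation of the text above, not a reference implementation: it assumes 1-based indexes, an :property:`offset_regular` default of 1 in every dimension, and that a linear property has a value at every index. The name `defined_indexes` is hypothetical.

```python
from itertools import product


def defined_indexes(prop):
    """Return the 1-based index tuples for which a ranged property holds a value."""
    fmt = prop["serialization_format"]
    dim_size = prop["dim_size"]
    if fmt == "custom":
        # Indexes are listed explicitly, in the same order as "values".
        return [tuple(index) for index in prop["indexes"]]
    if fmt == "regular":
        # Assumption: offset_regular defaults to 1 in every dimension.
        offsets = prop.get("offset_regular", [1] * len(dim_size))
        steps = prop["step_size_regular"]
    elif fmt == "linear":
        # Assumption: a linear property has a computed value at every index.
        offsets = [1] * len(dim_size)
        steps = [1] * len(dim_size)
    else:
        raise ValueError(f"unknown serialization_format: {fmt!r}")
    axes = [
        range(first, size + 1, step)
        for first, size, step in zip(offsets, dim_size, steps)
    ]
    return list(product(*axes))


# Hypothetical metadata dictionaries, shaped like the example above.
sparse = {
    "serialization_format": "regular",
    "dim_size": [10],
    "offset_regular": [2],
    "step_size_regular": [4],
}
print(defined_indexes(sparse))  # [(2,), (6,), (10,)]
```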
| Responses | ||
| ========= | ||
|
|
||
|
|
@@ -880,6 +1040,25 @@ Standard OPTIONAL URL query parameters not in the JSON API specification: | |
| If provided, these fields MUST be returned along with the REQUIRED fields. | ||
| Other OPTIONAL fields MUST NOT be returned when this parameter is present. | ||
| Example: :query-url:`http://example.com/optimade/v1/structures?response_fields=last_modified,nsites` | ||
| - **property\_ranges**: specifies which ranges should be returned for ranged properties. | ||
| It MUST be supported by databases having ranged properties. | ||
| It consists of a property name directly followed by the range that should be returned. | ||
| A range is written as a list containing one inner list per dimension. | ||
| Each inner list has three values. | ||
| The first value of the range specifies the first index in that dimension for which values should be returned. | ||
| The second value specifies the last index for which values should be returned. | ||
| The third value specifies the step size. | ||
| Ranges can be specified for multiple properties by separating them with a comma. | ||
|
|
||
| Databases MUST return the values belonging to the listed properties and SHOULD use the ranges given in this query parameter. | ||
| For properties with :property:`serialization_format` :val:`custom`, indexes that fall in the requested range but for which no value is defined should not be returned. | ||
| For properties with :property:`serialization_format` :val:`regular`, indexes that fall in the requested range but for which no value is defined should have the value :val:`null`. | ||
| The ranges are 1-based, i.e. the first value has index 1, and inclusive, i.e. for the range :val:`[10,20,1]` the last value returned belongs to index 20. | ||
|
|
||
| Example: | ||
|
|
||
| Given a structure with ID id_12345 and a property :ranged-property:`_ranged_test_field` with the values :val:`[[9.64, 7.52, 0.69, 5.69], [4.82, 8.35, 3.26, 3.25], [4.82, 2.78, 7.87, 7.42], [5.49, 3.48, 1.65, 0.75]]`, the query :query-url:`http://example.com/optimade/v1/structures/id_12345?property_ranges=_ranged_test_field[[1, 3, 2], [2, 3, 1]]` | ||
| returns the value :val:`[[7.52, 0.69], [2.78, 7.87]]`. | ||
| Multiple ranges can be requested in one query, e.g. :query-param:`property_ranges=_ranged_test_field[[1, 3, 2], [2, 3, 1]], _ranged_other_field[[1,100,1]]` | ||
|
|
||
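The slicing semantics of the example above (1-based, inclusive, one `[first, last, step]` triple per dimension) can be sketched in Python as follows. This is only an illustration for densely stored nested lists; the function name `slice_ranged` is hypothetical, and the handling of undefined indexes for :val:`custom` and :val:`regular` properties is left out.

```python
def slice_ranged(values, ranges):
    """Apply 1-based, inclusive [first, last, step] ranges to nested lists.

    ranges holds one triple per dimension, outermost dimension first.
    """
    if not ranges:
        return values
    first, last, step = ranges[0]
    # 1-based inclusive [first, last] maps to the Python slice [first - 1:last].
    selected = values[first - 1:last:step]
    return [slice_ranged(item, ranges[1:]) for item in selected]


values = [
    [9.64, 7.52, 0.69, 5.69],
    [4.82, 8.35, 3.26, 3.25],
    [4.82, 2.78, 7.87, 7.42],
    [5.49, 3.48, 1.65, 0.75],
]
print(slice_ranged(values, [[1, 3, 2], [2, 3, 1]]))
# [[7.52, 0.69], [2.78, 7.87]]
```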
|
|
||
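For server implementers, one way to read the :query-param:`property_ranges` value into a usable structure is sketched below. It assumes the value has already been URL-decoded and follows the syntax of the examples above (a property name directly followed by a list of `[first, last, step]` lists, with entries separated by commas). The regular expression and the function name `parse_property_ranges` are assumptions made for this sketch, not part of the specification.

```python
import json
import re

# A property name followed by a doubly bracketed list, e.g. name[[1, 3, 2], [2, 3, 1]].
_RANGE_ITEM = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(\[\[.*?\]\])")


def parse_property_ranges(raw):
    """Parse a URL-decoded property_ranges value into {property name: ranges}."""
    parsed = {}
    for name, range_text in _RANGE_ITEM.findall(raw):
        ranges = json.loads(range_text)  # the range syntax is also valid JSON
        for dim in ranges:
            if len(dim) != 3 or not all(isinstance(v, int) for v in dim):
                raise ValueError(f"each range of {name} needs [first, last, step]")
        parsed[name] = ranges
    return parsed


query = "_ranged_test_field[[1, 3, 2], [2, 3, 1]], _ranged_other_field[[1,100,1]]"
print(parse_property_ranges(query))
```

The resulting dictionary maps each property name to its per-dimension ranges, which can then be applied to the stored values one dimension at a time.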
| Additional OPTIONAL URL query parameters not described above are not considered to be part of this standard, and are instead considered to be "custom URL query parameters". | ||
| These custom URL query parameters MUST be of the format "<database-provider-specific prefix><url\_query\_parameter\_name>". | ||
|
|
||