Conversation
These imports are written in a small DSL like:
```rust
#[pymodule(stubs = {
from datetime import datetime as dt, time;
from uuid import UUID;
})]
```
Then parsed, sent as an AST inside the introspection data (following the same AST format as the type hints) and serialized by the introspection crate that merges these imports with the auto generated ones
The `#[pymodule]` parameter is named `stub` because we might include some other features in the future like protocols
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, I have a feeling it'll take a couple of rounds of review to figure out my preferred design on this one! Please bear with me 🙏
| from datetime import datetime as dt, time; | ||
| from uuid import UUID; |
There was a problem hiding this comment.
I can't decide how I feel about having full Python syntax embedded within the rust source like this. What are the alternatives? e.g. could be text, then it would be possible to use include_str! to maintain the stub as external Python file. That would also avoid the limitations of formatting & Rust syntax.
Possibly also interesting is https://github.com/m-ou-se/inline-python
There was a problem hiding this comment.
I see two advantages of the current design:
- having the imports close to the code (including the custom type annotation) makes it easier to get a single source of truth
- having it parsed by a macro means we validate the syntax and send it to
pyo3-introspectionas an AST. This makes quite easy for the stub generation code to add to these custom imports the extra ones that are needed to for the type annotations generated by PyO3 and get a clean output.
I fear that skipping parsing step would make imports management quite painful and if we have to pay the parsing price, then it's nice to have such an embedding.
But I don't feel strongly on that, glad to pivot the MR toward something else if it's better.
if we go the embedding way, I am not sure about the ;
Pros:
- if we allow more and more Python syntax including top level expressions it is required if we want to keep being able to use
syn - it might make autoformatting possible
Cons:
- departs from Python and confusing for the users
There was a problem hiding this comment.
having it parsed by a macro means we validate the syntax and send it to pyo3-introspection as an AST. This makes quite easy for the stub generation code to add to these custom imports the extra ones that are needed to for the type annotations generated by PyO3 and get a clean output.
This is compelling and makes a lot of sense. Let's keep it inline in the Rust source.
Regarding the ;, I would be keen to drop the requirement for it now, I think it'll make adding future possibilities like Protocol statements hard. I don't think the autoformatting is a real possibility without a lot of future extensions for rustfmt.
Looking at inline-python's source, I think we might be able to use span locations to properly detect whitespace to skip the ; requirement on 1.88+
That might be a good middle ground? I think we can probably still use syn to parse, we definitely wouldn't be able to use Punctuated, but a custom Parse implementation might still be able to collect the Python statements to a Vec...
There was a problem hiding this comment.
Thank you! Make sense! I'm going to experiment in this direction
There was a problem hiding this comment.
Done in 2494f61 Just parsing until end of input works because each python statement is well delimited (no trailing punctuation). Care about line jump will be required when parsing protocols.
I have added a check to ensure that imports are on a single line in 7a2aace
I have also fixed dotted names support: d97c135
There was a problem hiding this comment.
I'm generally opposed to anything resembling "code" inside attributes.
some more cons are:
- no auto formatting inside attributes
- if the user gets something wrong the error messages can get horrific
- additional complexity of having and maintaining a parser for all that stuff
What does an alternative design look like?
There was a problem hiding this comment.
@mejrs Great question!
I see two alternative designs:
- mandate users to write full import paths in their custom type annotations. So, no
Iterator[datetime]butcollections.abc.Iterator[datetime.datetime]. However to generate the imports we need to split in the dotted-names (likemodule.submodule.Class.NestedClass) into the module part (module.submodule) the class-name (Class) and the nested class part if present (NestedClass). I see multiple sub-options:
a. not supporting nested classes at all
b. an heuristic based on upper vs lower-case (that will likely fail on some painful modules likePIL.Image)
c. mandating the user to split it themselves with some Pyo3-bespoke DSL.
I am not sure to like any option. - Let people write this in some
module_extra_stubs.pyi(name TBD) file and then concatenate the file with the stubs generated by PyO3. However, we need there some logic to avoid name conflicts if e.g. the extra stubs contain afrom this_module import foofor a custom type annotation but the auto-generate stubs create afrom other_module import foofor an other auto-generated type annotation. I don't see how to escape that without parsingmodule_extra_stubs.pyi, going back to "additional complexity of having and maintaining a parser for all that stuff". And we loose the nice property of having these small extra stubs colocated with the code.
What do you think? If you have a better idea, I would love that, this bespoke parser is indeed gonna be a pain.
There was a problem hiding this comment.
maybe like
#[pymodule(stubs(dt = datetime::datetime, UUID = uuid::UUID))]perhaps less convenient, but it is valid MetaItem syntax (i believe), so that would ease my concern about horrific error messages
There was a problem hiding this comment.
Good idea! Thanks!
but it is valid MetaItem syntax (i believe)
syn::MetaList value is an arbitrary TokenStream so I am not sure
#[pymodule(stubs(dt = datetime::datetime, UUID = uuid::UUID))]would return better errors than
#[pymodule(stubs = { import datetime.datetime as dt; import uuid.UUID })]because the parsing of whatever is inside of #[pymodule()] will rely on a syn-based parser anyway. So, not sure departing from Python syntax brings us much.
There was a problem hiding this comment.
You would use https://docs.rs/syn/latest/syn/struct.Attribute.html#method.parse_nested_meta I believe.
because the parsing of whatever is inside of #[pymodule()] will rely on a syn-based parser anyway. So, not sure departing from Python syntax brings us much.
Whatever the user writes still has to be somewhat valid, e.g. no unbalanced braces and curly braces. If they make a typo but it looks somewhat like MetaItem syntax then the compiler might still recover and put out a decent error message. That is what I meant with horrific error messages.
| /// A Python statement | ||
| enum PyStatement { | ||
| ImportFrom(PyImportFrom), | ||
| Import(PyImport), | ||
| } |
There was a problem hiding this comment.
I'm curious why limit the stubs to just these statements - is that just to keep initial implementation scope under control?
There was a problem hiding this comment.
Yes! I guess we will expend this list to support things like protocol
Merging this PR will degrade performance by 10.73%
Performance Changes
Comparing Footnotes
|
This feature is required to allow user-provided custom type hint to import types
These imports are written in a small DSL on the module like:
Then parsed, sent as an AST inside the introspection data (following the same AST format as the type hints) and serialized by the introspection crate that merges these imports with the auto generated ones
The
#[pymodule]parameter is namedstubbecause we might include some other features in the future like protocols