-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-28859: [Doc][Python] Use only code-block directive and set up doctest for the python user guide #48619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GH-28859: [Doc][Python] Use only code-block directive and set up doctest for the python user guide #48619
Conversation
|
Converting this PR to draft till I figure out what would be the best way to run RST doctest on 3.12 Sphinx Documentation CI job and not on the Python 3.10 Sphinx & Numpydoc. |
|
@raulcd the main pain point why I think the easiest solution would be to run Sphinx & Numpydoc on Python 3.11, or even Python 3.12 (I am not aware of any reason we would need the olderst Python version we support here. Sphinx Documentation runs on docs changes only while Sphinx & Numpydoc runs on any Python or C++ changes and validates the docstrings). |
|
Thanks for checking that @AlenkaF ! So currently we are providing a snippet on our documentation: that will fail for some users as we are still supporting Python 3.10, right? Is it worth for the example to add the I am ok to just bump the Python version of the job but we probably should not provide examples that will fail on some of the supported versions. |
|
Yeah, you are right. Changing to |
|
Ha ha, the example would fail anyways as the year changed in the meantime 🤣 |
Yes, we don't want to have to update this every year because the data changes 😄 |
f417177 to
1fb6f0a
Compare
|
@github-actions crossbow submit preview-docs |
|
Revision: 1fb6f0a Submitted crossbow builds: ursacomputing/crossbow @ actions-b1a47e0770
|
|
Hi @rmnskb @tadeja @zhengruifeng @HyukjinKwon! In case anybody fancies giving a review, it would be much appreciated. Link to the preview: https://s3.amazonaws.com/arrow-data/pr_docs/48619/python/index.html This PR also adds a doctest of the |
rmnskb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🔥 Thanks for working on that! I can imagine it was a tremendous amount of work. Left some general comments about some smaller things that I picked up while looking at the PR, otherwise I think it's good to merge.
| .. code-block:: python | ||
| >>> import pyarrow as pa | ||
| >>> import pyarrow.compute as pc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you decide to opt out from the explicit imports? Does the documentation still compile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to not duplicate imports per page. Meaning once on the top should suffice (see lines above, approx line 32). Yes, the compilation of docs and doctest should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we keep the explicit imports, the examples will be copy-paste able which might be more friendly to users.
|
ack. taking a look now |
|
Thanks for the review @rmnskb! 🎈 |
| | naive| aware| | ||
| +-------------------+-------------------+ | ||
| |2019-01-01 00:00:00|2019-01-01 08:00:00| | ||
| |2018-12-31 23:00:00|2019-01-01 08:00:00| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 2019-01-01 00:00:00 became 2018-12-31 23:00:00 here cuz I suspect you or CI (?) is somewhere in GMT+1. datetime(2019, 1, 1, 0) is assumed as local time (yes it's up to the system to interpret but Spark thinks so). So, Spark thought that it's a local time but the timezone was set as UTC so it decreased one hour.
I think we should probably just skip all here cuz now it seems depending on local timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's better to just keep the original input/output here and skip all. I will take a separate look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree - these tests should already be skipped. I can remove the diff and keep it as it was before?
The example is good altogether and shows the timezone conversion behaviour. Maybe we could add a note? Claude suggests:
.. note::
The examples above demonstrate timezone conversion behaviour.
The exact output may differ depending on your system's local timezone, as Spark
interprets naive timestamps relative to the local timezone when converting to UTC.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove the diff and keep it as it was before?
yupyup. whichever easier. I will take a separate look after this gets merged.
| <FileInfo for 'test.arrow': type=FileType.File, size=3250> | ||
| .. code-block:: python | ||
| >>> local.get_file_info('test.arrow') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a really nitpick .. feel free to ignore it for now as I don't want this kind of comment to block the PR.
Should we remove the file after the test somewhere somehow? Seems like test.arrow file will be created but not removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All saved files should be cleaned up thanks to the fixture here: https://github.com/apache/arrow/pull/48619/files#diff-de7516be9fdc98a7bfc2fe897bd93bff7c7d0a5d62ea50759956c9745082a310.
I have tested this locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: not a nitpick, I think this is a very valid comment!
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: LGTM
How much does it save time BTW? Some might argue that IPython input/output are better and the speed of building could be considered as a secondary (e.g., PySpark doc build takes super duper long - it generates things a lot). For myself, I prefer faster build in any event so my take is on this change.
My main aim was to unify the docs and have the possibility of running doctest on the examples separately. But am curious if there is any change in performance so I will try it out now 😄 (not sure if the amount of IPython directives has been that big before this change, though). |
| .. code-block:: python | ||
| >>> import pyarrow as pa | ||
| >>> import pyarrow.compute as pc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we keep the explicit imports, the examples will be copy-paste able which might be more friendly to users.
Rationale for this change
In many places in the Python User Guide the code exampels are written with IPython directive (elsewhere code-block is used). IPython directives are converted to IPython format (
InandOutduring the doc build). This can lead to slower builds.What changes are included in this PR?
IPython directives are converted to runnable code-block (with
>>>and...) and pytest doctest support for.rstfiles is added to theconda-python-docsCI job. This means the code in the Python User Guide is tested separately to the building of the documentation.Are these changes tested?
Yes, with the CI.
Are there any user-facing changes?
Changes to the Python User Guide examples will have to be tested with
pytest --doctest-glob='*.rst' docs/source/python/file.rst