-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(python): Fix LazyFrame fetch method references #18033
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #18033 +/- ##
==========================================
+ Coverage 79.87% 79.89% +0.01%
==========================================
Files 1519 1521 +2
Lines 205839 206678 +839
Branches 2898 2906 +8
==========================================
+ Hits 164416 165120 +704
- Misses 40875 41010 +135
Partials 548 548 ☔ View full report in Codecov by Sentry. |
We actually don't want to put the |
In that case should it be excluded/delisted entirely somehow or should I just put it back and leave it as-is? Having the method listed but not documented can be confusing. What about the other changes and the user-guide? |
It should probably be removed from the notes as it is not true anymore. |
Removed fetch docs reference again. As well as the fetch section in the user-guide. |
Can you do a rebase? @rodrigogiraoserrao can you review this one? |
57f1470
to
ff415b1
Compare
Co-authored-by: Rodrigo Girão Serrão <[email protected]>
Are we good to go now? @rodrigogiraoserrao |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me; thanks for the help.
@@ -53,27 +53,18 @@ We look at [streaming in more detail here](streaming.md). | |||
|
|||
While you're writing, optimizing or checking your query on a large dataset, querying all available data may lead to a slow development process. | |||
|
|||
You can instead execute the query with the `.fetch` method. The `.fetch` method takes a parameter `n_rows` and tries to 'fetch' that number of rows at the data source. The number of rows cannot be guaranteed, however, as the lazy API does not count how many rows there are at each stage of the query. | |||
Instead, you can scan a subset of your partitions or use `.head`/`.collect` at the beginning and end of your query, respectively. | |||
Keep in mind that the results of aggregations and filters on subsets of your data may not be representative of the result you would get on the full data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
head
s that are after an aggregation don't influence an aggregation. I don't think this sentence makes much sense. Of course head
followed by aggregation influences a result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion is to have head
at the beginning of your query so in front of the aggregations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. in the given example
q9 = (
pl.scan_csv(f"docs/assets/data/reddit.csv")
.head(10)
.with_columns(pl.col("name").str.to_uppercase())
.filter(pl.col("comment_karma") > 0)
.fetch(n_rows=int(100))
.collect()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it too obvious? Maybe a little, but I have had these types of operations fail in ways I didn't expect. Calling head(10000)
and returning 0 rows might confuse someone. It is just single sentence so I feel like it is justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, with the accompanied query it makes sense. 👍
Thanks all! |
Fixes #17675 and other similar problems
Removing fetch from the miscellaneous.rst listing meant that the method was still listed as a method under LazyFrame, but appeared as a dead link and the method could not be found in the docs at all. This adds it back and fixes the links. Generally depreciated functions are still in the docs and there does not seem to be an elegant way of hiding a "public" function from autodocs entirely.
What fetch looks like under LazyFrame now:
Removed some other fetch references.
I am not sure what do about the fetch part in the user-guide. Should it be removed entirely or replaced with something else?
polars/docs/user-guide/lazy/execution.md
Lines 52 to 58 in 618a710