Conversation
…allow someone to conveniently know what is being deprecated, and once fixed, switch to CSV.File for a clean upgrade
Start work on supporting custom types
Codecov Report
@@ Coverage Diff @@
## master #656 +/- ##
==========================================
- Coverage 84.47% 82.95% -1.53%
==========================================
Files 8 8
Lines 1701 1748 +47
==========================================
+ Hits 1437 1450 +13
- Misses 264 298 +34
Continue to review full report at Codecov.
|
When parsing, CSV.File/CSV.Rows require an `AbstractVector{UInt8}`. For
`Cmd` and generic `IO` arguments, it's always been a little awkward
because we basically just tried to read the whole thing or had this
weird slurp function that tried to read it all in chunks. This led to a
few surprises for users when their system ran out of memory.
This PR deprecates passing `Cmd` and generic `IO` arguments as inputs,
stating the user should instead do `read(x)` first or find another way
to pass a filename in. I think this will overall lead to less surprises
for users and requiring them to do `read(x)` themselves isn't too
onerous and gives them the chance to read it themselves if that works
for their use-case or find a more efficient way to pass the data in.
This also deprecates the `use_mmap` keyword argument, since we'll mmap
every time a filename is passed in. You can "avoid" this by calling
`read(open(file))` yourself if so needed, but `CSV.File` doesn't hold on
to a reference of the mmapped file any more (unless `lazystrings=true`),
so we shouldn't run into some of the issues we have in the past.
We also cleanup some dependencies here by moving FilePathsBase.jl and
WeakRefStrings.jl to test-only dependencies, since they're not used
internally in CSV.jl anymore.
One last note is that `IOBuffer` is still allowed to be passed as an
input argument since it's basically just a byte buffer underneath
anyway, so we can use the data directly.
Deprecate CSV.read
Deprecate writeheader
Switch from Threads-at-threads to Threads.at-spawn. Fixes #657
Deprecate Cmd and generic IO inputs to CSV.File/CSV.Rows
Deprecate the categorical keyword argument
Looks like we accumulated some performance regressions for CSV.Rows; this fixes most of them. There's still some extra allocations that happen when passing custom types, but it's not too bad and can be solve another day.
Improve CSV.Rows performance
* Added some code comments to help clarify things * Update out-dated variable name usage (e.g. tapes) * Cleaned up dependencies (WeakRefStrings is test-only now) * Added lots of tests to increase coverage * Made multithreaded chunk identification more robust by checking we have correct # of columns for 5 consecutive rows instead of just 1 * Made sure we sync Int64 sentinels in multithreaded parsing * Removed some unused functions * Made sure we're testing type promoting when multithreaded parsing * Add a `tasks::Integer` keyword argument to allow controlling how many tasks will be spawned for multithreaded parsing * Clean up keyword arg docs
Lots of cleanup
Member
Author
|
@NHDaly, sorry for the slightly random ping, but I'm actually diving back in to the code here and want to make a push. would you mind taking a look here if you have a moment? Happy to jump on a quick call too if that'd be easier to do a quick review of what's going on here. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.