This is my first project in a Cloudera QuickStart container. It takes a low-level approach to getting multiple large (100 GB) files and combining them into HDFS. It uses multiprocessing and runs quickly; at most, it uses 7-8 GB of memory.
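The following is a minimal sketch of one way to combine several large files with multiprocessing while keeping memory bounded. It is not this repository's code. The chunked-copy approach, the hypothetical names `combine_files` and `_copy_into`, the 64 MiB chunk size, and the use of the `fork` start method (which assumes a Linux host, such as the QuickStart container) are all illustrative assumptions. Each worker process copies one source file into its own precomputed byte range of a preallocated output file, so each worker holds only about one chunk in memory at a time.

```python
import multiprocessing as mp
import os

# Assumed read size; per-worker memory stays near this value, not the file size.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB


def _copy_into(task):
    """Hypothetical worker: copy one source file into dst starting at `offset`."""
    src, dst, offset, chunk_size = task
    # Each worker opens its own descriptors and writes only its own byte range.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fout.seek(offset)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
    return src


def combine_files(sources, dst, processes=4, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: concatenate `sources` into `dst` in parallel.

    Returns the total number of bytes in the combined file.
    """
    # Compute where each source begins in the output (running sum of sizes).
    offsets = []
    total = 0
    for src in sources:
        offsets.append(total)
        total += os.path.getsize(src)

    # Preallocate the output at its final size so workers can write
    # their regions independently, in any order.
    with open(dst, "wb") as f:
        f.truncate(total)

    tasks = [(src, dst, off, chunk_size) for src, off in zip(sources, offsets)]
    # Assumption: "fork" is available (Linux/Unix); it raises ValueError on Windows.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        # Consume every result before the pool is closed.
        list(pool.imap_unordered(_copy_into, tasks))
    return total
```

Because every worker writes to a disjoint byte range, no locking is needed, and peak memory grows with `processes * chunk_size` rather than with the size of the inputs. The combined file could then be uploaded with `hdfs dfs -put -f combined.txt /user/cloudera/` (the HDFS path is a placeholder). Note that this approach needs enough local disk for the combined copy before it reaches HDFS.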
novaferg/cloudera-python-test
About
Moving large files into HDFS in Python using multiprocessing