This repository was archived by the owner on Feb 10, 2021. It is now read-only.
```python
from os.path import dirname  # needed for dirname(output_path) below

from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host=HDFS_HOST, port=HDFS_PORT)
df = hdfs.read_stockquantitylogs(input_path)
df = process_df(df, process_stock_quantity_log, stack_hour=False)
output_path = input_path.replace('/arch', '/clean', 1)
hdfs.makedirs(dirname(output_path))
with hdfs.open(output_path, 'wb') as f:
    df.to_parquet(f)
```
I'm not using dask for now; this is plain pandas. Here `df` is [31909929 rows x 3 columns]. I found that if I write only the first 1000 rows it works, but the process prints `Killed` when I write the whole `df`.