This repository was archived by the owner on Feb 10, 2021. It is now read-only.

pandas DataFrame.to_parquet gets killed while writing to hdfs3 #168

@eromoe

Description


My code looks like the following:

from os.path import dirname

from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host=HDFS_HOST, port=HDFS_PORT)

# read_stockquantitylogs and process_df are helpers defined elsewhere in my code
df = hdfs.read_stockquantitylogs(input_path)
df = process_df(df, process_stock_quantity_log, stack_hour=False)

output_path = input_path.replace('/arch', '/clean', 1)

hdfs.makedirs(dirname(output_path))
with hdfs.open(output_path, 'wb') as f:
    df.to_parquet(f)

I'm not using dask for now; this is plain pandas. Here df is [31909929 rows x 3 columns]. I found that if I write only 1000 rows it works, but the process prints Killed when I write the whole df.
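A Killed message like this usually means the Linux OOM killer terminated the process, and serializing a ~31.9M-row DataFrame to Parquet in one call can buffer a large amount of data at once. A minimal sketch of a workaround, assuming the df, hdfs, and output_path from the snippet above and that pyarrow is installed (CHUNK_ROWS is a hypothetical tuning knob, not from the original report), writes the data in row-group-sized chunks instead:

import pyarrow as pa
import pyarrow.parquet as pq

CHUNK_ROWS = 1_000_000  # hypothetical chunk size; tune to available memory

with hdfs.open(output_path, 'wb') as f:
    writer = None
    for start in range(0, len(df), CHUNK_ROWS):
        # convert one slice at a time so only one chunk is buffered for serialization
        chunk = pa.Table.from_pandas(df.iloc[start:start + CHUNK_ROWS])
        if writer is None:
            # create the writer lazily so it picks up the schema of the first chunk
            writer = pq.ParquetWriter(f, chunk.schema)
        writer.write_table(chunk)
    if writer is not None:
        writer.close()

Each write_table call appends one row group to the same file, so peak serialization memory scales with CHUNK_ROWS rather than with the full DataFrame.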
