Performance Improvement for Writing Large Datasets #6

@michaelforsterivoflow

Hi synchronal,

First of all, I want to express my sincere appreciation for the work you have done on Exceed. The library is incredibly powerful and offers a unique set of features that make it stand out in the field. It has been a valuable tool for our projects, and we are truly grateful for your efforts.

However, we have encountered a performance issue when writing large datasets. While Exceed excels in many areas, it becomes quite slow when processing large amounts of data. In comparison, Elixlsx handles the same task much more efficiently, which has led us to consider using it as an alternative in scenarios where performance is critical.

We understand that optimizing performance for large datasets can be a complex task, but we believe that this improvement would significantly enhance the usability of Exceed in a broader range of applications.

Steps to Reproduce:
1. Create a dataset with 100,000 records, each with 10 columns.
2. Write this dataset to an Excel file using Exceed.
3. Observe the time it takes to complete the operation.
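
For anyone who wants to reproduce this, the steps above can be sketched as a small timing harness. This is only an illustration, not the exact benchmark behind the logs below: the `WriteBenchmark` module, its function names, and the `noop_writer` placeholder are hypothetical. The placeholder body should be swapped for the actual Exceed or Elixlsx call being measured. The result tuples follow the same `{batch_size, elapsed_ms, rows_per_sec}` shape as the list in the log.

```elixir
defmodule WriteBenchmark do
  @moduledoc """
  Hypothetical timing harness; not part of Exceed or Elixlsx.
  """

  # Lazily build `row_count` rows of `column_count` string cells each.
  def rows(row_count, column_count \\ 10) do
    Stream.map(1..row_count, fn r ->
      Enum.map(1..column_count, fn c -> "r#{r}c#{c}" end)
    end)
  end

  # Time `writer`, a 1-arity function that receives the row stream and
  # performs the library call under test.
  # Returns {batch_size, elapsed_ms, rows_per_sec}.
  def measure(batch_size, writer) when is_function(writer, 1) do
    {micros, _result} = :timer.tc(fn -> writer.(rows(batch_size)) end)
    elapsed_ms = div(micros, 1000)
    # Guard against a zero reading on very small batches.
    rate = Float.round(batch_size / max(micros, 1) * 1_000_000, 2)
    {batch_size, elapsed_ms, rate}
  end
end

# Placeholder writer (assumption): it only forces the stream and writes
# nothing. Replace its body with the Exceed or Elixlsx write call.
noop_writer = fn rows -> Stream.run(rows) end

results = for size <- [1, 10, 100, 1000], do: WriteBenchmark.measure(size, noop_writer)
IO.inspect(results)
```

Passing the writer in as a function keeps the data generation and timing identical for both libraries, so any difference in the reported rate comes from the write path alone.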

Expected Behavior: The operation should complete in a reasonable amount of time, comparable to Elixlsx.

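Actual Behavior: In our measurements Exceed plateaus at roughly 1,550 rows/sec for batch sizes from 100 to 20,000, while Elixlsx processed 684,771 rows at about 5,940 rows/sec, roughly 3.8 times faster. At that rate, writing large datasets with Exceed is impractical for our real-world use cases.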

LOG Exceed:
[info] Batch size 1 completed in 14ms, rate: 71.43 rows/sec
[info] Benchmarking with batch size: 10
[info] Batch size 10 completed in 8ms, rate: 1250.0 rows/sec
[info] Benchmarking with batch size: 100
[info] Batch size 100 completed in 64ms, rate: 1562.5 rows/sec
[info] Benchmarking with batch size: 1000
[info] Batch size 1000 completed in 652ms, rate: 1533.74 rows/sec
[info] Benchmarking with batch size: 10000
[info] Batch size 10000 completed in 6401ms, rate: 1562.26 rows/sec
[info] Benchmarking with batch size: 20000
[info] Batch size 20000 completed in 12949ms, rate: 1544.52 rows/sec
[
{1, 14, 71.43},
{10, 8, 1250.0},
{100, 64, 1562.5},
{1000, 652, 1533.74},
{10000, 6401, 1562.26},
{20000, 12949, 1544.52}
]

LOG ELIXLSX:
[info] Processed 684771 rows at 5939.81 rows/sec

Best regards,

Michal Forster
