Option to include original csv line in resulting tuple returned by CSV.decode(stream, options) #78

@npac

Hi,
Thanks for the great library.
CSV.decode(...) returns a stream of tuples: either {:ok, map()} when a row decodes successfully or {:error, binary()} when it fails to decode.

My use case: read a CSV file and insert a row into the DB for each CSV line. Invalid CSV lines must be saved in a separate file for further analysis.

I need the original CSV line in both cases (decode success or failure):

  • If decoding fails, the problematic line is saved in a separate file for further analysis.
  • If decoding succeeds, I still need the original CSV line, because the decoded result is inserted into a database table, and if that insert fails the line must also be saved in a file for further analysis.

Do you think an option could be added to the library to return {:ok, map(), binary()} or {:error, binary(), binary()}, where the third element is the raw data from the input stream? I can try to submit a PR for that if it's OK.

So far I have come up with the following workaround for my case:

path
  # Stream the CSV file line by line
  |> File.stream!()
  # Decode each line on its own so the raw line stays available.
  # Tradeoff: CSV.decode only ever sees a one-line input here, so a decode error always reports line 1 :(
  |> Stream.transform(0, fn line, acc ->
    # Decode one CSV line. The result is either {:ok, map} or {:error, reason}
    [result] = CSV.decode([line], separator: ?,, headers: [:a, :b, :c]) |> Enum.take(1)
    # Keep the original line as the last element of the resulting tuple;
    # in case of an error this line must be saved in a separate file
    {[Tuple.append(result, line)], acc + 1}
  end)
  # Process the decoded results using a parallel stream.
  # At this point each stream element is a tuple:
  # either {:ok, %{...}, "foo, bar, baz"} on decode success
  # or {:error, "Row has ... - expected .. line 1", "foo, bar, baz"} on failure
  |> ParallelStream.each(&process_decoded/1)
  |> Stream.run()
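
For context, here is a minimal sketch of what a consumer of those three-element tuples might look like. The workaround above does not show process_decoded, so everything here is an assumption made for illustration: the module name RejectSketch, the extra insert_fun and reject_io arguments, and the convention that insert_fun returns :ok or {:error, reason}. None of these names come from the CSV library.

```elixir
defmodule RejectSketch do
  # Hypothetical module, not part of the CSV library.

  # Decoding succeeded: attempt the insert and keep the raw line if it fails.
  # insert_fun is assumed to return :ok or {:error, reason}.
  def process_decoded({:ok, row, line}, insert_fun, reject_io) do
    case insert_fun.(row) do
      :ok -> :ok
      {:error, _reason} -> save_rejected(reject_io, line)
    end
  end

  # Decoding failed: the raw line goes straight to the rejects file.
  def process_decoded({:error, _message, line}, _insert_fun, reject_io) do
    save_rejected(reject_io, line)
  end

  # Lines from File.stream!/1 keep their trailing newline, so write them as-is.
  defp save_rejected(reject_io, line), do: IO.write(reject_io, line)
end
```

In the pipeline this could be called as &RejectSketch.process_decoded(&1, insert_fun, rejects), where rejects is a device opened with File.open/2 in :append mode. Writes to such a device go through a single file process, so lines written from parallel workers should not interleave.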