multiline value enclosed in doublequotes cannot be parsed #75

@0asp0

For all general issues, please provide the following details for fast resolution:

  • Version: logstash 7.3.0
  • Operating System: Windows

The CSV RFC (RFC 4180) says that a value can span multiple lines, separated by CRLF, as long as the value is enclosed in double quotes:

https://tools.ietf.org/html/rfc4180

 6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx
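
To illustrate the rule quoted above, here is a minimal sketch that parses the RFC's own example with Python's standard `csv` module. Python is only a stand-in here for an RFC 4180-aware parser; it is not the parser Logstash uses, and the variable names are made up for the illustration.

```python
import csv
import io

# The example from RFC 4180, section 2, rule 6, with CRLF written out explicitly.
rfc_example = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx'

# newline="" stops Python from translating line endings, so the CRLF that sits
# inside the quoted field reaches the CSV parser unchanged.
reader = csv.reader(io.StringIO(rfc_example, newline=""))
rows = list(reader)

# Two logical records come out, although the input spans three physical lines.
for row in rows:
    print(row)
```

The point is that the parser has to see the whole record, including the line break inside the quotes, in order to return `b CRLF bb` as a single value.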

We have multiline fields, such as stack traces, stored in Elasticsearch.
I exported them from Discover using the CSV export, then tried to import them with Logstash into another Elasticsearch instance.

The CSV filter throws an exception saying that a quote is missing, because it does not look for the closing quote on the following line.
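
A rough sketch of why this fails, again using Python's `csv` module as a stand-in (an assumption, not the Logstash CSV filter itself): when a parser is handed one physical line at a time, the first line of a multiline value ends inside an open quote. The helper name `is_complete_record` is hypothetical.

```python
import csv


def is_complete_record(text: str) -> bool:
    """Return True if `text` parses as CSV without ending inside an open quote."""
    try:
        # strict=True makes the parser raise csv.Error when the input ends
        # while a quoted field is still open.
        list(csv.reader([text], strict=True))
        return True
    except csv.Error:
        return False


# Only the first physical line of the RFC example: the quote before "b" never closes.
first_physical_line = '"aaa","b'
# The whole logical record, with the line break kept inside the quotes.
whole_record = '"aaa","b\r\nbb","ccc"'

print(is_complete_record(first_physical_line))
print(is_complete_record(whole_record))
```

The same text is accepted once the whole record is passed in one piece, which is what the multiline codec in the configuration below is meant to arrange.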

Here is my pipeline configuration (input and filter):

    input
    {
      file
      {
        path => ['C:/work/elastic/input/csv/*.csv']
        sincedb_path => "C:/work/elastic/input/csv/db"
        start_position => "beginning"
        codec => multiline
        {
          pattern   => '(^\"\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}.\d{3}\")|(^\"\@timestamp\")'
          negate    => "true"
          what      => "previous"
          max_bytes => "200 MiB"
          max_lines => 10000
          auto_flush_interval => 2
        }
      }
    }

    filter
    {
      # workaround. Without gsub it will fail
      mutate
      {
        gsub =>  ["message", '\n"', '\\n"']
      }
      csv
      {
        autodetect_column_names => true
        autogenerate_column_names => true
        separator => ";"
        source => "message"
        skip_empty_columns => "true"
        target => "mycsv"
      }
    }

I found a workaround using mutate's gsub to replace the newlines with a literal \n. However, I consider this a bug that should be fixed.
