
Encoding Issue: Arabic Characters Conflict with Another Language in the Same Row #67

@SheikhThingsUp



I've noticed an issue with my script that queries a database and generates a CSV file. When a row contains characters from languages other than Arabic and English, the Arabic characters in that row are not encoded correctly. It seems that either the `combine` or the `string` method of Text::CSV is causing this problem.

For example, in a row where the name "Pelé" is present, the Arabic word "نسيج" is turned into garbled characters like "Ù�سÙ�ج,Ù�سÙ�ج". Interestingly, when I open the file in vi, the same word appears with a different encoding in two different places.
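The garbled output contains "Ù" (U+00D9). Byte 0xD9 is also the lead byte of the UTF-8 encoding of Arabic letters such as ن (U+0646) and ي (U+064A), which suggests, without confirming, that UTF-8 bytes were read as Latin-1 characters and then encoded to UTF-8 a second time. The sketch below uses only Perl's core Encode module to show that double-encoding mechanism for "نسيج". It illustrates the general pattern and does not reproduce the exact string quoted above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "نسيج" written with code point escapes so the script needs no `use utf8`.
my $word = "\x{0646}\x{0633}\x{064A}\x{062C}";

# Correct encoding: each Arabic letter becomes 2 bytes, lead bytes 0xD8/0xD9.
my $utf8_bytes = encode('UTF-8', $word);

# Double encoding: the bytes are treated as Latin-1 characters
# (0xD9 is "Ù", 0xD8 is "Ø") and encoded to UTF-8 again, which is
# what a ">:encoding(UTF-8)" layer does to a string that holds raw bytes.
my $double = encode('UTF-8', $utf8_bytes);

printf "%d characters, %d bytes, %d bytes after double encoding\n",
    length $word, length $utf8_bytes, length $double;

# Decoding the file content once yields the misread bytes, not the word.
my $read_back = decode('UTF-8', $double);
printf "read back as: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $read_back;
```

Run as-is, it reports 4 characters, 8 bytes and 16 bytes, and the read-back string starts with U+00D9 ("Ù") and U+00D8 ("Ø") interleaved with control characters instead of Arabic letters.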

I've tried both with and without the `binary => 1` option, but the issue persists.

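```perl
use strict;
use warnings;
use Text::CSV;

my @row;    # one row from the database query (query code not shown)

# binary => 1 permits non-ASCII characters; eol terminates each record
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
open my $fh, ">:encoding(UTF-8)", "new.csv" or die "new.csv: $!";
print $fh "\x{feff}";    # byte order mark

my $status = $csv->combine(@row) or die "combine failed: " . $csv->error_diag;    # combine columns into a string
my $line   = $csv->string();
print $fh $line;
```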
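If double encoding is the cause, one thing worth ruling out is whether the database driver returns undecoded UTF-8 bytes, which `combine()` would concatenate as-is before the `:encoding(UTF-8)` layer encodes them again. A minimal sketch, assuming every field arrives as UTF-8 bytes: the hypothetical helper `decode_row` decodes each field with the core Encode module so that `@row` holds characters before it reaches `combine()`. Many DBI drivers have their own option for returning decoded strings, and its name differs per driver, so the driver's documentation is the better first stop.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every field of a row, ASSUMING the database
# driver returned raw UTF-8 bytes. Fields that are already character
# strings must not be passed through this, or they will be mangled.
sub decode_row {
    return map { defined $_ ? decode('UTF-8', $_) : undef } @_;
}

# Example row as a driver without UTF-8 decoding might return it (bytes).
my @raw_row = ("Pel\xC3\xA9", "\xD9\x86\xD8\xB3\xD9\x8A\xD8\xAC");

my @row = decode_row(@raw_row);

# After decoding, each field holds characters, not bytes.
for my $field (@row) {
    printf "%d characters: %s\n", length $field,
        join ' ', map { sprintf 'U+%04X', ord } split //, $field;
}
```

The decoded `@row` can then be passed to `$csv->combine(@row)` as in the snippet above, and the output layer encodes it exactly once. Decoding a field that is already a character string is itself a bug, so this only applies when all fields really are bytes.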

When I take Text::CSV out of the equation and write directly to the file with minimal transformation (commas and quotes only), it works fine.

The issue is also present in Text::CSV_XS.
