Text::CSV_XS doesn't accept "accented" characters 

This test script fails:

    use Text::CSV_XS;
    my $csvp = Text::CSV_XS->new({ sep_char => ';'});
    $_ = "EN01;60x65x1,3;PUR HYTREL®;PAR;PROFILED;1,3000000000000E+00;";
    $csvp->parse($_) or print "Cannot parse line using XS\n";
    ($\, $,) = ("\n", "-*-");
    print $csvp->fields;
-->
    Cannot parse line using XS

Heh? Why not?

If I drop the "®" character (the registered trademark sign, an "R" in a circle),
it does work. If I put quotes around that field, it works too.

    $_ = "EN01;60x65x1,3;PUR HYTREL;PAR;PROFILED;1,3000000000000E+00;";

    $_ = 'EN01;60x65x1,3;"PUR HYTREL®";PAR;PROFILED;1,3000000000000E+00;';

Why is CSV_XS so picky? Why aren't characters in the character code
range 128 .. 255 accepted? Is there any reason why only CSV files with
bare characters in the 32..126 range are accepted?
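
A minimal plain-Perl sketch for locating the bytes in a line that fall outside the 32..126 range, i.e. the ones a strict, non-binary parser may refuse. `non_ascii_bytes` is a hypothetical helper, not part of Text::CSV_XS, and the sketch assumes the file is read as raw Latin-1 bytes with no decoding layer, so "®" shows up as the single byte 0xAE.

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper (not part of Text::CSV_XS):
# returns [offset, byte value] for every character outside
# printable ASCII (32..126). Tabs are reported too.
sub non_ascii_bytes {
    my ($line) = @_;
    my @hits;
    while ($line =~ /([^\x20-\x7E])/g) {
        # pos() points just past the match, so subtract 1 for its offset
        push @hits, [ pos($line) - 1, ord $1 ];
    }
    return @hits;
}

# Assumes Latin-1 data: the registered trademark sign is byte 0xAE.
my $line = "EN01;60x65x1,3;PUR HYTREL\xAE;PAR;PROFILED;1,3000000000000E+00;";
for my $hit (non_ascii_bytes($line)) {
    printf "offset %d: byte 0x%02X\n", @$hit;   # prints: offset 25: byte 0xAE
}
```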

Meanwhile, I'm pretty annoyed with this. Text::CSV, the plain Perl
module, has the same limitation; I could patch that, but it is also
limited to a comma as the only delimiter.

Damn.

--
        Bart.



Fri, 14 Feb 2003 03:00:00 GMT  

Quote:

>This test script fails:

>    use Text::CSV_XS;
>    my $csvp = Text::CSV_XS->new({ sep_char => ';'});
>    $_ = "EN01;60x65x1,3;PUR HYTREL®;PAR;PROFILED;1,3000000000000E+00;";
>    $csvp->parse($_) or print "Cannot parse line using XS\n";
>Why aren't characters in the character code range 128 .. 255 rejected?

     ^^^^^^
     are

I seem to have found a solution. It looks like setting "binary" to a
true value in the attributes (the anonymous hash passed to new())
solves it.

This is contrary to the documentation:

     binary  If this attribute is TRUE, you may use binary characters in
             quoted fields, including line feeds, carriage returns and
             NUL bytes. (The latter must be escaped as "0.) By default
             this feature is off.

I am NOT using them in a quoted field.

Oh well. It looks like I can now continue with my work.

--
        Bart.



Sat, 15 Feb 2003 08:15:57 GMT  
Damn, I'm beginning to really despise this module.

        use Text::CSV_XS;
        my $csvp = Text::CSV_XS->new({ sep_char => ';', binary => 1});
        $_ = '12;3/8";34';
        $csvp->parse($_) or print "Cannot parse line using XS\n";
-->
        Cannot parse line using XS

It's the quote character. But since the field doesn't start with a quote
character, this should not have been a problem. Excel has no problem
reading the file.

As I said, I didn't generate these data files; I just need to process
them. Apparently, Text::CSV_XS won't cut it. It's of no use to me if
it's this picky about what it accepts.

--
        Bart.



Sat, 15 Feb 2003 03:00:00 GMT  

Quote:

> Damn, I'm beginning to really despise this module.

I am sure that statement makes the author of the module feel really
anxious to rush out and change the module to get your approval.

Quote:

>         use Text::CSV_XS;
>         my $csvp = Text::CSV_XS->new({ sep_char => ';', binary => 1});
>         $_ = '12;3/8";34';
>         $csvp->parse($_) or print "Cannot parse line using XS\n";
> -->
>         Cannot parse line using XS

> It's the quote character. But since the field doesn't start with a quote
> character, this should not have been a problem.

Try reading the documentation for the module.  It says:

     $csv = Text::CSV_XS->new();

     is equivalent to

      $csv = Text::CSV_XS->new({
          'quote_char'  => '"',
          'escape_char' => '"',
          'sep_char'    => ',',
          'binary'      => 0
      });

In other words, by default the double quote mark has two meanings: it is
the character used to quote fields and also the character used to escape
characters.  If you want to use it as a plain text mark you need to
redefine the default quote_char and the default escape_char and then
your line will parse fine.
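
For data that never uses quoted fields at all, the plain-Perl counterpart of taking the special meaning away from the double quote is a simple split on the separator. This is a sketch, not Text::CSV_XS behaviour; `split_unquoted` is a hypothetical helper. The -1 limit keeps trailing empty fields, such as the one after the final ';' in the EN01 line earlier in this thread. The trade-off: a properly quoted field such as "1/3""" keeps its quotes verbatim, and a separator inside quotes would split the field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on a literal separator and treat
# '"' as ordinary text. \Q...\E escapes the separator for the regex;
# the limit of -1 keeps trailing empty fields (e.g. a line ending in ';').
sub split_unquoted {
    my ($line, $sep) = @_;
    return split /\Q$sep\E/, $line, -1;
}

my @f = split_unquoted('12;3/8";34', ';');
print join('|', @f), "\n";    # prints 12|3/8"|34
```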

--
Jeff



Sat, 15 Feb 2003 03:00:00 GMT  

Quote:


>> Damn, I'm beginning to really despise this module.

>I am sure that statement makes the author of the module feel really
>anxious to rush out and change the module to get your approval.

No, it means that I'm feeling like dumping this module and never looking
at it again.

In the meantime, I've written a plain Perl module that can read CSV
files the way I want them to be read. I expect it to be a tad slower,
but speed seems acceptable.

Quote:
>Try reading the documentation for the module.  It says:
>In other words the double quote mark has two meanings by default, it is
>the character used to quote fields and also the character to escape
>characters.  If you want to use it as a plain text mark you need to
>redefine the default quote_char and the default escape_char and then
>your line will parse fine.

But that's not how I *want* it to behave. Both quote_char and escape_char
should be '"', but that should *only* apply to quoted fields!

Just to make things absolutely clear: I want these fields to represent
the same thing:

        1/3"
        "1/3"""

The latter is quoted, the former isn't.

An opening quote says that this field is quoted, the closing quote marks
the end, and the '""' represents one quote ('"').
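
The rules above can be sketched in plain Perl. This is not the module mentioned earlier in the post (its code isn't shown); `parse_lenient` is a hypothetical helper that assumes a single-character separator and dies when a quoted field is followed by anything other than the separator. Newer releases of Text::CSV_XS also offer an allow_loose_quotes attribute aimed at unquoted fields that contain quotes; check the documentation of the installed version.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the rules described above:
#  - a field is quoted only if its first character is '"';
#  - inside a quoted field, '""' stands for one literal '"';
#  - unquoted fields are taken literally, so stray quotes and
#    bytes in the 128..255 range pass through unchanged.
# Assumes a single-character separator.
sub parse_lenient {
    my ($line, $sep) = @_;
    my $s = quotemeta $sep;
    my @fields;
    pos($line) = 0;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: drop the outer quotes, undouble inner quotes.
            (my $value = $1) =~ s/""/"/g;
            push @fields, $value;
        }
        else {
            # Unquoted field: everything up to the next separator.
            $line =~ /\G([^$s]*)/gc;
            push @fields, $1;
        }
        # Stop when no separator follows the field just read.
        last unless $line =~ /\G$s/gc;
    }
    # Leftover text means a quoted field was not followed by a separator.
    die "malformed field at offset ", pos($line), "\n"
        if pos($line) != length $line;
    return @fields;
}

my @a = parse_lenient('12;3/8";34', ';');
my @b = parse_lenient('1/3";"1/3"""', ';');
print join('|', @a), "\n";    # prints 12|3/8"|34
print join('|', @b), "\n";    # prints 1/3"|1/3"
```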

--
        Bart.



Sat, 15 Feb 2003 03:00:00 GMT  
 