help!: RE parse comma separated list 

I'm trying to extract data from a comma separated variable format file.
Obviously dealing with foo,bar,wizz,bang is easy enough with split(/,/),
but when there are commas within fields the field is quoted, like:

        foo,"barr,roseanne",wizz,bang

So... I converted all commas to tabs with s/,/\t/g and, before
splitting on the tabs, converted back quoted fields with:

        s/"([^"]*)\t([^"]*)"/\1,\2/g;

But this doesn't work for fields with multiple commas like:
        foo,"bar,pub,tavern,hostelry",wizz,bang

This seems to call for some sort of recursive RE, which is beyond the
level of wizardry I can safely practice. Can someone help?

[Email copies of replies to newsgroup would be greatly appreciated as our
news server sometimes finds itself at a null point in the usenet news
space wave-function.]

--

Computer Services, University of Reading       http://www.*-*-*.com/ ~suqstmbl
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
When love is gone, there's always justice
                             When justice is gone, there's always force...
(For Randal)



Mon, 16 Aug 1999 03:00:00 GMT  

[ emailed, posted]

: I'm trying to extract data from a comma separated variable format file.
: Obviously dealing with foo,bar,wizz,bang is easy enough with split(/,/),
: but when there are commas within fields the field is quoted, like:

:       foo,"barr,roseanne",wizz,bang

I guess you missed this in the Perl FAQ:

4.28) How can I split a [character] delimited string except when inside
    [character]?

That's probably OK, 'cause the FAQ is really old, and Jeffrey
Friedl now recommends:

---------------------

   undef @new;
   push(@new, $+) while $text =~ m{
      "([^\"\\]*(?:\\.[^\"\\]*)*)",?  ## standard string, w/ possible comma
    | ([^,]+),?                       ## anything else, w/ possible comma
    | ,                               ## lone comma
   }gx;
   ## final empty field for trailing comma
   push(@new, undef) if substr($text, -1, 1) eq ',';

---------------------

Let's try it out on your sample data:

---------------------
#!/usr/bin/perl -w

foreach $text ( 'foo,"barr,roseanne",wizz,bang',
                'foo,"bar,pub,tavern,hostelry",wizz,bang') {

   undef @new;
   push(@new, $+) while $text =~ m{
      "([^\"\\]*(?:\\.[^\"\\]*)*)",?  ## standard string, w/ possible comma
    | ([^,]+),?                       ## anything else, w/ possible comma
    | ,                               ## lone comma
   }gx;
   ## final empty field for trailing comma
   push(@new, undef) if substr($text, -1, 1) eq ',';

   # see if it worked...
   foreach (@new) {
      print "$_\n";
   }
}

---------------------

Works for me.

: So... I converted all commas to tabs with s/,/\t/g and, before
: splitting on the tabs, converted back quoted fields with:

:       s/"([^"]*)\t([^"]*)"/\1,\2/g;

: But this doesn't work for fields with multiple commas like:
:       foo,"bar,pub,tavern,hostelry",wizz,bang

: This seems to call for some sort of recursive RE, which is beyond the
: level of wizardry I can safely practice. Can someone help?
                                           ^^^^^^^^^^^^^^^^

Jeffrey can!  ;-)

: [Email copies of replies to newsgroup would be greatly appreciated as our
: news server sometimes finds itself at a null point in the usenet news
: space wave-function.]

OK.

--
    Tad McClellan                          SGML Consulting
    Tag And Document Consulting            Perl programming



Tue, 17 Aug 1999 03:00:00 GMT  

Quote:

> I'm trying to extract data from a comma separated variable format file.
> Obviously dealing with foo,bar,wizz,bang is easy enough with split(/,/),
> but when there are commas within fields the field is quoted, like:

>    foo,"barr,roseanne",wizz,bang

> So... I converted all commas to tabs with s/,/\t/g and, before
> splitting on the tabs, converted back quoted fields with:

>    s/"([^"]*)\t([^"]*)"/\1,\2/g;

> But this doesn't work for fields with multiple commas like:
>    foo,"bar,pub,tavern,hostelry",wizz,bang

> This seems to call for some sort of recursive RE, which is beyond the
> level of wizardry I can safely practice. Can someone help?

Here is one way you could do it, provided the examples you gave are the
most difficult cases you have to handle.

$" = ":";
while (<>)
        {


        }

The regexp has two alternatives: either the field starts with " and then
it can contain commas, or it starts with something else and then it
cannot.
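
A minimal sketch of that two-alternative idea, in case it helps to see it
spelled out. This is not the code from the post above; the sub name
split_simple_csv is made up for illustration, and it assumes there are no
empty fields and no escaped quotes inside quoted fields.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not from the original post): split one line into
# fields. Assumes no empty fields and no escaped quotes inside quotes.
sub split_simple_csv {
    my ($line) = @_;
    chomp $line;
    my @fields;
    # Alternative 1: a quoted field, which may contain commas.
    # Alternative 2: an unquoted field, which may not.
    # Each field must be followed by a comma or the end of the line.
    while ($line =~ /\G(?:"([^"]*)"|([^",][^,]*))(?:,|$)/g) {
        push @fields, defined $1 ? $1 : $2;
    }
    return @fields;
}

my @f = split_simple_csv('foo,"bar,pub,tavern,hostelry",wizz,bang');
{
    local $" = ":";    # join interpolated arrays with a colon
    print "@f\n";      # prints foo:bar,pub,tavern,hostelry:wizz:bang
}
```

Scanning stops at the first thing that does not look like a field, so a
line with an empty field (a,,b) gets cut short. For real data the more
complete regexp or a module is the safer choice.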

Hope this helps.

--
------------------------------------------------------------------------

                   I can take or leave it if I please
------------------------------------------------------------------------



Tue, 17 Aug 1999 03:00:00 GMT  

You should take a look at the Text::ParseWords module.
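
A rough sketch of how that might look with quotewords() from
Text::ParseWords. The wrapper name csv_fields is only for illustration,
and the sample lines are the ones from the original question.

```perl
#!/usr/bin/perl -w
use strict;
use Text::ParseWords;    # exports quotewords() by default

# Hypothetical wrapper: split one comma separated line into fields.
sub csv_fields {
    my ($line) = @_;
    chomp $line;
    # Second argument 0 means the surrounding quotes are not kept.
    return quotewords(',', 0, $line);
}

for my $line ('foo,"barr,roseanne",wizz,bang',
              'foo,"bar,pub,tavern,hostelry",wizz,bang') {
    print join('|', csv_fields($line)), "\n";
}
# prints:
#   foo|barr,roseanne|wizz|bang
#   foo|bar,pub,tavern,hostelry|wizz|bang
```

One thing to watch: the module also treats single quotes and backslashes
specially, so data containing stray apostrophes may need extra care.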
--


   Logic Analysis and Optimization
   IBM Rochester, Minnesota
   (507) 253-2304



Tue, 17 Aug 1999 03:00:00 GMT  
 