Comma delimited fields with line returns inside a double quote 
Author Message
 Comma delimited fields with line returns inside a double quote

Hi,
 I have a script to rewrite a data file which is comma
delimited (sub openfile). Whenver there is a line return in a field, that
field
is wrapped with the double quote character.

I can't grab that field to write into the new data file properly.
In the pasted"sample.txt" file (bottom), line 2 and 3 are actually one
record.

I've attempted a work around (sub strip_wrap), trying to strip out any line
returns for
any field. This does not remove the line return.

Any suggestions

Below is the sub to open the Comma Delimited file:

sub openfile {
#work with each field in uploaded file

close DATA;

$Count2 = 0;

open (LIST, ">>$BaseDir/1_$foo");


if ($CField[5] ne "") {
$AdFile[$Count2] = $Ad;
## serial number $wum to renumber records
$wum = $Count2;
 $Count2++;
 &strip_wrap;

print LIST
qq!$wum|$CField[0]|$CField[1]|$CField[2]|$CField[3]|$CField[4]|$CField[5]|$C
Field[6]|$CField[7]|$CField[8]|$CField[9]|$CField[10]|$CField[11]|$CField[12
]|$CField[13]|$CField[14]|$CField[15]|$CField[16]|$CField[17]|$CField[18]|$C
Field[19]|$CField[20]|$CField[21]|$CField[22]|$CField[23]|$CField[24]|$CFiel
d[25]|0\n!;

Quote:
}
}
}

sub strip_wrap {

## need device to remove return from within a comma delimited field
$CField[0] =~ s/%0D%0A/ /go;
$CField[0] =~ s/\n/ /go;
$CField[0] =~ s/\r/ /go;

## ... and so on through all the fields

Quote:
}

--

sample.txt snip <<
Line 1: First Name,Last Name,Middle Name,Name,Nickname,E-mail Address,Home
Street,Home City,Home Postal Code,Home State,Home Country,Home Phone,Home
Fax,Mobile Phone,Personal Web Page,Business Street,Business City,Business
Postal Code,Business State,Business Country,Business Web Page,Business
Phone,Business Fax,Pager,Company,Job Title,Department,Office Location,Notes

Willow
Line 3: Ave",city,zip,state,Canada,phone,fax,cell,url,Bus
address,bcity,bzip,bprov,bcountry,,phone,fax,pager,ASM,btitle,bdept,office
num,test
Line 4: Adrian,Dan,,Adrian Dan,,,,,,,,(416) 245-1927,,,,,,,,,,,,,,,,,
Line 5: Adrian,Heaps,,Adrian

Line 6: Alan,Burt,,Alan Burt,,,,,,,,(416) 266-4682,,,,,,,,,,,,,,,,,

Quote:
>> end snip

Chris Bell

Tel: (416) 698-5763


Mon, 04 Oct 2004 16:18:17 GMT  
 Comma delimited fields with line returns inside a double quote
"Interesting Code" !

Maybe not your full answer but don't forget to check for \t in your sub as
well.

Quote:
> sub strip_wrap {

> ## need device to remove return from within a comma delimited field
> $CField[0] =~ s/%0D%0A/ /go;
> $CField[0] =~ s/\n/ /go;
> $CField[0] =~ s/\r/ /go;

> ## ... and so on through all the fields

> }



Mon, 04 Oct 2004 18:08:41 GMT  
 Comma delimited fields with line returns inside a double quote

Quote:

> Hi,
>  I have a script to rewrite a data file which is comma
> delimited (sub openfile). Whenver there is a line return in a field, that
> field
> is wrapped with the double quote character.

> I can't grab that field to write into the new data file properly.
> In the pasted"sample.txt" file (bottom), line 2 and 3 are actually one
> record.

> I've attempted a work around (sub strip_wrap), trying to strip out any line
> returns for
> any field. This does not remove the line return.

> Any suggestions

I think your problem is caused by the fact that you haven't told perl anything
about the "text text [newline] text" being different from just normal [newline]
so it thinks every newline it finds is the end of a record.   I'd suggest "slurp
mode". This means reading the entire file into a single scalar variable rather
than as a series of lines.  You can then substitute the newlines inside double
quoted strings and leave the real newlines alone.

I've tested this with a very small file and it appeared to work.

Hope this helps

Graham Wood

===================================================================
# open your file
open(INPUT,"input.csv") || die "Can't open input.csv $!\n";

# $/ is the Input record separator (see perldoc perlvar for details). If you
undef it
# perl doesn't split an input file into separate lines

undef $/;

# ... which means you can slurp the whole file into a single scalar variable,
newlines
#  and all
$input=<INPUT>;

close INPUT;

# You can then replace any newlines in between " characters with
# spaces (or anything else you want to) and remove the " characters
# This may need repeating for \r if \n doesn't catch everything.
#  Using [^"] in the pattern to match after the \n means the first " you hit
after the \n
# will terminate the pattern match.

$input=~ s/"(.*)\n([^"]*)"/$1 $2/g;

# then you can split $input at the remaining newlines and loop through
# them
$wum=1;
foreach(split /\n/,$input){
    #
    # print record number and fields in $_ delimited by "|" then a newline
    #
    print join("|", $wum++ , split(/,/,$_)    ),    "\n";

- Show quoted text -

Quote:
}



Mon, 04 Oct 2004 21:02:23 GMT  
 Comma delimited fields with line returns inside a double quote

Quote:

>Any suggestions

Other folks answered your question, but I have an unrelated
suggestion.  Instead of this:

Quote:
>print LIST
>qq!$wum|$CField[0]|$CField[1]|$CField[2]|$CField[3]|$CField[4]|$CField[5]|$C
>Field[6]|$CField[7]|$CField[8]|$CField[9]|$CField[10]|$CField[11]|$CField[12
>]|$CField[13]|$CField[14]|$CField[15]|$CField[16]|$CField[17]|$CField[18]|$C
>Field[19]|$CField[20]|$CField[21]|$CField[22]|$CField[23]|$CField[24]|$CFiel
>d[25]|0\n!;

Consider writing


It will be easier to read and understand, and you won't have to change

I hope this is helpful.



Mon, 04 Oct 2004 23:23:09 GMT  
 Comma delimited fields with line returns inside a double quote
Quote:

> Hi,
> I have a script to rewrite a data file which is comma delimited (sub
> openfile).

[snip stuff... want to turn comma seperated into pipe seperated]

# This assumes that if a field has either quotes and a newline in it,
# any quotes are doubled... that is, if the original was something
# like qq[This has a " quote and a \n newline], then the version
# in the file will be qq["This has a "" quote and a \n newline"].

my $quotepart = qq["[^"]*(?:""[^"]*)*"];
my  $csvregex = qq[(?:(?!")[^,\n]*|$quotepart)];
sub openfile {
   open( my($csv), "<", "$BaseDir/sample.txt")
      or die "Couldn't open $BaseDir/sample.txt: $!";
   open( my($psv), ">>", "$BaseDir/1_$foo")
      or die "Couldn't open $BaseDir/1_$foo: $!";
   my ($buf, $record, $line) = ("", 1, 1);
   while( sysread( $csv, $buf, 8192, length $buf ) ) {
      while( s/^((?:$csvregex,)*$csvregex\n)//, my $x = $1 ) {

         /^"/ ?
            # if quoted, remove quotes if unneeded.
            (s/^"([^"\n|]*)"$/$1/s && s/""/"/g) :
            # if not quoted, add quotes if needed.
            (/\|/ and (s/"/""/g, $_ = qq["$_"]))


         ++$records; $lines += $x =~ tr/\n//;
      }
      next unless $buf =~ /^$quotepart([^"\n,])/;
      die( "Malformed input, line $line record $record : q[$buf]\n" .
           "Unexpected char q[$1] at position $-[1].\n" );
   }
   warn "Incomplete last record q[$buf]\n" if length $buf;
   close $csv;
   close $psv or die "Couldn't close $BaseDir/1_$foo: $!";

Quote:
}

[untested]

--
print reverse( ",rekcah", " lreP", " rehtona", " tsuJ" )."\n";



Thu, 07 Oct 2004 17:56:31 GMT  
 
 [ 5 post ] 

 Relevant Pages 

1. Comma delimited fields with line returns inside a double quote

2. Regexp: comma delimited number to point delimited only when within quotes

3. split on comma seperated fields, with quoted fields allowed

4. regexp: comma-delimited (csv) with "" inside

5. Reading Comma delimited, Quoted String records

6. Comma & Quotes delimited file

7. Reading Comma delimited, Quoted String records

8. Parsing comma and quote delimited files...

9. $x{$y} inside double quotes in print problem

10. expanding variable inside double quotes

11. split on comma except when embedded inside quotes

12. Help with comma seperated file with "'s inside fields

 

 
Powered by phpBB® Forum Software