Comma delimited fields with line returns inside a double quote
Author |
Message |
Chris Bel #1 / 5
|
 Comma delimited fields with line returns inside a double quote
Hi, I have a script to rewrite a data file which is comma delimited (sub openfile). Whenver there is a line return in a field, that field is wrapped with the double quote character. I can't grab that field to write into the new data file properly. In the pasted"sample.txt" file (bottom), line 2 and 3 are actually one record. I've attempted a work around (sub strip_wrap), trying to strip out any line returns for any field. This does not remove the line return. Any suggestions Below is the sub to open the Comma Delimited file: sub openfile { #work with each field in uploaded file
close DATA; $Count2 = 0;
open (LIST, ">>$BaseDir/1_$foo");
if ($CField[5] ne "") { $AdFile[$Count2] = $Ad; ## serial number $wum to renumber records $wum = $Count2; $Count2++; &strip_wrap; print LIST qq!$wum|$CField[0]|$CField[1]|$CField[2]|$CField[3]|$CField[4]|$CField[5]|$C Field[6]|$CField[7]|$CField[8]|$CField[9]|$CField[10]|$CField[11]|$CField[12 ]|$CField[13]|$CField[14]|$CField[15]|$CField[16]|$CField[17]|$CField[18]|$C Field[19]|$CField[20]|$CField[21]|$CField[22]|$CField[23]|$CField[24]|$CFiel d[25]|0\n!; Quote: } } }
sub strip_wrap { ## need device to remove return from within a comma delimited field $CField[0] =~ s/%0D%0A/ /go; $CField[0] =~ s/\n/ /go; $CField[0] =~ s/\r/ /go; ## ... and so on through all the fields Quote: }
-- sample.txt snip << Line 1: First Name,Last Name,Middle Name,Name,Nickname,E-mail Address,Home Street,Home City,Home Postal Code,Home State,Home Country,Home Phone,Home Fax,Mobile Phone,Personal Web Page,Business Street,Business City,Business Postal Code,Business State,Business Country,Business Web Page,Business Phone,Business Fax,Pager,Company,Job Title,Department,Office Location,Notes
Willow Line 3: Ave",city,zip,state,Canada,phone,fax,cell,url,Bus address,bcity,bzip,bprov,bcountry,,phone,fax,pager,ASM,btitle,bdept,office num,test Line 4: Adrian,Dan,,Adrian Dan,,,,,,,,(416) 245-1927,,,,,,,,,,,,,,,,, Line 5: Adrian,Heaps,,Adrian
Line 6: Alan,Burt,,Alan Burt,,,,,,,,(416) 266-4682,,,,,,,,,,,,,,,,, Quote: >> end snip
Chris Bell
Tel: (416) 698-5763
|
Mon, 04 Oct 2004 09:18:17 GMT |
|
 |
John Foge #2 / 5
|
 Comma delimited fields with line returns inside a double quote
"Interesting Code" ! Maybe not your full answer but don't forget to check for \t in your sub as well. Quote: > sub strip_wrap { > ## need device to remove return from within a comma delimited field > $CField[0] =~ s/%0D%0A/ /go; > $CField[0] =~ s/\n/ /go; > $CField[0] =~ s/\r/ /go; > ## ... and so on through all the fields > }
|
Mon, 04 Oct 2004 11:08:41 GMT |
|
 |
Graham Woo #3 / 5
|
 Comma delimited fields with line returns inside a double quote
Quote:
> Hi, > I have a script to rewrite a data file which is comma > delimited (sub openfile). Whenver there is a line return in a field, that > field > is wrapped with the double quote character. > I can't grab that field to write into the new data file properly. > In the pasted"sample.txt" file (bottom), line 2 and 3 are actually one > record. > I've attempted a work around (sub strip_wrap), trying to strip out any line > returns for > any field. This does not remove the line return. > Any suggestions
I think your problem is caused by the fact that you haven't told perl anything about the "text text [newline] text" being different from just normal [newline] so it thinks every newline it finds is the end of a record. I'd suggest "slurp mode". This means reading the entire file into a single scalar variable rather than as a series of lines. You can then substitute the newlines inside double quoted strings and leave the real newlines alone. I've tested this with a very small file and it appeared to work. Hope this helps Graham Wood =================================================================== # open your file open(INPUT,"input.csv") || die "Can't open input.csv $!\n"; # $/ is the Input record separator (see perldoc perlvar for details). If you undef it # perl doesn't split an input file into separate lines undef $/; # ... which means you can slurp the whole file into a single scalar variable, newlines # and all $input=<INPUT>; close INPUT; # You can then replace any newlines in between " characters with # spaces (or anything else you want to) and remove the " characters # This may need repeating for \r if \n doesn't catch everything. # Using [^"] in the pattern to match after the \n means the first " you hit after the \n # will terminate the pattern match. $input=~ s/"(.*)\n([^"]*)"/$1 $2/g; # then you can split $input at the remaining newlines and loop through # them $wum=1; foreach(split /\n/,$input){ # # print record number and fields in $_ delimited by "|" then a newline # print join("|", $wum++ , split(/,/,$_) ), "\n"; Quote: }
|
Mon, 04 Oct 2004 14:02:23 GMT |
|
 |
Mark Jason Domin #4 / 5
|
 Comma delimited fields with line returns inside a double quote
Quote:
>Any suggestions
Other folks answered your question, but I have an unrelated suggestion. Instead of this: Quote: >print LIST >qq!$wum|$CField[0]|$CField[1]|$CField[2]|$CField[3]|$CField[4]|$CField[5]|$C >Field[6]|$CField[7]|$CField[8]|$CField[9]|$CField[10]|$CField[11]|$CField[12 >]|$CField[13]|$CField[14]|$CField[15]|$CField[16]|$CField[17]|$CField[18]|$C >Field[19]|$CField[20]|$CField[21]|$CField[22]|$CField[23]|$CField[24]|$CFiel >d[25]|0\n!;
Consider writing
It will be easier to read and understand, and you won't have to change
I hope this is helpful.
|
Mon, 04 Oct 2004 16:23:09 GMT |
|
 |
Benjamin Goldber #5 / 5
|
 Comma delimited fields with line returns inside a double quote
Quote:
> Hi, > I have a script to rewrite a data file which is comma delimited (sub > openfile).
[snip stuff... want to turn comma seperated into pipe seperated] # This assumes that if a field has either quotes and a newline in it, # any quotes are doubled... that is, if the original was something # like qq[This has a " quote and a \n newline], then the version # in the file will be qq["This has a "" quote and a \n newline"]. my $quotepart = qq["[^"]*(?:""[^"]*)*"]; my $csvregex = qq[(?:(?!")[^,\n]*|$quotepart)]; sub openfile { open( my($csv), "<", "$BaseDir/sample.txt") or die "Couldn't open $BaseDir/sample.txt: $!"; open( my($psv), ">>", "$BaseDir/1_$foo") or die "Couldn't open $BaseDir/1_$foo: $!"; my ($buf, $record, $line) = ("", 1, 1); while( sysread( $csv, $buf, 8192, length $buf ) ) { while( s/^((?:$csvregex,)*$csvregex\n)//, my $x = $1 ) {
/^"/ ? # if quoted, remove quotes if unneeded. (s/^"([^"\n|]*)"$/$1/s && s/""/"/g) : # if not quoted, add quotes if needed. (/\|/ and (s/"/""/g, $_ = qq["$_"]))
++$records; $lines += $x =~ tr/\n//; } next unless $buf =~ /^$quotepart([^"\n,])/; die( "Malformed input, line $line record $record : q[$buf]\n" . "Unexpected char q[$1] at position $-[1].\n" ); } warn "Incomplete last record q[$buf]\n" if length $buf; close $csv; close $psv or die "Couldn't close $BaseDir/1_$foo: $!"; Quote: }
[untested] -- print reverse( ",rekcah", " lreP", " rehtona", " tsuJ" )."\n";
|
Thu, 07 Oct 2004 10:56:31 GMT |
|
|
|