comma delimited 

Does anyone have any canned routines for comma delimited files?
i.e., convert from
"field1","field2",num1,num2,"fieldnn"
to strip *quotes* and *commas*?
And, likewise, to convert tab delimited (no quotes) back to comma
delimited?
-Kim Goldsworthy


Thu, 09 Aug 2001 03:00:00 GMT  


Quote:
>Does anyone have any canned routines for comma delimited files?
>i.e., convert from
>"field1","field2",num1,num2,"fieldnn"
>to strip *quotes* and *commas*?
>And, likewise, to convert tab delimited (no quotes) back to comma
>delimited?
>-Kim Goldsworthy

While one can do this with awk, I would suggest you use tr to
translate or remove characters.

man tr

For example:

tr -d '"' < infile | tr ',' ' ' > outfile

should delete the double quotes and change each comma to a
space.
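The question talks about converting tab-delimited data back, so tab-delimited output seems to be the goal. Here is a minimal sketch of the same tr pipeline with a tab in place of the space (POSIX tr accepts the \t escape), wrapped in a shell function whose name, csv_to_tsv_tr, is made up for illustration. Like the one-liner, it assumes no field contains an embedded comma or double quote.

```sh
#!/bin/sh
# csv_to_tsv_tr (hypothetical name): comma-delimited lines on stdin,
# tab-delimited lines on stdout.
# Assumes no field contains an embedded comma or double quote.
csv_to_tsv_tr() {
  tr -d '"' | tr ',' '\t'   # drop every quote, then turn each comma into a tab
}

printf '%s\n' '"field1","field2",num1,num2,"fieldnn"' | csv_to_tsv_tr
```

This prints field1, field2, num1, num2 and fieldnn separated by tabs.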

You can do the same with awk/gawk:

gawk 'BEGIN{FS="," ;OFS=" "}{gsub(/"/,"");$1=$1;print}' infile > outfile

Alter FS and OFS as required.
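Swapping FS and OFS runs the conversion the other way, which covers the second half of the question (tab-delimited back to comma-delimited). The sketch below is one possible reading of the format: it assumes that plain numbers are written bare and every other field is quoted, as in the sample line, and that an embedded quote is doubled. The function name tsv_to_csv and those quoting rules are assumptions, not something stated in the thread.

```sh
#!/bin/sh
# tsv_to_csv (hypothetical name): tab-delimited lines on stdin,
# comma-delimited lines on stdout.
# Assumed convention: plain decimal numbers are written bare, every
# other field is wrapped in double quotes, and an embedded double
# quote is doubled ("" stands for ").
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  NF == 0 { print; next }         # pass blank lines through unchanged
  {
    $1 = $1                       # rebuild $0 with OFS even if no field changes
    for (i = 1; i <= NF; i++)
      if ($i !~ /^[-+]?[0-9]+(\.[0-9]+)?$/) {
        gsub(/"/, "\"\"", $i)     # double any embedded quote
        $i = "\"" $i "\""         # wrap the field in quotes
      }
    print
  }'
}

printf 'field1\tfield2\t12\t3.5\tfieldnn\n' | tsv_to_csv
# prints "field1","field2",12,3.5,"fieldnn"
```

The $1 = $1 line is the same trick as in the gawk command above: assigning to any field makes awk rebuild $0 joined by OFS. It is needed here because a line made only of numbers would otherwise never have a field modified.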

With sed you would do it like this:

sed -e 's/"//g;s/,/ /g' infile > outfile

See the manual pages for details:

man sed
man awk
man gawk
man tr

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,
  Die Anyway     |    v    | That no one could find fault with it.



Thu, 09 Aug 2001 03:00:00 GMT  


Quote:
> Does anyone have any canned routines for comma delimited files?
> i.e., convert from "field1","field2",num1,num2,"fieldnn" to strip
> *quotes* and *commas*?

As Mr. Bewig said, there are several possibilities for quoting/escaping
conventions and such. Here's one quick-and-dirty routine I wrote a
while back that parses comma-separated variable files according to the
following rules: a comma separates fields except when enclosed between
double-quote marks; a double-quote character preceded by a backslash
character is not considered to end a quoted field.

Call the function as, for example, arrleng=csvparse($0,array), and
array[1] through array[arrleng] will contain the comma-separated
fields found in $0. The code snippet

{if (arrleng) {for (i=1; i<arrleng; ++i) printf "%s\t", array[i]; print array[arrleng]}}

will then print the array in tab-separated form.

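#
# CSV parse function
#
# Splits instring on commas that are not inside double quotes; inside
# quotes, a backslash protects the next character, so \" does not end
# the field. The quote and backslash characters are kept in the fields.
# Returns the number of fields (0 for an empty string).
# flag, pos, cur and c are locals. flag: 1 = outside quotes,
# 2 = inside quotes, 3 = character after a backslash inside quotes.
function csvparse(instring, outarr,    flag, pos, cur, c) {
  if (length(instring) == 0)
    return 0
  flag = 1; cur = 1; outarr[cur] = ""
  for (pos = 1; pos <= length(instring); ++pos) {
    c = substr(instring, pos, 1)
    if (flag == 1) {                  # outside quotes
      if (c == ",") {                 # separator: start a new field
        outarr[++cur] = ""
        continue
      }
      if (c == "\042") flag = 2       # opening quote
    } else if (flag == 2) {           # inside quotes
      if (c == "\042") flag = 1       # closing quote
      else if (c == "\134") flag = 3  # backslash escapes the next char
    } else {                          # flag == 3: escaped character
      flag = 2
    }
    outarr[cur] = outarr[cur] c       # append everything except separators
  }
  return cur
}
##EOF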
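csvparse keeps the quote marks (and the escaping backslashes) inside each field, while the original question wanted them stripped. Below is a hedged variant of the same state machine, following the same two rules described above, that drops them and prints tab-separated output the way the driver snippet does. The names csv_to_tsv and csvsplit are hypothetical; this is a sketch, not Henry's routine.

```sh
#!/bin/sh
# csv_to_tsv (hypothetical name): comma-separated lines on stdin,
# tab-separated lines on stdout. Same splitting rules as csvparse
# (commas inside double quotes do not split; a backslash inside quotes
# escapes the next character), but the quote marks and escaping
# backslashes are dropped from the field values.
csv_to_tsv() {
  awk '
  # csvsplit (hypothetical): fills out[1..n] with unquoted fields, returns n
  function csvsplit(s, out,    n, i, c, inq, esc) {
    n = 1; out[1] = ""; inq = 0; esc = 0
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (esc) { out[n] = out[n] c; esc = 0 }   # escaped char, kept as is
      else if (inq && c == "\\") esc = 1        # backslash inside quotes
      else if (c == "\"") inq = !inq            # a quote opens or closes
      else if (c == "," && !inq) out[++n] = ""  # separator outside quotes
      else out[n] = out[n] c                    # ordinary character
    }
    return n
  }
  {
    n = csvsplit($0, f)
    for (i = 1; i < n; i++) printf "%s\t", f[i]
    print f[n]
  }'
}

printf '%s\n' '"Smith, John",42,"say \"hi\""' | csv_to_tsv
```

For the sample line this prints three tab-separated fields: Smith, John / 42 / say "hi".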

--
%!PS-Adobe
10 10 scale/M{rmoveto}def/R{rlineto}def 12 45 moveto 0 5 R 4 -1 M 5.5 0 R
currentpoint 3 sub 3 90 0 arcn 0 -6 R 7.54 10.28 M 2.7067 -9.28 R -5.6333
2 setlinewidth 0 R 9.8867 8 M 7 0 R 0 -9 R -6 4 M 0 -4 R stroke showpage
       % Henry Churchyard      http://uts.cc.utexas.edu/~churchh/



Fri, 10 Aug 2001 03:00:00 GMT  

Quote:

>Does anyone have any canned routines for comma delimited files?
>i.e., convert from
>"field1","field2",num1,num2,"fieldnn"
>to strip *quotes* and *commas*?
>And, likewise, to convert tab delimited (no quotes) back to comma
>delimited?

Exactly what is the definition of comma delimited files?
For instance, can an unquoted field have an embedded quote?
Or an embedded comma, perhaps between embedded quotes?  Is
my assumption that quoted fields can have embedded commas
correct?  Can a quoted field contain an embedded quote?  Is
the escaping convention to protect the embedded quote with
a backslash?  Or by doubling the quote?  Can a newline be
escaped in any manner, or does it always mark the end of a
record?  Does the last record in the file have to end with
a newline?  Do two successive newlines mark a null record?
Do two successive (unquoted) commas mark a null field?
Are numeric fields always unquoted?  Can they be quoted?
Are string fields always quoted?  Can they be unquoted?
What is the format of a number?  Are minus signs leading or
trailing?  Are exponents allowed?  Marked with an "E" or a
"D" (for "double")?  What is the desired behavior on reading
an improperly-constructed record?  Silently ignore it?
Abort processing the entire file?  Log it to standard error
and continue?  Try to recover at the next comma?

If you define the file format precisely, I'll write the
routines to read and write it.
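One concrete set of answers to these questions is the convention later written down in RFC 4180: any field may be quoted, quoted fields may contain commas, an embedded quote is written as two quotes, and unquoted fields contain no quotes. The sketch below assumes that convention plus one simplification RFC 4180 does not make (no newlines inside quoted fields), and for a line that ends inside an open quote it picks "log it to standard error and continue". The function names are hypothetical.

```sh
#!/bin/sh
# csv4180_to_tsv (hypothetical name): comma-separated lines on stdin,
# tab-separated lines on stdout, assuming the doubled-quote convention
# ("" inside a quoted field stands for one "). Quoted newlines are not
# supported. A line that ends inside an open quote is reported on
# standard error and skipped.
csv4180_to_tsv() {
  awk '
  function split4180(s, out,    n, i, c, inq) {
    n = 1; out[1] = ""; inq = 0
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (inq) {
        if (c != "\"") out[n] = out[n] c
        else if (substr(s, i + 1, 1) == "\"") { out[n] = out[n] c; i++ }  # "" is one quote
        else inq = 0                          # closing quote
      }
      else if (c == "\"") inq = 1             # opening quote
      else if (c == ",") out[++n] = ""        # field separator
      else out[n] = out[n] c
    }
    return inq ? -1 : n                       # -1 flags an unterminated quote
  }
  {
    n = split4180($0, f)
    if (n < 0) { print "line " NR ": unterminated quote" > "/dev/stderr"; next }
    for (i = 1; i < n; i++) printf "%s\t", f[i]
    print f[n]
  }'
}

printf '%s\n' '"Smith, John",42,"say ""hi"""' | csv4180_to_tsv
```

The only difference from a backslash convention is the branch that looks one character ahead for a second quote; which branch is right depends on the answers to the questions above.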
--



Sat, 11 Aug 2001 03:00:00 GMT  


Quote:


>> Does anyone have any canned routines for comma delimited files?
>> i.e., convert from
>>"field1","field2",num1,num2,"fieldnn"
>> to strip *quotes* and *commas*? And, likewise, to convert tab
>> delimited (no quotes) back to comma delimited?
>tr -d '"' < infile | tr ',' ' ' > outfile
[...]
>gawk 'BEGIN{FS="," ;OFS=" "}{gsub(/"/,"");$1=$1;print}' infile > outfile
[...]
>sed -e 's/"//g;s/,/ /g' infile > outfile

[...]

What happens if you have to deal with commas (or commata?) and
double quotes that appear within text strings?
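A quick way to see the problem Peter raises: with a comma inside a quoted field, the one-liners quoted above split that field in two. The record below is a made-up sample with two fields.

```sh
#!/bin/sh
# A record with two fields, the first containing a comma (made-up sample).
record='"Smith, John",42'

# The sed one-liner from the thread: strip all quotes, commas become spaces.
printf '%s\n' "$record" | sed -e 's/"//g;s/,/ /g'
# prints: Smith  John 42   (three words instead of two fields)

# Counting fields after the tr version (commas -> tabs) shows the same split.
printf '%s\n' "$record" | tr -d '"' | tr ',' '\t' | awk -F'\t' '{ print NF }'
# prints: 3
```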

\bye
Peter



Sat, 11 Aug 2001 03:00:00 GMT  

Quote:





>>> Does anyone have any canned routines for comma delimited files?
>>> i.e., convert from
>>>"field1","field2",num1,num2,"fieldnn"
>>> to strip *quotes* and *commas*? And, likewise, to convert tab
>>> delimited (no quotes) back to comma delimited?

>>tr -d '"' < infile | tr ',' ' ' > outfile
>[...]
>>gawk 'BEGIN{FS="," ;OFS=" "}{gsub(/"/,"");$1=$1;print}' infile > outfile
>[...]
>>sed -e 's/"//g;s/,/ /g' infile > outfile
>[...]

>What happens if you have to deal with commas (or commata?) and
>double quotes that appear within text strings?

>\bye
>Peter

then you should do something like this:

sed -e 's/","/ /g;s/^"//;s/"$//' infile > outfile

which will only change "," (double quote, comma, double quote) to a
space, and delete the double quotes at the beginning and end of the
line.  "," should not occur within a field, as I understand the
usage of punctuation.

You can do this in awk/gawk by setting FS (awk's field separator;
IFS is the shell's) to "," (by building the string in a variable first
and then assigning it).  Quotes can get tricky on this one if you try
to do it on the command line.
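Inside an awk program the separator can be written directly as the three-character string ",". awk treats a multi-character FS as a regular expression, which here simply matches that text. Like the sed command above, this sketch only works when every field is quoted: in the original sample, num1 and num2 are unquoted, so they would stay joined by plain commas. The function name is hypothetical.

```sh
#!/bin/sh
# all_quoted_to_tsv (hypothetical name): assumes every field is wrapped
# in double quotes and that no field contains the three characters ","
all_quoted_to_tsv() {
  awk 'BEGIN { FS = "\",\""; OFS = "\t" }
  {
    sub(/^"/, ""); sub(/"$/, "")   # drop the quote at each end; $0 is re-split on FS
    $1 = $1                        # rebuild $0 joined by OFS
    print
  }'
}

printf '%s\n' '"Smith, John","42","NY"' | all_quoted_to_tsv
```

On the command line the same separator can be given as -F'","', which avoids most of the quoting trouble.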

Chuck Demas
Needham, Mass.




Sat, 11 Aug 2001 03:00:00 GMT  
 
 Relevant Pages 

1. getting fields NOT comma delimited with commas inside

2. Parsing Comma delimited files in J

3. Convert comma-delimited records to fixed length records

4. matching records in a comma delimited file

5. Comma delimited file problem

6. AWK & Comma Delimited Text

7. Reading Comma delimited, Quoted String records

8. 2.01 Comma delimited ASCII file

9. Import comma delimited text file

10. VW code to read comma-delimited text files??

11. importing from a comma delimited file

12. Export Clarion .DAT files to ASCII comma delimited files

 

 