
Parsing comma-delimited files in J
Greetings,
There have been a number of messages in this group lately
about reading comma-delimited *.csv files in J. This is something
I've been doing for years with the following verb. It works
quite well for small and modestly sized *.csv files. Because it turns
the whole file into a single character table, large files will
have to be processed in chunks.
NB. J script begins *********************************************
CR=: 13 { a.
NB. assertion error with message
assertmsg=: [ 13!:8 ([: 0&e. ]) $ 12"_
NB. parse comma delimited *.csv files. The y. argument
NB. is a comma delimited character list. The optional x. argument
NB. specifies an alternate delimiter character (default is ',').
NB. Assumes LF, CRLF or LFCR delimited lines
parsecsv=: 3 : 0
',' parsecsv y.
:
'separator cannot be the " character' assertmsg -. x. -: '"'
NB. CRLF delimited *.csv text to char table
y. =. x. ,. ];._2 y. -. CR
NB. bit mask of field delimiters that are not inside "..." quotes
b =. -. }. ~:/\ '"' e.~ ' ' , , y.
b =. ($y.) $ b *. , x. = y.
NB. use masks to cut lines
b <;._1"1 y.
)
NB. J script ends ***********************************************
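To see how the quote mask works, here is a small sketch on a single line. It is not part of the script above: the names s, q and m are made up for the illustration, and the mask is built with = rather than e.~ for brevity. Nothing here uses x. or y., so it should read the same in later J releases that spell the arguments x and y.

```j
NB. one sample line; the quoted field contains a comma
s =. 'a,"b,c",d'

NB. running parity of the " characters: 1 from an opening "
NB. up to (but not including) its closing "
q =. ~:/\ '"' = s            NB. 0 0 1 1 1 1 0 0 0

NB. commas that fall outside quoted fields
m =. (-. q) *. ',' = s       NB. 0 1 0 0 0 0 0 1 0

NB. prefix a delimiter so the first field also starts at a fret,
NB. then cut; ;._1 drops the frets themselves
(1 , m) <;._1 ',' , s
```

The last expression should give three boxes holding a, "b,c" and d. The comma inside the quotes is left alone, and the quotes themselves stay in the field, which is also how parsecsv behaves.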
To use this verb do the following:
NB. read comma delimited text file and parse
x =. parsecsv 1!:1 <'c:\talks\rasc\mess2.csv'
NB. corner elements of parsed file
10 5 {. x
+-------------+--------+--------+--------+--------+
|<SEEN>       |OBJECT_I|OBJ_ALT_|CONS_ABB|OBJ_TYPE|
+-------------+--------+--------+--------+--------+
|3/27/95 21:25|M1      |NGC1952 |Tau     |PN      |  <-- wrong: M1 is a
+-------------+--------+--------+--------+--------+      supernova remnant; never
|5/25/96 1:30 |M10     |NGC6254 |Oph     |GC      |      noticed this before
+-------------+--------+--------+--------+--------+
|             |M100    |NGC4321 |Com     |SG      |
+-------------+--------+--------+--------+--------+
|7/21/96 0:30 |M101    |NGC5457 |UMa     |SG      |
+-------------+--------+--------+--------+--------+
|7/21/96 1:00 |M102    |NGC5457 |UMa     |SG      |
+-------------+--------+--------+--------+--------+
|9/20/96 0:30 |M103    |NGC581  |Cas     |OC      |
+-------------+--------+--------+--------+--------+
|3/25/95 23:40|M104    |NGC4594 |Vir     |SG      |
+-------------+--------+--------+--------+--------+
|             |M105    |NGC3379 |Leo     |EG      |
+-------------+--------+--------+--------+--------+
|             |M106    |NGC4258 |CVn     |SG      |
+-------------+--------+--------+--------+--------+
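Since the result is an ordinary table of boxes, a column can be pulled out with plain indexing. A minimal sketch, assuming x still holds the parsed file from the session above:

```j
NB. second field (OBJECT_I) of every row, header row dropped,
NB. opened into a character table
> }. 1 {"1 x
```

Here 1 {"1 x picks the box at index 1 from each row, }. drops the header row, and > opens the remaining boxes into a blank-padded character table.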
NB. top of raw data looks like:
<SEEN>,OBJECT_I,OBJ_ALT_,CONS_ABB,OBJ_TYPE,OBJ_RA,OBJ_DEC,OBJ_MAGV,OBJ_SIZE,OBJ_BURN,OBJ_COMM,SITE,OPTICS
3/27/95 21:25,M1,NGC1952,Tau,PN,534.5,22.01,8.2,6x4,!!,SNR (1054) - Crab Nebula,Home,125a/18e telescope
5/25/96 1:30,M10,NGC6254,Oph,GC,1657.1,-4.06,6.6,8.2,!,VII,Home,7*50 binoculars
,M100,NGC4321,Com,SG,1222.9,15.49,9.4,5.3x4.5,!,Sc - fine spiral,,
....
Hope this helps.
------------------------------------------------------------------------
"Natural selection: the ultimate focus group!"
------------------------------------------------------------------------
John D. Baker