Rows2Columns reformat with uniq titles
Author Message
Rows2Columns reformat with uniq titles

Hi,

1./ Input
I have input file with following format:
time | xxx | 1234
time | yyy |  3234
time | xyz | 1231
.....

where time is time in format hh:mm ( 10 minutes interval- ..:00, ..:10
etc)
xxx etc. are 3 symbol string and 1234 etc are numbers.

2./ Output
How can I produce output file with format:

TIME,xxx, yyy, xyz ......
00:00,1234,3234,1234,...
00:10,.....
.....
23:50,321,4543,751,...

xxx, yyy, to be unique and sorted alphabeticaly.

3./ What I made for now:
\$1 - time
\$2 - 3 symbol string
\$3 - number
array strings - uniq strings
array numbers - numbers for each time-string pair
num - number of uniq strings ( column titles )

BEGIN { FS="|"; OFS=","; ORS="\r\n"; num=0 }
{
numbers[\$1"-"\$2]=\$3
if ( !strings[\$2]++) { num++; }

Quote:
}

END {
printf "%s,","TIME"
for (i=0; i<num; i++) {
printf "%s,",strings[i]
}
printf "\r\n";
for (h=0; h<24; h++ ) {
for (m=0; m<60; m=m+10) {
time = sprintf("%02d:%02d",h,m)
printf "%s,",time
for (i=0; i<num; i++) {
printf "%s,",numbers[time"-"strings[i]]
}
printf "\r\n";
}
}

Quote:
}

4./ Problem:
- I'm not sure is this the good algorithm
- HOW TO SORT COLUMN NAMES ( This is the main problem!)
so in the above example columns will be xxx,yyy,xyz but I need
xxx,xyz,yyy

Regards,
Sto

Sat, 15 Feb 2003 08:41:11 GMT
Rows2Columns reformat with uniq titles
Hi Stoyan,

I think this program does what you want.

BEGIN { FS="|" }

NF {
gsub(/[ \t]+/,"")
numbers[\$1,\$2]=\$3
#if (!strings[\$2]++) { num++ }
if ( !(\$2 in strings) ) {strings[\$2]++; num++}

Quote:
}

END {
i=0
for (s in strings) { col[++i]=s; indx[i]=i }

do {
swap=0
for (i=1; i<num; i++) {
j=indx[i]
k=indx[i+1]
if (col[j]>col[k]) {indx[i]=k; indx[i+1]=j; swap++}
}
} while (swap)

printf "%5s", "TIME"
for (i=1; i<=num; i++) { printf ",%5s" ,col[indx[i]] }
print ""
for (h=0; h<24; h++ ) {
for (m=0; m<60; m=m+10) {
time = sprintf("%02d:%02d",h,m)
printf "%s",time
for (i=1; i<=num; i++) {
s=col[indx[i]]
printf ", %4s",numbers[time,s]
}
print ""
}
}

Quote:
}

I used this input file:-

00:00 | xxx | 1234
00:10 | yyy | 3234
00:30 | xyz | 1231
00:00 | abc | 1234
00:00 | abc | 1234
00:10 | abc | 2345

I got this output:-

D:\Alan\temp>gawk -f r2c.awk r2c.dat
TIME,  abc,  xxx,  xyz,  yyy
00:00, 1234, 1234,     ,
00:10, 2345,     ,     , 3234
00:20,     ,     ,     ,
00:30,     ,     , 1231,
00:40,     ,     ,     ,
00:50,     ,     ,     ,
01:00,     ,     ,     ,
etc.

awk95 gives the same results as gawk

I created a sorted index pointing to the column titles and then used it
to print both the column titles and the strings in the correct order.
The sorting algorithm is simple and inefficient but I think it is
appropriate unless you have a very large number of columns.

Hope this helps

Quote:
>Hi,

>1./ Input
>I have input file with following format:
>time | xxx | 1234
>time | yyy |  3234
>time | xyz | 1231
>.....

>where time is time in format hh:mm ( 10 minutes interval- ..:00, ..:10
>etc)
>xxx etc. are 3 symbol string and 1234 etc are numbers.

>2./ Output
>How can I produce output file with format:

>TIME,xxx, yyy, xyz ......
>00:00,1234,3234,1234,...
>00:10,.....
>.....
>23:50,321,4543,751,...

>xxx, yyy, to be unique and sorted alphabeticaly.

>3./ What I made for now:
>\$1 - time
>\$2 - 3 symbol string
>\$3 - number
>array strings - uniq strings
>array numbers - numbers for each time-string pair
>num - number of uniq strings ( column titles )

>BEGIN { FS="|"; OFS=","; ORS="\r\n"; num=0 }
>{
>   numbers[\$1"-"\$2]=\$3
>  if ( !strings[\$2]++) { num++; }
>}
>END {
>   printf "%s,","TIME"
>   for (i=0; i<num; i++) {
>      printf "%s,",strings[i]
>   }
>   printf "\r\n";
>   for (h=0; h<24; h++ ) {
>      for (m=0; m<60; m=m+10) {
>         time = sprintf("%02d:%02d",h,m)
>         printf "%s,",time
>         for (i=0; i<num; i++) {
>            printf "%s,",numbers[time"-"strings[i]]
>         }
>         printf "\r\n";
>      }
>   }
>}

>4./ Problem:
>- I'm not sure is this the good algorithm
>- HOW TO SORT COLUMN NAMES ( This is the main problem!)
>  so in the above example columns will be xxx,yyy,xyz but I need
>  xxx,xyz,yyy

>Regards,
>Sto

--
Alan Linton

Wed, 26 Feb 2003 21:25:13 GMT
Rows2Columns reformat with uniq titles
10x Alan,
this is exactly that I want. I tried it and it works like a devil ;0)
I recieve also other suggestion - first to sort input data by
second field (\$2) and that to process them.

Fri, 28 Feb 2003 08:47:00 GMT
Rows2Columns reformat with uniq titles

Quote:
>10x Alan,
>this is exactly that I want. I tried it and it works like a devil ;0)
>I recieve also other suggestion - first to sort input data by
>second field (\$2) and that to process them.

Hi Stoyan,

I am delighted to hear that it worked.

The other suggestion sounds reasonable but you would have to restart
from scratch to implement it.

Here is an improved version of the program which should run a little
faster because I have added a better sorting function. This is based on
the book "The AWK Programming Language" but, instead of sorting an array
I create a sorted index. Using an index makes the program faster because
you don't have to swap whole columns, only column titles. The algorithm
is Quicksort.

BEGIN { FS="|" }

NF {
gsub(/[ \t]+/,"")
numbers[\$1,\$2]=\$3
if (!strings[\$2]++) { num++ }

Quote:
}

END {
i=0
for (s in strings) { col[++i]=s }
qsortindx(col,indx,1,num)

printf "%5s", "TIME"
for (i=1; i<=num; i++) { printf ",%5s" ,col[indx[i]] }
print ""
for (h=0; h<24; h++ ) {
for (m=0; m<60; m=m+10) {
time = sprintf("%02d:%02d",h,m)
printf "%s",time
for (i=1; i<=num; i++) {
s=col[indx[i]]
printf ", %4s",numbers[time,s]
}
print ""
}
}

Quote:
}

function qsortindx(A,indx,left,right,   i) {
for (i=left; i<=right; i++) indx[i] = i
qsortindx2(A, indx, left, right)

Quote:
}

function qsortindx2(A,indx,left,right,   i,last) {
if (left >= right)  # do nothing if array contains
return           # at most one element
swap(indx, left, left + int((right-left+1)*rand()))
last = left
for (i = left+1; i <= right; i++)
if (A[indx[i]] < A[indx[left]])
swap(indx, ++last, i)
swap(indx, left, last)
qsortindx2(A, indx, left, last-1)
qsortindx2(A, indx, last+1, right)

Quote:
}

function swap(B,i,j,   t) {
t = B[i]; B[i] = B[j]; B[j] = t

Quote:
}

Hope this helps
--
Alan Linton

Fri, 28 Feb 2003 17:32:54 GMT
Rows2Columns reformat with uniq titles
Here's an "if it were me" solution..
ditch the colon, so the times can be sorted in order, and
use the symbol field as a second sort key, then pass through
awk, acumulating entries on the row until a new time is found.
In unix, a command to do this would be:
sed "s/://" input_file | sort +0 -1n +1 -2 | \
gawk '\$1==""{oldtime=\$1};\$1!=oldtime {print buffer; \
buffer=substr(\$1,1,2) ":" substr(\$1,3,2);oldtime=\$1}; \
{buffer=buffer "," \$3};END {print buffer}'

now, this makes the assumption you have every symbol for every time.
if this is not the case, strip the symbols out first, sort them
and them pass them (seperately?) to the gawk for processing.
(if you want an example of this, just post back)
(<troll> *grin* I'd use an ARGIND or stdin with marker files here </troll>)

*sigh* back to data management
jen
--

Quote:

> Hi,

> 1./ Input
> I have input file with following format:
> time | xxx | 1234
> time | yyy |  3234
> time | xyz | 1231
> .....

> where time is time in format hh:mm ( 10 minutes interval- ..:00, ..:10
> etc)
> xxx etc. are 3 symbol string and 1234 etc are numbers.

> 2./ Output
> How can I produce output file with format:

> TIME,xxx, yyy, xyz ......
> 00:00,1234,3234,1234,...
> 00:10,.....
> .....
> 23:50,321,4543,751,...

> xxx, yyy, to be unique and sorted alphabeticaly.

> 3./ What I made for now:
> \$1 - time
> \$2 - 3 symbol string
> \$3 - number
> array strings - uniq strings
> array numbers - numbers for each time-string pair
> num - number of uniq strings ( column titles )

> BEGIN { FS="|"; OFS=","; ORS="\r\n"; num=0 }
> {
>    numbers[\$1"-"\$2]=\$3
>   if ( !strings[\$2]++) { num++; }
> }
> END {
>    printf "%s,","TIME"
>    for (i=0; i<num; i++) {
>       printf "%s,",strings[i]
>    }
>    printf "\r\n";
>    for (h=0; h<24; h++ ) {
>       for (m=0; m<60; m=m+10) {
>          time = sprintf("%02d:%02d",h,m)
>          printf "%s,",time
>          for (i=0; i<num; i++) {
>             printf "%s,",numbers[time"-"strings[i]]
>          }
>          printf "\r\n";
>       }
>    }
> }

> 4./ Problem:
> - I'm not sure is this the good algorithm
> - HOW TO SORT COLUMN NAMES ( This is the main problem!)
>   so in the above example columns will be xxx,yyy,xyz but I need
>   xxx,xyz,yyy

> Regards,
> Sto

Sat, 01 Mar 2003 08:08:44 GMT
Rows2Columns reformat with uniq titles
Maybe I forgot to see that input file IS ALREADY SORTED by time.
So no need to sort it. The probem is in output file column titles (\$2)
to be uniq and sorted.
Quote:

> Here's an "if it were me" solution..
> ditch the colon, so the times can be sorted in order, and
> use the symbol field as a second sort key, then pass through
> awk, acumulating entries on the row until a new time is found.
> In unix, a command to do this would be:
> sed "s/://" input_file | sort +0 -1n +1 -2 | \
> gawk '\$1==""{oldtime=\$1};\$1!=oldtime {print buffer; \
> buffer=substr(\$1,1,2) ":" substr(\$1,3,2);oldtime=\$1}; \
> {buffer=buffer "," \$3};END {print buffer}'

> now, this makes the assumption you have every symbol for every time.
> if this is not the case, strip the symbols out first, sort them
> and them pass them (seperately?) to the gawk for processing.
> (if you want an example of this, just post back)
> (<troll> *grin* I'd use an ARGIND or stdin with marker files here </troll>)

> *sigh* back to data management
> jen
> --

> > Hi,

> > 1./ Input
> > I have input file with following format:
> > time | xxx | 1234
> > time | yyy |  3234
> > time | xyz | 1231
> > .....

> > where time is time in format hh:mm ( 10 minutes interval- ..:00, ..:10
> > etc)
> > xxx etc. are 3 symbol string and 1234 etc are numbers.

> > 2./ Output
> > How can I produce output file with format:

> > TIME,xxx, yyy, xyz ......
> > 00:00,1234,3234,1234,...
> > 00:10,.....
> > .....
> > 23:50,321,4543,751,...

> > xxx, yyy, to be unique and sorted alphabeticaly.

> > 3./ What I made for now:
> > \$1 - time
> > \$2 - 3 symbol string
> > \$3 - number
> > array strings - uniq strings
> > array numbers - numbers for each time-string pair
> > num - number of uniq strings ( column titles )

> > BEGIN { FS="|"; OFS=","; ORS="\r\n"; num=0 }
> > {
> >    numbers[\$1"-"\$2]=\$3
> >   if ( !strings[\$2]++) { num++; }
> > }
> > END {
> >    printf "%s,","TIME"
> >    for (i=0; i<num; i++) {
> >       printf "%s,",strings[i]
> >    }
> >    printf "\r\n";
> >    for (h=0; h<24; h++ ) {
> >       for (m=0; m<60; m=m+10) {
> >          time = sprintf("%02d:%02d",h,m)
> >          printf "%s,",time
> >          for (i=0; i<num; i++) {
> >             printf "%s,",numbers[time"-"strings[i]]
> >          }
> >          printf "\r\n";
> >       }
> >    }
> > }

> > 4./ Problem:
> > - I'm not sure is this the good algorithm
> > - HOW TO SORT COLUMN NAMES ( This is the main problem!)
> >   so in the above example columns will be xxx,yyy,xyz but I need
> >   xxx,xyz,yyy

> > Regards,
> > Sto

Sat, 01 Mar 2003 11:40:15 GMT
Rows2Columns reformat with uniq titles
10x again. Your script work properly again ;0) I'm happy.
One beer from me ;0)
ww
C|    |
^^^^

Best Regards,
Sto

Quote:

> >10x Alan,
> >this is exactly that I want. I tried it and it works like a devil ;0)
> >I recieve also other suggestion - first to sort input data by
> >second field (\$2) and that to process them.

> Hi Stoyan,

> I am delighted to hear that it worked.

> The other suggestion sounds reasonable but you would have to restart
> from scratch to implement it.

> Here is an improved version of the program which should run a little
> faster because I have added a better sorting function. This is based on
> the book "The AWK Programming Language" but, instead of sorting an array
> I create a sorted index. Using an index makes the program faster because
> you don't have to swap whole columns, only column titles. The algorithm
> is Quicksort.

Sat, 01 Mar 2003 11:43:15 GMT
Rows2Columns reformat with uniq titles
*nod*  however, you have to preserve the time sorting when you
(also) sort by the symbol.  I realised I hadn't printed
a title row as you originally asked for - just add a chunk
in to accumulate the symbols during the load of the first output
line, then print that before dumping buffer. (code should be obvious)
Do you have every symbol on every time?

Glad to see you have solved the problem either way (always
lots of ways to do this stuff!)
Jen
--

Quote:

> Maybe I forgot to see that input file IS ALREADY SORTED by time.
> So no need to sort it. The probem is in output file column titles (\$2)
> to be uniq and sorted.

> > Here's an "if it were me" solution..
> > ditch the colon, so the times can be sorted in order, and
> > use the symbol field as a second sort key, then pass through
> > awk, acumulating entries on the row until a new time is found.
> > In unix, a command to do this would be:
> > sed "s/://" input_file | sort +0 -1n +1 -2 | \
> > gawk '\$1==""{oldtime=\$1};\$1!=oldtime {print buffer; \
> > buffer=substr(\$1,1,2) ":" substr(\$1,3,2);oldtime=\$1}; \
> > {buffer=buffer "," \$3};END {print buffer}'

> > now, this makes the assumption you have every symbol for every time.
> > if this is not the case, strip the symbols out first, sort them
> > and them pass them (seperately?) to the gawk for processing.
> > (if you want an example of this, just post back)
> > (<troll> *grin* I'd use an ARGIND or stdin with marker files here </troll>)

> > *sigh* back to data management
> > jen
> > --

> > > Hi,

> > > 1./ Input
> > > I have input file with following format:
> > > time | xxx | 1234
> > > time | yyy |  3234
> > > time | xyz | 1231
> > > .....

> > > where time is time in format hh:mm ( 10 minutes interval- ..:00, ..:10
> > > etc)
> > > xxx etc. are 3 symbol string and 1234 etc are numbers.

> > > 2./ Output
> > > How can I produce output file with format:

> > > TIME,xxx, yyy, xyz ......
> > > 00:00,1234,3234,1234,...
> > > 00:10,.....
> > > .....
> > > 23:50,321,4543,751,...

> > > xxx, yyy, to be unique and sorted alphabeticaly.

> > > 3./ What I made for now:
> > > \$1 - time
> > > \$2 - 3 symbol string
> > > \$3 - number
> > > array strings - uniq strings
> > > array numbers - numbers for each time-string pair
> > > num - number of uniq strings ( column titles )

> > > BEGIN { FS="|"; OFS=","; ORS="\r\n"; num=0 }
> > > {
> > >    numbers[\$1"-"\$2]=\$3
> > >   if ( !strings[\$2]++) { num++; }
> > > }
> > > END {
> > >    printf "%s,","TIME"
> > >    for (i=0; i<num; i++) {
> > >       printf "%s,",strings[i]
> > >    }
> > >    printf "\r\n";
> > >    for (h=0; h<24; h++ ) {
> > >       for (m=0; m<60; m=m+10) {
> > >          time = sprintf("%02d:%02d",h,m)
> > >          printf "%s,",time
> > >          for (i=0; i<num; i++) {
> > >             printf "%s,",numbers[time"-"strings[i]]
> > >          }
> > >          printf "\r\n";
> > >       }
> > >    }
> > > }

> > > 4./ Problem:
> > > - I'm not sure is this the good algorithm
> > > - HOW TO SORT COLUMN NAMES ( This is the main problem!)
> > >   so in the above example columns will be xxx,yyy,xyz but I need
> > >   xxx,xyz,yyy