bug in gawk 3.1.1? 

hello all,

I'm using the following script

BEGIN {RS="ti1\n(dwv,)?"; s=0; i=0}
{if ($1 != "") s = $1; print ++i, s}

to extract values from a file of the form

ti1
dwv,98.22
ti1
dwv,103.08
ti1
ti1
dwv,196.25
ti1
dwv,210.62
ti1
dwv,223.53

The desired result for this example looks like

1 0
2 98.22
3 103.08
4 103.08
5 196.25
6 210.62
7 223.53

The script works fine most of the time, but when run on the attached file
(sorry for the size, but the error does not appear with less data) I
get some lines (three with the attached file) that look like

1262 dwv,212.97
1277 dwv,174.33
1279 dwv,151.79

I can't think of any other reason for this than a bug in gawk!

I'm running gawk 3.1.1 on winnt 4.0

best regards
     Lorenz



Sun, 13 Feb 2005 16:34:50 GMT  
Oops, forgot the file 8-)

<<<<< begin input file >>>>>
ti1
dwv,214.59
ti1
dwv,230.31
ti1
dwv,242.64
ti1
dwv,253.94
ti1
dwv,264.33
ti1
dwv,270.94
ti1
dwv,273.52
ti1
dwv,270.08
ti1
dwv,263.19
ti1
dwv,254.45
ti1
dwv,244.91
ti1
dwv,234.55
ti1
dwv,222.49
ti1
dwv,209.94
ti1
dwv,197.17
ti1
dwv,182.89
ti1
dwv,169.76
ti1
dwv,158.59
ti1
dwv,145.37
ti1
dwv,135.46
ti1
dwv,124.77
ti1
dwv,115.98
ti1
dwv,108.77
ti1
dwv,101.12
ti1
dwv,94.45
ti1
dwv,89.08
ti1
dwv,84.63
ti1
dwv,81.05
ti1
dwv,78.93
ti1
dwv,76.65
ti1
dwv,75.59
ti1
ti1
ti1
dwv,77.47
ti1
dwv,80.17
ti1
dwv,83.90
ti1
dwv,88.56
ti1
dwv,95.69
ti1
dwv,97.48
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,203.08
ti1
dwv,218.22
ti1
dwv,229.37
ti1
dwv,238.49
ti1
dwv,247.43
ti1
dwv,255.22
ti1
dwv,261.31
ti1
dwv,262.36
ti1
dwv,260.66
ti1
dwv,256.33
ti1
dwv,249.34
ti1
dwv,240.03
ti1
dwv,228.55
ti1
dwv,215.42
ti1
dwv,203.37
ti1
dwv,190.01
ti1
dwv,177.81
ti1
dwv,165.44
ti1
dwv,152.92
ti1
dwv,142.03
ti1
dwv,132.91
ti1
dwv,124.48
ti1
dwv,116.45
ti1
dwv,109.06
ti1
dwv,103.27
ti1
dwv,98.87
ti1
dwv,94.95
ti1
dwv,92.56
ti1
dwv,90.47
ti1
dwv,89.48
ti1
ti1
dwv,90.53
ti1
dwv,93.07
ti1
dwv,97.12
ti1
dwv,101.82
ti1
dwv,108.18
ti1
dwv,109.73
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,202.97
ti1
dwv,217.38
ti1
dwv,231.73
ti1
dwv,243.11
ti1
dwv,255.37
ti1
dwv,264.12
ti1
dwv,269.64
ti1
dwv,270.98
ti1
dwv,269.65
ti1
dwv,264.55
ti1
dwv,257.16
ti1
dwv,246.01
ti1
dwv,232.88
ti1
dwv,219.85
ti1
dwv,208.79
ti1
dwv,197.00
ti1
dwv,183.93
ti1
dwv,172.00
ti1
dwv,160.55
ti1
dwv,150.59
ti1
dwv,141.47
ti1
dwv,133.02
ti1
dwv,126.21
ti1
dwv,120.64
ti1
dwv,115.79
ti1
dwv,111.62
ti1
dwv,108.41
ti1
dwv,106.41
ti1
ti1
ti1
dwv,109.08
ti1
dwv,113.23
ti1
dwv,118.57
ti1
dwv,122.57
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,208.07
ti1
dwv,224.14
ti1
dwv,236.28
ti1
dwv,248.12
ti1
dwv,258.97
ti1
dwv,267.74
ti1
dwv,272.47
ti1
dwv,271.52
ti1
dwv,266.80
ti1
dwv,258.61
ti1
dwv,249.30
ti1
dwv,239.19
ti1
dwv,228.28
ti1
dwv,215.79
ti1
dwv,203.86
ti1
dwv,190.08
ti1
dwv,177.40
ti1
dwv,163.81
ti1
dwv,152.60
ti1
dwv,141.33
ti1
dwv,130.98
ti1
dwv,121.98
ti1
dwv,114.08
ti1
dwv,106.61
ti1
dwv,99.75
ti1
dwv,93.10
ti1
dwv,86.57
ti1
dwv,80.62
ti1
dwv,76.05
ti1
dwv,71.52
ti1
dwv,68.85
ti1
dwv,67.46
ti1
dwv,66.86
ti1
dwv,67.51
ti1
dwv,69.75
ti1
dwv,72.85
ti1
dwv,76.23
ti1
dwv,82.85
ti1
dwv,89.33
ti1
dwv,93.39
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,188.37
ti1
dwv,204.42
ti1
dwv,217.16
ti1
dwv,228.89
ti1
dwv,238.83
ti1
dwv,247.70
ti1
dwv,253.59
ti1
dwv,257.17
ti1
ti1
dwv,254.00
ti1
dwv,248.24
ti1
dwv,240.14
ti1
dwv,229.42
ti1
dwv,218.97
ti1
dwv,205.09
ti1
dwv,192.61
ti1
dwv,179.74
ti1
dwv,166.76
ti1
dwv,155.36
ti1
dwv,143.58
ti1
dwv,131.40
ti1
dwv,121.84
ti1
dwv,112.46
ti1
dwv,105.41
ti1
dwv,97.15
ti1
dwv,90.09
ti1
dwv,84.79
ti1
dwv,80.52
ti1
dwv,75.58
ti1
dwv,72.59
ti1
dwv,69.39
ti1
dwv,67.51
ti1
dwv,66.42
ti1
ti1
ti1
dwv,67.82
ti1
dwv,69.76
ti1
dwv,73.19
ti1
dwv,77.35
ti1
dwv,82.36
ti1
dwv,87.82
ti1
dwv,93.30
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,190.58
ti1
dwv,203.43
ti1
dwv,216.48
ti1
dwv,228.89
ti1
dwv,241.91
ti1
dwv,251.60
ti1
dwv,257.78
ti1
dwv,262.18
ti1
dwv,263.13
ti1
dwv,260.91
ti1
dwv,255.34
ti1
dwv,247.17
ti1
dwv,236.85
ti1
dwv,225.24
ti1
dwv,213.39
ti1
dwv,201.46
ti1
dwv,187.77
ti1
dwv,175.31
ti1
dwv,162.95
ti1
dwv,152.55
ti1
dwv,142.56
ti1
dwv,132.94
ti1
dwv,125.00
ti1
dwv,117.69
ti1
dwv,110.96
ti1
dwv,105.02
ti1
dwv,101.78
ti1
dwv,98.48
ti1
dwv,97.06
ti1
dwv,96.50
ti1
ti1
dwv,98.48
ti1
dwv,101.18
ti1
dwv,104.56
ti1
dwv,109.67
ti1
dwv,115.86
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,200.02
ti1
dwv,217.51
ti1
dwv,233.21
ti1
dwv,245.72
ti1
dwv,258.21
ti1
dwv,267.24
ti1
dwv,273.79
ti1
dwv,273.20
ti1
dwv,270.38
ti1
dwv,260.76
ti1
dwv,250.05
ti1
dwv,241.32
ti1
dwv,231.14
ti1
dwv,219.83
ti1
dwv,206.13
ti1
dwv,193.24
ti1
dwv,180.73
ti1
dwv,167.82
ti1
dwv,156.94
ti1
dwv,144.13
ti1
dwv,134.40
ti1
dwv,125.23
ti1
dwv,116.13
ti1
dwv,107.34
ti1
dwv,99.71
ti1
dwv,94.11
ti1
dwv,88.91
ti1
dwv,84.51
ti1
dwv,81.50
ti1
dwv,78.66
ti1
dwv,76.57
ti1
dwv,75.82
ti1
ti1
dwv,76.88
ti1
dwv,79.03
ti1
dwv,82.12
ti1
dwv,85.73
ti1
dwv,91.05
ti1
dwv,96.31
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,184.18
ti1
dwv,201.10
ti1
dwv,214.21
ti1
dwv,226.18
ti1
dwv,237.72
ti1
dwv,247.57
ti1
dwv,254.36
ti1
dwv,258.34
ti1
dwv,259.80
ti1
dwv,257.76
ti1
dwv,253.17
ti1
dwv,246.51
ti1
dwv,237.92
ti1
dwv,227.09
ti1
dwv,214.13
ti1
dwv,202.20
ti1
dwv,189.21
ti1
dwv,177.65
ti1
dwv,166.18
ti1
dwv,154.03
ti1
dwv,142.21
ti1
dwv,131.51
ti1
dwv,121.28
ti1
dwv,111.80
ti1
dwv,104.47
ti1
dwv,98.80
ti1
dwv,94.76
ti1
dwv,91.81
ti1
dwv,89.17
ti1
dwv,88.00
ti1
ti1
ti1
dwv,89.20
ti1
dwv,91.17
ti1
dwv,94.35
ti1
dwv,99.00
ti1
dwv,105.43
ti1
dwv,109.34
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,194.84
ti1
dwv,212.05
ti1
dwv,226.30
ti1
dwv,239.03
ti1
dwv,250.94
ti1
dwv,259.73
ti1
dwv,266.64
ti1
dwv,269.67
ti1
dwv,269.03
ti1
dwv,265.03
ti1
dwv,258.23
ti1
dwv,249.32
ti1
dwv,238.03
ti1
dwv,226.20
ti1
dwv,213.46
ti1
dwv,200.53
ti1
dwv,187.65
ti1
dwv,174.89
ti1
dwv,163.22
ti1
dwv,152.47
ti1
dwv,142.65
ti1
dwv,133.97
ti1
dwv,126.59
ti1
dwv,120.52
ti1
dwv,115.57
ti1
dwv,111.49
ti1
dwv,108.03
ti1
dwv,106.01
ti1
dwv,105.28
ti1
ti1
dwv,106.91
ti1
dwv,109.73
ti1
dwv,114.91
ti1
dwv,120.66
ti1
dwv,123.74
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,211.57
ti1
dwv,227.06
ti1
dwv,240.57
ti1
dwv,252.26
ti1
dwv,262.67
ti1
dwv,270.66
ti1
dwv,273.40
ti1
dwv,270.25
ti1
dwv,263.76
ti1
dwv,256.03
ti1
dwv,246.87
ti1
dwv,237.10
ti1
dwv,225.11
ti1
dwv,211.53
ti1
dwv,197.77
ti1
dwv,185.75
ti1
dwv,173.00
ti1
dwv,159.31
ti1
dwv,147.18
ti1
dwv,134.84
ti1
dwv,125.07
ti1
dwv,115.82
ti1
dwv,107.33
ti1
dwv,100.07
ti1
dwv,93.55
ti1
dwv,87.60
ti1
dwv,81.75
ti1
dwv,77.03
ti1
dwv,73.39
ti1
dwv,70.97
ti1
dwv,67.94
ti1
dwv,66.64
ti1
dwv,65.80
ti1
ti1
dwv,66.85
ti1
dwv,68.78
ti1
dwv,71.47
ti1
dwv,74.20
ti1
dwv,78.68
ti1
dwv,85.08
ti1
dwv,87.47
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,185.81
ti1
dwv,201.66
ti1
dwv,215.07
ti1
dwv,227.26
ti1
dwv,238.00
ti1
dwv,247.64
ti1
dwv,255.03
ti1
dwv,257.12
ti1
dwv,256.17
ti1
dwv,253.47
ti1
dwv,248.93
ti1
dwv,241.39
ti1
dwv,231.45
ti1
dwv,220.49
ti1
dwv,208.48
ti1
dwv,196.04
ti1
dwv,182.73
ti1
dwv,169.40
ti1
dwv,157.24
ti1
dwv,145.56
ti1
dwv,133.54
ti1
dwv,124.01
ti1
dwv,114.55
ti1
dwv,105.75
ti1
dwv,98.32
ti1
dwv,91.91
ti1
dwv,86.08
ti1
dwv,81.35
ti1
dwv,77.78
ti1
dwv,73.85
ti1
dwv,71.12
ti1
dwv,68.53
ti1
dwv,67.09
ti1
dwv,66.34
ti1
ti1
ti1
dwv,67.92
ti1
dwv,70.08
ti1
dwv,73.78
ti1
dwv,78.68
ti1
dwv,84.33
ti1
dwv,90.25
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,183.19
ti1
dwv,198.96
ti1
dwv,213.55
ti1
dwv,225.91
ti1
dwv,237.33
ti1
dwv,246.66
ti1
dwv,254.98
ti1
dwv,259.46
ti1
dwv,261.02
ti1
dwv,260.02
ti1
dwv,255.96
ti1
dwv,249.08
ti1
dwv,240.15
ti1
dwv,229.51
ti1
dwv,217.21
ti1
dwv,205.06
ti1
dwv,192.62
ti1
dwv,177.43
ti1
dwv,165.06
ti1
dwv,152.36
ti1
dwv,142.35
ti1
dwv,134.58
ti1
dwv,126.20
ti1
dwv,119.86
ti1
dwv,113.67
ti1
dwv,108.20
ti1
dwv,104.71
ti1
dwv,100.83
ti1
dwv,98.96
ti1
dwv,98.38
ti1
ti1
dwv,100.29
ti1
dwv,103.47
ti1
dwv,107.73
ti1
dwv,113.04
ti1
dwv,118.90
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,200.70
ti1
dwv,217.77
ti1
dwv,232.45
ti1
dwv,244.28
ti1
dwv,256.61
ti1
dwv,265.93
ti1
dwv,272.01
ti1
ti1
dwv,268.67
ti1
dwv,260.53
ti1
dwv,252.09
ti1
dwv,243.72
ti1
dwv,232.68
ti1
dwv,220.49
ti1
dwv,206.88
ti1
dwv,193.98
ti1
dwv,181.33
ti1
dwv,168.61
ti1
dwv,156.50
ti1
dwv,145.54
ti1
dwv,136.43
ti1
dwv,125.95
ti1
dwv,117.26
ti1
dwv,109.81
ti1
dwv,103.36
ti1
dwv,97.15
ti1
dwv,92.86
ti1
dwv,89.02
ti1
dwv,86.39
ti1
dwv,84.44
ti1
dwv,83.72
ti1
ti1
dwv,85.89
ti1
dwv,88.39
ti1
dwv,92.31
ti1
dwv,96.00
ti1
dwv,100.24
ti1
dwv,102.55
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,197.15
ti1
dwv,212.33
ti1
dwv,225.48
ti1
dwv,236.75
ti1
dwv,247.52
ti1
dwv,255.21
ti1
dwv,260.74
ti1
dwv,262.81
ti1
dwv,262.09
ti1
dwv,258.08
ti1
dwv,250.83
ti1
dwv,242.39
ti1
dwv,230.86
ti1
dwv,219.62
ti1
dwv,206.80
ti1
dwv,193.41
ti1
dwv,180.60
ti1
dwv,168.20
ti1
dwv,157.77
ti1
dwv,145.25
ti1
dwv,136.23
ti1
dwv,126.28
ti1
dwv,118.10
ti1
dwv,110.74
ti1
dwv,105.29
ti1
dwv,99.75
ti1
dwv,95.61
ti1
dwv,92.16
ti1
dwv,89.93
ti1
dwv,88.45
ti1
ti1
dwv,89.11
ti1
dwv,90.87
ti1
dwv,94.21
ti1
dwv,98.16
ti1
dwv,104.24
ti1
dwv,108.98
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,195.81
ti1
dwv,211.95
ti1
dwv,226.36
ti1
dwv,240.61
ti1
dwv,251.93
ti1
dwv,259.13
ti1
dwv,265.36
ti1
dwv,269.71
ti1
dwv,270.27
ti1
dwv,266.44
ti1
dwv,260.44
ti1
dwv,251.94
ti1
dwv,240.79
ti1
dwv,228.95
ti1
dwv,215.90
ti1
dwv,203.25
ti1
dwv,190.01
ti1
dwv,177.92
ti1
dwv,165.60
ti1
dwv,155.49
ti1
dwv,144.32
ti1
dwv,136.07
ti1
dwv,127.64
ti1
dwv,120.28
ti1
dwv,114.57
ti1
dwv,109.99
ti1
dwv,106.36
ti1
dwv,104.49
ti1
dwv,103.84
ti1
ti1
dwv,105.77
ti1
dwv,107.83
ti1
dwv,112.41
ti1
dwv,117.71
ti1
dwv,120.66
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,210.13
ti1
dwv,225.03
ti1
dwv,238.76
ti1
dwv,250.50
ti1
dwv,261.18
ti1
dwv,269.22
ti1
dwv,274.18
ti1
dwv,272.28
ti1
dwv,266.66
ti1
dwv,258.68
ti1
dwv,249.88
ti1
dwv,239.94
ti1
dwv,227.14
ti1
dwv,213.22
ti1
dwv,197.58
ti1
dwv,184.86
ti1
dwv,172.52
ti1
dwv,160.19
ti1
dwv,149.20
ti1
dwv,137.49
ti1
dwv,127.86
ti1
dwv,118.91
ti1
dwv,110.49
ti1
dwv,102.73
ti1
dwv,96.33
ti1
dwv,90.95
ti1
dwv,86.38
ti1
dwv,82.72
ti1
dwv,79.65
ti1
dwv,77.61
ti1
dwv,75.70
ti1
dwv,74.49
ti1
ti1
ti1
dwv,76.48
ti1
dwv,79.14
ti1
dwv,82.84
ti1
dwv,87.86
ti1
dwv,92.69
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,181.33
ti1
dwv,196.86
ti1
dwv,210.98
ti1
dwv,223.13
ti1
dwv,233.90
ti1
dwv,243.19
ti1
dwv,250.57
ti1
dwv,255.00
ti1
dwv,256.36
ti1
dwv,255.16
ti1
dwv,250.15
ti1
dwv,242.80
ti1
dwv,233.82
ti1
dwv,223.41
ti1
dwv,210.81
ti1
dwv,199.04
ti1
dwv,187.39
ti1
dwv,174.14
ti1
dwv,162.62
ti1
dwv,151.39
ti1
dwv,139.59
ti1
dwv,128.71
ti1
dwv,119.17
ti1
dwv,111.02
ti1
dwv,103.91
ti1
dwv,96.78
ti1
dwv,91.56
ti1
dwv,87.13
ti1
dwv,83.22
ti1
dwv,80.50
ti1
dwv,76.83
ti1
dwv,74.24
ti1
dwv,70.80
ti1
dwv,69.04
ti1
dwv,67.12
ti1
dwv,66.51
ti1
ti1
dwv,68.77
ti1
dwv,72.19
ti1
dwv,77.34
ti1
dwv,84.04
ti1
dwv,88.55
ti1
ti1
ti1
ti1
ti1
ti1
ti1
ti1
dwv,190.51
ti1
dwv,207.14
ti1
dwv,220.57
ti1
dwv,232.51
ti1
dwv,243.35
ti1
dwv,252.41
ti1
dwv,258.62
ti1
dwv,262.18
ti1
ti1
dwv,259.10
ti1
dwv,253.54
ti1
dwv,245.44
ti1 ...




Sun, 13 Feb 2005 16:54:30 GMT  

Quote:
> hello all,

> I'm using the following script

> BEGIN {RS="ti1\n(dwv,)?"; s=0; i=0}
> {if ($1 != "") s = $1; print ++i, s}

> to extract values from a file of the form

> ti1
> dwv,98.22
> ti1
> dwv,103.08
> ti1
> ti1
> dwv,196.25
> ti1
> dwv,210.62
> ti1
> dwv,223.53

> The desired result for this example looks like

> 1 0
> 2 98.22
> 3 103.08
> 4 103.08
> 5 196.25
> 6 210.62
> 7 223.53

> The script works fine most of the time, but when run on the attached file
> (sorry for the size, but the error does not appear with less data) I
> get some lines (three with the attached file) that look like

> 1262 dwv,212.97
> 1277 dwv,174.33
> 1279 dwv,151.79

> I can't think of any other reason for this than a bug in gawk!

> I'm running gawk 3.1.1 on winnt 4.0

> best regards
>      Lorenz

I am running gawk 3.1.0 on Win98.  I see similar anomalies, but in different
places--mostly in chains where "(dwv,)?" doesn't occur.  I thought it was a
problem with greedy matching past newlines, so I changed all the "\n" to "a"
(one big line).

Since the field parser no longer has an "\n" to eat, but rather, an "a", I
get a whole bunch of records with a trailing "a", as I would expect.  But, I
see similar anomalies in this case too, in different places than Lorenz;
e.g.
    1109 88.34a
    1110 92.06a
    1111 98.22a
    1112 103.08a
    1113 103.08a
    1114 103.08a
    1115 103.08a
    1116 103.08a
    1117 103.08a
    1118 196.25a
    1119 210.62a
    1120 dwv,223.53a
    1121 235.01a
    1122 245.30a
There are far fewer of these when a newline isn't involved.  And, without
the newline, there don't seem to be problems with fulfillment of the
optional clause after a chain of non-fulfillments.

I don't see any coding anomalies (other than the unnecessary initialization
of "i"), or any irregularities in the data that would account for this.  So
maybe it really is a bug?

    - Dan



Sun, 13 Feb 2005 22:43:51 GMT  

Quote:


> [original post and answer reporting similar results snipped]

In the meantime I experimented a little, adding and deleting lines
at the beginning of the file.
In most cases this shifted the errors by the number of
inserted/deleted lines, but not always.
I have not found a rule yet 8-(

best regards
     Lorenz



Sun, 13 Feb 2005 23:29:53 GMT  

Quote:


> > [original post and answer reporting similar results snipped]

> In the meantime I experimented a little, adding and deleting lines
> at the beginning of the file.
> In most cases this shifted the errors by the number of
> inserted/deleted lines, but not always.
> I have not found a rule yet 8-(

Lorenz,

The problem with posting your data files (although they are text)
embedded in the posting is that mail programs and news readers may
reformat the data, including changing end-of-line characters, etc.  If
you would like to email me the source file as an attachment I'll take a
look at it and see if I can find where the problem lies.

Have you tried moving the data to a UNIX box by ftp and running the same
code there?  There could be a problem with the Windows port of gawk.

I have also (many years ago) experienced a problem on multi-user CP/M
systems where the C compiler's read routines got confused where a CRLF
crossed a sector boundary on disk.  Something similar may be happening
here.

Peter
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"
peter{dot}tillier<at>btinternet[dot]com
To reply direct to me please use the above address
not the "Reply To" which activates a spam trap.



Wed, 16 Feb 2005 04:54:42 GMT  


...

Quote:
>Lorenz,

>The problem with posting your data files (although they are text)
>embedded in the posting is that mail programs and news readers may
>reformat the data, including changing end-of-line characters, etc.  If
>you would like to email me the source file as an attachment I'll take a
>look at it and see if I can find where the problem lies.

>Have you tried moving the data to a UNIX box by ftp and running the same
>code there?  There could be a problem with the Windows port of gawk.

I think the problem is real.

I was able to take his file from the NG post and the script and replicate
the problem (under Solaris, GAWK 3.1.0).  Here are some comments:
        1) It strikes me as a "Doctor, it hurts when I do this" sort of
thing - there are certainly more straightforward ways of achieving the
result (just do a few subs on the input lines to get rid of the text you
don't want).
        2) TAWK doesn't have the problem (running the same AWK code on the
same input file).



Wed, 16 Feb 2005 05:59:19 GMT  

Quote:


> ...
> >Lorenz,

> >The problem with posting your data files (although they are text)
> >embedded in the posting is that mail programs and news readers may
> >reformat the data, including changing end-of-line characters, etc.  If
> >you would like to email me the source file as an attachment I'll take a
> >look at it and see if I can find where the problem lies.

> >Have you tried moving the data to a UNIX box by ftp and running the same
> >code there?  There could be a problem with the Windows port of gawk.

> I think the problem is real.

> I was able to take his file from the NG post and the script and replicate
> the problem (under Solaris, GAWK 3.1.0).  Here are some comments:
> 1) It strikes me as a "Doctor, it hurts when I do this" sort of
> thing - there are certainly more straightforward ways of achieving the
> result (just do a few subs on the input lines to get rid of the text you
> don't want).
> 2) TAWK doesn't have the problem (running the same AWK code on the
> same input file).

Kenny -

I see your point with the "It strikes me as a 'Doctor, it hurts when I do
this' sort of thing," but on the other hand, it's not so good for the GNU
folks to have to go around saying "Yeah, our regular expression engine
mostly works."

A friend of mine had an MSSQL problem...a query's column that should clearly
have evaluated to "true" was coming back "false".  Finally, 20 of us, from
different departments, had a conference call, and none of us could see
anything wrong with the code, and all agreed SQL wasn't acting like it
should.  My friend approached MS, and after three weeks, they came back with
"Yep, it's broken.  We fixed it by adding the clause ' and (1=1)' to your
expression."

That definitely left us with the dreaded "mostly works" issue.  How many
columns were we computing with boolean expressions that mostly worked?  Did
we have to plow through tens of thousands of queries, view definition,
stored procedures, and all of the code that created such things, just to add
"and (1=1)" to all of them?

Now that Lorenz has discovered an R/E bug that occurs on the order of 1 in
1000 evaluations, all regular expressions that have not been tested to that
extent are now suspect--until the bug is identified, or, at least, until
more research is done to discover what aspect of his R/E causes the failure.

So Lorenz's technique is just part of the issue.

    - Dan



Wed, 16 Feb 2005 10:11:09 GMT  

Quote:


> ...
> >Lorenz,

> >The problem with posting your data files (although they are text)
> >embedded in the posting is that mail programs and news readers may
> >reformat the data, including changing end-of-line characters, etc.  If
> >you would like to email me the source file as an attachment I'll take a
> >look at it and see if I can find where the problem lies.

> >Have you tried moving the data to a UNIX box by ftp and running the same
> >code there?  There could be a problem with the Windows port of gawk.

> I think the problem is real.

So do I, but my comment still stands.  Dan has found "similar anomalies"
to Lorenz, but they're not the same which suggests to me that what is in
the post may not be exactly what is in Lorenz's original file.  Until an
independent test is performed on a copy of that file then "all bets are
off" IMO.

Using the posted data I found 4 identifiable errors (i.e., where the
output contains "dwv,") using gawk-3.1.1 (Cygwin, Win98SE) on lines 240,
638, 1120, and 1279 of the output (not the same as Lorenz's results).
If these aren't the same as yours and Dan's then I think we need the
original file to test.
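
For anyone who wants to compare results, the affected output lines can be listed mechanically instead of by eye. The following is only a sketch: the function name find_anomalies is made up for illustration, and it assumes the script's output has the "count value" form shown in the original post.

```shell
# Hypothetical helper: reads the script's output on stdin and prints the
# record numbers whose value still carries the "dwv," prefix.
find_anomalies() {
    awk 'index($2, "dwv,") == 1 { print $1 }'
}
```

It could be used as `gawk -f script.awk data.txt | find_anomalies`, where script.awk and data.txt are placeholder names for the posted script and data file.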

Interestingly, mawk-1.3.3 also has problems with the data from the post,
but only on output lines 638 and 1279.

Perhaps there's a buffering problem when malloc'ing buffer space when
searching for the RE.

Quote:
> I was able to take his file from the NG post and the script and replicate
> the problem (under Solaris, GAWK 3.1.0).

Did you get _exactly_ the same results as Lorenz?  If you did, but Dan
didn't then there may still be a diff problem between the post and the
original file.

Quote:
> Here are some comments:
> 1) It strikes me as a "Doctor, it hurts when I do this" sort of
> thing - there are certainly more straightforward ways of achieving the
> result (just do a few subs on the input lines to get rid of the text you
> don't want).

Granted.  I wouldn't have coded the program in that way, but it ought to
work.

Quote:
> 2) TAWK doesn't have the problem (running the same AWK code on the
> same input file).

I guess it's had longer to mature with RS as RE than gawk.

Peter
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"
peter{dot}tillier<at>btinternet[dot]com
To reply direct to me please use the above address
not the "Reply To" which activates a spam trap.



Wed, 16 Feb 2005 15:35:36 GMT  


...

Quote:
>> I think the problem is real.

>So do I, but my comment still stands.  Dan has found "similar anomalies"
>to Lorenz, but they're not the same which suggests to me that what is in
>the post may not be exactly what is in Lorenz's original file.  Until an
>independent test is performed on a copy of that file then "all bets are
>off" IMO.

Well, OK.  For whatever it may or may not be worth, I think the results
vary slightly based on (exact) version of GAWK, platform, compiler, and
phase of the moon (perhaps other factors are also involved).  As you say,
it is probably related to some internal thing (e.g., with the particular
system's implementation of malloc or whatever).

Quote:
>> I was able to take his file from the NG post and the script and
>> replicate the problem (under Solaris, GAWK 3.1.0).

>Did you get _exactly_ the same results as Lorenz?  If you did, but Dan
>didn't then there may still be a diff problem between the post and the
>original file.

No (I didn't check it very carefully).  As I've said, I see it as a
random-ish sort of thing.  The point of my post was to say, as Dan H had
already said, that it didn't look like a programmer error (as most things
in this NG are).  On this, I think we agree.

Quote:
>> Here are some comments:
>> 1) It strikes me as a "Doctor, it hurts when I do this" sort of thing
>>- there are certainly more straightforward ways of achieving the
>>result (just do a few subs on the input lines to get rid of the text
>>you don't want).

>Granted.  I wouldn't have coded the program in that way, but it ought to
>work.

Of course.  There is a general principle that it often takes obscure coding
practices to unearth obscure bugs.

Quote:
>> 2) TAWK doesn't have the problem (running the same AWK code on the
>> same input file).

>I guess it's had longer to mature with RS as RE than gawk.

Quite possibly so.


Wed, 16 Feb 2005 20:10:02 GMT  

...

Quote:
>I see your point with the "It strikes me as a 'Doctor, it hurts when I do
>this' sort of thing," but on the other hand, it's not so good for the GNU
>folks to have to go around saying "Yeah, our regular expression engine
>mostly works."

See my other post - I'm certainly not saying that it isn't interesting.
(I'm as much of a bug hunter as anyone)

But, more to the point, I don't think we've established yet whether it is a
"regular expression engine" problem or an RS (and/or "RS as an RE") problem.

The next step should be for us to try to replicate Lorenz's problem
outside of the RS context.



Thu, 17 Feb 2005 03:02:43 GMT  

Quote:


> ...
> >I see your point with the "It strikes me as a 'Doctor, it hurts when I do
> >this' sort of thing," but on the other hand, it's not so good for the GNU
> >folks to have to go around saying "Yeah, our regular expression engine
> >mostly works."

> See my other post - I'm certainly not saying that it isn't interesting.
> (I'm as much of a bug hunter as anyone)

> But, more to the point, I don't think we've established yet whether it is a
> "regular expression engine" problem or an RS (and/or "RS as an RE") problem.

> The next step should be for us to try to replicate Lorenz's problem
> outside of the RS context.

I see what you're saying.  Yes, I agree.
Maybe set RS to "\0", then
    1) try split(),
    2) try match()
    3) try "~"
?
Maybe also try /(...)*/ instead of /(...)?/ to see if it is an operator
problem?  And then maybe try /(...)+/ instead of /(...)*/ (which will mangle
output) to see if it is an operation problem.  ("Does it know what to do?"
vs. "Does it do it correctly?")
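
A rough sketch of the match() variant, so that the RS machinery is taken out of the picture: the whole input is read into one string and the same pattern is applied with match() in a loop. The function name records_via_match and the emit() helper are made up for illustration, and the handling of the text after the last separator is my assumption about how the RS-based script behaves. If this version still prints "dwv," values on the full data file, that would point at the regular expression matching itself rather than at record reading.

```shell
# Sketch only: emulate RS="ti1\n(dwv,)?" with match() on the slurped input.
# records_via_match and emit() are hypothetical names.
records_via_match() {
    awk '
    { buf = buf $0 "\n" }                 # slurp the whole input into buf
    END {
        re = "ti1\n(dwv,)?"               # same pattern the original script used for RS
        s = 0
        while (match(buf, re)) {
            emit(substr(buf, 1, RSTART - 1))      # text before the separator
            buf = substr(buf, RSTART + RLENGTH)   # continue after the separator
        }
        if (buf != "") emit(buf)          # text after the last separator, if any
    }
    function emit(rec,    f, n) {
        n = split(rec, f)                 # default splitting also strips the newline
        if (n > 0) s = f[1]               # same effect as: if ($1 != "") s = $1
        print ++i, s
    }'
}
```

Run as `records_via_match < data.txt` (data.txt being a placeholder for the posted file); on the short example from the original post it should give the same seven lines as the desired result.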

I should also re-run my implementation, because some of the line numbers
Peter reported sound familiar.

I did try eliminating "\n" from the mix, replacing it with "a" so it was one
long line, and still observed the problem.

    - Dan



Fri, 18 Feb 2005 02:07:31 GMT  

Quote:

> The problem with posting your data files [...]

Well, I first tried to attach the zipped file, but my news server
reminded me that c.l.a is not a binaries group 8-)

I'm going to mail you the files.
If anyone else wants them, ...

Quote:
> Have you tried moving the data to a UNIX box by ftp and running the same
> code there?  There could be a problem with the Windows port of gawk.

I have no access to a Unix machine at the moment.

I have found some interesting effects.
I'm able to delete up to 9 pairs of lines from the top of the file
without changing the line numbers where the error occurs (in my case
1262, 1277 & 1279).
But deleting only one digit in the second line leaves me with only
one error, in line 1262.

best regards
     Lorenz



Fri, 18 Feb 2005 16:21:19 GMT  
Hi all,

Since everyone here seems to agree that my script is "obscure" and not a
good solution, can someone please point out what's wrong with it?

Any other solution I could think of would have needed (much) more
coding.

best regards
      Lorenz



Fri, 18 Feb 2005 17:23:12 GMT  
Hello,

Quote:

> Since everyone here seems to agree that my script is "obscure" and not a
> good solution, can someone please point out what's wrong with it?

> Any other solution I could think of would have needed (much) more
> coding.

we (as awk fans), surely think that awk is the best tool for the job ;-)

Your script is correct, so technically there is nothing wrong and
gawk should work.  We are glad that you helped us find the bug,
and you can be sure it will be fixed sooner or later.

Back to your problem.  With the data looking like this:

ti1
dwv,98.22
ti1
dwv,103.08
ti1
ti1
dwv,196.25
ti1
dwv,210.62

we'd use for example this program:

BEGIN {FS=","; s=0; i=0}
$0 == "ti1" {print ++i, s; next}
$1 == "dwv" {s = $2}

Feel free to ask us/me if a problem appears when extending this
approach to the real-life situation.

HTH,
        Stepan



Fri, 18 Feb 2005 19:14:07 GMT  

Quote:
> Hello,


> > Since everyone here seems to agree that my script is "obscure" and not a
> > good solution, can someone please point out what's wrong with it?

> > Any other solution I could think of would have needed (much) more
> > coding.

> we (as awk fans), surely think that awk is the best tool for the job ;-)

> Your script is correct, so technically there is nothing wrong and
> gawk should work.  We are glad that you helped us find the bug,
> and you can be sure it will be fixed sooner or later.

> Back to your problem.  With the data looking like this:

> ti1
> dwv,98.22
> ti1
> dwv,103.08
> ti1
> ti1
> dwv,196.25
> ti1
> dwv,210.62

> we'd use for example this program:

> BEGIN {FS=","; s=0; i=0}
> $0 == "ti1" {print ++i, s; next}
> $1 == "dwv" {s = $2}

> Feel free to ask us/me if a problem appears when extending this
> approach to the real-life situation.

> HTH,
> Stepan

Unless there is a trailing "ti1", the last value put into "s" won't be
output, unless you add...

    END {print ++i, s}

(Also, "i" doesn't have to be initialized, because it is pre-incremented
("s" does, because otherwise it would be blank on the first output line,
instead of 0.))
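
Putting Stepan's program and the END rule together might look like the sketch below. The function name extract_values and the "pending" flag are my own additions for illustration: the flag is a small variation on the plain END rule, so that a file ending in a bare "ti1" does not get an extra output line.

```shell
# Sketch only: line-based version that avoids a regular expression in RS.
# extract_values and the pending flag are hypothetical additions.
extract_values() {
    awk '
    BEGIN { FS = ","; s = 0 }                         # s starts at 0 for the first line
    $0 == "ti1" { print ++i, s; pending = 0; next }   # each ti1 reports the last value seen
    $1 == "dwv" { s = $2; pending = 1 }               # remember the newest value
    END { if (pending) print ++i, s }                 # flush a value not yet reported
    '
}
```

Run as `extract_values < data.txt` (data.txt is a placeholder name). On the example from the original post this should print the same seven lines as the desired result.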

    - Dan



Sat, 19 Feb 2005 01:50:16 GMT  
 