multiple separators 

Many thanks to all who have helped me with my awk problems so far.
I worked through most of my problems and got my script working, but I now
have one question.
Is it possible to have multiple field separators at the same time?
If I change the field separator to : then a space is no longer a separator.
What I want to do is keep the space as a separator and have a : as a
separator too.
I am parsing statements of the following form
hours: 44
but this is a manually constructed database that people just e-mail forms
to. They get the basic form which has "hours:" and fill it in.
There are things in there like
hours: 44.0
hours: 22 hours
hours: 12.5 hours estimated including ....
My current script has no problems extracting the number, but what I am afraid
of is someone putting something like
hours:44...
and awk not seeing that as 2 separate fields.

Thanks

Mark

--
==========================================================================
                Mark Ayzenshteyn CE Major at UCSD      

                http://www.*-*-*.com/ ~marka              



Mon, 16 Apr 2001 03:00:00 GMT  

Use  FS="[ :]+"  which will make any sequence of spaces and colons of any
length a single field separator.
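
As a minimal sketch of that suggestion, the bracket expression can be passed with -F. The sample lines are taken from the question; the function name hours_field is made up for illustration only.

```shell
# Hypothetical helper: print the second field of each "hours" line on stdin,
# treating any run of spaces and colons as one separator.
hours_field() {
  awk -F'[ :]+' '/^hours/ { print $2 }'
}

printf 'hours: 44\nhours:44.0\nhours : 22 hours\n' | hours_field
```

This should print 44, 44.0 and 22. A value typed as 44... still comes through as the text "44..."; printing $2 + 0 instead of $2 would force a numeric conversion.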



Mon, 16 Apr 2001 03:00:00 GMT  

Assuming that what you are interested in is a pattern consisting of some
characters followed by a colon followed by some or no spaces followed by
the number of hours, then you can use a regular expression rather than
splitting the string into fields.

i.e.

{
        # match() sets RSTART and RLENGTH to the position and length of the match
        match($0, /[^a-zA-Z :]*[0-9.][0-9.]*/)
        if (RSTART > 0)
                print substr($0, RSTART, RLENGTH)
}

Briefly, the first bracket expression matches any run (*) of characters that
are not (^) lower case letters (a-z), upper case letters (A-Z), spaces ( ) or
colons (:). The pattern then requires at least one digit (0-9) or decimal
point (.) and allows as many consecutive digits or decimal points as follow.
Because letters, spaces and colons can never be part of the match, the label
in front of the number is skipped.

OK, there is a slight problem here because you could have

hours:   23.45.66

and this would return

23.45.66

but without a clearer picture of the format of your data it is difficult
to build a regular expression. For instance, could a line look like this?

hours: .5

Anyway, the above should work for you. Tested with your sample data and
returned only the number of hours.
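
If at most one decimal point should be accepted, one possible tightening is sketched below. It assumes every line of interest starts with "hours:" (an assumption, the data may not guarantee it), and extract_hours is a hypothetical name chosen for the example.

```shell
# Hypothetical helper: print the first number found after "hours:".
extract_hours() {
  awk '
    /^hours:/ {
      rest = $0
      sub(/^hours:[ \t]*/, "", rest)      # drop the label, keep $0 untouched
      # digits with an optional fractional part, or a bare fraction such as .5
      if (match(rest, /[0-9]+(\.[0-9]+)?|\.[0-9]+/))
        print substr(rest, RSTART, RLENGTH)
    }
  '
}

printf 'hours:   23.45.66\nhours: .5\nhours:44...\n' | extract_hours
```

With this pattern 23.45.66 is cut down to 23.45, .5 is accepted, and 44... gives 44. Whether cutting a malformed value is better than rejecting the line is a judgment call that depends on the data.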

Cesar

--
Please remove the uppercase characters from my e-mail address for the
real thing



Tue, 17 Apr 2001 03:00:00 GMT  

Gawk will let you use a regular expression as the field separator.

gawk -F":* *" '/hours/ {print $2}' infile

will ensure finding two fields for the cases you cite.

Of course, you could always use the sub function to change
"hours:" to "hours: "

gawk '{sub(/hours:/,"hours: "); print $2}' temp

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Tue, 17 Apr 2001 03:00:00 GMT  

My desk clock says 9:15 which reminds me - you may get a colon in the
number of hours (separating hours and minutes) so you need to be careful
about using a colon as a field separator. You might end up throwing away
the minutes and rounding the hours downward.

Assuming :-
1) the line always starts with "hours:"
2) a time of the form 44.1 means 44 and one tenth hours
3) a time of the form 44:06 means 44 hours and 6 minutes

I would suggest something like this

/^hours:/ {
  sub(/^hours:[ \t]*/,"") # delete the word "hours:"
  if ($1~/^[0-9]+:/) {
    sub(/:/," ")
    h=$1+$2/60
  } else {
    h=$1+0
  }
  print h,"hours" # or do what you want with variable h
}

In the line h=$1+0 I add 0 to ensure that h is treated as a number, not
a string, so that any trailing non-number characters are removed.

Using this as input :-

hours: 44.0
hours: 22 hours
hours: 12.5 hours estimated including ....
My current script has no problems extracting the number, but what I am
afraid
of is someone puting something like
hours:44...
hours:41:25

I get this as output :-

44 hours
22 hours
12.5 hours
44 hours
41.4167 hours

I used gawk 3.0.3 but there is nothing gawk-specific here. Some very old
versions of awk might not reparse the line after "hours:" is removed. I
recommend [gnmt]awk.
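
For an awk that might not reparse the line, a variant that never modifies $0 is sketched here. It keeps the three assumptions listed above; the names parse_hours and to_hours are hypothetical and only serve the example.

```shell
# Hypothetical helper: read "hours:" lines on stdin, print decimal hours.
parse_hours() {
  awk '
    # Convert "44", "44.5" or "44:06" to decimal hours.
    function to_hours(s,    parts, n) {
      n = split(s, parts, ":")
      if (n >= 2 && parts[1] ~ /^[0-9]+$/)
        return parts[1] + parts[2] / 60   # hh:mm form
      return s + 0                        # plain number, trailing junk dropped
    }
    /^hours:/ {
      value = $0
      sub(/^hours:[ \t]*/, "", value)     # work on a copy, leave $0 alone
      split(value, words, /[ \t]+/)
      print to_hours(words[1]), "hours"
    }
  '
}

printf 'hours: 44.0\nhours:44...\nhours:41:25\n' | parse_hours
```

This should give the same results as the script above for these lines (44, 44 and 41.4167 hours), without relying on fields being re-split after sub().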

Hope this helps.
--
Alan Linton



Tue, 17 Apr 2001 03:00:00 GMT  

Quote:

> hours: 44.0
> hours: 22 hours
> hours:12.5 hours estimated including ....

>  extracting the number from the script:

I would preprocess with sed or build the following
substitutions as gsubs into an awk:

sed 's/^hours:[     ]*\([0-9][0-9,]*\.\{0,1\}[0-9]*\)/hours: \1 /
s/^hours:[     ]*\(\.[0-9]\{1,\}\)/hours: \1 /' infile |
awk '/^hours: (([0-9][0-9,]*\.?[0-9]*)|(\.[0-9]+))/ {
                         print "Hours used: " $2 }' >outfile

not tested
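
A rough awk-only equivalent of the two sed substitutions might look like the sketch below. It uses sub() with & (the matched text) in place of the sed back-reference \1; hours_used is a hypothetical name, and the number pattern is the same one as in the pipeline above.

```shell
# Hypothetical helper: normalise "hours:" lines on stdin and print the number.
hours_used() {
  awk '
    /^hours:/ {
      sub(/^hours:[ \t]*/, "hours: ")     # exactly one space after the colon
      # append a space to the leading number so it becomes a field of its own
      sub(/^hours: ([0-9][0-9,]*\.?[0-9]*|\.[0-9]+)/, "& ")
      if ($2 ~ /^([0-9][0-9,]*\.?[0-9]*|\.[0-9]+)$/)
        print "Hours used: " $2
    }
  '
}

printf 'hours:12.5hours estimated\nhours: 22 hours\nhours: n/a\n' | hours_used
```

Lines whose value does not start with a number, such as "hours: n/a", print nothing in this sketch.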
 LMS
free sed/awk book:
      ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.ps.gz
      ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz



Thu, 19 Apr 2001 03:00:00 GMT  