searching for $ in file 

I have a file that has a number of strings that look like this: $123
where 123 could be any number from 0-9999.  I want to print out these
strings, and only these strings.  Here is what I am trying:

BEGIN { FS="[ ,]" }
{
for ( i = 1; i <= NF; i++ )
  if ( $i ~ "[\$]\{1,4\}" ){
    printf("Field number:%d Line number:%d Dollarbit: %s\n",i,NR, $i)
  }
}

My results:
Field number:21 Line number:71 Dollarbit: $
Field number:18 Line number:72 Dollarbit: $1
Field number:19 Line number:72 Dollarbit: $1
Field number:11 Line number:98 Dollarbit: $100
Field number:9 Line number:99 Dollarbit: $100
Field number:14 Line number:100 Dollarbit: $101
Field number:15 Line number:100 Dollarbit: $0;
Field number:13 Line number:111 Dollarbit: $233
Field number:14 Line number:111 Dollarbit: -$233

Notice that the first one has no digit. I thought my search expression
would not have allowed that one to be chosen, so I need a hint there.
Also, you can see ';' and '-' included, I would like to strip these
out and I am not sure how to do that.

Thanks!

--
fybar



Fri, 29 Jul 2005 02:37:30 GMT  
I use TAWK which doesn't have all the regular expressions that you
might have available.

I would rewrite your if-statement as

if ($i ~ /^\$[0-9][0-9]?[0-9]?[0-9]?$/)

meaning that the field must begin with a dollar sign, must have at
least one digit between 0-9, can be followed by up to three more
digits between 0-9, and there can be no other characters in the field.
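
A small sketch of the anchored test above, run from a shell. The sample
fields are borrowed from the results in the original post; the function
name dollar_fields is made up for illustration. The pattern spells out
the optional digits, so it should not depend on interval support.

```sh
# Hypothetical helper: print only fields that are exactly "$" plus 1-4 digits.
dollar_fields() {
    awk 'BEGIN { FS = "[ ,]" }
    {
        # ^ and $ anchor the match to the whole field
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\$[0-9][0-9]?[0-9]?[0-9]?$/)
                print $i
    }'
}

printf '%s\n' 'a $ b $1 $100 $0; -$233 $12345 $9999' | dollar_fields
```

Only $1, $100 and $9999 are printed; the bare $, the fields with ';' or
'-' attached, and the five-digit $12345 are rejected.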

I believe that other AWK versions could use something like

if ($i ~ /\$[0-9]{1,4}/)

meaning that the field must have a dollar sign followed by 1-4 digits
between 0-9. This expression doesn't care if there is something before
or after the $nnnn you identified.

I'm not familiar with the syntax you are using, but where are you
specifying a numeric character in your match?

DKM



Fri, 29 Jul 2005 03:51:40 GMT  


Quote:
>I use TAWK which doesn't have all the regular expressions that you
>might have available.

>I would rewrite your if-statement as

>if ($i ~ /^\$[0-9][0-9]?[0-9]?[0-9]?$/)

>meaning that the field must begin with a dollar sign, must have at
>least one digit between 0-9, can be followed by up to three more
>digits between 0-9, and there can be no other characters in the field.

>I believe that other AWK versions could use something like

>if ($i ~ /\$[0-9]{1,4}/)

I know nothing about the instant problem, but want to comment on "re
interval"s in the various AWKs.

Both TAWK and GAWK support the use of {n} notation; with GAWK, you have to
specify the (IMHO, silly) --re-interval option on the command line.  TAWK
also, incidentally, supports back-referencing without any silliness.

And, speaking of silliness, AWK does not require the \ in front of the {,
as do the editors.  I've always found that silly in, e.g., vi or vim.
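
Whether a given awk accepts the {n} notation can be checked directly.
The sketch below is only a probe: awk is whatever comes first on the
PATH, and the function name is invented. For what it is worth, gawk 4.0
and later enable intervals by default, so --re-interval should only be
needed on older releases.

```sh
# Hypothetical probe: report whether the local awk honours {n} intervals.
awk_has_intervals() {
    # With interval support, /^a{3}$/ matches "aaa"; without it the braces
    # are taken literally (or rejected) and "aaa" does not match.
    if printf 'aaa\n' | awk '/^a{3}$/ { ok = 1 } END { exit !ok }'; then
        echo yes
    else
        echo no
    fi
}

awk_has_intervals
```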



Fri, 29 Jul 2005 04:27:52 GMT  

[...]

Quote:
> I know nothing about the instant problem, but want to comment on "re
> interval"s in the various AWKs.

> Both TAWK and GAWK support the use of {n} notation - with GAWK, you have to
> specify the (IMHO, silly) --re-interval option on the command line.

With gawk this option exists, I guess, for backward compatibility with
earlier gawks, which didn't, AFAIK, support re-intervals.  You also
enable this facility if you use the --posix command line option.

Quote:
> TAWK also, incidentally supports back-referencing without any silliness.

AFAIK, TAWK can do this because it uses an NFA engine, which can
provide positional information (and hence back-references).  The
current version of gawk uses a mixture of DFA and NFA: the DFA where
all that is needed is whether a match has been found or not (in
which case DFAs are generally faster), and the NFA where any
positional information is required.

Generally awk implementations do not provide back-referencing in
REs.

Quote:
> And, speaking of silliness, AWK does not require the \ in front of the {,
> as do the editors.  I've always found that silly in, e.g., vi or vim.

ISTR that Dennis Ritchie says that \( \) was used in editors for
grouping because programmers were more likely to be searching for
parentheses.  I guess that when re-intervals were added similar
considerations were thought to apply to { } and hence \{ \} were
chosen as the re-interval delimiters.  Or, possibly, the same
pattern as for parentheses seemed more sensible.

Regards,
Peter
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"



Fri, 29 Jul 2005 04:49:08 GMT  

Quote:

> I have a file that has a number of strings that look like this: $123
> where 123 could be any number from 0-9999.  I want to print out these
> strings, and only these strings.  Here is what I am trying:

> BEGIN { FS="[ ,]" }
> {
> for ( i = 1; i <= NF; i++ )
>   if ( $i ~ "[\$]\{1,4\}" ){
>     printf("Field number:%d Line number:%d Dollarbit: %s\n",i,NR, $i)
>   }

> }

> My results:
> Field number:21 Line number:71 Dollarbit: $
> Field number:18 Line number:72 Dollarbit: $1
> Field number:19 Line number:72 Dollarbit: $1
> Field number:11 Line number:98 Dollarbit: $100
> Field number:9 Line number:99 Dollarbit: $100
> Field number:14 Line number:100 Dollarbit: $101
> Field number:15 Line number:100 Dollarbit: $0;
> Field number:13 Line number:111 Dollarbit: $233
> Field number:14 Line number:111 Dollarbit: -$233

> Notice that the first one has no digit. I thought my search expression
> would not have allowed that one to be chosen, so I need a hint there.

You forgot to specify digits in your regular expression, so you just
require between 1 and 4 dollar signs. You need: $i ~ /\$[0-9]{1,4}/

As Kenny McCormack points out elsethread, if you are using
GNU awk, you need to include --re-interval on the command line
in order to use the {1,4} notation.

Quote:
> Also, you can see ';' and '-' included, I would like to strip these
> out and I am not sure how to do that.

Remove anything which is neither a digit nor a dollar sign:
gsub(/[^0-9$]/, "", $i)
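
A quick sketch of that gsub at work on two of the fields from the
original results. The function name strip_fields is hypothetical, and
the fields are split on whitespace here rather than on the OP's FS.

```sh
# Hypothetical helper: strip everything except digits and "$" from each field.
strip_fields() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # inside a bracket expression the $ is an ordinary character
            gsub(/[^0-9$]/, "", $i)
            print $i
        }
    }'
}

printf '%s\n' '$0; -$233' | strip_fields
```

This prints $0 and $233, each on its own line.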

John.



Fri, 29 Jul 2005 06:49:15 GMT  


Quote:
>I use TAWK which doesn't have all the regular expressions that you
>might have available.

>I would rewrite your if-statement as

>if ($i ~ /^\$[0-9][0-9]?[0-9]?[0-9]?$/)

>meaning that the field must begin with a dollar sign, must have at
>least one digit between 0-9, can be followed by up to three more
>digits between 0-9, and there can be no other characters in the field.

>I believe that other AWK versions could use something like

>if ($i ~ /\$[0-9]{1,4}/)

Thanks for the reply.  The ^ and $ should have been obvious!  This is
my final expression:

if ($i ~ /^\$[0-9]{1,4}$/)

Gave me exactly what I wanted, thanks!

--
fybar



Fri, 29 Jul 2005 06:54:10 GMT  

Quote:


> > Also, you can see ';' and '-' included, I would like to strip these
> > out and I am not sure how to do that.

> Remove anything which is neither a digit nor a dollar sign:
> gsub(/[^0-9$]/, "", $i)

But of course this is dangerous. It will convert $2+2 into $22 instead of $2.

But that might not be a problem given your input.
This would have been a lot easier if you posted a sample
of your input together with the output you desire.

If the $2+2 problem above is real, then you might want
to consider changing the order in which you do things.

First, convert anything apart from dollar signs or digits to *spaces*
(or whatever the field separator is). This can be done in the awk
program, or you could pre-process the input. As this is comp.lang.awk
we'll do it in the awk program.
        gsub(/[^0-9$]+/, " ", $0)

Then, re-split the line into an array, then test each array element (or field)
against your regular expression. Essentially, this bit is the same as
your current program.
        NumDollarFields = split($0, A, " ")
        for (i = 1; i <= NumDollarFields; i++)
                if (A[i] ~ /\$[0-9]{1,4}/)
                        print A[i]
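
Put together and run from a shell, the two steps might look like the
sketch below. The input line is invented, split_dollars is a
hypothetical name, and the digits are spelled out instead of {1,4} so
the sketch does not depend on interval support.

```sh
# Hypothetical helper: blank out runs of non-[0-9$] characters, re-split,
# then test each piece.
split_dollars() {
    awk '{
        gsub(/[^0-9$]+/, " ", $0)
        n = split($0, A, " ")
        for (i = 1; i <= n; i++)
            if (A[i] ~ /\$[0-9][0-9]?[0-9]?[0-9]?/)
                print A[i]
    }'
}

printf '%s\n' 'total: $2+2, was -$233; fee $0;' | split_dollars
```

Here $2+2 yields $2 (the trailing 2 becomes a separate piece with no
dollar sign), followed by $233 and $0.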

Depending on your data, you might be able to save time by
only processing records that include dollar fields.

John.



Fri, 29 Jul 2005 07:05:23 GMT  

Quote:

> First, convert anything apart from dollar signs or digits to *spaces*
> (or whatever the field separator is). This can be done in the awk
> program, or you could pre-process the input. As this is comp.lang.awk
> we'll do it in the awk program.
>         gsub(/[^0-9$]+/, " ", $0)

> Then, re-split the line into an array, then test each array element (or field)
> against your regular expression. Essentially, this bit is the same as
> your current program.
>         NumDollarFields = split($0, A, " ")
>         for (i = 1; i <= NumDollarFields; i++)
>                 if (A[i] ~ /\$[0-9]{1,4}/)
>                         print A[i]

Of course, this could be fooled by strings like 23$43 or $2$4, but
the fix is quite simple once you know what the correct behaviour is.
If the output should be $43, $2, and $4, add a space before each
dollar sign before the split: gsub(/\$/, " $", $0). If the output
should be nothing, anchor the regular expression to the start and end
of each field: /^\$[0-9]{1,4}$/.
Only the OP knows which is appropriate.
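
The two choices can be compared side by side. This is only a sketch:
the function names and the input line are made up, and the digits are
spelled out rather than written as {1,4} so that it runs without
interval support.

```sh
# Variant A (hypothetical name): insert a space before every "$" before the
# split, so glued-together amounts are separated and each one is reported.
split_loose() {
    awk '{
        gsub(/[^0-9$]+/, " ", $0)
        gsub(/\$/, " $", $0)
        n = split($0, A, " ")
        for (i = 1; i <= n; i++)
            if (A[i] ~ /\$[0-9][0-9]?[0-9]?[0-9]?/)
                print A[i]
    }'
}

# Variant B (hypothetical name): leave the pieces alone and anchor the
# pattern, so anything that is not exactly "$" plus 1-4 digits is dropped.
split_strict() {
    awk '{
        gsub(/[^0-9$]+/, " ", $0)
        n = split($0, A, " ")
        for (i = 1; i <= n; i++)
            if (A[i] ~ /^\$[0-9][0-9]?[0-9]?[0-9]?$/)
                print A[i]
    }'
}

printf '%s\n' 'x 23$43 y $2$4 z $7' | split_loose
printf '%s\n' 'x 23$43 y $2$4 z $7' | split_strict
```

With this input, variant A prints $43, $2, $4 and $7, while variant B
prints only $7.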

John.



Fri, 29 Jul 2005 16:03:52 GMT  
 