sed - extracting "*.cpp" from a file 

Hi all,

hope I won't annoy you.
A friend of mine wanted to extract all
the cpp file names from a file
(cpp stands for c++, the file names are
constructed with a-z 1-9 and _, we don't have
spaces).

He asked me if we could do it with sed
(because I use "sed" as a nickname).

So here is a little sed program I wrote.
It seems to work. Can anyone tell me if
it is good? Does it work? Can it be done better?
Faster?

See you,
Cedric.

How does it work?
We read the input line by line.
If the line starts with <filename>.cpp, we take away
everything after it and print.
Then, in every case, we take away the first word (a word
is any run of non-space characters) and keep going with the
resulting line until it is empty.
All done with s, t and b as control structures, which are enough!
(even if we lose a lot {*filter*}tically speaking; the code is not clear
at all)

------

#run with sed -n: the script prints the file names itself with p
/[^ ][^ ]*\.cpp/{
: here
#take away leading spaces
s/^[ ]//
t here

#save the line in the hold buffer
h
#does it start with "filename.cpp" ?
#(I like those if ... then goto ... else goto ... constructs :-))
s/^[^ ][^ ]*\.cpp/&/
t print
b next

#if yes, we display it (by taking away everything
#after the first ".cpp") (so file names can't be just anything)
: print
s/\.cpp.*/.cpp/
#just a little t in case the previous s did something:
#we want to forget it (this was a bug at first, I did
#not do this t, so another t was taken, even if the
#corresponding s (as I first thought) did nothing...)
t st1
: st1
p

#if not (or after printing the file name) we take away the first word
#note that we copy back from the hold buffer because the print step
#destroyed part of the pattern buffer...
: next
g
s/^[^ ][^ ]*//
#if there are more words (meaning we could take the first away)
#we start all over again (if it was the last word, we do another run too)
t here
}
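
The word-by-word loop described above can be condensed into a short sketch. It assumes sed is run with -n and that words are separated by plain spaces. The function name cpp_by_words and the sample input are hypothetical. Like the script above, it would also print foo.cpp for a word such as foo.cppbar.

```shell
# Hypothetical helper: walk each input line word by word and print the
# words that start with name.cpp (cut after the first ".cpp").
cpp_by_words() {
    sed -n '
        :here
        # strip leading spaces (one or more, so an empty match never sets the t flag)
        s/^  *//
        # if the line now starts with name.cpp: save it, print the name, restore it
        /^[^ ][^ ]*\.cpp/{
            h
            s/\.cpp.*/.cpp/
            p
            g
        }
        # drop the first word; loop again while something was removed
        s/^[^ ][^ ]*//
        t here
    '
}

printf 'make a.cpp  b.h c_2.cpp\nnothing here\n' | cpp_by_words
```

Testing with an address instead of the s/.../&/ plus t pair avoids the extra "t st1" reset, because a test by address does not touch the flag that t inspects.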



Sat, 20 Sep 2003 23:44:05 GMT  
Hi Cedric
Find below a simpler alternative. It is better because
it does not use the hold space, and it does not make
multiple substitutions. The flow is logical: read a line,
make a substitution to eliminate anything other than
"filename.cpp"; if a substitution was made, print and
proceed to the next line, otherwise just read the next
line. Hope it works for you.

#n
:begin
/^.*/{
s/[     ]\([a-zA-Z0-9_]*\.cpp\)/\1/
t print
}

n
b begin
:print
p
n
b begin

Regards
Raman




Sun, 21 Sep 2003 02:17:30 GMT  

Quote:

> A friend of mine wanted to extract all
> the cpp file names from a file
> (cpp stands for c++, the file names are
> constructed with a-z 1-9 and _, we don't have
> spaces).

> He asked me if we could do it with sed
> (because I use "sed" as a nickname).

> So here is a little sed program I wrote.
> It seems to work. Can one tell me if
> it is good ? if it works ? if it can be done better ?
> faster ?

Well, it can certainly be done more straightforwardly in Perl:

    perl -lne 'print $& while m/\S+\.cpp\b/g' filelist

The regular expression \S+ matches one or more non-whitespace
characters. The regex \b matches a word boundary, which prevents
matching file names such as foo.cppbar. The -n command option turns
on "awk mode": it causes perl to wrap the script in an input loop,
but does not print the lines automatically. The special variable
$& holds the substring matched by the regular expression pattern.
Most Perl programmers would actually do this instead:

    perl -lne 'print $1 while m/(\S+\.cpp\b)/g' filelist

--
Jim Monty

Tempe, Arizona USA



Mon, 22 Sep 2003 01:03:14 GMT  
If I've understood the problem correctly, you could shorten it to this:

#n
/\.cpp/{
s/[     ]*//
s/\.cpp.*//
p
}

or

/\.cpp/!n
s/[     ]*\([[:alnum:]_]*\)\.cpp/\1/

if your sed version supports the [[:whatever:]] regex construct.
Note that the n command prints the current pattern space before
reading the next line unless automatic printing is turned off
(-n or #n), so you'll have to check the output.

-Ed

--
Did you know that the oldest known rock is the famous |u98ejr

old?                                                  |eng.ox
                -The Hackenthorpe Book of Lies        |.ac.uk



Wed, 24 Sep 2003 05:13:26 GMT  

Quote:

> the cpp file names from a file

Hallo,
        I'd like to suggest a sed variant of Jim Monty's Perl solution.
But even though I haven't fallen for Perl yet, I have to admit that the Perl
one-liner is much more elegant.

#
# | is our special character. Delete it from the input stream:
s/|//g
# mark all filenames by |'s
s/[A-Za-z0-9_]\+\.cpp/|&|/g
# test if a cpp filename was found:
/|/ ! d
# delete everything from the beginning:
s/^[^|]*|//
# ... and from the end:
s/|[^|]*$//
# replace ``things between'' by newlines:
s/|[^|]*|/\
/g

Comments:
If your version of sed doesn't recognize the \+ construct, use the XX* idiom
instead (here, [A-Za-z0-9_][A-Za-z0-9_]*).

And if your version of sed can't handle the last two-line command, you
can replace it with
        s/|[^|]*|/|/g
and run the script through a pipe like this:
        sed -f the_script.sed | tr '|' '\n'
(Assuming a UNIX-like environment with UNIX-style line endings.)
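
Putting the two portable variants together (the XX* form and the tr fallback), the marker technique can be sketched as a shell function. The name extract_cpp and the sample input are hypothetical, and the sketch assumes that | can safely be deleted from the input.

```shell
# Hypothetical helper: print every name.cpp found on stdin, one per line.
extract_cpp() {
    sed -n '
        # "|" is the marker character, so drop any that are already in the input
        s/|//g
        # wrap every file name in markers
        s/[A-Za-z0-9_][A-Za-z0-9_]*\.cpp/|&|/g
        # only lines that now contain a marker hold a file name
        /|/{
            # drop everything up to and including the first marker
            s/^[^|]*|//
            # drop the last marker and everything after it
            s/|[^|]*$//
            # collapse each "|...|" gap between two names into a single marker
            s/|[^|]*|/|/g
            p
        }
    ' | tr '|' '\n'
}

printf 'build main.cpp and util_2.cpp now\nno sources here\nx.cpp\n' | extract_cpp
```

Lines without a file name never get a marker, so with -n they produce no output at all.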

HTH,
        Stepan Kasal



Thu, 25 Sep 2003 22:36:16 GMT  
Hi, it does not work on my host (the /|/ ! d is not recognized and I had
trouble with the \+ construct), but the following one does:

#
# | is our special character. Delete it from the input stream:
s/|//g
# mark all filenames by |'s
s/[A-Za-z0-9_][A-Za-z0-9_]*\.cpp/|&|/g
# test if a cpp filename was found:
/|/{
# delete everything from the beginning:
s/^[^|]*|//
# ... and from the end:
s/|[^|]*$//
# replace ``things between'' by newlines:
s/|[^|]*|/\
/g
p
}

I run it with the -n flag.

Maybe the newline stuff won't be recognized on all implementations (the
last s///g) ?
We could use tr, but the idea is to have only sed (yes, it's some form
of perversion :-)).

I did not understand the perl version, but this one is very pretty, very clever
I must admit.

Thank you for your response. I guess I have a lot to learn :)
Cedric.



Fri, 26 Sep 2003 22:39:52 GMT  

Quote:

> [snip]

> I did not understand the perl version, but this one is very pretty,
> very clever I must admit.

Huh? You mean you understood this

  | s/|//g
  | s/[A-Za-z0-9_][A-Za-z0-9_]*\.cpp/|&|/g
  | /|/{
  | s/^[^|]*|//
  | s/|[^|]*$//
  | s/|[^|]*|/\
  | /g
  | p
  | }

but you couldn't figure this out, even with the accompanying
explanation?

  |     perl -lne 'print $& while m/\S+\.cpp\b/g' filelist
  |
  | The regular expression \S+ matches one or more non-whitespace
  | characters. The regex \b matches a word boundary, which prevents
  | matching file names such as foo.cppbar. The -n command option turns
  | on "awk mode": it causes perl to wrap the script in an input loop,
  | but does not print the lines automatically. The special variable
  | $& holds the substring matched by the regular expression pattern.

The Perl script globally matches (m/.../g) C++ source file names
that conform to your specification and prints each one. Simple. No
needless destructiveness (i.e., substitution operations). Only one
regular expression pattern match operation per line of input.

In my experience, for the purpose of extracting substrings that
match a regular expression pattern from arbitrary text, Perl is
better than sed and awk. Perl does it directly, while sed and awk
can only do it indirectly.

--
Jim Monty

Tempe, Arizona USA



Sat, 27 Sep 2003 02:35:23 GMT  
I didn't see the original post, but if I understand what's going on then
what about this as an alternative to the perl solution (and the headache
sed solution):
    xargs -n1 echo < filelist | awk '/.\.cpp$/'

or similarly using sed:
    xargs -n1 echo < filelist | sed -n '/.\.cpp$/p'

Ben




Mon, 29 Sep 2003 03:44:30 GMT  

Quote:

> I didn't see the original post, but if I understand what's going on then
> what about this as an alternative to the perl solution (and the headache
> sed solution):
>     xargs -n1 echo < filelist | awk '/.\.cpp$/'

> or similarly using sed:
>     xargs -n1 echo < filelist | sed -n '/.\.cpp$/p'

  or similarly using grep:
      xargs -n1 echo < filelist | grep '.\.cpp$'

It's always fun to see xargs put to good use, but beware both single
and double quotes in the input. Perhaps tr is better here:

  tr -s ' \t' '[\n*]' <filelist | grep '.\.cpp$'
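
To illustrate the quoting caveat, here is a small sketch. The sample line contains an apostrophe, which xargs would typically treat as the start of a quoted string, while the tr pipeline only splits on blanks. The function name cpp_words and the sample text are made up for illustration.

```shell
# Hypothetical helper: put each word of stdin on its own line,
# then keep the words that end in .cpp.
cpp_words() {
    tr -s ' \t' '[\n*]' | grep '.\.cpp$'
}

# The apostrophe in "it's" is just another character to tr.
printf "it's main.cpp\tutil.cpp notes.txt\n" | cpp_words
```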

--
Jim Monty

Tempe, Arizona USA



Mon, 29 Sep 2003 13:12:59 GMT  

Quote:


> > [snip]

> > I did not understand the perl version, but this one is very pretty,
> > very clever I must admit.

> Huh? You mean you understood this

>   | s/|//g
>   | s/[A-Za-z0-9_][A-Za-z0-9_]*\.cpp/|&|/g
>   | /|/{
>   | s/^[^|]*|//
>   | s/|[^|]*$//
>   | s/|[^|]*|/\
>   | /g
>   | p
>   | }

> but you couldn't figure this out, even with the accompanying
> explanation?

> [snip]
> --
> Jim Monty

> Tempe, Arizona USA

Sorry, after posting I realized I could be misunderstood.

I just wanted to say I don't know Perl :-) so I did not take the time
to read the proposed solution.

Of course, it's much simpler than the sed script! And after
reading it, without knowing much about Perl, I must
say I understand it. But my primary goal was to use sed and
only sed (yes, it's strange, but I am a human being).

Sorry again,
Cedric.



Mon, 29 Sep 2003 22:55:16 GMT  

Quote:


>> # test if a cpp filename was found:
>> /|/ ! d
>Hi, it does not work on my host (the /|/ ! d is not recognized and I

Possibly your sed does not like unnecessary spaces, so try it as
/|/!d
and you should find that works okay.
--
John Savage            (for email, replace "ks" with "k" and delete "n")


Tue, 07 Oct 2003 17:48:30 GMT  
 