System Bug, Or Am I Really Dense? 

Last week, I wrote in high distress to this forum when a series of
substitution commands that I had written to a sed file kept sending up a
message to the effect that the commands were incoherent, every time I tried
to run it as a program. I later found a way round what was bugging me, but
in the meantime two forum members had managed to run my original program on
their systems without a hitch.

Today, I find myself again in the same situation. I have been working
through the exercises in my O'Reilly, and something that it seems to me
really should work just won't. I'm beginning to think that there may be some
obscure bug in the system or sed version that we use here at the office (a
SCO5 Unixware). Or maybe I'm missing something really obvious.

I was trying to practice the use of the "&" (ampersand) in sed. It says in
the book that, used in a replacement string, it copies the pattern or
address string at the place indicated. So, as an example, I typed in the
familiar jingle about the weather, like this:

Wh the w be hot
Or wh the w be cold,
We'll w the w whatever the w
Wh we like it or not.

The sed program is intended to substitute all the wh's with "whether", and
all the w's with "weather". It looks like this:

s/ *[Ww]h +/ &ether /g
s/ *[Ww]h$/ &ether/g
s/ *[Ww] +/ &eather /g
s/ *[Ww]$/ &eather/g

My own reading of the first line, for example, would be: "Substitute any
string containing zero or any number of blank spaces, followed by a small or
capital "w", followed by an "h" and one or more blank spaces, with the same
thing plus the string "ether", and finally a blank space."

But the output effects only one substitution correctly:

Wh the w be hot
Or wh the w be cold,
We'll w the w whatever the  weather
Wh we like it or not.

Either the computer or myself is simply refusing to see reason. Many thanks
for your suggestions as to which!

EFR



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:
> The sed program is intended to substitute all the wh's
> with "whether", and all the w's with "weather". It looks
> like this:

> s/ *[Ww]h +/ &ether /g
> s/ *[Ww]h$/ &ether/g
> s/ *[Ww] +/ &eather /g
> s/ *[Ww]$/ &eather/g

> My own reading of the first line, for example, would be:
> "Substitute any string containing zero or any number of
> blank spaces, followed by a small or capital "w", followed
> by an "h" and one or more blank spaces, with the same
> thing plus the string "ether", and finally a blank space.

You're getting caught by the fact that regular expression grammars differ
from one utility to the next (and unfortunately, from one implementation to
the next!)  Grep and sed are intended to be the same but egrep and awk each
implement their own slightly different grammars.  The "+" operator is
supported in egrep and awk but not usually in grep and sed.  What you get
instead in grep and sed (at least, in the newest versions) is a
"replication range" postfix operator, which is not supported by awk.
Usually it's defined this way:

   r\{n\}   Match exactly n occurrences of r, where n is an
            unsigned decimal integer.
   r\{n,\}  Match at least n occurrences of r.
   r\{n,m\} Match at least n, but not more than m occurrences
            of r.
   r\{,m\}  Match at most m occurrences of r.

but some implementations may expect the braces to be typed without the
backslashes.
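
As a small illustration of the range operator standing in for "+", here is a
sketch. It assumes a sed whose basic regular expressions accept the
backslashed braces; the sample text and the variable names are made up for
the example.

```shell
# Sample line with a run of three spaces after "Wh" (made-up test input).
line='Wh   the w'

# ' \{1,\}' means "one or more spaces", which is what ' +' was meant to say.
# Here the whole run of spaces is collapsed to a single space.
result=$(printf '%s\n' "$line" | sed 's/h \{1,\}/h /')

printf '%s\n' "$result"
```

If your sed rejects the braces, the doubled-character form works as a fallback.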

But probably the easiest way to mimic the "+" operator in a situation
like yours is to type the character to be matched twice, appending the "*"
operator onto the second occurrence.  You could rewrite your rules as:

s/ *[Ww]h  */ &ether /g
s/ *[Ww]h$/ &ether/g
s/ *[Ww]  */ &eather /g
s/ *[Ww]$/ &eather/g

But I suspect even that will fall short of what you intend, as this is
output you'd get:

 Wh ether the  w eather be hot
Or  wh ether the  w eather be cold,
We'll  w eather the  w eather whatever the  weather
 Wh ether we like it or not.

You'll notice you've got a lot of extra spaces that I'm sure you didn't
want.  What sed provides in addition, that could be used, is something
called tagged expressions, which are similar to the "&" operator, but a bit
more powerful:

   \(r\)    Tagged regular expression.  Match the pattern
            inside the \(...\), and remember the literal
            text that matched.
   \n       Match whatever literal text the n'th tagged
            \(...\) expression matched.

This would let you "remember" how much space appeared on either side of the
object you're matching and put exactly that amount of space back into the
output.  Using tags and deleting a couple spurious spaces in the replace
expression in lines 2 and 4, we could rewrite your script as:

s/\( *[Ww]h\)\(  *\)/\1ether\2/g
s/ *[Ww]h$/&ether/g
s/\( *[Ww]\)\(  *\)/\1eather\2/g
s/ *[Ww]$/&eather/g

giving the following result, which I think is what you intended:

Whether the weather be hot
Or whether the weather be cold,
We'll weather the weather whatever the weather
Whether we like it or not.
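
For anyone who wants to reproduce this, the following sketch writes the
jingle and the four rules to scratch files and runs them. The file names
jingle.txt and weather.sed and the use of mktemp are assumptions made for the
example, not anything from the original post.

```shell
# Work in a throwaway directory so nothing in the current one is touched.
workdir=$(mktemp -d)

# The shorthand jingle from the original post.
cat > "$workdir/jingle.txt" <<'EOF'
Wh the w be hot
Or wh the w be cold,
We'll w the w whatever the w
Wh we like it or not.
EOF

# The four rules, using tagged expressions to put the spaces back unchanged.
cat > "$workdir/weather.sed" <<'EOF'
s/\( *[Ww]h\)\(  *\)/\1ether\2/g
s/ *[Ww]h$/&ether/g
s/\( *[Ww]\)\(  *\)/\1eather\2/g
s/ *[Ww]$/&eather/g
EOF

# Run the script over the jingle and keep the output for inspection.
output=$(sed -f "$workdir/weather.sed" "$workdir/jingle.txt")
printf '%s\n' "$output"

rm -rf "$workdir"
```

Note that the rules rely on this particular text: any other word ending in
"w" and followed by a space would be expanded as well.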

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  
Thank you so much, Nicki, for such a kind and informative answer. I will save
it to a file for consultation when I get round to learning the tagged
expressions (which should be soon, but I want to do things in order).
Meanwhile, I'll see if I can't modify the original program to give better
output. I hadn't cottoned at all that the "+" was not echt in sed!

EFR



Mon, 22 Apr 2002 03:00:00 GMT  

Sed does not treat + as an operator in regular expressions; it interprets it
as merely a literal character, so write c+ as cc* instead.

i.e.

s/ *[Ww]h +/ &ether /g

as

s/ *[Ww]h  */ &ether /g
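
A quick way to see the difference for yourself is sketched below. It assumes
a sed that follows the usual basic regular expression rules, where an
unescaped "+" is an ordinary character; the sample strings and variable names
are invented for the test.

```shell
# With a literal "+" in the input, an unescaped "+" in the pattern matches it,
# which shows it is being read as an ordinary character.
literal=$(printf 'wh + x\n' | sed 's/h +/h-PLUS-/')

# Spelling "one or more spaces" as two spaces and a star works as intended:
# the whole run of spaces after "h" is replaced.
repeated=$(printf 'wh   x\n' | sed 's/h  */h_/')

printf '%s\n%s\n' "$literal" "$repeated"
```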

Also, $ in a RE stands for the end of a line; if you want the
end of a word, use \> instead.

i.e. write:

s/ *[Ww]h$/ &ether/g

as

s/ *[Ww]h\>/ &ether/g

Those lines in your sed script were not finding any matches,
so they didn't change anything.

Try making the changes I suggested one at a time, so you can see
which does what, and in what order you really want things done.

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> Also, $ in a RE stands for the end of a line, if you want
> the  end of a word, use \> instead.

The \> construct is not usually supported in sed though I suppose it might
be in some implementations.

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:


>> Also, $ in a RE stands for the end of a line, if you want
>> the  end of a word, use \> instead.

>The \> construct is not usually supported in sed though I suppose it might
>be in some implementations.

>Nicki

Seems to work on my version of unix.  Some windows versions I can't
speak for.  Maybe they use only >.

The unix ed man page specifically mentions the \< and \> in dealing
with regular expressions, so I'd think that it should be in most
versions of sed (as it's derived from ed), at least on unix and unix
like systems.

Chuck Demas
Needham, Mass.




Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> Seems to work on my version of unix.  Some windows
> versions I can't  speak for.  Maybe they use only >.

> The unix ed man page specifically mentions the \< and
> \> in dealing  with regular expressions, so I'd think that
> it should be in most  versions of sed (as it's derived from
> ed), at least on unix and unix  like systems.

It's entirely a question of whose implementation you're running.  For
example, on the FreeBSD system at my ISP (tiac.net), it's definitely not
supported:

% cat words
x y
x>y
% sed 's/x\>/hello/' words
x y
helloy

I can also promise that the \> construct is definitely not in the latest
POSIX standard definition for sed (or ed, for that matter.)

The problem is simply that there are quite a number of implementations
floating around and not all coders are willing to accept the limits of
POSIX compliance, particularly if it means leaving out what they consider
to be a good and useful feature that's gained some currency since the
standard was last issued.

Where you'll see that most strongly tends to be in the GNU versions of the
standard utilities (because the source is available to all) and the
definitions of what's supported in GNU tend to change rather quickly.  My
guess is that if you observe \> is supported in your sed, it's likely it
was built on the GNU code.

Another example of the GNU phenomenon is tar.  Tar is a POSIX format that
for 20 years was pretty darn stable.  Everyone knew pretty clearly what tar
meant and what the variations were (principally, just bytesex).  But in
just the last few years, tar has become quite a moving target as many new
ideas have been incorporated into the header structures to handle problems
like continuation tapes, long filenames, etc.  If someone hands you a tape
written using one of these new header variations and your tar doesn't
support them, the answer is you're SOL short of messy answers like
extracting the data by hand.

I think, on the whole, it's a good thing to see new ideas and improvements
incorporated into old tools, but it does introduce something of a caveat
emptor situation, especially for anyone who'd like to write a script (or a
tar tape or whatever) that they'd like to work the same even on another
machine.

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:


>> Seems to work on my version of unix.  Some windows
>> versions I can't  speak for.  Maybe they use only >.

>> The unix ed man page specifically mentions the \< and
>> \> in dealing  with regular expressions, so I'd think that
>> it should be in most  versions of sed (as it's derived from
>> ed), at least on unix and unix  like systems.

>It's entirely a question of whose implementation you're running.  For
>example, on the FreeBSD system at my ISP (tiac.net), it's definitely not
>supported:

>% cat words
>x y
>x>y
>% sed 's/x\>/hello/' words
>x y
>helloy

Well, telnet over to sunspot.tiac.net, which is SunOS and you'll
find it works as I described. :-)

Also, on FreeBSD, the construct uses > instead of \>, as I suggested
above in my original post.

so try

sed 's/x>/hello/' words

in the above, and you'll get the results using end of word recognition,
as I suggested.

Chuck Demas
Needham, Mass.




Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> Well, telnet over to sunspot.tiac.net, which is SunOS and
> you'll find it works as I described. :-)

> Also, on FreeBSD, the construct uses > instead of \>, as
> I suggested above in my original post.

> so try

> sed 's/x>/hello/' words

> in the above, and you'll get the results using end of word
> recognition, as I suggested.

Actually, I had tried that already, just out of my own curiosity, but here
it is again:

% sed 's/x>/hello/' words
x y
helloy

Still doesn't work. This was run on shell1.tiac.net.  I agree it works on
sunspot.tiac.net.

Please be assured I was never questioning that you observe the behavior you
describe (namely that \> or > works for you).  Nor am I offering judgment
on the value or usefulness or even the breadth of support for this
construct, merely pointing out that the support is not universal and not
(currently) in the POSIX standard.

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> ... or even the breadth of support for this construct, merely
> pointing out that the support is not universal and not
> (currently) in the POSIX standard.

Sloppy phrasing.  My original remark was that the \> construct is not
usually supported in sed, which certainly does sound like a comment about
the breadth of support.  What I'm really trying to say is, I don't know how
broad the support is.  I know it's not in the POSIX standard, I know it's
not supported everywhere, I know you won't find it on older UNIX systems.
I'm not trying to be picky about what percentage of seds support this
construct so much as to warn that it's one of those features that should be
verified as supported before one assumes it can be used.
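
One way to do that verification is a small probe, sketched here. The function
name sed_has_word_end and the variable names are hypothetical; the test input
is the same "x y" line used earlier in the thread.

```shell
# Hypothetical helper: report whether this sed treats \> as an end-of-word
# anchor.
sed_has_word_end() {
    # If \> is an anchor, "x y" becomes "hello y"; if it is read as a literal
    # ">" (or rejected), the result is something else.
    probe=$(printf 'x y\n' | sed 's/x\>/hello/' 2>/dev/null)
    [ "$probe" = "hello y" ]
}

if sed_has_word_end; then
    support="yes"
else
    support="no"
fi
printf 'end-of-word anchor supported: %s\n' "$support"
```

A script that has to run on several machines could run the probe once at the
top and choose its patterns accordingly.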

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> You're getting caught by the fact that regular expression grammars differ
> from one utility to the next (and unfortunately, from one implementation to
> the next!)  Grep and sed are intended to be the same but egrep and awk each
> implement their own slightly different grammars.  The "+" operator is
> supported in egrep and awk but not usually in grep and sed.  What you get
> instead in grep and sed (at least, in the newest versions) is a
> "replication range" postfix operator, which is not supported by awk.

Let me add that GNU awk does have the "replication range" operator, if
it (gawk) is called with the --posix or --re-interval options.

And GNU sed just like GNU grep supports the "+" operator, but it has to
be escaped with a backslash ("\+").
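
Since the escaped form is an extension, a script can test for it and fall
back to the doubled-character spelling. The sketch below does that; the
variable names and sample input are made up, and the probe only tells you
about the sed it is run with.

```shell
# Probe: does this sed accept \+ as "one or more" in a basic regular
# expression?
if [ "$(printf 'aaa\n' | sed 's/a\+/X/' 2>/dev/null)" = "X" ]; then
    one_or_more='a\+'     # extension available
else
    one_or_more='aa*'     # portable spelling of the same thing
fi

# Either pattern collapses the run of "a" characters to a single "A".
collapsed=$(printf 'baaad\n' | sed "s/$one_or_more/A/")
printf '%s\n' "$collapsed"
```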

But it's interesting to hear that some sed implementations don't have
it. Hopefully I'll remember it the next time I come across that.

Regards...
                Michael



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:


>> Well, telnet over to sunspot.tiac.net, which is SunOS and
>> you'll find it works as I described. :-)

>> Also, on FreeBSD, the construct uses > instead of \>, as
>> I suggested above in my original post.

>> so try

>> sed 's/x>/hello/' words

>> in the above, and you'll get the results using end of word
>> recognition, as I suggested.

>Actually, I had tried that already, just out of my own curiosity, but here
>it is again:

>% sed 's/x>/hello/' words
>x y
>helloy

>Still doesn't work. This was run on shell1.tiac.net.  I agree it works on
>sunspot.tiac.net.

>Please be assured I was never questioning that you observe the behavior you
>describe (namely that \> or > works for you).  Nor am I offering judgment
>on the value or usefulness or even the breadth of support for this
>construct, merely pointing out that the support is not universal and not
>(currently) in the POSIX standard.

>Nicki

From the man page for ed on shell1:

|
|     \<      Anchor the single character regular expression or subexpression
|            immediately following it to the beginning of a word.  (This may
|            not be available)
|
|     \>      Anchor the single character regular expression or subexpression
|            immediately following it to the end of a word.  (This may not be
|            available)
|

If the POSIX version of sed doesn't have a method of easily identifying
word boundaries, that is a failing of that version of sed.  YMMV
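
Where \> is not available, an end-of-word match can be approximated with a
bracket expression and "$". This is only a sketch: it treats letters, digits
and underscore as word characters, which is an assumption, and the sample
lines are invented.

```shell
# Emulate an end-of-word match on "x" without \>.
# Rule 1: "x" followed by a character that cannot be part of a word;
#         that character is tagged and put back unchanged.
# Rule 2: "x" at the very end of the line.
emulated=$(printf 'x y\nx>y\nxy x\n' |
    sed -e 's/x\([^A-Za-z0-9_]\)/hello\1/g' -e 's/x$/hello/')

printf '%s\n' "$emulated"
```

The "x" in "xy" is left alone because it is followed by another word
character.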

Chuck Demas
Needham, Mass.




Mon, 22 Apr 2002 03:00:00 GMT  

Quote:

> From the man page for ed on shell1:

Obviously, the system administrator for shell1 has installed a set of man
files that don't match the binaries he's got there.  I presume you easily
verified that the sed actually running there does not support the \>
construct.

Quote:
> If the POSIX version of sed doesn't have a method of
> easily identifying  word boundaries, that is a failing of
> that version of sed.  YMMV

There's no question that \> is useful, but a clarification, perhaps:  POSIX
is an international standard drawn up by committee.  There is no such thing
as a POSIX version of sed, per se, merely a definition that's agreed upon
through a democratic process that involves representatives from various
vendors, user groups and government organizations.  Anyone interested can
order a copy of
the standard in this country through the IEEE (Institute of Electrical and
Electronics Engineers).  It's a multi-volume set of books that, I dunno, I
think cost me about $150.

Perhaps the biggest reason why many vendors consider POSIX persuasive (even
if it means "dumbing down" their products) is because the Federal
Government has mandated POSIX compliance in certain procurement situations.

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:
> Obviously, the system administrator for shell1 has installed
> a set of man files that don't match the binaries he's got there.

Again, I'm getting sloppy.  One can easily take the position the man files
=do= match what's there given that the man page says "this may not be
available."  It's saying what I've already said, which is that you need to
check because \> is not supported everywhere.  What's unusual is that most
people expect man pages to explain how a utility works, not say, "We don't
know, you need to check." :)

Nicki



Mon, 22 Apr 2002 03:00:00 GMT  

Quote:


>> From the man page for ed on shell1:

>Obviously, the system administrator for shell1 has installed a set of man
>files that don't match the binaries he's got there.  I presume you easily
>verified that the sed actually running there does not support the \>
>construct.

Actually, I thought I had tested it before my second post, but when I
went back and tested it again, it no longer seemed to work. :~(

Since I use SunOS on my tiac and shore shell accounts and
Linux on my freeshell.org shell account, and they all use
the \< \> constructs, I don't think I'll have a problem, but
I'll keep it in mind.

Chuck Demas
Needham, Mass.




Mon, 22 Apr 2002 03:00:00 GMT  
 