using substr inside regexp for substitutions 

I am just learning Perl and am having trouble trying to do the following
(I have checked the FAQ lists and a Perl reference without success).

I would like to take an expression like

<a href="abcdefghijk.htm">

and truncate the htm file name to just 8 characters to get

<a href="abcdefgh.htm">. I want to do something like

s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;

but this substitutes the entire expression substr(\1,1,8) rather than
evaluating the substr function and substituting the result. The
substitution also needs to work on expressions like

<a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">

and just truncate the abcdefghijk part to 8 letters. The method above
doesn't work on this because it picks up everything from the first
quotemark, not just the abcdefghijk.htm part. Any help would be
appreciated. Please respond by email also if you send a response back to
the newsgroup. Thank you.


--

Mathematics Dept    | 404-638-6222, 404-638-6177 (fax)
Agnes Scott College | http://www.*-*-*.com/
Decatur, GA 30030   |         math/welcome.htm



Fri, 14 Jan 2000 03:00:00 GMT  


 > I am just learning Perl and am having trouble trying to do the following
 > (I have checked the FAQ lists and a Perl reference without success).
 >
 > I would like to take an expression like
 >
 > <a href="abcdefghijk.htm">
 >
 > and truncate the htm file name to just 8 characters to get
 >
 > <a href="abcdefgh.htm">. I want to do something like
 >
 > s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;
 >
 > but this substitutes the entire expression substr(\1,1,8) rather than
 > evaluating the substr function and substituting the result. The
 > substitution also needs to work on expressions like
 >
 > <a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">
 >
 > and just truncate the abcdefghijk part to 8 letters. The method above
 > doesn't work on this because it picks up everything from the first
 > quotemark, not just the abcdefghijk.htm part. Any help would be
 > appreciated. Please respond by email also if you send a response back to
 > the newsgroup. Thank you.

Here're some possibilities:

    s/(")(.+?)(\.htm\b)/$1 . substr($2,0,8) . $3/eg;

  or,


But, avoiding the eval would be even faster:

   s/"(.{1,8}).*?(\.htm\b)/"$1$2/g;

HTH,
--
Charles DeRykus



Fri, 14 Jan 2000 03:00:00 GMT  

[ Posted and mailed. ]

Quote:

> I would like to take an expression like
>       <a href="abcdefghijk.htm">
> and truncate the htm file name to just 8 characters to get
>       <a href="abcdefgh.htm">.
> The substitution also needs to work on expressions like
>       <a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">
> and just truncate the abcdefghijk part to 8 letters.

        $tag =~ s%"((?:[^"]*/)?)([^."/]{1,8})[^."/]*\.htm\b%"$1$2.htm%g;

Just match only the first eight characters of the file name and pull them
off separately, discarding the rest of the filename up to the period for the
extension.  Far easier than trying to use substr().
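
As a rough illustration of that regex-only idea, here is a small sketch run against both sample tags from the question. The sub name truncate_htm_name is made up for this example, and the pattern is a variation that treats the directory part as optional and never lets a match cross a quote mark; it assumes file names with a single dot before htm.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper name, used only for this illustration.
sub truncate_htm_name {
    my ($tag) = @_;
    # $1: optional directory part, up to the last slash inside the quotes
    # $2: at most eight characters of the file name
    # the uncaptured [^."/]* swallows whatever is left of the name
    $tag =~ s%"((?:[^"]*/)?)([^."/]{1,8})[^."/]*\.htm\b%"$1$2.htm%g;
    return $tag;
}

print truncate_htm_name('<a href="abcdefghijk.htm">'), "\n";
print truncate_htm_name('<a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">'), "\n";
```

Both lines should come out with abcdefgh.htm as the file name, and the path in the second one left alone.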

Quote:
> I want to do something like
>       s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;

If you really want to try something like this, you'll need to use the /e
modifier on your s/// expression.  See man perlop for the details.

--
#!/usr/bin/perl -- Russ Allbery, Just Another Perl Hacker





Fri, 14 Jan 2000 03:00:00 GMT  

Quote:

> <a href="abcdefgh.htm">. I want to do something like

> s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;

Try again, but use the /e modifier and make the replacement an expression
which returns the replacement string. (And you don't need to backwhack
quote marks inside a regular expression or the quote or dot in the
replacement. And you don't want to use \1 on the replacement side. The
docs have a good explanation.)
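
To make "an expression which returns the replacement string" concrete, here is a minimal sketch in which the replacement calls a sub. The name shorten is invented for this example; any expression that returns a string would do.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: returns at most the first eight characters.
sub shorten {
    my ($name) = @_;
    return substr($name, 0, 8);
}

my $tag = '<a href="abcdefghijk.htm">';

# With /e the replacement is ordinary Perl code, so it can call a sub.
# The capture is read as $1, and neither the quote nor the dot is escaped.
$tag =~ s/"(.+)\.htm\b/'"' . shorten($1) . '.htm'/e;

print "$tag\n";
```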

Hope this helps!

--
Tom Phoenix           http://www.teleport.com/~rootbeer/

Randal Schwartz Case:  http://www.rahul.net/jeffrey/ovs/



Fri, 14 Jan 2000 03:00:00 GMT  

: I am just learning Perl and am having trouble trying to do the following
       ^^^^^^^^^^^^^^^^^^

We could already tell that because you get -w warnings from your
code, and all experienced Perl programmers use the -w switch (hint #1)  ;-)

: (I have checked the FAQ lists and a Perl reference without success).

Excellent!

We appreciate that.

: I would like to take an expression like
: <a href="abcdefghijk.htm">
: and truncate the htm file name to just 8 characters to get
: <a href="abcdefgh.htm">. I want to do something like

OK, but be warned that if you have two such URLs that are not
unique in the first eight chars, you may get unexpected results...
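
One way to check for that in advance is to truncate every name and see which results turn up more than once. This is only a sketch: the file names in @names are invented, and it assumes you already have the list of names in hand.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical file names; the first two share their first eight characters.
my @names = ('abcdefghijk.htm', 'abcdefghxyz.htm', 'index.htm');

my %seen;    # truncated name => list of the original names that map to it
for my $name (@names) {
    (my $short = $name) =~ s/^([^.]{1,8})[^.]*(\.htm)$/$1$2/;
    push @{ $seen{$short} }, $name;
}

# Any truncated name with more than one original is a clash.
my @clashes = grep { @{ $seen{$_} } > 1 } sort keys %seen;
for my $short (@clashes) {
    print "$short <- @{ $seen{$short} }\n";
}
```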

: s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;
    ^             ^               ^

Double quotes are not special in the regex or in the replacement part, so
you don't need to escape them. Dot is not special in the replacement
part either.

: s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;
                           ^^

-w switch would warn you that it is not done this way anymore (though
it still works). Nowadays you use $1 in the replacement part. Use \1
only in the regex part.

: s/\"(.+)\.htm\b/\"substr(\1,1,8)\.htm/g;
                              ^

String indices start with zero, not one.
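
A quick sketch of the difference, using the file name from the question (the variable names are just for illustration):

```perl
#!/usr/bin/perl -w
use strict;

my $name = 'abcdefghijk';

my $from_one  = substr($name, 1, 8);   # starts at the second character
my $from_zero = substr($name, 0, 8);   # starts at the first character

print "offset 1: $from_one\n";    # bcdefghi
print "offset 0: $from_zero\n";   # abcdefgh
```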

: but this substitutes the entire expression substr(\1,1,8) rather than
: evaluating the substr function and substituting the result. The
  ^^^^^^^^^^

'cause you didn't tell it to evaluate it. If you don't tell it to, perl
will only do interpolation.

You use s///e to tell perl to evaluate the replacement part.
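
Here is a small side-by-side sketch of the two behaviours on the same string (variable names are made up for the example):

```perl
#!/usr/bin/perl -w
use strict;

my $plain  = '<a href="abcdefghijk.htm">';
my $evaled = $plain;

# Without /e the replacement is treated like a double-quoted string:
# $1 is interpolated, but substr(...) stays as literal text.
$plain =~ s/"(.+)\.htm\b/"substr($1,0,8).htm/;

# With /e the replacement is evaluated as Perl code.
$evaled =~ s/"(.+)\.htm\b/'"' . substr($1,0,8) . '.htm'/e;

print "without /e: $plain\n";
print "with /e:    $evaled\n";
```

The first line shows the text substr(abcdefghijk,0,8) sitting inside the tag, the second shows abcdefgh.htm.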

: substitution also needs to work on expressions like
                                     ^^^^^^^^^^^

[ it would be less confusing if you called it a string rather than
  an expression
]

: <a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">

: and just truncate the abcdefghijk part to 8 letters. The method above
: doesn't work on this because it picks up everything from the first
: quotemark, not just the abcdefghijk.htm part. Any help would be

So write your expression so that it does NOT pick up everything
from the first quotemark  ;-)

: appreciated. Please respond by email also if you send a response back to
: the newsgroup. Thank you.

This might do it:

-------------------------
#! /usr/bin/perl -w

$_ = '<a href="abcdefghijk.htm">';
s/"(.+)\.htm\b/'"' . substr($1,0,8) . ".htm"/eg;
print "'$_'\n";

# a different URL, first 8 chars are the same. Is that gonna be a problem?

$_ = '<a href="abcdefghijklmnopqrstuvwxyz.htm">';
s!([^/"]+)\.htm\b!substr($1,0,8) . ".htm"!eg;
print "'$_'\n";

$_ = '<a href="http://www.xxx.yyy/zzz/abcdefghijk.htm">';

s!([^/"]+)                      # any chars except slash and double quote
  \.                            # a literal dot/full stop/period
  htm                           # three literal chars
  \b                            # a word boundary, could look for the
                                #    closing double quote instead
 !substr($1,0,8)                # keep the first 8 chars
  .                             # concatenated with
  ".htm"                        # these four literal chars
 !egx;                          # e => Evaluate the replacement part
                                # g => global (ie. more than one on a line)
                                # x => lets me put these comments here

print "'$_'\n";
-------------------------
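
The comment about looking for the closing double quote instead of \b could be sketched with a lookahead, roughly like this. The sample line with two links is made up, and this assumes every href value is wrapped in double quotes.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical line holding two links, to exercise the /g modifier.
my $line = '<a href="abcdefghijk.htm">one</a> '
         . '<a href="http://www.xxx.yyy/zzz/lmnopqrstuvw.htm">two</a>';

# (?=") requires a closing double quote right after .htm without
# consuming it, so the quote itself is left untouched.
$line =~ s!([^/"]+)\.htm(?=")!substr($1,0,8) . ".htm"!eg;

print "'$line'\n";
```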

--
    Tad McClellan                          SGML Consulting
    Tag And Document Consulting            Perl programming



Fri, 14 Jan 2000 03:00:00 GMT  
 