help a neophyte? 

Executive Summary:

I've been using grep and sed to do this:

input line                                output line      
--------------                            --------------
<pattern1> <pattern1>                     <pattern1>
xxxx yyy zzzzz
aaa<pattern2>bbb <pattern3>c<pattern4>    <pattern2>
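For comparison, here is a minimal sketch of the same first-match-only behaviour in Perl, written for Perl 5 (`use strict`, `my`, `qr//`), not 4.035. The pattern `<pattern[0-9]>` is a hypothetical stand-in for the real strings. A match in scalar context stops at the first hit, which is why it reproduces what the grep-then-sed pipeline prints.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real fixed strings or regexes.
my $pat = qr/<pattern[0-9]>/;

# Return only the first match on a line, or undef if there is none.
# A scalar-context match stops at the first hit, like grep + sed.
sub first_match {
    my ($line) = @_;
    return $line =~ /($pat)/ ? $1 : undef;
}

for my $line ('<pattern1> <pattern1>',
              'xxxx yyy zzzzz',
              'aaa<pattern2>bbb <pattern3>c<pattern4>') {
    my $m = first_match($line);
    print "$m\n" if defined $m;
}
```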

I'd like to use Perl to do this instead:

input line                                output line      
--------------                            --------------
<pattern1> <pattern1>                     <pattern1> <pattern1>
xxxx yyy zzzzz
aaa<pattern2>bbb <pattern3>c<pattern4>    <pattern2> <pattern3> <pattern4>
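One way to get the desired output, sketched under the same assumptions (Perl 5 syntax, hypothetical `<pattern[0-9]>` pattern): a match with `/g` in list context returns every non-overlapping match on the line, left to right. With no capturing parentheses, Perl returns the whole matched strings.

```perl
use strict;
use warnings;

# Hypothetical pattern; <patternN> stands in for the real tags/terms.
my $pat = qr/<pattern[0-9]>/;

# m//g in list context returns every non-overlapping match on the line.
sub all_matches {
    my ($line) = @_;
    my @m = $line =~ /$pat/g;
    return @m;
}

for my $line ('<pattern1> <pattern1>',
              'xxxx yyy zzzzz',
              'aaa<pattern2>bbb <pattern3>c<pattern4>') {
    my @m = all_matches($line);
    print join(' ', @m), "\n" if @m;    # lines without matches print nothing
}
```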

I have what seems a simple problem, one which I thought would be the
motivation to finally learn proper Perl and give up my jury-rigging in
sed, awk, and the C shell. The problem is this: multi-megabytes of
files in various ASCII mark-up formats (SGML, Framemaker MIF, etc.)
have been translated into Japanese; I want to check that certain
strings, such as mark-up tags and some English terms, remain unchanged.
My simple-minded approach has been to use grep to find lines that
contain a match of a regular expression, then use sed to substitute
the match for the input line, and finally use diff to compare the
output from original and translated files. This works, but ignores any
match after the first on the line.
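The diff step could also be done inside Perl. The sketch below is one possible variation, not the poster's method: instead of comparing matches line by line, it counts how often each protected string occurs in the original and in the translation, which tolerates tags moving around when Japanese word order differs. The pattern (SGML-style tags plus the term `Perl`) and the ASCII sample "translations" are hypothetical; Perl 5 syntax is assumed.

```perl
use strict;
use warnings;

# Hypothetical pattern for strings that must survive translation:
# SGML-style tags plus one sample English term.
my $pat = qr{<[^<>]+>|Perl};

# Count how many times each protected string occurs in a text.
sub count_matches {
    my ($text) = @_;
    my %count;
    $count{$_}++ for $text =~ /$pat/g;
    return \%count;
}

# Report every string whose count differs between the two texts.
sub compare_counts {
    my ($orig, $trans) = @_;
    my %seen = (%$orig, %$trans);        # union of all keys
    my @diffs;
    for my $s (sort keys %seen) {
        my $o = $orig->{$s}  || 0;
        my $t = $trans->{$s} || 0;
        push @diffs, "$s: $o in original, $t in translation" if $o != $t;
    }
    return @diffs;
}

# ASCII stand-ins for an original line and two "translations".
my $original = 'Use <b>Perl</b> to check <i>tags</i>.';
my $good     = '<b>Perl</b>de<i>tagu</i>wochekku.';
my $bad      = '<b>Perl</b>de<i>tagu wochekku.';

print "good: ", scalar(compare_counts(count_matches($original),
                                      count_matches($good))), " differences\n";
print "$_\n" for compare_counts(count_matches($original), count_matches($bad));
```

A count comparison loses the ordering information that diff gives you, so the two checks complement each other rather than one replacing the other.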

I now need to search for multiple matches per line. I suppose I could
make multiple passes with sed, replacing the input line with
"line-before-match newline match newline line-after-match" until every
match was on a line by itself, but there must be a less brutal way (and
I'd rather preserve the line structure of the matches as transparently
as possible). Given that awk has no "memory operator" like sed's \(...\)
and depends heavily on field delimiters, which are hard to define in
Japanese text since itlookslikethis (words are not normally separated
by white space, even if one of them happens to be English), I turned to
Perl.
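For reference, the multi-pass sed idea collapses into a single Perl substitution with `/g`. This minimal sketch (Perl 5 syntax, hypothetical `<pattern[0-9]>` pattern) puts every match on a line of its own in one pass; a capture group `(...)` corresponds to sed's `\(...\)`, and `$1` to `\1`. No field delimiters are involved, so unspaced text is no obstacle. It does give up the original line structure, which the poster preferred to keep.

```perl
use strict;
use warnings;

my $pat = qr/<pattern[0-9]>/;   # hypothetical stand-in pattern

# One s///g does what repeated sed runs would do: each match ends up
# on a line of its own.  ($pat) captures like sed's \(...\); $1 is \1.
sub isolate_matches {
    my ($line) = @_;
    $line =~ s/($pat)/\n$1\n/g;
    return $line;
}

print isolate_matches('aa<pattern2>bb <pattern3>c<pattern4>');
print isolate_matches('itlookslikethis<pattern1>too'), "\n";
```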

While I appreciate the llama's gentle disposition and the camel's
"chock-full-o'-goodies-ness," I've found the only way to really digest,
let alone assimilate, any of Perl is to go through the "Gory Details"
and "Functions" one morsel at a time, trying them in little programs of
mine own intention (though usually adapted from an example found in
LEARNING PERL) -- trying again and again until something works, and
something clicks.

In index I thought I'd found the function I needed. Given a loop that
reassigns the value of POSITION when a match occurs, it scoots straight
through the input line and finds all matches.  Groovy, but not
terribly useful for my purposes, I realized after a bit of reflection,
since index only works with static strings, not regular expressions, and
returns the position of the match, not the match itself. Next I tried
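The index loop described above might look like the following sketch (Perl 5 syntax; the sample line and the `<tag>` needle are hypothetical). Advancing the start position past each hit walks through the whole line; the result is a list of offsets, and `substr` is needed to turn them back into text.

```perl
use strict;
use warnings;

# Return the start offset of every occurrence of a fixed string.
# index() takes a starting position, so moving it past each hit walks
# through the line.  The needle is a literal string, not a regex, and
# overlapping occurrences are skipped.
sub all_positions {
    my ($line, $needle) = @_;
    return () if $needle eq '';          # an empty needle would loop forever
    my @pos;
    my $p = index($line, $needle);
    while ($p >= 0) {
        push @pos, $p;
        $p = index($line, $needle, $p + length($needle));
    }
    return @pos;
}

my $line   = 'aa<tag>bb <tag>c';
my $needle = '<tag>';
my @pos    = all_positions($line, $needle);
print "@pos\n";                          # offsets, not matched text
# substr() recovers the text; for a fixed needle it is just the needle.
print join(' ', map { substr($line, $_, length($needle)) } @pos), "\n";
```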
looping as long as $_ matches an expression, using the concatenation of
$` and $& to delete everything up to the end of the match from the line
at each iteration. That works fine as long as neither $` nor $& contains any
of Perl's rich set of metacharacters. Another loop to check each
character and escape it if necessary?
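One way around the metacharacter problem, sketched here as an alternative rather than as a fix the thread confirms, is to assign the post-match variable `$'` to the remainder of the line instead of building a pattern from `$`` and `$&`. The matched text is then never reused as a regex. When it must be, Perl 5's `quotemeta` (or `\Q...\E`) escapes it. The sample line and tag pattern are hypothetical. Note that in Perl versions before 5.20, using `$&`, `$`` or `$'` anywhere slows down every regex match in the program.

```perl
use strict;
use warnings;

# Collect every match by repeatedly matching what is left of the line.
# Assigning $' (the text after the match) to the remainder means the
# matched text is never used as a pattern, so metacharacters in it
# cause no trouble.
sub matches_via_postmatch {
    my ($rest, $pat) = @_;
    my @found;
    while ($rest =~ /$pat/) {
        last if length($&) == 0;         # an empty match would loop forever
        push @found, $&;
        $rest = $';
    }
    return @found;
}

my @m = matches_via_postmatch('x<a.b>y<c*d>z', qr/<[^<>]+>/);
print "@m\n";

# If matched text ever has to be used as a pattern, quotemeta() (or
# \Q...\E inside the regex) escapes its metacharacters first.
my $literal = quotemeta('1.5*x');
print "$literal\n";
```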

Thinking there must be a simpler way, I've gone back to trying and
trying and trying. Yet as much as I like solving my own problems, I
must admit I am confused by Perl's operatic expansiveness. Is there a
kind soul who could suggest where this neophyte might look in Perl
(4.035) to find a simple way to accomplish such modest aims?

And if anybody wants to point out how slow I am for not seeing
straightaway how to solve this problem in sed or awk, my mail box
awaits your superior erudition.

David Thompson

Alone in the press of people traveled he,  |       David S. Thompson



Sat, 03 May 1997 01:22:40 GMT  


>Executive Summary:

>I've been using grep and sed to do this

>input line                                output line      
>--------------                            --------------
><pattern1> <pattern1>                     <pattern1>
>xxxx yyy zzzzz
>aaa<pattern2>bbb <pattern3>c<pattern4>    <pattern2>

>I'd like to use Perl to instead do this

>input line                                output line      
>--------------                            --------------
><pattern1> <pattern1>                    <pattern1> <pattern1>
>xxxx yyy zzzzz
>aa<pattern2>bb <pattern3>c<pattern4>     <pattern2> <pattern3> <pat.4>

[deletia]

If I understand your problem right, the following should work:

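    while ( <INPUT> ) {     # INPUT must already be open for reading
      chop;
      # (Hypothetical reconstruction of the missing body.)
      # //g in list context returns every match on the line:
      @matches = /(<pattern1>|<pattern2>|<pattern3>|<pattern4>)/g;
      print join(' ', @matches), "\n" if @matches;
    }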

You can use any regex instead of the fixed-string patterns above.
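As an illustration of dropping in a real regex, here is a sketch using a hypothetical pattern for SGML-style tags, written in Perl 5 syntax. It also shows the scalar-context form of `//g`: each call finds the next match, and `pos()` gives the offset just past it, which can help when reporting where on a line a tag was found.

```perl
use strict;
use warnings;

# Hypothetical regex for SGML-style tags such as <p>, </p> or
# <title id="x">; any other regex could be dropped in here.
my $tag = qr{</?[A-Za-z][^<>]*>};

# In scalar context, m//g finds one match per call and remembers where
# it stopped; pos() returns the offset just past the latest match.
sub tags_with_offsets {
    my ($line) = @_;
    my @hits;
    while ($line =~ /$tag/g) {
        my $start = pos($line) - length($&);
        push @hits, [ $start, $& ];
    }
    return @hits;
}

printf "%d: %s\n", @$_ for tags_with_offsets('<title>Perl</title>');
```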

Anno



Sat, 03 May 1997 19:30:48 GMT  