This might be a sed question 

Hello,

I have a text file in which italics, subscripts, and superscripts are done
like this respectively:

{iText}
{-2}
{+2}

I need to convert these to html tags, e.g.

<i>Text</i>
<sub>2</sub>
<sup>2</sup>

This appears to be a simple sed substitution problem except that the
substitution for the trailing } differs in each case.  Is there a way to say
for example "search for a {i, replace it with <i>, and replace the next }
with </i>?"

Thanks,

Rich Lent

--
Richard A. Lent
Harvard University, Harvard Forest
P. O. Box 68, Petersham, MA  01366 USA
978-724-3302 extension 242 (voice)
978-724-3595 (fax)

http://www.*-*-*.com/



Sat, 23 Dec 2000 03:00:00 GMT  


Quote:
>{iText}
>{-2}
>{+2}

>I need to convert these to html tags, e.g.

><i>Text</i>
><sub>2</sub>
><sup>2</sup>

s:{i\(.*\)}:<i>\1</i>:
s:{-\(.*\)}:<sub>\1</sub>:
s:{+\(.*\)}:<sup>\1</sup>:

This assumes your sed accepts substitute delimiters other than slash,
that greedy matching is good enough (at most one tag per line), and
that you have no pathological data.
If not, the following makes fewer assumptions:
s/{i\([^}]*\)}/<i>\1<\/i>/
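
To see why greedy matching matters, here is a small sketch that runs both styles on a line with two italic tags. The function names greedy and bounded are made up for illustration, and a POSIX-style sed with basic regular expressions is assumed.

```shell
# Hypothetical helper names; both read text on stdin.
greedy()  { sed 's:{i\(.*\)}:<i>\1</i>:'; }
bounded() { sed 's:{i\([^}]*\)}:<i>\1</i>:g'; }

# .* runs to the LAST } on the line, so the two tags are merged into one.
printf '%s\n' '{iOne} and {iTwo}' | greedy

# [^}]* stops at the first }, so each tag is converted separately.
printf '%s\n' '{iOne} and {iTwo}' | bounded
```

The bounded form should be the safer default whenever a line may hold more than one tag.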
--
Walter Briscoe



Sat, 23 Dec 2000 03:00:00 GMT  
The following awk script does what you need.  E-mail me if you want
any clarification on how it works.  Note especially that in the third
line, a backslash is required in front of each "+" because "+" is a
special character in a regular expression.

/{i/  {sub(/{i/, "<i>"); sub(/}/, "</i>")}
/{-/  {sub(/{-/, "<sub>"); sub(/}/, "</sub>")}
/{\+/  {sub(/{\+/, "<sup>"); sub(/}/, "</sup>")}
{print}  # print every line exactly once, tagged or not
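
If a line can hold several tags in any order, one possible variant is to loop with match() and rebuild the line piece by piece. This is only a sketch under the assumption that tags are not nested and do not cross line boundaries; the function name convert is made up for illustration.

```shell
convert() {
    awk '
    {
        out = ""
        # Find the leftmost {i...}, {-...} or {+...} that closes on this line.
        while (match($0, /[{][i+-][^}]*[}]/)) {
            kind = substr($0, RSTART + 1, 1)            # i, - or +
            body = substr($0, RSTART + 2, RLENGTH - 3)  # text between the braces
            tag  = (kind == "i") ? "i" : (kind == "-") ? "sub" : "sup"
            out  = out substr($0, 1, RSTART - 1) "<" tag ">" body "</" tag ">"
            $0   = substr($0, RSTART + RLENGTH)         # continue after the tag
        }
        print out $0   # remaining text and untagged lines pass through
    }'
}

printf '%s\n' '{-2} x {iText} {+3}' 'plain' | convert
```

Because the closing tag is chosen from the character after the opening brace, the order of the tags on the line should not matter.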


Quote:
>>Hello,

>>I have a text file in which italics, subscripts, and superscripts are done
>>like this respectively:

>>{iText}
>>{-2}
>>{+2}

>>I need to convert these to html tags, e.g.

>><i>Text</i>
>><sub>2</sub>
>><sup>2</sup>

>>This appears to be a simple sed substitution problem except that the
>>substitution for the trailing } differs in each case.  Is there a way to say
>>for example "search for a {i, replace it with <i>, and replace the next }
>>with </i>?"

>>Thanks,

>>Rich Lent

--
Greg

http://www.mastnet.net/~jupiter


Sun, 24 Dec 2000 03:00:00 GMT  


Quote:


>>{iText}
>>{-2}
>>{+2}

>>I need to convert these to html tags, e.g.

>><i>Text</i>
>><sub>2</sub>
>><sup>2</sup>

>s:{i\(.*\)}:<i>\1</i>:
>s:{-\(.*\)}:<sub>\1</sub>:
>s:{+\(.*\)}:<sup>\1</sup>:

>Assuming your sed accepts substitute delimiters other than slash and
>that greedy matching is good enough and you have no pathological data.
>If not, the following makes fewer assumptions:
>s/{i\([^}]*\)}/<i>\1<\/i>/

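I'd suggest using the g (global) flag, as there might be more than
one tag per line.  For g to help, the pattern also has to stop at the
first }, so use [^}]* instead of the greedy .* here:

s:{i\([^}]*\)}:<i>\1</i>:g
s:{-\([^}]*\)}:<sub>\1</sub>:g
s:{+\([^}]*\)}:<sup>\1</sup>:g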

If your tags span more than one line, you will have different
problems, as sed is primarily line oriented.   This is a very real
scenario, especially with the italic tag.

Best to check after you've finished that you've got all the
tags converted.

grep '[{}]' infile

will show all lines that remain with curly braces.

I just thought I'd mention a possible/likely problem, so you can plan
for it.  :-)

Might as well solve that problem with awk.  It could probably be done
with sed, but it would no doubt be messy, and it would not be very
clear what's going on.

One can do this problem better using gawk, using } as the input record
separator, "" as the output record separator, and the sub function.

Since each record ends where a } was, there will be at most one
substitution per record, of a type identified by the pattern, so
appending the matching closing tag is trivial.
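
To make the record splitting concrete, here is a small sketch that only prints the records awk sees when } is the record separator. The function name show_records is hypothetical, and the square brackets are added just to make the record boundaries visible.

```shell
# Hypothetical helper: show each record between square brackets.
show_records() {
    awk 'BEGIN { RS = "}" } { printf "[%s]\n", $0 }'
}

# Each record holds at most one opening tag, and the } itself is consumed.
printf '%s' 'a {iText} b {-2} c' | show_records
```

Every record except the last ends where a closing brace used to be, which is why the closing tag can simply be appended.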

Here's a sample gawkscript:

# beginning of gawkscript
BEGIN{RS="}";ORS=""}
/{i/ {sub(/{i/,"<i>"); print $0 "</i>" ;next}
/{-/ {sub(/{-/,"<sub>"); print $0 "</sub>" ;next}
/{\+/ {sub(/{\+/,"<sup>"); print $0 "</sup>" ;next}
!/{/ {print}
# end of gawkscript

FWIW, it converts this file:

{iText} {-2} {+2} {iText
that continues on more
than two lines} {-2} {+2}
just plain text
{iText} {-2} {+2} {iText} {-2} {+2}
more plain text

to give this:

<i>Text</i> <sub>2</sub> <sup>2</sup> <i>Text
that continues on more
than two lines</i> <sub>2</sub> <sup>2</sup>
just plain text
<i>Text</i> <sub>2</sub> <sup>2</sup> <i>Text</i> <sub>2</sub> <sup>2</sup>
more plain text

which as you may notice, handles multiple tags and tags that are split
across line boundaries.  :-)

I think this is more like what you'll really need.  :-)

Chuck Demas
Needham, Mass.




Sun, 24 Dec 2000 03:00:00 GMT  

Quote:

>Hello,

>I have a text file in which italics, subscripts, and superscripts are done
>like this respectively:

>{iText}
>{-2}
>{+2}

>I need to convert these to html tags, e.g.

><i>Text</i>
><sub>2</sub>
><sup>2</sup>

>This appears to be a simple sed substitution problem except that the
>substitution for the trailing } differs in each case.  Is there a way to say
>for example "search for a {i, replace it with <i>, and replace the next }
>with </i>?"

This is not as simple as it appears to be, at least not
with sed. For the restricted case where there is only
one substitution per line and the text between the
markers doesn't span lines, it really is easy. The following
command will do it:

    sed 's/{i\(.*\)}/<i>\1<\/i>/;s/{-\(.*\)}/<sub>\1<\/sub>/;s/{+\(.*\)}/<sup>\1<\/sup>/' foo.txt

To handle cases with many format tags per line it
becomes really hairy, and I'm not even sure it is possible
to accomplish in a one-pass sed operation. I really think
awk is much more suitable for this problem.



Sun, 24 Dec 2000 03:00:00 GMT  

Quote:
> I have a text file in which italics, subscripts, and superscripts are done
> like this respectively:

> {iText}
> {-2}
> {+2}

> I need to convert these to html tags, e.g.

> <i>Text</i>
> <sub>2</sub>
> <sup>2</sup>

Can they be nested? Can they spread over multiple lines?
Is there some escape mechanism for {'s or }'s within?

If no to all of those, it's easy with sed:

s={i\([^}]*\)}=<i>\1</i>=g
s={-\([^}]*\)}=<sub>\1</sub>=g
s={+\([^}]*\)}=<sup>\1</sup>=g

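If you want to handle nested tags, but within one line only, you
could try something like this. The tag body excludes { as well as },
so the innermost tags are converted first and the loop works outward:

:loop
s={i\([^{}]*\)}=<i>\1</i>=g
s={-\([^{}]*\)}=<sub>\1</sub>=g
s={+\([^{}]*\)}=<sup>\1</sup>=g
t loop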
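
A sketch of the nested case as a shell function, assuming tags never cross a line boundary. Excluding { as well as } from the tag body means each pass converts only innermost tags, and the t command repeats the pass as long as something was substituted. The function name nested is made up for illustration.

```shell
nested() {
    sed -e ':loop' \
        -e 's={i\([^{}]*\)}=<i>\1</i>=g' \
        -e 's={-\([^{}]*\)}=<sub>\1</sub>=g' \
        -e 's={+\([^{}]*\)}=<sup>\1</sup>=g' \
        -e 't loop'
}

# The italic tag wraps a subscript and a superscript.
printf '%s\n' '{iH{-2}O and E=mc{+2}}' | nested
```

On the first pass only the inner {-2} and {+2} match; the outer {i...} matches on the second pass, once no braces are left inside it.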

If {}'s preceded with, say, \ should be ignored, you could begin
by changing them into something that is unlikely to occur in
the line and change them back at the end.
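
The escape idea could look roughly like this. It is a sketch only: the placeholders @@LB@@ and @@RB@@ are arbitrary strings assumed not to occur in the data, and the function name protect_escapes is hypothetical.

```shell
protect_escapes() {
    # 1. Hide \{ and \} behind placeholders (assumed absent from the input).
    # 2. Convert the real tags.
    # 3. Put the escaped braces back exactly as they were.
    sed -e 's/\\{/@@LB@@/g' -e 's/\\}/@@RB@@/g' \
        -e 's={i\([^}]*\)}=<i>\1</i>=g' \
        -e 's={-\([^}]*\)}=<sub>\1</sub>=g' \
        -e 's={+\([^}]*\)}=<sup>\1</sup>=g' \
        -e 's/@@LB@@/\\{/g' -e 's/@@RB@@/\\}/g'
}

printf '%s\n' 'set \{a, b\} in {ibold}' | protect_escapes
```

If the backslashes should not survive into the HTML, the last two substitutions could emit a bare { and } instead.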

If you need to handle multiple lines it becomes rather hard to do
with sed; awk would probably be easier but still somewhat messy.
If you need that, ask and I or someone may help there.
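
For files small enough to hold in memory, one sed approach to the multi-line case is to append every line to the pattern space first and substitute once at the end. This is a sketch under that assumption, with no nesting or escapes; the function name multiline is made up for illustration.

```shell
multiline() {
    # On every line but the last, append the next line and branch back,
    # so the substitutions run once on the whole input.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e 's={i\([^}]*\)}=<i>\1</i>=g' \
        -e 's={-\([^}]*\)}=<sub>\1</sub>=g' \
        -e 's={+\([^}]*\)}=<sup>\1</sup>=g'
}

printf '%s\n' '{iText' 'more} and {-2}' | multiline
```

This relies on [^}]* also matching the newlines embedded in the pattern space, which lets a tag body run across lines.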

--
Tapani Tarvainen



Sun, 24 Dec 2000 03:00:00 GMT  
 