Multiline record - how to detect? 

I have a normal text-file. I wish to detect the text between
two (') marks, and use it as one record.

Text example:

bla-bla-.....'this_would_be_one_record...
.........
...here_ends_the_record'....bla-bla..

How can I do this in awk?
Can I use another mark (e.g. backtick?)

System: gawk 3.0.5

--
Mihaly Gyulai

http://www.*-*-*.com/

Sent via Deja.com http://www.*-*-*.com/
Before you buy.



Mon, 27 Jan 2003 03:00:00 GMT  

Quote:

> I have a normal text-file. I wish to detect the text between
> two (') marks, and use it as one record.

BEGIN{
  RS="'"
}

Quote:
> How can I do this in awk?
> Can I use another mark (e.g. backtick?)

Can you guess?  
Look for RS and related predefined variables in the manual.
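
A minimal sketch of the RS idea, assuming a POSIX-style awk run from a
shell. The function name quoted_records and the sample text are made up
for illustration. With RS set to the quote mark, every even-numbered
record is the text between a pair of marks. The octal escape \047 stands
for ' so that the shell quoting stays simple.

```shell
# Hypothetical helper: print each '...'-delimited chunk of stdin in brackets.
quoted_records() {
  awk 'BEGIN { RS = "\047" }                 # \047 is the single quote mark
       NR % 2 == 0 { print "[" $0 "]" }'     # even records lie between marks
}

printf "bla-bla 'first line\nsecond line' bla-bla\n" | quoted_records
```

Any other single character (a backtick, for instance) can be assigned to
RS in the same way.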

--
  All true believers shall break their eggs at the convenient end.



Mon, 27 Jan 2003 03:00:00 GMT  


Quote:
>> I have a normal text-file. I wish to detect the text between
>> two (') marks, and use it as one record.
> BEGIN{ RS="'" }

My problem is: how can I dynamically change the 'RS' variable?

There is a normal text (actually a book). I'm developing an awk
script to transform this text into HTML code.

So I need to handle every line in the text (RS="\n"). However,
when there is a (') in a line, the RS should change to RS="'"
until the next ('). Then it should be switched back to RS="\n".

How can I do this?

--
Mihaly Gyulai

http://www.freeyellow.com/members5/gyulai/




Tue, 28 Jan 2003 03:00:00 GMT  

Quote:

> There is a normal text (actually a book). I'm developing an awk
> script to transform this text into HTML code.
> So I need to handle every line in the text (RS="\n"). However,
> when there is a (') in a line, the RS should change to RS="'"
> until the next ('). Then it should be switched back to RS="\n".

If you care to explain better the problem maybe we could help.
What about using split() with "'", or even gsub(/'/,"\n")?
It depends on what you intend to achieve.

--
  All true believers shall break their eggs at the convenient end.



Tue, 28 Jan 2003 03:00:00 GMT  


Quote:
> If you care to explain better the problem maybe we could help.

Input text example:

<newline>
bla...bla... 'text_to_be_italized...
... some text ...
....end_of_text'...bla...bla...
<newline>

Desired output:

&nbsp;
bla...bla... <I>text_to_be_italized... until ...
end_of_text</I>...bla...bla... <BR>
&nbsp;

I already can handle the newlines and normal text lines, but I also
need to convert text into italics.

Maybe related problem: I wish to do the following conversion also.

Input text example:

This is a link: http://freshmeat.net, have a look!

Desired output:

<a href="http://freshmeat.net">This is a link: http://freshmeat.net,
have a look!</a> <br>

--
Mihaly Gyulai

http://www.freeyellow.com/members5/gyulai/




Tue, 28 Jan 2003 03:00:00 GMT  

Quote:

> ...
> Input text example:
>   <newline>
>   bla...bla... 'text_to_be_italized...
>   ... some text ...
>   ....end_of_text'...bla...bla...
>   <newline>
> Desired output:
>   &nbsp
>   bla...bla... <I>text_to_be_italized... until ...
>   end_of_text</I>...bla...bla... <BR>
>   &nbsp

Isn't it much simpler just to:

  BEGIN{
    initalics=0
  }
  ...
  {
    while($0 ~ /'/){    
      sub(/'/,initalics?"</i>":"<i>")
      initalics=!initalics
    }
    print $0 "<br>"
  }

... which also handles the case of italic/non italic text
on the same line:  
  aaaa ' bbbb-italics ' cccc ' dddd-italics...

... and perhaps you should consider that some ' in the original
text might be true apostrophes and not italicizing marks, depending
on whether spaces come before or after them?
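
As a sketch of why the flag approach also covers italics that run over
several lines: the variable keeps its value from one record to the next,
so a span opened on one line is closed on a later one. The function name
italicize, the variable name in_italics and the sample text are
illustrative, and \047 again stands for the ' mark.

```shell
# Hypothetical helper: turn each ' into <i> or </i>, alternating.
italicize() {
  awk '{
         # in_italics persists between records, so a span may cross lines
         while ($0 ~ /\047/) {
           sub(/\047/, (in_italics ? "</i>" : "<i>"))
           in_italics = !in_italics
         }
         print $0 "<br>"
       }'
}

printf "bla 'text to be italicized\nend of text' bla\n" | italicize
```

An unmatched mark leaves the flag set, so every later mark is treated the
opposite way round; the input has to be balanced for this to work.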

Quote:
> Maybe related problem: I wish to do the following conversion also.
> Input text example:
>   This is a link: http://freshmeat.net, have a look!
> Desired output:
>   <a href="http://freshmeat.net">This is a link: http://freshmeat.net,
>   have a look!</a> <br>

You are lucky, this one is easy using gsub() and & place holder:

  gsub(/http:\/\/[\/\-._a-zA-Z0-9]+/,"<a href=\"&\">&</a>")

That is, match a URL and tag it appropriately, where a URL is
defined as the string "http://" followed by one or more characters
from the set "/-._", letters, or digits. This definition might not
be accurate enough for your case if Hungarian characters or other
punctuation are used as well.
You might also want to consider "mailto:", "ftp:", or even HTML's
<img src="...">
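
A small runnable sketch of the same substitution, assuming a POSIX-style
awk. The function name linkify is made up, and the bracket expression is
written with "-" as its last member instead of the escaped form, which
should match the same characters without relying on how a given awk
treats a backslash inside brackets.

```shell
# Hypothetical helper: wrap every http:// URL on stdin in an <a> tag.
linkify() {
  awk '{
         # "-" sits last in the bracket expression, so it needs no escape
         gsub(/http:\/\/[\/._a-zA-Z0-9-]+/, "<a href=\"&\">&</a>")
         print
       }'
}

echo "This is a link: http://freshmeat.net, have a look!" | linkify
```

Because "." is part of the set, a full stop that directly follows a URL
at the end of a sentence would be taken as part of the link.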

The problem is *interesting*.  Jo programot kivanok!

--
  All true believers shall break their eggs at the convenient end.



Tue, 28 Jan 2003 03:00:00 GMT  


Quote:
> Isn't it much simpler just to:

>   BEGIN{
>     initalics=0
>   }
>   ...
>   {
>     while($0 ~ /'/){
>       sub(/'/,initalics?"</i>":"<i>")
>       initalics=!initalics
>     }
>     print $0 "<br>"
>   }

I'm confused with these '{'... (before the 'while', and after the
'print').

Anyway, it gives me a big fat 'parse error' at 'while'!

(I have an END block also, where to put it?? Just after the 'print'
in the 'while' loop, or after the last '}' ?).

Quote:
> ... and perhaps you should consider that some ' in the original
> text might be true apostrophes and not italicizing marks, depending
> on whether spaces come before or after them?

Maybe. But I intend to use this code for Hungarian texts, and they
don't contain those (') marks.

(this mark can be 'backtick', but I can't type this mark now... :-(  ).
(editor=joe)

Quote:
>> Input text example:
>>   This is a link: http://freshmeat.net, have a look!
>> Desired output:
>>   <a href="http://freshmeat.net">This is a link: http://freshmeat.net,
>>   have a look!</a> <br>
> You are lucky, this one is easy using gsub() and & place holder:

>   gsub(/http:\/\/[\/\-._a-zA-Z0-9]+/,"<a href=\"&\">&</a>")

Where to put the 'print' command?? Please, I'm still a newbie to awk!
This 'gsub' gave me an output to the screen... I need to redirect the
output to a file.

Quote:
> "http://" followed by a series of one or more chars in the set
> "/-._", or alphabetic, or digit (a definition which might not be
> accurate enough for your case if Hungarian characters or other
> punctuation is used as well).

As far as I know, only the English alphabet can be used in URLs, and
_no_ accented characters. So I don't need to check for Hungarian chars...

Quote:
> You also might want to consider "mailto:", "ftp:", or even html's
> <img src="...">

Yes, of course! (I just didn't want to ask too much...)

Quote:
> The problem is *interesting*.  Jo programot kivanok!

Muchas gracias!  :)

Can we switch to Hungarian??? :-D

--
Mihaly Gyulai

http://www.freeyellow.com/members5/gyulai/




Fri, 31 Jan 2003 03:00:00 GMT  

Quote:


> > ...

> I'm confused with these '{'... (before the 'while', and
> after the 'print').

You seem to be a real newbie to awk.
An awk program consists of a series of pairs:

  pattern { action }

A 'pattern' can be any conditional expression, including
the most common one, /regexp/, which is shorthand for
$0 ~ /regexp/, and the special cases BEGIN and END.
If no 'pattern' is specified, the default pattern is
always true, that is, its action is always executed.

An 'action' is a series of statements enclosed between
{ } that are executed for each input record that matches
its pattern.
If no 'action' is specified, the default action is 'print',
that is, output the input record unchanged.

So the code: /regexp/ emulates unix 'grep', or 'print
every input record that matches regexp'.
And the code: { print } emulates unix 'cat', or 'print
all input records'.
So the code:
  {
    ....whatever
  }
executes ...whatever for each input record, as the
condition is always true.
The BEGIN and END rules may appear anywhere in the awk
code; they are executed at the appropriate time:
BEGIN before reading any input, and END after having
read all input.
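
To make the rule structure concrete, here is a small sketch with one rule
of each kind, assuming a POSIX-style awk. The function name rule_demo,
the counter n and the three sample lines are invented for the example.

```shell
# Hypothetical demo: one BEGIN rule, one regexp rule, one rule without
# a pattern, and one END rule.
rule_demo() {
  awk 'BEGIN   { print "start" }          # runs once, before any input
       /b/     { print "match: " $0 }     # runs only for records containing b
               { n++ }                    # no pattern: runs for every record
       END     { print "records: " n }'
}

printf "abc\nxyz\nbar\n" | rule_demo
```

This prints "start", then "match: abc" and "match: bar", and finally
"records: 3" once the input is exhausted.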

Quote:
> > You are lucky, this one is easy using gsub() and & place holder:
> >   gsub(/http:\/\/[\/\-._a-zA-Z0-9]+/,"<a href=\"&\">&</a>")

> Where to put the 'print' command?? Please, I'm still a newbie to awk!
> This 'gsub' gave me an output to the screen... I need to redirect the
> output to a file.

To replace something in $0 (input record), and output
it once altered:

  {
    gsub(/.../,"...")
    print
  }

To output to a file you can either:

. specify the destination file inside the awk code:
  print ...whatever > "filename"

. or redirect from the shell:
  gawk -f program.awk < inputfile > outputfile
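
A short sketch of the first variant, assuming a POSIX-style awk and the
mktemp utility. The function name write_html and the variable fn are
illustrative. Inside awk, > empties the file only when it is first
opened; later prints in the same run are added to it, so the result
holds every record, not just the last one.

```shell
# Hypothetical helper: copy stdin to the file named by $1, adding <br>.
write_html() {
  awk -v fn="$1" '{ print $0 "<br>" > fn }'   # fn stays open across records
}

tmp=$(mktemp)
printf "one\ntwo\n" | write_html "$tmp"
cat "$tmp"
rm -f "$tmp"
```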

Quote:
> > The problem is *interesting*.  Jo programot kivanok!
> Muchas gracias!  :)
> Can we switch to Hungarian??? :-D

I'm afraid not, igazi, nem ertem.
Btw, in Catalan 'ko"szo"no"m szepen' is "moltes gracies",
that above is Spanish.

--
  All true believers shall break their eggs at the convenient end.



Fri, 31 Jan 2003 03:00:00 GMT  

Quote:


> > ...

> I tried to manipulate the code around 'while', but it still
> does not work. Here it is, what's wrong?

>   while ($0 ~ /'/) {
>     sub(/'/,initalics?"</I>":"<I>") initalics=!initalics
>     print $0, "<br>" >> fn
>   }

> The code is in the body, and gives me a 'parse error' at 'while'.
> Why?

'while...' is not a valid pattern; 'while' is a statement.
Statements belong to the 'action' part and MUST be inside { ... },
so that the whole action block gets executed IF its pattern is true
(or always, if there is no pattern).
Enclose the block of statements between { ... }

Also, a line break (or a semicolon) is missing between the sub(...)
call and the initalics=... assignment.

Quote:
> >>>   gsub(/http:\/\/[\/\-._a-zA-Z0-9]+/,"<a href=\"&\">&</a>")
> > To replace something in $0 (input record), and output
> > it once altered:

> >   {
> >     gsub(/.../,"...")
> >     print
> >   }

> The 'print' seems to work, however its output is not perfect. The
> link is just the '&' mark, and not the desired 'http://...'.

Your awk interpreter does not seem to recognize the '&' placeholder.
Get a newer awk, or gawk.

Quote:
> > > Can we switch to Hungarian??? :-D
> > I'm afraid not, igazi, nem ertem.
> Ah,... very good! Did you study Hungarian? :)

I wouldn't word it that way.

Quote:
> I'm very, very sorry to mix Catalan and Spanish!
> Is Catalan very different from Spanish or is it rather
> a question of to be independent?

They are not as much different as Hungarian is.
The second part of your sentence is a silly question
which I decide to ignore.

Quote:
> PS. are there any NG for such language topics?
> I don't want to be OT here...

We are OT since the "Can we switch to Hungarian?" line above.  
These topics could be handled in sci.lang or in the respective
soc.culture.catalan or soc.culture.magyar groups.

--
  All true believers shall break their eggs at the convenient end.



Fri, 31 Jan 2003 03:00:00 GMT  


% Here is a detail of the code:

% {
% while ($0 ~ /'/) {
%         sub(/'/,initalics?"</I>":"<I>"); initalics=!initalics
%         print $0, "<br>" >> fn
%         }
% }
%
% It works, but gives 3 lines of output, which only partially good!
%
% Actual bad output:
%
% # Minden sorban a '#' jel a sor elejen van. <br>
% # Minden sorban a <I>#' jel a sor elejen van. <br>
% # Minden sorban a <I>#</I> jel a sor elejen van. <br>
%
% Only the 3rd line contains the desired output.
% What can be wrong?

I get only two lines of output. I think perhaps your first line
is actually the input line (assuming you typed it rather than reading
from a file). Anyway, you get the partially finished line because
you put the print statement inside the while loop. Change it to
 while ($0 ~ /'/) {
         sub(/'/,initalics?"</I>":"<I>"); initalics=!initalics
         }
 print $0, "<br>" >> fn

and you should be OK.

% I've read about this '&' placeholder in the docu: 1.0.3 Edition
% of 'Effective AWK Programming' by Arnold Robbins.

How are you using it? This should work:
 gsub(/'[^']+'/, "<I>&</I>")
but then you'd have to get rid of the ' somehow.

If you don't mind relying on gawk features, you could do this:

  print gensub(/'([^']+)'/, "<I>\\1</I>", "g"), "<br>" >> fn
--

Patrick TJ McPhee
East York  Canada



Sun, 02 Feb 2003 10:40:28 GMT  

Quote:

> {
>   while ($0 ~ /'/) {
>     sub(/'/,initalics?"</I>":"<I>"); initalics=!initalics
>     print $0, "<br>" >> fn
>   }
> }
> It works, but gives 3 lines of output, which only partially good!

I think you have to put the print statement outside the while loop
for the desired effect.

Quote:
> > > The 'print' seems to work, however its output is not perfect. The
> > > link is just the '&' mark, and not the desired 'http://...'.

gsub() with & works ok for me, as in the example I sent you.
What about showing us the piece of code?

--
  All true believers shall break their eggs at the convenient end.



Sun, 02 Feb 2003 03:00:00 GMT  


Quote:
>> {
>>   while ($0 ~ /'/) {
>>     sub(/'/,initalics?"</I>":"<I>"); initalics=!initalics
>>     print $0, "<br>" >> fn
>>   }
>> }
>> works, but gives 3 lines of output, which only partially good!
> I think you have to put the print statement outside the while loop
> for the desired effect.

Yes, my mistake, sorry!
(the 3rd line was generated somewhere else in the code... :)

Quote:
> gsub() with & works ok for me, as in the example I sent you.
> What about showing us the piece of code?

Now..., this line works! :)

Again: I misunderstood the code you suggested...
(I put the 'print' command in the wrong place...)

Next problem:

I don't (quite) understand the logic of AWK... :-(

I suppose:

1. run BEGIN {} block
2. read 1 record from input
3. run every command in the awk program for the actual record
4. ( repeat step 2. and 3. after each other, until last record )
5. run END {} block

Is this true?

How can I combine the following pieces of awk code into one rule?
The problem is: a line that contains http://... appears twice.
(The italics code works fine...).

{
while ($0 ~ /'/) {
        sub(/'/,initalics?"</I>":"<I>"); initalics=!initalics
        }
        print $0, "<br>" >> fn
}

gsub(/http:\/\/[\/\-._a-zA-Z0-9]+/, "<a href=\"&\">&</a>") { print >> fn }

# end of code

Has anyone else found AWK a bit mysterious?? :)

--
Mihaly Gyulai

http://www.freeyellow.com/members5/gyulai/




Sun, 02 Feb 2003 03:00:00 GMT  

Quote:

> I suppose:
> 1. run BEGIN {} block
> 2. read 1 record from input
> 3. run every command in the awk program for the actual record
> 4. ( repeat step 2. and 3. after each other, until last record )
> 5. run END {} block

Yes, that's it, but each 'rule' (pattern + action pair) in the
'main loop' (your steps 2-3-4) gets evaluated in sequence
unless a 'next' statement (a sort of 'continue': read the next record
and restart the loop) or an 'exit' statement (a sort of 'break': stop
reading input and go to the END rule) intervenes.

Quote:
> How can I combine the next pieces of awk code into one condition?
> The problem is: The line that contain  http://..., appears 2x.

... you do NOT need to print in each rule: just perform all
the substitutions and text massaging you need, and then do
the print in the last rule at the bottom.
Or else, if you always perform that massaging, include it
all in the same { action block } and do the single print at the
bottom of the block.

  {
    ...massage italics
    ...massage urls
    print whatever > wherever
  }
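
Filled in with the two substitutions discussed earlier in the thread, the
outline could look like the sketch below. It assumes a POSIX-style awk;
the function name to_html and the variable in_italics are illustrative,
\047 stands for the ' mark, and the output goes to standard output
rather than to a named file.

```shell
# Hypothetical helper: italics, then links, then exactly one print per record.
to_html() {
  awk '{
         # 1. italics: alternate <I> and </I> for each quote mark
         while ($0 ~ /\047/) {
           sub(/\047/, (in_italics ? "</I>" : "<I>"))
           in_italics = !in_italics
         }
         # 2. links: wrap each http:// URL in an <a> tag
         gsub(/http:\/\/[\/._a-zA-Z0-9-]+/, "<a href=\"&\">&</a>")
         # 3. the only print, so no record is output twice
         print $0, "<br>"
       }'
}

echo "See 'this': http://freshmeat.net, ok" | to_html
```

Since all the work happens in a single action block, a record that
contains both a quote mark and a URL is still printed once.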

--
  All true believers shall break their eggs at the convenient end.


