newbie...To parse or not to parse 
 newbie...To parse or not to parse

First, the "simple" XML file:

<table>
  <row>
    <year>1978</year>
    <month>01</month>
    <xray_class>M</xray_class>
  </row>
...rest of file snipped...
</table>

and here is the script:

#!/usr/bin/perl
use strict;
use XML::Parser;
use GD::Graph::bars;


my @month_values;   # one counter per month; index 0 is January

my $p = XML::Parser->new(Handlers=>{Char=>sub{
   ++$month_values[$_[1]-1] if $_[1]>=1 && $_[1]<=12;
}});

$p->parsefile("xray1978.xml");

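# create the graph
# (the first line of the label list was lost from the post; it is
#  reconstructed here, and @months is a placeholder name)
my @months = ("Jan.", "Feb.", "Mar.", "Apr.", "May",
                  "Jun.", "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec.");

my $graph = GD::Graph::bars->new(500, 400);

$graph->set(
 x_label=>'Month',
 y_label=>'Occurrences',
 bar_spacing=>3,
 # long_ticks=>1,
 show_values=>1,
 )
or warn $graph->error;
$graph->set_legend( 'Number of Flares per Month' );

# draw the bars: x-axis labels first, then one count per label
# (this call was also missing from the post)
$graph->plot([\@months, \@month_values]) or die $graph->error;

open(GRAPH,">graph.jpg") || die "Cannot open graph.jpg: $!\n";

# binmode is needed on Win32 systems and harmless elsewhere
binmode GRAPH;
print GRAPH $graph->gd->jpeg(100);
close GRAPH;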

The script does a great job. It takes the number of occurrences of each
month and creates a bar graph. Now what I'm trying to do is make the
XML data in the file more "meaningful". I compiled a new file and this is
what it looks like:

<year_1978>
  <record>
     <year>1978</year>
     <month>01</month>
     <day>01</day>
     <flare_start>0555</flare_start>
     <flare_end>0625</flare_end>
     <flare_max>0558</flare_max>
     <latitude>S18</latitude>
     <c_m_d>E41</c_m_d>
     <opt_importance>1</opt_importance>
     <opt_brightness>N</opt_brightness>
     <xray_class>M</xray_class>
     <xray_intensity>1</xray_intensity>
  </record>
...rest of file snipped...
</year_1978>

Of course, now when I run the script, it doesn't work the way it did with
the simpler file.

So here are my questions (finally...I know): should I even bother trying to
use XML::Parser to gather the information from the XML file? Would
it be possible to just use regular expressions somehow and accomplish
the same thing? I wanted to use a parser because this is intended to be
a CGI script (if it ever gets done) and I thought that a parser would be
faster. But now....it just doesn't matter! I'm at a point where I just want
the darn thing to work! (sorry for the rant...just a bit frustrated).

This is what I would like to accomplish:

1- get the total number of entries for each month
2- get the total number of "M" xray class per month
3- graph the results



Wed, 15 Sep 2004 02:56:44 GMT  
 newbie...To parse or not to parse

[ building a graph from XML data ]

Quote:
> The script does a great job. It takes the number of occurrences of each
> month and creates a bar graph. Now what I'm trying to do is make the
> XML data in the file more "meaningful". I compiled a new file and this is
> what it looks like:

> <year_1978>
>   <record>
>      <year>1978</year>
>      <month>01</month>
>      <day>01</day>
>      <flare_start>0555</flare_start>
>      <flare_end>0625</flare_end>
>      <flare_max>0558</flare_max>
>      <latitude>S18</latitude>
>      <c_m_d>E41</c_m_d>
>      <opt_importance>1</opt_importance>
>      <opt_brightness>N</opt_brightness>
>      <xray_class>M</xray_class>
>      <xray_intensity>1</xray_intensity>
>   </record>
> ...rest of file snipped...
> </year_1978>

> Of course, now when I run the script, it doesn't work the way it did with
> the simpler file.

> So here are my questions (finally...I know): should I even bother trying to
> use XML::Parser to gather the information from the XML file? Would
> it be possible to just use regular expressions somehow and accomplish
> the same thing?

No. ;)

Quote:
> This is what I would like to accomplish:

> 1- get the total number of entries for each month
> 2- get the total number of "M" xray class per month
> 3- graph the results

I would probably just use XPath:

  use XML::XPath;  # or LibXML or GDOME, which are fast
  ...

  for my $node ($root->findnodes('//record')) {
      my $month = $node->findvalue('month');

      $totals{$month}++;
      $m_xrays{$month}++ if $node->findvalue('xray_class') eq 'M';
  }
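Filled out into something runnable, the loop above might look like the sketch below. It assumes XML::XPath is installed (it is not a core module) and parses a tiny made-up document in the same shape as the `<year_1978>` file instead of `xray1978.xml`. It calls `findvalue` on the XML::XPath object with the record as the context node, which is the `findvalue($path, $context)` form XML::XPath documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;   # assumed to be installed; not part of core Perl

# Made-up sample in the same shape as the <year_1978> file
my $xml = <<'END_XML';
<year_1978>
  <record><month>01</month><xray_class>M</xray_class></record>
  <record><month>01</month><xray_class>C</xray_class></record>
  <record><month>02</month><xray_class>M</xray_class></record>
</year_1978>
END_XML

# For the real data: XML::XPath->new(filename => 'xray1978.xml')
my $xp = XML::XPath->new(xml => $xml);

my (%totals, %m_xrays);
for my $record ($xp->findnodes('//record')) {
    my $month = $xp->findvalue('month', $record);
    my $class = $xp->findvalue('xray_class', $record);

    # findvalue may return an object; stringify it, then numify so "01" -> 1
    $month = "$month" + 0;

    $totals{$month}++;
    $m_xrays{$month}++ if "$class" eq 'M';
}

for my $month (sort { $a <=> $b } keys %totals) {
    printf "month %2d: %d records, %d M-class\n",
        $month, $totals{$month}, $m_xrays{$month} || 0;
}
```

Numifying the month also means `01` and `1` end up under the same hash key if the file is ever inconsistent about leading zeros.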

HTH
--
Steve



Wed, 15 Sep 2004 03:56:19 GMT  
 newbie...To parse or not to parse

Quote:


>><snip>
> > This is what I would like to accomplish:

> > 1- get the total number of entries for each month
> > 2- get the total number of "M" xray class per month
> > 3- graph the results

> I would probably just use XPath:

>   use XML::XPath;  # or LibXML or GDOME, which are fast
>   ...

>   for my $node ($root->findnodes('//record')) {
>       my $month = $node->findvalue('month');

>       $totals{$month}++;
>       $m_xrays{$month}++ if $node->findvalue('xray_class') eq 'M';
>   }

Steve....will this script find the totals separately? IOW, all the records
for January totaled independently from February etc. I'm sorry if I wasn't
clear about that, but what would work is if somehow the information for
each month is stored separately (total entries for each month, and then
total entries of "M" from each of those totals). I may be wrong (more than
likely I am), but it looks like this script will find the total numbers for
the whole year (months and 'M')....is that right?

Gus



Wed, 15 Sep 2004 05:00:09 GMT  
 newbie...To parse or not to parse

Quote:


>>><snip>

>> > This is what I would like to accomplish:

>> > 1- get the total number of entries for each month
>> > 2- get the total number of "M" xray class per month
>> > 3- graph the results

>> I would probably just use XPath:

>>   use XML::XPath;  # or LibXML or GDOME, which are fast
>>   ...

>>   for my $node ($root->findnodes('//record')) {
>>       my $month = $node->findvalue('month');

>>       $totals{$month}++;
>>       $m_xrays{$month}++ if $node->findvalue('xray_class') eq 'M';
>>   }

> Steve....will this script find the totals separately? IOW, all the records
> for January totaled independently from February etc. I'm sorry if I wasn't
> clear about that, but what would work is if somehow the information for
> each month is stored separately (total entries for each month, and then
> total entries of "M" from each of those totals). I may be wrong (more than
> likely I am), but it looks like this script will find the total numbers for
> the whole year (months and 'M')....is that right?

Well, no... $totals{$month} is a hash lookup, so you'll end up with a total
for each month in the XML document.
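A core-Perl sketch of why the hash keeps the months apart: every distinct month string becomes its own key, so each one gets its own counter. The `@records` list is hypothetical sample data standing in for the month and xray_class values the XPath loop pulls out of each `<record>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical [month, xray_class] pairs, one per <record>
my @records = (
    ['01', 'M'], ['01', 'C'], ['01', 'M'],
    ['02', 'X'], ['02', 'M'],
);

my (%totals, %m_xrays);
for my $rec (@records) {
    my ($month, $class) = @$rec;
    # '01' and '02' are different keys, so January and February
    # are counted independently rather than as one yearly total
    $totals{$month}++;
    $m_xrays{$month}++ if $class eq 'M';
}

for my $month (sort keys %totals) {
    printf "%s: %d records, %d M-class\n",
        $month, $totals{$month}, $m_xrays{$month} || 0;
}
# prints:
# 01: 3 records, 2 M-class
# 02: 2 records, 1 M-class
```

One caveat: hash keys are strings, so if some records wrote the month as `1` and others as `01`, they would be counted under separate keys; writing `$totals{$month + 0}++` normalizes them.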

But since the months look like ints (and since you seemed to like the array
in an earlier thread ;) you can just as well use:


  ...

  $totals[ $month ]++;
  $m_xrays[ $month ]++ if ...etc

HTH
--
Steve



Wed, 15 Sep 2004 05:56:12 GMT  
 newbie...To parse or not to parse
Steve Grazzini

Quote:
> Well, no... $totals{$month} is a hash lookup, so you'll end up with a total
> for each month in the XML document.

...that's really slick!...hard to believe that so few lines of code can do so
much...

Quote:
> But since the months look like ints (and since you seemed to like the array
> in an earlier thread ;) you can just as well use:

...I already had the graphing part working using arrays...I figured why fix
it if it ain't broken ;)

Quote:

>   ...

>   $totals[ $month ]++;
>   $m_xrays[ $month ]++ if ...etc

...man...this is so close to working....what's happening now is that the
information on the graph is being shifted over to the right by one month. So,
January is blank, February has the data of January, March has the data of
February, etc. Here is the beginning of the script. I only added the line that
calls the XML file:


my $root = XML::XPath->new(filename => 'xray1978.xml');

for my $node ($root->findnodes('//record')) {
    my $month = $node->findvalue('month');

    $totals[$month]++;
    $xrays[$month]++ if $node->findvalue('xray_class') eq 'M';
}



Wed, 15 Sep 2004 08:01:34 GMT  
 newbie...To parse or not to parse

Quote:
> ....
> what's happening now is that
> the information on the graph is being shifted over to the right by
> one month. So,
> January is blank, February has the data of January,
> ...
>     $totals[$month]++;

Remember arrays start at 0 in Perl. So you have to subtract 1 here:
        $totals[$month - 1]++;

(It's the same as in a previous thread about your project).
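A small sketch of the fix, assuming the months arrive as the strings `"01"` to `"12"`. Perl numifies `"01"` to 1 in the subtraction, so January lands at index 0, matching the first x-axis label. The `@months_seen` list is hypothetical sample data, and the zero-fill loop is an optional extra so months without records show 0 rather than undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical <month> values from four records
my @months_seen = ('01', '01', '02', '12');

my @totals;
for my $month (@months_seen) {
    # Arrays start at 0: "01" - 1 is 0, so January goes in $totals[0].
    # Using $totals[$month] would leave $totals[0] empty and shift
    # every bar one month to the right.
    $totals[$month - 1]++;
}

# Optional: give months with no records an explicit 0
$totals[$_] ||= 0 for 0 .. 11;

printf "month %02d: %d\n", $_ + 1, $totals[$_] for 0 .. 11;
```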

--
felix



Wed, 15 Sep 2004 09:47:35 GMT  
 newbie...To parse or not to parse

Quote:

> Remember arrays start at 0 in Perl. So you have to subtract 1 here:
>     $totals[$month - 1]++;

> (It's the same as in a previous thread about your project).

...you can bet I'll remember this the next time!...on to the CGI part...

Thanks for all the help.

Gus



Wed, 15 Sep 2004 14:40:01 GMT  
 newbie...To parse or not to parse

<snip>

....just wanted to thank you for all the help...Felix Greerinckx helped me on
the last part...now, on to the CGI part....

Gus



Wed, 15 Sep 2004 14:47:14 GMT  
 newbie...To parse or not to parse

Quote:

> First, the "simple" XML file:

> <table>
>   <row>
>     <year>1978</year>
>     <month>01</month>
>     <xray_class>M</xray_class>
>   </row>
> ...rest of file snipped...
> </table>

> and here is the script:

> #!/usr/bin/perl
> use strict;
> use XML::Parser;
> use GD::Graph::bars;


> my $p = XML::Parser->new(Handlers=>{Char=>sub{
>    ++$month_values[$_[1]-1] if $_[1]>=1 && $_[1]<=12;
> }});

> $p->parsefile("xray1978.xml");

The problem with this parser is that it assumes that all data in the
form of a number between 1 and 12 is a month.

my $m = 13; # set to a non-month integer.
my $p = XML::Parser->new(Handlers=>{End=>sub{
   ++$month_values[$m-1] if $_[1] eq "month";
   $m = 13;
}, Char=>sub{
   $m = $_[1];
}});

$p->parsefile("xray1978.xml");

[snip]

Quote:
> The script does a great job. It takes the number of occurrences of
> each month and creates a bar graph. Now what I'm trying to do is make
> the XML data in the file more "meaningful". I compiled a new file and
> this is what it looks like:

> <year_1978>
>   <record>
>      <year>1978</year>
>      <month>01</month>
>      <day>01</day>

You noticed that the day (01 here) is also between 1 and 12, right?  And you know that that's
all that your code checked for to tell whether or not it was looking at
a month, right?

Quote:
>      <flare_start>0555</flare_start>
>      <flare_end>0625</flare_end>
>      <flare_max>0558</flare_max>
>      <latitude>S18</latitude>
>      <c_m_d>E41</c_m_d>
>      <opt_importance>1</opt_importance>

This is also a number between 1 and 12.

Quote:
>      <opt_brightness>N</opt_brightness>
>      <xray_class>M</xray_class>
>      <xray_intensity>1</xray_intensity>

So's this.

Quote:
>   </record>
> ...rest of file snipped...
> </year_1978>

> Of course, now when I run the script, it doesn't work the way it did
> with the simpler file.

Duh.  That's cause it's considering all numbers between 1 and 12 to be
months.
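To make this concrete, the sketch below runs the original Char test over the text content of the single sample record above. The list of strings is a simplification: it stands in for what the Char handler would receive, leaving out the whitespace between tags and the fact that XML::Parser may split one text node across several Char calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # "S18", "E41", "N" and "M" are not numbers

# Text of the sample <record>, in document order: year, month, day,
# flare_start, flare_end, flare_max, latitude, c_m_d, opt_importance,
# opt_brightness, xray_class, xray_intensity
my @char_data = qw(1978 01 01 0555 0625 0558 S18 E41 1 N M 1);

my @month_values;
for my $text (@char_data) {
    # Same test as the original Char handler
    ++$month_values[$text - 1] if $text >= 1 && $text <= 12;
}

# month, day, opt_importance and xray_intensity all pass the test,
# so this one record is counted four times, all in January
print "January count: $month_values[0]\n";   # prints 4
```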

Quote:
> So here are my questions (finally...I know): should I even bother
> trying to use XML::Parser to gather the information from the XML file?
> Would it be possible to just use regular expressions somehow and
> accomplish the same thing?

Well, you should be using *some* parsing module, certainly.

Maybe not XML::Parser, but don't try and roll-your-own with regexen.

Quote:
> I wanted to use a parser because this is intended to be a CGI script
> (if it ever gets done) and I thought that a parser would be faster.

Being correct is more important than being fast.  And a parser [used
right] is far more likely to get it done right than regex code.

Well, depending on how the data is created.  If you're absolutely
POSITIVE that it will always be in exactly such-and-such format, with
whitespace here, there and there, then a regex solution might (if you're
really lucky) work ok.
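To illustrate that caveat, here is a sketch of the regex route using a hypothetical `count_months` helper. It handles the sample layout, but miscounts two small documents that any XML parser reads as containing exactly one January record: one where an old record has been commented out, and one with legal whitespace inside the start tag (`<month >`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive "parser": count <month>NN</month> with a regex
sub count_months {
    my ($xml) = @_;
    my @counts = (0) x 12;
    $counts[$1 - 1]++ while $xml =~ m{<month>(0[1-9]|1[0-2])</month>}g;
    return \@counts;
}

# One real record plus a commented-out one: the regex counts both
my $commented = "<year_1978>\n"
              . "<record><month>01</month></record>\n"
              . "<!-- <record><month>01</month></record> -->\n"
              . "</year_1978>\n";

# Whitespace before '>' is legal XML, but the regex no longer matches
my $spaced = "<year_1978>\n<record><month >01</month></record>\n</year_1978>\n";

printf "commented: January = %d (an XML parser finds 1)\n",
    count_months($commented)->[0];   # 2
printf "spaced:    January = %d (an XML parser finds 1)\n",
    count_months($spaced)->[0];      # 0
```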

Quote:
> But now....it just doesn't matter! I'm at a point where I just want
> the darn thing to work! (sorry for the rant...just a bit frustrated).

> This is what I would like to accomplish:

> 1- get the total number of entries for each month
> 2- get the total number of "M" xray class per month
> 3- graph the results

Hmm, this requires keeping track of both month and xray class, and
updating the counts at the end of each record.


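my ($m, $x);
my (@stack, @data);                        # open elements and their text
my (@total_per_month, @xrays_per_month);
my $p = XML::Parser->new(Handlers=>{Start=>sub{
   push @stack, $_[1];
   push @data, "";
   ($m, $x) = (13, "") if $_[1] eq "record";   # reset at each new record
},End => sub{
   my $elem = pop @stack;
   my $data = pop @data;

   $m = $data if $elem eq "month";
   $x = $data if $elem eq "xray_class";

   # update the counts once the whole record has been seen
   if ($elem eq "record" && $m >= 1 && $m <= 12) {
      ++$total_per_month[$m-1];
      ++$xrays_per_month[$m-1] if $x eq "M";
   }
}, Char=>sub{
   $data[-1] .= $_[1];                     # text may arrive in pieces
}});

$p->parsefile("xray1978.xml");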
[untested]

Note that this is ceasing to be simple :)
You'd probably be better off using XML::Simple or XML::XPath at this
point, or switching to one of them pretty soon, anyway.

The XML::Parser version above is likely to be more efficient and faster,
but it's not so easy on the eyes :)

--
print reverse( ",rekcah", " lreP", " rehtona", " tsuJ" )."\n";



Sun, 19 Sep 2004 12:17:53 GMT  
 