preg_match 

OK, first of all, I'm looking for a manual about this!
I can't find a decent one on the internet... (it's very probable I'm looking
in the wrong direction...)
Not about preg_match, but about regular expressions!

My problem is this: I would like to copy some information from a site
(before people start nagging about copyright: I asked them, and it's OK).

The goal is to grab all the HTML code between certain HTML tags!
I successfully fetched the page, and now I have a variable
holding the whole HTML page!
(I printed it out, it's OK!)

Now I need to extract the code...
Let's take this as an example:
<?php
$HTML_page = "<html><head><title>blabla</title></head><body>this is some
<b>garbage</b>code with some HTML  and such<table><tr><td><b>this is what I
want</b>but not this</td></tr></table> some more junk code</body></html>";

$regularexpr = "/<tr><td><b>(.*)<\/b>.*<\/td><\/tr>/";
preg_match_all($regularexpr, $HTML_page, $Matches);

?>

And then I get... nothing! :(
Only when I do this:
echo $Matches[0];
do I see "Array", so I figured it must be inside an array, but
echo $Matches[0][0];
gave an empty string!

I'm sure the code is in the page, and I'm pretty sure I'm making some
stupid mistake due to my lack of knowledge :)

Any ideas?



Tue, 21 Jun 2005 07:53:15 GMT  
Try print_r($yourArrayName) to see what the array is....
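For illustration, here is a minimal sketch of what print_r() shows for a preg_match_all() result. The sample string and the variable names $html, $matches and $count are made up for this example and do not come from the page discussed in the thread.

```php
<?php
// Hypothetical sample input, not the page from the question.
$html = "<td><b>first</b></td><td><b>second</b></td>";

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full
// matches and $matches[1] holds what the first capture group caught.
$count = preg_match_all('/<b>(.*?)<\/b>/', $html, $matches);

// Dump the nested array so its structure is visible.
print_r($matches);
```

When nothing matches, $count is 0 and the sub-arrays are created but stay empty, which is why echoing $matches[0] only prints "Array".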


Tue, 21 Jun 2005 11:35:36 GMT  

Quote:
> Try print_r($yourArrayName) to see what the array is....

It's completely empty. It creates sub-arrays 0 and 1 but leaves them empty.


Tue, 21 Jun 2005 17:57:56 GMT  

Quote:

> ok, first of all, I'm looking for a manual about this!
> I can't find a decent one on the internet... (it's very probable I'm looking
> in the wrong direction...)
> not about preg_match, but about regular expressions!

These are about Perl regular expressions:
http://www.geocities.com/SiliconValley/Bay/3464/regex.htm
http://www.anaesthetist.com/mnm/perl/regex.htm

--

PHP POSTERS: Please use comp.lang.php for PHP related questions,
              alt.php* groups are not recommended.



Tue, 21 Jun 2005 21:51:37 GMT  

Quote:

> The goal is to grab all the HTML code between certain HTML tags!

Search the archives for this group and you'll find about a hundred different
examples for doing this. Even the PHP manual has one example of it, IIRC.

--
----- stephan beal
Registered Linux User #71917 http://counter.li.org
I speak for myself, not my employer. Contents may
be hot. Slippery when wet. Reading disclaimers makes
you go blind. Writing them is worse. You have been Warned.



Tue, 21 Jun 2005 21:57:42 GMT  
I've looked all over and couldn't find a working example...

But I made some progress on my own: I found out that the pattern will not
match across newlines and similar characters!
So here lies my problem: how do I remove them? I couldn't find it in the PHP
manual, because nl2br() only inserts an HTML <br> before each newline; it
does not remove the newline!

What do I do?
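One possible approach, sketched here under the assumption that simply flattening the page is acceptable, is str_replace() with an array of search strings. The sample fragment and the names $page and $flat are hypothetical.

```php
<?php
// Hypothetical fragment broken across lines, as a fetched page might be.
$page = "<b>this is what I\r\nwant</b>";

// str_replace() accepts an array of search strings and handles them in
// order: "\r\n" pairs first, then any stray "\r" or "\n" that remain.
// Each is replaced by a single space so adjacent words do not run together.
$flat = str_replace(array("\r\n", "\r", "\n"), " ", $page);

echo $flat;
```

Replacing with an empty string instead of a space removes the line breaks outright, at the cost of joining the words on either side.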





Wed, 22 Jun 2005 01:47:02 GMT  

I had a doozy of a time matching HTML from pages for a long time, and there
were 3 things that helped me.

1. Use non-greedy quantifiers: .*? is your friend.
2. If you have to match over multiple lines, add the 'ms' modifiers after the
closing delimiter, e.g. /fjk(.*?)kd/ms (the s is what lets . match newlines).
3. Normalize the webpage before you match against it. Here is what I
commonly do:
function normalize_page ( &$page ) {
    $page = preg_replace( "/\r/",       "",        $page); // strip out carriage returns
    $page = preg_replace( "/\n/",       "",        $page); // strip out newlines
    $page = preg_replace( "/\t/",       "",        $page); // bye bye tabs
  //$page = preg_replace( "/  /",       " ",       $page); // collapse double spaces; use with some caution
    $page = preg_replace( "|</(.*?)>|", "</$1>\n", $page); // put a newline after each closing tag
    $page = preg_replace( "|><|",       ">\n<",    $page); // also split adjacent tags onto separate lines
}

Normalizing a page before regexing it also helps when the HTML format
changes "just a little": a space here, or a tab/newline there, and your regex
is broken. Normalizing the data first helps keep your regexes working.
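As a rough sketch of tips 1 and 2 applied to the example from the original question: the string below is copied from that post, line breaks included, while the names $pattern and $m are made up here. This is one way to write it, not the only one.

```php
<?php
// The string from the original question, including the line breaks
// that come from the literal spanning several lines.
$HTML_page = "<html><head><title>blabla</title></head><body>this is some
<b>garbage</b>code with some HTML  and such<table><tr><td><b>this is what I
want</b>but not this</td></tr></table> some more junk code</body></html>";

// .*? is non-greedy; the trailing s modifier lets . match newlines too.
$pattern = '/<tr><td><b>(.*?)<\/b>.*?<\/td><\/tr>/s';

// preg_match() returns 1 on a match and fills $m with the captures.
if (preg_match($pattern, $HTML_page, $m)) {
    echo $m[1];
}
```

Without the s modifier this pattern finds nothing in that string, because there is a newline between "I" and "want" and a plain . stops at newlines.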

Just my $.02.

--Brian





Thu, 23 Jun 2005 08:48:12 GMT  
 