regular expression href replace question 

Given this string pulled from a domain called remotedomain.com

<html>
<body>
<a href=/dir/testdir/test.asp>test.asp</a>
<a HREF='higherdir/higherdir/uphigher.asp'>uphigher.asp</a>
<a HreF="http://www.someotherdomain.com/somedomain.asp">someotherdomain.asp</a>
</body>
</html>

I'd like to return the following to link to an internal parser page on our
domain.

<html>
<body>
<a href="myparserpage.asp?url=remotedomain.com/dir/testdir/test.asp">test.asp</a>
<a href="myparserpage.asp?url=remotedomain.com/dir/higherdir/uphigher.asp">uphigher.asp</a>
<a href="myparserpage.asp?url=http://www.someotherdomain.com/someotherdomain.asp">someotherdomain.asp</a>
</body>
</html>

Searching and replacing on multiple cases is giving me fits right
now. Surely, someone has done something like this before, hopefully!
Thanks for any insights.



Tue, 03 Feb 2004 12:17:38 GMT  
Use a regular expression with the g (global) and i (case-insensitive) options.

http://www.devguru.com/Technologies/ecmascript/quickref/string_replac...

Tim
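
For anyone reading along, here is a minimal JavaScript sketch of what the g and i options buy you. The function name rewriteHrefs and the parserPage argument are made up for illustration, and it only prefixes each href as found. It does not resolve relative paths against the remote domain, which still needs separate handling.

```javascript
// Hypothetical helper (not from the thread): prefix every anchor href
// with a parser-page URL.
function rewriteHrefs(html, parserPage) {
  // g = rewrite every match, i = treat href, HREF and HreF alike.
  // Group 1 is the tag text up to the attribute value; groups 2-4 hold the
  // value for the double-quoted, single-quoted and unquoted forms.
  const pattern = /(<a\b[^>]*?\shref\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  return html.replace(pattern, (match, prefix, dq, sq, bare) => {
    const url = (dq ?? sq ?? bare).trim();
    // Always emit a double-quoted attribute, whatever the input used.
    return `${prefix}"${parserPage}?url=${url}"`;
  });
}

const sample =
  "<a href=/dir/testdir/test.asp>test.asp</a>\n" +
  "<a HREF='higherdir/higherdir/uphigher.asp'>uphigher.asp</a>";
console.log(rewriteHrefs(sample, "myparserpage.asp"));
```

Using a replacer function rather than a fixed replacement string means all three quoting styles can be handled by one pattern.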

Quote:

> Given this string pulled from a domain called remotedomain.com

> <html>
> <body>
> <a href=/dir/testdir/test.asp>test.asp</a>
> <a HREF='higherdir/higherdir/uphigher.asp'>uphigher.asp</a>
> <a HreF="http://www.someotherdomain.com/somedomain.asp">someotherdomain.asp</a>
> </body>
> </html>

> I'd like to return the following to link to an internal parser page on our
> domain.

> <html>
> <body>
> <a href="myparserpage.asp?url=remotedomain.com/dir/testdir/test.asp">test.asp</a>
> <a
> href="myparserpage.asp?url=remotedomain.com/dir/higherdir/uphigher.asp">uphigher.asp</a>
> <a
> href="myparserpage.asp?url=http://www.someotherdomain.com/someotherdomain.asp">someotherdomain.asp</a>
> </body>
> </html>

> Searching and replacing on multiple cases is giving me fits right
> now. Surely, someone has done something like this before, hopefully!
> Thanks for any insights.



Tue, 03 Feb 2004 13:00:02 GMT  
What about the links[] array? Find each link, then set its href in JavaScript.

hth
Justin Dutoit
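
A rough sketch of the links[] idea, assuming the page is already loaded in a browser (so it would not apply to a string fetched on the server). The name retargetLinks and the parserPage argument are hypothetical. In a browser, link.href already reads back as an absolute URL, so no manual path resolution is needed here.

```javascript
// Hypothetical helper (not from the thread): point every link at the
// parser page. `links` is any array-like of objects with an href property,
// such as document.links in a browser.
function retargetLinks(links, parserPage) {
  for (const link of Array.from(links)) {
    // Encode the original URL so its own "?" and "&" survive as one value.
    link.href = `${parserPage}?url=${encodeURIComponent(link.href)}`;
  }
}

// In a browser: retargetLinks(document.links, "myparserpage.asp");
```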





Tue, 03 Feb 2004 13:07:00 GMT  

Quote:
> Use a regular expression with the g (global) and i (case-insensitive) options.

That wouldn't do it. If you look closely at the lines, you will notice that
there is no unique way of knowing what to replace. For instance, in the second
line the first 'higherdir' is to be replaced with '/dir'. Such substitutions can
only be done by actions guided by a set of rules.
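
One possible set of rules is ordinary relative-URL resolution, sketched below in JavaScript with the standard URL class. The function names and the page address http://remotedomain.com/dir/index.asp are assumptions for illustration; the thread only gives the domain name, not the page the HTML came from.

```javascript
// Hypothetical helper: resolve an href against the page it was found on.
// new URL(href, base) applies the usual rules: an absolute URL is kept,
// "/x" is resolved from the site root, and "x" from the page's directory.
function resolveHref(href, pageUrl) {
  return new URL(href.trim(), pageUrl).href;
}

// Hypothetical helper: wrap the resolved URL in a parser-page link.
function toParserLink(href, pageUrl, parserPage) {
  return `${parserPage}?url=${encodeURIComponent(resolveHref(href, pageUrl))}`;
}

// Assumed page address, not stated in the thread.
const page = "http://remotedomain.com/dir/index.asp";
for (const href of [
  "/dir/testdir/test.asp",
  "higherdir/uphigher.asp",
  "http://www.someotherdomain.com/somedomain.asp",
]) {
  console.log(toParserLink(href, page, "myparserpage.asp"));
}
```

The regular expression then only has to find the attribute values; deciding what each one becomes is left to the resolver.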

Quote:

> > <a href=/dir/testdir/test.asp>test.asp</a>
> > <a HREF='higherdir/higherdir/uphigher.asp'>uphigher.asp</a>
> > <a HreF="http://www.someotherdomain.com/somedomain.asp">someotherdomain.asp</a>

> > I'd like to return the following to link to an internal parser page on our
> > domain.

> > <a href="myparserpage.asp?url=remotedomain.com/dir/testdir/test.asp">test.asp</a>
> > <a href="myparserpage.asp?url=remotedomain.com/dir/higherdir/uphigher.asp">uphigher.asp</a>
> > <a href="myparserpage.asp?url=http://www.someotherdomain.com/someotherdomain.asp">someotherdomain.asp</a>

Best regards
Johnny Nielsen



Thu, 05 Feb 2004 17:17:30 GMT  

Okay, I've taken this down the road a bit, but I'm stuck at this point
and the code is not functioning as I would expect. The concept is to
return the HTML from a remote URL, parse the html and replace the
HREFs as I mentioned earlier in the thread.

<%
'example usage
'name page parserpage.asp
' call it as parserpage.asp?strURL=http://www.yahoo.com/

' Rewrites the URL captured by strPattern in every matching tag.
' The URL must be the pattern's only capture group and the last thing it
' matches, so each match ends exactly where its URL ends.
Function RewriteAttr(strHTML, strPattern, strType)
 Dim objRegExp, objMatch, strURL, strOut, lngPos
 Set objRegExp = New RegExp
 objRegExp.IgnoreCase = True
 objRegExp.Global = True
 objRegExp.Pattern = strPattern
 strOut = ""
 lngPos = 1
 For Each objMatch In objRegExp.Execute(strHTML)
  strURL = objMatch.SubMatches(0)
  ' Copy everything up to the URL unchanged, then append the rewritten URL.
  ' Rebuilding by position touches only this occurrence; Replace() on the
  ' whole page would also rewrite identical text elsewhere.
  strOut = strOut & Mid(strHTML, lngPos, objMatch.FirstIndex + objMatch.Length - Len(strURL) - lngPos + 1)
  strOut = strOut & ChangeLink(strURL, strType)
  lngPos = objMatch.FirstIndex + objMatch.Length + 1
 Next
 Set objRegExp = Nothing
 RewriteAttr = strOut & Mid(strHTML, lngPos)
End Function

' [^>]*? keeps each match inside a single tag; the original ".*" was greedy
' and could run across several tags on one line.
Function MatchHref(strHTML)
 strHTML = RewriteAttr(strHTML, "<a\b[^>]*?\shref\s*=\s*['""]?([^'""\s>]+)", "href")
 strHTML = MatchImagePath(strHTML)
 strHTML = MatchCSSPath(strHTML)
 ' MatchScriptPath was defined but never called in the original.
 MatchHref = MatchScriptPath(strHTML)
End Function

Function MatchImagePath(strHTML)
 MatchImagePath = RewriteAttr(strHTML, "<img\b[^>]*?\ssrc\s*=\s*['""]?([^'""\s>]+)", "image")
End Function

Function MatchCSSPath(strHTML)
 MatchCSSPath = RewriteAttr(strHTML, "<link\b[^>]*?\shref\s*=\s*['""]?([^'""\s>]+)", "css")
End Function

' Script tags carry their URL in src, not href.
Function MatchScriptPath(strHTML)
 MatchScriptPath = RewriteAttr(strHTML, "<script\b[^>]*?\ssrc\s*=\s*['""]?([^'""\s>]+)", "script")
End Function

Function ChangeLink(strIn,strType)
  Dim objRegExp, objMatches, objMatch, objSubMatches
  Dim strHTTPURI, strHTTP, strDomainURI, strDomain, strPath, strQuery, strQueryPair
  Dim objRegExp2, objMatches2, objMatch2, objSubMatches2
  Dim strHTTPURI2, strHTTP2, strDomainURI2, strDomain2, strPath2, strQuery2, strQueryPair2
  Dim blnDebug
  Set objRegExp = New RegExp  
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
  Set objMatches = objRegExp.Execute(strIn)
  Set objMatch = objMatches(0)
  strHTTPURI = objMatch.SubMatches(0)   'matches "http:"
  strHTTP = objMatch.SubMatches(1)    'matches "http"
  strDomainURI = objMatch.SubMatches(2)  'matches "//www.somedomain.com"
  strDomain = objMatch.SubMatches(3)   'matches "www.somedomain.com"
  strPath = objMatch.SubMatches(4)    'matches path
  strQuery = objMatch.SubMatches(5)    'matches entire querystring
  strQueryPair = objMatch.SubMatches(6)  'matches querystring pairs only
  Set objRegExp = Nothing

  'Debug
  'blnDebug = True
  If blnDebug = True Then
  Response.Write "<br>strHTTPURI: <b>" & strHTTPURI & "</b><br>"
  Response.Write "strHTTP: <b>" & strHTTP & "</b><br>"
  Response.Write "strDomainURI: <b>" & strDomainURI & "</b><br>"
  Response.Write "strDomain: <b>" & strDomain & "</b><br>"
  Response.Write "strPath: <b>" & strPath & "</b><br>"
  Response.Write "strQuery: <b>" & strQuery & "</b><br>"
  Response.Write "strQueryPair: <b>" & strQueryPair & "</b><br>"
 End If

  Set objRegExp2 = New RegExp  
  objRegExp2.IgnoreCase = True
  objRegExp2.Global = True
  objRegExp2.Pattern = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
  Set objMatches2 = objRegExp2.Execute(Trim(Request("strURL")))
  Set objMatch2 = objMatches2(0)
  strHTTPURI2 = objMatch2.SubMatches(0)  'matches "http:"
  strHTTP2 = objMatch2.SubMatches(1)    'matches "http"
  strDomainURI2 = objMatch2.SubMatches(2) 'matches "//www.somedomain.com"
  strDomain2 = objMatch2.SubMatches(3)   'matches "www.somedomain.com"
  strPath2 = objMatch2.SubMatches(4)    'matches path
  strQuery2 = objMatch2.SubMatches(5)   'matches entire querystring
  strQueryPair2 = objMatch2.SubMatches(6) 'matches querystring pairs only
  If strHTTPURI = "" Then
    strHTTPURI = strHTTPURI2
  End If
  If strDomainURI = "" Then
    strDomainURI = strDomainURI2
  Else
    If strPath = "" Then
      strPath = "/"
    End If
  End If
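  If strPath = "" Then
    strPath = strPath2
  End If
  If Left(strPath,1) <> "/" Then
    ' Relative link: resolve it against the directory of the requested page.
    strPath = Left(strPath2, InStrRev(strPath2, "/")) & strPath
    If Left(strPath,1) <> "/" Then strPath = "/" & strPath
  End If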
  strRemotePath = strHTTPURI & strDomainURI & strPath & strQuery

 Select Case strType
  Case "image", "css", "script"
    ChangeLink = strRemotePath

  Case "href"  
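     ' Page name matches the usage note above; URLEncode keeps the remote
     ' URL's own "?" and "&" from being read as parserpage.asp parameters.
     ChangeLink = "parserpage.asp?strURL=" & Server.URLEncode(strRemotePath)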

   Case Else
     ChangeLink = strRemotePath

 End Select  

End Function

Function GetRemoteHTML(strURL)
  On Error Resume Next
  If strURL = "" Then Response.Write "The querystring value of strURL must not be empty." : Response.End
  Dim strHTML, objHTTP

  'Create the WinHttp ActiveX Object.
  Set objHTTP = Server.CreateObject("WinHttp.WinHttpRequest.5")
  Call objHTTP.Open("GET", strURL, False)
  Call objHTTP.setRequestHeader("Content-Type", "application/x-www-form-urlencoded")
 objHTTP.Option(0) = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
  'objHTTP.Option(6) = True
  'Send the HTTP request.
  objHTTP.Send()
  If Err.number <> 0 Then
    Response.Write("Error encountered. The error information is " & _
    "<br>Error Number: " & Err.number & " " & _
    "<br>Error Description: " & Err.Description & " " & _
    "<br>Error Source: " & Err.Source)
    Err.Clear
    Response.End
  End If
  'If objHTTP.status = "302" Then
    'Response.Write("Redirects are not supported with this release.")
    'strLocation = objHTTP.GetResponseHeader("Location")
    'Response.Redirect("parserpage.asp?strURL=" & strLocation)
  'End If

  'Retrieve the response text.
  strHTML = objHTTP.ResponseText
  Destroy(objHTTP)
  GetRemoteHTML = strHTML
End Function

Sub Destroy(obj)
  If IsObject(obj) Then
    If Not (obj Is Nothing) Then Set obj = Nothing
  End If
End Sub
strRemoteHTML = GetRemoteHTML(Request("strURL"))
strRemoteHTML = MatchHref(strRemoteHTML)
Response.Write strRemoteHTML
%>



Fri, 06 Feb 2004 11:17:52 GMT  
 