Splitting a URL into components, how? 

Hi!

I'm trying to write a function that will split a given URL or file path into
components such as protocol, server, directory path, filename and extension,
but I'm not being very successful. Here are examples of what I want to
accomplish:

    http://www.microsoft.com/windows/directx/default.asp
    protocol = http://
    server = www.microsoft.com
    dir path = /windows/directx/
    filename = default
    extension = .asp

    http://localhost/index.htm
    protocol = http://
    dir path = /
    filename = index
    extension = .htm

    /somedir/somesubdir/
    dir path = /somedir/somesubdir/

    /index.asp
    dir path = /
    filename = index
    extension = .asp

Of course, if you have a solution that comes close to what I need, that
would be very interesting as well. Right now I'm trying to figure out a
regular expression that would take care of it. (But, dang, it doesn't work.)

I would like to use this function to handle the results of
Request.ServerVariables("http_referer"), ...("script_name") and so on.
Beyond that, it's fine to put constraints on the input.

--
Best regards
Tomas Eklund, Sweden



Sat, 19 Feb 2005 18:12:16 GMT  
Hi

"SCRIPT_NAME" gives "/scriptname.asp"
For the others, I would take off the "protocol" and split() the rest

--
Best Regards
  Vidar Petursson
 ==============================
Microsoft Internet Client & Controls MVP
 ==============================
 http://www.icysoft.com/


 ==============================
  No matter where you go there you are
 ==============================




Sat, 19 Feb 2005 22:16:27 GMT  
Hi Vidar and thanks, but that "solution" doesn't come close enough.

I'd like the function to handle, at least, URLs/paths similar to the four
examples I presented.

Just splitting at slashes (/) won't give much clue as to what is what
(server, path, filename etc.). Of course, with a little bit of handling you
might be able to figure it out, but then again - that really is the tricky
part. And the tricky part is what I'm asking about.
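
For illustration, the "little bit of handling" on top of a plain split could look roughly like the sketch below. The function name splitUrl and its field names are made up here, and it assumes the input either starts with a protocol ("scheme://") or with a slash, and carries no query string or anchor.

```javascript
// Hypothetical helper: splits a URL or root-relative path by hand.
// Assumes no query string ("?...") or anchor ("#...") in the input.
function splitUrl(sUrl) {
    var oParts = { protocol: "", server: "", dirPath: "",
                   filename: "", extension: "" };
    var sRest = sUrl;

    // 1. Take off the protocol, if there is one.
    var iProto = sRest.indexOf("://");
    if (iProto != -1) {
        oParts.protocol = sRest.substring(0, iProto + 3);
        sRest = sRest.substring(iProto + 3);

        // 2. With a protocol present, everything up to the next
        //    slash is the server.
        var iSlash = sRest.indexOf("/");
        if (iSlash == -1) {
            oParts.server = sRest;
            sRest = "";
        } else {
            oParts.server = sRest.substring(0, iSlash);
            sRest = sRest.substring(iSlash);
        }
    }

    // 3. Split the remainder at slashes. The last piece is the file
    //    (empty if the path ends in a slash), the rest is the dir path.
    var aPieces = sRest.split("/");
    var sFile = aPieces.pop();
    if (aPieces.length > 0) {
        oParts.dirPath = aPieces.join("/") + "/";
    }

    // 4. The extension starts at the last dot of the file name.
    var iDot = sFile.lastIndexOf(".");
    if (iDot == -1) {
        oParts.filename = sFile;
    } else {
        oParts.filename = sFile.substring(0, iDot);
        oParts.extension = sFile.substring(iDot);
    }
    return oParts;
}
```

Because the extension is taken from the last dot, a name like "/file.001.jpg" would come out as filename "file.001" and extension ".jpg" with this approach.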

Best regards
Tomas Eklund, Sweden



Sun, 20 Feb 2005 01:32:16 GMT  

You may want to try the following:

    <html>
    <head>
    <script language="javascript">
    function checkUrl(sUrl) {
        var oReUrl = new RegExp("^" +
                                "([^:]+://)?" +   // Protocol
                                "([^/]+)?" +      // Host
                                "([^?#]*/)?" +    // Path (stops before ? or #)
                                "([^.?#]+)?" +    // Name
                                "(\\.[^.?#]+)?" + // Ext
                                "(\\?[^#]*)?" +   // Query
                                "(#.*)?" +        // Anchor
                                "$");
        var aMatch = oReUrl.exec(sUrl);
        if (aMatch) {
            alert("URL:\t" + sUrl + "\n" +
                  "Protocol:\t" + aMatch[1] + "\n" +
                  "Host:\t" + aMatch[2] + "\n" +
                  "Path:\t" + aMatch[3] + "\n" +
                  "Name:\t" + aMatch[4] + "\n" +
                  "Ext:\t" + aMatch[5] + "\n" +
                  "Query:\t" + aMatch[6] + "\n" +
                  "Anchor:\t" + aMatch[7]);
        } else {
            alert("Invalid URL: " + sUrl);
        }
    }
    checkUrl("http://www.microsoft.com/windows/directx/default.asp");
    checkUrl("http://localhost/index.htm");
    checkUrl("/somedir/somesubdir/");
    checkUrl("/index.asp");
    checkUrl("/index.asp?p1=first&p2=second#resume");
    </script>
    </head>
    </html>

Boniface



Sun, 20 Feb 2005 08:17:16 GMT  
Hi Boniface and thanks!

Wow! I was getting closer and closer to a solution (I had it almost working)
but your solution was superior, of course. And I really like your coding
style. It's really nice. Thanks again!

If I want to be able to handle file names such as "/file.001.jpg", would
that require special handling or is it possible to build into the regexp?
It's not really an important thing, I'm just wondering.

Of course I could change the extension part of the regexp to this:

    "(\\.[^?#]+)?" // Ext

(by removing the dot from the negated character class), but that would yield:

    Name:    file
    Ext:    .001.jpg

which is not the perfect solution (I want the .001 to go on the name part).
Well, I guess it's as simple as using this regexp and then checking the
extension for multiple dots and moving anything preceding the last dot to
the file name. (Not that I create files with dots in them... I just want to
handle it.)

Well, back to coding...

Best regards
Tomas Eklund, Sweden



Sun, 20 Feb 2005 17:53:45 GMT  

[...]

Quote:
> If I want to be able to handle file names such as "/file.001.jpg",
> would that require special handling or is it possible to build into
> the regexp?

Tomas, glad to know that you found the code readable. To handle dots
in a file name, here is the revised code:

    <html>
    <head>
    <script language="javascript">
    function checkUrl(sUrl) {
        var oReUrl = new RegExp("^" +
                                "([^:]+://)?" +   // Protocol
                                "([^/]+)?" +      // Host
                                "([^?#]*/)?" +    // Path (stops before ? or #)
                                "(" +
                                  // First try name with
                                  // mandatory ext.
                                  "([^?#]+)?" +   // Name
                                  "(\\.[^?#]+)" + // Ext
                                  "|" +            
                                  // Ext not found.
                                  "([^?#]+)?" +   // Name
                                ")" +
                                "(\\?[^#]*)?" +   // Query
                                "(#.*)?" +        // Anchor
                                "$");
        var aMatch = oReUrl.exec(sUrl);
        if (aMatch) {
            var sName = (aMatch[5] ? aMatch[5] : aMatch[7]);
            alert("URL:\t" + sUrl + "\n" +
                  "Protocol:\t" + aMatch[1] + "\n" +
                  "Host:\t" + aMatch[2] + "\n" +
                  "Path:\t" + aMatch[3] + "\n" +
                  "Name:\t" + sName + "\n" +
                  "Ext:\t" + aMatch[6] + "\n" +
                  "Query:\t" + aMatch[8] + "\n" +
                  "Anchor:\t" + aMatch[9]);
        } else {
            alert("Invalid URL: " + sUrl);
        }
    }
    checkUrl("http://www.microsoft.com/windows/directx/default.asp");
    checkUrl("http://localhost/index.htm");
    checkUrl("/somedir/somesubdir/");
    checkUrl("/index.asp");
    checkUrl("/file.001.jpg");
    checkUrl("/index.asp?p1=first&p2=second#resume");
    </script>
    </head>
    </html>
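
Since alert() is only available in the browser, a variant that returns the parts may be handier for server-side use, such as the Request.ServerVariables values mentioned earlier in the thread. The sketch below is based on the revised pattern above. The name parseUrl and the property names are made up for illustration, unmatched groups are returned as empty strings, and the path group is written so that it stops before any "?" or "#".

```javascript
// Hypothetical variant of checkUrl: returns the parts as an object
// instead of showing them, or null if the pattern does not match.
function parseUrl(sUrl) {
    var oReUrl = new RegExp("^" +
        "([^:]+://)?" +     // 1: Protocol
        "([^/]+)?" +        // 2: Host
        "([^?#]*/)?" +      // 3: Path (stops before ? or #)
        "(" +               // 4: Name and Ext together
          "([^?#]+)?" +     // 5: Name, when an Ext follows
          "(\\.[^?#]+)" +   // 6: Ext
          "|" +
          "([^?#]+)?" +     // 7: Name, when there is no Ext
        ")" +
        "(\\?[^#]*)?" +     // 8: Query
        "(#.*)?" +          // 9: Anchor
        "$");
    var aMatch = oReUrl.exec(sUrl);

    // Unmatched groups come back as undefined or "", depending on the
    // script engine, so normalize both to "".
    function part(i) {
        return aMatch[i] ? aMatch[i] : "";
    }

    if (!aMatch) {
        return null;
    }
    return {
        protocol: part(1),
        host:     part(2),
        path:     part(3),
        name:     part(5) ? part(5) : part(7),
        ext:      part(6),
        query:    part(8),
        anchor:   part(9)
    };
}

// parseUrl("/file.001.jpg").name gives "file.001"
// parseUrl("/file.001.jpg").ext  gives ".jpg"
```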

Boniface



Mon, 21 Feb 2005 08:09:21 GMT  
 