HTTP::Request encoding query 

Hi,

Take the following HTTP::Request :

$tgt = 'http://www.acme.com/cgi/test.cgi?val=bar|';
$req = new HTTP::Request('GET' => $tgt);

Now, when this request is made at some point it is encoded (in iso latin 1 I
think) into a HTTP request that looks like

GET http://www.acme.com/cgi/test.cgi?val=bar%7C

where '|' has become '%7C'

So what I'd like to know is if there is any way to prevent this encoding
taking place?  I don't mind changing perl source and recompiling if someone
can point me in the right direction, though obviously I'd prefer not to :)

This is under ActivePerl 5.22 (5.005_03) on Win32.  Thanks for any help.

--
Iain

Sent via Deja.com
http://www.deja.com/



Tue, 01 Jul 2003 05:57:15 GMT  

Quote:

> Hi,

> Take the following HTTP::Request :

> $tgt = 'http://www.acme.com/cgi/test.cgi?val=bar|';
> $req = new HTTP::Request('GET' => $tgt);

> Now, when this request is made at some point it is encoded (in iso latin 1 I
> think) into a HTTP request that looks like

> GET http://www.acme.com/cgi/test.cgi?val=bar%7C

> where '|' has become '%7C'

> So what I'd like to know is if there is any way to prevent this encoding
> taking place?  I don't mind changing perl source and recompiling if

If it wasn't encoded, it wouldn't be a valid URL, and you might get
a "Malformed Request" response.  Why do you want to do this?

Sent via Deja.com
http://www.deja.com/



Tue, 01 Jul 2003 06:59:30 GMT  

Quote:



> > Hi,

> > Take the following HTTP::Request :

> > $tgt = 'http://www.acme.com/cgi/test.cgi?val=bar|';
> > $req = new HTTP::Request('GET' => $tgt);

> > Now, when this request is made at some point it is encoded (in iso latin 1 I
> > think) into a HTTP request that looks like

> > GET http://www.acme.com/cgi/test.cgi?val=bar%7C

> > where '|' has become '%7C'

> > So what I'd like to know is if there is any way to prevent this encoding
> > taking place?  I don't mind changing perl source and recompiling if

> If it wasn't encoded, it wouldn't be a valid URL, and you might get
> a "Malformed Request" response.  Why do you want to do this?

The cgi-bin script I'm trying to work with (which I can't change) does not
unencode its input i.e. change '%7C' back to '|', so I need to send the
cgi-bin arguments un-encoded. Incidentally, I've been capturing IE and Opera
HTTP GET requests, and they don't encode '|' at all, whether in the URL or as
cgi arguments.
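
For reference, the decoding step that the script appears to skip is short. What follows is only a minimal sketch in plain Perl, assuming the usual form-encoding rules; decode_value is a name made up for this illustration and is not part of test.cgi or of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: undo the escaping of one query-string value.
sub decode_value {
    my ($value) = @_;
    $value =~ tr/+/ /;                               # "+" stands for a space in form data
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %7C becomes "|", %20 becomes " "
    return $value;
}

print decode_value('bar%7C'), "\n";
```

A script that runs its arguments through something like this sees the same data whether the client sent '|' or '%7C'.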

--
Iain

Sent via Deja.com
http://www.deja.com/



Tue, 01 Jul 2003 19:39:18 GMT  

Quote:

> Take the following HTTP::Request :

> $tgt = 'http://www.acme.com/cgi/test.cgi?val=bar|';
> $req = new HTTP::Request('GET' => $tgt);

> Now, when this request is made at some point it is encoded (in iso latin 1 I
> think) into a HTTP request that looks like

> GET http://www.acme.com/cgi/test.cgi?val=bar%7C

> where '|' has become '%7C'

> So what I'd like to know is if there is any way to prevent this encoding
> taking place?

Wouldn't it make more sense to fix the bug in test.cgi?

Quote:
>  I don't mind changing perl source and recompiling if someone
> can point me in the right direction, though obviously I'd prefer not to :)

ISTR that the URI::Escape module actually keeps the characters to be
substituted in a package variable, so you can tweak that variable
directly after loading URI::Escape.  For details, see the source.

BTW: Why do you want to make your LWP not conform to RFC2396?

--
     \\   ( )
  .  _\\__[oo

 .  l___\\
  # ll  l\\
 ###LL  LL\\



Wed, 02 Jul 2003 02:07:03 GMT  

(excess quotage now removed)

Quote:
> The cgi-bin script I'm trying to work with (which I can't change) does not
> unencode its input i.e. change '%7C' back to '|',

Then that's your real problem, but it has nothing to do with
programming in Perl.  It would be more on-topic for a group with CGI
in its name.

Quote:
> so I need to send the cgi-bin arguments un-encoded.

No, you don't.  You need a working script, and on the basis of your
postings, it's clear that you haven't got one.

You seem blithely unaware that there is a published interworking
specification, and that you are using a script that doesn't follow
that specification.  On that basis, there could well be a lot of other
things wrong with the script, including security exposures.

You'd be well advised to carefully study the relevant FAQs.

(I'm thinking it's time I added my-deja to at least the scorefile, if
not the killfile.)



Wed, 02 Jul 2003 02:41:34 GMT  

Quote:


>> The cgi-bin script I'm trying to work with (which I can't change) does not
>> unencode its input i.e. change '%7C' back to '|',

>Then that's your real problem, but it has nothing to do with
>programming in Perl.  It would be more on-topic for a group with CGI
>in its name.

Hardly, IMHO, unless the OP is planning to write his own script
instead of accessing an existing one.

Quote:
>You seem blithely unaware that there is a published interworking
>specification, and that you are using a script that doesn't follow
>that specification.  On that basis, there could well be a lot of other
>things wrong with the script, including security exposures.

I considered a similar reply, but reading the post more carefully
suggested that the OP was trying to access an existing, broken service
that he has no control over.

Now, if I were in the OP's situation, I would certainly write to the
maintainers of the broken script and tell them to fix it.  But it's
quite possible he has already done that, and they just don't care.

On a completely unrelated issue, I would actually suggest that
HTTP::Request should accept anything given to it as an URI, rather
than trying to fix encoding problems after they've already happened.
RFC 2396, section 2.4.2, has something to say about this:

   Normally, the only time escape encodings can safely be made is when
   the URI is being created from its component parts; each component
   may have its own set of characters that are reserved, so only the
   mechanism responsible for generating or interpreting that component
   can determine whether or not escaping a character will change its
   semantics.

No, it doesn't address the issue directly.  But it does point out that
any encoding done after the URI has been composed should be pointless,
and any cases where it isn't should be considered broken.

--
Ilmari Karonen - http://www.sci.fi/~iltzu/
"Of course, it's filled up with rolls of toilet paper (bought in bulk from
 Costco) and leaky inflatable mattresses that we really ought to throw away,
 but it's a linen closet."  -- Dorothy J. Heydt in rec.arts.sf.composition



Wed, 02 Jul 2003 07:52:51 GMT  

     (rest of excerpt from RFC 2396 added)

        2.4.2. When to Escape and Unescape

        A URI is always in an "escaped" form, since escaping or
        unescaping a completed URI might change its semantics.

Quote:
>>    Normally, the only time escape encodings can safely be made is when
>>    the URI is being created from its component parts; each component
>>    may have its own set of characters that are reserved, so only the
>>    mechanism responsible for generating or interpreting that component
>>    can determine whether or not escaping a character will change its
>>    semantics.

> No, it doesn't address the issue directly.  

Yes it does!

        2.4.3. Excluded US-ASCII Characters
...
        Other characters are excluded because gateways and other
        transport agents are known to sometimes modify such characters,
        or they are used as delimiters.

           unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

        Data corresponding to excluded characters must be escaped
        in order to be properly represented within a URI.
...

I'd say that's a pretty direct statement that "|" is forbidden
in a URI.  
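
To make the rule concrete, here is a rough sketch that escapes only the "unwise" characters listed above. It is an illustration in plain Perl, not what LWP or URI::Escape actually does internally, and escape_unwise is a name invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The "unwise" set from RFC 2396, section 2.4.3, made safe for a character class.
my $unwise = quotemeta('{}|\\^[]`');

# Hypothetical helper: percent-escape only the unwise characters in some data.
sub escape_unwise {
    my ($data) = @_;
    $data =~ s/([$unwise])/sprintf('%%%02X', ord($1))/ge;
    return $data;
}

print escape_unwise('bar|'), "\n";
```

A real encoder would also have to deal with the other excluded characters (control characters, space and the delimiters), which this sketch leaves alone.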

Quote:
> But it does point out that any encoding done after the URI has
> been composed should be pointless, and any cases where it isn't
> should be considered broken.

Which is why the RFC makes no effort to define an "escaped" URI.
URI's are *always* in escaped form.

--
Joe Schaefer



Wed, 02 Jul 2003 08:20:28 GMT  

Quote:


> >> The cgi-bin script I'm trying to work with (which I can't change) does not
> >> unencode its input i.e. change '%7C' back to '|',

> >Then that's your real problem, but it has nothing to do with
> >programming in Perl.  It would be more on-topic for a group with CGI
> >in its name.

> Hardly, IMHO, unless the OP is planning to write his own script
> instead of accessing an existing one.

OK, it's a fair call.  I still say the _real_ problem is that he's
working with a broken script, but as you say, it might be outside of
his control.

And since in this specific case the client software _is_ under his
control, I can see that I'd taken the wrong line with the original
posting.  My apologies.



Wed, 02 Jul 2003 08:17:12 GMT  

Quote:


> > Take the following HTTP::Request :

> > $tgt = 'http://www.acme.com/cgi/test.cgi?val=bar|';
> > $req = new HTTP::Request('GET' => $tgt);

> > Now, when this request is made at some point it is encoded (in iso latin 1 I
> > think) into a HTTP request that looks like

> > GET http://www.acme.com/cgi/test.cgi?val=bar%7C

> > where '|' has become '%7C'

> > So what I'd like to know is if there is any way to prevent this encoding
> > taking place?

> Wouldn't it make more sense to fix the bug in test.cgi?

> >  I don't mind changing perl source and recompiling if someone
> > can point me in the right direction, though obviously I'd prefer not to :)

> ISTR that the URI::Escape module actually keeps the characters to be
> substituted in a package variable, so you can tweak that variable
> directly after loading URI::Escape.  For details, see the source.

> BTW: Why do you want to make your LWP not conform to RFC2396?

The cgi-bin script I'm working with is out of my control, so I've no choice
but to modify the client instead.  And thanks for this pointer, I'll have a
dig through the source and try this out.

--
Iain

Sent via Deja.com
http://www.deja.com/



Wed, 02 Jul 2003 19:47:13 GMT  

Quote:


>> But it does point out that any encoding done after the URI has
>> been composed should be pointless, and any cases where it isn't
>> should be considered broken.

>Which is why the RFC makes no effort to define an "escaped" URI.
>URI's are *always* in escaped form.

I snipped the rest of your points because this is the one I wanted to
originally address.  In essence, RFC2396, in defining how an URI is to
be constructed, also defines the set of strings that this construction
process can yield.

Faced with a string outside this set, HTTP::Request, which _only_ sees
the finished product of the construction process, has three options:

 a) Complain and refuse to treat the string as an URI.

 b) GIGO: accept the string as is, possibly emitting a warning.

 c) Try to fix the string by coercing it to the set.

Of those, c is the current behaviour.  I would claim that it is in
fact the worst one: it hides bugs instead of fixing or reporting them,
and it also loses to b in its ability to accommodate broken decoders.

Perhaps it would be ideal if the default behavior were a, with some
option to switch it to b for this specific purpose.
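
As a rough sketch of what that could look like (this is not how HTTP::Request behaves, and accept_uri and its lenient flag are invented for the illustration), a check along these lines dies by default and only passes a bad string through when asked to. It looks at characters only and makes no attempt to validate the full URI grammar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters RFC 2396 allows in a URI reference: reserved, unreserved,
# "#" for the fragment, and %XX escapes.
my $URIC = qr{(?:[A-Za-z0-9;/?:\@&=+\$,\-_.!~*'()#]|%[0-9A-Fa-f]{2})};

# Hypothetical helper, not part of HTTP::Request.
#   default       : option a, die on a string outside the set
#   lenient => 1  : option b, warn and pass the string through as is
sub accept_uri {
    my ($str, %opt) = @_;
    return $str if $str =~ /\A$URIC*\z/;
    die "not a valid URI: $str\n" unless $opt{lenient};
    warn "passing malformed URI through unchanged: $str\n";
    return $str;
}

my $bad = 'http://www.foo.com/cgi-bin/test?bar=|';
print "strict:  ", (eval { accept_uri($bad) } ? "accepted" : "rejected"), "\n";
print "lenient: ", accept_uri($bad, lenient => 1), "\n";
```

Either way the caller finds out that the string was malformed, which is the point of preferring a or b over silent coercion.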

--
Ilmari Karonen -- http://www.sci.fi/~iltzu/
"Get real!  This is a discussion group, not a helpdesk.  You post
 something, we discuss its implications.  If the discussion happens to
 answer a question you've asked, that's incidental." -- nobull in clpm



Wed, 02 Jul 2003 20:31:14 GMT  


[cut]

Quote:

>         2.4.3. Excluded US-ASCII Characters
> ...
>         Other characters are excluded because gateways and other
>         transport agents are known to sometimes modify such characters,
>         or they are used as delimiters.

>            unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

>         Data corresponding to excluded characters must be escaped
>         in order to be properly represented within a URI.
> ...

> I'd say that's a pretty direct statement that "|" is forbidden
> in a URI.

No arguments there.  Having a brief look over RFC 2396 did seem to suggest
though that a URI only identifies the resource and not the context i.e a URI
would be 'www.foo.com/cgi-bin/test.cgi' and would exclude any arguments to
the script i.e. 'www.foo.com/cgi-bin/test.cgi?attr=val', hence the arguments
would be considered outside the URI?  This is only my take on it though.

As a side note, both Opera 4.0 and IE 5.0 ignore this RFC, they send a '|'
unescaped.

--
Iain

Sent via Deja.com
http://www.deja.com/



Wed, 02 Jul 2003 20:20:40 GMT  


Quote:


> >> The cgi-bin script I'm trying to work with (which I can't change) does not
> >> unencode its input i.e. change '%7C' back to '|',

> >Then that's your real problem, but it has nothing to do with
> >programming in Perl.  It would be more on-topic for a group with CGI
> >in its name.

> Hardly, IMHO, unless the OP is planning to write his own script
> instead of accessing an existing one.

> >You seem blithely unaware that there is a published interworking
> >specification, and that you are using a script that doesn't follow
> >that specification.  On that basis, there could well be a lot of other
> >things wrong with the script, including security exposures.

> I considered a similar reply, but reading the post more carefully
> suggested that the OP was trying to access an existing, broken service
> that he has no control over.

This is indeed the case.  I have no control over the service, so I am forced
to adapt the client.  Rather than stating 'I couldn't change' the cgi script
I should have perhaps made it clearer it was outside my control.

--
Iain

Sent via Deja.com
http://www.deja.com/



Wed, 02 Jul 2003 20:22:52 GMT  

Quote:

> No arguments there.  Having a brief look over RFC 2396 did seem
> to suggest though that a URI only identifies the resource and not
> the context i.e. a URI would be 'www.foo.com/cgi-bin/test.cgi' and
> would exclude any arguments to the script i.e.
> 'www.foo.com/cgi-bin/test.cgi?attr=val', hence the arguments would
> be considered outside the URI?  This is only my take on it though.

RFC 2396 applies to the query string as well. Section 3.4 gives the
specification, but it's quite brief. Here is an excerpt from the intro
to section 3:

 This "generic URI" syntax consists of a sequence of four main
 components:

      <scheme>://<authority><path>?<query>

Quote:
> As a side note, both Opera 4.0 and IE 5.0 ignore this RFC, they
> send a '|' unescaped.

I guess this is the real issue.  Unfortunately not enough browsers/CGI
scripts are fully compliant with the specification.  IME many CPAN
modules are usually flexible enough to allow you to do the Wrong Thing
if you really need to. In a followup to my post, Ilmari points out
that HTTP::Request appears to be less flexible in this regard, and
inadequately so for your needs.  DWIMery seems to work against you here.

When this happens, you aren't necessarily powerless to "fix" the
problem yourself.  If the code is OO, by identifying the offending
subroutine/data structure and "overriding" it, you can usually solve
the problem.  In your case it's a twisty path, but you could try
something like this

% cat try.pl
#!/usr/bin/perl -wT
use strict;

use HTTP::Request;
my $r = new HTTP::Request GET => "http://www.foo.com/cgi-bin/test?bar=|";
${$r->uri->[0]} =~ s/%7C/|/;# force broken query string

print $r->as_string; #test request string

% try.pl
GET http://www.foo.com/cgi-bin/test?bar=|

%

or you could encapsulate the brokenness into a package (untested):

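#!/usr/bin/perl -wT
use strict;

package HTTP::Request::Broken;
use HTTP::Request;
use base 'HTTP::Request';   # inherit as_string() and the other methods

sub new {
        my $class = shift;

        # build an ordinary request first; "|" is escaped to %7C here
        my $r = HTTP::Request->new(@_);

        # "fixes" go here
        # (this pokes at the URI object's internals, so it may need
        # adjusting for other LWP versions)
        ${$r->uri->[0]} =~ s/%7C/|/; # force broken "|"

        bless $r, ref $class || $class;
}

package main;
# no "use HTTP::Request::Broken;" here: the package is defined above
# in this same file, so there is no module file to load

my $r = new HTTP::Request::Broken GET =>
        "http://www.foo.com/cgi-bin/test?bar=|";

print $r->as_string;

__END__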

HTH
--
Joe Schaefer



Thu, 03 Jul 2003 02:49:27 GMT  


Quote:

> RFC 2396 applies to the query string as well. Section 3.4 gives the
> specification, but it's quite brief. Here is an excerpt from the intro
> to section 3:

>  This "generic URI" syntax consists of a sequence of four main
>  components:

>       <scheme>://<authority><path>?<query>

You're right, I didn't spot this, sorry.
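
For anyone else who misses it: the four components quoted above can be pulled apart with the regular expression given in Appendix B of RFC 2396. The sketch below uses that expression with the outer groups made non-capturing; split_uri is just a name chosen for this example, and the code is plain Perl rather than anything taken from the URI module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a URI reference using the RFC 2396 Appendix B pattern.
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?};
    return (
        scheme    => $scheme,
        authority => $authority,
        path      => $path,
        query     => $query,      # the part after "?", still inside the URI
        fragment  => $fragment,
    );
}

my %part = split_uri('http://www.foo.com/cgi-bin/test.cgi?attr=val');
print "$_ = ", (defined $part{$_} ? $part{$_} : '(none)'), "\n"
    for qw(scheme authority path query fragment);
```

The query comes out as its own component of the same URI, which is why the escaping rules apply to the script's arguments too.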

Quote:
> When this happens, you aren't necessarily powerless to "fix" the
> problem yourself.  If the code is OO, by identifying the offending
> subroutine/data structure and "overriding" it, you can usually solve
> the problem.  In your case it's a twisty path, but you could try
> something like this

[solutions cut]

Thanks, your first solution works fine, I'll probably move to the second
solution long term.  My dirty solution was to modify the URI::Escape source
so the %escapes hash had '|' hashing to '|' and reinstalled the package,
obviously a bad idea.  I wrongly assumed that the character escaping of the
request would be done immediately before sending it, it didn't occur to me
that modifying the request directly would work.

Again, thanks for your help.

--
Iain

Sent via Deja.com
http://www.deja.com/



Fri, 04 Jul 2003 04:40:03 GMT  
 