An Introduction to CGI - The Common Gateway Interface
by Jay Eckles
Decoding the Query String
The query string is defined as anything that follows the first
? in the URL or, in the case of a POST request, the data sent in the
body of the HTTP request. Once you have the query string in a buffer in
your program (by reading the environment variable QUERY_STRING for a GET
request, or stdin for a POST request), you need to decode it. It will be
encoded using standard URL encoding; this scheme requires that any
special or unsafe character be replaced with three characters: the
character % followed by the two hexadecimal digits that make up the
character's US-ASCII value. For example, the character " " (a space)
would be encoded as "%20".
Examples of special and unsafe characters are <, >, ", ', {, }, |, \, ^,
~, [, ], `, and all whitespace characters such as space and tab. Also,
spaces may be replaced with a plus (+) rather than being encoded as %20.
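The encoding rule described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm any particular browser uses: the function name url_encode, the plus_for_space parameter, and the deliberately conservative choice to leave only letters and digits unencoded are all assumptions made for the example.

```python
import string

# Assumption for this sketch: only ASCII letters and digits pass through as-is.
# Real clients leave more characters alone, but encoding extra ones is harmless.
SAFE = set(string.ascii_letters + string.digits)


def url_encode(text, plus_for_space=True):
    """Return text with every non-safe character replaced by %XX."""
    out = []
    # Encoding to ASCII raises an error for non-ASCII input, which matches the
    # US-ASCII scope of the scheme described in the text.
    for byte in text.encode("ascii"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        elif ch == " " and plus_for_space:
            # A space may be written as "+" instead of "%20".
            out.append("+")
        else:
            # "%" followed by the two-digit hexadecimal ASCII value.
            out.append("%{:02X}".format(byte))
    return "".join(out)
```

Calling url_encode(" ", plus_for_space=False) yields "%20", while url_encode("a b") yields "a+b".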
Here's an example of a query string before and after decoding:
Before Decoding: This+is+an+example%2E+It%27s+easy%2E
After Decoding: This is an example. It's easy.
To decode, first replace every plus sign with a space. Then, any time
you see a percent sign in the query string, take the next two characters
to be a hexadecimal number corresponding to an ASCII value, and replace
the percent sign and the two-digit hexadecimal number with the
corresponding character. Once you have done this for every occurrence of
a percent sign in the query string, it's decoded.
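As a rough sketch of the decoding procedure, the Python function below walks the string once and rebuilds it. The name url_decode is hypothetical. Two choices here are assumptions rather than anything the text prescribes: a percent sign that is not followed by two hexadecimal digits is copied through unchanged, and plus signs are turned into spaces before the percent sequences are handled, so that a literal plus sent as %2B survives.

```python
import os
import string


def url_decode(encoded):
    """Decode a URL-encoded query string (hypothetical helper for illustration)."""
    # Convert "+" to a space first; an encoded plus (%2B) is restored afterwards.
    encoded = encoded.replace("+", " ")
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        hex_digits = encoded[i + 1:i + 3]
        is_escape = (
            ch == "%"
            and len(hex_digits) == 2
            and all(c in string.hexdigits for c in hex_digits)
        )
        if is_escape:
            # Replace "%XX" with the character whose code is hexadecimal XX.
            out.append(chr(int(hex_digits, 16)))
            i += 3
        else:
            # Ordinary character, or a stray "%" (assumption: keep it as-is).
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # For a GET request the server supplies the raw string in QUERY_STRING.
    print(url_decode(os.environ.get("QUERY_STRING", "")))
```

In practice, Python's standard library already provides urllib.parse.unquote_plus for this job; the loop above is spelled out only to mirror the steps in the text.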
See RFC 1738 at http://www.w3.org/Addressing/rfc1738.txt for the URL encoding specification.
If you have any questions or would like to contact me for any reason, please email me at j.eckles@computer.org.