Jay Eckles
An Introduction to CGI - The Common Gateway Interface

by Jay Eckles

Decoding the Query String

The query string is anything that follows the first ? in the URL or, in the case of a POST request, the data sent in the body of the HTTP request. Once you have the query string in a buffer in your program (by reading the environment variable QUERY_STRING for a GET request, or stdin for a POST request), you need to decode it.

It will be encoded using standard URL encoding. This scheme requires that any special or unsafe character be replaced with three characters: the character % followed by the two hexadecimal digits of the character's US-ASCII value. For example, the character " " (a space) would be encoded as "%20". Examples of special and unsafe characters are <, >, ", ', {, }, |, \, ^, ~, [, ], `, and all whitespace characters such as space and tab. Because % introduces an encoded character, a literal percent sign must itself be encoded, as "%25". Also, spaces may be replaced with a plus (+) rather than being encoded as "%20". Here's an example of a query string before and after decoding:

Before Decoding: This+is+an+example%2E+It%27s+easy%2E
After Decoding: This is an example. It's easy.
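
To make the encoding rule concrete, here is a minimal Python sketch of the encoding side. The function name url_encode is hypothetical, and the sketch assumes ASCII-only input. It also encodes more than the specification strictly requires (everything except letters and digits), which is safe because any character may be sent in its %XX form.

```python
def url_encode(text):
    """Encode ASCII text for use in a query string (illustrative sketch).

    Letters and digits are kept, spaces become '+', and every other
    character becomes '%' plus its two-digit hexadecimal ASCII value.
    """
    data = text.encode("ascii")  # assumption: input is ASCII; raises otherwise
    pieces = []
    for byte in data:
        ch = chr(byte)
        if ch.isalnum():
            pieces.append(ch)
        elif ch == " ":
            pieces.append("+")
        else:
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)


print(url_encode("50% off <now>"))  # prints 50%25+off+%3Cnow%3E
```

Note that the percent sign itself comes out as %25, so that a decoder never mistakes a literal % for the start of an encoded character.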

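To decode the query string, first replace every plus sign with a space. Then, any time you see a percent sign, take the next two characters to be a hexadecimal number giving an ASCII value, and replace the percent sign and the two hexadecimal digits with the corresponding character. The plus signs must be handled before the percent signs (or both in a single left-to-right pass); otherwise a literal plus that was encoded as "%2B" would be decoded to + and then wrongly turned into a space. Once you have done this for every plus sign and every percent sign in the query string, it's decoded.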
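
The decoding procedure can be sketched in Python as a single left-to-right pass. This is an illustration rather than a definitive implementation: the function name url_decode is hypothetical, each %XX sequence is assumed to stand for one single-byte character, and leaving a malformed % sequence untouched is a choice made for this sketch, not something the encoding rules prescribe.

```python
import string


def url_decode(encoded):
    """Decode a URL-encoded string in one left-to-right pass (sketch)."""
    decoded = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        hex_digits = encoded[i + 1:i + 3]
        if ch == "+":
            # A plus sign stands for a space.
            decoded.append(" ")
            i += 1
        elif (ch == "%" and len(hex_digits) == 2
              and all(c in string.hexdigits for c in hex_digits)):
            # '%' followed by two hex digits: emit the character with that code.
            decoded.append(chr(int(hex_digits, 16)))
            i += 3
        else:
            # Ordinary character, or a malformed '%' (assumption: keep it as is).
            decoded.append(ch)
            i += 1
    return "".join(decoded)


print(url_decode("Hello%2C+world%21"))  # prints Hello, world!
```

Because the string is scanned once, an encoded plus such as "%2B" correctly decodes to a literal +. If the query string holds several name=value pairs, it is usually split on & and = first and each piece decoded separately, so that encoded & or = characters inside a value are not mistaken for separators. Python's standard library offers urllib.parse.unquote_plus for the same job.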

See RFC 1738 at http://www.w3.org/Addressing/rfc1738.txt for the URL encoding specification.



If you have any questions or would like to contact me for any reason, please email me at j.eckles@computer.org.