September 19, 2010

Fuzzing of a Mod_rewrite "Protected" Site

Sites that use the Apache mod_rewrite module to hide the parameters passed to an application are increasingly common on the Internet. Web developers often believe that this protects a web application against attacks such as SQL Injection and Cross-Site Scripting. This is a common misconception, similar to the belief that hiding the “fingerprints” of services improves their security. Using mod_rewrite to hide application parameters, like hiding fingerprints, does put an obstacle in an attacker's way. However, as the saying goes, “there is no such obstacle that could not be surmounted”.



Introduction


Mod_rewrite is a powerful tool for modifying URLs on the fly. Currently, this Apache web server module is most frequently used for the following purposes:

  • search engine optimization (SEO);
  • protection against direct downloads by hiding the real location of a file;
  • hiding the hierarchy of incoming web application parameters, directories and scripts through centralized dynamic modification;
  • access control: mod_rewrite can check HTTP header values (including cookie values) against the rules and perform (or skip) a redirection depending on the result of the check.
In general, a properly configured mod_rewrite hinders the detection and exploitation of web application vulnerabilities. Let’s examine the reasons:
  1. It is difficult to recognize the real purpose of a URL element. For instance, given a hyperlink such as "http://www.example.com/main/search/search_bar", it is impossible to tell which URL element is the relative path from the web server root directory, which element is the script name, and which one is the script parameter. This considerably complicates the analysis of the web application structure.
For example, each of the following rules maps one and the same link to a different real web application structure:
RewriteRule ^search/(.+)$ search.php?search=$1
According to this rule, the "http://www.example.com/main/search/search_bar" link corresponds to the "search.php" script, which is located in the "/main/" directory relative to the web server root directory and is called using the "search=search_bar" parameter.
RewriteRule ^(.+)/(.+)$ script.php?act=$1&value=$2

According to this rule, the same link corresponds to the "script.php" script, which is located in the "main" directory and is called using the "act=search&value=search_bar" parameters.

RewriteRule ^(.+)/(.+)/(.+)$ $1.php?value1=$2&value2=$3
Finally, according to this rule, the same link corresponds to the "main.php" script, which is located in the web server root directory and is called using the "value1=search&value2=search_bar" parameters.
Moreover, due to the flexibility of the regular expressions used by mod_rewrite, even similar-looking URLs may refer to completely different scripts. For example, the following rules:
RewriteRule ^([^/]+)/([^/]+)$ script1.php?value1=$1&value2=$2
RewriteRule ^([^/]+)/([^/]+)/([^/]+)$ script2.php?value1=$2&value2=$3
will send 2 similar requests to different scripts:
http://www.example.com/string1/string2
http://www.example.com/string1/string2/string3

  2. It is difficult to determine the programming language used to develop the application. The "http://www.example.com/main/articles/article.html" link may refer to a PHP, ASP, or Perl server script, or to a static HTML page.
  3. Regular expressions in rewrite rules make it possible to filter input parameters. For example, a pattern such as ^article/([0-9]+)$ passes only digits to the script.
  4. Automated vulnerability scanning has to encode certain characters (e. g. replace a slash "/" with "%2F" (URL encoding) or "%252F" (double encoding)), since they are processed as early as the URL parsing stage by mod_rewrite.
For these reasons, many developers and administrators prefer to "mask" vulnerabilities with mod_rewrite rather than detect and fix them. However, this approach, like any method based on "Security Through Obscurity", is quite ineffective.
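The ambiguity described in item 1 can be reproduced with a short Python sketch. It is only an approximation of mod_rewrite's per-directory matching: the `RULES` table and the `apply_rule` helper are hypothetical names introduced for illustration, and the substitutions are written in Python's `\1` syntax instead of `$1`.

```python
import re

# Hypothetical rule table mirroring the three RewriteRule examples above.
# Each entry: (directory the rule lives in, pattern, substitution).
RULES = [
    ("/main/", r"^search/(.+)$", r"search.php?search=\1"),
    ("/main/", r"^(.+)/(.+)$", r"script.php?act=\1&value=\2"),
    ("/", r"^(.+)/(.+)/(.+)$", r"\1.php?value1=\2&value2=\3"),
]


def apply_rule(path, base, pattern, substitution):
    """Return the rewritten path, or None if the rule does not apply."""
    if not path.startswith(base):
        return None
    local = path[len(base):]  # per-directory rules see the path without the prefix
    if re.match(pattern, local) is None:
        return None
    return base + re.sub(pattern, substitution, local, count=1)


if __name__ == "__main__":
    for base, pattern, substitution in RULES:
        print(apply_rule("/main/search/search_bar", base, pattern, substitution))
```

The same public path resolves to three different script and parameter layouts, which is exactly what an outside observer cannot distinguish.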

Working technique


Searching for vulnerabilities when mod_rewrite is enabled relies on brute force. The method makes it possible to determine the real names of the server script's parameters, which receive their values from the mod_rewrite rules. The main characteristics of the brute force method are the following:
  • the use of multiple parameters in one request (the number of parameters is limited by the maximum URL length: by default, 8 192 characters for Apache 2.x and 16 384 characters for IIS);
  • the use of the binary search method (dichotomy) to isolate the parameters that matter;
  • dictionary brute force, that is, the use of well-known parameters and prefixes, such as id, count, etc., in combination with brute force of parameter names (combined attacks);
  • different ways of analyzing results;
  • the possibility of recursive search for parameters (to find the whole set of parameters of one script).
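The dichotomy step from the list above can be sketched as follows. This is a minimal illustration, not the author's implementation: `responds_differently` is a hypothetical oracle that sends one request containing every name in the batch and reports whether the response differs from the baseline. The sketch also assumes that parameters affect the response independently of each other.

```python
def find_active_params(names, responds_differently):
    """Return the names that change the response, using binary search.

    `responds_differently(batch)` is a hypothetical oracle supplied by the
    caller: True if a request carrying all names in `batch` produces a
    response that differs from the baseline.
    """
    names = list(names)
    if not names or not responds_differently(names):
        return []  # nothing in this batch influences the response
    if len(names) == 1:
        return names  # batch narrowed down to a single active parameter
    middle = len(names) // 2
    return (find_active_params(names[:middle], responds_differently)
            + find_active_params(names[middle:], responds_differently))


if __name__ == "__main__":
    real = {"page", "debug"}  # stand-in for the parameters the script really reads
    oracle = lambda batch: any(name in real for name in batch)
    print(find_active_params(["id", "page", "cat", "debug", "q"], oracle))
```

A batch that does not change the response is discarded with a single request, so most of the candidate names are never tested individually.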
Let us consider the algorithm for finding vulnerabilities in a web application with mod_rewrite enabled.


1. Determination of a real server script name
Common and well-known script names, such as "index.php" or "main.php", are used for this purpose. It is necessary to determine whether a given script exists (for example, by the presence or absence of the "404 – Not Found" error).
Sometimes, rules rewrite every URL starting from the web server root directory, i.e. the "http://www.example.com/index.php" request will itself be routed to a completely different script (RewriteRule ^(.+)$ script.php?$1). In that case, there is no point in further checking.
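A minimal sketch of this step is shown below. `get_status` is a hypothetical callable (not part of any library) that performs a GET request and returns the HTTP status code; the candidate list contains only the names mentioned in the text. A random control name is requested first: if even that does not produce a 404, a catch-all rule is assumed and the check is treated as inconclusive.

```python
import uuid

COMMON_SCRIPTS = ["index.php", "main.php"]  # names taken from the text


def find_real_scripts(base_url, get_status, candidates=COMMON_SCRIPTS):
    """Return the candidate script URLs that appear to exist.

    `get_status(url)` is a hypothetical callable returning the HTTP status
    code of a GET request to `url`. Returns None when the server answers a
    random, surely nonexistent name with something other than 404, which
    suggests a catch-all rewrite rule.
    """
    control = f"{base_url}/{uuid.uuid4().hex}.php"
    if get_status(control) != 404:
        return None  # every URL is rewritten; further checking is pointless
    return [f"{base_url}/{name}" for name in candidates
            if get_status(f"{base_url}/{name}") != 404]
```

Passing the HTTP client in as a callable keeps the sketch independent of any particular library and makes it easy to try against a stub.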

2. Determination of parameters passed to an application
Web application developers commonly use parameter names such as id, file, etc. Taking this into account, it is worth checking:
  • common variable names (id, path, page, debug, cat, etc.) – the dictionary of common names;
  • short variable names (1-5 characters) over the alphabet [a-z0-9-_] – brute force;
  • «hybrid» names, for code that uses various prefixes and suffixes in variable names, built using the following formulas:
    • «prefix» + «common parameter name»;
    • «common parameter name» + «postfix»;
    • «brute force» + «delimiter (_,-)» + «common parameter name»;
    • «common parameter name» + «delimiter (_,-)» + «brute force»;
  • array variables (param[]).
The last parameter type requires clarification: in a PHP script, if a parameter passed as an array (http://example.com/index.php?param[]=value) is handled as a scalar value, an error may occur (depending on the error_reporting level) that reveals the installation path.
It is also possible to use:
  • the GLOBALS array (http://example.com/index.php?GLOBALS[var]=value);
  • common _SERVER variables (in some cases they can also be overwritten);
  • variables in combination with their zend_hash_key (to bypass a vulnerable unset() function).
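The candidate name sets described in this step can be generated lazily, as in the sketch below. The dictionary holds only the names mentioned in the text, and the helper names (`brute_names`, `hybrid_names`, `as_array`) are hypothetical. Only the two delimiter-based hybrid formulas are shown; the prefix and postfix formulas would follow the same pattern.

```python
from itertools import product
import string

COMMON = ["id", "path", "page", "debug", "cat"]  # dictionary from the text
ALPHABET = string.ascii_lowercase + string.digits + "-_"  # [a-z0-9-_]
DELIMITERS = ["_", "-"]


def brute_names(max_len, alphabet=ALPHABET):
    """Yield every name of length 1..max_len over the alphabet."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def hybrid_names(common=COMMON, brute_len=1, alphabet=ALPHABET):
    """Yield brute + delimiter + common name, and the reverse order."""
    for word in common:
        for part in brute_names(brute_len, alphabet):
            for delimiter in DELIMITERS:
                yield f"{part}{delimiter}{word}"
                yield f"{word}{delimiter}{part}"


def as_array(name):
    """Turn a name into its PHP array form, e.g. id -> id[]."""
    return f"{name}[]"
```

Generators are used because the full 5-character space over a 38-character alphabet is far too large to hold in memory at once.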

3. Determination of parameter values
It is important to try different kinds of parameter values. If, for example, the value “1” is used, it may well match the variable's default value, and the server will return the same response. Different values also trigger different errors. If parameters are detected by looking for their values in responses (potential XSS, Local File Inclusion, Path Traversal, etc.), a unique value must be generated for each parameter.
Consider the possible variants, their purpose, advantages and disadvantages:
  • Numerical value: 0,1,2,… This is the simplest variant. Three values may be enough: 0, 1, and a number greater than 1. The advantage is that the request string stays as short as possible, so the effect on performance is minimal.
  • Invalid value: «‘», «../», «a», etc. Invalid values are character sets that are likely to cause errors, whose signatures reveal the existence of a parameter.
  • Fixed parameter value. For example, if the "http://example.com/main/search/search_bar" URL exists, the response of the script to the search_bar value is recorded and this value is then substituted for all parameters. A response that matches the recorded one identifies the parameter that corresponds to a specific position in the URL. (*)
  • Random number. Generating sufficiently long random numbers (5-9 characters) makes it possible to search the script responses for matches. Random numbers reduce the probability of false positives (type I errors).
(*) - For instance, the "http://example.com/main/search/test" URL refers to a search script with the «test» parameter value. The purpose is to find the original search bar parameter, so a signature from the response to "http://example.com/main/search/test" is recorded (e. g. "the test request results") and all parameters are brute-forced with "test" as the value. The required parameter is found by examining the responses for that signature.
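The random-number variant can be sketched as follows. Each parameter receives its own random decimal string, so a value found in a response points back to exactly one parameter. The function names and the default of 7 digits are assumptions chosen for illustration, within the 5-9 character range mentioned above.

```python
import secrets


def assign_unique_values(names, digits=7):
    """Map each parameter name to a distinct random decimal string."""
    low = 10 ** (digits - 1)  # smallest number with `digits` digits
    values, used = {}, set()
    for name in names:
        while True:
            token = str(low + secrets.randbelow(9 * low))
            if token not in used:  # retry on the rare collision
                break
        used.add(token)
        values[name] = token
    return values


def reflected_params(values, response_body):
    """Return the names whose unique value appears in the response."""
    return [name for name, token in values.items() if token in response_body]
```

If the application normally echoes request data, a reflected value alone does not prove that the script uses the parameter, so hits still need a manual look.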

4. Request creation and brute force

Create a request of the following type: "http://example.com/script.php?param1=value&param2=value&…&abc=value". The length of the URL is limited to 8192 characters (for the Apache server). It means that all alphanumeric parameters [a-z0-9] up to 4 characters long will require approximately 5880 requests to be brute-forced. With an adequate connection speed, the brute force attack takes from 3 to 5 minutes.
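Packing parameters into requests under a length limit can be sketched as below. The `build_requests` helper is hypothetical, and the limit is applied to the URL string only, which is a simplification: the server limit covers the whole request line. The number of requests that results depends on the value length and the base URL, so the sketch does not attempt to reproduce the figure quoted above.

```python
def build_requests(base_url, names, value="1", max_url_length=8192):
    """Pack parameter names into as few URLs as the length limit allows."""
    urls, batch = [], []
    length = len(base_url) + 1  # base URL plus "?"
    for name in names:
        pair = f"{name}={value}"
        extra = len(pair) + (1 if batch else 0)  # "&" between pairs
        if batch and length + extra > max_url_length:
            # Current URL is full: emit it and start a new one.
            urls.append(base_url + "?" + "&".join(batch))
            batch, length = [], len(base_url) + 1
            extra = len(pair)
        batch.append(pair)
        length += extra
    if batch:
        urls.append(base_url + "?" + "&".join(batch))
    return urls
```

Each URL produced this way is one probe for the dichotomy described earlier: if the response differs from the baseline, the batch is split and retried.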

5. Response analysis and determination of the existence of a parameter
This part of the algorithm is the most important, because it determines the efficiency of the whole search. The structure of the web application and the surrounding conditions determine which response analysis method to choose. Consider the advantages and disadvantages of the different approaches:
  • Determination using response length:
Advantages:
    • The fastest and simplest method. It only requires finding out the length of the response to the standard request (without parameters) and comparing it to the length of responses to requests with parameters.
Disadvantages:
    • This method is inapplicable when the script generates a response with unique content every time (banners, random content, the request string echoed in the response).
  • Determination using the signatures of variable random values:
Advantages:
    • Fewer false positives.
Disadvantages:
    • False positives still have to be removed when a random value legitimately appears in a response (for example, when request data are normally included in a response).
    • Lower speed due to longer parameter values (from 1-2 characters to 5-7 characters, which increases the required brute force time by a factor of 1.5-2 for parameters that are up to 4 characters long).
  • Determination using fixed responses (*):
Advantages:
    • Accurate search.
Disadvantages:
    • Targeted search: only one specific parameter is searched for.
(*) - For instance, when the purpose is to determine parameters on the page "http://example.com/main/search/search_bar", the "search_bar" value is substituted for all variables and the response is monitored. The valid parameter is the one whose request produces a response that matches the content of the initial page.
  • Determination using error signatures:
Advantages:
    • Almost no false positives (type I errors), i.e. a parameter is almost never reported when it does not actually exist.
Disadvantages:
    • An error signature database is required.
    • If the presence of a parameter does not cause an error, this method will not reveal the parameter.

As follows from the above, different methods are efficient in different situations. The first method is the simplest and sometimes the most efficient; however, sometimes a parameter cannot be determined even with the more complicated methods. Therefore, it is advisable to combine methods or to decide which method will produce the fewest false positives in each specific case. In the end, if 10-15 variables are revealed in the process, they can always be checked manually for false positives.
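The four checks can be combined in a single helper, sketched below under stated assumptions. The `analyze_response` function is hypothetical, and the signature tuple is only a tiny illustrative sample of PHP and MySQL error texts, not a real signature database.

```python
# Illustrative sample only; a real scan needs a much larger signature database.
ERROR_SIGNATURES = (
    "Warning:",
    "Fatal error:",
    "You have an error in your SQL syntax",
)


def analyze_response(baseline_body, body, token=None, reference_body=None,
                     signatures=ERROR_SIGNATURES):
    """Report which of the four checks from the text fire for one response."""
    return {
        # 1. response length differs from the request without parameters
        "length_differs": len(body) != len(baseline_body),
        # 2. the random value sent for a parameter is echoed back
        "token_reflected": token is not None and token in body,
        # 3. the response equals the recorded (fixed) reference response
        "matches_reference": reference_body is not None and body == reference_body,
        # 4. known error texts present in the response
        "error_signatures": [s for s in signatures if s in body],
    }
```

Returning all four results at once leaves the decision about which signal to trust to the caller, in line with the advice to pick the method per situation.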

Practical implementation

To demonstrate the efficiency of the methods discussed in practice, the utility can be used. It has the following features:
  1. Brute force using an alphabet, common parameters (specified in the params.txt file), or combined mode («common parameter name» + «delimiter (_,-)» + «brute force»). A flag can also be set to represent every variable as an array (param[]);
  2. Use of dichotomy for parameter search;
  3. Search using response length (the initial page and its size must be specified);
  4. Possibility of specifying an initial page with parameters (e. g. "http://example.com/index.php?page=admin") to search for multiple nested parameters (the specified parameters are naturally excluded from the check);
  5. Possibility of specifying the characters that will be used as variable values.

Some of these mechanisms are implemented in the MaxPatrol Compliance and Vulnerability Management System.

Conclusion

The technique discussed is suitable not only for assessing the security of web applications that use mod_rewrite, but also for finding variables that can be overwritten when the register_globals parameter is enabled, and for finding undocumented features, such as various debug modes.
Examples of the program’s performance (screenshots):
  • Parameters required for functioning.
  • Brute force up to 3 characters.
  • Dictionary brute force of another resource (the length parameter does not matter here).
  • One more parameter was obtained when another parameter value was used; hence, index.php without parameters coincides with index.php?page=1.
  • A combined attack revealing a parameter that would be difficult to find using brute force and that is absent from the dictionary.


 References:

  1. http://www.owasp.org/index.php/Double_Encoding
  2. http://dimoning.ru/kak-napisat-svoy-dvizhok-bloga-1.html
  3. http://webscript.ru/stories/07/02/01/2099269
  4. http://raz0r.name/mysli/proveryajte-tip-dannyx/
  5. http://www.hardened-php.net/globals-problem
  6. http://www.hardened-php.net/advisory_192005.78.html
  7. http://www.wisec.it/vulns.php?id=10
  8. http://www.hardened-php.net/hphp/zend_hash_del_key_or_index_vulnerability.html

5 comments:

  1. Interesting article. Are we going to see the ZZZ tool or the PHP fuzzer described in the article?

  2. determining the script name is pretty easy.
    just a lil trick to use:

    H=ha.ckers.org; echo -ne "POST /blog/category/webappsec/books/ HTTP/1.1\nHost: $H\nConnection: close\nContent-length: x\n\n" | nc $H 80 | less

    note the content-length field's value... its invalid :P

    This will produce a simple HTTP/1.1 413 Request Entity Too Large, with a common 413 error message/html followed by the site's code.

    take a closer look...

    HTTP/1.1 413 Request Entity Too Large
    Date: Mon, 20 Sep 2010 14:56:41 GMT
    Server: Apache
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>413 Request Entity Too Large</title>
    </head><body>
    <h1>Request Entity Too Large</h1>
    The requested resource<br />/blog/index.php<br />

    ...etc html blah blah

    rewite revealed, its pointin to /blog/index.php
    apache bug/"feature", works most of the time.
    thats all :)

    -CJ
