PHP: The issue between parse_url and real file path
Using parse_url($url) to check the scheme of $url and then passing $url to file_get_contents() can lead to a local file read (LFR) issue.
The story
The scenario comes from a snippet modified from the DEVCORE Wargame at HITCON CMT 2019.
In the original challenge, it’s a simple black-box proxy service. $url is taken from $_POST['url'], and only the HTTP and HTTPS protocols are allowed, using parse_url() to check $url.
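The pattern described above can be sketched roughly as follows. This is not the original challenge code: the function names scheme_is_http() and proxy_fetch() are hypothetical, and the sketch only assumes what the text states (a scheme check with parse_url(), then the raw string handed to file_get_contents()).

```php
<?php
// Hypothetical helper: the kind of scheme check described above.
function scheme_is_http(string $url): bool
{
    // With PHP_URL_SCHEME, parse_url() returns the scheme as a string,
    // null when there is no scheme, or false for a seriously malformed URL.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

// Hypothetical proxy body: check the scheme, then pass the raw string on.
// It is defined only to show the shape of the pattern and is never called here.
function proxy_fetch(string $url)
{
    if (!scheme_is_http($url)) {
        return false;
    }
    return file_get_contents($url);
}

// In a single-quoted PHP string, '\\\\' is two literal backslashes.
$malformed = 'http:\\\\localhost/../../../../../etc/passwd';
var_dump(scheme_is_http($malformed)); // the scheme check alone accepts this input
```

The check looks only at the scheme, so any string that starts with http: and parses without error is let through, whatever follows the colon.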
It’s a classic SSRF challenge until we need to read a local file using only HTTP/HTTPS. The exploit is to submit a URL like http:\\localhost/../../../../../etc/passwd.
Wait, what? http: followed by double backslashes \\? How can this malformed URL pass the parse_url() scheme check and lead to a local file read?
Interesting behavior of parse_url()
Here is a normal URL passed to parse_url():
var_dump(parse_url("http://localhost/../../../../../etc/passwd"));
array(3) {
The scheme, host and path are all parsed correctly. How about a malformed URL with double backslashes?
var_dump(parse_url("http:\\localhost/../../../../../etc/passwd"));
array(2) {
As you can see, the scheme is still parsed as http, but no host is parsed; the rest of the string is parsed as the path.
Here is the problem: parse_url() succeeds without any error or even a warning, but the parsed result is clearly not what we expected. So the URL passes the scheme check and moves on to file_get_contents().
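One way to catch this case is to require a parsed host in addition to the scheme and to reject backslashes outright. The sketch below assumes the service only ever needs absolute http(s) URLs; the name is_strict_http_url() is hypothetical, and this is an illustration rather than a complete SSRF defense.

```php
<?php
// Hypothetical stricter check: scheme AND host must be present, no backslashes.
function is_strict_http_url(string $url): bool
{
    // A backslash has no place in an absolute http(s) URL; reject it early.
    if (strpos($url, '\\') !== false) {
        return false;
    }

    $parts = parse_url($url);
    if (!is_array($parts)) {
        return false; // parse_url() returns false for seriously malformed input
    }

    $scheme = strtolower($parts['scheme'] ?? '');
    $host   = $parts['host'] ?? '';

    // Without a host, the string is not an absolute http(s) URL.
    return in_array($scheme, ['http', 'https'], true) && $host !== '';
}

var_dump(is_strict_http_url('http://localhost/index.html'));
var_dump(is_strict_http_url('http:\\\\localhost/../../../../../etc/passwd'));
```

The host check matters on its own: a string such as http:/etc/passwd contains no backslash, yet parse_url() still reports only a scheme and a path for it.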
Dig into PHP source code
Out of curiosity, I decided to dig into PHP’s C source code to see how file_get_contents() works with the path http:\\localhost/../../../../../etc/passwd.
My environment is Ubuntu Desktop 18.04 with a self-compiled PHP 7.3.8 for debugging with gdb.
$ git clone http://git.php.net/repository/php-src.git
The test code simply calls file_get_contents() on this path.
The problem is located in tsrm_realpath_r() in Zend/zend_virtual_cwd.c. This is a recursive function that traverses the path string to see whether it contains /. or /.. at the end, and removes them. For example, our malicious path http:\\localhost/../../../../../etc/passwd is expanded to /home/theo/Desktop/http:\\localhost/../../../../../etc/passwd, according to php-cli’s current working directory, before being passed into tsrm_realpath_r().
Snippet of tsrm_realpath_r():
...
Because it only looks for /. and /.., it treats http:\\localhost as just another leading directory, like Desktop, theo, and so on. It then removes these leading directories according to how many /.. there are in the path string. In this case we get /etc/passwd as the final real path, leading to a local file read vulnerability.
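The effect can be imitated in a few lines of PHP. This is only a sketch of purely lexical resolution of . and .. segments, not a port of tsrm_realpath_r(); the function name resolve_lexically() is hypothetical, and the input is the expanded path from the example above.

```php
<?php
// Hypothetical helper, NOT PHP's actual implementation: resolve "." and ".."
// purely by looking at the string, without asking the filesystem anything.
function resolve_lexically(string $absolutePath): string
{
    $stack = [];
    foreach (explode('/', $absolutePath) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;          // skip empty and current-directory segments
        }
        if ($segment === '..') {
            array_pop($stack); // drop the previous segment; a no-op at the root
            continue;
        }
        $stack[] = $segment;   // "http:\\localhost" is just another name here
    }
    return '/' . implode('/', $stack);
}

// In a single-quoted PHP string, '\\\\' is two literal backslashes.
$expanded = '/home/theo/Desktop/http:\\\\localhost/../../../../../etc/passwd';
echo resolve_lexically($expanded), PHP_EOL; // prints /etc/passwd
```

Four of the five .. segments consume http:\\localhost, Desktop, theo and home; the fifth has nothing left to remove at the root, so what remains is /etc/passwd.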