PHP: The issue between parse_url and real file path
Using parse_url($url) to check the scheme of $url and then passing $url to file_get_contents() can lead to a local file read (LFR) issue.
The story
Below is a snippet modified from DEVCORE Wargame at HITCON CMT 2019:
In the original challenge, the target is a simple proxy service that we can only test as a black box. $url is taken from $_POST['url'], and only the HTTP and HTTPS protocols are allowed; the service enforces this by checking $url with parse_url().
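A minimal sketch of the kind of scheme check described above. The helper name is_http_scheme() is hypothetical, and this is not the challenge's actual code; it only shows that a check based on the scheme reported by parse_url() accepts the backslash form as well.

```php
<?php
// Hypothetical helper: accept a URL when parse_url() reports an http/https scheme.
// Illustration only; NOT the real challenge code.
function is_http_scheme(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // null means "no scheme found", false means the URL could not be parsed at all.
    if (!is_string($scheme)) {
        return false;
    }
    return in_array(strtolower($scheme), ['http', 'https'], true);
}

// In a single-quoted PHP string '\\' is ONE backslash, so four are written to get two.
$malformed = 'http:\\\\localhost/../../../../../etc/passwd';

var_dump(is_http_scheme('http://localhost/index.html')); // bool(true)
var_dump(is_http_scheme('file:///etc/passwd'));          // bool(false)
var_dump(is_http_scheme($malformed));                    // bool(true)
```

The last call is the interesting one: the scheme is still reported as http, so a check like this lets the malformed string through.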
It is a classic SSRF challenge, until we need to read a local file while only HTTP/HTTPS is allowed. The exploit looks something like this:
http:\\localhost/../../../../../etc/passwd
Wait, what? http: followed by double backslashes \\? How can this malformed URL pass the parse_url() scheme check and lead to a local file read?
Interesting behavior of parse_url()
Here is a normal URL passed to parse_url():
var_dump(parse_url("http://localhost/../../../../../etc/passwd"));
array(3) {
The scheme, host and path are all parsed correctly. What about a malformed URL with double backslashes?
var_dump(parse_url("http:\\localhost/../../../../../etc/passwd"));
array(2) {
As you can see, the scheme is still parsed as http, but no host is parsed at all; the rest of the string is parsed as the path.
Here is the problem: parse_url() succeeds without any error or even a warning, but the parsed result is clearly not what we expected. So the URL passes the scheme check and moves on to file_get_contents().
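One possible way to tighten the check, shown only as a sketch: require a non-empty host in addition to the scheme. The helper name is_http_url_with_host() is hypothetical, and this is not a complete SSRF defense; it only rejects inputs where parse_url() finds no host, which is the case for the malformed URL above.

```php
<?php
// Hypothetical stricter check: require an http/https scheme AND a non-empty host.
function is_http_url_with_host(string $url): bool
{
    $parts = parse_url($url);
    if (!is_array($parts)) {
        return false; // parse_url() returns false on seriously malformed input
    }
    $scheme = strtolower($parts['scheme'] ?? '');
    $host   = $parts['host'] ?? '';
    return in_array($scheme, ['http', 'https'], true) && $host !== '';
}

// Four backslashes in a single-quoted literal produce two in the actual string.
$malformed = 'http:\\\\localhost/../../../../../etc/passwd';

// Only scheme and path are present for the malformed input; there is no host key.
var_dump(array_keys(parse_url($malformed)));
var_dump(is_http_url_with_host($malformed));
var_dump(is_http_url_with_host('http://localhost/../../../../../etc/passwd'));
```

With this variant the backslash form is rejected because no host component exists, while the well-formed URL is still accepted.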
Dig into PHP source code
Out of curiosity, I decided to dig into PHP’s C source code to see how file_get_contents() works with the path http:\\localhost/../../../../../etc/passwd.
My environment is Ubuntu Desktop 18.04 with a self-compiled PHP 7.3.8, built for debugging with gdb.
$ git clone http://git.php.net/repository/php-src.git
Test code:
The problem is located in tsrm_realpath_r() in Zend/zend_virtual_cwd.c. This recursive function walks the whole path string, checks whether it ends in /. or /.., and removes those components. For example, before it is passed to tsrm_realpath_r(), our malicious path http:\\localhost/../../../../../etc/passwd is expanded to /home/theo/Desktop/http:\\localhost/../../../../../etc/passwd, based on the current working directory of php-cli.
Snippet of tsrm_realpath_r():
...
Because it only looks for /. and /.., it treats http:\\localhost as just another leading directory, like Desktop, theo, and so on. It then removes as many of these leading directories as there are /.. components in the path string. In this case the final real path is /etc/passwd, which leads to a local file read vulnerability.
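The collapsing effect can be imitated in a few lines of PHP. This is a simplified illustration, not a port of tsrm_realpath_r(): the function name collapse_dot_segments() is hypothetical, and it works purely on the string without touching the filesystem, which is enough to show why http:\\localhost is treated as an ordinary directory name.

```php
<?php
// Simplified illustration only: resolves "." and ".." on the string itself,
// without checking that any of the directories exist.
function collapse_dot_segments(string $absolutePath): string
{
    $stack = [];
    foreach (explode('/', $absolutePath) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // skip empty segments and "."
        }
        if ($segment === '..') {
            array_pop($stack); // drop the previous segment; no-op at the root
            continue;
        }
        $stack[] = $segment; // "http:\\localhost" is just another segment here
    }
    return '/' . implode('/', $stack);
}

$cwd   = '/home/theo/Desktop';                            // working directory from the example
$input = 'http:\\\\localhost/../../../../../etc/passwd';  // two backslashes in the real string

echo collapse_dot_segments($cwd . '/' . $input), "\n";    // prints /etc/passwd
```

The five .. components remove http:\\localhost, Desktop, theo and home, with the fifth having nothing left to remove at the root, so only /etc/passwd remains.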