-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various multibyte bugs #22
base: master
Are you sure you want to change the base?
Conversation
Ugh. It seems I know how I'll spend the upcoming holiday. But before I start, can you please take a look at http://stackoverflow.com/questions/22266311/to-parse-sql-with-regex-excluding-quoted-literals I am going to change the parser completely, to allow named placeholders. How do you think - may be it would be better to get rid of regexp and create an old good char-by-char lexer? |
I can't seem to be able to reproduce your code. It does raise an error, which is still a bad thing, but it doesn't allow injection either. Here it goes:
var_dump($data); what I am doing wrong? |
I'm going to have a go at reproducing this now and see if I can find out. As a possible starting point: on my machine, I can't use MySQLi to send a query that contains an unclosed
instead of
I'm not sure whether this behaviour is as a result of the version of MySQL I'm running or the version of MySQLi I have installed, but perhaps it accounts for your failure to reproduce as well? Once I've reproduced this, I'll let you know of any further obstacles I come across. |
I can reproduce this just fine using eggyal's example query (as long as I create a table called |
Actually, the problem is (slightly) more serious than I first thought. Some commands, especially DDL ones, can accept object identifiers that don't exist. For example: $_GET['colname'] = "\x8c` INT, DROP COLUMN id, RENAME TO xyz -- ";
$db = new SafeMySQL(['charset' => 'gbk']);
$db->query('ALTER TABLE users ADD COLUMN ?n DATE', $_GET['colname']); This will send to MySQL: ALTER TABLE users ADD COLUMN `宍` INT, DROP COLUMN id, RENAME TO xyz -- ` DATE So, using a vulnerable connection encoding and parameterising identifiers that don't exist (such as will be the case in many DDL commands) could lead to pretty devastating injection. And having a dynamic schema is pretty much the only justification I can see for parameterising identifiers in the first place, so the second criterion is not all that unlikely... |
This should be merged. |
@Alphapixels unfortunately, there are too much garbage among these changes. One should never mix critical changes and just code prettifying in the same commit. Also, some changes can be questioned. As a solution I would rather forbid using encodings other than utf-8 and single-byte ones. |
I agree with idea to support only utf-8 and single-bytes encodings (btw it needs to be checked that the used encoding is consistent with that limitations). |
There are a number of multibyte bugs in this library.
str_replace()
is not multi-byte safe, yet is being used to escape quote characters within SQL object identifiers. This can lead to SQL injection in certain edge cases. For example:The query that is sent to the server:
Fortunately this vulnerability is somewhat mitigated by the fact that the statement will fail if the resulting identifier (the table name of
宍
in the above example) is not the name of a known object—and an attacker is limited in the identifiers that he can use, since the final byte must be0x60
(an ASCII backtick); furthermore, since the query is dispatched withmysqli::query()
, the server will ignore attempts to introduce a second statement. So the exploit risk is low.The problem can however be fixed by using
mb_ereg_replace()
instead ofstr_replace()
and this is introduced by my first commit (together with a few other OCD changes to string literals).My second commit then moves the
mb_regex_encoding()
call (necessary to set the encoding for the above operation) into theescapeIdent()
function in order to avoid problems if its setting is changed between object construction and statement preparation.In searching for parameters using
preg_split()
,prepareQuery()
implicitly assumes that the raw statement is in ASCII—yet the user is allowed to choose alternative connection character sets. This may not be an unrealistic assumption, since every connection character encoding that MySQL currently supports is a compatible superset of ASCII. However, there's no sensible reason not to perform this operation in the connection character set, which is what my third commit achieves.As an aside, it may be worth noting that performing this split operation using regular expressions is somewhat naive. For example, the raw query might contain a parameter code within a string literal:
Here the library will attempt to replace the
?s
within the string literal. Of course, in this example the query will fail due to a mismatch in the number of parameters and provided values—but it would be better to properly parse the SQL to identify the context in which placeholder strings are encountered: PHP SQL Parser might be useful to that end. In the meantime, one might at least assert that the pattern should end on a word boundary.On the same basis (i.e. that the SQL might theoretically be in an ASCII-incompatible encoding), placeholder expansion must ensure that all values and inserted SQL code are encoded using that connection character set. This is implemented in my fourth commit.