Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional validation for size unit #193

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

pierlon
Copy link

@pierlon pierlon commented May 6, 2020

This PR adds additional validation when parsing a size unit.

With this change, when parsing in strict mode an UnexpectedTokenException error will be thrown when an invalid size unit occurs after a number. When in lenient mode, however, if a valid size unit identifier is detected, it will be kept and the proceeding set of characters will be stripped. For example:

Dimension value Strict parsing Lenient parsing
Before 1radsss 1 rad sss 1 rad sss
After 1radsss UnexpectedTokenException Value is removed

Also, there is no such unit named turns, but there is one called turn (ref), which I suppose is what was meant here 😄.

@pierlon
Copy link
Author

pierlon commented May 6, 2020

That failing test seems to be unrelated to the change here.

@westonruter
Copy link
Contributor

Yeah, if I try validating:

@keyframes spin {
  to {
    transform: rotate(1turns);
  }
}

Then it fails with Unknown dimension: 1turns

@pierlon pierlon changed the title Change turns to turn Validate size unit May 11, 2020
@pierlon pierlon changed the title Validate size unit Add additional validation for size unit May 11, 2020
lib/Sabberworm/CSS/Value/Size.php Outdated Show resolved Hide resolved
tests/Sabberworm/CSS/ParserTest.php Outdated Show resolved Hide resolved
Comment on lines 42 to 56
while (true) {
$sChar = $oParserState->peek(1, $iOffset);
$iPeek = ord($sChar);

// Ranges: a-z A-Z 0-9 %
if (($iPeek >= 97 && $iPeek <= 122) ||
($iPeek >= 65 && $iPeek <= 90) ||
($iPeek >= 48 && $iPeek <= 57) ||
($iPeek === 37)) {
$sParsedUnit .= $sChar;
$iOffset++;
} else {
break;
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely comfortable with this approach of getting the unit; better solutions are welcomed.

Copy link
Contributor

@westonruter westonruter May 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a while loop, another option would be to peek 10 chars and then use preg_match() to match the range, and if matched then capture that as the $sParsedUnit and increase the $iOffset by the length.

if ( preg_match( '/^[a-zA-Z0-9%]+/', $oParserState->peek(10, $iOffset), $matches ) ) {
    $sParsedUnit = $matches[0];
    $iOffset += strlen( $matches[0] );
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason behind the 10 characters? Or is it that a unit length of 10 characters or more would be unlikely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It could be set to the max length of the unit chars.

@westonruter
Copy link
Contributor

@sabberworm Anything else you'd like to see here?

@sabberworm
Copy link
Contributor

I’m just wondering if we really need to hard-code all possible units. To be future-proof, couldn’t we just parse everything as a unit that looks like a unit?

I don’t know if there are contexts where it’s ambiguous whether a token is a unit or not but I’d think everything that starts with a number is a CSSSize and every token that follows such a number without white-space in between is a unit. Or am I missing something?

I know this implies a less strict validation but the parser was always meant as a parser that checks syntax, not semantics.

@pierlon
Copy link
Author

pierlon commented May 29, 2020

I don’t know if there are contexts where it’s ambiguous whether a token is a unit or not but I’d think everything that starts with a number is a CSSSize and every token that follows such a number without white-space in between is a unit.

Yes, you're correct. In this case the parser would be looking for either a dimension (number immediately followed by a unit identifier) or a percentage ( number immediately followed by a percent sign).

Removing the hard-coded units along with the validation of the size unit does seem like the best option here, and if that's the case, it will be a breaking change and a major release would be warranted.

@pierlon
Copy link
Author

pierlon commented Nov 2, 2020

⚠️ Rebasing with latest changes from master.

@oliverklee oliverklee changed the base branch from master to main February 7, 2024 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants