Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove ansi escape sequences from escaped xunit output #4527

Closed
wants to merge 7 commits into from

Conversation

trieloff
Copy link

@trieloff trieloff commented Dec 2, 2020

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions.

Description of the Change

The utils.escape function is amended to strip the escape sequence using a regular expression.

Alternate Designs

  • patch the underlying he library: dismissed because the direct fix is likely faster to become usable
  • filter the output before running he: dismissed because the correct escaping in JavaScript is harder to find
  • filter out all invalid XML characters: dismissed because this is not a problem observed with other characters

Why should this be in core?

The xunit reporter is in core and should not be buggy

Benefits

xunit reporter output can be consumed by tools that expect valid XML

Possible Drawbacks

Tools that process the XML as plain (unicode) text (unlikely) can no longer see the ANSI escape sequence.

Applicable issues

fixes #4526

This is a bug fix (patch release)

@coveralls
Copy link

Coverage Status

Coverage remained the same at 94.143% when pulling d417538 on trieloff:issue/4526 into bc8ce05 on mochajs:master.

Copy link
Contributor

@outsideris outsideris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to escape or remove all invalid characters for XML?

lib/utils.js Outdated Show resolved Hide resolved
@juergba
Copy link
Contributor

juergba commented Mar 9, 2021

There is a related xunit issue #3586.

@juergba juergba changed the title remove ansi escape sequences from escaped output (fixes #4526) xunit: remove ansi escape sequences from escaped output Mar 9, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Jul 9, 2021

This PR hasn't had any recent activity, and I'm labeling it stale. Remove the label or comment or this PR will be closed in 14 days. Thanks for contributing to Mocha!

@github-actions github-actions bot added the stale this has been inactive for a while... label Jul 9, 2021
@trieloff
Copy link
Author

This issue has been hanging around for 8 months now, and I'm unsure what I can do to get it moving again. There is an open thread that has multiple possible solutions to the problem, but no expression of preference from the Mocha team.

@JoshuaKGoldberg
Copy link
Member

👋 coming back to this @trieloff, are you still interested in working on this PR? As of #5027 there are again maintainers who can review it now. No worries if you no longer have time - but if you do that'd be great!

@trieloff
Copy link
Author

trieloff commented Dec 8, 2023

Hi @JoshuaKGoldberg – that's unexpected – but not unwelcome. I'm happy to rebase the PR to the latest main. Do you want to reopen this PR or have me create a new one?

@JoshuaKGoldberg
Copy link
Member

JoshuaKGoldberg commented Dec 8, 2023

Great! Let's go ahead and reopen this one.

We also discussed that we don't like the requirement of rebasing, so if you want to do a more casual merge that works too. Whatever's easiest on your end. 🙂

Copy link

linux-foundation-easycla bot commented Dec 8, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@trieloff
Copy link
Author

trieloff commented Jan 3, 2024

Ok, let me get that CLA signed, then.

@JoshuaKGoldberg JoshuaKGoldberg added status: in triage a maintainer should (re-)triage (review) this issue semver-patch implementation requires increase of "patch" version number; "bug fixes" and removed stale this has been inactive for a while... labels Mar 2, 2024
@JoshuaKGoldberg JoshuaKGoldberg changed the title xunit: remove ansi escape sequences from escaped output fix: remove ansi escape sequences from escaped xunit output Mar 4, 2024
@trieloff
Copy link
Author

trieloff commented May 6, 2024

  • CLA has been signed (thanks, corporate overlords)
  • master has been merged
  • conflicts have been resolved

@trieloff
Copy link
Author

@JoshuaKGoldberg can I get some eyes on this, please?

@JoshuaKGoldberg
Copy link
Member

Yes! Thanks for the prod - this slipped off my radar. Reviewing now!

Copy link
Member

@JoshuaKGoldberg JoshuaKGoldberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swell! A nice direct change, with a test case - thank you! 🙌

Will wait for another maintainer to review. In the interim, just leaving one small nitpick, nothing particularly important.

test/integration/reporters.spec.js Outdated Show resolved Hide resolved
@JoshuaKGoldberg JoshuaKGoldberg removed the status: in triage a maintainer should (re-)triage (review) this issue label May 30, 2024
@JoshuaKGoldberg JoshuaKGoldberg requested a review from a team May 30, 2024 20:42
Copy link
Member

@voxpelli voxpelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a test that verifies that invalid XML tags are escaped:

it('replaces invalid xml characters', function () {
expect(
utils.escape('\x1B[32mfoo\x1B[0m'),
'to be',
'[32mfoo[0m'
);
// Ensure we can handle non-trivial unicode characters as well
expect(utils.escape('💩'), 'to be', '💩');
});

This should extend those tests rather than introduce a new xunit specific one, as this change is not specific to the xunit parts of the code.

Also added a suggested code change that actually makes use of a regexp from he

Comment on lines +36 to +41
return he
.encode(String(html), {useNamedReferences: false})
.replace(/&#x([0-9A-F]);/g, (match) => {
const val = Number.parseInt(match);
return val === 0x9 || val === 0xA || val === 0xD || (val >= 0xE000 && val <= 0xFFFD) || (val >= 0x10000 && val <= 0x10FFFF) ? `&#x${match};` : '';
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Turns out that he already has this in itself, just doesn't expose it other than through strict: true, which throws when it encounters it.

Maybe we could use that regexp and try to land a PR that either exposes it or adds a skipInvalid: true option or such?

Suggested change
return he
.encode(String(html), {useNamedReferences: false})
.replace(/&#x([0-9A-F]);/g, (match) => {
const val = Number.parseInt(match);
return val === 0x9 || val === 0xA || val === 0xD || (val >= 0xE000 && val <= 0xFFFD) || (val >= 0x10000 && val <= 0x10FFFF) ? `&#x${match};` : '';
});
return he.encode(String(html).replace(regexInvalidRawCodePoint, ''), {useNamedReferences: false});

Requires this to be added as well:

// From https://github.com/mathiasbynens/he/blob/36afe179392226cf1b6ccdb16ebbb7a5a844d93a/he.js#L53
var regexInvalidRawCodePoint = /[\0-\x08\x0B\x0E-\x1F\x7F-\x9F\uFDD0-\uFDEF\uFFFE\uFFFF]|[\uD83F\uD87F\uD8BF\uD8FF\uD93F\uD97F\uD9BF\uD9FF\uDA3F\uDA7F\uDABF\uDAFF\uDB3F\uDB7F\uDBBF\uDBFF][\uDFFE\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/g;

@JoshuaKGoldberg JoshuaKGoldberg added the status: waiting for author waiting on response from OP - more information needed label Jun 26, 2024
@trieloff trieloff closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-patch implementation requires increase of "patch" version number; "bug fixes" status: waiting for author waiting on response from OP - more information needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Bug: xunit reporter does not strip ansi escape sequences, leading to invalid XML
6 participants