Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Perseus JSON parser issues identified via production logs #2246

Merged
merged 14 commits into from
Feb 18, 2025

Conversation

benchristel
Copy link
Member

There are a few lingering issues with Perseus data that somehow didn't
show up in my content data dump. This PR updates the parser to handle them.

Issue: none

Test plan:

  • Deploy to webapp
  • Search Sentry for "Perseus parser"
  • There should be fewer errors after the deploy.

@benchristel benchristel self-assigned this Feb 14, 2025
@@ -406,8 +406,7 @@ export type PerseusImageBackground = {
// The scale of the image
// NOTE: perseus_data.go says this is required, but nullable, even though
// it isn't necessary at all.
// Yikes, production data as this as both a number (1) and string ("1")
scale?: number | string;
scale?: number;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We convert scale to a number in the parser now, so this type awkwardness can go away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is data-schema now the representation of the parsed JSON? Because it was the representation of the stored JSON, so the parser wouldn't affect that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data-schema should represent the latest version of the stored JSON. The purpose of the parser is to migrate past versions of the JSON to match what's in data-schema. Really, data-schema never represented "the" stored JSON, because we have multiple versions of JSON stored.

Copy link
Contributor

github-actions bot commented Feb 14, 2025

npm Snapshot: Published

Good news!! We've packaged up the latest commit from this PR (f6125cf) and published it to npm. You
can install it using the tag PR2246.

Example:

yarn add @khanacademy/perseus@PR2246

If you are working in Khan Academy's webapp, you can run:

./dev/tools/bump_perseus_version.sh -t PR2246

Copy link
Contributor

github-actions bot commented Feb 14, 2025

Size Change: +218 B (+0.01%)

Total Size: 1.47 MB

Filename Size Change
packages/perseus-core/dist/es/index.js 48.2 kB +218 B (+0.45%)
ℹ️ View Unchanged
Filename Size
packages/kas/dist/es/index.js 39 kB
packages/keypad-context/dist/es/index.js 760 B
packages/kmath/dist/es/index.js 86.9 kB
packages/math-input/dist/es/index.js 77.7 kB
packages/math-input/dist/es/strings.js 1.79 kB
packages/perseus-editor/dist/es/index.js 689 kB
packages/perseus-linter/dist/es/index.js 22.1 kB
packages/perseus-score/dist/es/index.js 115 kB
packages/perseus/dist/es/index.js 367 kB
packages/perseus/dist/es/strings.js 6.57 kB
packages/pure-markdown/dist/es/index.js 3.66 kB
packages/simple-markdown/dist/es/index.js 12.5 kB

compressed-size-action

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code moved from perseus-item.ts, below.

@benchristel benchristel requested a review from Myranae February 18, 2025 18:59
@@ -406,8 +406,7 @@ export type PerseusImageBackground = {
// The scale of the image
// NOTE: perseus_data.go says this is required, but nullable, even though
// it isn't necessary at all.
// Yikes, production data as this as both a number (1) and string ("1")
scale?: number | string;
scale?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is data-schema now the representation of the parsed JSON? Because it was the representation of the stored JSON, so the parser wouldn't affect that right?

@@ -40,7 +40,7 @@ export const parseNumericInputWidget: Parser<NumericInputWidget> = parseWidget(
// the data, simplify this.
value: optional(nullable(number)),
status: string,
answerForms: optional(array(parseMathFormat)),
answerForms: defaulted(array(parseMathFormat), () => undefined),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the different here that originally it was possible for the answerForms key to not exist but now it must exist but it's value is possibly undefined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that null was previously rejected as invalid, but now it's accepted and converted to undefined.

optional(T) allows either T or undefined, and rejects null.

defaulted(T, () => U) will accept T, undefined, or null as input, and convert null and undefined to U.

// "version": null,
// "static": false,
// "graded": false,
// "alignment": "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, probably a bug with serialization. Some of our serialization is based on a blocklist of properties rather than an allowlist, which occasionally introduces some accidental properties.

// Widget IDs with 0 currently cause a full-page crash when the
// exercise is rendered in webapp!
it("accepts a widget ID numbered 0", () => {
// Widget IDs with 0 cause a full-page crash when an exercise is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a ticket to address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, probably not? We're not writing content with IDs of 0 into datastore anymore. And Jeremy has talked about wanting the IDs to be completely opaque strings, so addressing the issue with 0 separately seems like maybe a waste of effort.

// be 0, at least for image widgets. However, if widget IDs in an exercise
// contain 0, the exercise renderer will blow up. We allow 0 here for
// compatibility with articles.
if (typeof rawValue !== "string" || !/^(0|[1-9][0-9]*)$/.test(rawValue)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (typeof rawValue !== "string" || !/^(0|[1-9][0-9]*)$/.test(rawValue)) {
if (typeof rawValue !== "string" || !/^(\d+)$/.test(rawValue)) {

I'm guessing we can't do this because we don't want 01?

I'm a little worried about codifying the idea that a widget ID is widget 1 whereas where we're trying to get to is widget ID is something like /[a-z0-9 -]+/ (a uuid) - it just happens that they are made of a widget type and a number now.

I know parseStringToNonNegativeInt is more generic than just an ID, but I thought I'd bring it up since the comment brings up widget IDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we can't do this because we don't want 01?

Yeah, a leading 0 causes the number to be parsed as octal, so e.g. 011 will parse to 9.

I'm a little worried about codifying the idea that a widget ID is widget 1 whereas where we're trying to get to is widget ID is something like /[a-z0-9 -]+/ (a uuid) - it just happens that they are made of a widget type and a number now.

It's easy to relax constraints, but harder to tighten them (since we'd have to make sure that our existing data adheres to the tighter constraint). I think it will be straightforward to allow more widget ID formats in the future.

@@ -0,0 +1,2 @@

{"question":{"content":"\n[[☃ image 1]]\n\n##A history of conformity in American schools\n\n*By Heather M. Meston*\n\n___\n\n1. Ever wondered why you sit in rows of desks, stand in silent lines, or follow dozens of other rules during the school day? American public schools are often accused of teaching *conformity.* But what is conformity? It’s when people think and act the same. And, in fact, teaching conformity was a huge part of why American public schools were created! \n\n###The Prussian Model of Education comes to the United States\n\n2. Where did the idea that schools should teach conformity come from?\n\n3. Prussia, it turns out. No, not Russia: Prussia, which today is Germany. After France defeated Prussia in 1807, Prussian leaders blamed the loss on their soldiers. They thought the soldiers were doing too much independent thinking and not enough following orders. So, to eliminate this independence, they imposed required schooling. They designed schools to teach values like duty, obedience, and love of their country, as well as math and reading. The government used taxes to pay for all Prussian children to go to school through 8th grade.\n\n4. \tIn the United States, schools did exist prior to the 1800s, but they often cost money and were open mostly only to White boys. Then in 1843, Horace Mann, often called the Father of American Education, traveled to Prussia to study their schools. He was very impressed by Prussian schools. So, he began fighting for the United States to align its schools with Prussia’s. \n\n5.\tMann called his free, required schools “common schools.” He wanted them to focus on both academics and values, like the Prussian schools. The arrival of large populations of immigrants from Southern and Eastern Europe further shaped the idea that schools should teach values. Common schools were viewed as a way to make immigrants “more American.” Schools would teach immigrants English, democratic values like voting, and how to be productive workers. Common schools became one way to make sure that all people conformed to morals that American leaders valued.\n\n###Who gets to be “normal” in schools today?\n\n6. \tToday, many American public schools still expect students to learn the same skills, at the same time, in the same ways. They also often teach certain kinds of values, like being quiet and following directions. But studies show that this focus on conformity can cause particular problems for certain students, such as those with disabilities or those who don’t speak English as their first language.\n\n7. \tTake, for example, students with Attention Deficit Hyperactivity Disorder, or ADHD. People with ADHD can have incredible talents: many of them have high levels of energy, creativity, and hyperfocus. Hyperfocus is when a person can tune out distractions and focus on one task for a long time. These talents can help people with ADHD do amazing things. Walt Disney, Leonardo Da Vinci, and Albert Einstein are all believed to have had ADHD. The creativity, energy, and hyperfocus they brought to their passions helped them achieve their extraordinary levels of success \n\t\n8. Yet, despite these talents, students with ADHD often struggle in school. They may have trouble conforming to school rules. Sitting for long periods of time can be hard for all students, but particularly for students with ADHD. Students with ADHD may also find it hard to have to learn the same material at the same speed as everyone else. Researchers have shown that students with ADHD learn best in schools that focus on movement, choice, and creativity. In schools like these, students with ADHD can often be very successful. However, because of standardized tests, required lessons, and rules that teachers have to follow, it can be hard for schools to create these kinds of experiences for students.\n\n\n[[☃ explanation 1]]=====\n**Based on the text, which of the following statements is likely true?**\n\n\n[[☃ radio 1]]\n\n\n\n\n","widgets":{"explanation 1":{"type":"explanation","static":false,"graded":true,"alignment":"default","id":null,"key":null,"version":{"major":0,"minor":0},"options":{"showPrompt":"Attribution","hidePrompt":"Hide attribution","explanation":"Public domain image provided by the Boston Public Library","widgets":{},"static":false}},"image 1":{"type":"image","static":false,"graded":true,"alignment":"block","id":null,"key":null,"version":{"major":0,"minor":0},"options":{"title":"","labels":[],"caption":"","alt":"A class in 1895","backgroundImage":{"url":"https://cdn.kastatic.org/ka-content-images/2cd757f8be2af8caa65c5e9c2c8e491f38a504a4.jpg","width":1024,"height":782,"top":0,"scale":null,"bottom":null,"left":null},"static":false,"range":[[0,10],[0,10]],"box":[1024,782]}},"radio 1":{"type":"radio","static":false,"graded":true,"alignment":"default","id":null,"key":null,"version":{"major":1,"minor":0},"options":{"choices":[{"content":"The Prussian Model of Education continues to influence how many American public schools are run today.\n","clue":"**This statement **is** likely true, making it the best choice.** The text first explains how the idea of conformity in schools came from Prussia. Then, the text points out how “today, many American public schools still expect students to learn the same skills, at the same time, in the same ways.” From this, you can infer that the Prussian model continues to influence American public schools today. ","correct":true,"isNoneOfTheAbove":false,"widgets":null},{"content":"The Prussian Model of Education has little influence on the way that American public schools are run today.\n","clue":"This statement isn’t likely true. After explaining how the idea of conformity in schools came from Prussia, the text points out how “today, many American public schools still expect students to learn the same skills, at the same time, in the same ways.” So, you can infer that the Prussian focus on conformity still influences American public schools.\n","correct":false,"isNoneOfTheAbove":false,"widgets":null},{"content":"There have been many people fighting to eliminate the focus on conformity in American public schools.\n","clue":"This statement isn’t likely true. The text discusses how American schools focus on conformity, and points out how “studies show that this focus on conformity can cause particular problems for certain students.” But the text doesn’t mention many people fighting to eliminate schools’ focus on conformity.\n","correct":false,"isNoneOfTheAbove":false,"widgets":null},{"content":"There have been many people fighting to impose new rules within the American public school system.","clue":"This statement isn’t likely true. The text doesn’t mention people fighting to impose new rules. \n","correct":false,"isNoneOfTheAbove":false,"widgets":null}],"noneOfTheAbove":false,"hasNoneOfTheAbove":false,"countChoices":false,"deselectEnabled":false,"randomize":false,"multipleSelect":false,"onePerLine":false,"displayCount":null}}},"replace":null,"metadata":null,"images":{}},"hints":[],"answerArea":{"type":"","static":false,"graded":false,"alignment":"","id":null,"key":null,"version":null,"zTable":false,"chi2Table":false,"tTable":false,"calculator":false,"periodicTable":false,"periodicTableWithKey":false,"financialCalculatorMonthlyPayment":false,"financialCalculatorTotalAmount":false,"financialCalculatorTimeToPayOff":false,"options":null},"_multi":null,"itemDataVersion":{"major":0,"minor":1},"answer":null}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this didn't get formatted? Feels like it would be nicer if it were across multiple lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

tTable: false,
};

it("fills in missing fields with `false`", () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action needed, but noting that we don't test a mix between false/true values:

const result = parsePerseusAnswerArea({
  calculator: true,
  chi2Table: false,
}, ctx());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also not testing that each input field is mapped to the correct output field. As I've written the test, there's nothing stopping us from scrambling the fields when we assign them.

I'm inclined not to test for such things; tests provide value when they catch our mistakes, so a test for a mistake we're unlikely to ever make is unlikely to provide value.

@benchristel benchristel changed the base branch from benc/delete-metadata to main February 18, 2025 19:33
// }
//
// This function filters the fields of an answerArea object, keeping only the
// known ones, and converts `undefined` and `null` values to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is there any concern about this changing logic? If something expects null/undefined but instead gets false I mean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answerArea is used to select which auxiliary features of the exercise UI to display. AFAICT code that uses it treats the values as booleans, so missing/undefined values are treated the same as false.

@benchristel benchristel force-pushed the benc/fix-prod-parser-errors branch from 4a79bed to a0b5160 Compare February 18, 2025 20:02
…ePerseusItem` and `parseAndMigratePerseusArticle` to accept legacy data formats observed in production
@benchristel benchristel merged commit e63f83d into main Feb 18, 2025
8 checks passed
@benchristel benchristel deleted the benc/fix-prod-parser-errors branch February 18, 2025 20:47
SonicScrewdriver added a commit that referenced this pull request Feb 18, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @khanacademy/[email protected]

### Minor Changes

- [#2242](#2242)
[`e187c6b67`](e187c6b)
Thanks [@benchristel](https://github.com/benchristel)! - Deprecate the
`metadata` field in renderer, hint, and Group widget data schemas.


- [#2156](#2156)
[`cbd5a6528`](cbd5a65)
Thanks [@Myranae](https://github.com/Myranae)! - Implement a widget
export function to filter out rubric data from widget options for the
Matcher widget

### Patch Changes

- [#2215](#2215)
[`62ed407b8`](62ed407)
Thanks [@Myranae](https://github.com/Myranae)! - Update Sorter's public
widget option function to use Math.random and shuffle

- Updated dependencies
\[[`bae77a63c`](bae77a6),
[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Minor Changes

- [#2246](#2246)
[`e63f83d0d`](e63f83d)
Thanks [@benchristel](https://github.com/benchristel)! - Update
`parseAndMigratePerseusItem` and `parseAndMigratePerseusArticle` to
accept legacy data formats observed in production


- [#2242](#2242)
[`e187c6b67`](e187c6b)
Thanks [@benchristel](https://github.com/benchristel)! - Deprecate the
`metadata` field in renderer, hint, and Group widget data schemas.


- [#2215](#2215)
[`62ed407b8`](62ed407)
Thanks [@Myranae](https://github.com/Myranae)! - Update Sorter's public
widget option function to use Math.random and shuffle


- [#2156](#2156)
[`cbd5a6528`](cbd5a65)
Thanks [@Myranae](https://github.com/Myranae)! - Implement a widget
export function to filter out rubric data from widget options for the
Matcher widget

## @khanacademy/[email protected]

### Minor Changes

- [#2242](#2242)
[`e187c6b67`](e187c6b)
Thanks [@benchristel](https://github.com/benchristel)! - Deprecate the
`metadata` field in renderer, hint, and Group widget data schemas.

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- [#2240](#2240)
[`bae77a63c`](bae77a6)
Thanks [@mark-fitzgerald](https://github.com/mark-fitzgerald)! -
[Linter] Remove Math Font Size rule from editor linter

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]

## @khanacademy/[email protected]

### Patch Changes

- Updated dependencies
\[[`bae77a63c`](bae77a6),
[`e63f83d0d`](e63f83d),
[`e187c6b67`](e187c6b),
[`62ed407b8`](62ed407),
[`cbd5a6528`](cbd5a65)]:
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
    -   @khanacademy/[email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants