Handwriting Recognition API #591
We briefly looked at this during our F2F today, and had a couple early review questions:
@r12a do you have any input on this?
What unit are the Cartesian coordinates in the explainer measured in? Are they physical pixels, logical pixels, or something else?
@cynthia WICG/handwriting-recognition#4 I also made a bunch of other i18n-related comments (see the issue list).
Here I assume you mean a language that can be written both horizontally and vertically. Google's recognizer generally returns characters in the order they were written (for the above type of languages). So it works in both writing directions (e.g. rtl, ltr, top-bottom). Our metrics show that vertical writing isn't commonly used by our users, so this feature hasn't received much attention recently. We aren't sure how other recognizers work. Some may only work with one direction (and not work at all for vertical writing). Some may ignore the character writing order. What do you think about adding a hint about writing direction, in case some recognizers need this information? Note that some recognizers may disregard this hint altogether.
For RTL languages, the recognizer already knows it should process text from right to left. An LTR language with characters written from right to left (e.g. "hello" written in "olleh" order) is a rare scenario, and I'm not sure what the correct interpretation is. The user perhaps wants the text to be interpreted as "hello", but it's really up to the recognizer to decide what it will output. Either output can be considered valid IMO.
The recognizer could determine the writing direction by looking at the time at which each character was written and at the characters' spatial relations. The same applies to context switching. For example,
That said, existing recognizers (those available on the market) don't support mixed scripts (e.g. English + Arabic). They will recognize text as if it were written in a single script (e.g. recognize Arabic characters as English characters, and give less-than-ideal results). I don't think we should try to solve the mixed-script problem if the underlying implementations haven't solved it. Our solution may not work for them. Or, if the implementation is advanced, it won't care whether we provide this information / hint.
We chose navigator because it's preferred over the alternatives (e.g. window, global constructor): we expect the handwriting recognizer to interact with platform-specific APIs and to support different features on different platforms. Navigator seems natural given these feature differences. We don't have a particular preference for where the methods live. Are you suggesting we put the methods behind an attribute (e.g.
The explainer examples use logical pixels. The recognizer doesn't particularly care about the measurement unit, as long as all provided coordinates are measured in the same way (i.e. don't mix logical pixels and device pixels). The recognizer implementation normalizes the coordinates and performs recognition in relative terms (e.g. relative to the smallest character / block in the drawing).
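To illustrate why the unit doesn't matter as long as it is consistent, here is a minimal sketch of coordinate normalization. It is not how any particular recognizer works: the `Point` shape and `normalizeDrawing` are hypothetical names for this example, and it rescales to the drawing's bounding box rather than to the smallest character or block mentioned above.

```typescript
// Hypothetical point shape for this sketch: x/y in any consistent unit, t in ms.
interface Point { x: number; y: number; t: number; }
type Stroke = Point[];

// Rescale a drawing so its bounding box fits in a unit square, preserving
// the aspect ratio. Assumption: bounding-box scaling stands in for whatever
// normalization a real recognizer applies.
function normalizeDrawing(strokes: Stroke[]): Stroke[] {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const stroke of strokes) {
    for (const p of stroke) {
      minX = Math.min(minX, p.x);
      minY = Math.min(minY, p.y);
      maxX = Math.max(maxX, p.x);
      maxY = Math.max(maxY, p.y);
    }
  }
  if (minX === Infinity) return strokes.map(() => []); // drawing has no points
  // Use the longer side as the scale; fall back to 1 for a single point.
  const scale = Math.max(maxX - minX, maxY - minY) || 1;
  return strokes.map(stroke =>
    stroke.map(p => ({ x: (p.x - minX) / scale, y: (p.y - minY) / scale, t: p.t }))
  );
}

// The same stroke expressed in logical pixels and in device pixels (ratio 2).
const logical: Stroke[] = [[{ x: 10, y: 20, t: 0 }, { x: 30, y: 60, t: 16 }]];
const device: Stroke[] = logical.map(s =>
  s.map(p => ({ x: p.x * 2, y: p.y * 2, t: p.t }))
);

// Both normalize to the same relative coordinates.
console.log(
  JSON.stringify(normalizeDrawing(logical)) === JSON.stringify(normalizeDrawing(device))
); // prints true
```

Mixing units within one drawing would break this: a stroke given in device pixels next to one in logical pixels would be scaled against the wrong bounding box.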
@wacky6, thank you for your patience! @atanassov and I looked at this during our F2F. Your response covers most of the questions we had - thanks a lot.
I think having that as an extension point would be useful - if there is some specific sort of post-processing that needs to be done based on this before it hits the recognizer, it feels like this information could be useful to expose.
We agree that this isn't an important scenario to handle. Our concern about RTL was mostly about languages that are actually written left to right.
If it's an unsolved problem, I think we don't need to delve too much into this.
Yes, this was one of the reasons we asked this question. We were also a bit curious about three different tabs initiating multiple recognition contexts - is anything shared? (This question is based on the navigator layering) (More comments based on the discussion with @atanassov to come in a bit.)
During our May 2021 vf2f, @cynthia and I did another pass at this review. Thank you for all of the answers. Regarding adding a direction hint to the recognizer - we found that to be a useful futureproofing feature and recommend that you add it. After going over the privacy & security questionnaire, I am still not clear whether the API exposes additional fingerprinting capabilities. With exposure of strokes, ordering of strokes and timing of strokes, I worry that models can be trained to easily recognize patterns for various disabilities. This would be a very unfortunate byproduct of this API. Is this something that you considered and could expand on?
Hi, @cynthia
So that the recognizer for ("en" and "ar") can differentiate the following two outputs, for text "نشاط التدويل، W3C"
Could you confirm this addresses your concern?
Are there any documents on navigator layering?
Hi, @atanassov
The handwriting process looks like this:
Websites can already collect handwriting and analyze it. All they need is some user input (step 1) and some analysis code. For example, they can ask the user to draw on a canvas, use PointerEvent to collect the drawing, then send everything to a server for analysis (they don't have to use our API). Our API is at step 2. It converts stroke data (represented with our proposed HandwritingStroke and HandwritingDrawing) to text. Websites can already analyze handwriting in JavaScript. Our API makes this easier (call a method instead of supplying a bunch of JavaScript code) and more efficient (run in native code / accelerators). In short, our API isn't introducing anything that the Web can't already do.
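As an illustration of step 1, the sketch below groups pointer samples into strokes using plain data structures. `StrokeCollector`, `PointerSample` and the `{x, y, t}` point shape are hypothetical names for this example and are not taken from the proposed API. A browser `PointerEvent` has the four fields used here, so it can be passed straight to `handle`.

```typescript
// Hypothetical shapes for this sketch; they are not the proposed API's types.
interface Point { x: number; y: number; t: number; }
interface PointerSample {
  type: string;       // "pointerdown", "pointermove" or "pointerup"
  clientX: number;
  clientY: number;
  timeStamp: number;  // milliseconds
}

// Groups pointer samples into strokes: one stroke per down ... up sequence.
class StrokeCollector {
  private readonly finished: Point[][] = [];
  private current: Point[] | null = null;

  handle(e: PointerSample): void {
    if (e.type === "pointerdown") this.current = [];
    if (this.current === null) return; // ignore samples outside a stroke
    this.current.push({ x: e.clientX, y: e.clientY, t: e.timeStamp });
    if (e.type === "pointerup") {
      this.finished.push(this.current);
      this.current = null;
    }
  }

  // Returns a copy of the completed strokes, in the order they were drawn.
  strokes(): Point[][] {
    return this.finished.map(s => s.slice());
  }
}

// Simulated input: one three-point stroke, a stray move, then a two-point stroke.
const collector = new StrokeCollector();
const samples: PointerSample[] = [
  { type: "pointerdown", clientX: 0, clientY: 0, timeStamp: 0 },
  { type: "pointermove", clientX: 5, clientY: 5, timeStamp: 8 },
  { type: "pointerup", clientX: 10, clientY: 10, timeStamp: 16 },
  { type: "pointermove", clientX: 50, clientY: 50, timeStamp: 100 }, // pen is up: ignored
  { type: "pointerdown", clientX: 20, clientY: 0, timeStamp: 200 },
  { type: "pointerup", clientX: 20, clientY: 10, timeStamp: 216 },
];
for (const s of samples) collector.handle(s);
console.log(collector.strokes().length); // prints 2
```

On a page, the same `handle` method could be registered as the listener for `pointerdown`, `pointermove` and `pointerup` on a canvas. The collected points could then be sent to a server, analyzed in JavaScript, or passed to a recognizer.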
Thank you for your feedback. We've discussed this in a breakout and concluded that this proposal is good to move forward - we'll discuss further in the plenary and close if everyone agrees. Thanks for bringing this to our attention. As for the navigator layering, we don't have any formal recommendations - we'll discuss this in the plenary and provide feedback afterwards.
Sure, text written in English rarely has Arabic text in it, but that's not true at all the other way around. Text written in Arabic and all the other languages that use RTL scripts will contain LTR Latin script text on a regular basis. Not only that, but they will also contain numbers, and those are written LTR within the RTL flow. Same goes for expressions, numeric ranges, etc. for some languages.
For example, in Hebrew you'll write "Score: 82" as I don't think you'd want the text stored in memory to become "Score: 28". Or how about: "No parking: 08:00 - 20:00". Will the text stored in memory indicate that you can't park during the day, or overnight – it depends on the direction in which the range is read, and that will depend on the rules of the language being used.
Note that WICG/handwriting-recognition#4 already raises some of these issues, but as yet has no response.
Sorry, but I don't buy that you don't have to consider how this would work if implementations don't currently enable handwriting recognition properly for large percentages of the people on the planet. Our mission is to make the World Wide Web accessible worldwide. I think some thought has to be given to how to address the needs of the currently underserved millions of potential users.
Of course, if the recogniser recognises strokes and stores characters in the order they are written, then that may provide a solution, because someone writing "Score: 82" in Hebrew will write the 8 before the 2 (leaving a gap for it to fit). If the conversion of strokes to characters takes place after an input is completed, however, then mixed-direction text will require parsing for direction changes. Note, however, that in the former case, where strokes are converted on the fly, it's not straightforward either, since Arabic and Hebrew graphemes tend to be only half-written during the initial pass, and those graphemes are completed after the word is completed (e.g. the top bar for scripts such as Devanagari).
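The difference between storing characters in writing order and reading them back by position can be shown with a small sketch. The `RecognizedChar` shape and both functions are hypothetical: they assume the recognizer has already segmented the strokes into characters and knows when the first stroke of each character was drawn. Two Hebrew letters (written as escapes to keep the listing free of bidi reordering) stand in for the word before the number.

```typescript
// Hypothetical recognizer output for this sketch: one entry per character,
// with the horizontal centre of its strokes and the time of its first stroke.
interface RecognizedChar { text: string; centerX: number; startTime: number; }

// Logical order: characters in the order the user started writing them.
function inWritingOrder(chars: RecognizedChar[]): string {
  return chars.slice().sort((a, b) => a.startTime - b.startTime)
    .map(c => c.text).join("");
}

// Naive alternative: read every character by position, right to left.
function rightToLeftByPosition(chars: RecognizedChar[]): string {
  return chars.slice().sort((a, b) => b.centerX - a.centerX)
    .map(c => c.text).join("");
}

// Two Hebrew letters written right to left, then "82" written left to right
// in the space to their left (8 first, with the 2 placed to its right).
const written: RecognizedChar[] = [
  { text: "\u05D0", centerX: 100, startTime: 0 }, // alef
  { text: "\u05D1", centerX: 90, startTime: 1 },  // bet
  { text: "8", centerX: 60, startTime: 2 },
  { text: "2", centerX: 70, startTime: 3 },
];

console.log(inWritingOrder(written) === "\u05D0\u05D182");        // prints true
console.log(rightToLeftByPosition(written) === "\u05D0\u05D128"); // prints true
```

Ordering by the first stroke of each character sidesteps strokes added later to finish a grapheme, but only if those late strokes are attributed to the right character, which this sketch takes for granted.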
Thanks for the very comprehensive privacy & security section in the explainer. We're basically fine with the design. Since this relies on the presence of a handwriting recognizer software component, it raises some concerns about implementability - especially on lower-spec devices and in open source efforts. There also seems to be an issue regarding multi-stakeholder support, as there's no documented support from other browser engines on Chrome Status - can you provide any feedback there? What is the trajectory for this spec after incubation in WICG? Where do you see this going?
WebKit (https://lists.webkit.org/pipermail/webkit-dev/2021-March/031762.html) and Mozilla (mozilla/standards-positions#507) have been asked for their opinions, but without a response so far.
We think the feedback @r12a wrote above is important, but it is beyond the scope of this review and ideally should be discussed on the group's repository. As noted earlier, we are happy to see this move forward. Thank you for bringing this to our attention. (And please ping other stakeholders again when you have time!)
HIQaH! QaH! TAG!
I'm requesting a TAG review of Handwriting Recognition API.
Handwriting is a widely used input method. One key use is recognizing text as users draw it. This feature already exists on many operating systems (e.g. handwriting input methods). However, the web platform as of today doesn't have this capability, so developers need to integrate with third-party libraries (or cloud services), or develop native apps.
We want to add handwriting recognition capability to the web platform, so developers can use the existing handwriting recognition features available on the operating system.
Further details:
We'd prefer the TAG provide feedback as (please delete all but the desired option):
🐛 open issues in our GitHub repo for each point of feedback
Thanks.