Investigate adding ocaml-tree-sitter Hack tests to CI tests #11

Open
aosq opened this issue Jul 8, 2021 · 1 comment

Comments

@aosq
Member

aosq commented Jul 8, 2021

Is it https://github.com/returntocorp/ocaml-tree-sitter-languages or https://github.com/returntocorp/ocaml-tree-sitter-semgrep that tests different Hack repos against their parser?

Wonder if we can leverage some of their work and run their tests as part of this repo's CI tests (first we need to add CI tests 😅).

@frankeld
Member

frankeld commented Jul 9, 2021

ocaml-tree-sitter-languages and ocaml-tree-sitter-semgrep are currently being split and reorganized, so as of right now the answer is both. Both the languages repo and the semgrep repo have a Hack subfolder that lists projects to scan for parsing stats. The process is briefly described here: https://github.com/returntocorp/ocaml-tree-sitter-languages/blob/main/doc/adding-a-language.md#parsing-statistics. However, the process has some known flaws: because scanning is based entirely on file extensions, it does not properly distinguish PHP files from Hack files, and it can misidentify files that merely have Hack-like extensions (semgrep/ocaml-tree-sitter-languages#6).
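To illustrate the extension problem, here is a rough sketch of classifying a file by its opening marker instead of its extension alone. This is not what the stat scripts do today; the function name, the extension set, and the 256-byte window are all assumptions made up for the example.

```python
from pathlib import PurePath

# Assumption: extensions that carry Hack code without needing a header.
HEADERLESS_HACK_EXTENSIONS = {".hack", ".hck"}


def classify_source(name: str, head: bytes) -> str:
    """Return 'hack', 'php', or 'unknown' for a file.

    `name` is the file name and `head` is the first few hundred bytes
    of the file (hypothetical inputs chosen for this sketch).
    """
    # Skip a shebang line if there is one.
    if head.startswith(b"#!"):
        head = head.split(b"\n", 1)[1] if b"\n" in head else b""
    head = head.lstrip()

    # The opening marker is a stronger signal than the extension.
    if head.startswith(b"<?hh"):
        return "hack"
    if head.startswith(b"<?php") or head.startswith(b"<?="):
        return "php"

    # No marker: fall back to extensions that are Hack-only.
    suffix = PurePath(name).suffix.lower()
    if suffix in HEADERLESS_HACK_EXTENSIONS:
        return "hack"
    return "unknown"
```

With something like this, a `.php` file that opens with `<?hh` would count as Hack, while a `.php` file that opens with `<?php` would be left out of the Hack stats.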

Generally, the OCaml parser seems to inherit any flaws in the original grammar, so for the most part any grammar errors that show up with t-s-h's npx tree-sitter parse would reappear with o-t-s's make stat. make stat works much like bin/fetch-examples; bin/test-examples in that it clones a list of public repos and generates statistics against them. The process is here: https://github.com/returntocorp/ocaml-tree-sitter-core/blob/main/scripts/lang-stat (this can publish to https://dashboard.semgrep.dev/metric/semgrep.core.hack.parse.pct). It can also handle internal private repos, which is helpful for a language like Hack that doesn't have many large open source repos.
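As a minimal sketch of the kind of number such a run produces, the snippet below computes the percentage of files that parse cleanly. It is not the lang-stat script: the function names are hypothetical, the statistic is per file (the real script may count differently), and it assumes that npx tree-sitter parse exits with a non-zero status when the tree contains errors.

```python
import subprocess
from typing import Callable, Iterable


def tree_sitter_parses(path: str) -> bool:
    """Return True if `npx tree-sitter parse` reports no errors for `path`.

    Assumption: the CLI signals parse errors through a non-zero exit code.
    """
    result = subprocess.run(
        ["npx", "tree-sitter", "parse", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def parse_success_pct(
    paths: Iterable[str],
    parses: Callable[[str], bool] = tree_sitter_parses,
) -> float:
    """Percentage of files in `paths` for which `parses` returns True."""
    paths = list(paths)
    if not paths:
        return 0.0
    ok = sum(1 for p in paths if parses(p))
    return 100.0 * ok / len(paths)
```

The `parses` argument is injectable so the same function could be pointed at either parser, or at a stub when testing the arithmetic.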

The t-s-h corpus tests are still the only tests that check the correctness of parse results rather than just checking for explicit parse errors. However, some form of CI test with o-t-s would be useful to prevent regressions that break the o-t-s build process, which derives from t-s-h (these error types: https://github.com/returntocorp/ocaml-tree-sitter-languages/blob/main/doc/adding-a-language.md#troubleshooting).
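To make the difference concrete, here is a small sketch of the two kinds of check applied to S-expression output. The helper names and the node names in the example trees are hypothetical and not taken from the Hack grammar; the only thing relied on is that tree-sitter marks problems with ERROR and MISSING nodes in its printed tree.

```python
def has_parse_error(sexp: str) -> bool:
    """Stat-style check: does the printed tree contain ERROR or MISSING nodes?"""
    return "(ERROR" in sexp or "(MISSING" in sexp


def normalize(sexp: str) -> str:
    """Collapse whitespace so trees can be compared across formatting."""
    return " ".join(sexp.split())


def matches_expected(actual: str, expected: str) -> bool:
    """Corpus-style check: is the tree exactly the one we expect?"""
    return normalize(actual) == normalize(expected)


# Hypothetical trees: the parse has no errors but has the wrong shape.
actual = "(script (expression_statement (binary_expression (integer) (integer))))"
expected = """
(script
  (expression_statement
    (call_expression (identifier) (arguments (integer)))))
"""

error_free = not has_parse_error(actual)           # passes a stat-style check
correct = matches_expected(actual, expected)       # fails a corpus-style check
```

A tree like `actual` would count as a success in parsing stats even though a corpus test would flag it as wrong.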
