Switch from bigrams to trigrams for search #342

Merged 3 commits on Oct 25, 2023

Commits on Oct 23, 2023

  1. Use JavaScript typed arrays for search index

    This uses `Uint8Array` to represent byte arrays in the search index,
    reducing heap snapshot size in Firefox from 12.2 MB to 6.8 MB.
    Chrome appears to have a similar optimization already built in, so
    its heap snapshot instead increases marginally, from 6.1 MB to 8.0 MB.
    jonathanhefner committed Oct 23, 2023
    36b184c
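
To illustrate the first commit, here is a minimal sketch of the difference between a plain array of bytes and a `Uint8Array`. The `bytes` values are made up for illustration and are not taken from the actual search index, whose layout is not shown here.

```javascript
// Hypothetical byte values; the real index contents are not shown in this PR page.
const bytes = [0, 3, 255, 17];

// A plain Array holds generic JavaScript values, so an engine may spend
// more than one byte of heap per element. A Uint8Array stores exactly one
// byte per element in a contiguous buffer.
const packed = Uint8Array.from(bytes);

console.log(packed.byteLength); // 4
console.log(packed[2]);         // 255

// Values outside 0..255 wrap modulo 256 on assignment.
packed[0] = 256;
console.log(packed[0]);         // 0
```

How much heap this saves depends on the engine, which is consistent with the different Firefox and Chrome numbers reported in the commit message.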
  2. Replace occurrences of "bigram" with "ngram"

    This is in preparation for switching from bigrams to trigrams, reducing
    the size of the subsequent diff.
    jonathanhefner committed Oct 23, 2023
    f9e496a
  3. Switch from bigrams to trigrams for search

    Trigrams can provide more accurate search results than bigrams.  For
    example, using bigrams, searching for "sel" would attempt to match the
    ngrams " s", "se", and "el".  For the Rails API (at `7c65a4b83b583f4f`),
    the top result is `ActiveModel::Serializers` due to "Model" matching
    "el" and ":Serial" matching " s" and "se".  However, using trigrams,
    "sel" would attempt to match " se" and "sel".  In that case, for the
    Rails API, the top result is `ActiveRecord::QueryMethods#select`.
    
    The downside of trigrams is that the search index grows from 2.9 MB
    to 8.6 MB.  The data compresses well, though, so the gzipped size
    increases only from 474 kB to 670 kB.  Browser heap snapshot size
    also stays reasonably small, increasing from 6.8 MB to 11.1 MB in
    Firefox and from 8.0 MB to 22.2 MB in Chrome.
    jonathanhefner committed Oct 23, 2023
    6795062
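
The "sel" example from the third commit message can be reproduced with a short sketch. The `ngrams` helper below is hypothetical and is not the project's actual code. It assumes the query is lowercased and prefixed with a single space to mark its start, which is what the ngrams listed in the commit message suggest.

```javascript
// Hypothetical helper: split a query into overlapping n-grams.
// Assumption: one leading space marks the start of the query, as implied
// by the " s" and " se" ngrams in the commit message.
function ngrams(query, n) {
  const padded = " " + query.toLowerCase();
  const result = [];
  for (let i = 0; i + n <= padded.length; i++) {
    result.push(padded.slice(i, i + n));
  }
  return result;
}

console.log(ngrams("sel", 2)); // [ ' s', 'se', 'el' ]
console.log(ngrams("sel", 3)); // [ ' se', 'sel' ]
```

With bigrams, each fragment is short enough to match unrelated parts of a name. With trigrams, a candidate has to contain the longer fragment "sel" itself to match fully.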