This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

gumbo-next #295

Open
wants to merge 38 commits into base: master

Commits on Feb 17, 2015

  1. Add a token type for CDATA.

    nostrademons authored and vmg committed Feb 17, 2015
    0340cad
  2. Add a state flag for whether the tokenizer is in a cdata section, and set it as appropriate.

    nostrademons authored and vmg committed Feb 17, 2015
    f9a515f
  3. 58d5fad
  4. fa3a71d
  5. 2b804fa
  6. Print the decimal value of the current character in the debug output for lexing, to ease debugging non-printable characters.

    nostrademons authored and vmg committed Feb 17, 2015
    8b867b4
  7. Add test for unsafe cdata.

    nostrademons authored and vmg committed Feb 17, 2015
    3f6012a
  8. Fix missing case statement for GUMBO_TOKEN_CDATA in handle_parser_error. (The whole error handling really needs to be redone, it's not very helpful to users.)

    nostrademons authored and vmg committed Feb 17, 2015
    fe28c18
  9. Additional debugging instructions.

    nostrademons authored and vmg committed Feb 17, 2015
    b6c9617
  10. adc4c76
  11. Update parser and tokenizer tests with testcases for null CDATA, and make sure their input mechanisms can accept this without relying on strlen.

    nostrademons authored and vmg committed Feb 17, 2015
    29f48f2
  12. Fix handling of nulls in CDATA sections.

    nostrademons authored and vmg committed Feb 17, 2015
    7fea4b5
  13. 4383a40
  14. d8f369d
  15. 975cfcf
  16. ac84d02
  17. ed9c9e5
  18. 4d1efca
  19. 61fc188
  20. Add in require rtc tag

    kevinhendricks authored and vmg committed Feb 17, 2015
    f236a8c
  21. 7d433e0
  22. befeb12
  23. a2f9e41
  24. 723a5f7
  25. Spec Fixes handle_in_column_group

    kevinhendricks authored and vmg committed Feb 17, 2015
    57bce0f
  26. 328c9e1
  27. 49a5194
  28. memory: Simplify the memory allocator implementation

    This (admittedly massive) patch simplifies the way memory allocation is
    performed in the library.
    
    The old `gumbo_parser_allocate` APIs have been removed and replaced with
    the following:
    
    - `gumbo_malloc`
    - `gumbo_realloc`
    - `gumbo_free`
    - `gumbo_strdup`
    
    As you can see, the 4 APIs match their C standard equivalents (in both
    function and signature), and they no longer take a `GumboParser *`
    object to look up their implementation.
    
    Instead, their implementation can be customized, globally, using the
    following APIs:
    
    - `gumbo_memory_set_allocator`
    - `gumbo_memory_set_free`
    
    These two APIs allow the user to set a global memory allocator and free
    function. The `allocator` function needs to have the same signature as
    the standard `realloc`. This allows us to use it both as a realloc (in
    the vector and string buffer code, *greatly* reducing memory usage) and
    as a normal malloc (by passing `NULL` as the first argument).
    
    The `free` function needs to have the same signature as the standard
    `free`.
    
    With just these two functions, we can abstract the whole set of standard
    C memory allocation APIs, and we can do so globally, without having to
    pass around the parser state to find them.
    
    This greatly simplifies many parts of the library, improves performance,
    and fixes several pathological cases of excessive memory usage,
    caused by the previous lack of a `realloc` API.
    
    The following external APIs, however, are no longer backwards
    compatible:
    
    - struct GumboInternalOptions: no longer allows the user to set a custom
       memory allocator callback.
    
    - gumbo_destroy_output: no longer requires a Parser object.
    
    - gumbo_destroy_node: can now be safely exported
    vmg committed Feb 17, 2015
    d24c9d4
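
The allocator design described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: every `sketch_*` name is hypothetical, and the only facts taken from the message are that a single `realloc`-shaped hook plus a `free`-shaped hook are set globally, and that a plain malloc is the allocator called with `NULL` as its first argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All sketch_* names are hypothetical; they only mirror the design that the
 * commit message describes and are not copied from the gumbo sources. */
typedef void *(*sketch_allocator_fn)(void *ptr, size_t size);
typedef void (*sketch_free_fn)(void *ptr);

/* Global hooks, defaulting to the C standard library. */
static sketch_allocator_fn sketch_user_allocator = realloc;
static sketch_free_fn sketch_user_free = free;

/* Install a custom realloc-shaped allocator (NULL restores the default). */
void sketch_memory_set_allocator(sketch_allocator_fn fn) {
  sketch_user_allocator = fn ? fn : realloc;
}

/* Install a custom free-shaped function (NULL restores the default). */
void sketch_memory_set_free(sketch_free_fn fn) {
  sketch_user_free = fn ? fn : free;
}

/* malloc is simply the allocator called with a NULL first argument. */
void *sketch_malloc(size_t size) { return sketch_user_allocator(NULL, size); }

void *sketch_realloc(void *ptr, size_t size) {
  return sketch_user_allocator(ptr, size);
}

void sketch_free(void *ptr) { sketch_user_free(ptr); }

/* strdup built on top of the same hook, so it honors a custom allocator. */
char *sketch_strdup(const char *str) {
  assert(str != NULL);
  size_t size = strlen(str) + 1;
  char *copy = sketch_malloc(size);
  if (copy != NULL) memcpy(copy, str, size);
  return copy;
}

/* Example custom allocator: counts calls, then defers to realloc. */
size_t sketch_alloc_calls = 0;

void *sketch_counting_allocator(void *ptr, size_t size) {
  sketch_alloc_calls++;
  return realloc(ptr, size);
}
```

Because the hooks are process-wide rather than stored in a parser object, none of the wrappers needs a `GumboParser *` argument. The trade-off is that two users of the library in the same process cannot install different allocators.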
  29. tags: Use a perfect hash for lookups

    The previous version using `strcasecmp` over an array was a bottleneck
    in the library.
    
    This version uses a simple, minimal perfect hash table (computed via
    `mph`) to convert tag names into tag enum values. Since we're now
    hashing tag names, we can pass in the length of the tag name explicitly,
    and avoid the superfluous allocations that the tokenizer was performing
    in order to NUL-terminate the tag. This is implemented in the new
    `gumbo_tagn_enum` API.
    
    The old `gumbo_tag_enum` API has been left as a thin wrapper to keep
    backwards compatibility -- it is not used internally by the library.
    
    `mph` was chosen for the perfect hash function because it generates
    hashes that are slightly slower than GPerf's but significantly simpler,
    and that occupy an order of magnitude less memory (as they don't
    need a full copy of all the strings in the set for hashing).
    
    If the tag lookup function proves to be a bottleneck, this decision can
    be re-evaluated in the future.
    vmg committed Feb 17, 2015
    c34e2d9
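
To illustrate the length-based lookup described in the commit message above, here is a small sketch of a collision-free hash over five tag names. It is not output from `mph` and not the table from the patch: the hash function is hand-picked for this tiny set, the table has 11 slots (so it is perfect but not minimal), and all `sketch_*` / `SKETCH_*` names are hypothetical. The point it shows is that the caller passes a pointer and a length, so the tag name never has to be copied and NUL-terminated first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag enum covering only the five names used in this sketch. */
typedef enum {
  SKETCH_TAG_A,
  SKETCH_TAG_DIV,
  SKETCH_TAG_P,
  SKETCH_TAG_SPAN,
  SKETCH_TAG_TABLE,
  SKETCH_TAG_UNKNOWN
} SketchTag;

#define SKETCH_SLOTS 11

typedef struct {
  const char *name; /* lowercase tag name, NULL for an empty slot */
  SketchTag tag;
} SketchSlot;

/* Slot positions were worked out by hand for sketch_hash() below. */
static const SketchSlot sketch_table[SKETCH_SLOTS] = {
  [1] = {"div", SKETCH_TAG_DIV},
  [2] = {"table", SKETCH_TAG_TABLE},
  [5] = {"p", SKETCH_TAG_P},
  [8] = {"a", SKETCH_TAG_A},
  [9] = {"span", SKETCH_TAG_SPAN},
};

/* ASCII-only lowercasing, independent of the current locale. */
static unsigned char sketch_lower(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 0x20) : c;
}

/* (first + last + length) mod 11 has no collisions for the five names. */
static unsigned sketch_hash(const char *s, size_t length) {
  unsigned first = sketch_lower((unsigned char)s[0]);
  unsigned last = sketch_lower((unsigned char)s[length - 1]);
  return (first + last + (unsigned)length) % SKETCH_SLOTS;
}

/* Look up a tag name given as (pointer, length); no NUL terminator needed. */
SketchTag sketch_tagn_enum(const char *name, size_t length) {
  assert(name != NULL);
  if (length == 0) return SKETCH_TAG_UNKNOWN;
  const SketchSlot *slot = &sketch_table[sketch_hash(name, length)];
  if (slot->name == NULL || strlen(slot->name) != length) {
    return SKETCH_TAG_UNKNOWN;
  }
  /* One case-insensitive comparison confirms or rejects the candidate. */
  for (size_t i = 0; i < length; i++) {
    if (sketch_lower((unsigned char)name[i]) != (unsigned char)slot->name[i]) {
      return SKETCH_TAG_UNKNOWN;
    }
  }
  return slot->tag;
}

/* Thin wrapper for NUL-terminated input, kept for compatibility. */
SketchTag sketch_tag_enum(const char *name) {
  return sketch_tagn_enum(name, strlen(name));
}
```

A tokenizer holding the buffer `<DiV class=x>` could call `sketch_tagn_enum(buffer + 1, 3)` directly on the input. Each lookup costs one hash computation and at most one string comparison, instead of a linear scan with `strcasecmp`.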
  30. parser: Simplify the element_in_specific_scope calls

    The old implementation using 2 tagsets was being rather wasteful with
    stack space, allocating 2 whole sets when one of them always contains a
    single tag element. Knowing that the `expected` elements must always be
    in the HTML namespace, we can simplify these APIs by passing an array of
    elements and stop allocating so much space on the stack.
    vmg committed Feb 17, 2015
    4d8ae0b
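
The kind of scope check this commit message refers to can be sketched as below. This is a simplified illustration, not the function from the patch: the `sk_*` names and the `SkTag` enum are hypothetical, namespaces are ignored, and the scope boundary is also passed as a plain array here for brevity. It follows the general shape of the HTML5 "has an element in a specific scope" check: walk the stack of open elements from the top, succeed on the first expected tag, and fail on the first scope-boundary tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced tag enum for the sketch. */
typedef enum { SK_HTML, SK_BODY, SK_TABLE, SK_TD, SK_P, SK_DIV } SkTag;

/* Linear membership test; the arrays involved are tiny. */
static bool sk_contains(const SkTag *tags, size_t count, SkTag tag) {
  for (size_t i = 0; i < count; i++) {
    if (tags[i] == tag) return true;
  }
  return false;
}

/* Walk the open-element stack from the top (last entry) downwards.
 * `expected` is a small caller-provided array rather than a full tag set. */
bool sk_has_element_in_scope(const SkTag *open, size_t open_count,
                             const SkTag *expected, size_t expected_count,
                             const SkTag *boundary, size_t boundary_count) {
  assert(open_count == 0 || open != NULL);
  for (size_t i = open_count; i-- > 0;) {
    if (sk_contains(expected, expected_count, open[i])) return true;
    if (sk_contains(boundary, boundary_count, open[i])) return false;
  }
  return false;
}
```

Passing `expected` as a short array means the caller only puts a handful of enum values on the stack, rather than a set sized for every known tag.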
  31. parser: Implement fragment parsing

    The HTML5 fragment parsing algorithm has been implemented using a new
    API, `gumbo_parse_fragment`. The old APIs are maintained for backwards
    compatibility, although passing `GUMBO_TAG_LAST` as the inner_html
    context to `parse_fragment` will cause it to parse the buffer as a full
    document (same functionality as `gumbo_parse_with_options`).
    
    The HTML5lib adapter code has been modified to support fragment parsing
    tests (the tests are passing 100%).
    vmg committed Feb 17, 2015
    72a2be1
  32. parser: Enable these SVG attribute replacements

    The most recent version of the HTML5 standard does **not** perform these
    replacements. However, we are targeting the html5lib 0.95 tests, where
    they are still performed. Hence, conditionally enable them for now until
    we can bring the whole suite up to speed.
    vmg committed Feb 17, 2015
    d59e569
  33. travis: Use GTest 1.7.0

    Fixes compilation in Yosemite
    vmg committed Feb 17, 2015
    2df0efc
  34. Fix compilation in Mac OS X

    vmg committed Feb 17, 2015
    ee05f9f
  35. tags: Automatically generate tag data

    Use `sed` rules in the Makefile to automatically generate all the Tag
    tables. This way we no longer have to keep them in sync by hand.
    vmg committed Feb 17, 2015
    a87add3
  36. tokenizer: Refactor ASCII-only helpers

    The ASCII-only helpers in the tokenizer should be used in other parts of
    the codebase (namely, when comparing tag names case-insensitively).
    Hence, export them in the `util.h` header.
    vmg committed Feb 17, 2015
    62fd3e2
  37. parser: Export create_node

    The `create_node` helper is very useful when building tooling on top of
    Gumbo, so don't keep it static.
    vmg committed Feb 17, 2015
    b6dcb36
  38. 37479c5