gumbo-next #295
base: master
Commits on Feb 17, 2015
Commit 0340cad
Add a state flag for whether the tokenizer is in a cdata section, and set it as appropriate.
Commit f9a515f
Commit 58d5fad
Commit fa3a71d
Commit 2b804fa
Print the decimal value of the current character in the debug output for lexing, to ease debugging non-printable characters.
Commit 8b867b4
Commit 3f6012a
Fix missing case statement for GUMBO_TOKEN_CDATA in handle_parser_error. (The whole error handling really needs to be redone; it's not very helpful to users.)
Commit fe28c18
Commit b6c9617
Commit adc4c76
Update parser and tokenizer tests with testcases for null CDATA, and make sure their input mechanisms can accept this without relying on strlen.
Commit 29f48f2
Commit 7fea4b5
Commit 4383a40
Commit d8f369d
Commit 975cfcf
Commit ac84d02
Commit ed9c9e5
Commit 4d1efca
Commit 61fc188
Commit f236a8c
Commit 7d433e0
Commit befeb12
Commit a2f9e41
Commit 723a5f7
Commit 57bce0f
Commit 328c9e1
Commit 49a5194
memory: Simplify the memory allocator implementation
This (admittedly massive) patch simplifies the way memory allocation is performed in the library. The old `gumbo_parser_allocate` APIs have been removed and replaced with the following:

- `gumbo_malloc`
- `gumbo_realloc`
- `gumbo_free`
- `gumbo_strdup`

As you can see, the 4 APIs match their C standard equivalents (in both function and signature), and they no longer take a `GumboParser *` object to look up their implementation. Instead, their implementation can be customized, globally, using the following APIs:

- `gumbo_memory_set_allocator`
- `gumbo_memory_set_free`

These two APIs allow the user to set a global memory allocator and free function. The `allocator` function needs to have the same signature as the standard `realloc`: this allows us to use it both as a realloc in the vector and string buffer code (*greatly* reducing memory usage) and as a normal malloc (by passing `NULL` as the first argument). The `free` function needs to have the same signature as the standard `free`.

With just these two functions, we can abstract the whole set of standard C memory allocation APIs, and we can do so globally, without having to pass around the parser state to find them. This greatly simplifies many parts of the library, improves performance, and fixes several pathological cases of excessive memory usage caused by the previous lack of a `realloc` API.

The following external APIs, however, are no longer backwards compatible:

- struct GumboInternalOptions: no longer allows the user to set a custom memory allocator callback.
- gumbo_destroy_output: no longer requires a Parser object.
- gumbo_destroy_node: can now be safely exported
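To make the design concrete, here is a minimal sketch of how a single `realloc`-shaped hook plus a `free` hook can back all four wrappers. It is not the code from this patch: every `sketch_` name, the typedefs, and the `counting_allocator` helper are hypothetical stand-ins, and the exact signatures of `gumbo_memory_set_allocator` and `gumbo_memory_set_free` are assumed rather than quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: a realloc-shaped allocator and a free-shaped release hook. */
typedef void *(*sketch_allocator_fn)(void *ptr, size_t size);
typedef void (*sketch_free_fn)(void *ptr);

/* Global hooks, defaulting to the C standard library. */
static sketch_allocator_fn g_allocator = realloc;
static sketch_free_fn g_free = free;

/* Passing NULL restores the default (an assumption of this sketch). */
void sketch_set_allocator(sketch_allocator_fn fn) { g_allocator = fn ? fn : realloc; }
void sketch_set_free(sketch_free_fn fn) { g_free = fn ? fn : free; }

/* malloc is the allocator called with a NULL first argument. */
void *sketch_malloc(size_t size) { return g_allocator(NULL, size); }

/* realloc forwards directly, so vectors and string buffers can grow in place. */
void *sketch_realloc(void *ptr, size_t size) { return g_allocator(ptr, size); }

void sketch_free(void *ptr) { g_free(ptr); }

/* strdup is built on top of sketch_malloc, so it honours the custom hook too. */
char *sketch_strdup(const char *s) {
  assert(s != NULL);
  size_t n = strlen(s) + 1; /* include the terminating NUL */
  char *copy = sketch_malloc(n);
  if (copy != NULL) memcpy(copy, s, n);
  return copy;
}

/* Example custom allocator: counts calls, then defers to realloc. */
static size_t g_calls = 0;
static void *counting_allocator(void *ptr, size_t size) {
  g_calls++;
  return realloc(ptr, size);
}
```

Installing `counting_allocator` with `sketch_set_allocator` routes every later `sketch_malloc`, `sketch_realloc`, and `sketch_strdup` call through it, with no parser object passed around.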
Commit d24c9d4
tags: Use a perfect hash for lookups
The previous version using `strcasecmp` over an array was a bottleneck in the library. This version uses a simple, minimal perfect hash table (computed via `mph`) to map tag name strings to their tag enum values.

Since we're now hashing tag names, we can pass in the length of the tag name explicitly, and avoid the superfluous allocations that the tokenizer was performing in order to NUL-terminate the tag. This is implemented in the new `gumbo_tagn_enum` API. The old `gumbo_tag_enum` API has been left as a thin wrapper to keep backwards compatibility; it is not used internally by the library.

`mph` was chosen for the perfect hash function because it generates hashes that are slightly slower than GPerf but significantly simpler, and occupying an order of magnitude less memory (as they don't need a full copy of all the strings in the set for hashing). If the tag lookup function proves to be a bottleneck, this decision can be re-evaluated in the future.
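The idea can be illustrated with a toy table. The sketch below is not the output of `mph` and not the library's table: it covers seven tags only, the hash function was picked by hand so that each of those seven names lands in its own slot, and all `sketch_`/`SK_` names are hypothetical. It does show the two properties described above: a lookup costs one hash plus one comparison, and the name is passed with an explicit length, so it never needs to be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag enum; values double as slot indices in the table below. */
typedef enum {
  SK_TAG_META, SK_TAG_P, SK_TAG_EM, SK_TAG_UL,
  SK_TAG_DIV, SK_TAG_SPAN, SK_TAG_A, SK_TAG_UNKNOWN
} SketchTag;

#define SK_TAG_COUNT 7

/* Slot i holds the one tag name whose hash is i. */
static const char *const kTagNames[SK_TAG_COUNT] = {
  "meta", "p", "em", "ul", "div", "span", "a"
};
static const unsigned char kTagLengths[SK_TAG_COUNT] = {4, 1, 2, 2, 3, 4, 1};

/* ASCII-only lowercase: independent of the C locale. */
static unsigned sketch_lower(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? (unsigned)(c + ('a' - 'A')) : c;
}

/* Hand-picked hash: collision-free for the seven names above only. */
static unsigned sketch_tag_hash(const char *name, size_t len) {
  return (sketch_lower((unsigned char)name[0]) +
          sketch_lower((unsigned char)name[len - 1]) +
          (unsigned)len) % SK_TAG_COUNT;
}

/* Length-explicit, case-insensitive lookup; name need not be NUL-terminated. */
SketchTag sketch_tagn_enum(const char *name, size_t len) {
  assert(name != NULL);
  if (len == 0) return SK_TAG_UNKNOWN;
  unsigned slot = sketch_tag_hash(name, len);
  if (kTagLengths[slot] != len) return SK_TAG_UNKNOWN;
  /* A perfect hash still needs one comparison to reject names outside the set. */
  for (size_t i = 0; i < len; i++) {
    if (sketch_lower((unsigned char)name[i]) != (unsigned char)kTagNames[slot][i]) {
      return SK_TAG_UNKNOWN;
    }
  }
  return (SketchTag)slot;
}
```

A tokenizer holding a pointer into the input buffer can call `sketch_tagn_enum(ptr, 4)` on `span>text` directly, which is the allocation the explicit length avoids.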
Commit c34e2d9
parser: Simplify the `element_in_specific_scope` calls
The old implementation using 2 tagsets was being rather wasteful with stack space, allocating 2 whole sets when one of them always contains a single tag element. Knowing that the `expected` elements must always be in the HTML namespace, we can simplify these APIs by passing an array of elements and stop allocating so much space on the stack.
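As a rough illustration of the array-based shape, the sketch below walks a stack of open elements from the top and checks each entry against a short array of expected tags and a short array of scope-boundary tags. The types, names, and parameter layout are assumptions made for this example; the actual `element_in_specific_scope` signature is not shown in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, HTML-namespace-only tag enum for the sketch. */
typedef enum { SK_HTML, SK_TABLE, SK_TD, SK_DIV, SK_P, SK_BUTTON } SketchTag;

/* Linear membership test over a small caller-provided array. */
static bool sketch_tag_in(SketchTag tag, const SketchTag *tags, size_t count) {
  for (size_t i = 0; i < count; i++) {
    if (tags[i] == tag) return true;
  }
  return false;
}

/*
 * Returns true if any expected tag is found on the open-element stack
 * before a scope boundary is reached. open[open_count - 1] is the top.
 */
bool sketch_has_element_in_scope(const SketchTag *open, size_t open_count,
                                 const SketchTag *expected, size_t expected_count,
                                 const SketchTag *boundary, size_t boundary_count) {
  assert(open != NULL || open_count == 0);
  for (size_t i = open_count; i-- > 0;) {
    if (sketch_tag_in(open[i], expected, expected_count)) return true;
    if (sketch_tag_in(open[i], boundary, boundary_count)) return false;
  }
  return false;
}
```

Callers pass only as many tags as they need, so a lookup for a single expected element costs one array slot on the stack instead of a full tag set.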
Commit 4d8ae0b
parser: Implement fragment parsing
The HTML5 fragment parsing algorithm has been implemented using a new API, `gumbo_parse_fragment`. The old APIs are maintained for backwards compatibility, although passing `GUMBO_TAG_LAST` as the inner_html context to `gumbo_parse_fragment` will cause it to parse the buffer as a full document (same functionality as `gumbo_parse_with_options`). The HTML5lib adapter code has been modified to support the fragment parsing tests, all of which pass.
Commit 72a2be1
parser: Enable these SVG attribute replacements
The most recent version of the HTML5 standard does **not** perform these replacements. However, we are targeting the html5lib 0.95 tests, where they are still performed. Hence, conditionally enable them for now until we can bring the whole suite up to speed.
Commit d59e569
Commit 2df0efc
Commit ee05f9f
tags: Automatically generate tag data
Use `sed` rules in the Makefile to automatically generate all the Tag tables. This way we no longer have to keep them in sync by hand.
Commit a87add3
tokenizer: Refactor ASCII-only helpers
The ASCII-only helpers in the tokenizer should be used in other parts of the codebase (namely, when comparing tag names case-insensitively). Hence, export them in the `util.h` header.
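For readers unfamiliar with why ASCII-only helpers matter here: the standard `tolower` and `strcasecmp` depend on the current locale, whereas HTML tag names are matched case-insensitively over ASCII letters only. The sketch below shows what such helpers might look like; the `sketch_` names are hypothetical and are not the functions exported by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercases A-Z only; every other byte (including non-ASCII) is unchanged. */
static int sketch_ascii_tolower(int c) {
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/*
 * Compares at most n bytes, ignoring ASCII case. Returns 0 on a match,
 * otherwise the difference of the first mismatching lowercased bytes.
 */
int sketch_ascii_strncasecmp(const char *a, const char *b, size_t n) {
  assert(a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++) {
    int ca = sketch_ascii_tolower((unsigned char)a[i]);
    int cb = sketch_ascii_tolower((unsigned char)b[i]);
    if (ca != cb) return ca - cb;
    if (ca == 0) break; /* both strings ended before n bytes */
  }
  return 0;
}
```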
Commit 62fd3e2
The `create_mode` helper is very useful when building tooling on top of Gumbo, so don't keep it static.
Commit b6dcb36
Commit 37479c5