regexp-disasm – Emacs regexp bytecode disassembler

regexp-disasm disassembles compiled Emacs regexp bytecode to something that can (just) be understood by a human. It is mainly intended for those debugging the Emacs regexp engine, but anyone curious about how it works and how regexps are interpreted may find it interesting.

This is a specialised package: it requires some knowledge of the regexp internals to be useful. See src/regex-emacs.c in the Emacs source tree, and re_opcode_t in particular.

Installation

This package requires either Emacs 30 or a patched Emacs 28 or 29. For Emacs 28 and 29, apply regexp-bytecode.patch to your Emacs source tree and rebuild.

Emacs 30 does not need patching, but since it has not yet been released at the time of writing, you may need to build a development snapshot from source.

Start Emacs and load regexp-disasm.el.
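One way to do the loading step from Lisp is sketched below. The checkout path and the `my-regexp-disasm-*` names are hypothetical placeholders, not part of the package; `M-x load-file` works just as well.

```elisp
;; Sketch only: the path below is an assumption, adjust it to your checkout.
(defvar my-regexp-disasm-file "~/src/regexp-disasm/regexp-disasm.el"
  "Hypothetical location of regexp-disasm.el.")

(defun my-regexp-disasm-load (&optional file)
  "Load regexp-disasm from FILE (default `my-regexp-disasm-file').
Return non-nil if the file was readable and loaded, nil otherwise."
  (let ((path (expand-file-name (or file my-regexp-disasm-file))))
    ;; Guard on readability so a wrong path yields nil instead of an error.
    (when (file-readable-p path)
      (load-file path))))

(my-regexp-disasm-load)
```

The helper returns nil rather than signalling an error when the file is missing, which makes it safe to leave in an init file.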

Usage

  • (regexp-disassemble REGEXP &optional CASE-TABLE)

    Compile REGEXP with the optional CASE-TABLE and display a pretty-printed disassembly in a separate buffer. This function can be used as an interactive command; note that the regexp is then input as a Lisp expression, not a string.

  • (regexp-disasm REGEXP &optional CASE-TABLE)

    Compile REGEXP with the optional CASE-TABLE and return the disassembly as a list of instructions. Each instruction takes the form (ADDRESS . INSTR) where ADDRESS is the byte offset and INSTR is the instruction in a symbolic form.

For both functions, previously compiled (cached) regexp bytecode may be used. It should be functionally equivalent but may not be identical, since Emacs regexps sometimes use self-modifying code.
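As a rough illustration of consuming the list form, the sketch below formats each (ADDRESS . INSTR) pair as one line of text. `my-regexp-disasm-lines` is a hypothetical helper, not part of the package, and the exact shape of INSTR is left opaque by printing it with `%S`. The call to `regexp-disasm` runs only if the package is loaded.

```elisp
;; Hypothetical helper: turn the result of `regexp-disasm' into strings.
(defun my-regexp-disasm-lines (instructions)
  "Return one formatted string per (ADDRESS . INSTR) pair in INSTRUCTIONS."
  (mapcar (lambda (entry)
            ;; car is the byte offset, cdr the symbolic instruction.
            (format "%5d  %S" (car entry) (cdr entry)))
          instructions))

;; Only call into the package when it has actually been loaded.
(when (fboundp 'regexp-disasm)
  (dolist (line (my-regexp-disasm-lines (regexp-disasm "a+b")))
    (message "%s" line)))
```

Because cached bytecode may be used, the addresses and instructions printed for the same regexp are not guaranteed to be identical from one call to the next.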