Learn about Obscenity's custom pattern syntax.
Patterns are used to specify blacklisted words. To ease matching variations of words with only small changes, they support some special syntax (namely wildcards, optional expressions, and boundary assertions). For example, the pattern `f?ck` matches `f`, then any character, then `ck`, and so matches `fuck`, `fbck`, `fyck`, `fack`, and so on.
This might sound similar to regular expressions, which are widely used for similar purposes. Why not just use them instead of inventing some custom syntax? A few reasons:
- Regular expressions are overkill for profanity filtering in most cases. Their expressive syntax is, for the most part, completely unneeded as most variations are normalized before matching (see the article on transformers).
- Not supporting every feature of regular expressions allows for a more efficient implementation in certain cases. In addition to a simpler matcher built (ironically) on regular expressions and string searching methods, Obscenity also features a matcher based on finite automata techniques that searches for patterns in parallel, which may be useful if you have a large number of patterns.
Most characters match literally. That is, `a` matches an `a`, `book` matches `book`, and so on. However, three special expressions are available:
- Wildcards: A `?` matches any character.
- Optional expressions: Wrapping an expression in a set of square brackets (`[]`) makes it optional: `a[bc]` matches either `a` or `abc`.
- Boundary assertions: Placing a pipe (`|`) at the start or end of the pattern asserts position at a word boundary: `|tit` matches `tit` and `tits` but not `substitute`. Similarly, `chick|` matches `chick` but not `chicken`.
Any of the special characters mentioned above can be escaped using a backslash (`\`): `\?` matches a literal `?` instead of acting as a wildcard.
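To make the rules above concrete, here is a minimal sketch that translates a pattern into a roughly equivalent JavaScript regular expression. The `patternToRegExp` helper is hypothetical and is not part of Obscenity. It approximates word boundaries with `\b`, and it assumes that nested optionals and a pipe in the middle of a pattern are errors, neither of which is confirmed here. It is meant only to illustrate what each piece of syntax means, not how Obscenity matches internally.

```typescript
// Hypothetical helper for illustration only; not part of Obscenity's API.
function patternToRegExp(source: string): RegExp {
  // Escape characters that are special in JavaScript regular expressions.
  const escapeLiteral = (ch: string): string =>
    ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  let body = '';
  let inOptional = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === '\\') {
      // Escape: the next character is matched literally.
      i++;
      if (i >= source.length) throw new SyntaxError('Dangling backslash');
      body += escapeLiteral(source.charAt(i));
    } else if (ch === '?') {
      // Wildcard: any single character.
      body += '[\\s\\S]';
    } else if (ch === '[') {
      // Assumption: optionals cannot be nested.
      if (inOptional) throw new SyntaxError('Nested optional expression');
      inOptional = true;
      body += '(?:';
    } else if (ch === ']') {
      if (!inOptional) throw new SyntaxError('Unmatched ]');
      inOptional = false;
      body += ')?';
    } else if (ch === '|') {
      // Boundary assertions are only described at the start or end.
      if (i !== 0 && i !== source.length - 1) {
        throw new SyntaxError('Boundary assertion must be at start or end');
      }
      body += '\\b'; // Approximation of a word boundary.
    } else {
      body += escapeLiteral(ch);
    }
  }
  if (inOptional) throw new SyntaxError('Unclosed [');
  return new RegExp(body);
}

console.log(patternToRegExp('f?ck').test('fack'));       // true
console.log(patternToRegExp('|tit').test('substitute')); // false
console.log(patternToRegExp('chick|').test('chicken'));  // false
console.log(patternToRegExp('\\?').test('?'));           // true
```

Note that the last call writes `'\\?'` because an ordinary string literal consumes one level of backslashes before the pattern is ever seen.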
A pattern may be created using the `parseRawPattern()` function:
```typescript
import { parseRawPattern } from 'obscenity';
const p = parseRawPattern('f?ck');
```
However, it is usually more convenient to use the `pattern` tagged template:
```typescript
import { pattern } from 'obscenity';
const p = pattern`f?ck`;
```
Note the lack of `()` when calling `pattern` and the usage of backticks.
Due to how the `pattern` tagged template works internally, it is not necessary to double-escape backslashes:
```typescript
import { pattern } from 'obscenity';
const p = pattern`\[`;
```
If you were using `parseRawPattern` instead, the following would be required:
```typescript
const p = parseRawPattern('\\[');
```
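The difference comes from how JavaScript handles backslashes in string literals versus tagged templates. The sketch below uses a hypothetical `rawSource` tag (not part of Obscenity) that simply returns the raw template text. It assumes that `pattern` reads the raw strings in a similar way, which the text above implies but does not spell out.

```typescript
// Hypothetical tag for illustration; Obscenity's real `pattern` tag returns
// a parsed pattern rather than a string.
function rawSource(strings: TemplateStringsArray, ...values: unknown[]): string {
  // String.raw reads strings.raw, so backslashes are kept exactly as typed.
  return String.raw(strings, ...values);
}

const fromTag = rawSource`\[`; // two characters: backslash, bracket
const fromLiteral = '\\[';     // same two characters, backslash doubled in source
const singleEscape = '\[';     // the string literal drops the backslash

console.log(fromTag === fromLiteral); // true
console.log(singleEscape.length);     // 1
```

With a single backslash in an ordinary string literal, the pattern parser would receive only `[`, so the escape would be lost before parsing begins.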
Next up: Transformers.