regex

General format:

/character-set/flags

Character classes

Use brackets to create capture groups, helpful for logical operator |. & is implicit.

. : Match all characters except newlines. Also see the /s flag.

\w : Any word. Same as [A-Za-z0-9_].

\W : Opposite of \w.

\d : Any digit. Same as [0-9].

\D : Opposite of \d.

\s : Matches a witespace.

\S : Match anything that is not a whitespace. Used in conjunction with \s to match anything, inculding line breaks.

[] : Character set. Used to choose any of the characters in the bracket. A range can be specified with a - in between two characters. Eg: [A-Z]

[^] : Negated character set. DO NOT match any of the letters inside.

() : Capture group.

Anchors and Quantifiers

^ : Beginning of the text. See also the /m flag.

$ : Matches the end of the text.

* : Match 0 or more of the preceding token.

+ : Match 1 or more of the preceding token.

? : Make the previous token optional.

+?/*? : Make the search lazy. This matches as few characters as possible.

| : Boolean OR. Match the expression before or after.

Flags

Flags can be one of the following:

  • global - /g
  • case insensitive - /i
  • multiline /m

Multiline makes the anchors catch all lines instead of the string beginning and ending.

  • unicode /u

When the unicode flag is enabled, you can use extended unicode escapes in the form \x{FFFFF}.

  • sticky /y

Undo the global flag.

  • dotall /s

Dot (.) will match newlines as well.

The flags can be combined. Eg: /ms