ES proposal: s (dotAll) flag for regular expressions

The proposal “s (dotAll) flag for regular expressions” by Mathias Bynens and Brian Terlson is currently at stage 3. This blog post explains how it works.

Overview  

Currently, the dot (.) in regular expressions doesn’t match line terminator characters:

> /^.$/.test('\n')
false

The proposal specifies the regular expression flag /s that changes that:

> /^.$/s.test('\n')
true

Limitations of the dot (.) in regular expressions  

The dot (.) in regular expressions has two limitations.

First, it doesn’t match astral (non-BMP) characters such as emoji:

> /^.$/.test('😀')
false

This can be fixed via the /u (unicode) flag:

> /^.$/u.test('😀')
true

Second, the dot does not match line terminator characters:

> /^.$/.test('\n')
false

That can currently only be fixed by replacing the dot with workarounds such as [^] (“all characters except no character”) or [\s\S] (“either whitespace or not whitespace”):

> /^[^]$/.test('\n')
true
> /^[\s\S]$/.test('\n')
true
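
To see why the workaround matters in practice, here is a small sketch that extracts a block comment spanning two lines. The sample string and the variable names are made up for this illustration; they are not part of the proposal.

```javascript
// Hypothetical input: a block comment that spans two lines.
const source = 'let a; /* first line\nsecond line */ let b;';

// The dot stops at '\n', so this pattern never reaches the closing '*/'.
const withDot = /\/\*.*?\*\//;
// [\s\S] matches any code unit, including line terminators.
const withWorkaround = /\/\*[\s\S]*?\*\//;

const dotResult = withDot.exec(source);
const workaroundResult = withWorkaround.exec(source);

console.log(dotResult); // null
console.log(workaroundResult[0]); // the complete two-line comment
```

The same pattern could be written with [^] instead of [\s\S]; both are ways of saying “any character” without using the dot.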

Line terminators recognized by ECMAScript  

Line terminators in ECMAScript affect:

  • The dot, in all regular expressions that don’t have the flag /s.
  • The anchors ^ and $ if the flag /m (multiline) is used.

The following four characters are considered line terminators by ECMAScript:

  • U+000A LINE FEED (LF) (\n)
  • U+000D CARRIAGE RETURN (CR) (\r)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

There are additionally some newline-ish characters that are not considered line terminators by ECMAScript:

  • U+000B VERTICAL TAB (\v)
  • U+000C FORM FEED (\f)
  • U+0085 NEXT LINE

Those three characters are matched by the dot without a flag:

> /^...$/.test('\v\f\u{0085}')
true
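
The two lists can be checked mechanically. The following sketch tests each of the seven characters against a dot without flags and collects those that are rejected. The object candidates and its labels are only illustrative names.

```javascript
// The seven characters from the two lists above, keyed by a label.
const candidates = {
  'U+000A LINE FEED': '\u000A',
  'U+000D CARRIAGE RETURN': '\u000D',
  'U+2028 LINE SEPARATOR': '\u2028',
  'U+2029 PARAGRAPH SEPARATOR': '\u2029',
  'U+000B VERTICAL TAB': '\u000B',
  'U+000C FORM FEED': '\u000C',
  'U+0085 NEXT LINE': '\u0085',
};

// A character counts as a line terminator here if the plain dot rejects it.
const lineTerminators = Object.keys(candidates)
  .filter(label => !/^.$/.test(candidates[label]));

console.log(lineTerminators);
// [ 'U+000A LINE FEED', 'U+000D CARRIAGE RETURN',
//   'U+2028 LINE SEPARATOR', 'U+2029 PARAGRAPH SEPARATOR' ]
```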

The proposal  

The proposal introduces the regular expression flag /s (short for “singleline”), which leads to the dot matching line terminators:

> /^.$/s.test('\n')
true

The long name of /s is dotAll:

> /./s.dotAll
true
> /./s.flags
's'
> new RegExp('.', 's').dotAll
true
> /./.dotAll
false

dotAll vs. multiline  

  • dotAll only affects the dot.
  • multiline only affects ^ and $.
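
The following sketch contrasts the two flags on one two-line string. It assumes an engine in which /s is enabled; the variable names are made up for this example.

```javascript
const text = 'one\ntwo';

const results = {
  // multiline: ^ and $ also match at line boundaries.
  anchorsPlain: /^two$/.test(text),      // false
  anchorsMultiline: /^two$/m.test(text), // true
  // multiline does not change the dot.
  dotMultiline: /one.two/m.test(text),   // false

  // dotAll: the dot also matches '\n'.
  dotPlain: /one.two/.test(text),        // false
  dotDotAll: /one.two/s.test(text),      // true
  // dotAll does not change ^ and $.
  anchorsDotAll: /^two$/s.test(text),    // false

  // Both flags can be combined; each keeps its own effect.
  onlyMultiline: /^.*$/m.exec(text)[0],  // 'one'
  both: /^.*$/ms.exec(text)[0],          // 'one\ntwo'
};

console.log(results);
```

With only /m, the dot stops at the line feed and $ matches right before it, so the match is the first line. Adding /s lets the greedy .* run to the end of the string.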

FAQ  

Why is the flag named /s?  

dotAll is a good description of what the flag does, so, arguably, /a or /d would have been better names. However, /s is already an established name (Perl, Python, Java, C#, ...).

Trying it out  

V8 5.9+ implements the proposal, but you need --harmony-regexp-dotall to switch it on:

  • Chrome 59+ comes with V8 5.9:
    • Check via chrome://version/
    • This page also shows the “Executable Path” for Chrome. For example: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
    • Start Chrome with the flag:
    '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome' --js-flags="--harmony-regexp-dotall"
    
  • Node.js 8.1.4 comes with V8 5.8 and does not yet support the proposal.
    • Check via npm version
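
If you are not sure whether a given engine has the flag enabled, you can probe for it at runtime. This is a minimal sketch; the function name supportsDotAll is hypothetical and not part of the proposal.

```javascript
// Hypothetical helper: reports whether this engine accepts the s flag.
function supportsDotAll() {
  try {
    // The RegExp constructor throws a SyntaxError for unknown flags.
    // Passing the flag as a string keeps this file parseable everywhere,
    // which a regular expression literal with /s would not.
    return new RegExp('.', 's').dotAll === true;
  } catch (err) {
    return false;
  }
}

console.log(supportsDotAll());
```

In engines without support, code can fall back to [^] or [\s\S] instead of the dot.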
