Reveal Help Center

Searching Using Regular Expressions (Regex)

Regular Expressions use a standard set of specific symbols to identify highly formatted information in text. Some examples are:

  • Social Security Numbers

  • Telephone numbers

  • Credit card numbers

A regular expression describes a pattern of characters, numerals, punctuation and so on to look for in text. To search using regular expressions in Reveal, the Standard Search Syntax must be selected from the User Settings - General screen.

Once a user's default search syntax is set, they can run Regular Expression (regex) searches from the document text search box.  

The current syntax selection appears above the main search window in the search pane.

Here are some common examples of regular expressions.

Note

There are many ways to code regular expressions, and many variations for different programming environments. The following examples are offered as initial demonstrations of one way to express these searches. Recommended starting points for reference on building Regular expressions (with syntax testing) are https://regex101.com/ and http://regexstorm.net/reference.

Social Security Number:

(?<!\d)\d{3}-\d{2}-\d{4}(?!\d) 

Syntax:

  • (?<!\d) = Zero-width negative look-behind assertion - continue only if the character immediately to the left is not a decimal digit (\d).

  • \d{3}-\d{2}-\d{4} = Match three decimal digits, a hyphen, two decimal digits, a hyphen, and four decimal digits.

  • (?!\d) = Zero-width negative look-ahead assertion - continue only if the character immediately to the right is not a decimal digit (\d).
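The effect of the two assertions can be checked outside Reveal. The following is a minimal sketch that assumes Python's built-in re module, which may differ in details from the engine Reveal uses; the sample strings and the find_ssns helper are made up for illustration.

```python
import re

# Pattern copied from the article. Python's re module is an assumption
# made for this sketch, not necessarily the engine Reveal runs.
SSN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_ssns(text):
    """Return every substring of text that matches the SSN pattern."""
    return SSN.findall(text)


# Hypothetical sample strings.
samples = [
    "SSN: 123-45-6789.",   # standalone number: should match
    "ID 9123-45-6789",     # extra leading digit: blocked by the look-behind
    "ref 123-45-67890",    # extra trailing digit: blocked by the look-ahead
]

for sample in samples:
    print(sample, "->", find_ssns(sample))
```

Only the first sample returns a match; the other two are rejected because a digit sits directly next to the candidate number.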

US Phone Number:

(?<!\d)(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?(?!\d)

Syntax:

  • (?<!\d) = Zero-width negative look-behind assertion - continue only if the character immediately to the left is not a decimal digit (\d).

  • (?:(?:\+?1\s*(?:[.-]\s*)?)? = This first section locates the optional U.S. Country Code '1', optionally preceded by '+' and optionally followed by space(s), a dot or a dash character.

    • (?: = Opens a non-capturing group: the parentheses group the pattern without saving the matched text.

    • \+?1 = Match a literal '+' 0 or 1 time, followed by the literal character '1'.

    • \s* = Match a whitespace 0 to unlimited times.

    • [.-] = Match a single character in this set: a dot or a dash.

  • \(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)| = Capturing group for an area code enclosed in literal parentheses, OR (see the second area code group).

    • [2-9]1[02-9]| = Match a numeral between '2' and '9' inclusive, followed by the literal numeral '1', and any numeral in the set of a literal '0' or a numeral between '2' and '9' inclusive, OR

    • [2-9][02-8]1| = Match a numeral between '2' and '9' inclusive, followed by any numeral in the set of a literal '0' or a numeral between '2' and '8' inclusive and a literal '1', OR

    • [2-9][02-8][02-9] = Match a numeral between '2' and '9' inclusive, followed by any numeral in the set of a literal '0' or a numeral between '2' and '8' inclusive, and any numeral in the set of a literal '0' or a numeral between '2' and '9' inclusive.

  • ([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s* = A second capturing group for an area code written without parentheses, followed by optional whitespace(s).

  • (?:[.-]\s*)?)? = An optional dot or dash followed by optional whitespace(s). The final ')?' closes the country code and area code section and makes the whole section optional.

  • ([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2}) = Capturing group for telephone exchange.

    • [2-9]1[02-9]| = Match a numeral between '2' and '9' inclusive, followed by the literal numeral '1', and any numeral in the set of a literal '0' or a numeral between '2' and '9' inclusive, OR

    • [2-9][02-9]1| = Match a numeral between '2' and '9' inclusive, followed by any numeral in the set of a literal '0' or a numeral between '2' and '9' inclusive and a literal '1', OR

    • [2-9][02-9]{2} = Match a numeral between '2' and '9' inclusive, followed by two numerals, each in the set of a literal '0' or a numeral between '2' and '9' inclusive.

  • \s*(?:[.-]\s*)? = Optional whitespace(s), then an optional dot or dash followed by optional whitespace(s).

  • ([0-9]{4}) = Capturing group for the line number: exactly four digits between '0' and '9' inclusive.

  • (?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))? = Optional extension. Matches the literal characters '#', 'x', 'ext' or 'extension' (a trailing dot is optional after 'x' and 'ext'), with optional whitespace(s) before and after, then captures between one and unlimited decimal digits as the extension number.

  • (?!\d) = Zero-width negative look-ahead assertion - continue only if the character immediately to the right is not a decimal digit (\d).
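To see which capturing group receives which part of a number, the expression can be run against a few sample strings. This is a minimal sketch assuming Python's re module rather than Reveal's own engine; the parse_phone helper, its dictionary keys and the sample numbers are hypothetical. The pattern is split across several string literals only for readability and concatenates to the expression shown above.

```python
import re

# Pattern from the article, split over adjacent raw strings for readability.
PHONE = re.compile(
    r"(?<!\d)"
    r"(?:(?:\+?1\s*(?:[.-]\s*)?)?"
    r"(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)"
    r"|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))"
    r"\s*(?:[.-]\s*)?)?"
    r"([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?"
    r"([0-9]{4})"
    r"(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?"
    r"(?!\d)"
)


def parse_phone(text):
    """Return the parts of the first phone number in text, or None."""
    m = PHONE.search(text)
    if m is None:
        return None
    return {
        "match": m.group(0),
        # Group 1 holds an area code in parentheses, group 2 one without.
        "area": m.group(1) or m.group(2),
        "exchange": m.group(3),
        "line": m.group(4),
        "ext": m.group(5),
    }


# Hypothetical sample strings.
for sample in ["(212) 555-0187 ext. 42", "+1 415.555.2671", "555-0187"]:
    print(sample, "->", parse_phone(sample))
```

Because the country code and area code section is optional, a seven-digit number such as the last sample still matches, with no area code captured.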

Credit Card Number (example used is MasterCard):

(^|\b)(5[1-5]\d{2})([- ])(\d{4})\3(\d{4})\3(\d{4})(\b|$) 

This is a simple example of one type of 16-digit MasterCard search expression, in which the number begins with '51' through '55' and the four-digit groups are separated by dashes or spaces. The prefix numbers 2221 through 2720 can also be covered by using the '|' (OR) operator: (222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)
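The prefix alternation can be checked by brute force against every four-digit string. This is a minimal sketch assuming Python's re module; the matches_2_series helper is a hypothetical name, and the alternation is wrapped in a non-capturing group so it can be tested on its own.

```python
import re

# Prefix alternation for the 2221-2720 range, wrapped in a non-capturing group.
PREFIX = re.compile(r"(?:222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)")


def matches_2_series(prefix):
    """Return True if the four-digit string is matched in full by the alternation."""
    return PREFIX.fullmatch(prefix) is not None


# Test every four-digit string from 0000 to 9999.
matched = [n for n in range(10000) if matches_2_series(f"{n:04d}")]
print(matched[0], matched[-1], len(matched))
```

The matched values run from 2221 to 2720 with no gaps, so the five alternatives together cover exactly that range.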

Syntax:

  • (^|\b) = Start of string OR word boundary. This is the first positional parameter $1.

  • (5[1-5]\d{2}) = Matches '5' followed by a number in the range from '1' to '5' followed by two further decimal characters. This captures the digits returned as the second positional parameter $2.

  • ([- ]) = The separator character is a hyphen or space. This is the third positional parameter $3.

  • (\d{4}) = Match a single decimal character four times. This captures the digits returned as the fourth positional parameter $4.

  • \3 = References the third parameter; separators must be identical.

  • (\d{4}) = Match a single decimal character four times. This captures the digits returned as the fifth positional parameter $5.

  • \3 = References the third parameter; separators must be identical.

  • (\d{4}) = Match a single decimal character four times. This captures the digits returned as the sixth positional parameter $6.

  • (\b|$) = Word boundary OR end of string. This is positional parameter $7.

  • The whole match $0 consists of the captured values $2, $4, $5 and $6 joined by the separator $3, for example $2-$4-$5-$6 when the separator is a dash.
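The role of the \3 back-reference is easiest to see by comparing numbers with consistent and mixed separators. This is a minimal sketch assuming Python's re module, where the positional parameters $1 through $7 are read with m.group(1) through m.group(7); the parse_card helper and its dictionary keys are hypothetical, and the sample values are commonly published test numbers, not real accounts.

```python
import re

# Pattern copied from the article.
MASTERCARD = re.compile(r"(^|\b)(5[1-5]\d{2})([- ])(\d{4})\3(\d{4})\3(\d{4})(\b|$)")


def parse_card(text):
    """Return the matched text, separator and bare digits, or None."""
    m = MASTERCARD.search(text)
    if m is None:
        return None
    return {
        "match": m.group(0),
        "separator": m.group(3),
        # Groups 2, 4, 5 and 6 hold the four blocks of digits.
        "digits": m.group(2) + m.group(4) + m.group(5) + m.group(6),
    }


samples = [
    "5105-1051-0510-5100",   # dashes throughout: matches
    "5105 1051 0510 5100",   # spaces throughout: matches
    "5105-1051 0510-5100",   # mixed separators: rejected by \3
    "4111-1111-1111-1111",   # does not start with 51 to 55: rejected
]

for sample in samples:
    print(sample, "->", parse_card(sample))
```

The mixed-separator sample fails because \3 must repeat exactly the character captured by the third group.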