Regular Expressions: Regex Sets / Ranges
What do square brackets […] mean in a regular expression?
View Answer:
Interview Response: Several characters or character classes inside square brackets […] mean “search for any one of the given characters”. For example, [eao] means any of the 3 characters: 'a', 'e', or 'o'. That is called a set in regex terminology. We can use sets in a regular expression alongside regular (literal) characters. Note that although a set lists multiple characters, it corresponds to exactly one character in the match.
Code Example:
// find [t or m], and then "op"
alert('Mop top'.match(/[tm]op/gi)); // "Mop", "top"
// find "V", then [o or i], then "la"
// the set stands for a single character, so "Voila" (with "o" and "i" in a row) does not match
alert('Voila'.match(/V[oi]la/)); // null, no matches
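To see that a set stands for exactly one character in the match, it can help to compare the set on its own with the same set followed by a quantifier. This is only a small sketch: the variable names are illustrative, and console.log is used in place of alert so that it also runs outside a browser.

```javascript
// A set matches exactly one character, so the two letters "oi" cannot fit into a single [oi].
const single = 'Voila'.match(/V[oi]la/);

// With the + quantifier the set may repeat: one or more characters, each being "o" or "i".
const repeated = 'Voila'.match(/V[oi]+la/);

console.log(single);      // null
console.log(repeated[0]); // "Voila"
```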
Can you explain how a range is defined in a regular expression?
View Answer:
Interview Response: A range is written inside square brackets as two characters joined by a hyphen, and it matches any single character between them: [a-z] is any lowercase Latin letter and [0-9] is any digit. A range always spans two single characters, not numbers, so [1-100] does not mean “1 to 100”. Several ranges can share one set, e.g. [0-9A-F] matches a hexadecimal digit. We can also use character classes inside […]. For example, to look for a wordy character \w or a hyphen -, the set is [\w-]. Combining multiple classes is also possible, e.g. [\s\d] means “a space character or a digit”.
Code Example:
alert('Exception 0xAF'.match(/x[0-9A-F][0-9A-F]/g)); // xAF
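The character-class combinations mentioned in the answer can be tried out directly. The sketch below uses made-up sample strings and variable names, and console.log instead of alert. It also shows what a set such as [1-100] actually matches, since a range only ever spans two single characters.

```javascript
// [\w-] : a wordy character (letter, digit, underscore) or a hyphen
const wordsWithHyphens = 'my-var_1 = 5'.match(/[\w-]+/g);

// [\s\d] : a whitespace character or a digit
const spacesAndDigits = 'a 1b'.match(/[\s\d]/g);

// [1-100] is the range "1" to "1" plus the characters "0" and "0",
// so it matches only the characters 1 and 0, never the digit 5.
const onesAndZeros = '5 10 100'.match(/[1-100]/g);

console.log(wordsWithHyphens); // ["my-var_1", "5"]
console.log(spacesAndDigits);  // [" ", "1"]
console.log(onesAndZeros);     // ["1", "0", "1", "0", "0"]
```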
Is there a way to handle Han (Chinese) or Cyrillic in regexp ranges?
View Answer:
Interview Response: Yes, we can write a universal pattern that looks for wordy characters in any language by using Unicode properties (\p{…}) inside the set. Unicode properties require the u flag. Note that Internet Explorer does not support Unicode properties; if we need them for IE users, we can use the XRegExp library.
Code Example:
let regexp = /[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]/gu;
let str = `Hi 你好 12`;
// finds all letters and digits:
alert(str.match(regexp)); // H,i,你,好,1,2
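When only one writing system is of interest, a Unicode script property can go inside the set instead. The following sketch assumes an engine that supports \p{Script=…} with the u flag; the sample text and variable names are made up for illustration, and console.log replaces alert.

```javascript
const text = 'Hi 你好 Привет 12';

// Each Han character on its own
const han = text.match(/[\p{Script=Han}]/gu);

// Runs of one or more Cyrillic letters
const cyrillic = text.match(/[\p{Script=Cyrillic}]+/gu);

console.log(han);      // ["你", "好"]
console.log(cyrillic); // ["Привет"]
```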
How do you exclude a range of characters in regular expressions?
View Answer:
Interview Response: To exclude a range of characters, we place a caret ^ right after the opening bracket, as in [^…]. Such a set matches any single character except the ones listed.
Code Example:
alert('alice15@gmail.com'.match(/[^\d\sA-Z]/gi)); // returns @ and .
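A few more excluding sets, as a minimal sketch with made-up inputs and variable names (console.log stands in for alert). The last line shows that the caret only negates when it comes first in the set; anywhere else it is an ordinary character.

```javascript
// Any character except the listed vowels
const notVowels = 'alice'.match(/[^aeiou]/g);

// Any character except a digit (same as \D)
const notDigits = 'a1b2'.match(/[^0-9]/g);

// Any non-whitespace character (same as \S)
const notSpaces = 'a b'.match(/[^\s]/g);

// A caret that is not first in the set is a literal "^"
const literalCaret = 'a^b'.match(/[a^]/g);

console.log(notVowels);    // ["l", "c"]
console.log(notDigits);    // ["a", "b"]
console.log(notSpaces);    // ["a", "b"]
console.log(literalCaret); // ["a", "^"]
```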
Do we have to escape special characters in regex sets or ranges?
View Answer:
Interview Response: In most cases, no. Inside square brackets the vast majority of special characters lose their special meaning: the symbols . + ( ) never need escaping. A hyphen - does not need escaping at the start or end of the set, where it cannot define a range. A caret ^ needs escaping only at the start, where it would otherwise mean “exclude”. The closing bracket ] and the backslash \ are always escaped. Escaping these characters anyway is harmless, so we may still do it “just in case”.
Code Example:
// No need to escape
let regexp = /[-().^+]/g;
alert('1 + 2 - 3'.match(regexp)); // Matches +, -
// Escaped everything (reassign the variable; a second `let regexp` would be a SyntaxError)
regexp = /[\-\(\)\.\^\+]/g;
alert('1 + 2 - 3'.match(regexp)); // also works: +, -
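The cases where escaping does matter inside a set can be sketched as follows. The inputs and variable names are illustrative only, and console.log is used instead of alert.

```javascript
// "[", "]" and "\" as literal characters: "]" and "\" must be escaped inside a set.
// The string literal 'a[b]c\\d' is the text a[b]c\d
const bracketsAndSlash = 'a[b]c\\d'.match(/[\[\]\\]/g);

// A hyphen between two characters defines a range...
const asRange = 'a-c b'.match(/[a-c]/g);
// ...unless it is escaped (or moved to the start or end of the set)
const literalHyphen = 'a-c b'.match(/[a\-c]/g);

// A caret at the start must be escaped to be taken literally
const escapedCaret = '2^3'.match(/[\^]/g);

console.log(bracketsAndSlash); // ["[", "]", "\\"]
console.log(asRange);          // ["a", "c", "b"]
console.log(literalHyphen);    // ["a", "-", "c"]
console.log(escapedCaret);     // ["^"]
```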
What is the recommended way to match against surrogate pairs in a set or range?
View Answer:
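Interview Response: If there are surrogate pairs in the set, the u flag is required for them to work correctly. Without it, the engine treats each surrogate pair as two separate 2-byte code units rather than one character, so the set matches only half of the character. The same requirement applies to a range of surrogate pairs.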
Code Example:
// SET: look for 𝒳
alert('𝒳'.match(/[𝒳𝒴]/u)); // returns 𝒳
// RANGE: look for characters from 𝒳 to 𝒵
alert('𝒴'.match(/[𝒳-𝒵]/u)); // returns 𝒴
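What goes wrong without the u flag can be observed with a short sketch. The variable names are illustrative, and the range is built with new RegExp inside try/catch so that the expected error can be caught at run time instead of stopping the script at parse time.

```javascript
// With the u flag the whole character (a surrogate pair, 2 code units) is matched
const withFlag = '𝒳'.match(/[𝒳𝒴]/u)[0];

// Without it the set sees four separate code units and matches only the first half of 𝒳
const withoutFlag = '𝒳'.match(/[𝒳𝒴]/)[0];

console.log(withFlag.length);    // 2
console.log(withoutFlag.length); // 1 (half of a surrogate pair, shown as a broken character)

// Without the u flag, a range of surrogate pairs is read as a range between
// the second half of 𝒳 and the first half of 𝒵, which is out of order.
let rangeError = null;
try {
  new RegExp('[𝒳-𝒵]');
} catch (err) {
  rangeError = err;
}
console.log(rangeError instanceof SyntaxError); // true
```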