How to write super simple and useful regular expressions for the real world
27th Jan 2021
Regular expressions are HARD! They look so complicated that they turn me off completely most of the time. Sometimes I wish I were smarter so I could use them more effectively.
I’m going to show you a real example.
Best place to learn regular expressions
The best place to learn regular expressions is regexone.com. It gives you step-by-step challenges to work through, which helps to build your regex vocabulary. It also has a handy cheatsheet that I always refer back to when I need some regex help.
Now onward to the real-life example.
- They get to see how to use frameworks, which eases them into React/Vue and other frameworks they may want to learn.
I decided to call this framework `Tiny` since it’s a small framework that’s not meant for production use. There are many lessons you need to know, and one of them is a tiny bit of knowledge regarding regular expressions.
If you’re curious, here’s a draft of the table of contents for this framework part.
Now on to the part where I needed regular expressions.
Extracting patterns from a string
When building Tiny, we added properties into child components via props. Since I wanted to keep things simple, we added all props into a `tiny-props` HTML attribute.
Here’s an example of such an attribute.
<div tiny-props="[count, state.count]">...</div>
The child component should then get a `count` property which corresponds to the value written inside the parent’s `state.count`.
The challenge here is to extract `count` and `state.count` separately, so we can assign appropriate values.
This is simple if we only have one set of props.
- We can replace `[` and `]` with empty strings
- Then we split the string at `,`
- Then we trim any unnecessary whitespace.
const attribute = div.getAttribute('tiny-props')
const props = attribute
  .replace('[', '')
  .replace(']', '')
  .split(',')
  .map(part => part.trim())
console.log(props)
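If you want to try this chain without a browser, here’s a minimal sketch. It assumes the attribute value is hard-coded as a string (a stand-in for `div.getAttribute`, since there is no DOM here).

```javascript
// Hard-coded stand-in for div.getAttribute('tiny-props') (assumption: no DOM here)
const attribute = '[count, state.count]'

const props = attribute
  .replace('[', '') // remove the opening bracket
  .replace(']', '') // remove the closing bracket
  .split(',') // separate the property name from the value
  .map(part => part.trim()) // drop whitespace around each part

console.log(props) // ['count', 'state.count']
```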
The problem becomes slightly more complicated when we need to pass multiple props. In this case, I chose to separate each prop-value pair with square brackets.
<div tiny-props="[count, state.count] [message, state.message]">...</div>
We cannot use the same `replace` code above for this new string. You’ll get weird results.
The culprit becomes obvious when we omit the `split` and `trim` parts. You can clearly see `replace` only replaces the first instance of `[` and `]`.
const attribute = div.getAttribute('tiny-props')
const props = attribute.replace('[', '').replace(']', '')
console.log(props)
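Here’s roughly what that looks like when run against the two-prop string. This is a sketch with the attribute value hard-coded (an assumption, so it runs outside the browser).

```javascript
// Hard-coded stand-in for the tiny-props attribute value
const attribute = '[count, state.count] [message, state.message]'

// With a string pattern, replace stops after the first match
const result = attribute.replace('[', '').replace(']', '')

console.log(result) // 'count, state.count [message, state.message]'
```

The second pair of brackets is left untouched, so splitting at `,` afterwards gives three uneven parts.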
Fixing this is easy. We can use a regular expression with a `g` flag. The `g` flag stands for “global” (but I remember it as greedy 😂) and lets the regular expression match all occurrences of the specified value.
In this case, we need to escape `[` and `]` with a `\` because square brackets mean something in regular expressions. The escape character tells the regular expression we’re literally searching for `[` and `]`.
const attribute = div.getAttribute('tiny-props')
const props = attribute.replace(/\[/g, '').replace(/\]/g, '')
console.log(props)
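To see what the `g` flag changes, here’s a small comparison. It’s only a sketch: the attribute value is hard-coded and the variable names are made up for illustration.

```javascript
// Hard-coded stand-in for the tiny-props attribute value
const attribute = '[count, state.count] [message, state.message]'

const withoutFlag = attribute.replace(/\[/, '') // only the first [ is removed
const withFlag = attribute.replace(/\[/g, '') // every [ is removed

console.log(withoutFlag) // 'count, state.count] [message, state.message]'
console.log(withFlag) // 'count, state.count] message, state.message]'
```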
At this point we can also replace all commas, then split the string at each empty space to put each value into an array.
const props = attribute
  .replace(/\[/g, '')
  .replace(/\]/g, '')
  .replace(/,/g, '')
  .split(' ')
console.log(props)
At this point we can loop through the `props` array to get the values we need. Counting from one, each odd item is the property and each even item is the value needed.
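One possible way to do that loop is to step through the array two items at a time. This is a sketch, not how Tiny necessarily does it: the `pairs` object is a hypothetical container, and the array is hard-coded.

```javascript
// The flat array produced by the replace and split calls (hard-coded here)
const props = ['count', 'state.count', 'message', 'state.message']

// Hypothetical container for the name/value pairs
const pairs = {}

// Step two items at a time: props[i] is the name, props[i + 1] is the value
for (let i = 0; i < props.length; i += 2) {
  pairs[props[i]] = props[i + 1]
}

console.log(pairs) // { count: 'state.count', message: 'state.message' }
```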
Combining the regular expressions
Square brackets mean OR in regular expressions. If we put several characters inside square brackets, the regular expression will match any one of them.
So if a regular expression says `/[abc]/`, it will look for letter a, or letter b, or letter c.
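A quick sketch to make this concrete. The test words are arbitrary examples I picked, nothing more.

```javascript
const pattern = /[abc]/

const matchesBed = pattern.test('bed') // true, because of the b
const matchesDog = pattern.test('dog') // false, there is no a, b, or c

// With the g flag, every a, b, or c is replaced
const stripped = 'cabbage'.replace(/[abc]/g, '')

console.log(matchesBed, matchesDog, stripped) // true false 'ge'
```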
We can use this behaviour to combine all three `replace` calls into a single one.
const props = attribute.replace(/[\[\],]/g, '').split(' ')
console.log(props)
This regular expression looks foreign and scary, but if you can trace back its origins (by splitting it up), then it’s not as scary as it seems.
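If you want to convince yourself that the combined expression does the same job as the three separate ones, a check like this should do. It’s a sketch with a hard-coded attribute value and made-up variable names.

```javascript
// Hard-coded stand-in for the tiny-props attribute value
const attribute = '[count, state.count] [message, state.message]'

// Three separate calls, one per character
const separate = attribute
  .replace(/\[/g, '')
  .replace(/\]/g, '')
  .replace(/,/g, '')

// One call: the character class matches [ or ] or ,
const combined = attribute.replace(/[\[\],]/g, '')

console.log(separate === combined) // true
console.log(combined.split(' ')) // ['count', 'state.count', 'message', 'state.message']
```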
Making the code more robust
Users can break the string by adding in unwanted spaces before or after the string. If they do this, we’ll end up with empty items, which throw the array into disarray (Ha! 😂).
<div tiny-props=" [count, state.count] [message, state.message] ">...</div>
The simple way to prevent these issues is to trim the string before passing it to `replace`.
const props = attribute
  .trim()
  .replace(/[\[\],]/g, '')
  .split(' ')
console.log(props)
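Here’s a small before-and-after sketch of what `trim` buys us. The attribute value is hard-coded with one leading and one trailing space, and the variable names are just for illustration.

```javascript
// One space before and one space after the brackets
const attribute = ' [count, state.count] [message, state.message] '

const untrimmed = attribute.replace(/[\[\],]/g, '').split(' ')
const trimmed = attribute.trim().replace(/[\[\],]/g, '').split(' ')

console.log(untrimmed) // ['', 'count', 'state.count', 'message', 'state.message', '']
console.log(trimmed) // ['count', 'state.count', 'message', 'state.message']
```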
Users can also break the implementation by adding extra whitespace after commas or between the square brackets.
<div tiny-props="[count,   state.count]    [message, state.message]"> ... </div>
The simplest way to fix this is, once again, with regular expressions. In this case, we’ll split the string with `\s+`.
`\s` means any whitespace character.
`+` means one or more instances of the whitespace character.
Once we add `+`, multiple whitespace characters don’t matter anymore. What matters is that users leave at least one whitespace character between values.
const props = attribute
  .trim()
  .replace(/[\[\],]/g, '')
  .split(/\s+/)
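To compare the two ways of splitting, here’s a sketch with a hard-coded attribute value that has three spaces after the first comma and three spaces between the bracket pairs (my own example input).

```javascript
// Three spaces after the first comma and three between the bracket pairs
const attribute = '[count,   state.count]   [message, state.message]'
const cleaned = attribute.trim().replace(/[\[\],]/g, '')

const withSpace = cleaned.split(' ') // every extra space leaves an empty string behind
const withPlus = cleaned.split(/\s+/) // a run of whitespace counts as one separator

console.log(withSpace.length) // 8, four of the items are empty strings
console.log(withPlus) // ['count', 'state.count', 'message', 'state.message']
```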
Finally, users can break this implementation (again) by omitting the whitespace between values.
We can fix this by creating whitespace intentionally when replacing `[`, `]`, and `,`. This creates extra whitespace in three places:
- The front of the string
- Between each item
- At the back of the string
The whitespace between each item can be stripped away with `\s+`. The whitespace in front and behind can be removed by using `trim`.
const props = attribute
  .replace(/[\[\],]/g, ' ')
  .trim()
  .split(/\s+/)
console.log(props)
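To check the final chain against the awkward inputs from this section, you could wrap it in a small function. This is only a sketch: `parseTinyProps` is a hypothetical name I made up, not part of Tiny, and the inputs are hard-coded.

```javascript
// Hypothetical helper that wraps the final chain
function parseTinyProps (attribute) {
  return attribute
    .replace(/[\[\],]/g, ' ') // turn brackets and commas into spaces
    .trim() // drop the spaces created at the front and back
    .split(/\s+/) // split on any run of whitespace
}

// No whitespace at all between values
const compact = parseTinyProps('[count,state.count][message,state.message]')

// Extra whitespace everywhere
const spacious = parseTinyProps('  [count,   state.count]   [message, state.message]  ')

console.log(compact) // ['count', 'state.count', 'message', 'state.message']
console.log(spacious) // ['count', 'state.count', 'message', 'state.message']
```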
I hope this taught you a bit more about real-world regex usage. Regular expressions do not have to be complicated, unreadable, and overwhelming. They can be quite simple as long as you understand the principles behind them 😉.