Beginner’s Guide to Regular Expression (Regex)

A regular expression is a sequence of characters forming a pattern that can be searched for in a string. Regex can be used for validation (for example, checking credit card numbers), for searching with complex text matches, and for replacing matched text with another string. It is also supported by many languages: learn it once and you can use it across most of them.

I’ve seen a few people take a first look at regex and ignore it completely. I don’t blame them; regex’s syntax is complex and will make many cringe, just like command-line languages, only worse. But then every new thing is scary and seems impossible to learn at first. So, borrowing Horace’s words, I’ll say this: begin, be bold, and venture to be wise.

About Regex

Regex has its roots in neuroscience and mathematics, and was not implemented in programming until 1968, when Ken Thompson added it to the QED text editor for text search. Now it’s part of many programming languages like Perl, Java, Python, Ruby, and JavaScript.

Let’s look at some examples of how regex works.

I’ll be using JavaScript in my examples. Now, to get past the beginner level, you need to learn the characters, classes, quantifiers, modifiers and methods used in regex. The Mozilla Developer Network’s Regular Expressions page has a table containing all of those. You can also refer to the cheatsheet at the end of this post, which lists the most used characters.

Let’s see a simple example with an explanation. This is a regex: /B[a-zA-Z\d]+/

In a line, this regex looks for the character ‘B’ followed by one or more characters from ‘a’ to ‘z’, ‘A’ to ‘Z’ or the digits 0 to 9.

Here’s a sample line; the parts that match are Basket, B12, BaSO4 and BC:

Basket, bulb, B12 vitamin, BaSO4, N BC company

As written, though, the regex stops searching at Basket and returns a positive result. To make it look for all the possible matches, the global modifier ‘g’ has to be specified.

Now, let’s see how to use this expression in JavaScript. The test method returns true if a match is found and false otherwise.

	var input = "your test string", regex = /B[a-zA-Z\d]+/;
	if(!regex.test(input))
	    alert('No match is found');
	else
	    alert('A match is found');
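
To get a feel for what test returns, here is a small sketch that runs the same regex over a handful of strings. The words array is made up for illustration, and console.log stands in for alert so the snippet also runs outside the browser.

```javascript
// Same pattern as above: 'B' followed by one or more letters or digits.
const testRegex = /B[a-zA-Z\d]+/;

// Hypothetical inputs, chosen to show both outcomes.
const words = ["Basket", "bulb", "B12", "B", "N BC"];

// test() returns a boolean for each input.
const results = words.map((word) => testRegex.test(word));

console.log(results); // [true, false, true, false, true]
```

Note that "B" on its own fails, because the + quantifier needs at least one character after the ‘B’.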

Let’s try another method: match returns the matches found in an array, or null if there are none.

	var input = "your test string", 
	    regex = /B[a-zA-Z\d]+/g, 
	    /*I've added the global modifier 'g' to the regex to get all the matches*/
	    ary = input.match(regex);    
	if(ary===null)
	    alert('No match is found');
	else
	    alert('matches are: ' + ary.toString());
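
To see what the global modifier changes, the sketch below runs match on the sample line from earlier, once without ‘g’ and once with it. The variable names are only for illustration.

```javascript
// Sample line from earlier in this post.
const sampleLine = "Basket, bulb, B12 vitamin, BaSO4, N BC company";

// Without 'g', match() stops at the first hit.
const firstOnly = sampleLine.match(/B[a-zA-Z\d]+/);

// With 'g', match() collects every hit in the line.
const allMatches = sampleLine.match(/B[a-zA-Z\d]+/g);

console.log(firstOnly[0]); // "Basket"
console.log(allMatches);   // ["Basket", "B12", "BaSO4", "BC"]
```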

How about string replace? Let’s try that with regex now.

	var input = "your test string", 
	    regex = /B[a-zA-Z\d]+/g;
	alert(input.replace(regex, "#"));
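
As a rough illustration of what replace produces, here it is applied to the sample line from earlier. The last call uses $&, which in a JavaScript replacement string stands for the matched text; the bracket wrapping is just an example of mine.

```javascript
const lineToEdit = "Basket, bulb, B12 vitamin, BaSO4, N BC company";

// With 'g', every match is replaced.
const replacedAll = lineToEdit.replace(/B[a-zA-Z\d]+/g, "#");
console.log(replacedAll); // "#, bulb, # vitamin, #, N # company"

// Without 'g', only the first match is replaced.
const replacedFirst = lineToEdit.replace(/B[a-zA-Z\d]+/, "#");
console.log(replacedFirst); // "#, bulb, B12 vitamin, BaSO4, N BC company"

// "$&" inserts the matched text itself, so each match can be wrapped.
const wrapped = lineToEdit.replace(/B[a-zA-Z\d]+/g, "[$&]");
console.log(wrapped); // "[Basket], bulb, [B12] vitamin, [BaSO4], N [BC] company"
```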


Exercises

For exercises, you can google “regex exercises” and try solving them. Here’s what to expect when attempting these exercises, by difficulty level.

Basic

To me, being able to validate a password is enough for starters. So, validate a password that is 8 to 16 characters long and alphanumeric, with your choice of special characters allowed.
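
If you get stuck, one possible solution could look like the sketch below. The set of special characters (! @ # $ % ^ & *) and the isValidPassword helper are my own assumptions; pick whatever characters suit you.

```javascript
// Assumed rule set: 8 to 16 characters, each one a letter, a digit,
// or one of the special characters ! @ # $ % ^ & *
// ^ and $ anchor the pattern so the whole string has to fit.
const passwordRegex = /^[a-zA-Z\d!@#$%^&*]{8,16}$/;

// Hypothetical helper wrapping the test method.
function isValidPassword(password) {
  return passwordRegex.test(password);
}

console.log(isValidPassword("P@ssw0rd!"));     // true
console.log(isValidPassword("short1!"));       // false, only 7 characters
console.log(isValidPassword("has space 123")); // false, space is not allowed
```

Without the ^ and $ anchors, the regex would accept any string that merely contains 8 valid characters in a row, so they matter for validation.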

Intermediate

This is where you should practice with more real-world data and learn a few more regex features, such as lookahead and lookbehind assertions and capturing groups:

  • Validate PIN codes, hexadecimal numbers, dates, email addresses and floating-point numbers.
  • Replace trailing zeros, whitespace or a set of matching words.
  • Extract different parts of a URL.
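
To give an idea of the level, here are rough sketches for three of these. The accepted hexadecimal format (optional 0x prefix), the example URL and all the names are assumptions of mine, and the URL pattern is deliberately simplified. Named capturing groups need a reasonably modern JavaScript engine.

```javascript
// Validate a hexadecimal number, with an optional 0x prefix (assumed format).
const hexRegex = /^(0x)?[\da-fA-F]+$/;
console.log(hexRegex.test("0x1F4")); // true
console.log(hexRegex.test("12G"));   // false

// Remove trailing whitespace by replacing it with an empty string.
const trimmed = "some text   \t".replace(/\s+$/, "");
console.log(JSON.stringify(trimmed)); // "some text"

// Extract parts of a URL with named capturing groups (simplified pattern).
const urlRegex = /^(?<protocol>\w+):\/\/(?<host>[^\/:?#]+)(?::(?<port>\d+))?(?<path>[^?#]*)(?:\?(?<query>[^#]*))?(?:#(?<hash>.*))?$/;
const parts = "https://example.com:8080/docs/page.html?x=1&y=2#top".match(urlRegex).groups;

console.log(parts.protocol); // "https"
console.log(parts.host);     // "example.com"
console.log(parts.port);     // "8080"
console.log(parts.path);     // "/docs/page.html"
console.log(parts.query);    // "x=1&y=2"
console.log(parts.hash);     // "top"
```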

Advanced

You can optimize the solutions to the above exercises. A regex that covers the full email address specification runs to thousands of characters, so take it as far as you feel comfortable with and that’s enough. You can also try:

  • Parsing HTML or XML (even though this is discouraged in the real world, because using regular expressions to parse a non-regular language like HTML will never be foolproof. Plus, XML parsing is a difficult task, more suitable for advanced users)
  • Replacing tags
  • Removing comments (except the IE conditional comments)
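
For the last item, a first attempt might use a negative lookahead, as in the sketch below. It assumes conditional comments always start with a lowercase <!--[if, and the html sample is made up; real-world markup has more variants, so treat this as a starting point only.

```javascript
// Made-up sample markup with two plain comments and one IE conditional comment.
const html = [
  "<!-- plain comment -->",
  "<p>Hello</p>",
  "<!--[if IE]><p>IE only</p><![endif]-->",
  "<!-- multi\nline -->",
].join("\n");

// <!--        opening of a comment
// (?!\[if\b)  negative lookahead: skip it if "[if" follows (conditional comment)
// [\s\S]*?    any characters, including line breaks, as few as possible
// -->         closing of the comment
const commentRegex = /<!--(?!\[if\b)[\s\S]*?-->/g;

const cleaned = html.replace(commentRegex, "");
console.log(cleaned);
```

The plain comments are removed, while the conditional comment and the paragraph are left in place.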

Tools

Tools that visualize regex are among the coolest things out there for me. If you ever come across a long, complex regex, just copy and paste it into one of these tools and you’ll be able to see its flow clearly. Besides that, there are many tools you can use to fiddle with regex code. They also offer examples and cheatsheets, along with share features.

  • Debuggex – It draws a regex diagram from your input, and you can quickly share it to StackOverflow right from there.
  • RegExr – You can test your regex with this one. It also has a reference, a cheatsheet and examples to help you out.
  • Refiddle – At the moment, besides JavaScript, you can also fiddle with the Ruby and .NET flavors of regex in it.

Regex Cheatsheet

Token     Definition
[abc]     Any single character a, b or c
[^abc]    Any single character other than a, b or c
[a-z]     Any single character from a to z (inclusive)
[^a-z]    Any single character not from a to z
[A-Z]     Any single character from A to Z (inclusive)
.         Any single character except a line break
\s        Any whitespace character
\S        Any non-whitespace character
\d        Any digit 0 to 9
\D        Any non-digit
\w        Any word character (letter, digit or underscore)
\W        Any non-word character
(…)       Capture everything enclosed
(a|b)     Match either a or b
a?        Zero or one occurrence of a
a*        Zero or more occurrences of a
a+        One or more occurrences of a
a{3}      Exactly 3 consecutive occurrences of a
a{3,}     3 or more consecutive occurrences of a
a{3,6}    3 to 6 consecutive occurrences of a
^         Start of string
$         End of string
\b        A word boundary: the position between a word character and a non-word character, or at the start or end of the string next to a word character
\B        A position that is not a word boundary
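
A few of these tokens in action, as a quick sketch. The sample strings are made up for illustration.

```javascript
const text = "cat concat category 42 cats";

// \b keeps "cat" from matching inside "concat", "category" or "cats".
const wholeWord = text.match(/\bcat\b/g);
console.log(wholeWord); // ["cat"]

// \d+ matches one or more digits.
const digits = text.match(/\d+/g);
console.log(digits); // ["42"]

// ^ and $ anchor a match to the start and end of the string.
const firstWord = text.match(/^\w+/)[0];
const lastWord = text.match(/\w+$/)[0];
console.log(firstWord, lastWord); // "cat" "cats"

// u? makes the letter u optional.
const spellings = "color colour colouur".match(/colou?r/g);
console.log(spellings); // ["color", "colour"]

// a{2,3} matches 2 to 3 consecutive occurrences of a.
const runs = "a aa aaaa".match(/a{2,3}/g);
console.log(runs); // ["aa", "aaa"]
```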
