Lead Image © Jakub Jirsak, 123RF.com

Regular expression security

Pass the Test

Article from ADMIN 55/2020

By Matthias Wübbeling

Regular expressions are invaluable for checking user input, but a vulnerability could make them ripe for exploitation.

One important paradigm in software development, especially for web applications, is careful validation of user input before allowing further processing. Much can go wrong if the input is not carefully checked: SQL injection and cross-site scripting attacks are just two of the most common examples of exploitation. Regular expressions are useful for checking user input, but even they can be vulnerable to attack. In this article, I show you how to check your regular expressions for vulnerabilities.

Regular Expressions

Regular expressions (regex) have become an established method of validating user input before processing: They describe the permitted entries and prevent certain characters (e.g., those with a special function in an application) from being entered. Unfortunately, some regular expressions can still cause unwanted behavior with certain types of input and make the script unusable, much like a denial of service attack.
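To make the risk concrete, the following minimal Python sketch times a failing match. The pattern `(a+)+$` is a textbook example of nested quantifiers and is an assumption chosen for illustration, not a pattern taken from this article; the helper name `time_match` is likewise hypothetical. On a backtracking engine such as Python's `re` module, each additional `a` in a non-matching input roughly doubles the work.

```python
import re
import time

# Textbook example of a pattern with nested quantifiers (an assumption for
# illustration, not taken from the article).
EVIL = re.compile(r"(a+)+$")


def time_match(n):
    """Return (match result, elapsed seconds) for n 'a' characters plus a 'b'."""
    text = "a" * n + "b"  # the trailing 'b' guarantees that the match fails
    start = time.perf_counter()
    result = EVIL.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the run time grows roughly exponentially with n.
    for n in (12, 14, 16, 18, 20):
        result, seconds = time_match(n)
        print(f"n={n:2d}  matched={result is not None}  {seconds:.4f}s")
```

The absolute timings depend on the machine and Python version, so none are quoted here; the point is the growth from one line of output to the next. An attacker who controls the input only needs to add a few more characters to keep the script busy for minutes or longer.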

A regular expression describes a language: You define the set of strings that you want to accept as input for an application. Email addresses provide a simple but useful example. The World Wide Web Consortium recommends the following regular expression for the language of email addresses:

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*$

When you see a string like this for the first time, the semantics are not immediately apparent, but regular expressions form a language that is actually quite easy to understand. The two characters ^ and $ describe the start and end of input, respectively.
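As a minimal sketch of how this expression might be used for input validation, the following Python snippet compiles the pattern exactly as quoted above. The function name `is_valid_email` and the sample inputs are assumptions made for illustration. Because `$` in Python's `re` module also matches just before a trailing newline, the sketch calls `fullmatch()` so that the entire input has to fit the pattern.

```python
import re

# The pattern quoted in the article, written as a raw string so that the
# backslash reaches the regex engine unchanged.
EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*$"
)


def is_valid_email(candidate):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    # fullmatch() rejects input such as "user@example.com\n", which a plain
    # match() would accept because '$' tolerates one trailing newline.
    return EMAIL.fullmatch(candidate) is not None


if __name__ == "__main__":
    for sample in ("user@example.com", "user@@example.com", "no-at-sign"):
        print(f"{sample!r}: {is_valid_email(sample)}")
```

Note that the pattern only checks the shape of the input; it says nothing about whether the address actually exists.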

The parentheses form groups and the square brackets ([ ]) classes of admissible input characters. In this case, no value other than those that would appear in an email address may be present in the input. The first class in the example comprises a-z (all lowercase

...
