Performance of Perl Regular Expressions

Perl owes much of its power to regular expressions, which make many text-processing tasks easier and more intuitive to program. But this power can come at a cost: compiling and executing regular expressions is non-trivial work that can consume a significant number of CPU cycles.

In some situations the time a particular processing routine takes can be reduced significantly by avoiding regular expressions altogether and using other techniques. Consider, for example, the task of extracting the file name from a string containing a full path. Below are three functions that accomplish this using three different methods, followed by an analysis of their performance.

Function filename1 splits the path into parts at each path separator and returns the last part as the file name:

sub filename1
{
    # Split the path at every '/' and keep the last component.
    my @tok = split /\//, $_[0];
    return $tok[-1];
}
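To see why the cost of filename1 grows with the length of the path, it helps to look at the intermediate list that split builds. The short sketch below (the variable names are only illustrative) prints every token for the sample path used in the timings; note the empty leading field produced by the initial slash.

```perl
use strict;
use warnings;

my $path = '/abc/def/geg-geg/page.pl';

# split creates one list element per path component, including an
# empty string before the leading '/'; all of them are built even
# though filename1 keeps only the last one.
my @tok = split m{/}, $path;

print scalar(@tok), " tokens\n";
print "[$_]\n" for @tok;
print "last: $tok[-1]\n";
```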

The same result can be achieved with a slightly different approach: a regexp that captures the part between the last path separator and the end of the string:

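sub filename2
{
   # Capture the non-separator characters after the last '/'.
   # Return undef on a failed match rather than a stale $1.
   return $_[0] =~ /\/([^\/]+)$/ ? $1 : undef;
}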

Both functions are elegant and concise, but when it comes to performance, the following code is considerably faster:

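sub filename3
{
   # rindex() returns -1 when there is no '/', so substr() then
   # starts at offset 0 and returns the whole string.
   return substr($_[0], rindex($_[0], '/') + 1);
}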

rindex() finds the position of the rightmost separator character in the string, and substr() then extracts everything after it as the file name. Neither call involves the regular expression engine.
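The three functions agree on ordinary paths but differ at the edges. The sketch below repeats the three subroutines (with filename2 written so that a failed match returns undef instead of a leftover $1) and runs them on a few inputs chosen purely for illustration: a plain path, a name with no separator, and a path ending in '/'. Because rindex() returns -1 when the separator is absent, filename3 returns the whole string in that case, while the regexp version finds no match.

```perl
use strict;
use warnings;

sub filename1 {
    my @tok = split m{/}, $_[0];   # trailing empty fields are dropped
    return $tok[-1];
}

sub filename2 {
    # List-context match: $name stays undef when there is no match.
    my ($name) = $_[0] =~ m{/([^/]+)$};
    return $name;
}

sub filename3 {
    # rindex returns -1 if '/' is missing, so substr starts at 0.
    return substr($_[0], rindex($_[0], '/') + 1);
}

for my $path ('/abc/def/geg-geg/page.pl', 'page.pl', '/abc/def/') {
    my @r = map { defined($_) ? "'$_'" : 'undef' }
            filename1($path), filename2($path), filename3($path);
    printf "%-26s %s\n", "'$path'", join(' ', @r);
}
```

For a path ending in '/', filename1 returns the last directory name ('def'), filename2 returns undef, and filename3 returns an empty string, so the choice matters if such inputs can occur.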

Here’s how long each of the three functions takes to parse ‘/abc/def/geg-geg/page.pl’ 30,000 times:

filename1: 0.580 seconds
filename2: 0.541 seconds
filename3: 0.260 seconds

With a longer path, ‘/abc/def/geg-geg/page.pl/abc/def/geg-geg/page.pl/abc/def/geg-geg/page.pl/abc/def/geg-geg/page.pl’, the difference is more dramatic:

filename1: 1.191 seconds
filename2: 1.101 seconds
filename3: 0.271 seconds

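These are typical figures produced by the Perl profiler (Devel::DProf). In this example, replacing the regular-expression-based approaches with rindex() and substr() reduced execution time by roughly 50% for the short path and 75% for the long one. Note also that filename3 takes almost the same time for both paths, while the other two functions take about twice as long on the longer path.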
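The figures above come from Devel::DProf. A similar comparison can be sketched with Perl's core Benchmark module. In the sketch below, the labels 'split', 'regexp' and 'rindex' are arbitrary, and the long path is built by repeating the short one four times. Instead of a fixed 30,000 iterations, each candidate runs for at least one CPU second (the -1 argument), because code this fast finishes 30,000 calls too quickly for Benchmark to time reliably. The printed numbers depend on the machine and Perl version and will not match the ones quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The three variants from the article (filename2 guarded against a failed match).
sub filename1 { my @tok = split m{/}, $_[0]; return $tok[-1]; }
sub filename2 { return $_[0] =~ m{/([^/]+)$} ? $1 : undef; }
sub filename3 { return substr($_[0], rindex($_[0], '/') + 1); }

my $short = '/abc/def/geg-geg/page.pl';
my $long  = join '', ($short) x 4;   # the short path repeated four times

for my $path ($short, $long) {
    print "\n$path\n";
    # Run each candidate for >= 1 CPU second and print a rate/percentage table.
    cmpthese(-1, {
        'split'  => sub { filename1($path) },
        'regexp' => sub { filename2($path) },
        'rindex' => sub { filename3($path) },
    });
}
```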
