[% setvar title Behavior of empty regex should be simple %]

This file is part of the Perl 6 Archive

To see what is currently happening visit http://www.perl6.org/

TITLE

Behavior of empty regex should be simple

VERSION

  Maintainer: Mark Dominus <mjd@plover.com>
  Date: 24 Aug 2000
  Last Modified: 11 Sep 2000
  Mailing List: perl6-language-regex@perl.org
  Number: 144
  Version: 3
  Status: Frozen

ABSTRACT

Standard Documentation

According to perlop:

This behavior should be changed. If the PATTERN is empty, Perl should look for the empty string. (That is, if the PATTERN is empty, it should always match.)

DESCRIPTION

Literal empty patterns, such as:

        $s =~ // ;

are not the problem here. The real problem is that the special case is invoked for interpolated patterns also. For example,

        chomp($pat = <STDIN>);
        $s =~ /\Q$pat\E/;

looks to see if $pat is a substring of $s, unless $pat is empty, in which case it matches $s against the last regex that was matched successfully. That regex might be far away, in some other module. If the far-away regex happened to contain backreference groups, the backreference variables will be set accordingly.

To make this safe in Perl 5, the programmer has to write something peculiar like

        $s =~ /(?=)\Q$pat\E/;

to ensure that the regex, after interpolation, is never empty.

I propose that this 'last successful match' behavior be discarded entirely, and that an empty pattern always match the empty string.

RATIONALE

The Feature Was Not Useful, I

The special behavior for empty patterns has never been particularly useful. For example, you could imagine code like this:

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ //) {
            # do something 
          }
        }

This would be more efficient than the equivalent

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something 
          }
        }

because $pat would be compiled only once per loop instead of twice. It is now more straightforward and efficient to do this sort of thing explicitly with the qr// operator:

        @patterns = map qr/$_/, @patterns;
        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something 
          }
        }

The Feature Was Not Useful, II

People sometimes propose the following use for the empty pattern special case: They have a pattern, and many strings, and they want to see if every string matches the pattern. This code works, but is inefficient:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1; 
        }

This is because /$pat/ must be recompiled for each string, or checked to see whether recompilation is necessary.

This code does not work:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/o;
          }
          return 1; 
        }

because $pat changes with each call.

One solution is to use 'eval' here to generate the pattern matching code (with /o) at run time.

People have sometimes tried to use // here, but usually without success. The idea is:

        sub match_all {
          my $pat = shift;
          # load $pat into 'last successfully matched' space
          for (@_) {
            return 0 unless //;
          }
          return 1;
        }

The problem here is that there is no way to designate $pat as the last successfully matched regex without actually finding a string that matches it. In the past people attempting this strategy have appeared in comp.lang.perl.misc asking how to find a string that matches a given regex. As far as I know, no useful solutions have been offered. (In fact, there may not be any such string. Consider the pattern /a\bx/ for example.)

A better, simpler solution to this problem is to use the qr operator:

        sub match_all {
          my $pat = shift;
          $pat = qr($pat);
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1; 
        }

This feature has resulted in bugs

Any code that contains the innocent-looking

        if (/\Q$string\E/) {
          ...
        }

is potentially booby-trapped. Such code is common. An example of this type appears in perlfaq6.

Alternatives

Rather than eliminating the special case entirely, alternative changes are sometimes proposed.

Empty pattern to mean 'last match' instead of 'last successful match'

This behavior would be more useful than the current behavior and is sometimes proposed as an alternative.

For example, the application discussed in the section 'The feature was not useful, II' above would be feasible if the empty pattern matched the last-matched pattern, because it would no longer be necessary to manufacture a matching string. But it is simpler and more straightforward to solve the problem with qr//.

Moreover, this alternative behavior retains all the drawbacks of the current behavior: It yields subtle and intermittent bugs and introduces strange action-at-a-distance effects.

Retain special behavior for literal empty pattern?

In the past it has been proposed that the special behavior be discarded for patterns with interpolated variables, but retained for empty patterns that appear literally in the source. /\Q$string\E/ would look for $string, regardless of whether $string was empty, becayse /\Q$string\E/ is not a literal empty pattern. But

        $s =~ // ;

would still retain the special behavior and match $s against whatever was the last pattern to be successfully matched.

Since the feature is of marginal usefulness (especially now that qr// is available) it should be eliminated anyway to reduce code and documentation bloat. However, see Translation Issues below.

IMPLEMENTATION

No special implementation is necessary.

TRANSLATION ISSUES

Old Perl 5 scripts that may depend on this feature may be hard to translate unless the feature is implemented in Perl 6 also.

If the literal empty pattern were to retain the special behavior, then Perl 5 code like this:

        if (/$FOO/) {
          ...
        }

could be translated to Perl 6 code like this:

        if (do { my $_tmp = $FOO; ($_tmp eq '') ? // : /$_tmp/ }) {
          ...
        }

However, it's not clear that such translation is actually desirable.

REFERENCES

perlop discussion of empty patterns, quoted above.