Monday, July 10, 2006
Perl: goto vs. Pattern Match Variables
If you're not a Perl programmer, the following will make no sense whatsoever and you should stop reading now. Still with me? All right, away we go. I've been programming in Perl for almost fifteen years, but there's always something in there to surprise you. Here's one that doesn't involve any of the fancy new stuff such as Unicode, threads, objects, or even modules: just classic Perl constructs which have been around since Perl 4.0 was released in 1991. In keeping with the tradition of programming puzzles, I'll frame the issue as the question: “What does the following program print?”
    use strict;
    use warnings;

    my $s = "one,two,three";
    $s =~ m/^(\w*),(\w*),(\w*)$/;
    print("($1) ($2) ($3)\n");
    if (1) {
        goto heck;
    }
    print("Let's not go there.\n");
heck:
    print("($1) ($2) ($3)\n");
That seems pretty simple, doesn't it? And you'd probably guess it prints:
    (one) (two) (three)
    (one) (two) (three)
…but it doesn't. In fact, the output from this program is:
    (one) (two) (three)
    Use of uninitialized value in concatenation (.) or string at line 12.
    Use of uninitialized value in concatenation (.) or string at line 12.
    Use of uninitialized value in concatenation (.) or string at line 12.
    () () ()
Now that's odd, isn't it? The only thing that happened between the print statement which worked and the one which found the pattern match variables undefined was the goto, and it was a goto from the nested scope within the if statement back to the global scope in which the pattern match occurred and the match variables were set. You'd expect them to be undefined if you jumped to a scope outside the one in which they were set, but not on a jump back to their own scope.
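That expectation comes from the documented rule (see perlvar and perlre) that the match variables $1, $2, and so on are dynamically scoped to the enclosing block: a successful match inside a block sets them only until that block is left, after which the values from the outer scope return. Here is a minimal sketch of the rule on its own, with no goto involved; the strings "outer" and "inner" are arbitrary placeholders.

```perl
use strict;
use warnings;

# A successful match at file scope sets $1 for this scope.
"outer" =~ m/^(\w+)$/;
my $before = $1;

{
    # A successful match inside a nested block sets $1 for that block only.
    "inner" =~ m/^(\w+)$/;
    print("Inside the block:  ($1)\n");
}

# Leaving the block normally restores the outer match's $1.
my $after = $1;
print("Outside the block: ($after)\n");
```

Run as written, it should report (inner) inside the block and (outer) after it: leaving a block restores the match variables of the scope you return to rather than wiping them out.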
But that's how it works! If you comment out the if statement and its closing brace:
    use strict;
    use warnings;

    my $s = "one,two,three";
    $s =~ m/^(\w*),(\w*),(\w*)$/;
    print("($1) ($2) ($3)\n");
#    if (1) {
        goto heck;
#    }
    print("Let's not go there.\n");
heck:
    print("($1) ($2) ($3)\n");
then there are no warning messages and the program prints the match variables twice as you'd expect.
It appears that a goto from one scope to another causes the pattern match variables to be undefined, even if the goto lands back in the same scope in which they were set. This seems counterintuitive to me, but there may be a perfectly good reason for it. Of course, purists will argue that simply using a goto statement in a program puts one in a state of sin, and I have much sympathy for this view. In this case, I was coding a quick-reject function for the Gardol denial of service attack mitigation tool. Since the code in question is executed for every HTTP access to the Web site, more than half a million times a day, I wanted to minimise the amount of code executed once a given request had been determined to be malign. Also, the goto was never explicitly coded: it was used inside a literate programming macro, so a reader of the code at that level would never see it, but rather:
    <Deem request malign and banish client>
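Whatever the reason, one way to avoid relying on $1, $2 and $3 across such a jump is to copy the captures into lexical (my) variables immediately after the match; lexicals follow the ordinary scoping rules, and a jump back into their own scope leaves them alone. This is a minimal sketch applied to the puzzle program, not the Gardol code, and the names $first, $second and $third are made up for the example.

```perl
use strict;
use warnings;

my $s = "one,two,three";

# Copy the captures into lexicals as soon as the match succeeds.
# A list assignment in scalar context yields the number of values on its
# right-hand side, so "or die" fires only if the match fails.
my ($first, $second, $third) = $s =~ m/^(\w*),(\w*),(\w*)$/
    or die("No match\n");
print("($first) ($second) ($third)\n");
if (1) {
    goto heck;    # the jump may clobber $1..$3, but not these lexicals
}
print("Let's not go there.\n");
heck:
print("($first) ($second) ($third)\n");
```

In list context the match returns the captured strings, so the single assignment both tests the match and saves its results before any goto can intervene.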
The loss of the pattern match variables occurs only in recent versions of Perl. I discovered it on Perl 5.8.6 as supplied with the Fedora Core 4 Linux distribution [perl-5.8.6-24 i386-linux-thread-multi], but it does not happen with Perl 4.036 or Perl 5.004 on the sun4-solaris architecture (with the source code modified to remove features absent in those versions). I do not know in which release between 5.004 and 5.8.6 the present behaviour first appeared. The cause of the loss appears to be the change in scope, not the conditional goto: if you replace the if statement with the construction “goto heck if 1;”, the definitions are not lost even on Perl 5.8.6.
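Here is the puzzle program with the statement modifier form substituted for the if block, as a minimal sketch for comparison. With no block around the goto, the jump never leaves the scope in which the match set $1, $2 and $3, which is consistent with the definitions surviving.

```perl
use strict;
use warnings;

my $s = "one,two,three";
$s =~ m/^(\w*),(\w*),(\w*)$/;
print("($1) ($2) ($3)\n");
# Statement modifier form: no block is entered, so the jump
# stays within the scope in which the match set $1, $2 and $3.
goto heck if 1;
print("Let's not go there.\n");
heck:
print("($1) ($2) ($3)\n");
```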
