=head1 TITLE Simplifying split() =head1 VERSION Maintainer: Sean M. Burke Date: 30 Sep 2000 Mailing List: perl6-language@perl.org Number: 361 Version: 1 Status: Developing =head1 ABSTRACT Perl 5's C function is messy, and should be simplified. =head1 DESCRIPTION Perl 5 split does five things that I think are just annoying, and which I suggest be removed: =over =item 1. The first argument to split is currently interpreted as a regexp, regardless of whether or not it actually is one. (Yes, C doesn't split on dot -- it's currently the same an C.) I suggest that split be changed to treat only regexps as regexps, and everything else as literals. =item 2. Empty trailing fields are currently suppressed (although a -1 as the third argument disables this). I suggest that empty trailing fields be retained by default. =item 3. When not in list context, split currently splits into @_. I suggest that this side-effect be removed. =item 4. split ?pat? in any context currently splits into @_. I suggest that this side-effect be removed. =item 5. split ' ' (but not split / /) currently splits on whitespace, but also removes leading empty fields. I suggest that this irregularity be removed. =back The last three of the above points speak for themselves. I will focus on the first two. Most notably, I suggest that Perl 6 C should work as most people expect -- splitting on a literal bar. (Under Perl 5, C is synonymous with C -- i.e., split on nullstring or nullstring [sic].) So I suggest: Perl 5: split /\|/, ... be synonymous with (and be better written as) Perl 6: split '|', ... # altho split /\|/, $bar... remains valid And as to the second point, the removal of trailing blanks, I suggest: Perl 5: @x = split /:/, $bar, -1; be synonymous with Perl 6: @x = split ':', $bar; If you want to remove trailing fields, under Perl 6 you should have to do it explicitly: Perl 5: @x = split /:/, $bar; be synonymous with Perl 6: @x = split ':', $bar; while(@x and !length $x[-1]) { pop @x } I believe that the current behavior of removing trailing empty fields is unintuitive and surprising to learners; nothing about the concept of splitting a string into a list suggests removing trailing empties. (Moreover, I find that when I need to remove empties, it's not just the trailing ones; so the current behavior is rarely just what I want.) =head1 IMPLEMENTATION I'll leave the C-coding details to the usual, capable implementers. But I will note one minor complication with my first suggestion (that literals and regexps be distinguished). Consider: Perl 6: @x = split $foo, $bar; I suggest that the correct approach is to treat $foo's value as a literal, unless it holds an object of class Regexp (or a class derived from it?), in which case it should be treated as if the above were: Perl 6: @x = split qr/$foo/, $bar; In other words, in such cases it is not possible to know at compile time whether a given "split" operator means literal-split or regexp-split. I note that such cases are rare. =head1 ALTERNATIVE APPROACH In conclusion, I'll note that there is a conservative alternative approach possible: if any of the above features of Perl 5 split seem really worth keeping, my suggestion for a "clean split" can be implemented as a separate operator called, for example, "cleave". (Consider the precedent of Perl 5 chomp being added alongside Perl 4 chop, not replacing it.) I would consider this suboptimal, though; I think that an operator with as straightforward and intuitive a name as "split" should behave in a straightforward and intuitive way. =head1 REFERENCES Nil.