Using a period as a delimiter in the split() function 

Hi people,

I am currently having a problem with using a period as a delimiter in the
split function. I am attempting to separate segments of a domain name into
elements of an array, using the period(s) in the domain name as the
delimiter. The offending line of code is as follows:

At the moment a null value is passed to the array. If any of you have any
suggestions for a way to get around this I would be very grateful.

TIA,
Gareth



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:

> Hi people,

> I am currently having a problem with using a period as a delimiter in the
> split function. I am attempting to separate segments of a domain name into
> elements of an array, using the period(s) in the domain name as the
> delimiter. The offending line of code is as follows:



Gareth, the first argument to split is treated as a regular expression.
A . is a regular expression metacharacter, and thus must be quoted if
it is to be considered a literal period.  Try:

Quote:

> At the moment a null value is passed to the array. If any of you have any
> suggestions for a way to get around this I would be very grateful.

> TIA,
> Gareth
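
A minimal Perl sketch of the escaping described above, assuming the domain name is already in a scalar. The variable name `$domain` and the sample value are illustrative only, since the original poster's code was not preserved. It also suggests why the array came back empty: with an unescaped '.', every character is a delimiter, so every field is empty and split discards trailing empty fields.

```perl
use strict;
use warnings;

# Hypothetical input; the original poster's variable name was not preserved.
my $domain = 'www.example.com';

# Unescaped: '.' is the regex "any character", so every character is a
# delimiter, every field is empty, and trailing empty fields are stripped.
my @broken      = split /./,  $domain;
my @also_broken = split '.',  $domain;   # the string form is the same pattern

# Escaped: '\.' matches only a literal period.
my @parts = split /\./, $domain;

print scalar(@broken), "\n";      # 0
print join(' | ', @parts), "\n";  # www | example | com
```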

--
Bob Walton


Wed, 18 Jun 1902 08:00:00 GMT  
[Posted and a courtesy copy sent.]



Quote:
> I am currently having a problem with using a period as a delimiter in the
> split function. I am attempting to separate segments of a domain name into
> elements of an array, using the period(s) in the domain name as the
> delimiter. The offending line of code is as follows:


> At the moment a null value is passed to the array. If any of you have any
> suggestions for a way to get around this I would be very grateful.

Yes.  Instead of writing the argument for split() as a string, write it
as what it really is, a regex.  Then you will see that '.' is a regex
metacharacter, which needs to be escaped to have its literal meaning.
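
Once the argument is written as a regex, there are several equivalent ways to ask for a literal period. A short sketch, using a hypothetical host name in `$host`:

```perl
use strict;
use warnings;

my $host = 'mail.example.org';   # hypothetical sample value

my @a = split /\./,    $host;   # backslash escape
my @b = split /[.]/,   $host;   # '.' is not special inside a character class
my @c = split /\Q.\E/, $host;   # \Q...\E quotes any metacharacters it encloses

print join(',', @a), "\n";   # mail,example,org
```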


--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  
[posted & mailed]
[removed ancient, defunct comp.lang.perl]

Quote:

> The offending line of code is as follows:



Whenever a function gives you results you don't expect, you should
ALWAYS look it up in the manual.

perldoc -f split

Once you look this up you will see that the first argument to split is a
pattern, not a string and so special pattern matching characters like
'.' must be escaped (unless you want their special meaning).

perldoc perlre


Of course, when presented with the code

    split ':', $passwd;

it's no wonder people think split works on strings.  Please go smack the
person that showed you code like that when

    split /:/, $passwd;

would be clearer.
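
To illustrate, a small sketch with a made-up passwd-style line: because ':' is not a metacharacter, the quoted form and the slash form produce identical fields, so the slash form costs nothing and says what actually happens. The `chomp` removes the trailing newline a line read from a file would carry.

```perl
use strict;
use warnings;

# A made-up /etc/passwd-style record, as if just read from a file.
my $line = "gareth:x:1001:100:Gareth:/home/gareth:/bin/sh\n";
chomp $line;   # otherwise $shell keeps the trailing newline

# ':' is not a regex metacharacter, so both forms compile to the same pattern.
my @from_string  = split ':', $line;
my @from_pattern = split /:/, $line;

my ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = @from_pattern;
print "$login uses $shell\n";   # gareth uses /bin/sh
```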

It's getting to the point where I'd like to see

    split ''

deprecated in favour of

    split m''

--
Rick Delaney



Wed, 18 Jun 1902 08:00:00 GMT  


...

Quote:
> Of course, when presented with the code

>     split ':', $passwd;

> it's no wonder people think split works on strings.  Please go smack the
> person that showed you code like that when

>     split /:/, $passwd;

> would be clearer.

> It's getting to the point where I'd like to see

>     split ''

> deprecated in favour of

>     split m''

You are pointing out a flaw in the evolution of this function:  the
interpretation of what is syntactically a string to have the semantics
of a regex.  The frequency of programmer error (often for '|') would
indicate the appropriateness of some corrective action.
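
A sketch of the '|' case with a made-up record: as a pattern, '|' is an alternation of two empty patterns, so it matches the empty string and split breaks the record into single characters, bars included.

```perl
use strict;
use warnings;

my $record = 'a|b|c';   # hypothetical |-separated record

# '|' as a pattern matches the empty string, so this splits between
# every character and keeps the bars as fields of their own.
my @wrong = split '|', $record;

# Escaping the bar (or writing /[|]/) splits on the literal character.
my @right = split /\|/, $record;

print join(' ', map { "[$_]" } @wrong), "\n";  # [a] [|] [b] [|] [c]
print join(' ', map { "[$_]" } @right), "\n";  # [a] [b] [c]
```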

Here is a comparable situation.  Suppose someone proposed extending the
semantics of $/ to include regexes.  (I know this has been discussed to
death, but perhaps it might be reconsidered.)

With our current knowledge, it would not be done by changing the
semantics of a string assigned to $/ to those of a regex.  A line
separator of '0.0' would not suddenly become equivalent to /0.0/ !  
Instead an assignment such as

    $/ = qr/0.0/;

would be required to get the regex semantics.  A flag could be set for
the input routine in this case, and unset if a string were later
assigned to $/, making it possible to preserve the efficiency of the
string input terminator.
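
For contrast, a sketch of how $/ behaves today, using an in-memory filehandle (available since Perl 5.8) and made-up data: the separator is matched as a literal string, so "0x0" does not end a record even though the pattern /0.0/ would match it.

```perl
use strict;
use warnings;

# Made-up input containing the literal separator "0.0" and a near miss "0x0".
my $data = "first0.0second0x0third";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @records = do {
    local $/ = '0.0';   # $/ is compared as a literal string, not a pattern
    <$fh>;              # list context: read all records
};
close $fh;

# Each record keeps its terminator; "0x0" did not split anything.
print join("\n", @records), "\n";
```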

I conjecture that split() originally took a string as the separator, and
then when split() was later enhanced to take a regex as the separator
the string arguments were grandfathered for backwards compatibility in
existing code.  Changing their semantics to regex semantics feels like a
mistake to me.

Except for the magic of split ' ', I don't see what purpose is served
now by having two ways of saying the same thing, one of which is fraught
with surprises.  I agree that no educator should ever show split with a
string argument (except for the magic case), unless and until the string
semantics were obeyed when a string is specified.  (Someone who would
write a string argument in the expectation of regex semantics is
malicious, at best.) Until then, a warning seems like a good thing.
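
A short sketch of the magic case next to its pattern look-alikes (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $text = '  a b  c';   # two leading spaces, two spaces before "c"

my @awk_style = split ' ',   $text;  # magic: skip leading whitespace, split on runs
my @single    = split / /,   $text;  # one field per single space, leading '' kept
my @runs      = split /\s+/, $text;  # runs of whitespace, but a leading '' is kept

print scalar(@awk_style), ' ', scalar(@single), ' ', scalar(@runs), "\n";  # 3 6 4
```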

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  
     [courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc,

:I conjecture that split() originally took a string as the separator, and
:then when split() was later enhanced to take a regex as the separator
:the string arguments were grandfathered for backwards compatibility in
:existing code.  

I find that unlikely, since awk's split() has always supported a
regex at that point.  Certainly perl1 already had it.

% man perl1
[...]
split(/PATTERN/,EXPR)
split(/PATTERN/)
split
     Splits a string into an array of strings, and returns it.
     If EXPR is omitted, splits the $_ string.  If PATTERN  is also
     omitted, splits on whitespace (/[ \t\n]+/).  Anything matching
     PATTERN is taken to be a delimiter separating the fields.
     (Note that the delimiter may be longer than one character.)
     Trailing null fields are stripped, which potential users of pop()
     would do well to remember.  A pattern matching the null string
     will split into separate characters.

        Example:

             open(passwd, '/etc/passwd');
             while (<passwd>) {
                  ($login, $passwd, $uid, $gid, $gcos, $home, $shell)
                       = split(/:/);
                  ...
             }

    (Note  that $shell above will still have a newline on it.
    See chop().)  See also join.

That's all it had.  Now compare with:

% man 3pl split
 split /PATTERN/,EXPR,LIMIT
 split /PATTERN/,EXPR
 split /PATTERN/
 split

     Splits a string into a list of strings and returns that list.
     By default, empty leading fields are preserved, and empty trailing
     ones are deleted.

     If not in list context, returns the number of fields found and
     splits into the `@_' array. (In list context, you can force the
     split into `@_' by using `??' as the pattern delimiters, but
     it still returns the list value.) The use of implicit split to
     `@_' is deprecated, however, because it clobbers your subroutine
     arguments.

     If EXPR is omitted, splits the `$_' string. If PATTERN is
     also omitted, splits on whitespace (after skipping any leading
     whitespace). Anything matching PATTERN is taken to be a delimiter
     separating the fields. (Note that the delimiter may be longer
     than one character.)

     If LIMIT is specified and positive, splits into no more than
     that many fields (though it may split into fewer). If LIMIT is
     unspecified or zero, trailing null fields are stripped (which
     potential users of `pop' would do well to remember). If LIMIT
     is negative, it is treated as if an arbitrarily large LIMIT had
     been specified.

     A pattern matching the null string (not to be confused with a
     null pattern `//', which is just one member of the set of patterns
     matching a null string) will split the value of EXPR into separate
     characters at each point it matches that way.  For example:

         print join(':', split(/ */, 'hi there'));

     produces the output 'h:i:t:h:e:r:e'.

     The LIMIT parameter can be used to split a line partially

         ($login, $passwd, $remainder) = split(/:/, $_, 3);

     When assigning to a list, if LIMIT is omitted, Perl supplies
     a LIMIT one larger than the number of variables in the list,
     to avoid unnecessary work. For the list above LIMIT would have
     been 4 by default. In time critical applications it behooves
     you not to split into more fields than you really need.

     If the PATTERN contains parentheses, additional list elements
     are created from each matching substring in the delimiter.

         split(/([,-])/, "1-10,20", 3);

     produces the list value

         (1, '-', 10, ',', 20)

     If you had the entire header of a normal Unix email message
     in $header, you could split it up into fields and their values
     this way:

         $header =~ s/\n\s+/ /g;  # fix continuation lines
         %hdrs   =  (UNIX_FROM => split /^(\S*?):\s*/m, $header);

     The pattern `/PATTERN/' may be replaced with an expression to
     specify patterns that vary at runtime. (To do runtime compilation
     only once, use `/$variable/o'.)

     As a special case, specifying a PATTERN of space (`' '') will
     split on white space just as `split' with no arguments does.
     Thus, `split(' ')' can be used to emulate awk's default behavior,
     whereas `split(/ /)' will give you as many null initial fields
     as there are leading spaces. A `split' on `/\s+/' is like a
     `split(' ')' except that any leading whitespace produces a null
     first field. A `split' with no arguments really does a
     `split(' ', $_)' internally.

     Example:

         open(PASSWD, '/etc/passwd');
         while (<PASSWD>) {
             ($login, $passwd, $uid, $gid,
              $gcos, $home, $shell) = split(/:/);
             #...
         }

     (Note that $shell above will still have a newline on it. See
     the chop, chomp, and join entries elsewhere in this document.)

Lots more to read.  Maybe it's time to resurrect the perl1 manpage. :-(
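
A few of the LIMIT rules from the perlfunc text above, as a runnable sketch. The comma-separated input is hypothetical; the last call reuses the manpage's own parentheses example.

```perl
use strict;
use warnings;

my $csv = 'a,b,,,';   # hypothetical input with trailing empty fields

my @default  = split /,/, $csv;      # ('a', 'b'): trailing empty fields stripped
my @keep_all = split /,/, $csv, -1;  # ('a', 'b', '', '', ''): negative LIMIT keeps them
my @two      = split /,/, $csv, 2;   # ('a', 'b,,,'): at most two fields

# Parentheses in the PATTERN also return the captured delimiters.
my @with_delims = split /([,-])/, '1-10,20', 3;   # (1, '-', 10, ',', 20)

print scalar(@default), ' ', scalar(@keep_all), "\n";   # 2 5
```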

--tom
--



Wed, 18 Jun 1902 08:00:00 GMT  
[Posted and a courtesy copy sent.]



Quote:
> In comp.lang.perl.misc,

> :I conjecture that split() originally took a string as the separator, and
> :then when split() was later enhanced to take a regex as the separator
> :the string arguments were grandfathered for backwards compatibility in
> :existing code.  

> I find that unlikely, since awk's split() has always supported a
> regex at that point.  Certainly perl1 already had it.

Yes, you are surely right.  The pattern came first.  So let's see how
the string sneaked in, from your quotes.

Quote:
> % man perl1
> [...]
> split(/PATTERN/,EXPR)
> split(/PATTERN/)
> split
...
> That's all it had.  Now compare with:

Not a word about strings.

Quote:
> % man 3pl split
>  split /PATTERN/,EXPR,LIMIT
>  split /PATTERN/,EXPR
>  split /PATTERN/
>  split
...
>      As a special case, specifying a PATTERN of space (`' '') will
>      split on white space just as `split' with no arguments does.
>      Thus, `split(' ')' can be used to emulate awk's default behavior,
>      whereas `split(/ /)' will give you as many null initial fields
>      as there are leading spaces. A `split' on `/\s+/' is like a
>      `split(' ')' except that any leading whitespace produces a null
>      first field. A `split' with no arguments really does a
>      `split(' ', $_)' internally.

So here is the only string mentioned, the magic " ".

Not one example of any other string being a valid argument.  So where
did it suddenly materialize from in people's minds?  All I can think of
is a false analogy to join 'string' => ...

Certainly the use of any other string than " " (as a literal, however
quoted) should be warned against, and ultimately rejected.  In other
words, it should be deprecated.

...

Quote:
> Lots more to read.  Maybe it's time to resurrect the perl1 manpage. :-(

That wouldn't help here.

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  
Thanks for all the help on this one guys, much appreciated!

Gareth



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:

> You are pointing out a flaw in the evolution of this function:  the
> interpretation of what is syntactically a string to have the semantics
> of a regex.  The frequency of programmer error (often for '|') would
> indicate the appropriateness of some corrective action.

The '|' was certainly the first one that bit me, but that was not due to a
misunderstanding of the pattern nature of the first argument; it was
ignorance of the metacharacter nature of '|'.  But then there wasn't a
'perlre' manpage in them days ...

/J\
--

<http://www.gellyfish.com>
Hastings: <URL:http://dmoz.org/Regional/UK/England/East_Sussex/Hastings>



Wed, 18 Jun 1902 08:00:00 GMT  


Quote:

> > You are pointing out a flaw in the evolution of this function:  the
> > interpretation of what is syntactically a string to have the semantics
> > of a regex.  The frequency of programmer error (often for '|') would
> > indicate the appropriateness of some corrective action.

> The '|' was certainly the first one that bit me, but that was not due to a
> misunderstanding of the pattern nature of the first argument; it was
> ignorance of the metacharacter nature of '|'.  But then there wasn't a
> 'perlre' manpage in them days ...

The compiler should warn at compile-time about any first argument for
split() that is a string literal (other than a single space, of course,
to preserve the magic).

I seldom see a variable used as the first argument for split(), but
perhaps a run-time warning for that would also be appropriate.

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:

> The compiler should warn at compile-time about any first argument for
> split() that is a string literal (other than a single space, of course,
> to preserve the magic).

I agree, though there will probably be a lot of old code generating new
warnings.  But it would serve people right for either:

(a) blindly copying code without ever reading the manual

or

(b) using obscure notation that causes people in group (a) to come to
clpm with their confusion.

Quote:
> I seldom see a variable used as the first argument for split(), but
> perhaps a run-time warning for that would also be appropriate.

I think this would be going too far.  There is nothing obscure about

    split $pattern, $string, $limit;

--
Rick Delaney



Wed, 18 Jun 1902 08:00:00 GMT  


Quote:

...
> > I seldom see a variable used as the first argument for split(), but
> > perhaps a run-time warning for that would also be appropriate.

> I think this would be going too far.  There is nothing obscure about

>     split $pattern, $string, $limit;

I didn't say that the compiler should complain about that at compile-
time, as it is obviously acceptable.  I said at run-time.

I assumed that at run-time the function would have enough information
about the first argument to determine if it is a string or a regex
(qr//); to accept the latter, to warn about the former.  But I don't
know enough perl internals to say whether that is reasonable.
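
One way such a check could look, sketched as a hypothetical user-level wrapper rather than a change to split() itself: `ref` returns 'Regexp' for a qr// object and the empty string for a plain scalar, so the distinction is visible at run time. The name `split_checked` and its warning text are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): qr// objects and the magic ' '
# pass silently; any other plain string triggers a warning.
sub split_checked {
    my ($pattern, $string, $limit) = @_;
    $limit = 0 unless defined $limit;   # 0 behaves like an omitted LIMIT

    if (ref($pattern) eq 'Regexp') {    # qr// objects report 'Regexp'
        return split $pattern, $string, $limit;
    }
    if ($pattern eq ' ') {
        # Use a literal ' ' so the awk-style magic applies on every perl.
        return split ' ', $string, $limit;
    }
    warn "split_checked: plain string '$pattern' will be treated as a regex\n";
    return split $pattern, $string, $limit;
}

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my @ok    = split_checked(qr/\./, 'www.example.com');  # no warning
my @magic = split_checked(' ',    '  x  y');           # no warning
my @risky = split_checked('.',    'www.example.com');  # warns, returns ()
```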

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  
     [courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc,

:I assumed that at run-time the function would have enough information
:about the first argument to determine if it is a string or a regex
:(qr//); to accept the latter, to warn about the former.  But I don't
:know enough perl internals to say whether that is reasonable.

I don't know that this is possible to do without incurring too much noise.
I agree that it's unfortunate, but we lived so long without qr//, I don't
see any reasonable way of wedging in a complaint that won't really piss
people off.

Think of all the functions people have that take regexes as arguments,
which they then use as the split separator.

% cd /usr/src/perl5.005_61
% tcgrep '\bsplit\s*\(?\s*["\044\042]' lib
[...]
lib/CGI/Cookie.pm:      my($key,$value) = split("=");
[...]
lib/CGI.pm:    $_[0]->param($_[1],split("\0",$_[2]));
[...]
lib/CPAN.pm:                    split("/","$sans.readme"),
[...]
lib/ExtUtils/MakeMaker.pm:      $self->{DIR} = [grep $_, split ":", $self->{DIR}];
lib/File/Spec/Mac.pm:     File::Spec->catdir(split(":",$path)) eq $path
lib/Pod/Functions.pm:    ($name, $type, $text) = split " ", $_, 3;
[...]
lib/Pod/Parser.pm:        ($cmd, $text) = split(" ", $_, 2);

See what I mean?

--tom
--
Besides, it's good to force C programmers to use the toolbox occasionally.  :-)



Wed, 18 Jun 1902 08:00:00 GMT  

:>I didn't say that the compiler should complain about that at compile-
:>time, as it is obviously acceptable.  I said at run-time.

:>I assumed that at run-time the function would have enough information
:>about the first argument to determine if it is a string or a regex
:>(qr//); to accept the latter, to warn about the former.  But I don't
:>know enough perl internals to say whether that is reasonable.

But Larry, I know of at least 4 programs that I have in production
that would start producing warnings when they didn't before.  It
would have to be an optional new warning; not one you would get by
default.

--
// Lee.Lindley   /// Programmer shortage?  What programmer shortage?

////////////////////    50 cent beers are in short supply too.



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:



> > I think this would be going too far.  There is nothing obscure about

> >     split $pattern, $string, $limit;

> I didn't say that the compiler should complain about that at compile-
> time, as it is obviously acceptable.  I said at run-time.

Yes, but that doesn't change anything.  The point here is that the code
says what it means.  It shouldn't matter that $pattern is not some qr//
object.  It's hard to read that code and think that $pattern is anything
but a PATTERN.

People using code like

    $pattern = '\s+';

shouldn't be forced to change it to

    $pattern = qr'\s+';

because they're getting a warning like

    First argument to split must be a PATTERN, not a string literal.

That would be more confusing than split '|' not splitting |-separated
fields.
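
A sketch of that point with made-up input: a pattern held in a plain string and the same pattern built with qr// split identically. When a variable holds a literal delimiter rather than a pattern, `\Q...\E` quotes it first.

```perl
use strict;
use warnings;

my $input = "alpha  beta\tgamma";   # hypothetical whitespace-separated data

my $pattern_str = '\s+';    # plain string, compiled as a regex by split
my $pattern_qr  = qr'\s+';  # precompiled pattern object

my @from_str = split $pattern_str, $input;
my @from_qr  = split $pattern_qr,  $input;

# To split on a literal delimiter held in a variable, quote it first.
my $delim   = '.';          # hypothetical literal delimiter
my @literal = split /\Q$delim\E/, 'a.b.c';

print "@from_str\n";   # alpha beta gamma
```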

--
Rick Delaney


