Using a period as a delimiter in the split() function
Gareth #1 / 16
Hi people, I am currently having a problem with using a period as a delimiter in the split function. I am attempting to separate segments of a domain name into elements of an array, using the period(s) in the domain name as the delimiter. The offending line of code is as follows:
At the moment a null value is passed to the array. If any of you have any suggestions for a way to get around this I would be very grateful. TIA, Gareth
Wed, 18 Jun 1902 08:00:00 GMT
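Gareth's actual line of code did not survive in this copy of the thread. The sketch below is a hypothetical reconstruction (the domain name and variable names are assumptions) of the usual way this goes wrong: passing '.' as the first argument to split, which is compiled as a pattern and therefore matches any character.

```perl
use strict;
use warnings;

# Hypothetical input; the original poster's code is not preserved.
my $domain = 'www.example.com';

# split's first argument is a pattern, so '.' is the regex
# metacharacter that matches ANY character. Every character becomes
# a delimiter, every field is empty, and empty trailing fields are
# stripped, so nothing is left in the array.
my @labels = split '.', $domain;

print scalar(@labels), "\n";    # prints 0
```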
Bob Walton #2 / 16
Quote:
> Hi people,
> I am currently having a problem with using a period as a delimiter in the
> split function. I am attempting to separate segments of a domain name into
> elements of an array, using the period(s) in the domain name as the
> delimiter. The offending line of code is as follows:
Gareth, the first argument to split is treated as a regular expression. A . is a regular expression metacharacter, and thus must be quoted if it is to be considered a literal period. Try:
Quote:
> At the moment a null value is passed to the array. If any of you have any
> suggestions for a way to get around this I would be very grateful.
> TIA,
> Gareth
-- Bob Walton
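The code Bob suggested after "Try:" is also missing from this copy. A minimal sketch of the escaped-period fix he describes, using a hypothetical domain name, looks like this; the character-class form is an equivalent alternative, not necessarily what he posted.

```perl
use strict;
use warnings;

my $domain = 'www.example.com';    # hypothetical input

# Backslash-escape the metacharacter so it matches a literal period.
my @labels = split /\./, $domain;

# Equivalent: inside a character class a '.' is already literal.
my @same = split /[.]/, $domain;

print join('|', @labels), "\n";    # prints www|example|com
```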
Larry Rosler #3 / 16
[Posted and a courtesy copy sent.]
Quote:
> I am currently having a problem with using a period as a delimiter in the
> split function. I am attempting to separate segments of a domain name into
> elements of an array, using the period(s) in the domain name as the
> delimiter. The offending line of code is as follows:
> At the moment a null value is passed to the array. If any of you have any
> suggestions for a way to get around this I would be very grateful.
Yes. Instead of writing the argument for split() as a string, write it as what it really is, a regex. Then you will see that '.' is a regex metacharacter, which needs to be escaped to have its literal meaning.
-- (Just Another Larry) Rosler Hewlett-Packard Laboratories http://www.hpl.hp.com/personal/Larry_Rosler/
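Writing the argument as a pattern makes the needed escape visible. When the literal delimiter is held in a variable instead (the $sep and $host below are illustrative assumptions), \Q...\E, which applies quotemeta to the interpolated text, escapes any metacharacters for you:

```perl
use strict;
use warnings;

my $sep  = '.';                  # literal delimiter held in a variable
my $host = 'mail.example.org';   # hypothetical input

# \Q...\E runs quotemeta on the interpolated text, so the pattern
# becomes /\./ and matches only a literal period.
my @labels = split /\Q$sep\E/, $host;

print scalar(@labels), "\n";     # prints 3
```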
Rick Delaney #4 / 16
[posted & mailed] [removed ancient, defunct comp.lang.perl]
Quote:
> The offending line of code is as follows:
Whenever a function is giving you results you don't expect, you should ALWAYS look it up in the manual:

    perldoc -f split

Once you look this up you will see that the first argument to split is a pattern, not a string, and so special pattern-matching characters like '.' must be escaped (unless you want their special meaning).

    perldoc perlre
Of course, when presented with the code

    split ':', $passwd;

it's no wonder people think split works on strings. Please go smack the person that showed you code like that when

    split /:/, $passwd;

would be clearer.

It's getting to the point where I'd like to see

    split ''

deprecated in favour of

    split m''

-- Rick Delaney
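Rick's point can be checked directly: a quoted ':' is still compiled as a pattern, and since it contains no metacharacters it behaves exactly like /:/. The passwd-style line below is made-up sample data.

```perl
use strict;
use warnings;

# A sample /etc/passwd-style line (hypothetical data).
my $passwd = 'root:x:0:0:root:/root:/bin/sh';

# The quoted ':' is still treated as a pattern; it just happens to
# contain no metacharacters, so both calls give the same fields.
my @from_string  = split ':', $passwd;
my @from_pattern = split /:/, $passwd;

print scalar(@from_pattern), "\n";    # prints 7
```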
Larry Rosler #5 / 16
...
Quote:
> Of course, when presented with the code
>     split ':', $passwd;
> it's no wonder people think split works on strings. Please go smack the
> person that showed you code like that when
>     split /:/, $passwd;
> would be clearer.
> It's getting to the point where I'd like to see
>     split ''
> deprecated in favour of
>     split m''
You are pointing out a flaw in the evolution of this function: the interpretation of what is syntactically a string to have the semantics of a regex. The frequency of programmer error (often for '|') would indicate the appropriateness of some corrective action.

Here is a comparable situation. Suppose someone proposed extending the semantics of $/ to include regexes. (I know this has been discussed to death, but perhaps it might be reconsidered.) With our current knowledge, it would not be done by changing the semantics of a string assigned to $/ to those of a regex. A line separator of '0.0' would not suddenly become equivalent to /0.0/! Instead an assignment such as

    $/ = qr/0.0/;

would be required to get the regex semantics. A flag could be set for the input routine in this case, and unset if a string were later assigned to $/, making it possible to preserve the efficiency of the string input terminator.

I conjecture that split() originally took a string as the separator, and then when split() was later enhanced to take a regex as the separator the string arguments were grandfathered for backwards compatibility in existing code. Changing their semantics to regex semantics feels like a mistake to me. Except for the magic of split ' ', I don't see what purpose is served now by having two ways of saying the same thing, one of which is fraught with surprises.

I agree that no educator should ever show split with a string argument (except for the magic case), unless and until the string semantics were obeyed when a string is specified. (Someone who would write a string argument in the expectation of regex semantics is malicious, at best.) Until then, a warning seems like a good thing.

-- (Just Another Larry) Rosler Hewlett-Packard Laboratories http://www.hpl.hp.com/personal/Larry_Rosler/
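The '|' error Larry mentions is easy to reproduce. A minimal sketch with made-up |-separated data: as a pattern, '|' is an alternation of two empty patterns, so it matches the null string and split falls back to splitting into characters.

```perl
use strict;
use warnings;

my $record = 'a|b|c';    # hypothetical |-separated data

# '|' is alternation of two empty patterns, so it matches the null
# string everywhere and split breaks the string into characters.
my @chars  = split '|', $record;    # ('a', '|', 'b', '|', 'c')

# Escaping the bar gives the intended fields.
my @fields = split /\|/, $record;   # ('a', 'b', 'c')

print join(',', @fields), "\n";     # prints a,b,c
```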
Tom Christiansen #6 / 16
[courtesy cc of this posting mailed to cited author] In comp.lang.perl.misc,
:I conjecture that split() originally took a string as the separator, and
:then when split() was later enhanced to take a regex as the separator
:the string arguments were grandfathered for backwards compatibility in
:existing code.

I find that unlikely, since awk's split() has always supported a regex at that point. Certainly perl1 already had it.

    % man perl1
    [...]
    split(/PATTERN/,EXPR)
    split(/PATTERN/)
    split
        Splits a string into an array of strings, and returns it.
        If EXPR is omitted, splits the $_ string. If PATTERN is also
        omitted, splits on whitespace (/[ \t\n]+/). Anything matching
        PATTERN is taken to be a delimiter separating the fields.
        (Note that the delimiter may be longer than one character.)
        Trailing null fields are stripped, which potential users of pop()
        would do well to remember. A pattern matching the null string
        will split into separate characters.

        Example:

            open(passwd, '/etc/passwd');
            while (<passwd>) {
                ($login, $passwd, $uid, $gid, $gcos, $home, $shell)
                    = split(/:/);
                ...
            }

        (Note that $shell above will still have a newline on it.
        See chop().) See also join.

That's all it had. Now compare with:

    % man 3pl split
    split /PATTERN/,EXPR,LIMIT
    split /PATTERN/,EXPR
    split /PATTERN/
    split

        Splits a string into a list of strings and returns that list.
        By default, empty leading fields are preserved, and empty trailing
        ones are deleted.

        If not in list context, returns the number of fields found and
        [...]
        it still returns the list value.) The use of implicit split to
        [...]
        arguments.

        If EXPR is omitted, splits the `$_' string. If PATTERN is
        also omitted, splits on whitespace (after skipping any leading
        whitespace). Anything matching PATTERN is taken to be a delimiter
        separating the fields. (Note that the delimiter may be longer
        than one character.)

        If LIMIT is specified and positive, splits into no more than
        that many fields (though it may split into fewer). If LIMIT is
        unspecified or zero, trailing null fields are stripped (which
        potential users of `pop' would do well to remember). If LIMIT
        is negative, it is treated as if an arbitrarily large LIMIT had
        been specified.

        A pattern matching the null string (not to be confused with a
        null pattern `//', which is just one member of the set of patterns
        matching a null string) will split the value of EXPR into separate
        characters at each point it matches that way. For example:

            print join(':', split(/ */, 'hi there'));

        produces the output 'h:i:t:h:e:r:e'.

        The LIMIT parameter can be used to split a line partially

            ($login, $passwd, $remainder) = split(/:/, $_, 3);

        When assigning to a list, if LIMIT is omitted, Perl supplies
        a LIMIT one larger than the number of variables in the list,
        to avoid unnecessary work. For the list above LIMIT would have
        been 4 by default. In time critical applications it behooves
        you not to split into more fields than you really need.

        If the PATTERN contains parentheses, additional list elements
        are created from each matching substring in the delimiter.

            split(/([,-])/, "1-10,20", 3);

        produces the list value

            (1, '-', 10, ',', 20)

        If you had the entire header of a normal Unix email message
        in $header, you could split it up into fields and their values
        this way:

            $header =~ s/\n\s+/ /g;  # fix continuation lines
            %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);

        The pattern `/PATTERN/' may be replaced with an expression to
        specify patterns that vary at runtime. (To do runtime compilation
        only once, use `/$variable/o'.)

        As a special case, specifying a PATTERN of space (`' '') will
        split on white space just as `split' with no arguments does.
        Thus, `split(' ')' can be used to emulate awk's default behavior,
        whereas `split(/ /)' will give you as many null initial fields
        as there are leading spaces. A `split' on `/\s+/' is like a
        `split(' ')' except that any leading whitespace produces a null
        first field. A `split' with no arguments really does a
        `split(' ', $_)' internally.

        Example:

            open(PASSWD, '/etc/passwd');
            while (<PASSWD>) {
                ($login, $passwd, $uid, $gid,
                 $gcos, $home, $shell) = split(/:/);
                #...
            }

        (Note that $shell above will still have a newline on it. See
        the chop, chomp, and join entries elsewhere in this document.)

Lots more to read. Maybe it's time to resurrect the perl1 manpage. :-(

--tom
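Two of the perlfunc behaviours quoted above, captured delimiters and the effect of LIMIT on trailing empty fields, can be seen in a few lines. The first call is the manpage's own example; the 'a:b::' string is a made-up illustration.

```perl
use strict;
use warnings;

# Captured groups in the pattern are returned alongside the fields.
my @with_seps = split /([,-])/, '1-10,20', 3;
# ('1', '-', '10', ',', '20')

# With no LIMIT, empty trailing fields are stripped ...
my @default = split /:/, 'a:b::';        # ('a', 'b')

# ... while a negative LIMIT keeps them.
my @kept    = split /:/, 'a:b::', -1;    # ('a', 'b', '', '')

print scalar(@default), ' ', scalar(@kept), "\n";    # prints 2 4
```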
Larry Rosler #7 / 16
[Posted and a courtesy copy sent.]
Quote:
> In comp.lang.perl.misc,
> :I conjecture that split() originally took a string as the separator, and
> :then when split() was later enhanced to take a regex as the separator
> :the string arguments were grandfathered for backwards compatibility in
> :existing code.
> I find that unlikely, since awk's split() has always supported a
> regex at that point. Certainly perl1 already had it.
Yes, you are surely right. The pattern came first. So let's see how the string sneaked in, from your quotes.

Quote:
> % man perl1
> [...]
>     split(/PATTERN/,EXPR)
>     split(/PATTERN/)
>     split
> ...
> That's all it had. Now compare with:

Not a word about strings.

Quote:
> % man 3pl split
>     split /PATTERN/,EXPR,LIMIT
>     split /PATTERN/,EXPR
>     split /PATTERN/
>     split
> ...
>     As a special case, specifying a PATTERN of space (`' '') will
>     split on white space just as `split' with no arguments does.
>     Thus, `split(' ')' can be used to emulate awk's default behavior,
>     whereas `split(/ /)' will give you as many null initial fields
>     as there are leading spaces. A `split' on `/\s+/' is like a
>     `split(' ')' except that any leading whitespace produces a null
>     first field. A `split' with no arguments really does a
>     `split(' ', $_)' internally.
So here is the only string mentioned, the magic " ". Not one example of any other string being a valid argument. So where did it suddenly materialize from in people's minds? All I can think of is a false analogy to join 'string' => ...

Certainly the use of any string other than " " (as a literal, however quoted) should be warned against, and ultimately rejected. In other words, it should be deprecated.

...

Quote:
> Lots more to read. Maybe it's time to resurrect the perl1 manpage. :-(
That wouldn't help here. -- (Just Another Larry) Rosler Hewlett-Packard Laboratories http://www.hpl.hp.com/personal/Larry_Rosler/
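The magic " " case that both posts single out behaves differently from the apparently similar patterns described in the quoted manpage. A small comparison on a made-up line with two leading spaces:

```perl
use strict;
use warnings;

my $line = '  alpha beta';    # two leading spaces (hypothetical input)

# The special single-space string: skip leading whitespace, then
# split on runs of whitespace (awk's default behaviour).
my @awk   = split ' ', $line;      # ('alpha', 'beta')

# A pattern of one space: one empty field per leading space.
my @space = split / /, $line;      # ('', '', 'alpha', 'beta')

# /\s+/: a single empty first field for the leading whitespace.
my @ws    = split /\s+/, $line;    # ('', 'alpha', 'beta')

print join(' ', scalar(@awk), scalar(@space), scalar(@ws)), "\n";    # prints 2 4 3
```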
Gareth #8 / 16
Thanks for all the help on this one, guys, much appreciated! Gareth
Jonathan Stowe #9 / 16
Quote:
> You are pointing out a flaw in the evolution of this function: the
> interpretation of what is syntactically a string to have the semantics
> of a regex. The frequency of programmer error (often for '|') would
> indicate the appropriateness of some corrective action.
The '|' was certainly the first one that bit me, but that was not to do with a misunderstanding of the pattern nature of the first argument; it was ignorance of the metacharacter nature of '|'. But then there wasn't a 'perlre' manpage in them days ... /J\ --
<http://www.gellyfish.com> Hastings: <URL:http://dmoz.org/Regional/UK/England/East_Sussex/Hastings>
Larry Rosler #10 / 16
Quote:
> > You are pointing out a flaw in the evolution of this function: the
> > interpretation of what is syntactically a string to have the semantics
> > of a regex. The frequency of programmer error (often for '|') would
> > indicate the appropriateness of some corrective action.
> The '|' was certainly the first one that bit me but that was not to do
> with a misunderstanding of the pattern nature of the first argument but
> ignorance of the metacharacter nature of '|' but then there wasnt a 'perlre'
> manpage in them days ...
The compiler should warn at compile-time about any first argument for split() that is a string literal (other than a single space, of course, to preserve the magic). I seldom see a variable used as the first argument for split(), but perhaps a run-time warning for that would also be appropriate. -- (Just Another Larry) Rosler Hewlett-Packard Laboratories http://www.hpl.hp.com/personal/Larry_Rosler/
Rick Delaney #11 / 16
Quote:
> The compiler should warn at compile-time about any first argument for
> split() that is a string literal (other than a single space, of course,
> to preserve the magic).
I agree, though there will probably be a lot of old code generating new warnings. But it would serve people right for either:

(a) blindly copying code without ever reading the manual, or
(b) using obscure notation that causes people in group (a) to come to clpm with their confusion.

Quote:
> I seldom see a variable used as the first argument for split(), but
> perhaps a run-time warning for that would also be appropriate.
I think this would be going too far. There is nothing obscure about

    split $pattern, $string, $limit;

-- Rick Delaney
Larry Rosler #12 / 16
Quote:
...
> > I seldom see a variable used as the first argument for split(), but
> > perhaps a run-time warning for that would also be appropriate.
> I think this would be going too far. There is nothing obscure about
>     split $pattern, $string, $limit;
I didn't say that the compiler should complain about that at compile-time, as it is obviously acceptable. I said at run-time. I assumed that at run-time the function would have enough information about the first argument to determine if it is a string or a regex (qr//); to accept the latter, to warn about the former. But I don't know enough perl internals to say whether that is reasonable.

-- (Just Another Larry) Rosler Hewlett-Packard Laboratories http://www.hpl.hp.com/personal/Larry_Rosler/
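The distinction Larry assumes is at least visible from Perl code: a qr// object is a reference blessed into the Regexp class, so ref() tells it apart from a plain string at run time. The split_checked wrapper below is hypothetical; perl's built-in split performs no such check, and this only sketches the kind of warning he proposes.

```perl
use strict;
use warnings;

# Hypothetical wrapper illustrating the proposed run-time check;
# it is not part of perl and the built-in split does not do this.
sub split_checked {
    my ($pattern, $string, $limit) = @_;

    # qr// objects are blessed into Regexp; plain strings have no ref type.
    warn "split pattern is a plain string, not a qr// object\n"
        unless ref($pattern) eq 'Regexp';

    # Either way the pattern is still used as a regex, as split always does.
    return defined $limit ? split($pattern, $string, $limit)
                          : split($pattern, $string);
}
```

Calling split_checked(qr/\./, 'www.example.com') returns the three labels silently, while split_checked('\.', 'www.example.com') returns the same labels but emits the warning, which is exactly the noise Rick and Tom object to below.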
Tom Christiansen #13 / 16
[courtesy cc of this posting mailed to cited author] In comp.lang.perl.misc,
:I assumed that at run-time the function would have enough information
:about the first argument to determine if it is a string or a regex
:(qr//); to accept the latter, to warn about the former. But I don't
:know enough perl internals to say whether that is reasonable.

I don't know that this is possible to do without incurring too much noise. I agree that it's unfortunate, but we lived so long without qr// that I don't see any reasonable way of wedging in a complaint that won't really piss people off. Think of all the functions people have that take regexes as arguments, which they then use as the split separator.

    % cd /usr/src/perl5.005_61
    % tcgrep '\bsplit\s*\(?\s*["\044\042]' lib
    lib/CGI/Cookie.pm: my($key,$value) = split("=");
    lib/CGI.pm: $_[0]->param($_[1],split("\0",$_[2]));
    lib/CPAN.pm: split("/","$sans.readme"),
    lib/ExtUtils/MakeMaker.pm: $self->{DIR} = [grep $_, split ":", $self->{DIR}];
    lib/File/Spec/Mac.pm: File::Spec->catdir(split(":",$path)) eq $path
    lib/Pod/Functions.pm: ($name, $type, $text) = split " ", $_, 3;
    lib/Pod/Parser.pm: ($cmd, $text) = split(" ", $_, 2);
See what I mean? --tom -- Besides, it's good to force C programmers to use the toolbox occasionally. :-)
lt lindley #14 / 16
:>I didn't say that the compiler should complain about that at compile-time,
:>as it is obviously acceptable. I said at run-time.
:>I assumed that at run-time the function would have enough information
:>about the first argument to determine if it is a string or a regex
:>(qr//); to accept the latter, to warn about the former. But I don't
:>know enough perl internals to say whether that is reasonable.

But Larry, I know of at least 4 programs that I have in production that would start producing warnings when they didn't before. It would have to be an optional new warning, not one you would get by default.

--
// Lee.Lindley /// Programmer shortage? What programmer shortage?
//////////////////// 50 cent beers are in short supply too.
Rick Delaney #15 / 16
Quote:
> > I think this would be going too far. There is nothing obscure about
> >     split $pattern, $string, $limit;
> I didn't say that the compiler should complain about that at compile-
> time, as it is obviously acceptable. I said at run-time.
Yes, but that doesn't change anything. The point here is that the code says what it means. It shouldn't matter that $pattern is not some qr// object. It's hard to read that code and think that $pattern is anything but a PATTERN. People using code like

    $pattern = '\s+';

shouldn't be forced to change it to

    $pattern = qr'\s+';

because they're getting a warning like

    First argument to split must be a PATTERN, not a string literal.

That would be more confusing than split '|' not splitting |-separated fields.

-- Rick Delaney
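Rick's two spellings do behave identically as split patterns. A minimal check, assuming a made-up input string with mixed spaces and a tab:

```perl
use strict;
use warnings;

my $text = "one  two\tthree";    # hypothetical input

# A pattern kept in a plain string variable ...
my $pattern_str = '\s+';
# ... and the same pattern as a compiled qr// object.
my $pattern_qr  = qr'\s+';

my @a = split $pattern_str, $text;
my @b = split $pattern_qr,  $text;

print join('|', @a), "\n";    # prints one|two|three
```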