splitting strings with more than one splitting char 

I'm trying to understand why when I try to split a string with split
characters <>, the interpreter does 2 passes; first looking for "<"
and then for ">".  I would've thought that it would do just one pass
looking for "<>" as a whole, but to me the manual isn't very clear on
this.  Is there a way to accomplish what I'm looking for?

Example:

% set string {xfg 17 <> cd7 <> 3t model no. 5345}
xfg 17 <> cd7 <> 3t model no. 5345
% set list [split $string "<>"]
{xfg 17 } {} { cd7 } {} { 3t model no. 5345}

What I really want is:
{xfg 17 } { cd7 } { 3t model no. 5345}

I guess I could do something like this:

% regsub -all "<>" $string ">" string
2
% set list [split $string ">"]
{xfg 17 } { cd7 } { 3t model no. 5345}

But there must be a better way.  Besides, the original string might
contain a lone ">" that should not be split on.

Thanks,
Shamil D.



Sat, 30 Sep 2006 06:12:21 GMT  

Quote:

> I'm trying to understand why when I try to split a string with split
> characters <>, the interpreter does 2 passes; first looking for "<"
> and then for ">".  I would've thought that it would do just one pass
> looking for "<>" as a whole, but to me the manual isn't very clear on
> this.  Is there a way to accomplish what I'm looking for?

package require textutil
namespace import ::textutil::splitx

Quote:
> Example:

> % set string {xfg 17 <> cd7 <> 3t model no. 5345}
> xfg 17 <> cd7 <> 3t model no. 5345
> % set list [split $string "<>"]
> {xfg 17 } {} { cd7 } {} { 3t model no. 5345}

% set list [splitx $string <>]
{xfg 17 } { cd7 } { 3t model no. 5345}

--
| Don Porter          Mathematical and Computational Sciences Division |

| http://math.nist.gov/~DPorter/                                  NIST |
|______________________________________________________________________|



Sat, 30 Sep 2006 06:23:16 GMT  

Quote:

> I'm trying to understand why when I try to split a string with split
> characters <>, the interpreter does 2 passes; first looking for "<"
> and then for ">".  I would've thought the it would do just one pass
> looking for "<>" as a whole, but to me the manual isn't very clear on
> this.  Is there a way to accomplish what I'm looking for?

The split man page says it splits "at each character that is in the
splitChars argument". That is why it splits on both < and >. It doesn't
make two passes, BTW. Just a single pass looking for each character in
the string of split characters.
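To make the "set of characters" behaviour concrete, here is a small sketch (the sample string is made up for illustration). Every character in the second argument is its own delimiter, so the adjacent "<>" yields an empty element between the two split points:

```tcl
# Each character of the second argument is a separate delimiter.
set s {a<b>c<>d}
set fields [split $s "<>"]
puts $fields   ;# a b c {} d
```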

tcllib has a command (textutil::splitx) that will split strings using
more than one character as the split delimiter. A simple way to do this
without tcllib is to make two passes yourself: first replace every <>
with some single character you know isn't in your data, then split on
that character.

The following example assumes you don't have \001 (control-a) in your data:

     set newdata [string map [list <> \001] $data]
     set list [split $newdata \001]
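If you would rather not just trust that \001 never shows up, the two steps could be wrapped in a proc that checks first. This is only a sketch; the name split_on_string is made up for this post and is not a tcllib command:

```tcl
# Hypothetical helper: split $data on the literal string $sep by mapping
# it to a sentinel character and splitting on that character.
proc split_on_string {data sep} {
    set sentinel \001   ;# assumed not to occur in the data
    if {[string first $sentinel $data] >= 0} {
        error "sentinel character already present in data"
    }
    return [split [string map [list $sep $sentinel] $data] $sentinel]
}

set data {xfg 17 <> cd7 <> 3t model no. 5345}
set fields [split_on_string $data "<>"]
puts $fields   ;# {xfg 17 } { cd7 } { 3t model no. 5345}
```

A lone ">" in the data is left alone, because only the whole "<>" sequence is mapped to the sentinel.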



Sat, 30 Sep 2006 06:40:37 GMT  

Quote:

> I'm trying to understand why when I try to split a string with split
> characters <>, the interpreter does 2 passes; first looking for "<"
> and then for ">".  I would've thought that it would do just one pass
> looking for "<>" as a whole, but to me the manual isn't very clear on
> this.  Is there a way to accomplish what I'm looking for?

instead of guessing what it'll do, try the docs

http://www.tcl.tk/man/tcl8.4/TclCmd/split.htm

it is pretty explicit that the second arg is a set of characters
not a string.

you want splitx from tcllib

package require textutil
textutil::splitx $str "<>"

Bruce



Sat, 30 Sep 2006 06:39:37 GMT  

Quote:

> I'm trying to understand why when I try to split a string with split
> characters <>, the interpreter does 2 passes; first looking for "<"
> and then for ">".  I would've thought that it would do just one pass
> looking for "<>" as a whole,

The number of passes is not really the issue; the question is how
split is defined.  It is defined to operate this way, so that is
why it does so :-).  It is useful to split
at any character in a set, and that is what split does.  I'd
agree that the syntax of "any character in this string" is odd,
and would be clearer with a list of characters (or an argument
list of characters) like  [split $string "<" ">"], even though
it is more typing.  Then it would also be trivial to extend the
definition to splitting at arbitrary strings.  But that would
have to be with a new command, because [split] already has its
own entrenched syntax.
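As a rough idea of what such a "split at an arbitrary string" command might look like in plain Tcl, here is a sketch built on string first. The name splitstr is hypothetical; nothing by that name ships with Tcl or tcllib as far as this sketch assumes:

```tcl
# Hypothetical command: split $str at every occurrence of the literal
# string $sep (no sentinel character, no regular expressions).
proc splitstr {str sep} {
    if {$sep eq ""} { return [split $str ""] }
    set result {}
    set start 0
    set seplen [string length $sep]
    while {[set idx [string first $sep $str $start]] >= 0} {
        # Text between the previous separator and this one.
        lappend result [string range $str $start [expr {$idx - 1}]]
        set start [expr {$idx + $seplen}]
    }
    # Whatever is left after the last separator (possibly empty).
    lappend result [string range $str $start end]
    return $result
}

set parts [splitstr {xfg 17 <> cd7 <> 3t model no. 5345} "<>"]
puts $parts
```

Like [split], it returns an empty element when the separator sits at the very start or end of the string.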

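Maybe you are looking for regexp -all -inline to grab all the
items.  Note that -inline does not accept match variables, so the
-- has to go before the pattern, and that the result interleaves
each full match with its captured group:

regexp -all -inline -- {(.*?)<>} "$string<>"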

Or use

split [string map {"<>" \u0001} $string] \u0001

(but you probably want to split in a way that eliminates
white-space).
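One way to get rid of the surrounding white-space, sketched here with the string from the original post, is to trim each piece after splitting (\001 is again assumed not to occur in the data):

```tcl
set string {xfg 17 <> cd7 <> 3t model no. 5345}
set fields {}
# Map the two-character separator to \001, split, then trim each item.
foreach item [split [string map [list <> \001] $string] \001] {
    lappend fields [string trim $item]
}
puts $fields   ;# {xfg 17} cd7 {3t model no. 5345}
```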

Quote:
> But there must be a better way.  Besides, the original string might
> contain a lone ">" that should not be split

Yeah, sounds like you have it covered.  Except for choosing a more
obscure character.




Sat, 30 Sep 2006 07:24:41 GMT  
 