Splitting a string every 'n'
Author |
Message |
Simon.Fos.. #1 / 14
|
Splitting a string every 'n'
What is the idiomatic way to split a string into a list containing
'n' character substrings? I normally do something like:

    while strng:
        substring = strng[:n]
        strng = strng[n:]
        <process substring>

But the performance of this is hopeless for very long strings!
Presumably because there's too much list reallocation? Can't Python
just optimise this by shuffling the start of the list forward?

Any better ideas, short of manually indexing through? Is there
something like:

    for substring in strng.nsplit():
        <process substring>
|
Sat, 25 Dec 2004 20:50:19 GMT |
|
|
Harvey Thoma #2 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>
> But the performance of this is hopeless for very long strings!
> Presumably because there's too much list reallocation? Can't Python
> just optimise this by shuffling the start of the list forward?
> Any better ideas, short of manually indexing through? Is there
> something like:
>     for substring in strng.nsplit():
>         <process substring>
How about:

    import re
    rex = re.compile('....', re.DOTALL)
    for substring in rex.findall(string):
        <process substring>

HTH
Harvey Thomas
|
Sat, 25 Dec 2004 21:04:34 GMT |
|
|
Shagshag1 #3 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>
> But the performance of this is hopeless for very long strings!
> Presumably because there's too much list reallocation? Can't Python
> just optimise this by shuffling the start of the list forward?
> Any better ideas, short of manually indexing through? Is there
> something like:
>     for substring in strng.nsplit():
>         <process substring>

(I'm replying, but as I'm still a newbie I don't know if it's a good idea; I'm just trying, and hope that the gurus will correct me.)

    x = []
    i0 = 0
    for i in range(n, len(strng) + n, n):
        x.append(strng[i0:i])
        i0 = i
    map(<processing substring function>, x)

s13.
|
Sat, 25 Dec 2004 21:46:17 GMT |
|
|
Alex Martell #4 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do

I'm not sure there is just one. I suspect what _feels_ idiomatic to you in this respect depends on where you're coming from -- just saw Harvey Thomas post a re-based solution that is surely quite correct (and perhaps may even have good performance!) but would just never occur to me first thing...

Quote:
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>
> But the performance of this is hopeless for very long strings!

Definitely!

Quote:
> Presumably because there's too much list reallocation? Can't Python

Yep.

Quote:
> just optimise this by shuffling the start of the list forward?

Not without a lot of trouble that would definitely complicate the interpreter's code and quite possibly deteriorate performance for all normal cases that can't easily benefit from such "sharing" of pieces of one string.

Quote:
> Any better ideas, short of manually indexing through? Is there

What's wrong with "manually indexing through"? I assume you mean:

    for i in xrange(0, len(strng), n):
        substring = strng[i:i+n]
        process(substring)

and I don't see anything wrong with it -- though I might shrink it a bit down to

    for i in xrange(0, len(strng), n):
        process(strng[i:i+n])

that's basically the same idea. I'm honestly having a hard time seeing anything wrong with this solution, which I would presumably need to do in order to come up with anything BETTER. DIFFERENT is easy, e.g., on 2.3, or on 2.2 plus from __future__ import generators, why not a generator:

    def slicer(strng, n):
        for i in xrange(0, len(strng), n):
            yield strng[i:i+n]

and then

    for substring in slicer(strng, n):
        process(substring)

but that's really the same code again with a false moustache...

Alex
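For anyone reading this thread on Python 3, the indexed loop above can be sketched roughly as follows. xrange no longer exists there (range is already lazy), and the helper name chunks is made up for this note, not something from the thread. It may also help to spell out why the original while loop is slow: strings are immutable, so each strng = strng[n:] copies the whole remaining string, and the total work grows quadratically with the length. The indexed version copies only n characters per step.

```python
def chunks(strng, n):
    """Yield successive n-character slices of strng; the last may be shorter."""
    # range() is lazy in Python 3, so no index list is built up front.
    for i in range(0, len(strng), n):
        yield strng[i:i + n]


for substring in chunks("abcdefghij", 4):
    print(substring)  # prints abcd, efgh, ij on separate lines
```

Wrapping the generator in list(...) gives the explicit list form if that is what is wanted.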
|
Sat, 25 Dec 2004 22:09:39 GMT |
|
|
Terry Reed #5 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>

You are asking two different questions in your text and code:

1. How to generate an explicit list of successive length-n substrings (slices)?
2. How to process successive length-n substrings (slices), which can then be tossed?

The second is easier than the first; both require attention to the possibility of a remainder of length less than n.

...

Quote:
> Any better ideas, short of manually indexing through?

What, pray tell, is wrong with doing the simple obvious thing that you can program correctly in a minute or two?

Quote:
> Is there something like:
>     for substring in strng.nsplit():
>         <process substring>

Note that this says that (2) rather than (1) above is your question. For 2.2+, write a generator that manually indexes through the sequence, returning successive slices. A second param could determine whether a short tail is returned or suppressed.

Terry J. Reedy
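The generator described above, with a second parameter controlling the short tail, might look something like this in Python 3. This is only a sketch: the names slices and keep_tail are invented here for illustration and do not come from the post.

```python
def slices(seq, n, keep_tail=True):
    """Yield successive length-n slices of seq.

    keep_tail is a hypothetical flag: when False, a final slice shorter
    than n is suppressed instead of being yielded.
    """
    for start in range(0, len(seq), n):
        piece = seq[start:start + n]
        # Only the last piece can be short; yield it only if asked to.
        if len(piece) == n or keep_tail:
            yield piece


# Question (2): process each slice and toss it.
for piece in slices("abcdefghij", 4):
    print(piece)

# Question (1): an explicit list, here with the short tail dropped.
print(list(slices("abcdefghij", 4, keep_tail=False)))  # ['abcd', 'efgh']
```

Because it only uses len() and slicing, the same function works on lists and tuples as well as strings.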
|
Sat, 25 Dec 2004 22:15:49 GMT |
|
|
Andrew Koeni #6 / 14
|
Splitting a string every 'n'
Simon> What is the idiomatic way to split a string into a list
Simon> containing 'n' character substrings? I normally do
Simon> something like:
Simon>     while strng:
Simon>         substring = strng[:n]
Simon>         strng = strng[n:]
Simon>         <process substring>

How about this?

    for start in range(0, len(strng), n):
        substring = strng[start:start+n]
        <process substring>
|
Sat, 25 Dec 2004 22:35:44 GMT |
|
|
William Par #7 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>
> But the performance of this is hopeless for very long strings!
> Presumably because there's too much list reallocation? Can't Python
> just optimise this by shuffling the start of the list forward?
> Any better ideas, short of manually indexing through? Is there
> something like:
>     for substring in strng.nsplit():
>         <process substring>

No, you pretty much have to slice out the range you want, i.e.

    substring = string[i:i+n]

--
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin
|
Sat, 25 Dec 2004 23:48:17 GMT |
|
|
Mark McEaher #8 / 14
|
Splitting a string every 'n'
Quote:
> > But the performance of this is hopeless for very long strings!
> > Presumably because there's too much list reallocation? Can't Python
> > just optimise this by shuffling the start of the list forward?

Using generators here compares favorably with a smart while loop. They have the advantage of separating the iteration from the processing, so you can actually reuse gen_substring since it allows you to iterate over the n-length substrings:

    #! /usr/bin/env python
    from __future__ import generators
    from time import clock

    def gen_substring(s, n):
        i = 0
        end = len(s)
        # i < end (not <=), so no empty slice is yielded
        # when len(s) is an exact multiple of n
        while i < end:
            j = i + n
            yield s[i:j]
            i = j

    def do_gen(s, n):
        for sub in gen_substring(s, n):
            sub.upper()

    def do_while_simple(s, n):
        while s:
            sub = s[:n]
            s = s[n:]
            sub.upper()

    def do_while_smarter(s, n):
        i = 0
        end = len(s)
        while i < end:
            j = i + n
            sub = s[i:j]
            i = j
            sub.upper()

    def time_it(f, *args, **kwargs):
        start = clock()
        f(*args, **kwargs)
        end = clock()
        print "%s: %1.3f" % (f.func_name, end - start)

    n = 4
    size = 100000
    s = 'a' * size
    time_it(do_gen, s, n)
    time_it(do_while_simple, s, n)
    time_it(do_while_smarter, s, n)
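The script above is Python 2 (print statement, func_name, and time.clock, which was removed in Python 3.8). A rough Python 3 equivalent of the comparison, using the standard timeit module, could look like the sketch below. The function names do_indexed and time_both are made up for this note, and no timings are quoted because they depend on the machine.

```python
import timeit


def do_while_simple(s, n):
    # Re-slices the remainder on every pass, copying what is left each time.
    while s:
        sub = s[:n]
        s = s[n:]
        sub.upper()


def do_indexed(s, n):
    # Walks an index instead; each pass copies only n characters.
    for i in range(0, len(s), n):
        s[i:i + n].upper()


def time_both(s, n):
    """Return a dict mapping function name to seconds for one run of each."""
    return {
        f.__name__: timeit.timeit(lambda: f(s, n), number=1)
        for f in (do_while_simple, do_indexed)
    }


for name, seconds in time_both("a" * 100000, 4).items():
    print("%s: %.3f" % (name, seconds))
```

Raising the string size should make the gap between the two grow, since the simple loop does work proportional to the square of the length.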
|
Sun, 26 Dec 2004 00:24:26 GMT |
|
|
Huaiyu Z #9 / 14
|
Splitting a string every 'n'
Quote:
> import re
> rex = re.compile('....', re.DOTALL)

To work with any int n, change to one of these:

    rex = re.compile('.{,%s}'%n, re.DOTALL)  # keeps remainder segment
    rex = re.compile('.{%s}'%n, re.DOTALL)   # discards remainder segment

Quote:
> for substring in rex.findall(string):
>     <process substring>
Huaiyu
|
Sun, 26 Dec 2004 01:50:54 GMT |
|
|
Rich Harkin #10 / 14
|
Splitting a string every 'n'
Quote:
> > What is the idiomatic way to split a string into a list
> > containing 'n' character substrings? I normally do
> > something like:
> >     while strng:
> >         substring = strng[:n]
> >         strng = strng[n:]
> >         <process substring>
> > But the performance of this is hopeless for very long strings!
> > Presumably because there's too much list reallocation? Can't Python
> > just optimise this by shuffling the start of the list forward?
> > Any better ideas, short of manually indexing through? Is there
> > something like:
> >     for substring in strng.nsplit():
> >         <process substring>

Using Python 2:

    [s[i:i+n] for i in range(0,len(s),n)]

where:

    s - is the string to split
    n - is the number of characters to break at
    i - is some throwaway variable (previous value of i *not* protected)

Rich
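The caveat about i applies to Python 2, where a list comprehension's loop variable leaks into the enclosing scope. In Python 3 a comprehension has its own scope, so an existing i survives. A small sketch, with sample values made up for illustration:

```python
s = "abcdefghij"
n = 4
i = "untouched"  # a pre-existing variable that happens to share the name

# Same comprehension as in the post; in Python 3 its i is local to it.
parts = [s[i:i + n] for i in range(0, len(s), n)]

print(parts)  # ['abcd', 'efgh', 'ij']
print(i)      # untouched
```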
|
Sun, 26 Dec 2004 02:21:54 GMT |
|
|
Skip Montanar #11 / 14
|
Splitting a string every 'n'
Huaiyu> To work with any int n, change to one of these
Huaiyu> rex = re.compile('.{,%s}'%n, re.DOTALL) # keeps remainder segment

I think you want {1,%s}. Note the spurious empty string at the end if at least one character isn't required:

    >>> import re
    >>> rex = re.compile(r"(.{,4})")
    >>> re.findall(rex, "abcd")
    ['abcd', '']
    >>> rex = re.compile(r"(.{1,4})")
    >>> re.findall(rex, "abcd")
    ['abcd']
    >>> rex = re.compile(r"(.{,4})")
    >>> re.findall(rex, "abcde")
    ['abcd', 'e', '']
    >>> rex = re.compile(r"(.{1,4})")
    >>> re.findall(rex, "abcde")
    ['abcd', 'e']

--
Skip Montanaro
consulting: http://manatee.mojam.com/~skip/resume.html
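Putting the {1,n} correction together with the re.DOTALL flag from earlier in the thread, a regex-based splitter for any n might be sketched like this. The function name regex_chunks is invented for this note.

```python
import re


def regex_chunks(text, n):
    """Split text into pieces of at most n characters using a regex."""
    # {1,n} requires at least one character, so no empty match is
    # produced at the end; DOTALL lets '.' match newlines as well.
    pattern = re.compile(".{1,%d}" % n, re.DOTALL)
    return pattern.findall(text)


print(regex_chunks("abcde", 4))  # ['abcd', 'e']
print(regex_chunks("abcd", 4))   # ['abcd']
```

Without re.DOTALL, a newline in the input would be skipped and would also cut a chunk short, since '.' does not match it by default.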
|
Sun, 26 Dec 2004 03:10:20 GMT |
|
|
Brian McErle #12 / 14
|
Splitting a string every 'n'
Quote:
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings? I normally do
> something like:
>     while strng:
>         substring = strng[:n]
>         strng = strng[n:]
>         <process substring>
> But the performance of this is hopeless for very long strings!
> Presumably because there's too much list reallocation? Can't Python
> just optimise this by shuffling the start of the list forward?
> Any better ideas, short of manually indexing through? Is there
> something like:
>     for substring in strng.nsplit():
>         <process substring>

I have a handy class I use for things like this:

    class Group:
        def __init__(self, l, size):
            self.size=size
            self.l = l
        def __getitem__(self, group):
            idx = group * self.size
            # >= (not >), so an exact multiple gives no trailing empty slice
            if idx >= len(self.l):
                raise IndexError("Out of range")
            return self.l[idx:idx+self.size]

I use it mainly for grouping things like:

    for x,y in Group([1,2,3,4,5,6,7,8,...],2):
        process_coords(x,y)

but it's also applicable to your problem, and works neatly with strings. Try:

    for substring in Group(string, n):
        <process substring>

Don't you just love Python's polymorphism!

You don't state what you want to do if the string isn't a multiple of N characters. This version includes the shorter string at the end.

Brian.
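The class works in a for loop because Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised when an object defines no __iter__. That fallback still exists in Python 3, so a version along these lines should behave the same there. This is a sketch with the attribute renamed to seq for readability; the sample data is made up.

```python
class Group:
    """Expose a sequence as consecutive slices of a fixed size."""

    def __init__(self, seq, size):
        self.seq = seq
        self.size = size

    def __getitem__(self, group):
        idx = group * self.size
        # >= (not >), so an exact multiple does not yield an empty slice.
        if idx >= len(self.seq):
            raise IndexError("Out of range")
        return self.seq[idx:idx + self.size]


for x, y in Group([1, 2, 3, 4, 5, 6], 2):
    print(x, y)  # prints 1 2, then 3 4, then 5 6

print(list(Group("abcdefghij", 4)))  # ['abcd', 'efgh', 'ij']
```

Unpacking into x, y only works when the length is an exact multiple of the group size, since a short final group would not have enough items.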
|
Sun, 26 Dec 2004 03:59:48 GMT |
|
|
Huaiyu Z #13 / 14
|
Splitting a string every 'n'
Quote:
> Huaiyu> To work with any int n, change to one of these
> Huaiyu> rex = re.compile('.{,%s}'%n, re.DOTALL) # keeps remainder segment
>
> I think you want {1,%s}. Note the spurious empty string at the end if at
> least one character isn't required:

Yes, you're right. Teaches me to test before posting.

Huaiyu
|
Sun, 26 Dec 2004 05:45:45 GMT |
|
|
Simon Fost #14 / 14
|
Splitting a string every 'n'
Quote:
>> What is the idiomatic way to split a string into a list
>> containing 'n' character substrings? I normally do
>> something like:
>>     while strng:
>>         substring = strng[:n]
>>         strng = strng[n:]
>>         <process substring>
>> But the performance of this is hopeless for very long strings!
>> Presumably because there's too much list reallocation? Can't Python
>> just optimise this by shuffling the start of the list forward?
>> Any better ideas, short of manually indexing through? Is there
>> something like:
>>     for substring in strng.nsplit():
>>         <process substring>
>
> I have a handy class I use for things like this:
>
>     class Group:
>         def __init__(self, l, size):
>             self.size=size
>             self.l = l
>         def __getitem__(self, group):
>             idx = group * self.size
>             # >= (not >), so an exact multiple gives no trailing empty slice
>             if idx >= len(self.l):
>                 raise IndexError("Out of range")
>             return self.l[idx:idx+self.size]
>
> I use it mainly for grouping things like:
>
>     for x,y in Group([1,2,3,4,5,6,7,8,...],2):
>         process_coords(x,y)
>
> but it's also applicable to your problem, and works neatly with
> strings. Try:
>
>     for substring in Group(string, n):
>         <process substring>
>
> Don't you just love Python's polymorphism!
> You don't state what you want to do if the string isn't a multiple of
> N characters. This version includes the shorter string at the end.
> Brian.

That's what I wanted!

--
Simon Foster
Cheltenham
England
|
Mon, 27 Dec 2004 04:32:44 GMT |
|
|
|