Splitting lines in a file 

Quick query, hopefully,

How come when I split a file on arbitrary whitespace I get one fewer
line than when I split on end of line?  In the case when I split on
the \n the last line has zero length.

C:\hexbin>python
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file( 'siu_build.hex' ).read()
>>> len( f.split() )
1254
>>> len( f.split( '\n' ))
1255
>>> lines = f.split( '\n' )
>>> len( lines[-1] )
0



Fri, 24 Dec 2004 05:54:39 GMT  


Quote:
> How come when I split a file on arbitrary whitespace I get one fewer
> line than when I split on end of line?  In the case when I split on
> the \n the last line has zero length.

'cause split() drops the empty part at the end:

>>> '1\n2\n3\n'.split()
['1', '2', '3']
>>> '1\n2\n3\n'.split('\n')
['1', '2', '3', '']

split() with no argument treats every whitespace character as a
separator and removes whitespace more aggressively (see below). When you
specify a separator, as in split('\n'), only that character matters.

split() with no argument is also more aggressive about removing empty
parts (note the two spaces between a and b):

>>> 'a  b'.split()
['a', 'b']
>>> 'a  b'.split(' ')
['a', '', 'b']
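
To make the counting rule explicit, here is a small sketch (written in Python 3 syntax, which is an assumption on my part; the thread itself uses 2.2, where these string methods behave the same way). With an explicit separator, the number of pieces is always the number of separators plus one, which is where the extra empty string comes from:

```python
text = '1\n2\n3\n'

# With an explicit separator, every occurrence is a cut point,
# so the number of pieces is always (number of separators + 1).
by_newline = text.split('\n')
print(by_newline)                                # ['1', '2', '3', '']
print(len(by_newline) == text.count('\n') + 1)   # True

# With no argument, a run of whitespace acts as one separator and
# leading/trailing whitespace produces no empty strings.
by_whitespace = text.split()
print(by_whitespace)                             # ['1', '2', '3']
```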

chris




Fri, 24 Dec 2004 06:07:47 GMT  

Quote:
> 'cause split() drops the empty part at the end:
> >>> '1\n2\n3\n'.split()
> ['1', '2', '3']
> >>> '1\n2\n3\n'.split('\n')
> ['1', '2', '3', '']

I see it, but I don't see why?  What is the explanation for this?  It
seems at odds with the documentation.


Fri, 24 Dec 2004 06:19:43 GMT  

Quote:


>> 'cause split() drops the empty part at the end:
>> >>> '1\n2\n3\n'.split()
>> ['1', '2', '3']
>> >>> '1\n2\n3\n'.split('\n')
>> ['1', '2', '3', '']

> I see it, but I don't see why?  What is the explanation for this?  It
> seems at odds with the documentation.

'1\n2\n3\n'.split() works as follows:
    1\n2\n3\n -> 1\n2\n3 -> 1 \n 2 \n 3 -> 1 2 3

'1\n2\n3\n'.split('\n') works as follows:
    1\n2\n3\n -> 1 \n 2 \n 3 \n -> 1 2 3 ''
where the last '\n' separates '3' from the empty string '' that follows it.
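
The two step sequences above can be sketched as hand-written functions. These are hypothetical re-implementations for illustration only (the names split_on_sep and split_on_whitespace are mine, and this is not how the interpreter implements split internally). Python 3 syntax is assumed.

```python
def split_on_sep(s, sep):
    """Cut s at every occurrence of sep, keeping empty pieces."""
    if not sep:
        raise ValueError("empty separator")
    pieces = []
    start = 0
    while True:
        i = s.find(sep, start)
        if i == -1:
            # Whatever follows the last separator, possibly ''.
            pieces.append(s[start:])
            return pieces
        pieces.append(s[start:i])
        start = i + len(sep)


def split_on_whitespace(s):
    """Strip first, then cut on runs of whitespace; never yields ''."""
    pieces = []
    current = ''
    for ch in s.strip():
        if ch.isspace():
            if current:
                pieces.append(current)
                current = ''
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


print(split_on_sep('1\n2\n3\n', '\n'))     # ['1', '2', '3', '']
print(split_on_whitespace('1\n2\n3\n'))    # ['1', '2', '3']
```

The trailing '' appears in the first function because the text after the final separator is always appended, even when nothing is left.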

--

8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin



Fri, 24 Dec 2004 06:30:55 GMT  
On 7 Jul 2002 22:30:55 GMT, William Park

Quote:

>'1\n2\n3\n'.split() works as follows:
>    1\n2\n3\n -> 1\n2\n3 -> 1 \n 2 \n 3 -> 1 2 3

>'1\n2\n3\n'.split('\n') works as follows:
>    1\n2\n3\n -> 1 \n 2 \n 3 \n -> 1 2 3 ''
>where the last '\n' separates '3' and '' (null).


Still seems odd!
--
Simon Foster
Cheltenham
England


Fri, 24 Dec 2004 06:37:37 GMT  

Quote:

> On 7 Jul 2002 22:30:55 GMT, William Park

>>'1\n2\n3\n'.split() works as follows:
>>    1\n2\n3\n -> 1\n2\n3 -> 1 \n 2 \n 3 -> 1 2 3

>>'1\n2\n3\n'.split('\n') works as follows:
>>    1\n2\n3\n -> 1 \n 2 \n 3 \n -> 1 2 3 ''
>>where the last '\n' separates '3' and '' (null).

> Still seems odd!

Not if you try to go backwards:

    1 2 3 -> 1 \n 2 \n 3 -> 1\n2\n3

    1 2 3 '' -> 1 \n 2 \n 3 \n '' -> 1\n2\n3\n
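
The "going backwards" step is just join. A minimal sketch, assuming Python 3 syntax (join and split behave the same way in the 2.2 session earlier in the thread):

```python
text = '1\n2\n3\n'

# Splitting on an explicit separator loses nothing: joining restores the input.
print('\n'.join(text.split('\n')) == text)   # True

# Splitting on whitespace discards the trailing newline, so joining
# cannot bring it back.
rejoined = '\n'.join(text.split())
print(repr(rejoined))                        # '1\n2\n3'
```

The empty string at the end of the list is what lets the round trip reproduce the final newline.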




Fri, 24 Dec 2004 07:14:30 GMT  

| I see it, but I don't see why?  What is the explanation for this?  It
| seems at odds with the documentation.

You are right.  The doc indicates the following should produce the same
output:

 >>> print " hello there ".split()
['hello', 'there']
 >>> print " hello there ".split(" ")
['', 'hello', 'there', '']

It looks like string.split() with no separator does a string.strip()
first, which is why no empty strings appear at either end.

If that's not what you want, there's also re.split (with the string and
separator args in the reverse order from string.split).  To get a true
split on whitespace:

 >>> import re
 >>> re.split(r"\s+", " hello \n\n\t there ")
['', 'hello', 'there', '']
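
As a quick check of how the two relate, here is a sketch (Python 3 syntax assumed) showing that dropping the empty strings from the re.split result gives the same list as split() with no argument:

```python
import re

s = " hello \n\n\t there "

raw = re.split(r"\s+", s)
print(raw)                        # ['', 'hello', 'there', '']

# Dropping the empty strings gives the same result as s.split().
filtered = [piece for piece in raw if piece]
print(filtered == s.split())      # True
```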

--Bryan



Fri, 24 Dec 2004 08:08:44 GMT  
On 7 Jul 2002 23:14:30 GMT, William Park

Quote:


>> On 7 Jul 2002 22:30:55 GMT, William Park

>>>'1\n2\n3\n'.split() works as follows:
>>>    1\n2\n3\n -> 1\n2\n3 -> 1 \n 2 \n 3 -> 1 2 3

>>>'1\n2\n3\n'.split('\n') works as follows:
>>>    1\n2\n3\n -> 1 \n 2 \n 3 \n -> 1 2 3 ''
>>>where the last '\n' separates '3' and '' (null).

>> Still seems odd!

>Not if you try to go backwards:

>    1 2 3 -> 1 \n 2 \n 3 -> 1\n2\n3

>    1 2 3 '' -> 1 \n 2 \n 3 \n '' -> 1\n2\n3\n


Don't understand what you mean here.  You started from the same thing
and ended up in two different places.  Am I missing something?

Surely whether I split on newline or arbitrary whitespace should not
matter if the arbitrary whitespace consists _only_ of newlines?
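
For reference, a small sketch of that exact case, where the only whitespace is newlines (Python 3 syntax assumed; the string literal is a stand-in for the file contents, not the real siu_build.hex). The two calls still follow different rules, and str.splitlines() is the string method that treats a trailing newline as a line terminator rather than a separator:

```python
text = '1\n2\n3\n'   # stand-in for the file contents; newlines are the only whitespace

print(len(text.split()))       # 3
print(len(text.split('\n')))   # 4
print(text.splitlines())       # ['1', '2', '3']
```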



Sat, 25 Dec 2004 07:00:30 GMT  
 