Split function in Perl 

Can anyone help me with some Perl scripting to convert a space-delimited
text file into a CSV? I.e.:

aaaa             bbb  ccc
aaaa             bbb  ccc
aaaa             bbb  ccc

into

aaaa, bbb, ccc
aaaa, bbb, ccc
aaaa, bbb, ccc

Thanks heaps!!!



Mon, 04 Oct 2004 08:47:01 GMT  

Quote:

> Can anyone help me with some Perl scripting to split a space delimited
> text file into a CSV. ie:

> aaaa             bbb  ccc
> aaaa             bbb  ccc
> aaaa             bbb  ccc

> into

> aaaa, bbb, ccc
> aaaa, bbb, ccc
> aaaa, bbb, ccc

s/\s+/, /g

Then RTM.

D.



Mon, 04 Oct 2004 08:48:51 GMT  

Quote:

>Can anyone help me with some Perl scripting to split a space delimited
>text file into a CSV. ie:

>aaaa             bbb  ccc
>aaaa             bbb  ccc
>aaaa             bbb  ccc

>into

>aaaa, bbb, ccc
>aaaa, bbb, ccc
>aaaa, bbb, ccc

You already seem to know that you want split(). Well:

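        while(<>) {
            # loop body missing from the archived post; reconstructed from context
            my @fields = split / +/;
            print join ', ', @fields;
        }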

It wouldn't be that bad an idea to specify the -l command line switch,
so that newlines will be stripped on read, and appended on print.

You can also set $, to ', ', and simply forget about join():

        local $, = ', ';
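        while(<>) {
            # loop body missing from the archived post; reconstructed from context
            my @fields = split / +/;
            print @fields;
        }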

Or shorter still:

        local $, = ', ';
        while(<>) {
            print split / +/;
        }      
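
For anyone who wants to try the split-and-join step without an input file, here is a minimal sketch that runs it on in-memory sample lines. The helper name to_csv and the sample data are illustrative assumptions, not part of the code above, and the sketch chomps each line explicitly rather than relying on command line switches.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one space-delimited line into "a, b, c".
sub to_csv {
    my ($line) = @_;
    chomp $line;                          # drop the trailing newline first
    return join ', ', split / +/, $line;  # split on runs of spaces, rejoin
}

# Sample data mirroring the question (runs of spaces between fields).
my @lines = ("aaaa             bbb  ccc\n", "dddd   eee fff\n");
print to_csv($_), "\n" for @lines;
```

Keeping the conversion in a sub makes it easy to check against a few known lines before pointing it at a real file.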

--
        Bart.



Mon, 04 Oct 2004 11:37:12 GMT  
Bart,

Thanks!! That worked really well. Only one more challenge presents
itself: the output of your method leaves an extra comma:

aaa, bbb, ccc,
aaa, bbb, ccc,
aaa, bbb, ccc,

What I need is:

aaa, bbb, ccc
aaa, bbb, ccc
aaa, bbb, ccc

Any ideas???



Tue, 05 Oct 2004 02:49:41 GMT  
On 18 Apr 2002 18:49:41 -0700,

[Please, in the future, leave some quoted context in your message. It
makes it much easier for people to understand what you're talking
about.]

Quote:
> Bart,

> Thanks!! that worked really well. Only, one more challenge presents
> itself. The output of your method leaves an extra comma:

> aaa, bbb, ccc,
> aaa, bbb, ccc,
> aaa, bbb, ccc,

It works fine for me. You appear not to have told us everything.  I
suspect that your data has some trailing spaces, which you didn't tell
us about, since the example data in your original post didn't include
any trailing spaces.

BTW, you do realise that in the example code Bart gave you, the last
field will contain a "\n"? Use chomp if you don't want that.
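
A small sketch of why trailing spaces would produce the extra comma. The sample line is an assumption (three fields followed by spaces and a newline), not the poster's actual data.

```perl
use strict;
use warnings;

# Assumed sample: trailing spaces before the newline.
my $line = "aaa  bbb ccc   \n";

# Without chomp, the newline survives as a separate last field.
my @raw = split / +/, $line;
print scalar(@raw), " fields: ", join(', ', @raw);

# After chomp, the trailing empty field is dropped by split.
chomp(my $clean = $line);
my @fields = split / +/, $clean;
print scalar(@fields), " fields: ", join(', ', @fields), "\n";
```

In the first case the last "field" is just the newline, so the join puts a separator in front of it and the line appears to end in a comma.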

Martien
--
                        |
Martien Verbruggen      | True seekers can always find something to
Trading Post Australia  | believe in.
                        |



Tue, 05 Oct 2004 04:54:14 GMT  

Quote:

>> Thanks!! that worked really well. Only, one more challenge presents
>> itself. The output of your method leaves an extra comma:

>> aaa, bbb, ccc,
>BTW, you do realise that in the example code Bart gave you, the last
>field will contain a "\n"? Use chomp if you don't want that.

That's what the -l option is for. It chomps every line (i.e. removes the
newline), and every print() appends one. For the newbie: you use it like this:

        #!/usr/local/bin/perl -l
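        while(<>) {
            # loop body missing from the archived post; reconstructed from context
            print join ', ', split / +/;
        }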

The -n option lets you leave off the "while" loop (it's implied):

        #!/usr/local/bin/perl -ln
        # body missing from the archived post; reconstructed from context
        print join ', ', split / +/;

And once the newline is removed, any trailing spaces will be ignored by
split(). Or, in other words: any empty fields at the end will be
dropped.

Absolute minimum:

        #!/usr/local/bin/perl -ln
        $, = ", ";
        # better:  BEGIN { $, = ", "}
        print split / +/;

--
        Bart.



Tue, 05 Oct 2004 08:19:59 GMT  

Quote:

>>> Thanks!! that worked really well. Only, one more challenge presents
>>> itself. The output of your method leaves an extra comma:

>>> aaa, bbb, ccc,

>>BTW, you do realise that in the example code Bart gave you, the last
>>field will contain a "\n"? Use chomp if you don't want that.

> That's what the -l option is for. It chomps every line (=remove the
> newline), and every print() appends one. For the newbie: you use it like:

>    #!/usr/local/bin/perl -l
>    while(<>) {


>    }      

I'm sure you know this, but just to be clear, the -l flag only
chomp()s when used together with -n or -p.

(So this example would produce double-spaced output.)
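
A rough way to see the double spacing without touching the command line is to emulate what -l does for output (setting $\ to "\n") while skipping the chomp. This is only a sketch: the sample lines are assumptions, and the output is captured in a string so it can be inspected.

```perl
use strict;
use warnings;

# Emulate "perl -l" WITHOUT -n: print appends "\n", input is not chomped.
my $out = '';
{
    local $\ = "\n";                      # what -l sets up for print
    open my $fh, '>', \$out or die $!;    # capture output in a string
    for my $line ("aaa bbb\n", "ccc ddd\n") {
        # the input newline is still attached to the last field
        print {$fh} join ', ', split / +/, $line;
    }
    close $fh;
}
print $out;    # every record ends in two newlines, i.e. double-spaced
```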

--
Steve



Tue, 05 Oct 2004 13:51:55 GMT  
On Fri, 19 Apr 2002 07:19:59 GMT,

Quote:

>>> Thanks!! that worked really well. Only, one more challenge presents
>>> itself. The output of your method leaves an extra comma:

>>> aaa, bbb, ccc,

>>BTW, you do realise that in the example code Bart gave you, the last
>>field will contain a "\n"? Use chomp if you don't want that.

> That's what the -l option is for. It chomps every line (=remove the
> newline),

Only when -n or -p are present...

Quote:
>           and every print() appends one. For the newbie: you use it like:

That's true.

Quote:

>    #!/usr/local/bin/perl -l
>    while(<>) {

            chomp;

Quote:


>    }

is equivalent to

Quote:
>    #!/usr/local/bin/perl -ln


> And once the newline is removed, any trailing spaces will be ignored by
> split(). Or, in other words: any empty fields at the end will be
> dropped.

indeed.

And that's why I warned the OP about the fact that he probably did have
trailing spaces.

Quote:
> Absolute minimum:

>    #!/usr/local/bin/perl -ln
>    $, = ", ";
>    # better:  BEGIN { $, = ", "}
>    print split / +/;

If we're looking for minimal, then if you just use split, without
specifying anything (or by specifying ' ' as the first argument), it'll
split on any sequence of whitespace.  Assuming that the OP doesn't have
whitespace characters in his data, apart from the separators, that would
solve all problems in one go.
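
To make the difference concrete, here is a short sketch comparing the special ' ' pattern with / +/ on one line. The sample line, with leading spaces, a tab and trailing spaces, is an assumption chosen to show the contrast.

```perl
use strict;
use warnings;

# Assumed sample with leading and trailing whitespace and a tab.
my $line = "  aaa \t bbb  ccc   \n";

my @awk_mode = split ' ', $line;    # special case: any whitespace, awk-style
my @regex    = split / +/, $line;   # plain pattern: one or more spaces only

print join('|', @awk_mode), "\n";
print scalar(@regex), " fields from / +/\n";
```

With ' ' only the three data fields come back. With / +/ the leading empty field, the tab and the newline each show up as a field of their own.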

Martien
--
                        |
Martien Verbruggen      | Since light travels faster than sound, isn't
                        | that why some people appear bright until you
                        | hear them speak?



Tue, 05 Oct 2004 14:34:15 GMT  

Quote:

> Can anyone help me with some Perl scripting to split a space delimited
> text file into a CSV. ie:

> aaaa             bbb  ccc
> aaaa             bbb  ccc
> aaaa             bbb  ccc

> into

> aaaa, bbb, ccc
> aaaa, bbb, ccc
> aaaa, bbb, ccc

> Thanks heaps!!!

perl -nli.bak -e 'BEGIN { $, = ", " } print split;' ...files...

--
print reverse( ",rekcah", " lreP", " rehtona", " tsuJ" )."\n";



Fri, 08 Oct 2004 00:06:48 GMT  

Quote:

>I'm sure you know this, but just to be clear, the -l flag only
>chomp()s when used together with -n or -p.

Oh, blimey. No, I had overlooked that. Thanks for catching it.

--
        Bart.



Fri, 08 Oct 2004 00:56:12 GMT  

Quote:

> On 18 Apr 2002 18:49:41 -0700,

> [Please, in the future, leave some quoted context in your message. It
> makes it much easier for people to understand what you're talking
> about.]

> > Bart,

> > Thanks!! that worked really well. Only, one more challenge presents
> > itself. The output of your method leaves an extra comma:

> > aaa, bbb, ccc,
> > aaa, bbb, ccc,
> > aaa, bbb, ccc,

> It works fine for me. You appear not to have told us everything.  I
> suspect that your data has some trailing spaces, which you didn't tell
> us about, since the example data in your original post didn't include
> any trailing spaces.

> BTW, you do realise that in the example code Bart gave you, the last
> field will contain a "\n"? Use chomp if you don't want that.

> Martien

I really appreciate the great response I have got from everyone - very
helpful for a newbie like myself.

You are absolutely correct - my data does have a good few trailing
spaces after the last field - something I can not easily control. So I
need a method that ignores trailing spaces after the last field. I am
using the following:

    #!/usr/local/bin/perl -ln

    local $, = ', ';
    while (<>) {
      print split / +/;
    }

I have tried various combinations of -l/n/p, but none get rid of the
extra comma. I think -lp gave me double spacing, which I do not want.

Any suggestions??

If anyone has the patience, I would also really appreciate a
description of what the $, = ', ' means and what the / +/ means.

Thank you all again.



Fri, 08 Oct 2004 03:10:35 GMT  
[Bart's code was:
        local $, = ', ';
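        while(<>) {
            # loop body missing from the archived post; reconstructed from context
            my @fields = split / +/;
            print @fields;
        }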
]

Quote:
> Bart,

> Thanks!! that worked really well. Only, one more challenge presents
> itself. The output of your method leaves an extra comma:

> aaa, bbb, ccc,
> aaa, bbb, ccc,
> aaa, bbb, ccc,



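Your input lines have trailing spaces, so split / +/ returns a final
field holding only the newline, and print (with $, set to ', ') joins
the array into "aaa, bbb, ccc, \n"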

Quote:
> What I need is:

> aaa, bbb, ccc
> aaa, bbb, ccc
> aaa, bbb, ccc

Split on /\s+/ or on ' ' (which is magical to split, and acts sorta like
split /\s+/), or without any arguments.

local $, = ', ';
local $\ = "\n";   # split drops the newline, so have print add one back
while( <> ) {
   print split;
}

__END__
Or:
perl -nle 'BEGIN { $, = ", " } print split;'
Or:
#!/usr/bin/perl -nl
BEGIN { $, = ", " }
print split;
__END__

--
print reverse( ",rekcah", " lreP", " rehtona", " tsuJ" )."\n";



Fri, 08 Oct 2004 03:36:54 GMT  

Quote:

> I am
>using the following:

>    #!/usr/local/bin/perl -ln

>    local $, = ', ';
>    while (<>) {
>      print split / +/;
>    }

If you're using -n, then don't use the explicit while loop:

    #!/usr/local/bin/perl -ln
    $, = ', ';
    print split / +/;

and there's no need to be polite (i.e. to use local on $,) if you're
going to use this command line option anyway.

Otherwise, do a chomp() on each input line and either use -l, or set $\
to "\n" yourself.

        #!/usr/local/bin/perl
        $\ = "\n"; $, = ", ";
        while(<>) {
            chomp;
            print split / +/;
        }

With the newline gone, split() will remove what it thinks of as empty
trailing fields.

Oh, and if these are fixed-length records, you might consider using
unpack() with the "A123" style of template.

        print unpack 'A17A5A*', $_;
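
A self-contained sketch of the unpack approach, assuming the record really is laid out as 17 columns, 5 columns, then the remainder. The sample record is built with sprintf so the widths are exact; it is illustrative, not the poster's data.

```perl
use strict;
use warnings;

# Assumed fixed-width layout: 17 columns, 5 columns, then the rest.
my $record = sprintf '%-17s%-5s%-6s', 'aaaa', 'bbb', 'ccc';

# "A" fields are space-padded text; unpack strips their trailing spaces.
my @fields = unpack 'A17 A5 A*', $record;
print join(', ', @fields), "\n";
```

Because the A format strips trailing spaces from each field, padding after the last field does not turn into an extra empty field.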

--
        Bart.



Fri, 08 Oct 2004 03:48:05 GMT  
On 21 Apr 2002 19:10:35 -0700,

Quote:

>> On 18 Apr 2002 18:49:41 -0700,

>> [Please, in the future, leave some quoted context in your message. It
>> makes it much easier for people to understand what you're talking
>> about.]

>> > Bart,

>> > Thanks!! that worked really well. Only, one more challenge presents
>> > itself. The output of your method leaves an extra comma:

>> > aaa, bbb, ccc,
>> > aaa, bbb, ccc,
>> > aaa, bbb, ccc,

>> It works fine for me. You appear not to have told us everything.  I
>> suspect that your data has some trailing spaces, which you didn't tell
>> us about, since the example data in your original post didn't include
>> any trailing spaces.

>> BTW, you do realise that in the example code Bart gave you, the last
>> field will contain a "\n"? Use chomp if you don't want that.

> You are absolutely correct - my data does have a good few trailing
> spaces after the last field - something I can not easily control. So I
> need a method that ignores trailing spaces after the last field. I am
> using the following:

>     #!/usr/local/bin/perl -ln

>     local $, = ', ';
>     while (<>) {
>       print split / +/;
>     }

> I have tried various combinations of -l/n/p, but none get rid of the
> extra comma. I think -lp gave me double spacing, which I do not want.

You shouldn't be using -n (or -p) together with a loop like that. They
are supposed to replace the while(<>) loop. See the perlrun
documentation to find out what they do. You should also read the split() entry
in the perlfunc documentation, if you haven't done so already.

To solve your problem:

If you really want to split on any sequence of whitespace, instead of
a sequence of spaces, you can use some magical properties of split:

#!/usr/local/bin/perl -wl
while (<>)
{
    print join ',', split;
}

(see later for $,. I don't like using it very much, so you won't see
it in the code here.)

split() without any arguments splits $_ on any sequence of whitespace,
_and_ it removes trailing and leading empty fields, which includes the
newline bit. The equivalent with -n would be:

#!/usr/local/bin/perl -wnl
print join ',', split;

since -n introduces an implicit while (<>) {} loop around your code.

BTW, you can get the same effect when splitting another variable than
$_ by specifying ' ' (or " ", but _not_ / /) as the first argument to
split:

    split ' ', $var;

If you don't want to split on any sequence of whitespace, but on a
sequence of spaces specifically, you could do:

#!/usr/local/bin/perl -wnl
print join ',', split / +/;

Which gets rid of any trailing empty fields, but not of any leading
ones (note that this only works because -n with -l strips off the
trailing newline). If you also need to get rid of the leading empty
fields, you could specifically strip empty fields altogether:

#!/usr/local/bin/perl -wnl
print join ',', grep { $_ ne "" } split / +/;

(With this particular split pattern, it isn't possible to get empty
fields anywhere but at the start and end.)
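
A quick sketch of the leading-field case, using an assumed sample line that starts and ends with spaces:

```perl
use strict;
use warnings;

# Assumed sample: leading and trailing spaces, newline already removed.
my $line = "   aaa  bbb ccc  ";

# split keeps the leading empty field but drops the trailing one.
my @with_empty = split / +/, $line;

# grep filters out whatever empty fields remain.
my @no_empty = grep { $_ ne "" } split / +/, $line;

print scalar(@with_empty), " fields before grep\n";
print join(',', @no_empty), "\n";
```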

If you don't want the implicit loop, then you lose -l's magical
"implicit chomp" property, which it only has with -n and -p, so you
need to do it yourself:

#!/usr/local/bin/perl -wl
while (<>)
{
    chomp;
    print join ',', split / +/;
}

Note that -l is still there to provide an automatic addition of a
newline ($\ really) for prints.

All of this is covered in the split() entry in perlfunc, and the
previous posts in this thread. You just need to sit down, read a bit,
and understanding will follow. A lot of this behaviour is just Perl
arcana that helps people write more concise code, and emulate awk
better. Without any use of -n, -l, and magical behaviour of split, and
with the addition of an explicit variable, and the strict pragma:

#!/usr/local/bin/perl -w
use strict;
while (my $line = <>)
{
    chomp $line;
    print join ',', split / +/, $line;
    print "\n";
}

which shows explicitly all the bits and pieces that were going on in
the previous examples (except split's magical " " pattern handling).

Quote:
> If anyone has the patience, I would also really appreciate a
> description of what the $, = ', ' means and what the / +/ means.

The perlvar documentation documents special variables, and covers $,.
It is the output field separator. There is also $\, which is the
output record separator, and $/, which is the input record separator.
You probably want to have a little read of all three entries in
perlvar.

The first argument to split is a regular expression, and they are
covered in the perlre documentation.  This one means any sequence of 1
or more spaces.
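
As a small illustration of the two output variables and the pattern, here is a sketch that captures print's output in a string so the separators are visible. The sample values are assumptions.

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '>', \$out or die $!;   # capture output in a string
{
    local $, = ', ';    # output field separator: placed between print's arguments
    local $\ = "\n";    # output record separator: appended after each print
    print {$fh} 'aaa', 'bbb', 'ccc';
}
close $fh;
print $out;             # aaa, bbb, ccc

# / +/ is a regular expression matching one or more space characters.
my @parts = split / +/, "a   b c";
print scalar(@parts), " parts\n";
```

Using local inside a block restores both variables afterwards, so later print calls elsewhere in a program are not affected.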

Martien
--
                        |
Martien Verbruggen      | You can't have everything, where would you
Trading Post Australia  | put it?
                        |



Fri, 08 Oct 2004 03:59:45 GMT  

Quote:

> On 21 Apr 2002 19:10:35 -0700,


> >> On 18 Apr 2002 18:49:41 -0700,

> >> [Please, in the future, leave some quoted context in your message. It
> >> makes it much easier for people to understand what you're talking
> >> about.]

> >> > Bart,

> >> > Thanks!! that worked really well. Only, one more challenge presents
> >> > itself. The output of your method leaves an extra comma:

> >> > aaa, bbb, ccc,
> >> > aaa, bbb, ccc,
> >> > aaa, bbb, ccc,

> >> It works fine for me. You appear not to have told us everything.  I
> >> suspect that your data has some trailing spaces, which you didn't tell
> >> us about, since the example data in your original post didn't include
> >> any trailing spaces.

> >> BTW, you do realise that in the example code Bart gave you, the last
> >> field will contain a "\n"? Use chomp if you don't want that.

> > You are absolutely correct - my data does have a good few trailing
> > spaces after the last field - something I can not easily control. So I
> > need a method that ignores trailing spaces after the last field. I am
> > using the following:

> >     #!/usr/local/bin/perl -ln

> >     local $, = ', ';
> >     while (<>) {
> >       print split / +/;
> >     }

> > I have tried various combinations of -l/n/p, but none get rid of the
> > extra comma. I think -lp gave me double spacing, which I do not want.

> You shouldn't be using -n (or -p) together with a loop like that. They
> are supposed to replace the while(<>) loop. See the perlrun
> documentation to find out what. You should also read the split() entry
> in the perlfunc documentation, if you haven't done so already.

> To solve your problem:

etc...

Thanks once again, I am getting a better picture of what is going on.

This gets stranger. I tried all the methods suggested - none worked. I
tried a different data file - all worked first go.

Then I discovered that if I simply went and opened my data file in vi,
changed NOTHING, but simply did a :w to write "changes" then quit, the
perl script worked fine!!

What did vi do to my file that would make it work?? What does this
mean about the condition of my file?? Is there an option that I can
use in perl to overcome this problem??

Thanks.



Sat, 09 Oct 2004 05:10:27 GMT  
 