Sed/Awk help needed 

I've purchased the book "sed & awk" put out by O'Reilly.  I'm on chapter
4 but I'm still pretty confused.  Although I will finish reading the
book I need to achieve a specific goal now.  So I need the help of this
group.

Below is a sample of the file:

ALLEN TALENT AGENCY
11755 WILSHIRE BLVD #1750
LOS ANGELES, CA 90025-1523
(213) 896-9372

[*] ALL STAR TALENT AGENCY
7834 ALABAMA AVE
CANOGA PARK, CA 91304-4905
(818) 346-4313

[**,P] ALPERN GROUP, THE
4400 COLDWATER CANYON AVE #125
STUDIO CITY, CA 91604
(818) 752-1877

[L] ANGEL CITY TALENT
1680 VINE ST #716
LOS ANGELES, CA 90028
(213) 463-1680

The first thing I want to do is to change the way the addresses are
formatted so that I can import them into a database.  But there are
many problems to be solved first, which I list below in what I believe
to be a logical order.

1) Change the general format of the file.

This:

ALLEN TALENT AGENCY
11755 WILSHIRE BLVD #1750
LOS ANGELES, CA 90025-1523
(213) 896-9372

[*] ALL STAR TALENT AGENCY
7834 ALABAMA AVE
CANOGA PARK, CA 91304-4905
(818) 346-4313

Becomes:

ALLEN TALENT AGENCY, 11755 WILSHIRE BLVD #1750, LOS ANGELES, CA
90025-1523, (213) 896-9372
[*] ALL STAR TALENT AGENCY, 7834 ALABAMA AVE, CANOGA PARK, CA
91304-4905, (818) 346-4313

Note: The records above are wrapped only because of the line length in
this post.  I don't want that line break in the real file.  Basically
each line should be one comma-delimited address.

2) All the fields are on one line now, each record is a line.  The CA
must be separated from the zip code.  All states in this file are CA.

3) If you look at the sample file you will see that some records have a
code at the start of the address contained in [].  On the records that
don't have that code I want to add this code [*].

4) Place a comma after []. Hence [*] ALL STAR TALENT AGENCY becomes [*],
ALL STAR TALENT AGENCY

5) Some [] have multiple codes separated by a comma.  Change that comma
to ;.  Hence, [*, L] becomes [*; L]

6) If there is more than one space between characters change it to one
space.

How I think this can be done:
1) I don't know how but change each newline NOT followed by a
consecutive newline/empty line into a comma followed by a space.  Then
delete one newline when there are two consecutive newlines?

2) with sed like this: s/ CA / CA, /   ?????

3) I don't know how.  Basically look for a [ at the beginning of a line,
if it's not there place [*]

4) With sed like this: s/]/], /        ?????

5) with sed like this /[/,/]/ {s/,/;/} ?????

6) With sed like this: s/ */ /g    Do the above sed scripts need the g?

Any help you can give me, especially with problem 1), would be great.

PS. After I'm finished with "sed & awk" I think I'll get the VI book.



Mon, 24 Apr 2000 03:00:00 GMT  



Quote:
> 1) Change the general format of the file.

> This:

> ALLEN TALENT AGENCY
> 11755 WILSHIRE BLVD #1750
> LOS ANGELES, CA 90025-1523
> (213) 896-9372

> [*] ALL STAR TALENT AGENCY
> 7834 ALABAMA AVE
> CANOGA PARK, CA 91304-4905
> (818) 346-4313

> Becomes:

> ALLEN TALENT AGENCY, 11755 WILSHIRE BLVD #1750, LOS ANGELES, CA
> 90025-1523, (213) 896-9372
> [*] ALL STAR TALENT AGENCY, 7834 ALABAMA AVE, CANOGA PARK, CA
> 91304-4905, (818) 346-4313

[posted and mailed]

Here's an awk solution, tested with gawk (GNU awk 3.0). It should do
everything you wanted bar one thing: it uses colons instead of commas
as output field separators, because your input file contains records
with commas embedded in some of the fields. You can change this by
modifying the value of the OFS variable in the script below, but make
sure you use a character that never appears in the data as the
delimiter, otherwise you will have problems when importing the final
product into your database.
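
To see why the delimiter matters, here is a small sketch. The helper
name count_fields and the two sample records are made up for
illustration (they are built from the ALPERN GROUP entry). It counts
how many fields awk sees when the same 7-field record is joined with
commas and then with colons:

```shell
# count_fields is a hypothetical helper, not part of the script below.
# It prints the number of fields awk finds when splitting on "$1".
count_fields() {
  awk -F"$1" '{ print NF }'
}

# The same 7-field record, joined with commas and then with colons.
comma_rec='[**;P],ALPERN GROUP, THE,4400 COLDWATER CANYON AVE #125,STUDIO CITY,CA,91604,(818) 752-1877'
colon_rec='[**;P]:ALPERN GROUP, THE:4400 COLDWATER CANYON AVE #125:STUDIO CITY:CA:91604:(818) 752-1877'

# The comma inside "ALPERN GROUP, THE" is counted as a separator too.
printf '%s\n' "$comma_rec" | count_fields ','
# With colons the name stays in one piece.
printf '%s\n' "$colon_rec" | count_fields ':'
```

The comma version reports 8 fields and the colon version 7, so a
database import would see the comma version as having one column too
many.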

Some assumptions I've made about the input file based on the samples
you gave:

* no blank lines at the start and end of the file
* single blank line only between records
* each record is exactly 4 lines

Save everything between the "-----" to a file (e.g., reformat.awk) and
run it on the command line like this:

gawk -f reformat.awk input.dat > output.dat

Happy awking.

-----
BEGIN {
  # output field separator
  OFS = ":"
  # counter for the 4 lines in each record
  recline = 1

}

{
  # if line is blank, print reformatted record and reset counter
  if ($0 == "") {
    print code, name, addr, loc, "CA", zip, tel
    recline = 1
    next
  }
  else {
    # get rid of any excess spaces
    gsub(/  +/, " ", $0)

    if (recline == 1) {
      # does the line begin with a code?
      if ($0 ~ /^\[.*\]/) {
        code = substr($0, 1, index($0, "]"))
        name = substr($0, index($0, "]") + 2)
        # replace any "," with ";"
        gsub(/,/, ";", code)
      }
      else {
        code = "[*]"
        name = $0
      }
      recline++
      next
    }

    if (recline == 2) {
      addr = $0
      recline++
      next
    }

    if (recline == 3) {
      loc = substr($0, 1, index($0, ", CA ") - 1)
      zip = substr($0, index($0, ", CA ") + 5)
      recline++
      next
    }

    if (recline == 4)
      tel = $0
  }

}

END {
  # print last record
  print code, name, addr, loc, "CA", zip, tel
}

-----
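
For what it's worth, the same reformatting can also be sketched with
awk's paragraph mode (RS = ""), where each block of lines separated by
blank lines is read as one record and, with FS = "\n", each line is one
field. That relaxes the assumptions about blank lines, since a run of
blank lines counts as a single separator. Treat it as an alternative
sketch rather than a drop-in replacement: reformat_paragraphs is a
hypothetical wrapper name, and it still assumes four lines per record
and ", CA " on the third line.

```shell
# reformat_paragraphs is a hypothetical wrapper name; the awk program
# inside does the work and could equally be saved to its own file.
reformat_paragraphs() {
  awk '
    BEGIN { RS = ""; FS = "\n"; OFS = ":" }
    {
      # squeeze runs of spaces; changing $0 makes awk re-split the lines
      gsub(/  +/, " ")

      # line 1: optional [code] followed by the agency name
      code = "[*]"
      name = $1
      if (match(name, /^\[[^]]*\]/)) {
        code = substr(name, 1, RLENGTH)
        name = substr(name, RLENGTH + 1)
        sub(/^ +/, "", name)
        gsub(/,/, ";", code)
      }

      # line 3: split "CITY, CA ZIP" around the state (assumed to be CA)
      loc = $3
      zip = ""
      i = index($3, ", CA ")
      if (i > 0) {
        loc = substr($3, 1, i - 1)
        zip = substr($3, i + 5)
      }

      print code, name, $2, loc, "CA", zip, $4
    }
  ' "$@"
}

# Demo on two of the sample records from the original post.
printf '%s\n' \
  'ALLEN TALENT AGENCY' \
  '11755 WILSHIRE BLVD #1750' \
  'LOS ANGELES, CA 90025-1523' \
  '(213) 896-9372' \
  '' \
  '[**,P] ALPERN GROUP, THE' \
  '4400 COLDWATER CANYON AVE #125' \
  'STUDIO CITY, CA 91604' \
  '(818) 752-1877' | reformat_paragraphs
```

Called as reformat_paragraphs input.dat > output.dat, it should give
the same colon-separated layout as the script above for well-formed
input.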

+--------------------------------------------------------------------+

| Brisbane Australia                                                 |
+--------------------------------------------------------------------+

Discovery consists in seeing what everyone else has seen and thinking
what no one else has thought.

- Albert Szent-Györgyi



Thu, 04 May 2000 03:00:00 GMT  
 