String Parsing 

Hello Everyone,

I am trying to parse a string that is something like this:

001204 HOURLY OBSERVATION                         50.22     CODE TABLE

I want to parse the string into an integer and three substrings.

integer = 001204
string 1 = HOURLY OBSERVATION
string 2 = 50.22
string 3 = CODE TABLE

Now I know that string1 can be at most 42 characters long, string2 at
most 10 characters long and string3 at most 11 characters long.

I've tried using sscanf for something like this:

sscanf(lineBufr, "%6d %42[^\n]s %10[^\n]s %11[^n]s", &bufrCode, messBufrOne,
        messBufrTwo, messBufrThree);

I am only getting bufrCode and messBufrOne properly.  For some reason the
other strings aren't being read in.  

Does anyone know a way to do this using sscanf, or another way?  I've also
tried just using plain loops to get the substrings I want, but that
probably isn't the most efficient method.

-----
Jeff Dunnett - Student
Computer and Information Science
University of Guelph

"If you love something write it in C; If it compiles, it is yours; If it
doesn't, it never was."



Tue, 08 Feb 2005 22:52:30 GMT  
I'd use a regular expression package like http://ygrep.cjb.net/ and apply
an expression like:

\([0-9]{6}\)[ \t]+\([^ \t]{42}\)[ \t]+\([^ \t]{10}\)[ \t]+\([^ \t]{11}\)

This should do the job (even if it can be improved), placing the
results in sub-expressions 1 to 4.  It gives a match only if the
structure is OK (sub-patterns could be used to find the origin of an
error).
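For comparison, here is a minimal sketch of the same idea using the POSIX <regex.h> functions regcomp() and regexec(), swapped in because ygrep's own API and syntax are not shown here. It assumes the layout of the example: six digits, one blank, a 42-character field, one blank, a 10-character field, then 1 to 11 more characters. Under POSIX rules a class such as [^ \t]{42} would demand 42 consecutive non-blank characters, which "HOURLY OBSERVATION" and its padding cannot supply, so the sketch uses .{42} instead. The names parse_with_regex and copy_match are hypothetical.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy sub-match m of src into dst (capacity cap)
   and NUL-terminate it, truncating if it would not fit. */
static void copy_match(char *dst, size_t cap, const char *src, regmatch_t m)
{
    size_t len = (size_t)(m.rm_eo - m.rm_so);
    if (len >= cap)
        len = cap - 1;
    memcpy(dst, src + m.rm_so, len);
    dst[len] = '\0';
}

/* Returns 0 on a match, -1 if the line does not have the expected shape.
   The column layout in the pattern is an assumption taken from the example. */
int parse_with_regex(const char *line, int *code,
                     char one[43], char two[11], char three[12])
{
    /* POSIX ERE: sub-expressions 1 to 4 hold the four fields.
       REG_NEWLINE keeps '.' from matching a trailing '\n' and lets '$'
       match just before it. */
    const char *pattern = "^([0-9]{6}) (.{42}) (.{10})(.{1,11})$";
    regex_t re;
    regmatch_t m[5];
    char digits[7];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;
    rc = regexec(&re, line, 5, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_match(digits, sizeof digits, line, m[1]);
    *code = atoi(digits);              /* "001204" -> 1204 */
    copy_match(one, 43, line, m[2]);   /* keeps its blank padding */
    copy_match(two, 11, line, m[3]);
    copy_match(three, 12, line, m[4]);
    return 0;
}
```

Compiling the pattern on every call keeps the sketch short; for a large file you would compile it once and reuse the regex_t. As with sscanf, the fields keep their blank padding.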

Have fun programming.
Yves R





Wed, 09 Feb 2005 01:45:06 GMT  

>I am trying to parse a string that is something like this:

>001204 HOURLY OBSERVATION                         50.22     CODE TABLE

>I want to parse the string into an integer and three substrings.

>integer = 001204
>string 1 = HOURLY OBSERVATION
>string 2 = 50.22
>string 3 = CODE TABLE

>Now I know that string1 can be at most 42 characters long, string2 at
>most 10 characters long and string3 at most 11 characters long.

>I've tried using sscanf for something like this:

>sscanf(lineBufr, "%6d %42[^\n]s %10[^\n]s %11[^n]s", &bufrCode, messBufrOne,
>        messBufrTwo, messBufrThree);

Something *like* that, or *exactly* that?

If you used *exactly* that format string, then this:

>I am only getting bufrCode and messBufrOne properly.  For some reason the
>other strings aren't being read in.  

is no surprise, because the directive "%42[^\n]s" means:

    Match at least one but no more than 42 characters that
    are in the (inverted) scanset [^\n], i.e., are not newlines.
    If that succeeds, store the matched characters via the
    supplied "char *" pointer and continue, otherwise stop
    with a matching or input failure as appropriate.

    Next, match a literal 's'.  If that fails, stop with
    a matching or input failure as appropriate.

Since there is no literal 's' in the example input -- given the
above, the "scan cursor" is at this point on a single blank --
sscanf() must stop with a matching failure, and thus return 2.
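To watch this happen, the sketch below runs the original format string unchanged; the wrapper name scan_original is hypothetical, and the buffers are assumed to be one byte larger than each field width. On a line laid out like the example it returns 2, and messBufrTwo and messBufrThree are never written.

```c
#include <stdio.h>
#include <string.h>

/* Runs the original (buggy) format.  The literal 's' after the first
   scanset cannot match the blank that follows the 42 scanned characters,
   so sscanf() stops after two conversions and leaves messBufrTwo and
   messBufrThree untouched. */
int scan_original(const char *lineBufr, int *bufrCode, char messBufrOne[43],
                  char messBufrTwo[11], char messBufrThree[12])
{
    return sscanf(lineBufr, "%6d %42[^\n]s %10[^\n]s %11[^n]s",
                  bufrCode, messBufrOne, messBufrTwo, messBufrThree);
}
```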

>Does anyone know a way to do this using sscanf or another way?

You are quite close; if you remove the "s"s following the scansets,
and fix the last scanset (which scans things that are not 'n's
rather than things that are not '\n's), it should work.  (Perhaps
by luck: the " " directives in your format string mean "match zero
or more whitespace characters", rather than "match exactly one
literal blank".  The first two " " directives will match one
blank each in this case, and the last -- in "%10[^\n] %11[^\n]"
-- would match none.)
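With both fixes applied, a minimal sketch looks like this (scan_fixed is a hypothetical name). It assumes each field is blank-padded to its full width, as in the example, because that padding is what makes the " " directives match one, one, and zero blanks respectively.

```c
#include <stdio.h>
#include <string.h>

/* Corrected format: no stray 's' after the scansets, and the last
   scanset excludes '\n' rather than 'n'.  Returns sscanf()'s result,
   which is 4 when all four fields were converted. */
int scan_fixed(const char *lineBufr, int *bufrCode, char messBufrOne[43],
               char messBufrTwo[11], char messBufrThree[12])
{
    return sscanf(lineBufr, "%6d %42[^\n] %10[^\n] %11[^\n]",
                  bufrCode, messBufrOne, messBufrTwo, messBufrThree);
}
```

Note that %6d stores 001204 as the int 1204; if the leading zeros matter, scanning the code with %6[0-9] into a char array of at least 7 bytes keeps them.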

Note, by the way, that scanning 42 non-newline characters with
"%42[^\n]" will write *43* bytes (not 42), to messBufrOne[0] through
messBufrOne[42], the last being the terminating '\0'; so messBufrOne
must have room for at least 43 chars.  Most of the 42 characters will
be trailing blanks, and chances are good that you will want to strip
them eventually.
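A minimal sketch of stripping trailing blanks in place; rtrim is a hypothetical helper, not a standard C function.

```c
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace from s in place and return s. */
char *rtrim(char *s)
{
    size_t n = strlen(s);
    /* Cast to unsigned char: isspace() is undefined for negative values. */
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    s[n] = '\0';
    return s;
}
```

Calling rtrim(messBufrOne) after a successful sscanf() turns the padded field into "HOURLY OBSERVATION".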

Scanf, even in its semi-tamed "sscanf" form, is always something
of a wild beast, and tends to bite the programmer the moment he
stops paying strict attention.  For reading fixed-column input,
you might want to write your own function and abandon the scanf
family entirely.
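A minimal sketch of such a function, assuming the column layout implied by the example when counted from zero: the code in columns 0-5, string 1 in 7-48, string 2 in 50-59, and string 3 in up to 11 columns from 60. Those offsets, and the names struct bufr_line, get_field and parse_bufr_line, are assumptions for illustration and should be checked against the real record format. Each field is copied by position and then stripped of trailing blanks, so the embedded blank in "HOURLY OBSERVATION" survives and a short line leaves the missing fields empty.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for one parsed line. */
struct bufr_line {
    int code;
    char one[43];    /* up to 42 characters + '\0' */
    char two[11];    /* up to 10 characters + '\0' */
    char three[12];  /* up to 11 characters + '\0' */
};

/* Copy columns [start, start+width) of line into dst, stopping early at
   the end of the line or at a newline, then strip trailing blanks.
   dst must hold at least width + 1 bytes. */
static void get_field(char *dst, const char *line, size_t len,
                      size_t start, size_t width)
{
    size_t n = 0;
    while (n < width && start + n < len && line[start + n] != '\n') {
        dst[n] = line[start + n];
        n++;
    }
    while (n > 0 && isspace((unsigned char)dst[n - 1]))
        n--;
    dst[n] = '\0';
}

/* Returns 0 on success, -1 if the first six columns are not all digits.
   The column offsets below are assumptions taken from the example. */
int parse_bufr_line(const char *line, struct bufr_line *out)
{
    size_t len = strlen(line);
    char digits[7];
    size_t i;

    if (len < 6)
        return -1;
    for (i = 0; i < 6; i++) {
        if (!isdigit((unsigned char)line[i]))
            return -1;
        digits[i] = line[i];
    }
    digits[6] = '\0';
    out->code = atoi(digits);

    get_field(out->one, line, len, 7, 42);
    get_field(out->two, line, len, 50, 10);
    get_field(out->three, line, len, 60, 11);
    return 0;
}
```

The design choice here is to trust column positions rather than blanks: embedded blanks are handled for free, but the parser breaks silently if the columns ever shift.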
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)





Wed, 09 Feb 2005 07:22:54 GMT  

> Hello Everyone,

> I am trying to parse a string that is something like this:

> 001204 HOURLY OBSERVATION                         50.22     CODE TABLE
...
> Now I know that string1 can be at most 42 characters long, string2 at
> most 10 characters long and string3 at most 11 characters long.

> I've tried using sscanf for something like this:

> sscanf(lineBufr, "%6d %42[^\n]s %10[^\n]s %11[^n]s", &bufrCode, messBufrOne,
>         messBufrTwo, messBufrThree);

> I am only getting bufrCode and messBufrOne properly.  For some reason the
> other strings aren't being read in.

Assuming (plausibly) that your line buffer is terminated by a newline,
that will put up to 42 characters from the line in messBufrOne
(possibly including the 50.22, or whatever, if all or part of it
fits), and then expect to find a lowercase 's' in the input, which
presumably isn't there, so sscanf quits.  You could see this if
you checked the return value from sscanf, as you always should.
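A minimal sketch of that check, using the format with the stray 's's removed and the last scanset fixed to [^\n]; parse_checked and its messages are hypothetical. Comparing the result against the number of conversions expected, rather than only testing for EOF, is what catches a partial match like the one in the original code.

```c
#include <stdio.h>

/* Parse one line; returns 0 if all four fields were converted, -1
   otherwise, with a message on stderr saying how far sscanf() got. */
int parse_checked(const char *lineBufr, int *bufrCode, char messBufrOne[43],
                  char messBufrTwo[11], char messBufrThree[12])
{
    int n = sscanf(lineBufr, "%6d %42[^\n] %10[^\n] %11[^\n]",
                   bufrCode, messBufrOne, messBufrTwo, messBufrThree);
    if (n == 4)
        return 0;
    if (n == EOF)
        fprintf(stderr, "input failure before the first conversion\n");
    else
        fprintf(stderr, "only %d of 4 fields converted\n", n);
    return -1;
}
```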

In your example data, string1 seems to be exactly 42 characters,
assuming the one space before it and the one space before 50.22
are separators, not part of the data fields (or that the space before
50.22 is a leading space or suppressed sign position).  If string1 can
be shorter than 42 characters, how do you (want to) determine where it
ends and string2 begins?

> Does anyone know a way to do this using sscanf, or another way?  I've also
> tried just using plain loops to get the substrings I want, but that
> probably isn't the most efficient method.

If properly written, it's probably just about as efficient as sscanf.
While standard library functions in general don't have to be implemented
in standard or portable C and can use implementation tricks, none
of the tricks I can think of would help sscanf significantly.  If your
data lends itself reasonably to *scanf parsing, however, that code will
be shorter and thus probably easier to read, maintain, and debug.

--
- David.Thompson 1 now at worldnet.att.net



Fri, 11 Feb 2005 09:39:56 GMT  
 