switch()'ing on strings... design question. 
Author Message
 switch()'ing on strings... design question.

First, let me say, this is *NOT* the FAQ.

Rather...

Everyone knows switch() switches on integer expressions.

Wouldn't It Be Nice if you could select from strings, though...

Well, perhaps one could write "str_switch()", which takes a string,
and an array of strings, and returns the index in the array of a matching
string.  Of course, this would be horribly inefficient - presumably at
least as expensive as the iterative strcmp() so often used as an
alternative, and possibly as bad as the endless if-else-if-else-if...
used in some programs.

But what if one were to build a table?  In every application I've had
where I wish to see which of a set of strings I have, I've had a known
predefined set of strings.  So, the behavior could be something like
        {
                static ss_table *sst;

                if (!sst) {
                        sst = str_stable(stringtable);
                        if (!sst)
                                return 0;
                }
                switch (str_switch(str, sst)) {
                case -1: /* failure */
                        break;
                case 0: /* 1st string in table */
                ...
                }
        }

The idea would be that "str_stable" would build a somewhat more efficient
table representation of what the strings are like.  I could see it being
done with hashes, or with something like what fgrep traditionally uses.

Any prior art?  Ideas I've missed?  I'm thinking about developing this,
since it would be useful in a current application, and I'd like to do it
right, so I don't repeat the work.

-s
--
Copyright 1997 Peter Seebach - seebs at solon.com - C/Unix Wizard

The *other* C FAQ, the hacker FAQ, et al.   http://www.*-*-*.com/ ~seebs
Unsolicited email (junk mail and ads) is unwelcome, and will be billed for.



Fri, 02 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

My simplest suggestion --> Steal whatever VAX BASIC does.  Cram it into the
next iteration of the standard.  All mortal C programmers will sing your
praises.

Another solution: Have the COMPILER sort the strings AT COMPILE TIME and
place them in an internal array.  Then do a BSEARCH of the array for the
input string to switch on.  Not only portable, but also efficient.  Not
only would they sing your praises, they would recount your glories to their
grandchildren and their great grandchildren.
;-)

/* what the programmer codes: */
char *Animal;
/* ... */
switch (Animal)
{
   case "BEAR":
        /* do stuff */
        break;
   case "LION":
        /* do stuff */
        break;
   case "TIGER":
        /* do stuff */
        break;
   case "EAGLE":
        /* do stuff */
        break;
   case "FROG":
        /* do stuff */
        break;
   case "PYTHON":
        /* do stuff */
        break;
   default:
        /* Warn about the 5 million other possibilities */

Quote:
}

/* What the compiler does: */
/* Fast search, hidden behind the scenes, that knows we are using pointers
*/
int fastbsearch( char **SwitchStuff, char *SwitchItem );

char *SwitchStrings[] =
{
         "BEAR",
         "EAGLE",
         "FROG",
         "LION",
         "PYTHON",
         "TIGER"

Quote:
};

/* Then we do the actual switch on return value for fastbsearch().  The
compiler actually replaces the string after case with a number which is in
the position of the list. */
/* ... */
switch (fastbsearch( SwitchStrings, Animal ) );
{
   case 0: /* "BEAR": */
        /* do stuff */
        break;
   case 4: /* "LION": */
        /* do stuff */
        break;
   case 6: /* "TIGER": */
        /* do stuff */
        break;
   case 2: /* "EAGLE": */
        /* do stuff */
        break;
   case 3: /* "FROG": */
        /* do stuff */
        break;
   case 5: /* "PYTHON":*/
        /* do stuff */
        break;
   default:
        /* Warn about the 5 million other possibilities */

Quote:
}

/* Why should I have to type all this in?  Less is more. */



Quote:
> First, let me say, this is *NOT* the FAQ.

> Rather...

> Everyone knows switch() switches on integer expressions.

> Wouldn't It Be Nice if you could select from strings, though...

> Well, perhaps one could write "str_switch()", which takes a string,
> and an array of strings, and returns the index in the array of a matching
> string.  Of course, this would be horribly inefficient - presumably at
> least as expensive as the iterative strcmp() so often used as an
> alternative, and possibly as bad as the endless if-else-if-else-if...
> used in some programs.

> But what if one were to build a table?  In every application I've had
> where I wish to see which of a set of strings I have, I've had a known
> predefined set of strings.  So, the behavior could be something like
>    {
>            static ss_table *sst;

>            if (!sst) {
>                    sst = str_stable(stringtable);
>                    if (!sst)
>                            return 0;
>            }
>            switch (str_switch(str, sst)) {
>            case -1: /* failure */
>                    break;
>            case 0: /* 1st string in table */
>            ...
>            }
>    }

> The idea would be that "str_stable" would build a somewhat more efficient
> table representation of what the strings are like.  I could see it being
> done with hashes, or with something like what fgrep traditionally uses.

> Any prior art?  Ideas I've missed?  I'm thinking about developing this,
> since it would be useful in a current application, and I'd like to do it
> right, so I don't repeat the work.

> -s
> --
> Copyright 1997 Peter Seebach - seebs at solon.com - C/Unix Wizard

there.
> The *other* C FAQ, the hacker FAQ, et al.  http://www.solon.com/~seebs
> Unsolicited email (junk mail and ads) is unwelcome, and will be billed
for.



Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

>Wouldn't It Be Nice if you could select from strings, though...

Whenever I've needed to switch on a string there's usually just
a small portion (sizeof long or less) of the string that's significant.
A technique that I've used is to turn those bytes into something that I
can switch on.  The OS that I use provides a MAKETYPE macro that, given
the address of something, will cast that something into another specified
type:
   #define MAKETYPE(v, type)  (*((type *)&v))

Then I just switch on MAKETYPE(myString, long) to turn into a long and
use #defines for the cases so I'll have meaningful labels there.

Note that this works for me.  It is definitely not portable, primarily
because byte-swapping will significantly alter the result.


Software Development Engineer     Check Solutions

#include <std.disclaimer>



Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Well, with some trepidation I'll enter the fray here ...
Be warned this answer gets rather long ;-)

After reading this if anyone wants the perl pre-processor discussed
I'll happily mail it out.

: Everyone knows switch() switches on integer expressions.
:
: Wouldn't It Be Nice if you could select from strings, though...
:
: ( ... snip ... )
:
: But what if one were to build a table?  In every application I've had
: where I wish to see which of a set of strings I have, I've had a known
: predefined set of strings.  So, the behavior could be something like
:
: ( ... snip ... )
:
: The idea would be that "str_stable" would build a somewhat more efficient
: table representation of what the strings are like.  I could see it being
: done with hashes, or with something like what fgrep traditionally uses.
:
: Any prior art?  Ideas I've missed?  I'm thinking about developing this,
: since it would be useful in a current application, and I'd like to do it
: right, so I don't repeat the work.

My solution may not be general enough - it is a compile time not run
time solution. Anyway, I had to parse a config file of key/value pairs
and wanted users to be able to write the text corresponding to the name
of the key rather than the numeric #define constant.

Given over 100 constants, I wasn't interested in the strcmp noise.
So I whipped out a perl critter that takes a data file and
gens a header file with #defines and a source file with a hashed array
(257 buckets) and a lookup function that returns the numeric equivalent
of each tag and deals with the collisions.  

Input:

    DB_USER                                 1
    DB_GROUP                                2
    DB_MEMBER                               3
    DB_APPT                                 4
    DB_CONTACT                              5
    DB_KEEPER                               6
    DB_SESSION                              7

Generated Header:

    #define DB_USER                 1
    #define DB_GROUP                2
    #define DB_MEMBER               3
    #define DB_APPT                 4
    #define DB_CONTACT              5
    #define DB_KEEPER               6
    #define DB_SESSION              7

    typedef struct tag_def_dtd
    {
        int         tag;                /* Field tag            */
        char        name[25];           /* Tag name             */
    } tag_def;
    extern tag_def         tag_array[];
    extern int tag_lookup(char *name);

Generated Source:

    tag_def tag_array[] = {
        { 0,                       "-",                     },
        { 0,                       "-",                     },
        ( ... snip ... )
        { DB_USER,                 "DB_USER"                },
        { 0,                       "-",                     },
        { 0,                       "-",                     },
        { DB_SESSION,              "DB_SESSION"             },
        ( ... snip ... )
        { 0,                       "",                      }
    };

    int
    tag_lookup(char *name)
    {
        int     hash;

        strupper (name);
        hash = hash_gen(name);

        while( tag_array[hash].tag != 0 )   /* collision resolution */
        {
            if ( strcmp(name, tag_array[hash].name) == 0 ) break;
            hash++;
        }

        return tag_array[hash].tag;
    }

Sample Usage:

    if ( (table = tag_lookup(argv[1])) == 0 )
    {
            fprintf(stderr, "\n");
            fprintf(stderr, "%s is not a known table name\n\n", argv[1]);
            exit (1);
    }

    or

    switch(tag_lookup(char *str))
    {
        ( ... snip ... )
    }

--
George Headley                             Director of Software Engineering

Berbee Information Networks                               Fax: 608/288-3007
5944 Seminole Centre Court                                 Madison WI 53711



Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

>Everyone knows switch() switches on integer expressions.

>Wouldn't It Be Nice if you could select from strings, though...

>Well, perhaps one could write "str_switch()", which takes a string,
>and an array of strings, and returns the index in the array of a matching
>string.  Of course, this would be horribly inefficient - presumably at
>least as expensive as the iterative strcmp() so often used as an
>alternative, and possibly as bad as the endless if-else-if-else-if...
>used in some programs.

>The idea would be that "str_stable" would build a somewhat more efficient
>table representation of what the strings are like.  I could see it being
>done with hashes, or with something like what fgrep traditionally uses.

>Any prior art?  Ideas I've missed?  I'm thinking about developing this,
>since it would be useful in a current application, and I'd like to do it
>right, so I don't repeat the work.

  When faced with this, I typically define a structure with two members, a
string (char *) and a function pointer, then I have an array (usually sorted
alphabetically) of such structures.  The parameters the function takes
usually depends upon the application, but it gives me a nice blend between
speed and flexibility.  

  For instance:

struct entry
{
  char  *name;
  void (*funct)(void);

Quote:
};

struct entry tabstring[] =
{
  { "aaa" , func1 },
  { "bbb" , func2 },
  /* ... */
  { "zzz" , func26 }

Quote:
};

#define TABSTRINGSIZE   (sizeof(tabstring) / sizeof(struct entry))

        /* code ... */
        struct entry *pse;
        char         *input;

        input = getinput();
        pse = bsearch(input,tabstring,TABSTRINGSIZE,sizeof(struct entry),cmp);
        if (pse) (*pse->funct)();

  -spc (Salt to taste, obviously)



Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

> First, let me say, this is *NOT* the FAQ.

> Rather...

> Everyone knows switch() switches on integer expressions.

> Wouldn't It Be Nice if you could select from strings, though...

I always use:

typedef (*funptr)();

typedef struct {
    char *name;
    funptr action;

Quote:
} entry_t;

#define E(foo) { #foo, do_##foo }

entry_t actions[] = {
   E(act1),
   E(act2),
   E(act3),
   E(act4)

Quote:
};

during initialization, I sort actions[] based on actions[].name, then
use a binary search to find matches.  It has the advantage that adding
a new command (typically I use this for quick command-line parsers)
requires only adding a E(whatever), line and a do_whatever() routine,
and is efficient enough for all but the most demanding applications.

This is also the example I use when people say token quoting and token
pasting are useless and should be avoided.  In this case, writing it
out can even introduce bugs, along the lines of:

    { "fmoo", do_foo },

---------------------------------------------------------------------------
Tim Hollebeek         | Disclaimer :=> Everything above is a true statement,
Electron Psychologist |                for sufficiently false values of true.

----------------------| http://wfn-shop.princeton.edu/~tim (NEW! IMPROVED!)



Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.


Quote:

> Wouldn't It Be Nice if you could select from strings, though...

[snip-ola]

Quote:
> Any prior art?  Ideas I've missed?  I'm thinking about developing this,
> since it would be useful in a current application, and I'd like to do it
> right, so I don't repeat the work.

You might check out my Associative Array library on my web site,
which is what I think you're talking about.  It allows creating
an associative array context, which you may place string keys
that translate into either other strings, or a general void *
pointer.  The string key may be either stored externally to the
AA context, or duplicated internally.  It uses a hash table
algorithm which will dynamically increase the hash table size
as the number of entries increase.  I have attached the
documentation file.

I intentionally made the names short, i.e., aaPut and aaGet to
make it convenient to use.  I have to say that this is one of
those libraries that you just didn't know you needed until you
have it available, cheap and easy (kind of like Perl associative
arrays :-) )

I should note that my legal language (which I deleted here)
allows this library to be used in commercial applications.

--------

1.0     Description

This document describes subroutines that may be used in a manner similiar
to "Associative Arrays" as in Perl.  They use a very efficient hash
table algorithm for quick translation of C string Keys to strings or
general pointers.  Features:

    * Use external or internal storage for key string.
    * Use external or internal storage for key value.  In the case of
      internal storage, the value must be a string.  For external
      storage, the value may be any general pointer.
    * Dynamically increases/decreases size of hash table depending on
      ratio of entries / hash table size.
    * Uses 'CRC' algorithm for good distributions, even on similiar keys.

1.1     Files

aa.c            Associative array library routines
aa.h            Associative array public definitions
aaP.h           Associative array private definitions

2.0     Subroutines

2.1     aaContext *aaCreateContext(long flags)

        flags
            AA_CASEINSIG        Use case insignificant keys

Creates Associative Array context.

2.2     void aaDestroyContext(aaContext *aa)

        aa              Context to destroy.

Destroys Associative Array context, and all internally allocated keys
and values.  If a key or value was specified as external, nothing is
done to that key or value. Parameters:

2.3     int aaPut(aaContext *aa, char *key, void *value, long flags)

        aa              Associative array
        key             Key to store.
        value           Value, string or general pointer.
        flags
            AA_DUPKEY   Duplicate key using internal storage.
            AA_DUPVAL   Duplicate string value using internal storage.
            AA_EXTKEY   Store pointer to passed key, rather than duplicate
                        key internally.
            AA_EXTVAL   Store pointer to external value, rather than
                        duplicate internally.  May be general pointer in
                        this case (such as to a structure).
            AA_NOREP    Do not replace value if key already exists.

One of (AA_DUPKEY,AA_EXTKEY) and (AA_DUPVAL,AA_EXTVAL) must be given.

Store a key/value pair into an associative array.  Overwrites previous
value if key already exists, unless the AA_NOREP flag is given.
Key is case significant, unless AA was created with AA_CASEINSIG flag.
Returns -1 if error (No KEY or VAL flag specified, allocation error),
0 if key did not exist, and 1 if key already existed.  May be used
from within an aaScanBegin loop.

The intention of having no default for internal or external storage
is to cut down the risk of errors, even though it makes it a bit more
long-winded.  A default can obviously be added easily.

2.4     void *aaGet(aaContext *aa, char *key)

        aa              Associative array
        key             Key to search.

Retrieve value of key; returns NULL if key not found.

2.4     int aaDelete(aaContext *aa, char *key)

        aa              Associative array
        key             Key to delete.

Delete key from associative array.  Returns 0 if key existed, otherwise -1.
May be used within an aaScanBegin loop.

2.5     int aaScanBegin(aaContext *aa, char *key)

        aa              Associative array context.
        key             Optional starting key, or NULL.  Key must exist.

Scan keys in hash table (that is, random) order.  In practice, passing
in a start key is not that useful, but it could be used to restart a
scan for whatever reason.  aaScanEnd() should be used when the scan is
complete.  Note that during an active scan, the hash table will
not be reconfigured from adds or deletes in order to preserve a stable
hash order.  When the scan is completed by calling aaScanEnd, the hash
table's size is reconfigured (if necessary).

2.6     char *aaScanNext(aaContext *aa, void **value)

        aa              Associative array context.
        value           Optional value of key, or NULL.

After calling aaScanBegin, returns pointer to next key, or NULL.  Note
it is safe to use aaDelete and aaPut from within an aaScanBegin loop.

2.7     void aaScanEnd(aaContext *aa)

        aa              Associative array context.

Terminates a scan.  If any aaDelete's or aaPut's where done during a scan,
then the hash table is reconfigured at this time.

2.8     void aaDumpTable(aaContext *aa)

        aa              Associative array context.

Dumps out an associative array to stdout for debugging purposes.

3.0     Sample program

#include <stdio.h>
#include "aa.h"

int main(void)
{
    struct aaContext *aa;
    char *p;
    char key[20], value[20];
    char key2[20];
    int i;

    aa = aaCreateContext(0);

    /* Create some entries */

    for (i = 0; i < 100; ++i) {
        sprintf(key, "KEY%d", i);
        sprintf(value, "VALUE%d", i);
        aaPut(aa, key, value, AA_DUPKEY|AA_DUPVAL);
    }
    aaDumpTable(aa);

    /* Read back the entries */

    for (i = 0; i < 100; ++i) {
        sprintf(key2, "KEY%d", i);
        p = aaGet(aa, key2);
        printf("aaGet (%s): value = '%s'\n", key2, p);
    }

    /* Scan and delete 90% of the entries */

    aaScanBegin(aa, NULL);
    i = 0;
    while ((p = aaScanNext(aa, NULL)) != NULL) {
        if (i++ % 10 == 0)      /* Leave some left over */
            continue;
        printf("Deleting: %s\n", p);
        aaDelete(aa, p);
    }
    aaScanEnd(aa);

    /* Dump out the entries that are left */

    printf("After delete:\n");
    aaDumpTable(aa);

    aaDestroyContext(aa);
    return;

Quote:
}

--
==========================================================================

| "Judge all, and be prepared to be judged by all."                      |
==========================================================================


Sun, 04 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

>My simplest suggestion --> Steal whatever VAX BASIC does.  Cram it into the
>next iteration of the standard.  All mortal C programmers will sing your
>praises.

And all the mortal customers will scream with outrage at the US-only
software and go and buy something else.

Write 100 times:
        hardwired strings in a program are a barrier to localisation.

Quote:
>Another solution: Have the COMPILER sort the strings AT COMPILE TIME and
>place them in an internal array.  Then do a BSEARCH of the array for the
>input string to switch on.  Not only portable, but also efficient.

Actually, NOT efficient.  If you have N strings of length S, bsearch()
can take O(S.lgN) time, whereas if you use a suitably represented trie,
it can be done in O(S) time.

Quote:
>Not only would they sing your praises, they would recount your glories
>to their grandchildren and their great grandchildren.

Strings that are strictly internal to the operation of a program can be
anything you want, but strings that may be of interest to an end-user
had better be in the end-user's language and script.  And if you are
going to match strings the user typed against strings in some kind of
table, you had better worry about whether the match should be case
sensitive or not and you had better worry about the ISO 10646
canonicalisation rules and you had better worry about whether to
distinguish between "APL LETTER ALPHA" and "GREEK LETTER ALPHA" or not,
&c &c &c.

It has always been true that hardwiring strings into a program was
a silly thing to do unless you had no alternative.  For the past 10
or 12 years it has been OBVIOUSLY silly.  By now, when Windows NT
uses Unicode inside, and UNIX systems come with support for multibyte
characters, and the X Window System has support for this stuff, it is
PAINFULLY obvious that hardwiring ASCII string literals into your
program is a good way to produce a product with no market outside
your own country.

"mbc", "wchar_t", locales, and so on are there in C for a *REASON*.
--
My tertiary education cost a quarter of a million in lost income
(assuming close-to-minimum wage); why make students pay even more?
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.



Mon, 05 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

> But what if one were to build a table?  In every application I've had
> where I wish to see which of a set of strings I have, I've had a known
> predefined set of strings.

This looks suspiciously similar to what GPERF does.  You give it a table
of strings, and it builds a perfect hash function for your table,
spewing out the appropriate C code.  I've not used it myself in real
code, but it looks quite handy.
--
[mdw], who avoids mentioning the obvious use of `bsearch'.

`How can you be so mean to someone so meaningless?'
                -- Selina Kyle



Mon, 05 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.


: >Wouldn't It Be Nice if you could select from strings, though...

: Whenever I've needed to switch on a string there's usually just
: a small portion (sizeof long or less) of the string that's significant.
: A technique that I've used is to turn those bytes into something that I
: can switch on.  The OS that I use provides a MAKETYPE macro that, given
: the address of something, will cast that something into another specified
: type:
:    #define MAKETYPE(v, type)  (*((type *)&v))

: Then I just switch on MAKETYPE(myString, long) to turn into a long and
: use #defines for the cases so I'll have meaningful labels there.

: Note that this works for me.  It is definitely not portable, primarily
: because byte-swapping will significantly alter the result.

Seems to me like what you guys are trying to re-invent (seebs used
the term "prior art") are associative arrays, indexed on a string,
with an integer data type as data.  I.e., the *opposite* of an array of
char pointers.  To my knowledge, a good implementation for this
must be in the source code for Perl, because that language lives and
dies by the efficiency of its $value("some string") constructs.  I thinks
it's done by means of hash tables.

*******************************************************************

Novell IS & T, Bracknell, England             +44-1344-724031
*******************************************************************
*        if (status = UNDER_NUCLEAR_ATTACK)                       *
*               launch_full_counterstrike();                      *
*******************************************************************
(C) 1997 M.F.Sohnius -- free distribution on USENET
       (Not a spokesperson  -- just a cyclist!)



Mon, 05 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.


Quote:

>First, let me say, this is *NOT* the FAQ.

>Rather...

>Everyone knows switch() switches on integer expressions.

>Wouldn't It Be Nice if you could select from strings, though...

>Well, perhaps one could write "str_switch()", which takes a string,
>and an array of strings, and returns the index in the array of a matching
>string.  Of course, this would be horribly inefficient - presumably at
>least as expensive as the iterative strcmp() so often used as an
>alternative, and possibly as bad as the endless if-else-if-else-if...
>used in some programs.

I think the most efficient technique to str_switch would be to write an
optimized state machine to recognize each of the strings.  That way, at
worst you'll have the cost of the iterative strcmp() solution, and at
best, it'll be (waving hands) much better.

If you want to code your state machine yourself, more power to you.  I'd
use lex.

Although there is something to be said for hand-tuning them when your
set of input strings is small.  I can write a state machine for the
strings { "this", "that", "other" } without much difficulty at all, and
it'll likely be smaller and faster than lex.

Me?  I use perl for string-intensive stuff these days.  Apologies. ;)
--


    Free time?  There's no such thing.  It just comes in varying prices...



Mon, 05 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.


: *******************************************************************
: *        if (status = UNDER_NUCLEAR_ATTACK)                       *
: *               launch_full_counterstrike();                      *
: *******************************************************************

We can only hope for one of two things --

- that's not meant to be C, or:
- you've never done any contract work for NORAD.

--
//.\\/\\/\\\\//////\//\/\/\/\\\/\\\/\\\\/\//\/\\\//\\\\/\\\//\/\\//\\\/\\//
\        Jason Tyler            "What's so unpleasant about being drunk?" \

///\\/\\/\\\\\\/\\/\/\\/\\\/\//\\///\\\\\///\//\\\\\\\///\/\\\/\/\\\//\\//\



Thu, 08 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Ok, I'll try to say something usefel here as well ... Actually I have no
time to think about this, just couldn't resist to pass on some partial
thoughts. Fasten seatbelts, here we go:

(Really, this is Q about algorithms rather than C, but who cares these
days anymore)

After trying it all I have come to the conclusion that sorting
alphbetically is a good idea.

(I tried sort by stringlength as well, but it won't buy you anything.
Hashing is off since there is a lot of more useful things you could have
done while racing through your teststring.)

----------------------------------------------------------------------
The way I understand the problem, the set of valid strings won't change
during runtime.

The set of strings to test might, on the other hand, be supplied by the
user (so there is always one more.)
----------------------------------------------------------------------

What did I say? Oh yes, the strings are _already_sorted_ alphabetically.
The proper way would then be to perform a bSearch for the first letter in
your test string

Assume you found it, Ok? Then test the second letter, and it was
(surprise) also valid.

But! In this example, as you go on, the third letter was not valid ...
 (Oh shit)

Since the strings are sorted, the value will either be too small, meaning
you still have a chance, or too big, meaning that there is no such
string ...

At this point it would be nice to have a bit of help, no? Simply testing
the next string at index 'n' will most probably send you into the void
space of undefined behaviour.

So we cheat and do some precalculations (on the valid strings)!

Lets have an array of int (or short, or char). One for each string in our
valid library.

The value of this int is equal to the number of chars that are unchanged
between "this string" and "the next string".

If that (next) value is less than our current index, then we have lost.
Else there is still a chance ... (repeat from top :)

// (void) Jens M Andreasen



Thu, 08 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.



Quote:

> >My simplest suggestion --> Steal whatever VAX BASIC does.  Cram it into
the
> >next iteration of the standard.  All mortal C programmers will sing your
> >praises.

> And all the mortal customers will scream with outrage at the US-only
> software and go and buy something else.

> Write 100 times:
>    hardwired strings in a program are a barrier to localisation.

> >Another solution: Have the COMPILER sort the strings AT COMPILE TIME and
> >place them in an internal array.  Then do a BSEARCH of the array for the
> >input string to switch on.  Not only portable, but also efficient.

> Actually, NOT efficient.  If you have N strings of length S, bsearch()
> can take O(S.lgN) time, whereas if you use a suitably represented trie,
> it can be done in O(S) time.

Well, you make an excellent point.  But to store strings in a resource
file, you still have to relink, don't you?  So we can build the table at
link time instead of at run time. The bsearch() function will only take
O(s*log(n)) time if the strings differ only in the last character.  Again,
at link time, the C system could recognize this and start the string
compare at the last character.  If there is difficult data, with randomly
different locations for the string differences, I suspect that the trie
structure search will take just as long because we have a difficult entropy
problem.

Quote:
> >Not only would they sing your praises, they would recount your glories
> >to their grandchildren and their great grandchildren.

> Strings that are strictly internal to the operation of a program can be
> anything you want, but strings that may be of interest to an end-user
> had better be in the end-user's language and script.  And if you are
> going to match strings the user typed against strings in some kind of
> table, you had better worry about whether the match should be case
> sensitive or not and you had better worry about the ISO 10646
> canonicalisation rules and you had better worry about whether to
> distinguish between "APL LETTER ALPHA" and "GREEK LETTER ALPHA" or not,
> &c &c &c.

Since I am doing an exact match, even memcmp() will work.  I can do any
binary comparison at all (even consider the data as an array of ints if the
lengths happen to be evenly divisible by sizeof int).
My solution is not bound by strings being internal to the program.  If the
strings actually change during the running of the program so that the table
itself is not constant, then the proposed solution would require a sort of
the strings before the bsearch at program start time, and probably not be
worth the bother unless the program runs for a long time and uses the
switch heavily.  I believe that you also may have neglected the effort to
convert the strings into a trie structure (if the strings are not known at
start time).  In such a case, while the lookup will be fast, you still have
a cost of building your data structure.  The data structure I proposed has
no extra memory overhead except for the n array positions.

Quote:
> It has always been true that hardwiring strings into a program was
> a silly thing to do unless you had no alternative.  For the past 10
> or 12 years it has been OBVIOUSLY silly.  By now, when Windows NT
> uses Unicode inside, and UNIX systems come with support for multibyte
> characters, and the X Window System has support for this stuff, it is
> PAINFULLY obvious that hardwiring ASCII string literals into your
> program is a good way to produce a product with no market outside
> your own country.

I did not make that requirement.  I posted a simple example.  If we
consider such things as resource files (outside the bounds of the C
language -- but we can generalize to 'data files') then the method I
suggested works unless the strings change during the execution of the
program itself.  You would (of course) need quite an intelligent system to
recognize all of this, I'll admit.


Thu, 08 Jul 1999 03:00:00 GMT  
 switch()'ing on strings... design question.

Quote:

>My simplest suggestion --> Steal whatever VAX BASIC does.  Cram it into the
>next iteration of the standard.  All mortal C programmers will sing your
>praises.

>Another solution: Have the COMPILER sort the strings AT COMPILE TIME and
>place them in an internal array.  Then do a BSEARCH of the array for the
>input string to switch on.  Not only portable, but also efficient.  Not
>only would they sing your praises, they would recount your glories to their
>grandchildren and their great grandchildren.
>;-)

>/* what the programmer codes: */
>char *Animal;
>/* ... */
>switch (Animal)
>{
>   case "BEAR":
>    /* do stuff */
>    break;
>   case "LION":
>    /* do stuff */

Well it's only portable if it gets into the standard.  There is some
prior art for this.  The languages Eh and Zed, (the other descendants
from B), had such a construct.  It turns out not to be a great winner.
It was easy to code, and "close enough" to what was needed that
programmers tended not to expend the extra effort to do a good job.
The result was programs that had very rigid user interfaces.  Programs
tended to be excessively case sensitive, and could not be easily
changed to accept abbreviations that seemed natural to the users.

Several library routines to do different types of string look ups
and map to a integer that is switched on is more useful.
Sometimes exact matches and speed are important, in which case
binary searches of perfact hash look ups are what is needed.
Other times you really want to define a "pattern" (not necessarily
a GREP type pattern), that defines what should match.

We routinely use some home grown routines for command option parsing
that take a table of strings like
 optable[] {
      "KeyWord",
      "OtherOption",
      NULL
 }
Where the lowercase letters are one the user is allowed to omit.



Thu, 08 Jul 1999 03:00:00 GMT  
 
 [ 26 post ]  Go to page: [1] [2]

 Relevant Pages 

1. Q:Frequent malloc'ing and free'ing

2. Q!!'scanf'ing strings with char**s

3. return()'ing a string

4. question about cat-ing strings

5. Question : xalloc'ing a large (128K) char array (MS-DOS)

6. how to prevent 'free'-ing memory twice

7. read()'ing from a child's piped stdout

8. trouble with realloc'ing array of char *'s

9. Quickie on free()'ing malloc()'d mem

10. Let's talk about local variables (just question of design)

11. typedef'ing iterators in MSVC's STL (not the HP STL)

12. CComboBox 'Sort'ing Under Program Control?

 

 
Powered by phpBB® Forum Software