Case insensitive substring searching 
Author Message
 Case insensitive substring searching

What is the best way to implement a case-insensitive STL string?  In particular, I need to do a 'strstr' equivalent...  I'm getting configused as to if it is possible to make a predicate to an algorithm or something slick like
that...

TIA,

STeve



Sun, 02 Jan 2005 23:18:55 GMT  
 Case insensitive substring searching
    First of all, you don't want a "case-insensitive STL string", you want a
case-insensitive STL string comaprision function.  Now, you can treat the
string as a string, or as a vector of characters.  However, the string
member functions don't allow predicates; and, while the algorithms for use
with vectors do allow predicates, they don't understand substrings (ie, two
character next to each other being related), so we can't use either of
those.  So, let's start from scratch:

// WARNING : Untested code ahead......
//
string::iterator NoCaseFind(const string& big_string, const string&
little_string)
{
        char    C = toupper(little_string[0]);
        string:iterator iter = big_string.begin();
        string:iterator end = big_string.end() - little_string.lenght() + 1;
        while (iter != end)
        {
            if (toupper(*iter) == C)
                if (NoCaseCompareRest(iter+1, big_string.end(),
little_string.begin()+1, little_string.end())
                    return iter;
        }
        return string::npos;

Quote:
}

bool NoCaseCompareRest(string:iterator iterB, string:iterator iterBE,
                                            string:iterator iterL,
string:iterator iterLE)
{
    while (iterB != iterBE)
    {
        if (toupper(*iterB) != toupper(*iterL)
            return false;    // we found a mismatch

        ++iterB;
        ++iterL;
        if (iterL == iterLE)
            return true;    // we reach the end of the small string without
a mismatch
    }
    return false;            // we reach the end of the big string before we
reached the end of the little string.

Quote:
}

--
Truth,
James Curran
www.NovelTheory.com  (Personal)
www.NJTheater.com   (Professional)
www.aurora-inc.com   (Day job)


Quote:
> What is the best way to implement a case-insensitive STL string?  In

particular, I need to do a 'strstr' equivalent...  I'm getting configused as
to if it is possible to make a predicate to an algorithm or something slick
like
Quote:
> that...

> TIA,

> STeve



Mon, 03 Jan 2005 03:15:21 GMT  
 Case insensitive substring searching


Quote:
> What is the best way to implement a case-insensitive STL string?  In

particular, I need to do a 'strstr' equivalent...  I'm getting configused as
to if it is possible to make a predicate to an algorithm or something slick
like

Quote:
> that...

Here's my attempt (no warranty).
I double the set of characters to be searched for, including both upper- and
lowercase variants of each character.
Then I use the standard find_first_of method...

string::size_type myStrstr(const string& str, const string& charSet) {
  string set2;
  for (int i=0;i<charSet.length();i++)
    set2 += tolower(charSet[i]);
  for (int i=0;i<charSet.length();i++)
    set2 += toupper(charSet[i]);
  string::size_type result = str.find_first_of(set2);
  if (result != string::npos)
    if (result >= str.length())
      return result / 2; // Uppercase matched
  return result; // lowercase or no match

Quote:
}

// Luc


Mon, 03 Jan 2005 03:52:30 GMT  
 Case insensitive substring searching
    As far as I can see, that's actually doing a case-insensitive version of
strcspn()

--
Truth,
James Curran
www.NovelTheory.com  (Personal)
www.NJTheater.com   (Professional)
www.aurora-inc.com   (Day job)


Quote:



> > What is the best way to implement a case-insensitive STL string?  In
> particular, I need to do a 'strstr' equivalent...  I'm getting configused
as
> to if it is possible to make a predicate to an algorithm or something
slick
> like
> > that...

> Here's my attempt (no warranty).
> I double the set of characters to be searched for, including both upper-
and
> lowercase variants of each character.
> Then I use the standard find_first_of method...

> string::size_type myStrstr(const string& str, const string& charSet) {
>   string set2;
>   for (int i=0;i<charSet.length();i++)
>     set2 += tolower(charSet[i]);
>   for (int i=0;i<charSet.length();i++)
>     set2 += toupper(charSet[i]);
>   string::size_type result = str.find_first_of(set2);
>   if (result != string::npos)
>     if (result >= str.length())
>       return result / 2; // Uppercase matched
>   return result; // lowercase or no match
> }

> // Luc



Mon, 03 Jan 2005 21:20:12 GMT  
 Case insensitive substring searching
Do I misunderstand find_first_of()?
I thought it did the same thing as strstr, i.e. searching the first
occurrence of one of a set of characters.
I thought that the equivalent of strcspn() would be find(), not
find_first_of()...
IMHO, to write a case-insensitive version of strcspn(), one could convert
both strings to uppercase (or lowercase), and then use find().

Or did I get it all wrong?

Luc


Quote:
>     As far as I can see, that's actually doing a case-insensitive version
of
> strcspn()

> --
> Truth,
> James Curran
> www.NovelTheory.com  (Personal)
> www.NJTheater.com   (Professional)
> www.aurora-inc.com   (Day job)





> > > What is the best way to implement a case-insensitive STL string?  In
> > particular, I need to do a 'strstr' equivalent...  I'm getting
configused
> as
> > to if it is possible to make a predicate to an algorithm or something
> slick
> > like
> > > that...

> > Here's my attempt (no warranty).
> > I double the set of characters to be searched for, including both upper-
> and
> > lowercase variants of each character.
> > Then I use the standard find_first_of method...

> > string::size_type myStrstr(const string& str, const string& charSet) {
> >   string set2;
> >   for (int i=0;i<charSet.length();i++)
> >     set2 += tolower(charSet[i]);
> >   for (int i=0;i<charSet.length();i++)
> >     set2 += toupper(charSet[i]);
> >   string::size_type result = str.find_first_of(set2);
> >   if (result != string::npos)
> >     if (result >= str.length())
> >       return result / 2; // Uppercase matched
> >   return result; // lowercase or no match
> > }

> > // Luc



Mon, 03 Jan 2005 21:35:55 GMT  
 Case insensitive substring searching
Actually, I think you're misunderstanding strstr().

strstr("Now is the time for all good men to come to the aid of the party",
"come");
// returns a pointer to the word "come", the first appearance of the exact
string given by the second parameter.

strcpn("Now is the time for all good men to come to the aid of the party",
"come");
// returns a pointer to the "o" in "Now", the first appearance of any char
in the second parameter.

--
Truth,
James Curran
www.NovelTheory.com  (Personal)
www.NJTheater.com   (Professional)
www.aurora-inc.com   (Day job)


Quote:
> Do I misunderstand find_first_of()?
> I thought it did the same thing as strstr, i.e. searching the first
> occurrence of one of a set of characters.
> I thought that the equivalent of strcspn() would be find(), not
> find_first_of()...
> IMHO, to write a case-insensitive version of strcspn(), one could convert
> both strings to uppercase (or lowercase), and then use find().

> Or did I get it all wrong?

> Luc



> >     As far as I can see, that's actually doing a case-insensitive
version
> of
> > strcspn()

> > --
> > Truth,
> > James Curran
> > www.NovelTheory.com  (Personal)
> > www.NJTheater.com   (Professional)
> > www.aurora-inc.com   (Day job)





> > > > What is the best way to implement a case-insensitive STL string?  In
> > > particular, I need to do a 'strstr' equivalent...  I'm getting
> configused
> > as
> > > to if it is possible to make a predicate to an algorithm or something
> > slick
> > > like
> > > > that...

> > > Here's my attempt (no warranty).
> > > I double the set of characters to be searched for, including both
upper-
> > and
> > > lowercase variants of each character.
> > > Then I use the standard find_first_of method...

> > > string::size_type myStrstr(const string& str, const string& charSet) {
> > >   string set2;
> > >   for (int i=0;i<charSet.length();i++)
> > >     set2 += tolower(charSet[i]);
> > >   for (int i=0;i<charSet.length();i++)
> > >     set2 += toupper(charSet[i]);
> > >   string::size_type result = str.find_first_of(set2);
> > >   if (result != string::npos)
> > >     if (result >= str.length())
> > >       return result / 2; // Uppercase matched
> > >   return result; // lowercase or no match
> > > }

> > > // Luc



Mon, 03 Jan 2005 22:22:29 GMT  
 Case insensitive substring searching
You're right - I inverted the two!
So, the solution for the OP could be to transform both strings to uppercase,
and to use find()...

Luc


Quote:
> Actually, I think you're misunderstanding strstr().

> strstr("Now is the time for all good men to come to the aid of the party",
> "come");
> // returns a pointer to the word "come", the first appearance of the exact
> string given by the second parameter.

> strcpn("Now is the time for all good men to come to the aid of the party",
> "come");
> // returns a pointer to the "o" in "Now", the first appearance of any char
> in the second parameter.

> --
> Truth,
> James Curran
> www.NovelTheory.com  (Personal)
> www.NJTheater.com   (Professional)
> www.aurora-inc.com   (Day job)



> > Do I misunderstand find_first_of()?
> > I thought it did the same thing as strstr, i.e. searching the first
> > occurrence of one of a set of characters.
> > I thought that the equivalent of strcspn() would be find(), not
> > find_first_of()...
> > IMHO, to write a case-insensitive version of strcspn(), one could
convert
> > both strings to uppercase (or lowercase), and then use find().

> > Or did I get it all wrong?

> > Luc



> > >     As far as I can see, that's actually doing a case-insensitive
> version
> > of
> > > strcspn()

> > > --
> > > Truth,
> > > James Curran
> > > www.NovelTheory.com  (Personal)
> > > www.NJTheater.com   (Professional)
> > > www.aurora-inc.com   (Day job)





> > > > > What is the best way to implement a case-insensitive STL string?
In
> > > > particular, I need to do a 'strstr' equivalent...  I'm getting
> > configused
> > > as
> > > > to if it is possible to make a predicate to an algorithm or
something
> > > slick
> > > > like
> > > > > that...

> > > > Here's my attempt (no warranty).
> > > > I double the set of characters to be searched for, including both
> upper-
> > > and
> > > > lowercase variants of each character.
> > > > Then I use the standard find_first_of method...

> > > > string::size_type myStrstr(const string& str, const string& charSet)
{
> > > >   string set2;
> > > >   for (int i=0;i<charSet.length();i++)
> > > >     set2 += tolower(charSet[i]);
> > > >   for (int i=0;i<charSet.length();i++)
> > > >     set2 += toupper(charSet[i]);
> > > >   string::size_type result = str.find_first_of(set2);
> > > >   if (result != string::npos)
> > > >     if (result >= str.length())
> > > >       return result / 2; // Uppercase matched
> > > >   return result; // lowercase or no match
> > > > }

> > > > // Luc



Mon, 03 Jan 2005 22:45:14 GMT  
 Case insensitive substring searching
    But that gets complicated.... You shouldn't change the original strings,
so you'd have to make copies (which could be expensive), then, after you
found it in the copy, you'd have to translate that location to it's
equivalent position in the original string.

--
Truth,
James Curran [MVP]
www.NJTheater.com     (Professional)
www.NovelTheory.com  (Personal)
MVP = Where your high-priced consultant goes for free answers


Quote:
> You're right - I inverted the two!
> So, the solution for the OP could be to transform both strings to
uppercase,
> and to use find()...

> Luc



> > Actually, I think you're misunderstanding strstr().

> > strstr("Now is the time for all good men to come to the aid of the
party",
> > "come");
> > // returns a pointer to the word "come", the first appearance of the
exact
> > string given by the second parameter.

> > strcpn("Now is the time for all good men to come to the aid of the
party",
> > "come");
> > // returns a pointer to the "o" in "Now", the first appearance of any
char
> > in the second parameter.

> > --
> > Truth,
> > James Curran
> > www.NovelTheory.com  (Personal)
> > www.NJTheater.com   (Professional)
> > www.aurora-inc.com   (Day job)



> > > Do I misunderstand find_first_of()?
> > > I thought it did the same thing as strstr, i.e. searching the first
> > > occurrence of one of a set of characters.
> > > I thought that the equivalent of strcspn() would be find(), not
> > > find_first_of()...
> > > IMHO, to write a case-insensitive version of strcspn(), one could
> convert
> > > both strings to uppercase (or lowercase), and then use find().

> > > Or did I get it all wrong?

> > > Luc



> > > >     As far as I can see, that's actually doing a case-insensitive
> > version
> > > of
> > > > strcspn()

> > > > --
> > > > Truth,
> > > > James Curran
> > > > www.NovelTheory.com  (Personal)
> > > > www.NJTheater.com   (Professional)
> > > > www.aurora-inc.com   (Day job)





> > > > > > What is the best way to implement a case-insensitive STL string?
> In
> > > > > particular, I need to do a 'strstr' equivalent...  I'm getting
> > > configused
> > > > as
> > > > > to if it is possible to make a predicate to an algorithm or
> something
> > > > slick
> > > > > like
> > > > > > that...

> > > > > Here's my attempt (no warranty).
> > > > > I double the set of characters to be searched for, including both
> > upper-
> > > > and
> > > > > lowercase variants of each character.
> > > > > Then I use the standard find_first_of method...

> > > > > string::size_type myStrstr(const string& str, const string&
charSet)
> {
> > > > >   string set2;
> > > > >   for (int i=0;i<charSet.length();i++)
> > > > >     set2 += tolower(charSet[i]);
> > > > >   for (int i=0;i<charSet.length();i++)
> > > > >     set2 += toupper(charSet[i]);
> > > > >   string::size_type result = str.find_first_of(set2);
> > > > >   if (result != string::npos)
> > > > >     if (result >= str.length())
> > > > >       return result / 2; // Uppercase matched
> > > > >   return result; // lowercase or no match
> > > > > }

> > > > > // Luc



Tue, 04 Jan 2005 13:51:18 GMT  
 Case insensitive substring searching
Sorry - the only way to do it is copy two strings to temporary instances,
uppercase (or lowercase) them both, and compare them.

One very quick (and for most cases good) optimization is to compare string
lengths beforehand, and not do the actual string compare if a mismatch is
found.  Works like charm with ANSI and Unicode strings. (Not sure about MBCS
in some Far Eastern locales, though)

K. Lilov
Str Library at http://www.utilitycode.com/str


Quote:
> What is the best way to implement a case-insensitive STL string?  In

particular, I need to do a 'strstr' equivalent...  I'm getting configused as
to if it is possible to make a predicate to an algorithm or something slick
like
Quote:
> that...

> TIA,

> STeve



Tue, 04 Jan 2005 18:10:12 GMT  
 Case insensitive substring searching
Thank you all for your ideas!

I too solved it without too much difficulty via C programming, as follows...

const char * stristr ( const char *s, const char *u ) {

        if ( !s )
                return NULL;
        if ( !u || !*u )
                return s;

        int ls = strlen ( s );
        int lu = strlen ( u );

        if ( lu > ls )
                return NULL;

        for ( int i = 0; i < ( ls - lu + 1 ); i++ )
                if ( ( strnicmp ( s++, u, lu ) ) == 0 )
                        return --s;

        return NULL;

Quote:
}

I was seeking some slick STL approach I hadn't though of.  Is there any gain in deriving
from string and implementing the functionality there?

Steve

Quote:

>     First of all, you don't want a "case-insensitive STL string", you want a
> case-insensitive STL string comaprision function.  Now, you can treat the
> string as a string, or as a vector of characters.  However, the string
> member functions don't allow predicates; and, while the algorithms for use
> with vectors do allow predicates, they don't understand substrings (ie, two
> character next to each other being related), so we can't use either of
> those.  So, let's start from scratch:

> // WARNING : Untested code ahead......
> //
> string::iterator NoCaseFind(const string& big_string, const string&
> little_string)
> {
>         char    C = toupper(little_string[0]);
>         string:iterator iter = big_string.begin();
>         string:iterator end = big_string.end() - little_string.lenght() + 1;
>         while (iter != end)
>         {
>             if (toupper(*iter) == C)
>                 if (NoCaseCompareRest(iter+1, big_string.end(),
> little_string.begin()+1, little_string.end())
>                     return iter;
>         }
>         return string::npos;
> }

> bool NoCaseCompareRest(string:iterator iterB, string:iterator iterBE,
>                                             string:iterator iterL,
> string:iterator iterLE)
> {
>     while (iterB != iterBE)
>     {
>         if (toupper(*iterB) != toupper(*iterL)
>             return false;    // we found a mismatch

>         ++iterB;
>         ++iterL;
>         if (iterL == iterLE)
>             return true;    // we reach the end of the small string without
> a mismatch
>     }
>     return false;            // we reach the end of the big string before we
> reached the end of the little string.
> }

> --
> Truth,
> James Curran
> www.NovelTheory.com  (Personal)
> www.NJTheater.com   (Professional)
> www.aurora-inc.com   (Day job)



> > What is the best way to implement a case-insensitive STL string?  In
> particular, I need to do a 'strstr' equivalent...  I'm getting configused as
> to if it is possible to make a predicate to an algorithm or something slick
> like
> > that...

> > TIA,

> > STeve



Tue, 04 Jan 2005 21:57:25 GMT  
 Case insensitive substring searching
    Actually, if you check the code I posted on Wednesday, you'll see it can
be without copying the strings.

    Further, your optimization will only work when you want an exact
comparison.  The OP actually want a strstr() function (finding one string
within another).

--
Truth,
James Curran
www.NovelTheory.com  (Personal)
www.NJTheater.com   (Professional)
www.aurora-inc.com   (Day job)


Quote:
> Sorry - the only way to do it is copy two strings to temporary instances,
> uppercase (or lowercase) them both, and compare them.

> One very quick (and for most cases good) optimization is to compare string
> lengths beforehand, and not do the actual string compare if a mismatch is
> found.  Works like charm with ANSI and Unicode strings. (Not sure about
MBCS
> in some Far Eastern locales, though)

> K. Lilov
> Str Library at http://www.utilitycode.com/str



> > What is the best way to implement a case-insensitive STL string?  In
> particular, I need to do a 'strstr' equivalent...  I'm getting configused
as
> to if it is possible to make a predicate to an algorithm or something
slick
> like
> > that...

> > TIA,

> > STeve



Tue, 04 Jan 2005 22:06:40 GMT  
 Case insensitive substring searching

Quote:

> class ci_char_traits : public char_traits<char>
> {
>     static bool eq(char c1, char c2)
>     { return toupper(c1) == toupper(c2); }
>     static bool lt(char c1, char c2)
>     { return toupper(c1) < toupper(c2); }
> };

> then, declare a case-insensitive string:

> typedef basic_string <char, ci_char_traits> ci_string;

> This leaves open the question of what happens when comparing a ci_string
> with a string (should it be case sensitive or not?). But you get the

idea...

I've tested this, or something like it, and it's pretty nice, but there's a
few problems with it:

1.  You can't assign this class to a standard string class.

2.  You can't compare a standard string at all to this class.

However, some judiciously places .c_str() calls can help with both of those
problems.

Ultimately, it can be useful.



Tue, 04 Jan 2005 23:58:25 GMT  
 Case insensitive substring searching


Quote:
> What is the best way to implement a case-insensitive STL string?  In

particular, I need to do a 'strstr' equivalent...  I'm getting configused as
to if it is possible to make a predicate to an algorithm or something slick
like

Quote:
> that...

I liked very much the solution from Exceptional c++, by Herb Sutter. As we
know, the STL string is actually:

typedef basic_string<char> string;

but basic_string takes 2 more parameters: char_traits and an allocator
(which take default values, so we don't see them above). The traits class is
what interests us, because it provides info on how to treat each char,
including comparison of two chars. So, what we need to do is provide a
different implementation of char_traits, one that doesn't care about case
when comparing 2 chars.

class ci_char_traits : public char_traits<char>
{
    static bool eq(char c1, char c2)
    { return toupper(c1) == toupper(c2); }
    static bool lt(char c1, char c2)
    { return toupper(c1) < toupper(c2); }

Quote:
};

then, declare a case-insensitive string:

typedef basic_string <char, ci_char_traits> ci_string;

This leaves open the question of what happens when comparing a ci_string
with a string (should it be case sensitive or not?). But you get the idea...

I did not test the code, it might not compile as is...

HTH
iuli



Wed, 05 Jan 2005 00:37:02 GMT  
 Case insensitive substring searching
You are correct on the second item.  Too much caffeine obviously :)

I was thinking about a generic algorithm that would solve the exact
case-insensitive comparision of two strings, not a substring find.

K. Lilov
Str Library at http://www.utilitycode.com/str


Quote:
>     Actually, if you check the code I posted on Wednesday, you'll see it
can
> be without copying the strings.

>     Further, your optimization will only work when you want an exact
> comparison.  The OP actually want a strstr() function (finding one string
> within another).

> --
> Truth,
> James Curran
> www.NovelTheory.com  (Personal)
> www.NJTheater.com   (Professional)
> www.aurora-inc.com   (Day job)



> > Sorry - the only way to do it is copy two strings to temporary
instances,
> > uppercase (or lowercase) them both, and compare them.

> > One very quick (and for most cases good) optimization is to compare
string
> > lengths beforehand, and not do the actual string compare if a mismatch
is
> > found.  Works like charm with ANSI and Unicode strings. (Not sure about
> MBCS
> > in some Far Eastern locales, though)

> > K. Lilov
> > Str Library at http://www.utilitycode.com/str



> > > What is the best way to implement a case-insensitive STL string?  In
> > particular, I need to do a 'strstr' equivalent...  I'm getting
configused
> as
> > to if it is possible to make a predicate to an algorithm or something
> slick
> > like
> > > that...

> > > TIA,

> > > STeve



Thu, 06 Jan 2005 23:28:17 GMT  
 
 [ 14 post ] 

 Relevant Pages 

1. search substring case insensitive

2. basic_string and case insensitive search....

3. case-insensitive strstr - my code, need help

4. Case Insensitive strstr !!

5. case insensitive string compare

6. case insensitive strstr

7. case insensitive strstr()

8. case insensitive in dbx?

9. Case insensitive

10. C/370 is case insensitive

11. case-insensitive strstr?

12. Looking for a case-insensitive version of strstr()

 

 
Powered by phpBB® Forum Software