make subset of the search result 
 make subset of the search result

Hi, all

I've seen some web search engines (like Google) that list search results
as a subset of the data (say, the first 30 items) and then have a button or
link to get more of the search hits. But I don't know how to accomplish
this in Perl. Can someone give me an idea or two? I know there is always
another (better) way to do it. :-)

Thanks in advance!

Qiang

P.S. I'd prefer not to create additional files to store the subset of search
results, but if I have to, I'd still love to hear it. Excuse my pickiness...



Sat, 15 May 2004 03:10:04 GMT  
 make subset of the search result

Quote:

>I've seen some web search engines (like Google) that list search results
>as a subset of the data (say, the first 30 items) and then have a button or
>link to get more of the search hits. But I don't know how to accomplish
>this in Perl. Can someone give me an idea or two? I know there is always
>another (better) way to do it. :-)

The simplest way is to repeat the search every time your CGI (or
whatever) runs and show a different part of the results.  Your "next 10
results" link would then pass a parameter to run the same query but
display a different part of it.
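A minimal Perl sketch of that idea — here `run_search` is just a stand-in for whatever actually produces the full hit list, and the start offset is the parameter the "next 10 results" link would pass back:

```perl
#!/usr/bin/perl
# Minimal sketch: re-run the search on every request and show only one
# slice of the hits, chosen by a "start" parameter from the CGI query.
use strict;
use warnings;

my $PAGE_SIZE = 10;

# Stand-in for the real search; returns the full list of hits.
sub run_search {
    my ($query) = @_;
    return map { "hit $_ for '$query'" } 1 .. 43;    # dummy data
}

# Return one page of results, starting at 0-based index $start.
sub page_of_results {
    my ($query, $start) = @_;
    my @hits = run_search($query);
    return () if $start > $#hits;
    my $end = $start + $PAGE_SIZE - 1;
    $end = $#hits if $end > $#hits;
    return @hits[$start .. $end];
}

# The "next 10 results" link would pass start + 10 back to this script.
my @page = page_of_results('foo', 20);    # hits 21 through 30
```

The search itself runs again on every request; only the slicing changes.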

This may seem inefficient, but often searches are done against a
database, and some databases support a way of limiting query results.
For instance, MySQL allows you to do things like this:

    select foo, bar
        from mytable
        where foo like "%mystring%"
        limit 20,10

Here the "limit 20,10" will make it return only rows 21 through 30.

If you're doing that, you'll probably also want to do something
like this

    select count(*)
        from mytable
        where foo like "%mystring%"

so that you actually know how many results there are.
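A runnable sketch of both queries through DBI. I've used an in-memory SQLite database here so the example is self-contained (table name and data are made up); the "LIMIT offset, count" form works the same way in MySQL:

```perl
# Hypothetical sketch of both queries via DBI. In-memory SQLite stands
# in for MySQL so the example is self-contained; the "LIMIT offset, count"
# syntax is accepted by both.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE mytable (foo TEXT, bar INTEGER)');
my $ins = $dbh->prepare('INSERT INTO mytable VALUES (?, ?)');
$ins->execute("mystring item $_", $_) for 1 .. 35;

# Rows 21 through 30 of the matches (skip 20, return 10).
my $rows = $dbh->selectall_arrayref(
    'SELECT foo, bar FROM mytable WHERE foo LIKE ?
     ORDER BY bar LIMIT 20, 10',
    undef, '%mystring%');

# Total number of matches, so we know how many page links to draw.
my ($total) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM mytable WHERE foo LIKE ?',
    undef, '%mystring%');
```

Note the ORDER BY: without one, "rows 21 through 30" is not well defined, since the database is free to return rows in any order.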

You could get even fancier with the database-oriented approach by
creating a table in which to store temporary search results; you could
then have a bot that goes through and cleans old entries out of this
table every few minutes.
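A sketch of that temporary-results table plus the expiry pass the cron-driven bot would run. The table layout is made up for illustration, and in-memory SQLite keeps the example self-contained:

```perl
# Hypothetical temporary-results table with a cleanup pass (the kind of
# thing the "bot" would run from cron every few minutes). Table layout
# is invented; in-memory SQLite makes the sketch self-contained.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE search_cache (
              query   TEXT,
              results TEXT,
              created INTEGER)');

# A search stores its full result set once...
$dbh->do('INSERT INTO search_cache VALUES (?, ?, ?)',
         undef, 'mystring', 'hit1|hit2|hit3', time());
# ...and an entry from ten minutes ago is waiting to be expired.
$dbh->do('INSERT INTO search_cache VALUES (?, ?, ?)',
         undef, 'stale', 'old|data', time() - 600);

# The cleanup bot: delete anything older than five minutes.
my $deleted = $dbh->do('DELETE FROM search_cache WHERE created < ?',
                       undef, time() - 300);
```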

The other approach is to do the whole query all at once and store the
extra results somewhere.  Storing them in a temporary file is not a
half bad idea, although you end up having to parse the temporary file
every time.  In some cases, that won't actually speed anything up.  You
could write a daemon to store temporary search results (and maybe even
to do the search for you) and allow your CGI (etc.) to retrieve them
through some sort of inter-process communication.  Again, only in some
cases would that be a good idea.
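One way to sketch the temporary-file variant in Perl is with Storable, which spares you writing your own parser for the file (the file naming and hit data here are made up):

```perl
# Sketch of "do the whole query once, store the extra results in a
# temporary file", using Storable so no hand-written parser is needed.
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# First request: run the full search once and save everything.
my @all_hits = map { "hit $_" } 1 .. 100;
my ($fh, $file) = tempfile('searchXXXX', SUFFIX => '.res', TMPDIR => 1);
close $fh;
store(\@all_hits, $file);

# A later request (the "next 10" link passes the file name and offset):
my $saved = retrieve($file);
my @page  = @{$saved}[30 .. 39];    # results 31 through 40
unlink $file;                       # or leave it for a cron job to expire
```

As noted above, deserializing the whole file on every request is exactly the overhead that can make this no faster than just searching again.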

  - Logan
--
"In order to be prepared to hope in what does not deceive,
 we must first lose hope in everything that deceives."

                                          Georges Bernanos



Sat, 15 May 2004 04:48:24 GMT  
 make subset of the search result

[snip]

Quote:
> The other approach is to do the whole query all at once and store the
> extra results somewhere.  Storing them in a temporary file is not a
> half bad idea, although you end up having to parse the temporary file
> every time.  In some cases, that won't actually speed anything up.  You
> could write a daemon to store temporary search results (and maybe even
> to do the search for you) and allow your CGI (etc.) to retrieve them
> through some sort of inter-process communication.  Again, only in some
> cases would that be a good idea.

>   - Logan

Wow, a lot of helpful information... thanks, Logan.
Hmm, the above approach sounds like it needs a lot of disk space, or maybe
I can delete the temp files with a cron job. But I am interested in it.
Apart from this, I am thinking of using a search indexer. Here is a sketch;
please feel free to give me advice.
1. Index and save all the text of the files; each term in the index records
the names of the files that contain it.
2. During a search, get the list of files matching the user's keyword from
the index file, then search those files and return the results. For the
details I will follow Logan's idea.
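A minimal sketch of steps 1 and 2 — the file contents are inlined as a hash for illustration; a real indexer would read the files from disk and save %index somewhere persistent:

```perl
# Minimal inverted-index sketch: map each word to the set of files
# containing it, then answer a keyword query with a hash lookup
# instead of scanning every file.
use strict;
use warnings;

# Step 1: build the index (file contents inlined here for illustration).
my %files = (
    'a.txt' => 'the quick brown fox',
    'b.txt' => 'the lazy dog',
    'c.txt' => 'quick thinking',
);
my %index;
while (my ($name, $text) = each %files) {
    $index{lc $_}{$name} = 1 for split ' ', $text;
}

# Step 2: a keyword search is now just a hash lookup.
sub files_matching {
    my ($word) = @_;
    return sort keys %{ $index{lc $word} || {} };
}

my @hits = files_matching('quick');    # ('a.txt', 'c.txt')
```

Once the matching files are known, only those files need to be opened to build the actual result listing.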


Sat, 15 May 2004 06:35:12 GMT  
 make subset of the search result

Quote:


>> The other approach is to do the whole query all at once and store the
>> extra results somewhere.  Storing them in a temporary file is not a
>> half bad idea, although you end up having to parse the temporary file
>> every time.  In some cases, that won't actually speed anything up.
>>only in some cases would that be a good idea.
>Hmm, the above approach sounds like it needs a lot of disk space, or maybe
>I can delete the temp files with a cron job. But I am interested in it.

You could store only the 20 or so most recently used results, using an
LRU algorithm (as is common in virtual memory / disk cache managers). That
would most definitely keep the space requirements under control.
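A small sketch of that LRU idea in plain Perl: a hash holds the cached results, and an array tracks recency order. A linear scan of the order array is fine at ~20 entries, though a real cache module would scale better:

```perl
# LRU cache sketch: keep only the N most recently used search results,
# evicting the least recently used entry when the cache is full.
use strict;
use warnings;

my $MAX_CACHED = 20;
my %cache;    # query string => array ref of results
my @order;    # query strings, least recently used first

sub cache_get {
    my ($query) = @_;
    return unless exists $cache{$query};
    @order = ((grep { $_ ne $query } @order), $query);   # mark as recent
    return $cache{$query};
}

sub cache_put {
    my ($query, $results) = @_;
    if (!exists $cache{$query} && @order >= $MAX_CACHED) {
        delete $cache{ shift @order };                   # evict LRU entry
    }
    @order = ((grep { $_ ne $query } @order), $query);
    $cache{$query} = $results;
}
```

The time-out idea below could be layered on top by also storing a timestamp with each entry and discarding stale ones in cache_get.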

You could also expire the search and discard the search result if people
didn't ask for it again within, say, 5 minutes. (time-out)

In addition, I'd like to add that people usually aren't interested in
browsing through more than, say, 5 pages, before giving up and refining
the search. So if you show 10 results per page, you needn't store more
than the first 50 or so. There most definitely isn't any need at all to
store thousands.

As Logan said: starting to search again from scratch might well be
faster than refining the search based upon stored results. The chance of
introducing bugs is smaller that way, too.

--
        Bart.



Sat, 15 May 2004 13:07:53 GMT  
 make subset of the search result

Quote:
> You could store only the 20 or so most recently used results, using an
> LRU algorithm (as is common in virtual memory / disk cache managers). That
> would most definitely keep the space requirements under control.

I am sure searching will be faster this way...

Quote:
> You could also expire the search and discard the search result if people
> didn't ask for it again within, say, 5 minutes. (time-out)

I think an LFU algorithm could be used to do something like that.

Quote:
> As Logan said: starting to search again from scratch might well be
> faster than refining the search based upon stored results.

Why? In general, searching from an old temp file is faster than searching
from scratch, isn't it?


Sat, 15 May 2004 20:58:16 GMT  
 make subset of the search result

Quote:

> I've seen some web search engines (like Google) that list search
> results as a subset of the data (say, the first 30 items) and then
> have a button or link to get more of the search hits. But I don't
> know how to accomplish this in Perl. Can someone give me an idea
> or two? I know there is always another (better) way to
> do it. :-)

So, you're running the search in your CGI script, storing the results
somewhere (an array?), and displaying some of them.

Pass along an additional parameter in that button at the bottom that
tells which 'n' items to print from the array.  (No page # == print
1-30, page #2 == print 31-60, etc.)
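The arithmetic for that parameter might look like this, with the page numbering and slice size described above:

```perl
# Page-number arithmetic sketch: no page parameter (or page 1) prints
# items 1-30, page 2 prints items 31-60, and so on.
use strict;
use warnings;

my $PER_PAGE = 30;

sub page_slice {
    my ($results, $page) = @_;   # array ref of hits, 1-based page number
    $page ||= 1;                 # missing page parameter means page 1
    my $first = ($page - 1) * $PER_PAGE;
    return () if $first > $#$results;
    my $last = $first + $PER_PAGE - 1;
    $last = $#$results if $last > $#$results;
    return @{$results}[$first .. $last];
}

my @hits  = map { "item $_" } 1 .. 75;
my @page2 = page_slice(\@hits, 2);   # items 31 through 60
```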

Unless I'm missing something...

Arne

(I suppose re-running the search is not optimal, but there's always
that processing/memory tradeoff...)
--
Arne Jamtgaard
Boulder DevTest
1-720-562-6331



Sun, 16 May 2004 01:27:37 GMT  
 make subset of the search result

Quote:

>> As Logan said: starting to search again from scratch might well be
>> faster than refining the search based upon stored results.
>Why? In general, searching from an old temp file is faster than searching
>from scratch, isn't it?

Not necessarily, because in saving the temp data and loading it back in
afterwards, you've got a lot of extra overhead. Databases are likely
optimised for fast searches; there's no writing to files, and not all of
the results even need to be returned if only a small part of them would
be used.

But if you save a large list of results, *all* of it must be retrieved,
formatted and saved to a text file. Afterwards, on the next run, you
must read it back and parse it before you can even do your search
refinement. A lot of extra work, with doubtful return.

--
        Bart.



Mon, 17 May 2004 05:16:43 GMT  
 