Why wordpos is better then binary search?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Why wordpos is better then binary search?

Oleg A. Kulikov

Hi NetRexxers!
Producing an HTML scanner & editor program using NetRexx I had to choose
between three opportunities for storing Web-names as:

1. a simple non-ordered list of a names;    => wordpos method as an access
tool;
2. an ordered list of a names;                      => BinSearch method as
an access tool;
3. an ordered indexed list of a names;       => ABinSearch method as an
access tool;

Very simple and short variants of a BinSearch are given below.
I have tested three methods mentioned above using a list of 129
Web-names which were stored in my MS IE 4.0 cache. The number of
requests varied from 100 to 4000 and consisted of an equal parts of
a words from the Web-names list and from the words absent in the Web-names
list. The results are: WORDPOS is always the best, BinSearch for an
indexed
strings is always the second and BinSearch for nonindexed strings - the
last.
All three methods asymptotically tends to give the same timing.

It would be natural to assume according to these results that a Rexx
string
class is an ordered collection of the words and wordpos is an efficiently
realized binary search. On the other hand absence of a SORT method in
the Rexx string class contradicts to this assumption.

So would anybody say where an inefficiency of my ABinSearch comparing
wordpos lies? ( As for BinSearch - subword is time-wasting for sure ).
( I suppose also that s[i] operation is TIMES FASTER then wordpos ).

-- Duplicates are absent in the argument lists
-- w  - input word which presense in the list is checked.
-- s  - input ordered list of words
-- wp - current word of the list
-- nl - number of the words on the left including wp
-- nr - number of the words on the right of wp
-- non-strict compare is used to encrease speed.

class test binary public

    method BinSearch( w = Rexx, s = Rexx ) public static returns boolean
       If ( s = "" ) | ( w = "" ) Then return boolean 0

        n  = int s.words()      
       nl = int ( n + 1 ) % 2  
       nr = int n - nl        
       wp = s.word( nl )

       Loop forever
           if ( w = wp )       Then return boolean 1
           if ( w < wp )  Then
           Do
                if ( nl <= 1 ) Then return boolean 0
                s  = s.subword( 1, nl - 1 )
                n  = nl - 1
                nl = nl % 2
           End
           Else
           Do
                if ( nr = n ) -
                |  ( nr = 0 )  Then return boolean 0
                s  = s.subword( nl + 1 )
                n  = nr
                nl = ( nr + 1 ) % 2
           End
           nr = n - nl
           wp = s.word( nl )
       End


method ABinSearch( w = Rexx, s = Rexx ) public static returns boolean
       If ( w = "" ) Then return boolean 0

       n  = s[0]              
       nl = ( n + 1 ) % 2      
       nr = n - nl            
       wp = s[nl]

       Loop forever
           if ( w = wp )       Then return boolean 1
           if ( w < wp )  Then
           Do
                if ( nl <= 1 ) Then return boolean 0
                n  = nl - 1
                nl = ( n + 1 ) % 2
           End
           Else
           Do
                if ( nr = n ) -
                |  ( nr = 0 )  Then return boolean 0
                nl = nl + ( nr + 1 ) % 2
           End
           nr = n  - nl
           wp = s[nl]
       End

Sorry for the poor English and too long novice question,
Oleg Kulikov,
PIE SI,
[hidden email],
[hidden email]
.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To unsubscribe from this mailing list ( ibm-netrexx ), please send a note to
[hidden email]
with the following message in the body of the note
unsubscribe ibm-netrexx <e-mail address>

Reply | Threaded
Open this post in threaded view
|

RE: Why wordpos is better then binary search?

Patrick McPhee-3
> Oleg A. Kulikov[SMTP:[hidden email]] wrote
>
[...]

        Producing an HTML scanner & editor program using NetRexx I had
to choose
        between three opportunities for storing Web-names as:

        1. a simple non-ordered list of a names;    => wordpos method as
an access
        tool;
        2. an ordered list of a names;                      => BinSearch
method as
        an access tool;
        3. an ordered indexed list of a names;       => ABinSearch
method as an
        access tool;

[...]

        The results are: WORDPOS is always the best, BinSearch for an
indexed strings is always the second and BinSearch for nonindexed
strings - the last.

[...]

        It would be natural to assume according to these results that a
Rexx
        string class is an ordered collection of the words and wordpos
is an efficiently
        realized binary search. On the other hand absence of a SORT
method in
        the Rexx string class contradicts to this assumption.

It never pays to assume to much. word() will be relatively fast for
positions close to the beginning of the string, but it will be slow for
words near the end of the string, since it has to do a linear search
through the string for word delimiters. The trouble with ABinSearch is
that it uses the Rexx associative array mechanism the way you would use
a normal array, and it does arithmetic with variables of type Rexx,
which is relatively slow. If you put
 options binary
at the top of your .nrx file, change the line that assigns s[0] to n:
  n = int s[0]
and make the s argument of ABinSearch to be of type rexx[], then
ABinSearch will be faster than using word().

However, NetRexx does provide a better way of doing this. You can use
the associative array mechanism of class Rexx like an associative array:
  values = 0
  values["potato"] = 1
  values["tomato"] = 1

   if values["potato"] = 1 | values["tomato"] = 1 then
letscallthewholethingoff()

This should be the fastest way to get the result you're after.

--
Patrick TJ McPhee
DataMirror Corporation
[hidden email]


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To unsubscribe from this mailing list ( ibm-netrexx ), please send a note to
[hidden email]
with the following message in the body of the note
unsubscribe ibm-netrexx <e-mail address>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Why wordpos is better then binary search?

Oleg A. Kulikov
In reply to this post by Oleg A. Kulikov

 ----
From: Patrick McPhee <[hidden email]>
To: [hidden email]; 'Oleg A. Kulikov' <[hidden email]>
Date: 16 сентября 1997 г. 19:05
Subject: RE: Why wordpos is better then binary search?

>> Oleg A. Kulikov[SMTP:[hidden email]] wrote
>>
>
>It never pays to assume to much. word() will be relatively fast for
>positions close to the beginning of the string, but it will be slow for

But nevertheless it can't be slower then wordpos for the words absent in
the list!

>words near the end of the string, since it has to do a linear search
>through the string for word delimiters. The trouble with ABinSearch is
>that it uses the Rexx associative array mechanism the way you would use
>a normal array, and it does arithmetic with variables of type Rexx,

I've tested this program with int variables in ABinSearch: it results in a
WORSE timing due to
conversions when accessing  s[i].

>which is relatively slow. If you put
> options binary

I've placed "binary" into class definition and I guess it is the same as
"options binary".

>at the top of your .nrx file, change the line that assigns s[0] to n:
>  n = int s[0]
>and make the s argument of ABinSearch to be of type rexx[], then

It is impossible to use "rexx[]" because of  name list is growing in the
caller method.

>ABinSearch will be faster than using word().
>
>However, NetRexx does provide a better way of doing this. You can use
>the associative array mechanism of class Rexx like an associative array:
>  values = 0
>  values["potato"] = 1
>  values["tomato"] = 1
>
>   if values["potato"] = 1 | values["tomato"] = 1 then

The list I use in ABinSearch contains an indexes which I use further in the
same manner. But it is
said in NetRexx document that "loop  i over" provide an unpredictable order
of an index values and I
need full control.

>letscallthewholethingoff()
>
>This should be the fastest way to get the result you're after.
>
>--
>Patrick TJ McPhee
>DataMirror Corporation
>[hidden email]
>

Thanks for remarks ,
O'Kulikov.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To unsubscribe from this mailing list ( ibm-netrexx ), please send a note to
[hidden email]
with the following message in the body of the note
unsubscribe ibm-netrexx <e-mail address>

Reply | Threaded
Open this post in threaded view
|

Re: Why wordpos is better then binary search?

Oleg A. Kulikov
In reply to this post by Oleg A. Kulikov
>>Patrick McPhee  <[hidden email]>  wrote:

[...]