Loop with a binary search for same values

Question

mixelplik 0 Newbie Poster

10 Years Ago

How can I loop through a data field with a binary search, looking for multiple values? Please, I don't want to know another, more efficient way, I want to do it this way to understand the concepts. Also my data is indeed sorted before the call is made. The data are all of type double and I'm asking for an int (97 here) It works, kinda. I keep getting numbers that fall outside the range 93, 92, 90 for example. Also, my target values lie smack in the middle of the data, so the midpoint, and 4 or 5 values above and below it are also 97. Any help would be appreciated.

// Loop calling the search
int j = bSearch(ave, count, 97.0);
int i = bSearch(ave, count, 97.0);
int k;
while(k != -1 )
{
    k = bSearch(ave, j, 97);
    cout << j << " :"<< id[j] << " " << ave[j] << endl;
    cout << i << ":"<< id[i] << " " << ave[i] << endl;
    j--;
    i++;
}

// search (very standard)
int bSearch(double array[], int size, double value)
{
    int first = 0;
    int last = size - 1;
    int middle = (first + last)/2;
    bool found = false;
    int pos = -1;

    while(!found && first <= last)
    {
        middle = (first + last)/2;

        if(array[middle] == value)
        {
            pos = middle;
            found = true;
        }
        else if(array[middle] > value)
            first = middle + 1;
        else
            last = middle - 1;

    }
    return pos;
}

Also, when I actually search for the decimal is drops a steamer on me, somehow only returning a couple values out of bounds. ACTUALLY this isn't working at all. When I ask for the exact decimal value it says not found, and when I ask for a value that appears only once ex 93 or 93.1754 (the exact value. But when I search for a value that there's like a dozen of it finds them all, and then throws a couple extras for fun. I am about to chuck my computer through a wall.

c++

2 Contributors
4 Replies
1K Views
3 Hours Discussion Span
Latest Post 10 Years Ago Latest Post by mixelplik

All 4 Replies

mike_2000_17 2,669 21st Century Viking

10 Years Ago

First of all, from your algorithm, it seems that the array is sorted from greatest value to lowest value, right? Otherwise your if(array[middle] > value) has the wrong < / > sign.

What seems bizarre with your binary-search algorithm is that it doesn't seem to allow for or detect the possibility that the number does not exist anywhere in the range. First, a good binary search algorithm should detect that (and return an error). Second, this is usually indicative of errors in the logic of algorithm.

You have to understand that the iterations of a binary search algorithm always assumes that (1) the sought value is within the bounds (first,last), and possibly, (2) that the bounds have been checked. And generally, if you inforce those assumptions, you will naturally detect error situations, and avoid other special problems.

For example, when you change your bounds, with first = middle + 1; and last = middle - 1;, how do you know that you are not actually increasing the interval, or that you are moving outside the array? If the sought value does not exist in the array, that can happen with this method.

Here is an alternative implementation:

// search (very standard)
int bSearch(double array[], int size, double value)
{
  int first = 0;
  int last = size - 1;

  // first, check the interval:
  if( ( array[first] < value ) || 
      ( array[last]  > value ) )
    return -1;  // "not found"

  // then, check the bounds:
  if( array[first] == value )
    return first;
  if( array[last]  == value )
    return last;

  // loop while there is at least 
  //  one unchecked element in the interval:
  while(first + 1 < last)
  {
    int middle = (first + last) / 2;

    if(array[middle] == value)
      return middle;
    else if(array[middle] > value)
      first = middle;
    else
      last  = middle;

  }
  return -1;  // "not found"
}

The point here is that by setting first = middle; and last = middle;, I guarantee that the interval always shrinks. And because middle is checked (for equality with the value), setting one of the boundary to it does not break the assumption of "checked boundaries". And, of course, the loop only needs to continue as long as there is still at least one unchecked element squeezed between the two checked

Finally, I don't know how you get the numbers that populate your array, but you should always be careful about comparing floating-point values with an equality (==), it is very rare, in general, that floating point values are exactly the same, i.e., they could differ by a tiny amount (like 1e-15). Typically, you should to a "within a tolerance" test for equality:

if( std::fabs(array[middle] - value) < 1e-6 )

Or better, use a tolerance variable instead of 1e-6.

Missing the nearly-equal case is often the source for crazy behaviors.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

mixelplik 0 Newbie Poster · Answer 1 · 2013-10-20T23:27:16+00:00

Wow, there is just so much helpful stuff in here. Thank you.

Also, my main question, how do you check for and report back dulplicate values in your array. Say you had an array of test scores and a bunch were 97, or 86, and you (well) I have a 2 parallel arrays that hold an ID and a corresponding score. Even using int(s) across the board I keep getting weird stuff.

I don't need code or anything just an idea of how to approach it. Though I will most definatley be implimenting your suggestions, they're great and exactly how I want to be coding. I found the testing for tolerance especially insightful.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 2 · 2013-10-20T23:44:45+00:00

The general approach to find the "range" of elements of the same value is to just look for the lower-bound and upper-bound elements. Like the standard functions std::lower_bound and std::upper_bound. The lower-bound is the first element that is not less-than the value. The upper-bound is the first element for which the value is less-than that element. In other words, if the array does not contain the value you are looking for, then those two elements will be the same, i.e., have a distance of 0 between them. And if the value is in the array, then you will know how many there are.

Modifying the binary-search algorithm to get either one of these bounds is very straight-forward, as shown in the reference pages. Also notice that the "lower-bound" element is also the first equal element, if there is one, meaning that it can be used as the primary algorithm too.

Generally, this is the way you would do it. However, another simple solution is to find an element equal to the value you are looking for, and then, see how many adjacent elements have the same value.

mixelplik 0 Newbie Poster · Answer 3 · 2013-10-21T00:04:03+00:00

mixelplik 0 Newbie Poster

10 Years Ago

Fantastic! Thank you, just what I needed.

Loop with a binary search for same values

Recommended Answers Collapse Answers

All 4 Replies

Recommended Answers