Fastest way to search the table ( or array )

Question

prayami 0 Newbie Poster

17 Years Ago

Hi,

I got one, two dimensional array.
MyArray[1000][2].

Both the array i.e.
MyArray[0][0]........
MyArray[0][1].......

are filled with some values.
Now I got he given values for both: e.g. value_0 = 4.056 and value_1 = -201.375.

I have to search the arrayMyArray[0]...MyArray[999] and find the closest value row for
the given value. I have to compare the value in row not separate.
Means:
When MyArray[0][0] will be compared with value_0 = 4.056 then MyArray[0][1] will be
compared with value_1 = -201.375. When MyArray[1][0] will be compared with value_0 = 4.056 at the same time MyArray[1][1] will be compared with value_1 = -201.375.

Can anybody suggest me the fatest way to find the closest match.

Thanks,

c++

5 Contributors
11 Replies
176 Views
4 Days Discussion Span
Latest Post 17 Years Ago Latest Post by prayami

Ancient Dragon 5,243 Achieved Level 70

17 Years Ago

Not sure right now if this will work, but create a third array with 1000 rows that will contain the difference between the value in the array and the search value. Example:

array[0][0] = 2.00   array[0][1] = -200.00  
delta = (4.056 - 2.00) + abs(-200.375) -  abs(-200.0)) = 2.431

array[1][0] = 3.50   array[1][1] = -300.50  
delta = (4.056 - 3.50) + abs(-200.375) - abs(-300.5) = -99.569

In the above, array[0] wins because its delta is closer to 0 then array[1]'s delta.

Ancient Dragon 5,243 Achieved Level 70

17 Years Ago

>>Why create an array?
You're right -- an array isn't necessary. Just keep track of the row that contains the value nearest 0 (whether positibe or negative)

if( abs(delta) <  closest_match)
{
    closest_match = abs(delta)
    closest_row = counter; // row number of array that contains the value
}

Ancient Dragon 5,243 Achieved Level 70

17 Years Ago

I don't think there is a "best solution". Binary search won't work because you have to compare both array dimensions to determine which is the closest. You can't do that with a binary search. The only way to do it is with a linear search and compare every pair of values. Doing anything else will just consume even more time than a linear search.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

hammerhead 19 Posting Whiz in Training · Answer 1 · 2008-03-13T04:10:26+00:00

Why create an array? A simple if can do the job unless I am missing something

if (fabs(value-value_to_be_compared)<.001)
{
// match found
}

prayami 0 Newbie Poster · Answer 2 · 2008-03-13T06:53:40+00:00

Thanks for reply,

I understand the delta etc. But I am actually looking
for best approach to search the match efficiently.
My values are not in any order like ascending or descending.
e.g.
1. The simplest method is compare the all 1000
results(elements).

2. Look up table or Binary Search or Hash Table or Tree etc.

Which will be the best approach and how can I
implement it.

Thanks,

hammerhead 19 Posting Whiz in Training · Answer 3 · 2008-03-13T12:22:12+00:00

If you create two binary search trees while insertion itself then the search will become faster. If you plan to search using the matrices then you will have to compare with each element.

dougy83 74 Posting Whiz in Training · Answer 4 · 2008-03-13T15:00:03+00:00

You will likely find searching the array will be faster if you declare it as a MyArray[2][1000], this way the compiler/cache/whatever is more likely to hit.

As ancient dragon said, you'll have to search the whole array and compare all values. _Unless_ you can make a look up device that has some order to it...

You haven't stated what 'closest' row means.. is it where the sum of the error is smallest, or where the euclidean distance i.e. sqrt(column1error^2 + column2error^2) in smallest. I would expect you would be after the latter.

Heres my example:
You could make an additional array containing the lengths i.e. (column1^2 + column2^2) of each row, as well as the row that the calculated length relates to in the original MyArray[][]. This list must be ordered.

By ordering the list, you can now make assumptions about other elements in the list - allowing you to speed up the search.

So, to do a search:
1. calculate the length of the value we're searching for
2. do a speedy search in the order list of lengths for the closest length
3. use the row number that is stored with the length element in the list to find the original row data. This should be the best match... i'm thinking.

dougy83 74 Posting Whiz in Training · Answer 5 · 2008-03-14T06:44:18+00:00

oops. I may have jumped the gun in my last post. The number cannot be represented by length alone. In the ordered array, you'll have to store {rowIndex, length, angle}. Order by length (or angle), then by angle (or length).

vmanes 1,165 Posting Virtuoso · Answer 6 · 2008-03-14T09:54:04+00:00

You will likely find searching the array will be faster if you declare it as a MyArray[2][1000], this way the compiler/cache/whatever is more likely to hit.
<snip>

You've just made the cache problem worse. Now the two paired items are 1000 elements apart in memory. Are you an old FORTRAN hand?

dougy83 74 Posting Whiz in Training · Answer 7 · 2008-03-14T12:13:38+00:00

Yeah I'm an idiot. The cache should be happier before. I did say compiler/cache/whatever would be better - maybe the compiler/whatever will be better off. I think I was thinking (and I use that word weakly) that the compiler could more easily increment two pointers by 1 rather than 2 (I'm not really sure if this was what I was thinking).

Anyway, as a test, lets see which one works faster...

#include <windows.h>
#include <iostream>

#define ARRAYORDER // comment this out for second test

#ifdef ARRAYORDER
   #define arr(a,b) array[a][b]
   double array[1000][2];
#else
   #define arr(a,b) array[b][a]
   double array[2][1000];
#endif

#define sqr(x) (x)*(x)
main(){
   LARGE_INTEGER f, t1, t2;
   
   double v1 = 5.7, v2 = 6.5, minerr = 11111;
   int minerri = 0;

   for(int heapsa = 0; heapsa < 5; heapsa++){
      std::cout << "Searching...";
      QueryPerformanceCounter(&t1);
      for(int loop = 0; loop < 1000000; loop++){
         // load some values
         for(int i = 0; i < 1000; i++){
            arr(i,0) = i *.15;
            arr(i,1) = i /11;
            }
            
         v1 = 5.7;
         v2 = 6.5;
         minerr = 11111;
         minerri = 0;
         
         // find the closest
         for(int i = 0; i < 1000; i++){
            double err = sqr(arr(i,0)-v1) + sqr(arr(i,1)-v2);
            
            if(err < minerr){
               minerr = err;
               minerri = i;
               }
            }
         }
      QueryPerformanceCounter(&t2);
      QueryPerformanceFrequency(&f);
      std::cout << "done." << std::endl;
      
      std::cout << "closest num = " << arr(minerri,0) << ", " << arr(minerri,1) << "; index = " << minerri << std::endl;
      std::cout << "time = " << ((t2.QuadPart - t1.QuadPart) / (double)f.QuadPart) << " seconds" << std::endl;
      }
         
   system("pause");
   }

output with #define ARRAYORDER line included

Searching...done.
closest num = 6.6, 4; index = 44
time = 20.7871 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 19.6722 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 19.5185 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 19.9915 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 19.6517 seconds
Press any key to continue . . .

output with #define ARRAYORDER commented

Searching...done.
closest num = 6.6, 4; index = 44
time = 18.3829 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 18.327 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 18.7345 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 18.6278 seconds
Searching...done.
closest num = 6.6, 4; index = 44
time = 18.4635 seconds
Press any key to continue . . .

It does seem that the execution time is somewhat different for the two programs... my version runs faster for whatever reason; must be my FORTRAN (I don't know what you meant by that).

prayami 0 Newbie Poster · Answer 8 · 2008-03-17T03:30:55+00:00

Thanks everybody,

We just decided to add one criteria to search, so that we can have more perfect result.
We will have minimum and maximun value for the column.
e.g.

For value_0 = 4.056 the minimum_v_0 = 4.05 and maximum_v_0 = 4.057
And
value_1 = -201.375 the minimum_v_1 = -201.380 and maximum_v_1 = -201.370

Now with each element in the array:
MyArray[0][0]......MyArray[0][1000] will be checked, whether it is falling with minimum_v_0
and maximum_v_0
AND
MyArray[1][0]......MyArray[1][1000] will be checked, whether it is falling with minimum_v_1
and maximum_v_1.

Means,
If both the column elements in a particular row will have values between corresponding
minimum and maximum value for that column then that row will be selected.

Make sure here both the column values must fall within corresponding range.

Please help me to do the fastest method for reaching to the value. We may require to arrange the
existing data in some structured form and develop the search algorithm.

Thanks,