Hi there,

I am stuck with a problem, and don't know where to start. I need to calculate the closest pair of points on the Earth using a divide and conquer method. I know I should calculate the orthodromic distance (or in other words the great-circle distance) but I don't know what to do. This calculation has to last at most nlog(n). I found this:
"The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:[1]
Sort points according to their x-coordinates.
Split the set of points into two equal-sized subsets by a vertical line x=xmid.
Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dLmin and dRmin, respectively.
Find the minimal distance dLRmin among the set of pairs of points in which one point lies on the left of the dividing vertical and the other point lies to the right.
The final answer is the minimum among dLmin, dRmin, and dLRmin."
But this algorithm is for (x,y) points. I have to calculate (x,y,z) points on the Earth. I have to cin>> longitude and cin>>latitude. Then do some calculations and return two points with a minimal distance. What can I do? Do you have any suggestions or ideas?
PS. I'm not a native English speaker, so please don't correct my mistakes. I just want to communicate with you. I don't care, if there are mistakes.

Your description is a little confusing. I see you mentioning (x,y,z) calculations, but you say you are inputling the latitude and longitude, which I assume is x and y. What is z? Altitude? You say that the points are "on the Earth", which would imply there is no altitude, so again what is z? You make it sound like you have it solved for (x,y), but adding the new dimension is where you are stuck.

Dealing with differences using latitude and longitude is more complicated than dealing with a nice flat grid as far as the math goes. Not only are the longitudinal lines not parallel, but they wrap around, so consider that when you split everything into two pieces and decide what is "less than" something else. You are also dealing with arcs rather than straight lines.

Pencil and paper first, figure out the geometry part. Then write a function that takes two points and returns the distance between them, then figure out how to "sort" your values (again, things wrap around, so the distance between -179 and +179 longitudinally is 2 degrees, not 358 degrees, so keep that in mind when "sorting" and calculating).

You say you're looking for a starting point. If it was me, I'd write the whole C++ program, get it perfect using a BRUTE FORCE (n squared) method of calculating the distance between every single pair and taking the minimum first, and then and only then change that Brute-Force method into a more efficient Divide-And-Conquer (nlogn) method, which is your actual question, right? Makes it easier to convert and to help once we see the brute force code.

And once again, what is z?

Oh, excuse me. I wasn't sure what should I do. You're right. I need to calculate (x,y) points and do it on the paper. Then write a c++ code for brute force (which won't be accepted by my university) and after that write a c++ code for divide and conquer algorithm.
Thank you very much.

Before we start, this isn't particular hard problem to solve once you get a few things out of the way.

First, how to calculate distance. You need to know the standard method which is https://en.wikipedia.org/wiki/Haversine_formula
This forumula is on the web in almost every language so I'll not write more about this.

Second, how to iterate throw the set of points. Since we want to find the two points closest to each other, that's pretty easy as iterations go. Ready?
TWO LOOPS. In psuedo code:

read in the dataset to our position array.lat, array.long.
init min_distance to a very large float number.
declare x, y to int type.
declare distance to float.

for x=0, x<sizeof(array),x++)
 for y=0, y<sizeof(array),y++)
   if x#y 
     distance = haversincedistance(array(x), array(y))
     if distance < min_distance 
             savex = x;
             savey = y;
             min_distance = distance;
      } end of loop x
   } end of loop y
cout the answers.

Unsure why we would divide and conquer this one.

I need to make divide and conquer algorithm because my university professor won't accept the brute force answers. It is really about knowing the divide and conquer algorithm and using it correctly.
Thank you, rproffitt.

Who said my method is brute force? It's from a very real world hunk of code we use on GPS trackers.

You may of heard of us. "Flowers By Irene." (not really but I'm not lying that we have this code in a real world app.)

If nothing else I have given you the lead on how to get the distance.

Who said my method is brute force?

It's brute force in that it makes no attempt at taking advantage of some facts of geometry and calculates the distance of every single pair of points.

If you partition the globe into sections, let's arbitrarily say 16 sections for a certain iteration, labeled A through P, you could compare all the points WITHIN section A and WITHIN section B, C, D, etcetera, through P, and get the minimum distance so far. Let's say that's 100 kilometers. We're not quite done yet because sections A and B could touch each other and you could have a point in A and a point in B, the distance between them being less than 100 km. However if you design your partitions correctly, you'll be able to take some partition pairs, let's say A and F, and with some geometry, prove that any point in A is at least X distance from any point in F. If X is greater than or equal to 100 km, don't bother comparing any point in A to any point in F. Let's say A and F each contain 1000 points. That's roughly 1000 times 1000 = 1 million distance calculations that you do not have to make since you know none will be smaller than 100 km.

I imagine the Divide-And-Conquer algorithm will involve growing or shrinking partitions, chosen well to take advantage of geometrical theorems regarding distance in order to avoid many pairwise comparisons.

We did have a PHD try to optimize this routine and in every case it took longer than running through the set.

Here's the thing. Since our company actually does this IRL (in real life) I'll share the solution.

We did have a PHD try to optimize this routine and in every case it took longer than running through the set.

That doesn't surprise me. Very often people get over-infatuated with achieving a lower Big-O value and forget that that doesn't always correspond to increased efficiency IRL. Sometimes brute force is a good thing.

PS. Normally the divide and conquer noted at http://www.geeksforgeeks.org/closest-pair-of-points/ would have worked but in our GPS tracker the points were in Lat/Long rather than on a square grid.

Since this is an academic exercise, I think you need to ignore me here as we went through this a few too many times with the PHD never delivering better than the old tried and true system. My bet is today's CPU with FPU builtin made the costs of computing the distance cheap.

I wrote something like this:

void function(){
    if(number_of_points==2) {
        do calculations and remember the answer (in vector)
    else if(number_of_points==3) {
        do the calculations with all three points
        remember the minimum answer of the distance(in vector)
    else {
        divide the array of objects into 2 subarrays with (n/2) elements
        calling the function() on the first subarray
        calling the function() on the second subarray

// in main():
Then find a minimum in vector and return it as the answer.
It works correctly for tiny tests. The answer of some of big tests is incorrect. Did I do something wrong?

I reiterate my earlier posts. You should get everything working perfectly using the brute force method of comparing every point pair and finding the minimum. Then and only then, when it works for the tiny tests, the big tests, and everything you can throw at it to break it, should you attempt the divide-and-conquer part of the problem.

Why do I suggest this approach? I believe that this is dividing and conquering your PROGRAM. Consider some possibilities that could cause your program to fail...

  1. An error in accepting input.
  2. An error in parameter passing or harnessing the divide-and-conquer function's results.
  3. A storage error or an error accessing that storage (ie the vector).
  4. An error displaying the result.
  5. An error in the divide-and-conquer function.
  6. An error in the function computing the distance between points.
  7. Bad test data.
  8. Add any other possibilities to the list.

This thread states that you are most concerned with #5. It is, however, possible that you do #5 perfectly and still get an error due to a bug somewhere other than #5. The natural inclination would be to assume that the problem was in the divide-and-conquer part, which is the part you are least confident in and "debug" that, even if there is no bug there.

It is essential when debugging in an area you have little confidence in (ie the divide-and-conquer part) to narrow down the other possibilities as much as possible (ie get the parts of the program you KNOW how to do perfect, and confirm that they are perfect) so that you don't have to chase down non-existent bugs. If you get the brute force program perfect, that allows you to focus with confidence on #5, knowing that is where the problem is.

Getting the brute force program perffect also removes the need for pseudocode in your posts here. In your pseudocode , I see a function called "function" (perhaps rename this function to something descriptive?) that takes no parameters and returns nothing. I'm thus focusing in on point 2 on the list above. There's also too much meat missing from your pseudocode and how you are calling it to make an intelligent guess as to what the problem might be. In addition, you mention that some tests work, some tests fail, but give us no details on what data works, what data fails, the actual results, and the correct results, making it impossible to replicate. If you post the test data, the entire brute force program that works, and the divide and conquer function that doesn't work, it's easy for us to quickly replicate and zone in on the problem. We cannot do that with what you posted.

commented: I'm reminded that "Clever is the enemy of clear." +12