Recommended reading: Algorithms, second edition, by Robert Sedgewick, pages 347-414.
The following notes are based on Sedgewick.
A simple closed path is a path that visits all the vertices of a set (a path), returns to the first vertex visited (closed) and never crosses itself (simple).
One method for finding a simple, closed path is to find the lowest point in the set (call this the origin) and then for every other point, X, in the set, compute the angle between X, the origin, and a point to the right of the origin infinitely far away. We first visit the origin, then the other points in order of the value of this computed angle, and then finally return to the origin.
Instead of actually computing the angle (which requires calculating the arc-tangent which may be slow), we use Sedgewick's theta function, which has the some order property as the angle but is simpler to calculate.
To calculate theta is constant time. The running time for this algorithm is dominated by the time to sort the points, and hence is O(n log n).
The convex hull is the smallest convex polygon that includes all the points of the set. The vertices of the convex hull must be points of the set. One method to find the convex hull is the Graham Scan. The first step is to compute a simple, closed path, as in the above section. We then traverse the path, checking which way we turn (cw or ccw) at each step. If we ever make a clockwise turn, the polygon won't be convex, so we can eliminate the point at the hinge of the angle. Otherwise, that point is added (perhaps temporarily) to the convex hull. Note that when we do find a clockwise turn, we must backtrack and check that the previous point added to the convex hull does not make a clockwise turn to the new point. It it does, we remove it from the convex hull and check the previous point, and so on. Since each point is visited once and may be removed once from the convex hull, the running time for this algorithm is the running time to find the simple, closed path, plus linear time.
An alternative approach to finding the convex hull is package wrapping. We think of this as having a ray that starts at the newly added point of the convex hull which is rotated counter-clockwise until it hits some other point of the set. That point is added to the convex hull and the step is repeated. We start at the lowest point, with the ray going off to the right and we end when we return to the starting point. To implement this, we note that we don't have to check every angle of the ray (as there are an infinite number), but simply to check the angle from each new point to the current point on the convex hull and then going off to the right. The algorithm is coded by storing the current point of the hull and the value of the current angle. We then iterate through all other points of the set that are not known to be on the convex hull, compute the angle to the current point, ignore those that are less than the current angle, and find the minimum of the remainder. The point with the minimum such angle is the next point of the convex hull and the angle to that point becomes the current angle. We can again use the theta function for this calculation. The time of this algorithm is O(M*N) where M is the number of points that are on the convex hull.
The problem is to find the two closest points in a set of points. One solution is to use a divide-and-conquer approach. The main step is to break the points into two equal sets (by their x-values). We recurse and find the closest pair in each set. Say that delta1 is the distance between the closest pair in the left set, delta2 is the distance for the right set, and delta is the minimum of delta1 and delta2. Either one of these pairs of points is the closest for the whole set, or there are points that straddle the border between the sets that form the closest pair. We first note that such points must be within delta of the border, and so we only have to check points along a vertical strip along the border which extends delta-wide to either side. If we consider one point in this strip on the left side of the border, we note that the point on the right side cannot be higher than delta above the point, nor lower than delta below it. Thus it must be within a box that is 2*delta high and delta wide that is on the right-hand side of the border. But how many points can be in this box (and hence candidates for checking against the point on the left)? If we draw the picture, we see that we can put at must 6 points in such a box (the four corners and the middle of the vertical sides). Thus, each point on the border must be compared to at most 6 points on the other side.
If T(n) is the number of comparisons that it takes to compute that closest pair, then we can use the following formulas to find the exact running time.
T(2) = 0 (1) T(n) = 2*T(n/2) + c*n (2)
That is, if there are only two points, no comparisons are necessary. If there are more than two points, then we solve the problem for two subproblems of half the size (so 2*T(n/2)), but we also must divided the points into two equal sets (which takes linear time if we sort all the points by their x-coordinates first), and check the points on the strip against the points on other side (at most 6). Aside from the sort, which is done as a pre-processing step and take O(n log n) time, everything else takes linear time (thus c*n, for some constant c).
Equations of this form, where T for n is defined in terms of T for smaller quantities, are called recurrence relations.
We will use a simple method for solving them using this one as an example. Since we know what T(2) is, we will write T(n) in terms of T(m) for smaller and smaller m. In particular, we will first find expressions for T(n/2), then T(n/4) and T(n/8) and use them to re-write T(n). We generalize that equation and use it to write T(n) in terms of T(2) which is known. In the following, we assume for the moment that n is a power of 2.
By substituting n/2 for n, and then n/4 for n, and n/8 for n in (2), we get:
T(n/2) = 2*T(n/4) + c*(n/2) (3) T(n/4) = 2*T(n/8) + c*(n/4) (4) T(n/8) = 2*T(n/16) * c*(n/8) (5)So, we have
T(n) = 2*T(n/2) + c*n T(n) = 2*[2*T(n/4) + c*(n/2)] + c*n = 4*T[(n/4)] + 2*c*nby substituting the right-hand side of (3) for the left-hand side of (3) in (2).
Using the other equations, (4) and (4), we get:
T(n) = 4*[2*T(n/8) + c*(n/4)] + 2*c*n = 8*T(n/8) + 3*c*n T(n) = 8*[2*T(n/16) * c*(n/8)] + 3*c*n = 16*T(n/16) + 4*c*nWe can generalize these four equations by introducing a new parameter, i, which goes from 1 to 4.
T(n) = 2^i * T(n/2^i) + i*c*n (6)We know what T(2) is, so we can write T(n) in terms of T(2), by finding an i such that n/2^i = 2. Solving this equation yields i=log2(n)-1. Substituting this back for i into equation (6) and simplifying yields,
T(n) = n/2 * T(2) * c*n*(log2(n)-1), or T(n) = c*n*(log2(n)-1)and hence the solution is O(n log n).