approximate nearest neighbors time complexity - algorithm

I'm reading this paper Product quantization for nearest neighbor search.
On the last row of Table II (page 5) it says that "the complexity given in this table for searching the k smallest elements is the average complexity for n >> k and when the elements are arbitrarily ordered", and that complexity is n + k log k log log n.
I guess we can use a linear selection algorithm to get the unsorted k nearest neighbors in O(n), and then sort those k neighbors in O(k log k), so we get O(n + k log k) in total. But where does the log log n term come from?
The paper gives a reference to the TAOCP book for this, but I don't have the book at hand. Could anyone explain it to me?

First, Table II reports the complexity of each of the steps, therefore you have to add all the terms to measure the complexity of ADC.
In the last line of the table there is a single complexity for both SDC and ADC, which is:
n + k log k log log n
This term corresponds to the average algorithmic cost of the selection algorithm that we employ to find the k smallest values in a set of n variables, which we copied from Donald Knuth's book [25].
I don't have the book at hand so I cannot check, but it sounds right.
From the authors
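
For intuition, here is a minimal Python sketch of the O(n + k log k) part of that bound: select the k smallest distances in (average) linear time, then sort only those k. This sketch is mine, not the paper's authors'; the function name and the use of numpy.partition (introselect, linear on average) are my own choices, and the extra log log n factor from Knuth's particular selection procedure is not reproduced here.

import numpy as np

def k_smallest_sorted(dists, k):
    # np.partition moves the k smallest entries (in arbitrary order)
    # into the first k slots in O(n) average time
    part = np.partition(np.asarray(dists), k)
    # sorting only those k survivors costs O(k log k)
    return np.sort(part[:k])

# e.g. k_smallest_sorted(np.random.default_rng(0).random(1_000_000), 10)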

Related

Fast way to calculate k nearest points to point P

I cannot decide on the fastest way to pick the k nearest points to some point P from a set of n points. My guesses are below:
Compute all n distances, sort them, and pick the k smallest values;
Compute the distances point by point and maintain a k-sized stack of the closest points seen so far;
Any other approaches are welcome.
Getting the median is an O(n) operation, thus the whole problem has a minimum complexity of O(n): compute all distances and find the k-th smallest element, partitioning the whole set by this threshold.
One can also work in chunks of K >> k.
The maximum of the first k distances works as a preliminary threshold: all the points farther than that do not need to be considered. Instead, place every point closer than the threshold into an array, and once the array size gets close to K, use the k-th-element linear algorithm to re-partition the array.
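
A rough Python sketch of that chunked strategy (the names, the chunk size K = 32*k, and the use of heapq.nsmallest for the re-selection step are my own simplifications; a true linear-time selection would serve the same purpose):

import heapq

def k_nearest_chunked(dists, k, K=None):
    # keep only candidates below a running threshold and re-select the
    # k smallest whenever the candidate buffer grows to about K
    if K is None:
        K = 32 * k
    it = iter(dists)
    # seed the buffer with the first k distances; their maximum is the
    # preliminary threshold
    buf = []
    for d in it:
        buf.append(d)
        if len(buf) == k:
            break
    threshold = max(buf) if buf else float("inf")
    for d in it:
        if d < threshold:
            buf.append(d)
            if len(buf) >= K:
                # keep only the k smallest and tighten the threshold
                buf = heapq.nsmallest(k, buf)
                threshold = buf[-1]
    return sorted(buf)[:k]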
Finding the smallest k elements is O(n), for any value of k. It's O(n + k log k) if you also need those k elements sorted (select them in linear time, then sort only the k survivors). This is the partition-based selection algorithm (quickselect).
You're best off reading the algorithm on Wikipedia. It's quicksort, but you only need to recurse on one side, because the other side is guaranteed to be completely out or completely in. There are expensive tricks (such as median of medians) that guarantee O(n) in the worst case instead of only on average.
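
A minimal pure-Python sketch of that one-sided recursion, written as a loop (the function names and the choice of the last element as pivot are mine; a randomized pivot gives the expected O(n) behaviour):

def partition(a, lo, hi):
    # Lomuto partition around a[hi]; returns the pivot's final index
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def smallest_k(a, k):
    # rearranges a in place so that a[:k] holds the k smallest elements
    # (unsorted); only one side of each partition is refined further
    lo, hi = 0, len(a) - 1
    while lo < hi:
        q = partition(a, lo, hi)
        if q == k:       # a[:k] is exactly the k smallest: done
            break
        elif q < k:      # everything left of q is "in": refine the right side
            lo = q + 1
        else:            # everything right of q is "out": refine the left side
            hi = q - 1
    return a[:k]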

Amortized Analysis

I came across this problem while studying. It asks to consider a data structure on which a sequence of n operations is performed: the k-th operation has a cost of k if k is a perfect square and a cost of 1 otherwise. What is the total cost of the operations, and what is the amortized cost of each operation?
I am having a bit of difficulty coming up with a summation formula that captures the perfect-square condition so that I can see what the sum yields. Any thoughts/advice?
The sum of i^2 from 1 to m is m(m+1)(2m+1)/6 (see http://mathworld.wolfram.com/Sum.html, formula (6)).
In a sequence of n operations, the expensive ones are those at positions 1, 4, 9, ..., i.e. the perfect squares up to n, and there are m = floor(sqrt(n)) of them. Their total cost is the sum above, which is Θ(m^3) = Θ(n^(3/2)); the remaining operations cost at most n in total. So the whole sequence costs Θ(n^(3/2)), and the amortized cost per operation is Θ(n^(3/2))/n = Θ(sqrt(n)).
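
If it helps, here is a throwaway Python check (mine, not part of the original question) that the running total indeed grows like n^(3/2):

import math

def total_cost(n):
    # cost of operation k is k if k is a perfect square, otherwise 1
    total = 0
    for k in range(1, n + 1):
        r = math.isqrt(k)
        total += k if r * r == k else 1
    return total

for n in (10**2, 10**4, 10**6):
    t = total_cost(n)
    # the ratio settles near 1/3, so the total is Theta(n^(3/2))
    print(n, t, t / n**1.5)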

Find the i-th greatest element

I want to use a divide-and-conquer procedure to compute the i-th greatest element of an array of integers and analyze the asymptotic time complexity of the algorithm.
Algorithm ith(A,low,high){
    q=partition(A,low,high);
    if (high-i+1==q) return A[q];
    else if (high-i+1<q) return ith(A,low,q-1);
    else return ith(A,q+1,high);
}
Is it right? If so, how could we find its time complexity?
The time complexity is described by the following recurrence relation:
T(n)=T(n-q)+T(q-1)+Θ(n)
But how can we solve this recurrence relation without knowing the value of q?
Or is there an algorithm with lower time complexity that computes the i-th greatest element of an array of integers?
This is a variation of the quickselect algorithm (which finds the i-th smallest element rather than the i-th greatest element). It has a running time of O(n^2) in the worst case and O(n) in the average case.
To see the worst case, assume you are searching for the n-th greatest element, and it happens that the algorithm always picks q to be the largest element of the remaining range, so you end up calling the ith function n times. In addition, the partition subroutine takes O(n), so the total running time is O(n^2).
To understand the average-case analysis, check the explanation given by Professor Tim Roughgarden here.
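
For reference, here is a small Python version of that quickselect variant (the function name and the randomized pivot are my additions; the target is kept as a fixed absolute index so the comparison does not depend on the shrinking range):

import random

def ith_greatest(A, i):
    # returns the i-th greatest element of A (1-based i)
    a = list(A)
    target = len(a) - i          # position of the i-th greatest in sorted order
    lo, hi = 0, len(a) - 1
    while True:
        # random pivot, then Lomuto partition
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, q = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[q], a[j] = a[j], a[q]
                q += 1
        a[q], a[hi] = a[hi], a[q]
        if q == target:
            return a[q]
        elif q < target:
            lo = q + 1
        else:
            hi = q - 1

# ith_greatest([7, 1, 5, 9, 3], 2) == 7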

Find the optimal sorting algorithm by inversions number/Pearson's r

Is it possible to find an optimal sorting algorithm given the number of elements in a presorted sequence and the number of inversions or the Pearson's r of that sequence?
For example I have a presorted sequence of 262143 elements.
The maximum number of inversions is given by n(n-1)/2, where n is the number of elements in the sequence (see here, page 2, for this assumption). For this example the maximum is therefore 34359345153.
Now the number of inversions of my presorted sequence is 1299203725, which is 3.78% of the maximum. My Pearson's r is 0.9941. By my understanding this should be a presorted sequence with a high "sortedness" (please correct me if I'm wrong).
I found many references to the number of inversions and the Pearson's r as ways of defining the "sortedness" of a sequence, but I could not find any comparison that says, for a given number of elements and inversions/Pearson's r, which sorting algorithm is the preferred one.
Thanks for your help.
It is probably very hard to beat the worst-case O(n log n) time of traditional sorting algorithms like merge sort if you assume that your sorting algorithm is comparison-based. I believe that in order to do as well or better you would have to assume that the number of inversions is O(n log n), much smaller than the worst-case O(n^2). Then something like bubble sort could run in O(n) time if you have O(n) inversions and you keep swapping an element backwards as long as it forms an inversion with its left neighbor.
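
To make that last idea concrete, here is a tiny Python sketch of the "swap backwards while there is an inversion" strategy (essentially insertion sort by adjacent swaps; the function name is mine). Each swap removes exactly one inversion, so it runs in O(n + d) time, where d is the number of inversions:

def sort_by_adjacent_swaps(a):
    # sorts a in place in O(n + d) time, d = number of inversions,
    # because every adjacent swap removes exactly one inversion
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:   # a[j-1], a[j] is an inversion
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a

Note that for the example above, d is roughly 1.3 * 10^9 while n log2 n is only about 4.7 * 10^6, so even though only 3.78% of the possible inversions are present, an inversion-adaptive swap-based sort would still do far more work than merge sort.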

Algorithm - Find the number of rectangles covering a given rectangle area

This is not a homework problem; it's an interview question. I am not able to come up with a good solution for this problem.
Problem :
Given an n*n grid (bottom left (0,0), top right (n,n)) and n rectangles with sides parallel to the coordinate axes. The bottom-left and top-right coordinates of the n rectangles are provided in the form (x1,y1)(x1',y1') .... (xn,yn)(xn',yn'). There are M queries, each asking for the number of rectangles that cover a rectangle with coordinates (a,b)(c,d). How do I solve this efficiently? Is there a way to precompute for all coordinate positions so that I can return the answer in O(1)?
Constraints:
1<= n <= 1000
It is straightforward to create, in O(n^4) space and O(n^5) time, a data structure that provides O(1) lookups. If M exceeds O(n^2) it might be worthwhile to do so. It is also straightforward to create, in O(n^2) space and O(n^3) time, a data structure that provides lookups in O(n) time. If M is O(n^2), that may be a better tradeoff; i.e., take O(n^3) precomputation time and O(n^3) time for O(n^2) lookups at O(n) each.
For the precomputation, make an n by n array of lists. Let L_pq denote the list for cell (p,q) of the n by n grid. Each list contains up to n rectangles, with all lists ordered by the same relation (i.e. if Ri < Rj in one list, then Ri < Rj in every list that contains both). The set of lists takes O(n^3) time to compute, viewed either as "for each of the n^2 cells C, for each of the n rectangles R, if C is in R then add R to L_C" or as "for each of the n rectangles R, for each cell C in R, add R to L_C".
Given a query (a,b,c,d), count the size of the intersection of the lists L_ab and L_cd in O(n) time. For O(1) lookups, first do the precomputation mentioned above, and then for each (a,b), for each c > a and d > b, run the O(n) query just described and save the result in P[a,b,c,d], where P is an appropriately large array of integers.
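
A compact Python sketch of the per-cell lists and the O(n)-time lookup (my own simplification: Python sets stand in for the identically ordered lists, since a rectangle covers the query rectangle exactly when it contains both of its corners; the O(1) table P is omitted):

def build_corner_lists(n, rects):
    # cover[x][y] = set of ids of rectangles whose area contains point (x, y);
    # rects is a list of (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2;
    # building this takes O(n^3) time in the worst case
    cover = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for rid, (x1, y1, x2, y2) in enumerate(rects):
        for x in range(x1, x2 + 1):
            for y in range(y1, y2 + 1):
                cover[x][y].add(rid)
    return cover

def count_covering(cover, a, b, c, d):
    # rectangles covering the query rectangle (a,b)-(c,d) are exactly
    # those containing both of its corners
    return len(cover[a][b] & cover[c][d])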
It is likely that an O(n^3) or perhaps O(n^2 · log n) precomputation method exists using either segment trees, range trees, or interval trees that can do queries in O(log n) time.
