I have a non-decreasing function f that always lies in [0,1], with f(0)=0, and we know a y such that f(y)=1.
I need an efficient algorithm that finds points x1,…,xm in [0,y] such that f(xk) is in [(k-1)/(m+1), k/(m+1)] for each k.
This should work for any m chosen as a hyperparameter; for background, this is being used to find percentiles of a very complex CDF.
I thought about running a bisection for each xk, but there must be a more efficient way that shares information between the searches.
Bonus: the function supports batch evaluation, so run{f,(x1,…,xn)} costs O(n+1), i.e. as n gets large the per-element cost is asymptotically 2x lower than evaluating one point at a time. If this can be incorporated, great!
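One possible direction (my own sketch, not from the question): since f is non-decreasing, the m searches can share brackets. Locate the middle target first by bisection, then recurse on the left and right halves, so every later search starts from a tighter bracket. A minimal Python sketch, assuming f is an ordinary callable and is continuous enough that every band is actually attained; the batch-evaluation bonus is not exploited here, though one could evaluate several midpoints per call.

```python
def find_quantile_points(f, y, m, tol=1e-9):
    """Find x_1..x_m in [0, y] with f(x_k) in [(k-1)/(m+1), k/(m+1)].

    Divide and conquer: solve the middle target first, then recurse on
    each side so later bisections start from smaller brackets.
    Assumes f is non-decreasing, f(0) = 0 and f(y) = 1.
    """
    xs = [None] * m

    def solve(k_lo, k_hi, x_lo, x_hi):
        if k_lo > k_hi:
            return
        k = (k_lo + k_hi) // 2
        band_lo, band_hi = (k - 1) / (m + 1), k / (m + 1)
        a, b = x_lo, x_hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            v = f(mid)
            if band_lo <= v <= band_hi:   # mid already lands in the k-th band
                a = b = mid
            elif v < band_lo:
                a = mid
            else:
                b = mid
        xs[k - 1] = 0.5 * (a + b)
        # all x_j with j < k lie at or below x_k, all j > k at or above it
        solve(k_lo, k - 1, x_lo, xs[k - 1])
        solve(k + 1, k_hi, xs[k - 1], x_hi)

    solve(1, m, 0.0, y)
    return xs
```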
You are given an array of positive integers of size N. You can choose any positive number x such that x <= max(Array) and subtract it from all elements of the array that are greater than or equal to x.
Each such element incurs a cost of A[i] - x, so the total cost of a step is sum(A[i] - x) over all A[i] >= x. A step is only valid if this total cost is less than or equal to a given number K.
Using only valid steps, find the minimum number of steps needed to make all elements of the array zero.
0 <= i < 10^5
0 <= x <= 10^5
0 < K < 10^5
Can anybody suggest an approach? DP will not work due to the high constraints.
Just some general exploratory thoughts.
First, there should be a constraint on N. If N is 3, this is much easier than if it is 100. The naive brute-force approach is going to be O(k^N).
Next, you are right that DP will not work with these constraints.
For a greedy approach, I would want to minimize the number of distinct non-zero values rather than maximize how much I took off. The worst-case approach is to take out the largest value each time, for N steps. If you can make two pairs of entries match, that shortens the approach.
The obvious thing to try, if you can, is an A* search. However, that requires a LOWER bound (not an upper one). The best naive lower bound that I can see is ceil(log_2(count_distinct_values)). Unless you're incredibly lucky and the problem can be solved that quickly, this is unlikely to narrow your search enough to be helpful.
I'm curious what trick makes this problem actually doable.
I do have an idea, but it is going to take some thought to make it work. Naively, we want to try each choice of x and explore the resulting paths. That is a problem because there are 10^5 choices for x: after 2 choices we already have trouble, and after 3 we are definitely not going to manage.
BUT instead consider the possible weak orderings of the array elements (with ties both possible and encouraged) and the resulting inequalities on the range of x choices that could have produced them. Then, instead of having to store 10^5 choices of x, we only need to store the distinct orderings we can reach, together with the inequalities on the ranges of choices that get us there. As long as N < 10, the number of weak orderings is something we can deal with if we're clever.
It would take a bunch of work to flesh out this idea though.
I may be totally wrong, and if so, please tell me and I'm going to delete my thoughts: maybe there is an opportunity if we translate the problem into another form?
You are given an array A of positive integers of size N.
Calculate the histogram H of this array.
The highest populated slot of this histogram has index m ( == max(A)).
Find the shortest sequence of selections of x, where each step is:
1. Select an index x <= m which satisfies sum(H[i]*(i-x)) <= K for i = x+1 .. m (the search for a suitable x starts from m and goes down).
2. Add H[x .. m] to H[0 .. m-x], i.e. shift those counts down by x.
3. Set the new m as the highest populated index in H[0 .. x-1] (we ignore everything from H[x] up).
4. Repeat until m == 0 (a rough code sketch of this loop is given below).
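A rough Python sketch of this greedy reduction, purely illustrative and not a proven-optimal strategy. It takes one reading of 'search for suitable x starts from m down': scan x downward and keep the smallest x whose cost still fits within K (the cost grows as x shrinks). It also rebuilds the histogram after every step and recomputes m over the whole array, which sidesteps the in-place bookkeeping of the slice shift.

```python
def min_steps(A, K):
    """Greedy histogram-reduction sketch for the reformulated problem."""
    m = max(A)
    H = [0] * (m + 1)
    for a in A:
        H[a] += 1

    steps = 0
    while m > 0:
        # Cost is monotone: it grows as the candidate x shrinks, so scan
        # downward and stop at the first candidate whose cost exceeds K.
        x = m                      # x = m always has cost 0, hence is feasible
        for cand in range(m - 1, 0, -1):
            cost = sum(H[i] * (i - cand) for i in range(cand + 1, m + 1))
            if cost <= K:
                x = cand
            else:
                break
        # Apply the step: every value >= x drops by x, smaller values stay.
        newH = [0] * (m + 1)
        for i, cnt in enumerate(H):
            newH[i - x if i >= x else i] += cnt
        H = newH
        m = max((i for i, cnt in enumerate(H) if cnt > 0), default=0)
        steps += 1
    return steps
```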
If only a "good" rather than an optimal solution is sought, I could imagine that some kind of spectral analysis of H could hint at favorable choices of x, so that maxima in the histogram pile onto other maxima in the reduction step.
Here is an exercise I'm struggling with:
One way to improve the performance of QuickSort is to switch to
InsertionSort when a subfile has <= M elements instead of recursively calling itself.
Implement a recursive QuickSort with a cutoff to InsertionSort for subfiles with M or fewer elements. Empirically determine the value of M for which it performs the fewest key comparisons on inputs of 60000 random natural numbers less than K, for K = 10, 100, 1000, 10000, 100000, 1000000. Does the optimal value of M depend on K?
My issues:
I would like to know whether the value of M differs between statement 1 and statement 3. If so, what array size should I use, and how should I vary the random numbers? How do I compare M and K? Is there a mathematical equation, or should I just determine it with my code?
Implement the sort algorithm as requested (a rough sketch follows these steps).
Add support for recording the number of comparisons (e.g. increment a global counter).
Generate 5 sets of input data for each K, i.e. 30 files with 1,800,000 lines in total.
Run the sort on every data set with several guesses for M. Start with the low-valued inputs and let the favorable values of M guide your guesses as you progress towards the high-valued inputs.
Describe your observations about how the optimal M depends on K.
Pass the exercise like a pro
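A hedged sketch of steps 1 and 2, in Python: a recursive quicksort that hands subfiles of at most M elements to insertion sort, with a global key-comparison counter. The random pivot and three-way partition are my own choices (not required by the exercise); they keep the recursion shallow even for small K, where the input contains many duplicate keys.

```python
import random

comparisons = 0  # global key-comparison counter

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place, counting key comparisons."""
    global comparisons
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def quicksort(a, lo, hi, M):
    """Quicksort with a cutoff: subfiles of <= M elements go to insertion sort."""
    global comparisons
    if hi - lo + 1 <= M:
        insertion_sort(a, lo, hi)
        return
    pivot = a[random.randint(lo, hi)]
    lt, i, gt = lo, lo, hi            # three-way partition (handles duplicates)
    while i <= gt:
        comparisons += 1
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
            continue
        comparisons += 1
        if a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort(a, lo, lt - 1, M)
    quicksort(a, gt + 1, hi, M)

def count_comparisons(K, M, n=60000):
    """Key comparisons used to sort n random naturals below K with cutoff M."""
    global comparisons
    comparisons = 0
    data = [random.randrange(K) for _ in range(n)]
    quicksort(data, 0, len(data) - 1, M)
    assert data == sorted(data)       # sanity check
    return comparisons

if __name__ == "__main__":
    for K in (10, 100, 1000, 10000, 100000, 1000000):
        print(K, {M: count_comparisons(K, M) for M in (1, 5, 10, 20, 40)})
```

Sweeping a few values of M per K, as in the main block, is exactly the guessing loop described above; comparing the counts across M for each K answers the "does M depend on K" part.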
I'm working on a problem that requires an array (dA[j], j=-N..N) to be calculated from the values of another array (A[i], i=-N..N) based on a conservation-of-momentum rule (x+y=z+j). That is, for a given index j, I compute A[x]*A[y]*A[z] for every valid combination (x,y,z), and dA[j] is the sum of these values.
I'm currently precomputing the valid indices for each dA[j] by looping over x=-N..N and y=-N..N, calculating z=x+y-j, and storing the indices if abs(z) <= N.
Is there a more efficient method of computing this?
The reason I ask is that in the future I'd also like to be able to efficiently find, for each dA[j], all the terms that contain a specific A[i]; essentially, to be able to compute the Jacobian of dA[j] with respect to A[i].
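A small Python sketch of the precomputation just described, plus a reverse index from each i to the terms in which A[i] appears, which is the bookkeeping needed for the Jacobian d(dA[j])/d(A[i]). The names (terms, uses) and the choice to store A as a dict keyed by -N..N are illustrative assumptions, not from the question.

```python
from collections import defaultdict

def build_terms(N):
    """Enumerate all (x, y, z) with x + y = z + j and |x|, |y|, |z| <= N.

    terms[j] : list of (x, y, z) contributing A[x]*A[y]*A[z] to dA[j]
    uses[i]  : list of (j, x, y, z) terms in which index i appears
               (each distinct index recorded once per term; remember that
               i may occur two or three times inside one term when you
               differentiate it for the Jacobian)
    """
    terms = {j: [] for j in range(-N, N + 1)}
    uses = defaultdict(list)
    for j in range(-N, N + 1):
        for x in range(-N, N + 1):
            for y in range(-N, N + 1):
                z = x + y - j
                if -N <= z <= N:
                    terms[j].append((x, y, z))
                    for i in {x, y, z}:
                        uses[i].append((j, x, y, z))
    return terms, uses

def compute_dA(A, terms, N):
    """dA[j] = sum over the valid (x, y, z) of A[x] * A[y] * A[z].

    A is assumed to be a dict (or similar mapping) keyed by -N..N.
    """
    return {j: sum(A[x] * A[y] * A[z] for (x, y, z) in terms[j])
            for j in range(-N, N + 1)}
```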
Update
For the sake of completeness, I figured out a way of doing this without any if statements: if you parametrize the equation x+y=z+j, with j a constant, you get the equation of a plane. The constraint that x, y, z must be integers in -N..N creates boundaries on this plane, and the points defining this boundary are functions of N and j. So all you have to do is loop over your parametrized variables (s,t) within these boundaries, and you will generate all the valid points using the vectors that define the plane (s*u + t*v + j*[0,0,1]).
For example, if you choose u=[1,0,-1] and v=[0,1,1], all the valid solutions for every value of j are bounded by a six-sided polygon with vertices (-N,-N), (-N,-j), (j,N), (N,N), (N,-j), and (j,-N).
So for each j you go through all (2N)^2 combinations to find the correct x's and y's such that x+y = z+j; the running time of your application (per j) is O(N^2). I don't think your current idea is bad (and after playing with some pseudocode for this, I couldn't improve it significantly). I would note that once you've picked a j and a z, there are at most 2N choices for x and y, so the best algorithm would still complete in O(N^2) overall.
But consider the following factor-of-2 improvement (for the overall program, not per j): if z + j = x + y, then (-z) + (-j) = (-x) + (-y) as well.
This was on my last comp stat qual. I gave an answer I thought was pretty good. We just get our score on the exam, not whether we got specific questions right. Hoping the community can give guidance on this one, I am not interested in the answer so much as what is being tested and where I can go read more about it and get some practice before the next exam.
At first glance it looks like a time-complexity question, but when it starts talking about mapping functions and pre-sorting data, I am not sure how to handle it.
So how would you answer?
Here it is:
Given a set of items X = {x1, x2, ..., xn} drawn from some domain Z, your task is to find if a query item q in Z occurs in the set. For simplicity you may assume each item occurs exactly once in X and that it takes O(l) amount of time to compare any two items in Z.
(a) Write pseudo-code for an algorithm which checks if q in X. What is the worst case time complexity of your algorithm?
(b) If l is very large (e.g. if each element of X is a long video) then one needs efficient algorithms to check if q \in X. Suppose you are given access to k functions h_i: Z -> {1, 2, ..., m} which uniformly map an element of Z to a number between 1 and m, and let k << l and m > n.
Write pseudo-code for an algorithm which uses the function h_1...h_k to check if q \in X. Note that you are allowed to preprocess the data. What is the worst case time complexity of your algorithm?
Be explicit about the inputs, outputs, and assumptions in your pseudocode.
The first part (a) seems to be a simple linear scan. The time complexity is O(n * l); the worst case is comparing against all n elements. Note that it cannot be sub-linear in n, since there is no information that the data is sorted.
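A minimal sketch of that scan, with the O(l) comparison passed in explicitly (the name equal is just a placeholder for the expensive item comparison):

```python
def contains_linear(X, q, equal):
    """Return True if q occurs in X.

    equal(a, b) is the O(l) item comparison; the scan makes at most n
    such calls, so the worst case is O(n * l).
    """
    for x in X:
        if equal(x, q):
            return True
    return False
```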
The second part (b) is actually a variation of a Bloom filter, which is a probabilistic way to represent a set. Using Bloom filters, you might have false positives (reporting that something is in the set when it is not), but never false negatives (reporting that something is not in the set when it is).
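A hedged sketch of that answer, treating h_1..h_k as ordinary Python callables that map an item to 1..m. Preprocessing is O(n*k) hash evaluations; each query is O(k) hash evaluations and no item comparisons at all, which is the point when l is huge. (Since m > n, one could instead bucket items by h_1 and confirm with a direct comparison to make the answer exact, at the cost of occasional O(l) comparisons.)

```python
def build_bloom(X, hashes, m):
    """Preprocess: set bit h_i(x) for every item x in X and every hash h_i.

    hashes = [h_1, ..., h_k], each mapping an item of Z to 1..m.
    """
    bits = [False] * (m + 1)      # slot 0 unused, slots 1..m as in the problem
    for x in X:
        for h in hashes:
            bits[h(x)] = True
    return bits

def maybe_contains(bits, hashes, q):
    """O(k) membership test: False means q is definitely not in X,
    True means q is in X or is a false positive."""
    return all(bits[h(q)] for h in hashes)
```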
Say we have 2 stochastic optimization algorithms (Genetic Algorithms, Particle Swarm Optimization, Cuckoo Search, etc.), A and B, and we want to find the global maximum of a function. If algorithm A performs better than algorithm B at optimizing function F on a 1-dimensional search space, does it also perform better than B at optimizing F on an N-dimensional search space?
I shall refer to function F in N dimensions as F_ND.
Note that F_1D and F_ND are the same function, just in a different number of dimensions; the "landscape" is exactly the same, only of different dimensionality.
Ex: for the DeJong function we have:
F_1D(x) = x[0]*x[0]
F_5D(x) = x[0]*x[0] + x[1]*x[1] + x[2]*x[2] + x[3]*x[3] + x[4]*x[4]
F_1D and F_5D have the same "type"/"aspect"
...put otherwise:
If general_performance(A,F_1D) > general_performance(B,F_1D), does general_performance(A,F_ND) > general_performance(B,F_ND) also hold (for a larger N, of course)?
It is currently not known whether such a property holds. The No Free Lunch theorem (NFL) does not fully apply here, since you're talking about a very restricted set of problems. The problem you have drawn is still separable in higher dimensions (one can optimize every variable independently and reach the global optimum). In this case one could argue that you could divide that problem into 5 problems of 1 dimension and solve each dimension separately; this should always be cheaper than solving them combined (assuming that no dimension is free).
I think it depends a lot on the type of problem, but in general I would not expect such a property to hold: for some problem and some N you can probably find algorithms A and B such that A is better than B <=> dimension < N, and B is better than A <=> dimension >= N.
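To make the "solve each dimension separately" point concrete, here is my own toy illustration (not from the answer): for a separable function such as the DeJong/sphere function, an N-dimensional run can be replaced by N independent 1-dimensional runs. The random-search optimizer and the [-5.12, 5.12] domain are arbitrary placeholder choices.

```python
import random

def sphere(xs):
    """DeJong / sphere function: separable, global minimum at the origin."""
    return sum(x * x for x in xs)

def optimize_1d(g, lo, hi, iters=1000):
    """Stand-in for any 1-D stochastic optimizer (plain random search here)."""
    best_x, best_v = None, float("inf")
    for _ in range(iters):
        x = random.uniform(lo, hi)
        v = g(x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x

def optimize_separable(n_dims, lo=-5.12, hi=5.12):
    """Attack the N-D separable problem as N independent 1-D problems."""
    # x * x is the per-dimension term of the sphere function
    return [optimize_1d(lambda x: x * x, lo, hi) for _ in range(n_dims)]

# sphere(optimize_separable(5)) should come out close to 0
```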