I wanted to ask a data structures question. For a given array, if we need to work with the max/min, we use a heap. That is only possible because the given data is constant; the stored values do not change. What if the data is a set of polynomials?
The following assumptions can be made.
Each polynomial is a linear function ax + b, and we have a large set of (a, b) pairs.
a and b are always positive.
Basic requirements:
It should be possible to change the values of certain a and b.
For a given x, we need the polynomial with the maximum value.
Can you suggest a data structure or guide me through this? I am kind of stuck here.
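To make the requirements concrete, here is a minimal sketch of the interface I have in mind (hypothetical names, naive O(n) query); the question is what structure would support these two operations more efficiently:

class LineSet:
    # Naive baseline: stores lines a*x + b. The query below is O(n); I am looking
    # for a data structure that answers it faster while still allowing updates.
    def __init__(self, lines):
        self.lines = list(lines)          # list of (a, b) pairs, a > 0 and b > 0

    def update(self, i, a, b):
        # requirement 1: change the coefficients of an existing line
        self.lines[i] = (a, b)

    def max_at(self, x):
        # requirement 2: for a given x, return the line with the maximum a*x + b
        return max(self.lines, key=lambda ab: ab[0] * x + ab[1])

lines = LineSet([(2, 3), (1, 10), (5, 1)])
print(lines.max_at(0))   # (1, 10): the largest b wins at x = 0
print(lines.max_at(4))   # (5, 1): the largest a wins for large x
lines.update(0, 7, 0)    # change line 0 to 7x + 0
print(lines.max_at(4))   # (7, 0)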
Something I'm struggling with is figuring out a precise way of knowing how our input size n should be defined.
To demonstrate what I mean by that, take binary search for example. The input size n of T(n) is defined to be high - low + 1. I don't really understand how we can
figure that out from just the algorithm without taking an "educated guess", and
confirm that we are not wasting our time proving the recurrence equation with a falsely derived input size.
I would really appreciate some advice; thanks in advance.
Indeed, the input size is usually not arbitrary, and it is necessary to determine it correctly.
So how do we do it?
First of all, you have to understand the algorithm: what it does and why you even use it. Usually you can brute-force any problem quite easily, but you still decide to use something better. Why? Answering that question should make everything clear.
Let's take a look at your binary search example. You want to find a specific value by calling a monotonic function that tells you whether your choice is too low or too high. For example, you may want to find the largest value less than a specific number in an array.
What is a brute-force approach? Well, you can ask for every possible value. What affects the complexity? The number of values you can choose from. That's your input size. To decrease the complexity you may want to use binary search, which lets you make far fewer queries. The input size remains the same.
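To make that concrete, here is a small sketch of my own (a plain membership search rather than the predicate version described above), where the quantity the brute-force scan would iterate over, and that binary search halves at each step, is exactly high - low + 1:

def binary_search(arr, target, low, high):
    # The input size is n = high - low + 1: the number of candidate positions.
    # A brute-force scan would inspect all n of them; binary search halves n each
    # step, giving T(n) = T(n/2) + O(1) = O(log n).
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1        # discard the lower half; n shrinks to about n/2
        else:
            high = mid - 1       # discard the upper half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7, 0, 5))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4, 0, 5))   # -1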
Let's have a look at some other algorithms:
Euclidean algorithm for GCD - brute force? Check every number less than or equal to the minimum of your input. What affects the complexity? The number of values you would have to check, that is, the values of the numbers whose GCD you want (see the sketch after this list).
BFS/DFS (graph traversal) - you want to visit each node once, and for each node you additionally have to check every incident edge. In total: the input size is the number of nodes plus the number of edges.
KMP/Karp-Rabin/any other pattern-matching algorithm - you want to find occurrences of a pattern in a text; the input size will obviously be the text size, but also the pattern size.
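To illustrate the GCD item, here is a small sketch of my own: both versions take the same input, but the brute force does work proportional to the values themselves, while Euclid's algorithm needs only logarithmically many steps in those values:

def gcd_brute_force(a, b):
    # Tries every candidate up to min(a, b): the work is governed by the values.
    for d in range(min(a, b), 0, -1):
        if a % d == 0 and b % d == 0:
            return d

def gcd_euclid(a, b):
    # Same input (and input size), far fewer steps.
    while b:
        a, b = b, a % b
    return a

print(gcd_brute_force(252, 198))  # 18
print(gcd_euclid(252, 198))       # 18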
Once you understand the problem the algorithm solves, think of a brute-force approach and determine what affects its speed and what can be improved. That is most likely your input size. In more complex cases, the input size is stated in the algorithm's description.
Which programs/algorithms change the representation of their data structure at runtime in order to obtain better performance?
Context:
Data structures "define" how real-world concepts are structured and represented in computer memory. For different kinds of computations a different data structure should/can be used to achieve acceptable performance (e.g., linked-list versus array implementation).
Self-adaptive (cf. self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self-balancing trees). These changes are internal, i.e., they depend on the data. Moreover, these changes are anticipated by design.
Other algorithms can benefit from an external change of representation. In matrix multiplication, for instance, it is a well-known performance trick to transpose "the second matrix" (so that caches are used more efficiently). This actually changes the matrix representation from row-major to column-major order. Because "A" is not the same as "Transposed(A)", the second matrix is transposed again after the multiplication to keep the program semantically correct.
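As an illustration only (a sketch in Python, where the cache effect itself will not really show; the point is just where the change of representation enters), the inner loop below walks both operands sequentially because the second matrix has been transposed up front:

def matmul_transposed(A, B):
    n, m, p = len(A), len(B), len(B[0])
    # change of representation: build the transpose of B (row-major -> column-major
    # view); here B is left untouched instead of being transposed back afterwards
    BT = [[B[k][j] for k in range(m)] for j in range(p)]
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            # both A[i] and BT[j] are traversed sequentially
            C[i][j] = sum(A[i][k] * BT[j][k] for k in range(m))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_transposed(A, B))   # [[19, 22], [43, 50]]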
A second example is using a linked list at program start-up to populate "the data structure" and switching to an array-based implementation once the content of the list becomes "stable".
I am looking for programmers who have had similar experiences with other example programs where an external change of representation is performed in the application in order to obtain better performance; that is, where the representation (chosen implementation) of a data structure is changed at runtime as an explicit part of the program.
The pattern of transforming the input representation in order to enable a more efficient algorithm comes up in many situations. I would go as far as to say this is an important way to think about designing efficient algorithms in general. Some examples that come to mind:
HeapSort. It works by transforming your original input list into a binary heap (probably a min-heap), and then repeatedly calling the remove-min function to get the list elements in sorted order. Asymptotically, it is tied for the fastest comparison-based sorting algorithm.
Finding duplicates in a list. Without changing the input list, this will take O(n^2) time. But if you can sort the list, or store the elements in a hash table or Bloom filter, you can find all the duplicates in O(n log n) time or better.
Solving a linear program. A linear program (LP) is a certain kind of optimization problem with many applications in economics and elsewhere. One of the most important techniques in solving LPs is duality, which means converting your original LP into what is called the "dual", and then solving the dual. Depending on your situation, solving the dual problem may be much easier than solving the original ("primal") LP. This book chapter starts with a nice example of primal/dual LPs.
Multiplying very large integers or polynomials. The fastest known method uses the FFT; see here or here for some nice descriptions. The gist of the idea is to convert from the usual representation of your polynomial (a list of coefficients) to an evaluation basis (a list of evaluations of that polynomial at certain carefully-chosen points). The evaluation basis makes multiplication trivial: you can just multiply each pair of evaluations. Now you have the product polynomial in an evaluation basis, and you interpolate (the opposite of evaluation) to get back the coefficients, as you wanted. The Fast Fourier Transform (FFT) is a very efficient way of doing the evaluation and interpolation steps, and the whole thing can be much faster than working with the coefficients directly (a short sketch appears at the end of this answer).
Longest common substring. If you want to find the longest substring that appears in a bunch of text documents, one of the fastest ways is to build a generalized suffix tree over all of them and find the deepest node whose subtree contains suffixes from every document.
Linear algebra. Various matrix computations are performed most efficiently by converting your original matrix into a canonical form such as Hermite normal form or computing a QR factorization. These alternate representations of the matrix make standard things such as finding the inverse, determinant, or eigenvalues much faster to compute.
There are certainly many examples besides these, but I was trying to come up with some variety.
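To make the polynomial-multiplication item above concrete, here is a rough sketch assuming NumPy is available (real implementations are more careful about precision and about choosing the transform size):

import numpy as np

def poly_multiply_fft(a, b):
    # a, b: coefficient lists, lowest degree first (integer coefficients assumed
    # so that the final rounding is exact for modest sizes)
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2                          # transform size >= size of the product
    fa = np.fft.fft(a, n)               # coefficients -> evaluation basis
    fb = np.fft.fft(b, n)
    fc = fa * fb                        # pointwise multiplication of evaluations
    c = np.fft.ifft(fc).real            # interpolate back to coefficients
    return np.round(c[:len(a) + len(b) - 1]).astype(int)

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(poly_multiply_fft([1, 2], [3, 4]))   # [ 3 10  8]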
Given a collection of points in the complex plane, I want to find a "typical value", something like mean or mode. However, I expect that there will be a lot of outliers, and that only a minority of the points will be close to the typical value. Here is the exact measure that I would like to use:
Find the mean of the largest set of points with variance less than some programmer-defined constant C
The closest thing I have found is the article Finding k points with minimum diameter and related problems, which gives an efficient algorithm for finding a set of k points with minimum variance, for some programmer-defined constant k. This is not useful to me because the number of points close to the typical value could vary a lot and there may be other small clusters. However, incorporating the article's result into a binary search algorithm shows that my problem can be solved in polynomial time. I'm asking here in the hope of finding a more efficient solution.
Here is a way to do it (from what I have understood of the problem):
Select a point k from the dataset and compute the list of points sorted in ascending order of their distance from k, in O(N log N).
Treating k as the mean, add points from the sorted list into the set while the variance stays below C, then stop.
Do this for every point k.
Keep track of the largest set found.
Time complexity: O(N^2 log N), where N is the size of the dataset.
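A rough Python sketch of those steps, treating the points as complex numbers and, following the "keeping k as the mean" simplification above, measuring the variance of a candidate set as its mean squared distance from k:

def largest_low_variance_set(points, C):
    best = []
    for k in points:
        # O(N log N): sort the points by distance from the candidate centre k
        by_dist = sorted(points, key=lambda p: abs(p - k))
        chosen, sq_sum = [], 0.0
        for p in by_dist:
            sq_sum += abs(p - k) ** 2
            # stop as soon as adding p would push the variance to C or above
            if sq_sum / (len(chosen) + 1) >= C:
                break
            chosen.append(p)
        if len(chosen) > len(best):
            best = chosen
    return best

pts = [0 + 0j, 0.1 + 0j, 0 + 0.1j, 0.05 + 0.05j, 5 + 5j]
cluster = largest_low_variance_set(pts, C=0.05)
print(len(cluster))                  # 4: the outlier at 5+5j is excluded
print(sum(cluster) / len(cluster))   # the "typical value" of that set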
Mode-seeking algorithms such as Mean-Shift clustering may still be a good choice.
You could then just keep the mode with the largest set of points that has variance below the threshold C.
Another approach would be to run k-means with a fairly large k. Then remove all points that contribute too much to variance, decrease k and repeat. Even though k-means does not handle noise very well, it can be used (in particular with a large k) to identify such objects.
Or you might first run some simple outlier detection methods to remove these outliers, then identify the mode within the reduced set only. A good candidate method is 1NN outlier detection, which should run in O(n log n) if you have an R-tree for acceleration.
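A rough sketch of the mean-shift route suggested above, assuming scikit-learn and NumPy are available (the bandwidth and the variance threshold are the parts you would have to tune):

import numpy as np
from sklearn.cluster import MeanShift

def typical_value(points, C, bandwidth=None):
    # points: 1-D array of complex numbers; returns the mean of the largest
    # cluster whose variance (mean squared distance from its centroid) is below C,
    # or None if no cluster qualifies.
    X = np.column_stack([points.real, points.imag])
    labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
    best, best_size = None, 0
    for lab in np.unique(labels):
        cluster = points[labels == lab]
        centre = cluster.mean()
        variance = np.mean(np.abs(cluster - centre) ** 2)
        if variance < C and len(cluster) > best_size:
            best, best_size = centre, len(cluster)
    return best

rng = np.random.default_rng(0)
pts = np.concatenate([0.1 * (rng.standard_normal(50) + 1j * rng.standard_normal(50)),
                      5.0 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))])
print(typical_value(pts, C=0.5, bandwidth=1.0))   # roughly 0+0j: centre of the dense cluster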
I have a set of disjoint integer intervals and want to check whether a given integer lies in one of these intervals. Of course, this can be achieved by means of a binary search in logarithmic time. However, the vast majority of the queries return false, i.e., only very few integers lie in any interval. To speedup the application, I'm looking for a probabilistic, constant-time algorithm (some sort of hash function) that tells me whether a given integer is definitely not or maybe in an interval. Here is a sketch of the intended algorithm, where magic_data_structure is initialized with the intervals stored in tree:
x = some_integer;
if (!magic_data_structure.find(x))
    return false;     // definitely not in any interval
return tree.find(x);  // binary search on tree
Any ideas or hints for literature? Thank you very much in advance for your help!
P.S.: Does anybody know of improvements to interval trees for non-overlapping intervals? (Unlike the intervals described above, the intervals in a general interval tree may include other intervals.)
This is a naive solution, but it is constant-time.
If you are not dealing with extremely large quantities of numbers, you could just use a hash table where the keys are the numbers and the values are pointers to the interval they belong to. But of course, if there is a lot of data it might take too long (and too much memory) to index it this way.
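A sketch of that idea, assuming the intervals are given as inclusive (lo, hi) pairs and that their total covered length is small enough to enumerate:

def build_lookup(intervals):
    # hash table: every integer covered by some interval maps to that interval
    lookup = {}
    for lo, hi in intervals:
        for x in range(lo, hi + 1):
            lookup[x] = (lo, hi)
    return lookup

intervals = [(10, 15), (100, 120), (1000, 1001)]
lookup = build_lookup(intervals)
print(12 in lookup)      # True, O(1) expected time
print(50 in lookup)      # False
print(lookup.get(110))   # (100, 120)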
It looks like there are various disjoint-set data structures and algorithms to store and search them, but I doubt any of them offer constant-time queries.
The original problem was discussed here: Algorithm to find special point k in O(n log n) time
Simply put, we have an algorithm that determines whether or not a set of points in the plane has a center of symmetry.
I wonder, is there a way to prove an Ω(n log n) lower bound for this problem? I guess we need to use this algorithm to solve a problem with a known lower bound, such as sorting, element uniqueness, or set uniqueness; then we could conclude that if we can solve, e.g., element uniqueness by using this algorithm, the algorithm must take at least Ω(n log n).
It seems like the solution has something to do with element uniqueness, but I couldn't figure out how to turn that into a reduction to the center-of-symmetry problem.
Check this paper.
The idea is that if we can reduce problem P to problem Q (that is, solve P using an algorithm for Q), then P is no harder than Q.
Consequently, if problem P has a lower bound of Ω(n log n), then problem Q is guaranteed the same lower bound.
In the paper, the author picked the following relatively approachable problem to reduce from: given two sets of n real numbers, we wish to decide whether or not they are identical.
This set-equality problem is known to have an Ω(n log n) lower bound. Here is how the author reduces it to the problem at hand (A and B denote the two sets of reals in the following context):
First observe that your magical point k must be the centroid of the point set.
Build a lookup data structure indexed by point position (O(n log n)).
Calculate the centroid of the set of points (O(n)).
For each point, calculate the position of its reflection through the centroid and check for its existence in the lookup structure (n * O(log n)).
Appropriate lookup data structures can include basically anything that allows you to look something up efficiently by content, including balanced trees, oct-trees, hash tables, etc.
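A sketch of those steps, using a hash set as the lookup structure and assuming exact integer coordinates (reflecting floating-point coordinates would require a comparison tolerance):

from fractions import Fraction

def has_center_of_symmetry(points):
    # points: list of (x, y) tuples with integer coordinates. Returns True iff the
    # set is symmetric about some point which, if it exists, must be the centroid.
    n = len(points)
    lookup = set(points)                         # lookup structure indexed by position
    cx = Fraction(sum(x for x, _ in points), n)  # centroid, O(n), kept exact
    cy = Fraction(sum(y for _, y in points), n)
    # every point's reflection through the centroid must also be present
    return all((2 * cx - x, 2 * cy - y) in lookup for x, y in points)

square = [(0, 0), (0, 2), (2, 0), (2, 2)]
print(has_center_of_symmetry(square))              # True (centre at (1, 1))
print(has_center_of_symmetry(square + [(3, 3)]))   # False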