Prove that any algorithm for solving disjoint sets takes at least n log n

Disjoint sets problem
Let A and B be two sets. Are they disjoint?
Question
Prove that any algorithm for solving the disjoint sets problem takes Ω(n log n) time in the worst case.
My idea is to prove that sorting can be reduced to the disjoint sets problem.
How do I do that?

I don't entirely understand your question: there are fast sorting algorithms like radix, bucket, and counting sort that run in linear time when the nature of the input permits them, so an Ω(n log n) lower bound can only hold in a restricted model such as the comparison model.
As for the reduction, the direction you propose is the right one: if sorting reduces to disjoint sets, then disjoint sets is at least as hard as sorting, which is exactly what a lower bound needs. Just note that a polynomial-time reduction is too coarse for this purpose. The reduction itself must take o(n log n) (ideally O(n)) time, otherwise the Ω(n log n) bound for sorting does not transfer to disjoint sets.
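For context, the matching upper bound is easy to see: once you can sort, disjointness itself is decidable in O(n log n) by sorting both sets and doing a merge-style scan. A minimal sketch (the function name is mine):

```python
def disjoint(a, b):
    """Decide whether lists a and b share an element.

    Sorting dominates at O(n log n); the merge-style scan after it
    is linear. This matches the lower bound asked about, in the
    comparison model.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return False        # common element found
        if a[i] < b[j]:
            i += 1              # advance whichever side is smaller
        else:
            j += 1
    return True
```

For example, `disjoint([1, 3, 5], [2, 4, 6])` is True while `disjoint([1, 2], [2, 3])` is False.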

Related

How does Kruskal and Prim change when edge weights are in the range of 1 to |V| or some constant W?

I'm reading CLRS Algorithms Edition 3 and I have two problems for my homework (I'm not asking for answers, I promise!). They are essentially the same question, just applied to Kruskal or to Prim. They are as follows:
Suppose that all edge weights in a graph are integers in the range from 1 to |V|. How fast can you make [Prim/Kruskal]'s algorithm run? What if the edge weights are integers in the range from 1 to W for some constant W?
I can see the logic behind the answers I'm thinking of and what I'm finding online (i.e. sort the edge weights using a linear sort, change the data structure being used, etc.), so I don't need help answering it. But I'm wondering why there is a difference between the answers when the range is 1 to |V| versus 1 to W. Why ask the same question twice? If it's some constant W, it could literally be anything. But honestly, so could |V|: we could have a crazy large graph, or a very small one. I'm not sure how the two questions posed in this problem are different, or why I need two separate approaches for both of them.
There's a difference in complexity between an algorithm that runs in O(V) time and one that runs in O(W) time for constant W. Sure, V could be anything, as could W, but that's not really the point: one is linear, the other is O(1). The question is then for which algorithms a restricted range of edge weights could impact complexity (based, as you suggest, on edge-weight sort time and choice of data structure), and what the actual new optimal complexity would be for linearly bounded edge weights versus edge weights bounded by a constant W.
Having bounded edge weights could open up new possibilities for sorting algorithms for Kruskal's, and might change the data structure you'd want to use to implement the queue for Prim's, along with the best way to implement the extract-min and update-key operations for that queue. The extent to which edge weights are bounded can impact whether a particular change in data structure or implementation is even beneficial to make in terms of final complexity.
For example, knowing that the n elements of a list are bounded in value by a constant W means that a switch to counting or radix sort improves the asymptotic complexity of sorting them, but if I only knew that they were bounded in value by 2^n, there would be no advantage in switching to radix sort over the traditional comparison sorts and their O(n log n) complexity.
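To make that concrete, here is a sketch of counting sort for weights known to lie in 1..W (illustrative code, not from the original answer):

```python
def counting_sort(values, w):
    """Sort integers in the range 1..w in O(n + w) time.

    With w a constant, or w = O(n) as in the 1..|V| case, this beats
    the O(n log n) comparison-sort bound, which is what makes the
    edge sort in Kruskal's cheaper for bounded weights.
    """
    counts = [0] * (w + 1)            # one bucket per possible value
    for v in values:
        counts[v] += 1
    out = []
    for value in range(1, w + 1):     # emit each value as often as it occurred
        out.extend([value] * counts[value])
    return out
```

For instance, `counting_sort([3, 1, 2, 3, 1], 3)` returns `[1, 1, 2, 3, 3]` with a single pass over the input plus a pass over the value range.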

Is there any algorithm to generate all subsets of a given set in O(n^2) time?

I am just trying out different algorithms for practice and I came across this problem, for which I have to generate all the subsets of a given set of (non-duplicate) elements.
The naive solution for this uses the bitmasking technique, but that has exponential time complexity, i.e. O(2^n). In the worst case n can be up to 100000, so an O(2^n) solution is not at all efficient.
I am just wondering if there is any other, more efficient algorithm which can solve this in ~O(n^2)?
If you want to, for example, print all the subsets or store them explicitly anywhere, there is no such algorithm. Notice that there are 2^n subsets of a set of n elements, and simply enumerating them requires exponential time.
While there is clearly no polynomial-time way to generate all subsets of an n-element set, some ways of doing so are more efficient than others. In particular, if you use a Gray code you can go from one subset to the next by adding or removing a single element. This would make e.g. brute-force knapsack solutions noticeably more efficient (for e.g. n around 30 or smaller).
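The Gray-code walk can be sketched as follows (a reflected Gray code; the function name is illustrative). The key fact used is that the bit flipped between the Gray codes of k-1 and k is the lowest set bit of k:

```python
def gray_code_subsets(items):
    """Yield all 2^n subsets of `items`, each differing from the
    previous one by exactly one added or removed element."""
    n = len(items)
    current = set()
    yield set(current)                      # start with the empty subset
    for k in range(1, 1 << n):
        # The bit that changes between Gray codes of k-1 and k
        # is the position of the lowest set bit of k.
        bit = (k & -k).bit_length() - 1
        elem = items[bit]
        if elem in current:
            current.remove(elem)
        else:
            current.add(elem)
        yield set(current)                  # yield a copy
```

Because each step touches one element, incremental quantities (like a running knapsack weight) can be updated in O(1) per subset instead of being recomputed from scratch.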

pseudo polynomial analysis of algorithms

While studying for an exam in Algorithms and Data Structures, I stumbled upon a question: what does it mean for an algorithm to have pseudo-polynomial time complexity?
I did a lot of searching but came up empty-handed.
It means that the algorithm runs in time polynomial in the numeric value of the input, but that value can be exponential in the input's size, i.e. its number of bits.
For example, take the subset sum problem: we have a set S of n integers and we want to find a subset which sums up to t.
The standard dynamic-programming solution runs in O(n · t) time. That looks polynomial, but t is encoded using only about log t bits, so O(n · t) can be exponential in the length of the input. An algorithm with this kind of running time is called pseudo-polynomial.
I hope this introduction helps you understand Wikipedia's article about it: http://en.wikipedia.org/wiki/Pseudo-polynomial_time :)
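To illustrate, here is a sketch of the standard dynamic program for subset sum; its O(n · t) running time is the canonical example of pseudo-polynomial behaviour. This is illustrative code assuming positive integers:

```python
def subset_sum(numbers, target):
    """Decide whether some subset of `numbers` (positive integers)
    sums to `target`, via the classic O(n * target) dynamic program.

    The running time is polynomial in the *value* of target, but
    exponential in its bit-length: that is pseudo-polynomial time.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True                       # the empty subset sums to 0
    for x in numbers:
        # Walk sums downwards so each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]
```

Doubling the number of bits of target squares its value, so the table (and the running time) blows up exponentially in the input length even though the code looks like two nested polynomial loops.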

Time complexity of one algorithm cascaded into another?

I am working with random forest for a supervised classification problem, and I am using the k-means clustering algorithm to split the data at each node. I am trying to calculate the time complexity of the algorithm. From what I understand, the time complexity of k-means is
O(n · K · I · d)
where
n is the number of points,
K is the number of clusters,
I is the number of iterations, and
d is the number of attributes.
K, I, and d are constants or have an upper bound, and n is much larger than these three, so I suppose the complexity is just O(n).
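As an illustration of where that O(n · K · I · d) product comes from, here is a minimal sketch of Lloyd's k-means (not the poster's actual implementation; names are illustrative):

```python
import random

def kmeans(points, k, iterations):
    """Minimal Lloyd's algorithm. Each of the I iterations visits all
    n points, compares each point against all K centers, and each
    comparison costs d arithmetic operations, giving O(n * K * I * d)."""
    d = len(points[0])
    centers = random.sample(points, k)
    for _ in range(iterations):                             # I iterations
        clusters = [[] for _ in range(k)]
        for p in points:                                    # n points
            dists = [sum((p[j] - c[j]) ** 2 for j in range(d))  # d ops each
                     for c in centers]                      # K centers
            clusters[dists.index(min(dists))].append(p)
        # Recompute each center as the mean of its cluster (keep the
        # old center if a cluster happens to be empty).
        centers = [[sum(p[j] for p in cl) / len(cl) for j in range(d)]
                   if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers
```

The four nested loops map exactly onto the four factors n, K, I, and d of the bound.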
The random forest, on the other hand, is a divide-and-conquer approach, so for n instances the complexity is O(n · log n), though I am not sure about this; correct me if I am wrong.
To get the complexity of the algorithm, do I just add these two things?
In this case, you don't add the values together. If you have a divide-and-conquer algorithm, the runtime is determined by a combination of
The number of subproblems made per call,
The sizes of those subproblems, and
The amount of work done per problem.
Changing any one of these parameters can wildly impact the overall runtime of the function. If you increase the number of subproblems made per call by even a small amount, the total number of subproblems grows exponentially, which can have a large impact overall. Similarly, if you increase the work done per level, since there are so many subproblems the runtime can swing wildly. Check out the Master Theorem as an example of how to determine the runtime based on these quantities.
In your case, you are beginning with a divide-and-conquer algorithm where all you know is that the runtime is O(n log n) and are adding in a step that does O(n) work per level. Just knowing this, I don't believe it's possible to determine what the runtime will be. If, on the other hand, you make the assumption that
The algorithm always splits the input into two smaller pieces,
The algorithm recursively processes those two pieces independently, and
The algorithm uses your O(n) algorithm to determine which split to make
Then you can conclude that the runtime is O(n log n), since this is the solution to the recurrence given by the Master Theorem.
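Under those three assumptions the recurrence and its solution are:

```latex
T(n) = 2\,T\!\left(\frac{n}{2}\right) + O(n)
\quad\Longrightarrow\quad
T(n) = O(n \log n)
```

Here a = 2 subproblems of size n/2 (b = 2) and f(n) = Θ(n) = Θ(n^{log_2 2}), so the second case of the Master Theorem applies and contributes the log factor.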
Without more information about the internal workings of the algorithm, though, I can't say for certain.
Hope this helps!

complexity of polygon-construction

I want to prove that the problem of constructing a simple polygon out of a given set of points (2D) has complexity Ω(n log n), i.e. every correct algorithm takes at least on the order of n log n steps to solve this problem in the worst case.
Is it possible to reduce this problem to a sorting problem somehow, or how else can this be shown?
Yes, use the baseline algorithm:
Find two points with lowest and highest x-value: O(n)
Divide other points in two sets lying above and below the baseline (line joining these points): O(n)
Sort the upper set in ascending-x order: O(n log n)
Sort the lower set in descending-x order: O(n log n)
Join upper set in order, join lower set in order: O(n)
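The steps above can be sketched as follows (assuming general position and distinct extreme points; the function name is illustrative). Each chain is x-monotone and stays on its side of the baseline, which is why the result is a simple polygon:

```python
def simple_polygon(points):
    """Baseline method: O(n) to find the extremes and split,
    O(n log n) for the two sorts, O(n) to join the chains."""
    lo = min(points)                      # lowest x-value:  O(n)
    hi = max(points)                      # highest x-value: O(n)

    def above(p):
        # Positive cross product: p lies above the lo -> hi baseline.
        return ((hi[0] - lo[0]) * (p[1] - lo[1])
                - (hi[1] - lo[1]) * (p[0] - lo[0])) > 0

    rest = [p for p in points if p != lo and p != hi]
    upper = sorted(p for p in rest if above(p))                   # ascending x
    lower = sorted((p for p in rest if not above(p)), reverse=True)  # descending x
    # Left to right along the upper chain, right to left along the lower.
    return [lo] + upper + [hi] + lower
```

The two sorts dominate, so the whole construction runs in O(n log n), matching the lower bound discussed in the question.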
Yes, it is possible to prove it by comparison with the sorting problem, but you must reduce in the other direction: that is, reduce the sorting problem to the polygon-construction problem in O(n) time.
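A sketch of that reduction, assuming distinct numbers: lift each number x to the point (x, x²) on a parabola. Those points are in convex position, so any simple polygon through them is their convex hull, and walking the polygon from the leftmost vertex reads the numbers off in sorted order. Both the lift and the read-out are O(n); the trivial builder in the middle (which itself sorts) merely stands in for whatever black-box polygon algorithm the reduction assumes:

```python
def sort_via_polygon(xs):
    """Sort distinct numbers using a simple-polygon builder as an
    oracle: O(n) pre-processing + one polygon construction + O(n)
    post-processing, so the polygon step must cost Omega(n log n)."""
    points = [(x, x * x) for x in xs]              # O(n) lift onto y = x^2

    # --- stand-in for any correct polygon-construction algorithm ---
    lo, hi = min(points), max(points)
    rest = sorted((p for p in points if p not in (lo, hi)), reverse=True)
    polygon = [lo, hi] + rest   # on a parabola the middle lies below the lo-hi chord
    # ---------------------------------------------------------------

    # O(n) recovery of the sorted order from the polygon.
    i = polygon.index(min(polygon))                # start at leftmost vertex
    walk = polygon[i:] + polygon[:i]
    if len(walk) > 2 and walk[1][0] > walk[-1][0]:
        walk = [walk[0]] + walk[:0:-1]             # traverse the other way round
    return [p[0] for p in walk]
```

Since sorting n numbers takes Ω(n log n) comparisons and everything outside the polygon construction is linear, the construction itself must take Ω(n log n).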
