Given any two sequences of n real numbers, say (a1, a2, ..., an) and (b1, b2, ..., bn), how can you tell whether one sequence (which can also be viewed as a vector) is a permutation of the other?
I plan to develop an algorithm and run it in MATLAB to do this job. The only algorithm I can think of costs n! time: just try all n! permutations.
Is there a faster algorithm?
First of all, why n!? If for every ai you search for a match in b, you get O(n^2).
Anyway, it is more efficient to sort, with O(n log n) complexity:
A=[3,1,2,7];
B=[2,3,1,7];
isPermutated=isequal(sort(A),sort(B))
Just sort both sequences and compare the sorted results.
In some situations you might find it useful to build a set/map/dictionary (with counters, if repeated elements are possible) from each sequence and check that every element of one is present in the other.
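For example, here is a minimal sketch of that counting idea in Python (the sort-and-compare one-liner above is usually simpler, but this runs in O(n) on average):

    from collections import Counter

    def is_permutation(a, b):
        # The two sequences are permutations of each other iff every
        # element occurs the same number of times in both.
        return len(a) == len(b) and Counter(a) == Counter(b)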
Please suggest if there is a quicker way to find the negative number in a given array, provided that the array has only one negative number. I think sorting is an option, but it would be helpful if there is a quicker way.
Sorting won't be quicker than going through all the elements of the array (because to sort you also have to do that).
The fastest possible thing to do is to go through the array and stop once you detect the negative number.
Just traverse the array. That is order n. Sorting is at best order n log(n); at worst n^2.
Probably the fastest is to just scan the array until you find it.
If you're just doing this once, and don't need the array sorted for other purposes, it'll be faster to scan for the negative number than to do the sort. If, however, you need (or can use) the sorting for other purposes, or you may need to find the negative number several times, then sorting can end up saving time. Likewise, with some programs, spending extra time in preparation to get faster response when really crucial can be justified (but I've no idea whether that applies here or not).
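A minimal sketch of that early-exit scan (in Python, for illustration):

    def find_negative(arr):
        # Single pass with early exit: stops as soon as the (single)
        # negative number is found; O(n) in the worst case.
        for x in arr:
            if x < 0:
                return x
        return None  # no negative number present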
Say I have a list of pairs of indices into an array of length N. I want to determine whether an arbitrarily ordered list is sorted after doing

    for pair in pairs:
        if list_to_sort[pair.first] > list_to_sort[pair.second]:
            list_to_sort[pair.first], list_to_sort[pair.second] = (
                list_to_sort[pair.second],
                list_to_sort[pair.first],
            )
Obviously, I could test all permutations of the N-element list. Is there a faster way? And if there is, what is it? What is it called? Is it provably the fastest solution?
What you're describing is an algorithm called a sorting network. Unfortunately, it's known that the problem of determining whether a series of swaps yields a valid sorting network is co-NP-complete, meaning that unless P = NP there is no polynomial-time algorithm for checking whether your particular choice of pairs will work correctly.
Interestingly, though, you don't need to try all possible permutations of the input. There's a result called the zero-one principle that states that as long as your sorting network will correctly sort a list of 0s and 1s, it will sort any input sequence correctly. Consequently, rather than trying all n! possible permutations of the inputs, you can check all 2^n possible sequences of 0s and 1s. This is still pretty infeasible if n is very large, but might be useful if your choice of n is small.
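As a sketch of how the zero-one principle can be applied (assuming pairs is a list of (first, second) index tuples, which is my own framing of the question's setup):

    from itertools import product

    def is_sorting_network(pairs, n):
        # Zero-one principle: the network sorts every input of length n
        # iff it sorts every 0/1 sequence of length n (2^n of them).
        for bits in product((0, 1), repeat=n):
            seq = list(bits)
            for i, j in pairs:
                if seq[i] > seq[j]:
                    seq[i], seq[j] = seq[j], seq[i]
            if any(seq[k] > seq[k + 1] for k in range(n - 1)):
                return False
        return True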
As for the best possible way to build a sorting network - this is an open problem! There are many good sorting networks that run quickly on most inputs, and there are also a few sorting networks that are known to be optimal. Searching for "optimal sorting network" might turn up some useful links.
Hope this helps!
This was inspired by a question at a job interview: how do you efficiently generate N unique random numbers? Their security and distribution/bias don't matter.
I proposed a naive way: call rand() N times and eliminate dupes by trial and error, which gives an inefficient and flawed solution. Then I read this SO question; those algorithms are great for getting quality unique numbers, and they are O(N).
But I suspect there are ways to get low-quality unique random numbers for dummy tasks in less than O(N) time complexity. Some possible ideas:
Store many precomputed lists, each containing N numbers, and retrieve one list at random. Complexity is O(1) for fixed N. Storage space used is O(NR), where R is the number of lists.
Generate N/2 unique random numbers and then split each one into two unequal parts (floor/ceil for odd numbers, n+1/n-1 for even). I know this is flawed (duplicates can pop up), and O(N/2) is still O(N). This is more food for thought.
Generate one big random number and then squeeze more variants from it by some fixed manipulations like bitwise operations, factorization, recursion, MapReduce or something else.
Use a quasi-random sequence somehow (not a math guy, just googled this term).
Your ideas?
Presumably this routine has some kind of output (i.e. the results are written to an array of some kind). Populating an array (or some other data-structure) of size N is at least an O(N) operation, so you can't do better than O(N).
Consequently, you can just generate a random number, and if the result set already contains it, add the maximum of the already-generated numbers to it to make it unique.
Detecting whether a number has already been generated is O(1) (using a hash set). So the whole thing is O(N), with only N calls to random().
Of course, this assumes we do not overflow the upper limit (i.e., it may require BigInteger).
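A minimal sketch of that collision-fixup idea in Python (the +1 is my own addition, to guarantee uniqueness even when the colliding number is 0):

    import random

    def unique_randoms(n, upper=2**31):
        seen = set()
        result = []
        current_max = 0
        for _ in range(n):
            x = random.randrange(upper)
            if x in seen:
                # Displace the duplicate past everything generated so far;
                # this assumes values are allowed to exceed `upper`.
                x += current_max + 1
            seen.add(x)
            result.append(x)
            current_max = max(current_max, x)
        return result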
I got asked this question once and still haven't been able to figure it out:
You have an array of N integers, where N is large, say, a billion. You want to calculate the median value of this array. Assume you have m+1 machines (m workers, one master) to distribute the job to. How would you go about doing this?
Since the median is a nonlinear operator, you can't just find the median in each machine and then take the median of those values.
Depending on the Parallel Computation Model, algorithms can vary. (Note: the PDF linked to in the previous sentence just contains some of the many possible models.)
Finding the median is a special case of finding the ith element. This problem is called 'selection problem', so you need to search the web for parallel selection.
Here is one paper (unfortunately, not free) which might be useful: Parallel Selection Algorithms With Analysis on Clusters.
And google's first link for the query "Parallel Selection" gives: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html which actually uses the median of medians for the general problem and not just median finding.
You could do a highly parallelizable sort (like merge sort) and get the median from the result.
Would sorting the array be overkill? If not, then my suggestion is to divide up the array, sort the pieces, and merge the results together.
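A sketch of that sort-then-merge approach (single-process Python standing in for the m workers, just to show the shape of it):

    import heapq

    def median_of_chunks(chunks):
        # Each worker sorts its own chunk; the master k-way merges the
        # sorted chunks only as far as the middle element(s).
        sorted_chunks = [sorted(c) for c in chunks]  # done on the workers
        n = sum(len(c) for c in sorted_chunks)
        mid = (n - 1) // 2
        merged = heapq.merge(*sorted_chunks)  # lazy k-way merge on the master
        lower = None
        for i, value in enumerate(merged):
            if i == mid:
                if n % 2 == 1:
                    return value
                lower = value
            elif i == mid + 1:
                return (lower + value) / 2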
How are algorithms analyzed? What makes quicksort have an O(n^2) worst-case performance while merge sort has an O(n log(n)) worst-case performance?
That's a topic for an entire semester. Ultimately we are talking about the upper bound on the number of operations that must be completed before the algorithm finishes, as a function of the size of the input. We do not include the coefficients (i.e. 10N vs 4N^2) because for N large enough they no longer matter.
Proving the big-O of an algorithm can be quite difficult. It requires a formal proof, and there are many techniques. Often a good ad hoc way is to just count how many passes over the data the algorithm makes. For instance, if your algorithm has nested for loops, then for each of N items you must operate N times. That is generally O(N^2).
As for merge sort, you split the data in half over and over. That takes log2(n) splits, and for each split you make a pass over the data, which gives N log(n).
Quicksort is a bit trickier because in the average case it is also N log(n). You have to imagine what happens if your partition splits the data such that every time you get only one element on one side of the pivot. Then you will need to split the data n times instead of log(n) times, which makes it N^2. The advantage of quicksort is that it can be done in place, and that it usually gets close to N log(n) performance.
This is introductory analysis of algorithms course material.
An operation is defined (e.g., multiplication) and the analysis is performed in terms of either space or time.
The chosen operation is counted in terms of space or time. Typically, the analysis treats time as the dependent variable and input size as the independent variable.
Example pseudocode:
    for elem in items:
        op()

There will be n operations performed, where n is the size of items. Count it yourself if you don't believe me.
To analyze quicksort and mergesort requires a decent level of what is known as mathematical sophistication. Loosely, you solve a recurrence relation derived from the recursive structure of the algorithm.
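For example, the standard recurrences (a sketch, with c a constant per-element cost): merge sort satisfies T(n) = 2T(n/2) + cn, which unrolls over log2(n) levels of cn work each, giving O(n log n); quicksort's worst case satisfies T(n) = T(n-1) + cn, which sums to about cn^2/2, giving O(n^2).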
Both quicksort and merge sort split the array into two, sort each part recursively, then combine the results. Quicksort splits by choosing a "pivot" element and partitioning the array into elements smaller or greater than the pivot. Merge sort splits arbitrarily and then merges the results in linear time. In both cases a single step is O(n), and if the array size halved each time this would give a logarithmic number of steps, so we would expect O(n log(n)).
However, quicksort has a worst case where the split is always uneven, so the number of steps is not proportional to the logarithm of n but proportional to n itself. Merge sort splits exactly into two halves (or as close as possible), so it doesn't have this problem.
Quicksort has many variants depending on pivot selection. Let's assume we always select the first item in the array as the pivot. If the input array is already sorted, quicksort degenerates into a kind of selection sort: you are not really dividing the array, you are only picking off the first item in each cycle.
Merge sort, on the other hand, will always divide the input array in the same manner, regardless of its content.
Also note: divide and conquer performs best when the divisions are of nearly equal length.
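A small sketch illustrating that degeneration (counting only the comparisons made while partitioning; the function names are mine):

    def quicksort_first_pivot(items):
        # Always uses the first element as the pivot. On already-sorted
        # input every partition peels off just one element, so comparisons
        # grow roughly as n^2/2 instead of n*log2(n).
        comparisons = 0

        def sort(xs):
            nonlocal comparisons
            if len(xs) <= 1:
                return xs
            pivot, rest = xs[0], xs[1:]
            comparisons += len(rest)
            smaller = [x for x in rest if x < pivot]
            larger = [x for x in rest if x >= pivot]
            return sort(smaller) + [pivot] + sort(larger)

        return sort(items), comparisons

    # e.g. quicksort_first_pivot(list(range(100))) reports 4950 comparisons,
    # while a balanced split would need on the order of 100*log2(100) ~ 664.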
Analysing algorithms is a painstaking effort, and it is error-prone. I would compare it with a question like, how much chance do I have to get dealt two aces in a bridge game. One has to carefully consider all possibilities and must not overlook that the aces can arrive in any order.
So what one does to analyse those algorithms is go through the actual pseudocode of the algorithm and work out what a worst-case situation would cost. In the following I will paint with a large brush.
For quicksort one has to choose a pivot to split the set. In a case of dramatic bad luck the set splits into a set of n-1 and a set of 1 every time, for n steps, where each step means inspecting up to n elements. This arrives at N^2.
For merge sort one starts by splitting the sequence into already-ordered runs. Even in the worst case that means at most n runs. Those can be combined two by two, then the larger sets are combined two by two, and so on. However, the (at most) n/2 first combinations deal with extremely small subsets, and the last step deals with subsets of about size n, but there is just one such step. This arrives at N log(N).