Average number of swaps performed in Bubble Sort - algorithm

I came across this problem right now:
http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=24&page=show_problem&problem=3155
The problem asks for the average number of swaps performed by Bubble Sort when the input is a random permutation of the first n natural numbers (1 to n in random order).
So, I thought that:
Max number of swaps possible = n(n-1)/2 (when they are in descending order).
Min number of swaps possible = 0 (when they are in ascending order).
So, the mode of this distribution is (0 + n(n-1)/2)/2 = n(n-1)/4.
This turned out to be the answer, but I don't understand why the mode coincides with the mean.

Since every permutation of the input is equally likely, the distribution of the number of swaps is symmetric: a permutation with k inversions pairs up with its reverse, which has n(n-1)/2 - k inversions.
It is a property of symmetric distributions that their mean, median and mode coincide, which is why the mean and mode are the same here.

Every swap reduces the number of inversions in the array by exactly 1.
The sorted array has no inversions, thus the number of swaps is equal to the number of inversions in the initial array. Thus we need to compute the average number of inversions in a shuffled array.
A pair of indices i, j with i < j is an inversion in exactly half of the shuffled arrays: by symmetry, A[i] > A[j] is exactly as likely as A[i] < A[j]. There are n(n-1)/2 such pairs, thus we have n(n-1)/4 inversions on average.
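As a quick sanity check, here is a minimal Python sketch (the helper name is my own) that counts the swaps bubble sort actually performs and compares the empirical average over all permutations with n(n-1)/4:

    import itertools, math

    def bubble_sort_swaps(arr):
        # Count the swaps bubble sort performs; each swap removes one inversion.
        a, swaps = list(arr), 0
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swaps += 1
        return swaps

    n = 6
    total = sum(bubble_sort_swaps(p) for p in itertools.permutations(range(1, n + 1)))
    print(total / math.factorial(n), n * (n - 1) / 4)  # both print 7.5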

Related

How can I find the minimum interval that contains half of n numbers?

If I have n numbers, how do I find the minimum interval [a,b] that contains half of those numbers?
Sort the numbers.
Set the left index to 0 and the right index to n/2 - 1, and take the difference A[right] - A[left].
Walk n/2 steps in a for-loop, incrementing both indexes and recomputing the difference; remember the smallest difference and the corresponding indexes.
Sort the numbers in increasing order and compute all the differences A[i + n/2] - A[i]. The solution is given by the index i that minimizes the difference.
Explanation:
There is no need to search among intervals that contain fewer than n/2 numbers (they do not satisfy the condition), nor among those that contain more (such an interval cannot be minimal, since you could remove its extreme elements and still cover half the numbers).
Once the elements are sorted, any set of n/2 consecutive elements is bounded by its first and last elements, so it suffices to sort the numbers and slide a window of n/2 elements.
Now it is more challenging to tell if this O(n log n) approach is optimal.
How about the following?
Sort the given series of numbers in ascending order.
Start a loop with i running from the 1st to the ([n/2])th number.
Calculate the difference d between the (i + [n/2])th and the ith number, and store the numbers i, i + [n/2] and d in an iterable collection arr.
End loop.
Find the minimum value of d in arr. The values of i and i + [n/2] corresponding to this d give your smallest range. (A sketch of this sliding window follows below.)
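Here is a minimal Python sketch of the sliding-window approach all three answers describe (the function name is mine, and I take "half" to mean a window of ceil(n/2) sorted elements):

    import math

    def min_half_interval(nums):
        # Smallest interval [a, b] covering at least half of the numbers.
        a = sorted(nums)
        n = len(a)
        k = math.ceil(n / 2)                    # window size: half of the numbers
        best = (a[k - 1] - a[0], a[0], a[k - 1])
        for i in range(1, n - k + 1):           # slide the window one step at a time
            d = a[i + k - 1] - a[i]
            if d < best[0]:
                best = (d, a[i], a[i + k - 1])
        return best[1], best[2]

    print(min_half_interval([1, 3, 4, 9, 10, 11]))  # (9, 11)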

Find triplets in an array whose sum is some integer X

Given a sorted array A[1..n] where each element ranges from 1 to 2n, is there a way to find a triplet whose sum is a given integer x? I know an O(n^2) solution. Is there any algorithm better than O(n^2)?
It is possible to achieve O(n log n) time complexity using the fact that the maximum value of each element is O(n).
For each 1 <= y <= 4 * n, let's find the number of pairs of elements that sum up to y. We can create a polynomial of degree 2 * n, where the i-th coefficient is the number of occurrences of the number i in the given array. Now we can find the square (I'll call it s) of this polynomial in O(n log n) time using the Fast Fourier Transform. The i-th coefficient of s is exactly the number of pairs of elements that sum up to i.
Now we can iterate over the given array. Let's assume that the current element is a. Then we just need to check the number of pairs that sum up to X - a; we have already computed this in step 1.
If all triplets must consist of different elements, we need to subtract the number of triplets that sum up to X but contain duplicates. We can do this in O(n log n) time, too: for triplets consisting of three equal elements, we just subtract the number of occurrences of X / 3 in the given array; for triplets with one duplicate, we iterate over the element a that is repeated twice and subtract the number of occurrences of X - 2 * a.
If we need to find a triplet itself, not just count them, we can do the following:
Count the number of triplets and pairs as suggested above.
Find an element a such that some pair sums up to X - a.
Find two elements that sum up to that remaining pair sum X - a.
All these steps can be accomplished in linear time (using the fact that all sums are O(n)).
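A sketch of the counting step using numpy's FFT (names are my own; the square of the polynomial counts ordered pairs, including an element paired with itself, so the duplicate handling described above still applies; an exact integer NTT would avoid floating-point rounding):

    import numpy as np

    def count_pairs_by_sum(arr, max_val):
        # poly[i] = multiplicity of value i in arr.
        poly = np.zeros(max_val + 1)
        for v in arr:
            poly[v] += 1
        size = 2 * max_val + 1                   # length of the squared polynomial
        f = np.fft.rfft(poly, n=size)
        # s[y] = number of ordered pairs (a, b) from arr with a + b == y.
        return np.rint(np.fft.irfft(f * f, n=size)).astype(int)

    arr, x = [1, 2, 2, 3], 5
    s = count_pairs_by_sum(arr, max_val=6)
    for a in arr:                                # pairs completing a to the sum x
        print(a, s[x - a])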
Your problem is apparently the non-zero sum variant of the 3SUM problem.
Because you know the possible range of the integers beforehand, you can achieve lower bounds than the general case, according to the second paragraph of the article:
When the elements are integers in the range [-N, ..., N], 3SUM can be solved in O(n + N log N) time by representing the input set S as a bit vector, computing the set S + S of all pairwise sums as a discrete convolution using the Fast Fourier transform, and finally comparing this set to -S.
In your case, you would have to pre-process the array by subtracting X/3 from every element (scaling everything by 3 first if X is not divisible by 3), so that the target sum becomes 0 and the values lie in a symmetric range.
One thing to note is that the algorithm assumes you're working with a set of numbers, and I'm not sure what (if any) implications there may be on running time if your array may include duplicates.

Counting number of contiguous subarrays with positive sum in O(nlogn) using O(n) transformation

I am interested in finding the number of contiguous sub-arrays that sum to a positive value (sum>0).
More formally, Given an array of integers A[1,...,n] I am looking to count the pairs of integers (i,j) such that 1<=i<=j<=n and A[i]+...+A[j]>0.
I am familiar with Kadane's algorithm for finding the maximum sum sub-array in O(n), and using a similar approach I can count the number of these sub-arrays in O(n^2).
To do this I take the cumulative sum T(i). I then compute T(j)-T(i-1) for all j=1,...,n and i=1,...,j and just record the differences that end up positive.
Apparently though there is an O(n) time routine that transforms this problem into a problem of counting the number of inversions (which can be achieved in O(nlogn) using say merge-sort). Try as I may though, I have been unable to find this transformation.
I do understand though that somehow I must match this inversion to the fact that the sum of elements between a pair (i,j) is positive.
Does anyone have any guidance as to how to do this? Any help is greatly appreciated.
EDIT: I am actually looking for the O(n) transformation rather than an alternative (not based on transformation plus inversion counting) solution to finding the number of sub arrays.
Using the original array A, build a prefix-sum array sumA such that:
sumA[i] = A[0] + A[1] + ... + A[i].
Also prepend a 0 to sumA (the empty prefix), so that sub-arrays starting at index 0 are counted too.
Now, if there are two indices i, j (i < j) in this array such that sumA[i] < sumA[j], then sumA[j] - sumA[i] > 0, and this difference is exactly the sum of a contiguous sub-array, namely the elements between those two prefixes.
Hence the problem reduces to counting such pairs, which is the number of inversions of the reverse of this array. This can be done by sorting sumA in descending order with mergesort and counting the inversions encountered in the process.
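A minimal Python sketch of this reduction (function names are my own; the merge-sort counter counts pairs i < j with prefix[i] < prefix[j], which is the same quantity as the inversions of the reversed array):

    def count_increasing_pairs(a):
        # Merge sort that counts pairs (i, j), i < j, with a[i] < a[j].
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, cl = count_increasing_pairs(a[:mid])
        right, cr = count_increasing_pairs(a[mid:])
        merged, count, i, j = [], cl + cr, 0, 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:   # left[i] is below every remaining right element
                count += len(right) - j
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:], count

    def count_positive_subarrays(A):
        prefix = [0]                 # empty prefix covers sub-arrays starting at index 0
        for v in A:
            prefix.append(prefix[-1] + v)
        return count_increasing_pairs(prefix)[1]

    print(count_positive_subarrays([1, -2, 3]))  # 4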

How to generate random permutations fast

I read a question in an algorithm book:
"Given a positive integer n, choose 100 random permutations of [1,2,...,n],..."
I know how to generate random permutations with Knuth's algorithm, but do there exist any fast algorithms for generating a large number of permutations?
Knuth shuffles require you to do n random swaps for a permutation of n elements (see http://en.wikipedia.org/wiki/Random_permutation#Knuth_shuffles), so the complexity is O(n), which is about the best you can expect if you must produce a whole permutation of n elements.
Is this causing you a practical problem? If so, perhaps you could look at what you are doing with all those permutations in practice. Apart from simply getting by with fewer, you could think about deferring the generation of a permutation until you are sure you need it. If you need a permutation of n objects but only ever look at k of them, perhaps you need a scheme for generating only those k elements. For small k, you could simply generate k random numbers in the range [0, n), repeating any draw that returns a number which has already come up; for small k, such repeats are unlikely.
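A tiny sketch of that rejection idea (assuming k is much smaller than n, so repeats are rare):

    import random

    def sample_k_of_n(k, n):
        # Draw k distinct values from range(n); a repeated draw is simply retried.
        seen = set()
        while len(seen) < k:
            seen.add(random.randrange(n))
        return list(seen)

    print(sample_k_of_n(5, 10**9))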
There exist N! permutations of the numbers from 1 to N. If you sort them lexicographically, like in a dictionary, it's possible to construct a permutation knowing only its position in that sorted list.
For example, let N = 3; the lexicographically sorted list of permutations is {123, 132, 213, 231, 312, 321}. You generate a number between 1 and 3!, for example 5. The 5-th permutation is 312. How do you construct it?
Let's find the first number of the 5-th permutation. Divide the permutations into blocks by their first number, giving the groups {123, 132}, {213, 231}, {312, 321}. Each group contains (N-1)! elements, and the first number of a permutation is its block number. The 5-th permutation is in block ceil(5/(3-1)!) = 3, so we've just found the first number of the 5-th permutation: it's 3.
Now I'm looking not for the 5-th but for the (5 - (N-1)!*(ceil(5/(N-1)!) - 1)) = 5 - 2*2 = 1-st permutation among {3,1,2}, {3,2,1}. The leading 3 is determined and shared by all members of the group, so I'm actually searching for the 1-st permutation among {1,2}, {2,1}, and N is now 2. Again, next_num = ceil(1/(new_N - 1)!) = 1.
Continue this N times.
Hope you got the idea. The complexity is O(N) arithmetic steps, because you construct the permutation's elements one by one.
UPDATE
When you get the next number by this arithmetic, you also have to keep track of which numbers are already used, and take the X-th unused number instead of X. The complexity then becomes O(N log N), because finding the X-th unused element can be done in O(log N) (e.g. with a Fenwick tree).
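A sketch of the whole unranking procedure (0-based rank; the list pop makes this version O(N^2), and replacing it with an order-statistics structure such as a Fenwick tree gives the O(N log N) mentioned above):

    from math import factorial

    def unrank_permutation(rank, n):
        # rank-th (0-based) permutation of [1..n] in lexicographic order.
        unused, perm = list(range(1, n + 1)), []
        for pos in range(n, 0, -1):
            block = factorial(pos - 1)           # permutations sharing the next value
            idx, rank = divmod(rank, block)
            perm.append(unused.pop(idx))         # O(n) pop; a Fenwick tree gives O(log n)
        return perm

    print(unrank_permutation(4, 3))  # the 5-th permutation of [1,2,3] -> [3, 1, 2]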

Explanation of the Median of Medians algorithm

The median-of-medians approach is very popular in quicksort-type partitioning algorithms to yield a fairly good pivot, one that partitions the array reasonably evenly. Its logic is given in Wikipedia as:
The chosen pivot is both less than and greater than half of the elements in the list of medians, which is around n/10 elements (1/2 * (n/5)) for each half. Each of these elements is a median of 5, making it less than 2 other elements and greater than 2 other elements outside the block. Hence, the pivot is less than 3(n/10) elements outside the block, and greater than another 3(n/10) elements outside the block. Thus the chosen median splits the elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm.
Can somebody explain it a bit more lucidly for me? I am finding it difficult to understand the logic.
Think of the following set of numbers:
5 2 6 3 1
The median of these numbers is 3. Now take any number x: if x > 3, then x is bigger than at least half of the numbers above; if x < 3, then it is smaller than at least half of them.
So that is the idea: for each set of 5 numbers, you take their median, which leaves you with n/5 medians.
Now if you get the median of those n/5 medians (call it m), it is bigger than half of them and smaller than the other half (by the definition of median!). In other words, m is bigger than n/10 numbers (which were themselves medians of small 5-element groups) and smaller than another n/10 numbers (which again were medians of small 5-element groups).
In the example above, we saw that if the median is k and m > k, then m is also bigger than 2 other numbers (those that were themselves smaller than k). So in each of the n/10 small 5-element groups whose median is below m, m is bigger than at least 3 numbers (2 numbers plus the median itself). Hence, m is bigger than at least 3n/10 numbers.
Symmetric logic shows that m is smaller than at least 3n/10 numbers, which gives the 30%/70% guarantee quoted above.
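To make the recursion concrete, here is a minimal Python sketch of median-of-medians selection (a textbook rendering for illustration, not tuned code):

    def select(arr, k):
        # k-th smallest element (0-based) in worst-case linear time.
        if len(arr) <= 5:
            return sorted(arr)[k]
        # Median of each group of 5, then recurse for the median of those medians.
        medians = [sorted(g)[len(g) // 2]
                   for g in (arr[i:i + 5] for i in range(0, len(arr), 5))]
        pivot = select(medians, len(medians) // 2)
        lo = [x for x in arr if x < pivot]
        hi = [x for x in arr if x > pivot]
        if k < len(lo):
            return select(lo, k)
        if k >= len(arr) - len(hi):
            return select(hi, k - (len(arr) - len(hi)))
        return pivot                              # pivot occupies the middle band

    print(select([5, 2, 6, 3, 1, 9, 7, 4, 8, 0], 4))  # 4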
