algorithm proof - building least number after deleting k digits from an n-digit number - algorithm

Problem: given an n-digit number, which k (k < n) digits should be deleted from it to make the number left is the smallest among all cases (the relative sequence of remaining digits should not be changed). e.g. delete 2 digits from '24635', the smallest left number is '235'.
A solution: Delete the first digit (from left to right) which is larger than or equal to its right neighbor, or the last digit, if we cannot find one as such. Repeat this procedure for k times. (see codecareer for reference. There are other solutions such as geeksforgeeks, stackoverflow, but I thought the one described here is more intuitive, so I prefer this one.)
The problem now is, how to prove the solution above is correct, i.e. how can it guarantee the final number is smallest by making it the smallest after deleting a single digit at each step.

Suppose k = 1.
Let m = Σi=0,...,n aibi and n+1 digit number anan-1...a1a0 with base b, i.e. 0 ≤ ai < b ∀ 0 ≤ i ≤ n (e.g. b = 10).
Proof
∃ j > 0 with aj > aj-1 and let j be maximal.
This means aj is the last digit of a (not necessary strictly) increasing sequence of consecutive digits.
Then the digit aj is now removed from the number and the resulting number m' has the value
m' = Σi=0,...,j-1 aibi + Σi=j+1,...,n aibi-1
The aim of this reduction is to maximize the difference m-m'. So lets take a look:
m - m' = Σi=0,...,n aibi - (Σi=0,...,j-1 aibi + Σi=j+1,...,n aibi-1)
= ajbj + Σi=j+1,...,n (aibi - aibi-1)
= anbn + Σi=j,...,n-1 (ai - ai+1)bi
Can there be a better choice of j to get a bigger difference?
Since an...aj is an increasing sub sequence, ai-ai+1 ≥ 0 holds. So choosing j' > j instead of j, you get more zeros where you now have a positive number, i.e. the difference gets not bigger, but lower if there exists an i with ai+1 < ai (strict smaller).
j is supposed to be maximal, i.e. aj-1-aj < 0. We know
bj-1 > Σi=0,...,j-2(b-1)bi = bi-1-1
This means, that if we choose `j' < j', we get a negative addition to the difference, so it also gets not bigger.
If ∄ j > 0 with aj > aj-1 the above proof works for j = 0.
What is left to do?
This is only the proof that your algorithm works for k = 1.
It is possible to extend the above proof to multiple sub sequences of (not necessary strictly) increasing digits. It's exact the same proof but much less readable, due to the number of indexes you need.
Maybe you can also use induction, since there are no interactions between the digits (blocking following next choices or something).

Here is a simple argument that your algorithm works for any k. Suppose there is a digit in the mth place that is less than or equal to it's right (m+1)th digit neighbor, and you delete the mth digit but not the (m+1)th. Then you can delete the (m+1)th digit instead of the mth, and you will get an answer less than or equal to your original answer.

notice: this proof is for building the maximum number after removing k digits, but the thinking is similar
key lemma: maximum (m + 1)-digit number contains maximum m-digit
number for all m = 0, 1, ..., n - 1
proof:
greedy solution to delete one digit from some number to get the maximum
result: just delete the first digit which next digit is greater than it, or the last digit if digits are in non-ascending order. This is very easy to prove.
we use contradiction to proof the lemma.
suppose the first time the lemma is broken when m = k, so S(k) ⊄ S(k + 1). Notice that the S(k) ⊂ S(n) as the initial number contains all sub optimal ones, so there must exist a x that S(k) ⊂ S(x) and S(k) ⊄ S(x - 1), k + 2 <= x <= n
we use the greedy solution above to delete only one digit S[X][y] from S(x) to get S(x - 1), so S[X][y] ∈ S(x) and S[X][y] ∉ S(x - 1) and S(k) must contain it. We now use contradiction to prove that S(k) does not need to contain this digit .
According to our greedy solution, all digits from beginning to S[X][y] are
in non-ascending order.
if S[X][y] is at the tail, then S(k) can be the first k digits of S(x) ---> contradiction!
otherwise, we firstly know that all digits in S[X][1, 2,..., y] are in S[k]. If there is a S[X][z] is not inS(k), 1 <= z <= y - 1, then we can shift digits of S(k) that in range S[X][z + 1, y] to left one unit to get a greater or equal S(k). Therefore, there are at least 2 digit after S[X][y] that are not in S(k) as x >= k + 2. Then, we can follow the prefix of S(k) to S[X][y], but we do not use S[X][y], we use from S[X][y + 1]. As S[X][y + 1] > S[X][y], we can build a greater S(k) -------> contradiction!
so, we prove lemma. If we have got S(m + 1), and we know S(m + 1) contains S(m), then S(m) must be the maximum number after removing one digit from S(m + 1)

Related

Find The number of sides of triangle

I am giving a Array A consists of integer, I have to choose any two integer from array A and any third integer from range [L,R] , such that all three integer forms a valid triangle.
I have to find the number of integer in range [L,R] which can be used to form a valid triangle, by choosing any two values from array A.
I know if i know two sides then third side must be range a-b<x<a+b
where a and b are any two integer from A.
How to find the number of valid integer in [L,R] in O(N) N= Size of A time.
L and R and be very large upto 10^20
To my understanding, complete enumeration will yield an efficient solution by the following argument. The maximum number of possible solutions occurs in the case where any selection of elements of A yields a consistent choice of side lengths for a triangle. Such an input, for instance, would be an input of N times the value 1. However, the number of triples chosen from A can be bounded by
(n choose 3) = n!/(3!(n-3))
= n!/(6(n-3)!)
= (1/6)*(n-2)*(n-1)*n
<= n^3
(where choose is meant to denote the binomial coefficient) which is a polynomial number of choices. Any choice can be checked for validity in constant time, as only 3 values are involved.
Now the contest has ended, so here is my way to solve it in the contest.
The problem is asking how many number x in [L,R]can form triangle with any pair (a_i, a_j) in A.
Naive method is to brute all pairs which is O(N^2 * (R-L+1)) = O(N^2 * (R-L))
But indeed, we do not need to test all O(N^2) pairs, we only need to test O(N) pairs IF A is sorted, namely, all adjacent pair (a_i-1, a_i) for i > 0
Why? Because in sorted A:
If there is some pair (a_j, a_i) where a_j < a_i and j != i-1, which can form triangle with x
Then (a_i-1, a_i) must form a triangle with x too:
a_i-1 + a_i > a_j + a_i > x
x + a_i-1 > x + a_j > a_i
x + a_i > x + a_i-1 > a_j
Therefore checking all (a_i-1, a_i) is sufficient, this is the first core idea to solve this problem.
So now we have a O(NlgN + N*(R-L+1)) = O(N*(R-L)) algorithm.
For the original contest problem, it is still too slow as R-L+1 is as large as 10^18, so we need another tricks.
Note that actually by the triangle inequality, for a pair (a_i-1, a_i), indeed we can find a range of x which can form triangle with this pair:
a_i-1 + a_i > x > a_i - a_i-1 (By above 1 & 2)
For example, (4,5) can form triangle with all 1 < x < 9
So instead of iterating all x in [L,R], we try to find range of x for each pair, which can be done in O(N) as we know the range for x of a pair in O(1). Beware that x must fall in the range of [L,R].
After that we have O(N) ranges / segments of x which then we can union them and the size of the result set is the desired answer.
Union O(N) segments can be done easily in O(N) as well, I leave you as a homework :)
Combine both tricks, the algorithm is O(N lg N + N + N) = O(N lg N)

The number prodigy : reverse sum of pattern

Number prodigy is given X - there's a X digit number N, reverse of N is M. Number prodigy is interested in finding out how many X digit numbers are of form : N+M=10^X-1 and N is expected not have trailing zeroes. Means that N%10 != 0 .
In case of X=1, 9 such combinations exist.
Denote A[i] - the i'th digit of A.
We first need to understand that to get N+M=10^X-1, we need N[i]+M[i]=9 for all i. Since M[i]=N[X-i], it means we need N[i] + N[X-i] = 9. This means, once N[i] is set, also N[X-i].
We can now derive a recursive formula:
F(X) = 10*F(X-2)
The idea is - we look at the first digit of X, we have 10 possibilities for it, and for each possibility, we set N[0] and N[X-1].
However, this allows leading and trailing zeros, which we don't want. The first and last number can be anything by 0.
G(X) = 8*F(X-2)
The above is chosing one of 1,2,...,8 as N[0], then setting (one option) the last number so N[X-1] = 9 - N[0], and invoke the recursive call without restrictions. Note neither N[0] nor N[X-1] can be zero.
Base clauses:
F(0) = 1
F(1) = 0
F(1)=0 because there is no natural number n such that n+n=9.
All in all, we found a recursive formula to compute the total number of elements. This recursive formula can be transformed into a close one with some basic algebra. I leave this part for you.

How can I find a faster algorithm for this special case of Longest Common Sub-sequence (LCS)?

I know the LCS problem need time ~ O(mn) where m and n are length of two sequence X and Y respectively. But my problem is a little bit easier so I expect a faster algorithm than ~O(mn).
Here is my problem:
Input:
a positive integer Q, two sequence X=x1,x2,x3.....xn and Y=y1,y2,y3...yn, both of length n.
Output:
True, if the length of the LCS of X and Y is at least n - Q;
False, otherwise.
The well-known algorithm costs O(n^2) here, but actually we can do better than that. Because whenever we eliminate as many as Q elements in either sequence without finding a common element, the result returns False. Someone said there should be an algorithm as good as O(Q*n), but I cannot figure out.
UPDATE:
Already found an answer!
I was told I can just calculate the diagonal block of the table c[i,j], because if |i-j|>Q, means there are already more than Q unmatched elements in both sequences. So we only need to calculate the c[i,j] when |i-j|<=Q.
Here is one possible way to do it:
1. Let's assume that f(prefix_len, deleted_cnt) is the leftmost position in Y such that prefix_len elements of X were already processed and exactly deleted_cnt of them were deleted. Obviously, there are only O(N * Q) states because deleted_cnt cannot exceed Q.
2. The base case is f(0, 0) = 0(nothing was processed, thus nothing was deleted).
3. Transitions:
a) Remove the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element with the leftmost possible element from Y that is equal to it and located after f(i, j)(let's assume that it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. So the only question remaining is how to get the leftmost matching element located to the right from a given position. Let's precompute the following pairs: (position in Y, element of X) -> the leftmost occurrence of the element of Y equal to this element of X to the right from this position in Y and put them into a hash table. It looks like O(n^2). But is not. For a fixed position in Y, we never need to go further to the right from it than by Q + 1 positions. Why? If we go further, we skip more than Q elements! So we can use this fact to examine only O(N * Q) pairs and get desired time complexity. When we have this hash table, finding pos during the step 3 is just one hash table lookup. Here is a pseudo code for this step:
map = EmptyHashMap()
for i = 0 ... n - 1:
for j = i + 1 ... min(n - 1, i + q + 1)
map[(i, Y[j])] = min(map[(i, Y[j])], j)
Unfortunately, this solution uses hash tables so it has O(N * Q) time complexity on average, not in the worst case, but it should be feasible.
You can also say cost of the process to make the string equal must not be greater than Q.if it greater than Q than answer must be false.(EDIT DISTANCE PROBLEM)
Suppose of the of string x is m, and the size of string y is n, then we create a two dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and j-length prefix of y.
The computation of array d is done using dynamic programming, which uses the following recurrence:
d[i][0] = i , for i <= m
d[0][j] = j , for j <= n
d[i][j] = d[i - 1][j - 1], if s[i] == w[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise.
answer of LCS if m>n, m-dp[m][m-n]

Levenstein distance from particular group of numbers

My input are three numbers - a number s and the beginning b and end e of a range with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenstein distance between s and all numbers in range [b, e]. It is not necessary to find the number minimizing the distance, the minimal distance is sufficient.
Obviously I have to read the numbers as string, because standard C++ type will not handle such large numbers. Calculating the Levenstein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which each b and e have the same number of digits. Suppose e has k >= 0 more digits than s: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m-n if n < m, n-u if n > u, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
Set x to the first q digits of b.
Append a randomly-chosen digit d between b[i] and e[i] to x.
If d == b[i], we "hug" the lower bound:
For i from q+1 to m:
If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger then b[i], and there is no digit larger than 9!]
Otherwise, flip a coin:
Heads: Append b[i].
Tails: Append a randomly-chosen digit d > b[i], then goto 6.
Stop.
Else if d == e[i], we "hug" the upper bound:
For i from q+1 to m:
If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller then e[i], and there is no digit smaller than 0!]
Otherwise, flip a coin:
Heads: Append e[i].
Tails: Append a randomly-chosen digit d < e[i], then goto 6.
Stop.
Otherwise (if d is strictly between b[i] and e[i]), drop through to step 6.
Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j)+abs(n-i-m+j)
else infinity
shc(i, j) = if j == q then
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
else
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], (0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levestein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit-distance of s = 12001 and n = 12296, you already have the table for n = 12297, you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300 and you need to update the last 3 columns of the previous table.. but this happens just once every 100 iteration.
In general, you have to
update the last column on every iterations (so, length(s) cells)
update the second-to-last too, once every 10 iterations
update the third-to-last, too, once every 100 iterations
so let L = length(s) and D = e-b. First you compute the edit-distance between s and b. Then you can find the minimum Levenstein distance over [b,e] looping over every integer in the interval. There are D of them, so the execution time is about:
Now since
we have an algorithm wich is

The expected number of inversions--From Introduction to Algorithms by Cormen

Let A[1 .. n] be an array of n distinct numbers. If i < j and A[i] > A[j], then the pair (i, j) is called an inversion of A. (See Problem 2-4 for more on inversions.) Suppose that each element of A is chosen randomly, independently, and uniformly from the range 1 through n. Use indicator random variables to compute the expected number of inversions.
The problem is from exercise 5.2-5 in Introduction to Algorithms by Cormen. Here is my recursive solution:
Suppose x(i) is the number of inversions in a[1..i], and E(i) is the expected value of x(i), then E(i+1) can be computed as following:
Image we have i+1 positions to place all the numbers, if we place i+1 on the first position, then x(i+1) = i + x(i); if we place i+1 on the second position, then x(i+1) = i-1 + x(i),..., so E(i+1) = 1/(i+1)* sum(k) + E(i), where k = [0,i]. Finally we get E(i+1) = i/2 + E(i).
Because we know that E(2) = 0.5, so recursively we get: E(n) = (n-1 + n-2 + ... + 2)/2 + 0.5 = n* (n-1)/4.
Although the deduction above seems to be right, but I am still not very sure of that. So I share it here.
If there is something wrong, please correct me.
All the solutions seem to be correct, but the problem says that we should use indicator random variables. So here is my solution using the same:
Let Eij be the event that i < j and A[i] > A[j].
Let Xij = I{Eij} = {1 if (i, j) is an inversion of A
0 if (i, j) is not an inversion of A}
Let X = Σ(i=1 to n)Σ(j=1 to n)(Xij) = No. of inversions of A.
E[X] = E[Σ(i=1 to n)Σ(j=1 to n)(Xij)]
= Σ(i=1 to n)Σ(j=1 to n)(E[Xij])
= Σ(i=1 to n)Σ(j=1 to n)(P(Eij))
= Σ(i=1 to n)Σ(j=i + 1 to n)(P(Eij)) (as we must have i < j)
= Σ(i=1 to n)Σ(j=i + 1 to n)(1/2) (we can choose the two numbers in
C(n, 2) ways and arrange them
as required. So P(Eij) = C(n, 2) / n(n-1))
= Σ(i=1 to n)((n - i)/2)
= n(n - 1)/4
Another solution is even simpler, IMO, although it does not use "indicator random variables".
Since all of the numbers are distinct, every pair of elements is either an inversion (i < j with A[i] > A[j]) or a non-inversion (i < j with A[i] < A[j]). Put another way, every pair of numbers is either in order or out of order.
So for any given permutation, the total number of inversions plus non-inversions is just the total number of pairs, or n*(n-1)/2.
By symmetry of "less than" and "greater than", the expected number of inversions equals the expected number of non-inversions.
Since the expectation of their sum is n*(n-1)/2 (constant for all permutations), and they are equal, they are each half of that or n*(n-1)/4.
[Update 1]
Apparently my "symmetry of 'less than' and 'greater than'" statement requires some elaboration.
For any array of numbers A in the range 1 through n, define ~A as the array you get when you subtract each number from n+1. For example, if A is [2,3,1], then ~A is [2,1,3].
Now, observe that for any pair of numbers in A that are in order, the corresponding elements of ~A are out of order. (Easy to show because negating two numbers exchanges their ordering.) This mapping explicitly shows the symmetry (duality) between less-than and greater-than in this context.
So, for any A, the number of inversions equals the number of non-inversions in ~A. But for every possible A, there corresponds exactly one ~A; when the numbers are chosen uniformly, both A and ~A are equally likely. Therefore the expected number of inversions in A equals the expected number of inversions in ~A, because these expectations are being calculated over the exact same space.
Therefore the expected number of inversions in A equals the expected number of non-inversions. The sum of these expectations is the expectation of the sum, which is the constant n*(n-1)/2, or the total number of pairs.
[Update 2]
A simpler symmetry: For any array A of n elements, define ~A as the same elements but in reverse order. Associate the element at position i in A with the element at position n+1-i in ~A. (That is, associate each element with itself in the reversed array.)
Now any inversion in A is associated with a non-inversion in ~A, just as with the construction in Update 1 above. So the same argument applies: The number of inversions in A equals the number of inversions in ~A; both A and ~A are equally likely sequences; etc.
The point of the intuition here is that the "less than" and "greater than" operators are just mirror images of each other, which you can see either by negating the arguments (as in Update 1) or by swapping them (as in Update 2). So the expected number of inversions and non-inversions is the same, since you cannot tell whether you are looking at any particular array through a mirror or not.
Even simpler (similar to Aman's answer above, but perhaps clearer) ...
Let Xij be a random variable with Xij=1 if A[i] > A[j] and Xij=0 otherwise.
Let X=sum(Xij) over i, j where i < j
Number of pairs (ij)*: n(n-1)/2
Probability that Xij=1 (Pr(Xij=1))): 1/2
By linearity of expectation**: E(X) = E(sum(Xij))
= sum(E(Xij))
= sum(Pr(Xij=1))
= n(n-1)/2 * 1/2
= n(n-1)/4
* I think of this as the size of the upper triangle of a square matrix.
** All sums here are over i, j, where i < j.
I think it's right, but I think the proper way to prove it is to use conditionnal expectations :
for all X and Y we have : E[X] =E [E [X|Y]]
then in your case :
E(i+1) = E[x(i+1)] = E[E[x(i+1) | x(i)]] = E[SUM(k)/(1+i) + x(i)] = i/2 + E[x(i)] = i/2 + E(i)
about the second statement :
if :
E(n) = n* (n-1)/4.
then E(n+1) = (n+1)*n/4 = (n-1)*n/4 + 2*n/4 = (n-1)*n/4 + n/2 = E(n) +n/2
So n* (n-1)/4. verify the recursion relation for all n >=2 and it verifies it for n=2
So E(n) = n*(n-1)/4
Hope I understood your problem and it helps
Using indicator random variables:
Let X = random variable which is equal to the number of inversions.
Let Xij = 1 if A[i] and A[j] form an inversion pair, and Xij = 0 otherwise.
Number of inversion pairs = Sum over 1 <= i < j <= n of (Xij)
Now P[Xij = 1] = P[A[i] > A[j]] = (n choose 2) / (2! * n choose 2) = 1/2
E[X] = E[sum over all ij pairs such that i < j of Xij] = sum over all ij pairs such that i < j of E[Xij] = n(n - 1) / 4

Resources