For a subset of 5 vertices find P(all edges between these vertices are present in G) - probability

For a random graph G on n vertices, each possible edge is present independently with probability k, 0 <= k <= 1.
I seek P(all edges between these vertices are present in G).
My thoughts so far:
If we have the empty subset, p = 1
If we have a one element set, p = 1
If we have a two element set, p = k
If we have a three element set, p = k^3
If we have a four element set, p = k^6
If we have a five element set, p = k^10
If the above is correct, then I can capture the probability as P = k^(n C 2), where n here is the size of the subset.
However, this only seems to work for two to five element sets. If I have a zero or one element set, the formula looks incorrect. If I am understanding everything correctly up to this point, how can I capture the other two cases?
Is the only possibility a piecewise defined function?
If n=0 or n = 1, 1
Otherwise, k^(n C 2)

Actually, your formula works in all cases because:
n C 2 = 0, for n < 2
And thus:
k^(n C 2) = k^0 = 1, for n < 2
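A quick numerical check of this, as a minimal Python sketch (the function name prob_all_edges is made up for illustration, and math.comb needs Python 3.8 or later):

```python
from math import comb

def prob_all_edges(subset_size: int, k: float) -> float:
    """Probability that every edge inside a subset of the given size is present."""
    # comb(m, 2) is 0 for m < 2, so the empty and one element subsets give k**0 == 1.
    return k ** comb(subset_size, 2)

for m in range(6):
    print(m, comb(m, 2), prob_all_edges(m, 0.5))
```

No piecewise definition is needed because the exponent itself drops to zero for the small cases.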

Related

Efficient way to find min(max(A[L], A[L+1],...,A[R]), min(B[L], B[L+1],…, B[R]))

Given two arrays A[N] and B[N], with N <= 5e5. For each 0 <= L < N, find the maximum value of
min(max(A[L], A[L+1],...,A[R]), min(B[L], B[L+1],…, B[R]))
for L <= R < N.
ans[L] is the answer for L.
For example,
N = 3
A[3] = {3, 2, 1}
B[3] = {3, 2, 3}
So, the answer is
ans[0] = 3
ans[1] = 2
ans[2] = 1
It is clear that brute force cannot run fast enough.
Then, I tried using a Sparse Table, Segment Tree or Binary Indexed Tree (we don't need to update anything, so I chose a Sparse Table). But for each L we don't know R, so I need to scan to the end of the array, which is no different from brute force.
Is there any efficient algorithm or data structure for this problem?
P/s: Sorry for my bad English.
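Use two sparse tables. For a fixed L, max(A[L..R]) is non-decreasing in R and min(B[L..R]) is non-increasing in R, so the min of the two peaks where they cross, and we can binary search for that crossing point.
Python-style pseudocode, untested (SparseTable is assumed to answer inclusive range max/min queries):
stA = SparseTable(A)
stB = SparseTable(B)
n = len(A)
for i in range(n):
    lo, hi = i, n - 1
    # binary search for the first R where max(A[i..R]) >= min(B[i..R])
    while lo < hi:
        m = lo + (hi - lo) // 2  # integer division
        if stA.max(i, m) < stB.min(i, m):
            lo = m + 1
        else:
            hi = m
    # the best R is the crossing point or the position just before it
    ans[i] = min(stA.max(i, lo), stB.min(i, lo))
    if lo > i:
        ans[i] = max(ans[i], min(stA.max(i, lo - 1), stB.min(i, lo - 1)))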
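For reference, a runnable sketch of the sparse table plus binary search idea. The SparseTable class and the name best_values are my own illustration, not a library API; every range query anchors its left end at the current L.

```python
from typing import Callable, List

class SparseTable:
    """Static range queries for an idempotent operation such as min or max."""

    def __init__(self, data: List[int], op: Callable[[int, int], int]):
        self.op = op
        self.table = [list(data)]  # level j holds aggregates of blocks of length 2**j
        n, span = len(data), 1
        while 2 * span <= n:
            prev = self.table[-1]
            self.table.append(
                [op(prev[i], prev[i + span]) for i in range(n - 2 * span + 1)]
            )
            span *= 2

    def query(self, lo: int, hi: int) -> int:
        """Aggregate over the inclusive index range [lo, hi]."""
        level = (hi - lo + 1).bit_length() - 1
        row = self.table[level]
        return self.op(row[lo], row[hi - (1 << level) + 1])

def best_values(A: List[int], B: List[int]) -> List[int]:
    n = len(A)
    st_max, st_min = SparseTable(A, max), SparseTable(B, min)
    ans = []
    for left in range(n):
        lo, hi = left, n - 1
        # First R with max(A[left..R]) >= min(B[left..R]), or n - 1 if none.
        while lo < hi:
            mid = (lo + hi) // 2
            if st_max.query(left, mid) < st_min.query(left, mid):
                lo = mid + 1
            else:
                hi = mid
        best = min(st_max.query(left, lo), st_min.query(left, lo))
        if lo > left:  # the position just before the crossing may be better
            best = max(best, min(st_max.query(left, lo - 1),
                                 st_min.query(left, lo - 1)))
        ans.append(best)
    return ans

print(best_values([3, 2, 1], [3, 2, 3]))
```

Building the tables costs O(N log N) and each L costs O(log N) constant-time queries, so the total is O(N log N).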
max(A[L], A[L+1], ..., A[R]) is non-increasing in L and non-decreasing in R. Conversely, min(B[L], B[L+1], ..., B[R]) is non-decreasing in L and non-increasing in R. It follows that the function from L to the argmax in R is non-decreasing. The last ingredient is two queues, one that can report max, one that can report min, to quickly compute the sliding window aggregates.
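A sketch of how the two-queue idea could look in Python, assuming monotone deques of indices for the sliding max and min. The function name and the helper extended are mine, and this is one possible reading of the description above rather than a tested reference solution.

```python
from collections import deque

def best_values_two_pointers(A, B):
    n = len(A)
    ans = [0] * n
    max_q = deque()  # indices into A, values decreasing from front to back
    min_q = deque()  # indices into B, values increasing from front to back
    right = 0        # window is [left, right); inside it max(A) < min(B)

    def extended(r):
        """Window max of A and window min of B if index r were appended."""
        hi = max(A[max_q[0]], A[r]) if max_q else A[r]
        lo = min(B[min_q[0]], B[r]) if min_q else B[r]
        return hi, lo

    for left in range(n):
        right = max(right, left)
        # Grow the window while the max of A stays below the min of B.
        while right < n:
            hi, lo = extended(right)
            if hi >= lo:
                break
            while max_q and A[max_q[-1]] <= A[right]:
                max_q.pop()
            max_q.append(right)
            while min_q and B[min_q[-1]] >= B[right]:
                min_q.pop()
            min_q.append(right)
            right += 1
        candidates = []
        if right > left:   # last R before the crossing: the min is the max of A
            candidates.append(A[max_q[0]])
        if right < n:      # first R at or after the crossing
            candidates.append(min(extended(right)))
        ans[left] = max(candidates)
        # Slide the left end forward.
        if max_q and max_q[0] == left:
            max_q.popleft()
        if min_q and min_q[0] == left:
            min_q.popleft()
    return ans

print(best_values_two_pointers([3, 2, 1], [3, 2, 3]))
```

Each index enters and leaves each deque at most once, so the whole pass is O(N).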

How to find largest square of palindrome in a matrix

I am trying to solve a problem where I am given an n x n square matrix of characters and I want to find the size of the largest palindrome square in it. A palindrome square is a square in which every row and every column is a palindrome.
For example:
Input
a g h j k
s d g d j
s e f e n
a d g d h
r y d g s
The output will be:
3
corresponding to the middle square. I am thinking of a dynamic programming solution but am unable to formulate the recurrence relation. I am thinking the state should be a(i,j,k), where (i, j) is the bottom-right corner of the square and k is the size of the palindrome square.
Can someone help me with the recurrence relation for this problem?
EDIT:
n<500, so I believe that I can't go beyond O(n^3).
Assume that you can solve the following problem:
Ending at cell (i, j), is there a palindrome of length k, horizontally and vertically?
Hint for the above problem (horizontal case):
boolean[][][] palindrome; // palindrome[i][j][k]: is there a horizontal palindrome of length k ending at (i, j)?
for(int i = 0; i < n; i++){
    for(int j = 0; j < n; j++){
        palindrome[i][j][0] = true;
        palindrome[i][j][1] = true;
        for(int k = 2; k <= j + 1; k++)
            if(data[i][j - k + 1] == data[i][j] && palindrome[i][j - 1][k - 2])
                palindrome[i][j][k] = true;
    }
}
So, we can create two three-dimensional arrays, int[n][n][n + 1] col and int[n][n][n + 1] row.
For each cell (i, j), row[i][j][k] counts the consecutive cells (i, j), (i - 1, j), ... going up from (i, j) at which a horizontal palindrome of length k ends, and col[i][j][k] counts the consecutive cells (i, j), (i, j - 1), ... going left from (i, j) at which a vertical palindrome of length k ends.
for(int k = 1; k <= n; k++) {
    // out-of-range entries (i - 1 < 0 or j - 1 < 0) count as 0
    if(there is a horizontal palindrome of length k ending at cell (i, j))
        row[i][j][k] = 1 + row[i - 1][j][k];
    if(there is a vertical palindrome of length k ending at cell (i, j))
        col[i][j][k] = 1 + col[i][j - 1][k];
}
Finally, if row[i][j][k] >= k && col[i][j][k] >= k, there is a palindrome square of size k whose bottom-right corner is (i, j).
In total, the time complexity is O(n^3).
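A Python sketch of this O(n^3) approach, under a couple of assumptions of mine: the grid is a list of equal-length strings, the function name is invented, and the run counters are rebuilt per k to avoid two more three-dimensional arrays. The two boolean tables still take O(n^3) memory, which may be too much near n = 500.

```python
def largest_palindrome_square(grid):
    n = len(grid)
    # h[i][j][k]: row i has a palindrome of length k ending at column j
    # v[i][j][k]: column j has a palindrome of length k ending at row i
    h = [[[False] * (n + 1) for _ in range(n)] for _ in range(n)]
    v = [[[False] * (n + 1) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h[i][j][0] = h[i][j][1] = True
            v[i][j][0] = v[i][j][1] = True
            for k in range(2, j + 2):
                h[i][j][k] = grid[i][j - k + 1] == grid[i][j] and h[i][j - 1][k - 2]
            for k in range(2, i + 2):
                v[i][j][k] = grid[i - k + 1][j] == grid[i][j] and v[i - 1][j][k - 2]

    for k in range(n, 0, -1):  # try the largest size first
        row_run = [[0] * n for _ in range(n)]  # consecutive rows, going up
        col_run = [[0] * n for _ in range(n)]  # consecutive columns, going left
        for i in range(n):
            for j in range(n):
                if h[i][j][k]:
                    row_run[i][j] = 1 + (row_run[i - 1][j] if i > 0 else 0)
                if v[i][j][k]:
                    col_run[i][j] = 1 + (col_run[i][j - 1] if j > 0 else 0)
                if row_run[i][j] >= k and col_run[i][j] >= k:
                    return k
    return 0

example = ["aghjk", "sdgdj", "sefen", "adgdh", "rydgs"]
print(largest_palindrome_square(example))  # 3 for the matrix in the question
```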
Let's start with the complexity of validating a palindrome:
A palindrome can be identified in O(k), where k is the length of the palindrome (see here).
You then need to do that test 2k times, once for each row and column in your inner square (using the length of the palindrome, k, as the dimension),
so now you have k * 2k -> O(2k^2) -> O(k^2).
Then you want to widen the search space to the whole n x n data set; this is where a second variable gets introduced.
You will need to iterate over columns 1 to (n-k+1) and rows 1 to (n-k+1) in a nested loop,
so now you have (n-k+1)^2 * O(k^2) -> O(n^2 * k^2).
Note: this problem is dependent on more than one variable.
This is the same approach I suggest you take to coding the solution: start small and get bigger.
I'm sure there is probably a better way, and I'm pretty sure my logic is correct, but bear in mind that it's not tested.
Just to make the example easy, I'm going to say that (i, j) = (1, 1) is the top-left corner.
  1 2 3 4 5 6 7 8
1 a b c d e f g f
2 d e f g h j k q
3 a b g d z f g f
4 a a a a a a a a
i.e. (1,1) = a, (1,5) = e and (2,1) = d
Now, instead of checking every column, you could start by checking every kth column,
i.e. when k=3:
1) Create a 2D boolean array the size of the character table, with all entries TRUE.
2) Start by checking column 3, cfg, which is not a palindrome, so columns 1 and 2 no longer need to be tested.
3) Because the palindrome test failed, mark the corresponding entry (1,3) in the 2D array as FALSE (no range that uses this position needs to be tested, as it is not a palindrome).
4) Next check column 6, fjf, which is a palindrome, so go back and test column 5: ehz is not a palindrome.
5) Set (1,5) = FALSE.
6) Then test column 8, then 7.
NOTE: You have only had to test 5 of the 8 columns.
Since there were k columns in a row that were palindromes, now test the corresponding rows. Start from the bottom row, in this case row 3, as it will eliminate the most other checks if it fails.
7) check row starting at (3,6) fgf = palindrome
8) check row starting at (2,6) jkq != a palindrome
9) set (2,6) = FALSE
10) check column starting at (2,3) daa != palindrome
11) set (2,3) = FALSE
There is no need to test any more for row 2, as both (2,3) and (2,6) are FALSE.
Hopefully you can make sense of that.
Note: you would probably start this at k = n and decrement k until you find a result.

Pseudo Algorithm that takes 4 inputs and prints the sum of largest/best of three numbers

I need a little help with my assignment in pseudocode:
Take input of 4 numbers and print the sum of the largest 3 numbers.
For example:
inputs: 14, 1, 9, 3
output: 14+9+3 => 26
How can I write an algorithm in pseudocode for the above task?
So far I have come to this:
input a, b, c, d
declare h1, h2, h3
if(a>=b && a>=c && a>=d) h1 = a
if(b>=a && b>=c && b>=d) h2 = b
if(c>=a && c>=b && c>=d) h3 = c
if(d>=a && d>=b && d>=c) h4 = d
print h1+h2+h3
Is this any good?
1. Let's say that inputs are in array t.
2. Let sum = t[0].
3. Let min = t[0].
4. For i from 1 to 3 repeat steps 5 and 6:
5. sum += t[i].
6. if (min > t[i]) min = t[i].
7. Return sum - min.
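The same steps as a short Python sketch (the function name is mine; it accepts any non-empty iterable, so it also covers the streamed case):

```python
def sum_without_smallest(values):
    """Sum of all inputs except one occurrence of the minimum, in a single pass."""
    it = iter(values)
    total = smallest = next(it)  # steps 2 and 3
    for v in it:                 # steps 4 to 6
        total += v
        if v < smallest:
            smallest = v
    return total - smallest      # step 7

print(sum_without_smallest([14, 1, 9, 3]))  # 14 + 9 + 3 = 26
```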
Another approach, which you and Brian presented, boils down to sorting (for n=4 you can do it "manually" like you did, but for larger n it's not a good idea) and then taking the sum.
I prefer the approach shown above, because it has guaranteed linear time complexity, scales nicely and is easy to implement. It makes exactly one pass through the input data and can be used if the input is streamed one value at a time and we don't want to store the inputs in memory (we want to do it "on the fly"). Sorting can be more expensive than linear (it can be n log n) if there are no assumptions whatsoever about the input data.
Your pseudo code is off to a good start. But right now you only find the single largest of your four numbers. You would need to repeat two more times (ignoring the largest) to find the 2nd and 3rd largest.
Another clever idea, though, is if you need the 3 largest numbers, your 4th number must be the smallest. Find the smallest number, then add the others.
input a, b, c, d
declare min
// find the smallest
min = a
if (b < min) min = b
if (c < min) min = c
if (d < min) min = d
// the sum of the largest 3 = the sum of all 4 minus the minimum
print a + b + c + d - min
Use recursion. If the first number in the input is smallest, sum the other three, otherwise rotate the inputs and call recursively. Eventually the smallest number will be a, so the other three are the largest, and you can sum them and return the answer. In pseudocode:
function sum3max(a, b, c, d)
    if a == min(a, b, c, d)
        return b + c + d
    return sum3max(b, c, d, a)

Recursion Puzzle

Recently, one of my friends challenged me to solve this puzzle which goes as follows:
Suppose that you have two variables x and y. These are the only variables which can be used for storage in the program. There are three operations which can be done:
Operation 1: x = x+y
Operation 2: x = x-y
Operation 3: y = x-y
Now, you are given two numbers n1 and n2 and a target number k. Starting with x = n1 and y = n2, is there a way to arrive at x = k using the operations mentioned above? If yes, what is the sequence of operations which generates x = k?
Example: If n1 = 16, n2 = 6 and k = 28 then the answer is YES. The sequence is:
Operation 1
Operation 1
If n1 = 19, n2 = 7 and k = 22 then the answer is YES. The sequence is:
Operation 2
Operation 3
Operation 1
Operation 1
Now, I have wrapped my head around the problem for too long but I am not getting any initial thoughts. I have a feeling that this is recursion but I do not know what should be the boundary conditions. It would be very helpful if someone can direct me towards an approach which can be used to solve this problem. Thanks!
Maybe not a complete answer, but a proof that a sequence exists if and only if k is a multiple of the greatest common divisor (GCD) of n1 and n2. Let's write G = GCD(n1, n2) for brevity.
First I'll prove that x and y are always integer multiples of G. This proof is really straightforward by induction. Hypothesis: x = p * G and y = q * G, for some integers p and q.
Initially, the hypothesis holds by definition of G.
Each of the rules respects the induction hypothesis. Rule 1 assigns x + y to x, while rules 2 and 3 assign x - y to x and to y respectively:
x + y = p * G + q * G = (p + q) * G
x - y = p * G - q * G = (p - q) * G
Due to this result, there can only be a sequence to k if k is an integer multiple of the GCD of n1 and n2.
For the other direction we need to show that any integer multiple of G can be achieved by the rules. This is definitely the case if we can reach x = G and y = G. For this we use Euclid's algorithm. Consider the second implementation in the linked wiki article:
function gcd(a, b)
    while a ≠ b
        if a > b
            a := a − b
        else
            b := b − a
    return a
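This is a repeated application of rules 2 and 3 and results in x = G and y = G. (Rule 2 is a := a − b. Rule 3 as stated computes x − y rather than y − x, but y := y − x can be emulated by applying rules 1, 3, 2, 3, 2 in that order, which maps (x, y) to (x, y − x).)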
Knowing that a solution exists, you can apply a BFS, as shown in Amit's answer, to find the shortest sequence.
Assuming a solution exists, finding the shortest sequence to get to it can be done using a BFS.
The pseudo code should be something like:
queue <- new empty queue
parent <- new map of type map:pair->pair
parent[(x,y)] = 'root' //special indicator to stop the search there
queue.enqueue(pair(x,y))
while !queue.empty():
    curr <- queue.dequeue()
    x <- curr.first
    y <- curr.second
    if x == target:
        printSolAccordingToMap(parent,(x,y))
        return
    x1 <- x+y
    x2 <- x-y
    y1 <- x-y
    if (x1,y) is not a key in parent:
        parent[(x1,y)] = (x,y)
        queue.enqueue(pair(x1,y))
    //similarly to (x2,y) and (x,y1)
The function printSolAccordingToMap() simply traces back on the map until it finds the root, and prints it.
Note that this solution finds the optimal sequence only if one exists; it loops forever if none exists, so this is only a partial answer.
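A Python sketch of this BFS, with two additions that are my own assumptions rather than part of the pseudocode above: it first applies the GCD test from the other answer so that the search is only started when a sequence can exist, and it takes a max_states cap as a safety net. The function name is invented.

```python
from collections import deque
from math import gcd

def shortest_sequence(n1, n2, k, max_states=1_000_000):
    """Shortest list of operation numbers (1, 2, 3) that makes x == k, or None."""
    g = gcd(n1, n2)
    if g == 0:                      # n1 == n2 == 0: x can never change
        return [] if k == 0 else None
    if k % g != 0:                  # x is always a multiple of gcd(n1, n2)
        return None
    start = (n1, n2)
    parent = {start: None}          # state -> (previous state, operation used)
    queue = deque([start])
    while queue and len(parent) <= max_states:
        state = queue.popleft()
        x, y = state
        if x == k:
            ops = []
            while parent[state] is not None:
                state, op = parent[state]
                ops.append(op)
            return ops[::-1]
        for op, nxt in ((1, (x + y, y)), (2, (x - y, y)), (3, (x, x - y))):
            if nxt not in parent:
                parent[nxt] = (state, op)
                queue.append(nxt)
    return None                     # state cap reached

print(shortest_sequence(16, 6, 28))
print(shortest_sequence(19, 7, 22))
```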
Assume that both x and y always stay <= target and > 0; if not, you can always bring them into that range with simple operations. Under these constraints you can build a graph with O(target*target) nodes, where the edges out of a node are given by applying each of the three operations to it. You then need the shortest path from the start node to a target node, which is (target, any). The assumption here is that the (x, y) values always stay within (0, target]. The time complexity is O(target*target*log(target)) using Dijkstra.
In Vincent's answer, I think the proof is not complete.
Consider two relatively prime numbers, say n1 = 19 and n2 = 13, whose GCD is 1. According to him, a sequence exists if k is a multiple of the GCD. Since every number is a multiple of 1, that would mean every k is reachable, and I do not think that is possible for every k.

Find pairs in an array such that a%b = k , where k is a given integer

Here is an interesting programming puzzle I came across . Given an array of positive integers, and a number K. We need to find pairs(a,b) from the array such that a % b = K.
I have a naive O(n^2) solution to this where we can check for all pairs such that a%b=k. Works but inefficient. We can certainly do better than this can't we ? Any efficient algorithms for the same? Oh and it's NOT homework.
Sort your array and binary search or keep a hash table with the count of each value in your array.
For a number x, we can find the largest y such that x mod y = K as y = x - K. Binary search for this y or look it up in your hash and increment your count accordingly.
Now, this isn't necessarily the only value that will work. For example, 8 mod 6 = 8 mod 3 = 2. We have:
x mod y = K => x = q*y + K =>
=> x = q(x - K) + K =>
=> x = 1(x - K) + K =>
=> x = 2(x - K)/2 + K =>
=> ...
This means you will have to test all divisors of y as well. You can find the divisors in O(sqrt y), giving you a total complexity of O(n log n sqrt(max_value)) if using binary search and O(n sqrt(max_value)) with a hash table (recommended especially if your numbers aren't very large).
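A sketch of the hash table variant in Python. The function name is mine, pairs are counted as ordered pairs of distinct positions, and the a == K case (where every b > K works) is handled separately because a - K is then zero.

```python
from collections import Counter
from math import isqrt

def count_pairs(values, K):
    """Count ordered pairs of distinct positions (i, j) with values[i] % values[j] == K."""
    counts = Counter(values)
    bigger_than_K = sum(c for v, c in counts.items() if v > K)
    total = 0
    for a in values:
        if a < K:
            continue                      # a % b can never reach K
        if a == K:
            total += bigger_than_K        # a % b == a == K for every b > K
            continue
        m = a - K                         # b must divide a - K and exceed K
        for d in range(1, isqrt(m) + 1):
            if m % d == 0:
                for b in {d, m // d}:     # set avoids double counting a square root
                    if b > K:
                        total += counts[b]
        if K == 0:
            total -= 1                    # do not pair a with itself
    return total

print(count_pairs([8, 6, 3, 2], 2))
```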
Treat the problem as having two separate arrays as input: one for the a numbers and one for the b numbers in a % b = K. I am going to assume that everything is >= 0.
First of all, you can discard any b <= K.
Now think of every number in b as generating a sequence K, K + b, K + 2b, K + 3b... You can record this using a pair of numbers (pos, b), where pos is incremented by b at each stage. Start with pos = K.
Hold these sequences in a priority queue, so you can find the smallest pos value at any given time. Sort the array of a numbers - in fact you could do this ahead of time and discard any duplicates.
For each a number
    While the smallest pos in the priority queue is <= a
        Add the smallest multiple of b to it to make it >= a
        If it is == a, you have a match
        Update the stored value of pos for that sequence (moving past a after a match), re-ordering the priority queue
At worst, you end up comparing every number with every other number, which is the same as the simple solution, but with priority queue and sorting overhead. However, large values of b may remain unexamined in the priority queue while several a numbers pass through, in which case this does better - and if there are a lot of numbers to process and they are all different, some of them must be large.
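A sketch of this priority queue scheme using Python's heapq. The function name is mine, duplicates are discarded on both sides as suggested above, and pos is stepped past a after a match so that the inner loop terminates.

```python
import heapq

def find_pairs_with_heap(a_values, b_values, K):
    """Return all distinct (a, b) with a % b == K, using one sequence K, K+b, ... per b."""
    heap = [(K, b) for b in set(b_values) if b > K]   # (pos, b), pos starts at K
    heapq.heapify(heap)
    matches = []
    for a in sorted(set(a_values)):
        while heap and heap[0][0] <= a:
            pos, b = heapq.heappop(heap)
            pos += ((a - pos + b - 1) // b) * b       # smallest sequence value >= a
            if pos == a:
                matches.append((a, b))
                pos += b                              # move past a
            heapq.heappush(heap, (pos, b))
    return matches

print(find_pairs_with_heap([8, 6, 3, 2], [8, 6, 3, 2], 2))
```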
This answer mentions the main points of an algorithm (called DL because it uses “divisor lists” ) and gives details via a program, called amodb.py.
Let B be the input array, containing N positive integers. Without much loss of generality, suppose B[i] > K for all i and that B is in ascending order. (Note that x%B[i] < K if B[i] < K; and where B[i] = K, one can report pairs (B[i], B[j]) for all j>i. If B is not sorted initially, charge a cost of O(N log N) to sort it.)
In algorithm DL and program amodb.py, A is an array with K pre-subtracted from the input array elements. Ie, A[i] = B[i] - K. Note that if a%b == K, then for some j we have a = b*j + K or a-K = b*j. That is, a%b == K iff a-K is a multiple of b. Moreover, if a-K = b*j and p is any factor of b, then p is a factor of a-K.
Let the prime numbers from 2 to 97 be called “small factors”. When N numbers are uniformly randomly selected from some interval [X,Y], on the order of N/ln(Y) of the numbers will have no small factors; a similar number will have a greatest small factor of 2; and declining proportions will have successively larger greatest small factors. For example, on the average about N/97 will be divisible by 97, about N/89-N/(89*97) by 89 but not 97, etc. Generally, when members of B are random, lists of members with certain greatest small factors or with no small factors are sub-O(N/ln(Y)) in length.
Given a list Bd containing members of B divisible by largest small factor p, DL tests each element of Bd against elements of list Ad, those elements of A divisible by p. But given a list Bp for elements of B without small factors, DL tests each of Bp's elements against all elements of A. Example: If N=25, p=13, Bd=[18967, 23231], and Ad=[12779, 162383], then DL tests if any of 12779%18967, 162383%18967, 12779%23231, 162383%23231 are zero. Note that it is possible to cut the number of tests in half in this example (and many others) by noticing 12779<18967, but amodb.py does not include that optimization.
DL makes J different lists for J different factors; in one version of amodb.py, J=25 and the factor set is primes less than 100. A larger value of J would increase the O(N*J) time to initialize divisor lists, but would slightly decrease the O(N*len(Bp)) time to process list Bp against elements of A. See results below. Time to process other lists is O((N/logY)*(N/logY)*J), which is in sharp contrast to the O(n*sqrt(Y)) complexity for a previous answer's method.
Shown next is output from two program runs. In each set, the first Found line is from a naïve O(N*N) test, and the second is from DL. (Note, both DL and the naïve method would run faster if too-small A values were progressively removed.) The time ratio in the last line of the first test shows a disappointingly low speedup ratio of 3.9 for DL vs naïve method. For that run, factors included only the 25 primes less than 100. For the second run, with better speedup of ~ 4.4, factors included numbers 2 through 13 and primes up to 100.
$ python amodb.py
N: 10000 K: 59685 X: 100000 Y: 1000000
Found 208 matches in 21.854 seconds
Found 208 matches in 5.598 seconds
21.854 / 5.598 = 3.904
$ python amodb.py
N: 10000 K: 97881 X: 100000 Y: 1000000
Found 207 matches in 21.234 seconds
Found 207 matches in 4.851 seconds
21.234 / 4.851 = 4.377
Program amodb.py:
import random, time
factors = [2,3,4,5,6,7,8,9,10,11,12,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
X, N = 100000, 10000
Y, K = 10*X, random.randint(X//2, X)
print("N: ", N, " K: ", K, "X: ", X, " Y: ", Y)
B = sorted([random.randint(X,Y) for i in range(N)])
NP = len(factors); NP1 = NP+1
A, Az, Bz = [], [[] for i in range(NP1)], [[] for i in range(NP1)]
t0 = time.time()
for b in B:
    a, aj, bj = b-K, -1, -1
    A.append(a)                 # Add a to A
    for j,p in enumerate(factors):
        if a % p == 0:          # a goes into the list of every factor dividing it
            aj = j
            Az[aj].append(a)
        if b % p == 0:          # remember the last (greatest) factor dividing b
            bj = j
    Bz[bj].append(b)            # bj == -1 selects the final, not-factored list
Bp = Bz.pop()                   # Get not-factored B-values list into Bp
di = time.time() - t0; t0 = time.time()
# Naive O(N*N) test
c = 0
for a in A:
    for b in B:
        if a%b == 0:
            c += 1
cq = c
dq = round(time.time() - t0, 3); t0 = time.time()
# DL test using the divisor lists
c = 0
for i,Bd in enumerate(Bz):
    Ad = Az[i]
    for b in Bd:
        for ak in Ad:
            if ak % b == 0:
                c += 1
for b in Bp:
    for ak in A:
        if ak % b == 0:
            c += 1
dr = round(di + time.time() - t0, 3)
print("Found", cq, " matches in", dq, "seconds")
print("Found", c, " matches in", dr, "seconds")
print(dq, "/", dr, "=", round(dq/dr, 3))

Resources