Finding a Hamiltonian path in Julia Language - algorithm

I'm a beginner.
I am starting my studies in image processing and I am using the Julia Language.
I have tried, so far unsuccessfully, to implement Algorithm 1 of this article, https://hal.archives-ouvertes.fr/hal-03330433/document, on page 3.
I noticed that there is a similar algorithm, proposed in Matlab and in C, on this site: https://elad.cs.technion.ac.il/software/?pn=1427. The authors provided the code and the article where these ideas were proposed.
I read and studied both, but I could not reproduce them in Julia.
The proposed algorithm (shown as pseudocode in the paper) works on a graph G = (V, E), where V is the set of color values {v1, v2, ..., vm} and E is the set of edges e_ij = (v_i, v_j).
I think the constant alpha is 10^6, but I need to read the second paper again.
To show my doubts, I take a random RGB color image of size 6 x 5 (30 pixels), put the color set T in a 3 x 30 matrix, and take a random probability vector p.
After that I generate the index list L and choose a random j in L.
using Colors, FixedPointNumbers
using Images
using Statistics
using LinearAlgebra
img = rand(RGB{N0f8}, 6, 5) # random RGB color image of size 6 x 5
a, b = size(img)
T = reshape(channelview(img), 3, a*b) # matrix T of 3 x 30. Every column is a r, g, b color of img
p = rand(a*b) # random vector p of 30 elements
L = [i for i=1:30] #index list
j = rand(L) #choose a random j in L
# find the pixel (pix_i, pix_j) of the image corresponding to the linear index j
# (Julia arrays are column-major, so j runs down the columns)
if j % a == 0
    pix_i = a
    pix_j = div(j, a)
else
    pix_i = j % a
    pix_j = 1 + div(j, a)
end
# Build the neighbour set N(vj): the 8-connected neighbours of (pix_i, pix_j),
# clamped to the image borders and excluding the pixel itself
N = CartesianIndex{2}[]
for i = max(pix_i - 1, 1):min(pix_i + 1, a)
    for k = max(pix_j - 1, 1):min(pix_j + 1, b)
        (i, k) == (pix_i, pix_j) && continue # skip the centre pixel itself
        push!(N, CartesianIndex(i, k))
    end
end
After that, I don't know how to continue.
Even with the neighbour set built as above, I don't know how to choose the next vertex, nor how to append, at each iteration, the chosen index to the end of the path; that is, my problem is in the construction of the set P and so on.
Regarding the N(vj) inside the "for" loop that the author proposes, I am considering a 3 x 3 square. So, for example, for a pixel at position (4, 3), the window covers the pixels (3, 2), (3, 3), (3, 4), (4, 2), (4, 3), (4, 4), (5, 2), (5, 3) and (5, 4).
So N(vj) = {(3, 2), (3, 3), (3, 4), (4, 2), (4, 3), (4, 4), (5, 2), (5, 3), (5, 4)} \ {(4, 3)}, whose cardinality |N(vj)| is 8.
If you can help me, I would be very grateful.
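As a rough sketch of the intended mechanics (only one possible reading of Algorithm 1, not the paper's exact pseudocode: start at a random index, then repeatedly append the nearest unvisited neighbour to the path P, falling back to the nearest unvisited pixel anywhere when no neighbour is left, which is the role I would ascribe to a large penalty such as alpha = 10^6), the loop might look like this in Python:
import numpy as np

def greedy_path(T, a, b, seed=None):
    """T: 3 x (a*b) array of colors, column j = pixel j (column-major, as in Julia).
    Returns a visiting order P over all pixels (hypothetical reading of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    n = a * b
    L = set(range(n))            # unvisited pixel indices
    j = int(rng.integers(n))     # random starting pixel
    P = [j]
    L.remove(j)
    while L:
        pi, pj = j % a, j // a   # 0-based (row, column) of pixel j
        # 8-connected neighbours, clamped to the image borders
        nbrs = [r + c * a
                for r in range(max(pi - 1, 0), min(pi + 1, a - 1) + 1)
                for c in range(max(pj - 1, 0), min(pj + 1, b - 1) + 1)
                if (r, c) != (pi, pj)]
        cand = [k for k in nbrs if k in L]
        if not cand:
            # no unvisited neighbour left: jump to the nearest unvisited pixel
            # anywhere (the same effect as a huge alpha weight on non-neighbour edges)
            cand = list(L)
        cur = j
        j = min(cand, key=lambda k: np.sum((T[:, k] - T[:, cur]) ** 2))
        P.append(j)              # grow the path by appending the chosen index
        L.remove(j)
    return P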

Related

Longest Increasing Subsequence variations dynamic programming

I have this question:
Given the following:
A = [9,6,9,3,8,9,2,0,4,12]
C = [r,g,r,g,r,g,r,r,r,g]
Where
- r = red
- g = green
This list represents the color of the number at the same index in array A, i.e. A[0] = 9 is red, A[1] = 6 is green, ...
We need to pick a number N to start from. If the number is green, we can only move right (by any number of positions) to a number greater than the current one.
If the number is red, we can only move left (by any number of positions) to a number greater than the current one.
Objective: find the longest sequence of moves possible and return the indices of the path. If there are multiple longest sequences, return any one:
Example 1:
A = [9,6,9,3,8,9,2,0,4,12]
C = [r,g,r,g,r,g,r,r,r,g]
output: [7,6,3,8,1,4,0]
Example 2:
A = [1,2,3,4,5,6,7,10]
C =[r,r,r,r,r,r,r,r]
output:[7]
Example 3:
A = [5,3,2,0,24,9,20]
C = [g,g,g,g,r,r,g]
output: [0,5,4]
Current idea of my algorithm:
Consider the possible moves for every element in A. For the first example, A[0] = 9 is red.
As there are no elements to its left, there is only 1 move (choose A[0] itself).
So OPT[0] = 1. For A[1] = 6 = green,
the possible moves are: A[2] = 9, A[4] = 8, A[5] = 9, A[9] = 12.
The recursion is OPT[i] = max{1, 1 + OPT[j]}, taken over every possible move j from i.
Am I on the right track using dynamic programming? The runtime is O(n²), isn't it?
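For concreteness, here is a minimal memoized sketch of that recursion in Python (assuming a move must go to a strictly greater value, which matches all three examples). Each of the n states scans O(n) successors, so the runtime is indeed O(n²):
from functools import lru_cache

def longest_moves(A, C):
    n = len(A)

    @lru_cache(maxsize=None)
    def opt(i):
        # from a green number scan right, from a red one scan left,
        # always to a strictly greater value; returns (length, next index)
        rng = range(i + 1, n) if C[i] == 'g' else range(i - 1, -1, -1)
        best_len, best_next = 1, None
        for j in rng:
            if A[j] > A[i]:
                cand = 1 + opt(j)[0]
                if cand > best_len:
                    best_len, best_next = cand, j
        return best_len, best_next

    start = max(range(n), key=lambda i: opt(i)[0])
    path, i = [], start
    while i is not None:          # follow the stored next-pointers
        path.append(i)
        i = opt(i)[1]
    return path

A = [9, 6, 9, 3, 8, 9, 2, 0, 4, 12]
C = ['r', 'g', 'r', 'g', 'r', 'g', 'r', 'r', 'r', 'g']
print(longest_moves(A, C))  # a longest path (length 7); ties may differ from the sample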

How do I implement cross-correlation to prove two images of the same scene are similar? [duplicate]

How can I select a random point on one image, then find its corresponding point on another image using cross-correlation?
So basically I have image1, I want to select a point on it (automatically) then find its corresponding/similar point on image2.
Here are some example images (not reproduced here): the full image, the patch, and the result of the cross-correlation.
Well, xcorr2 can essentially be seen as analyzing all possible shifts in both positive and negative directions and giving a measure of how well the images fit at each shift. Therefore, for images of size N x N the result must have size (2*N-1) x (2*N-1), where the correlation at index [N, N] would be maximal if the two images were equal or not shifted. If they were shifted by 10 pixels, the maximum correlation would be at [N-10, N], and so on. Therefore you will need to subtract N to get the absolute shift.
With your actual code it would probably be easier to help. But let's look at an example:
(A) We read an image and select two different sub-images with offsets da and db
Orig = imread('rice.png');
N = 200; range = 1:N;
da = [0 20];
db = [30 30];
A=Orig(da(1) + range, da(2) + range);
B=Orig(db(1) + range, db(2) + range);
(B) Calculate cross-correlation and find maximum
X = normxcorr2(A, B);
m = max(X(:));
[i,j] = find(X == m);
(C) Patch them together using recovered shift
R = zeros(2*N, 2*N);
R(N + range, N + range) = B;
R(i + range, j + range) = A;
(D) Illustrate things
figure
subplot(2,2,1), imagesc(A)
subplot(2,2,2), imagesc(B)
subplot(2,2,3), imagesc(X)
rectangle('Position', [j-1 i-1 2 2]), line([N j], [N i])
subplot(2,2,4), imagesc(R);
(E) Compare intentional shift with recovered shift
delta_orig = da - db
%--> [30 10]
delta_recovered = [i - N, j - N]
%--> [30 10]
As you see in (E), we get exactly the shift we intentionally introduced in (A).
Or adjusted to your case:
full=rgb2gray(imread('a.jpg'));
template=rgb2gray(imread('b.jpg'));
S_full = size(full);
S_temp = size(template);
X=normxcorr2(template, full);
m=max(X(:));
[i,j]=find(X==m);
figure, colormap gray
subplot(2,2,1), title('full'), imagesc(full)
subplot(2,2,2), title('template'), imagesc(template),
subplot(2,2,3), imagesc(X), rectangle('Position', [j-20 i-20 40 40])
R = zeros(S_temp);
shift_a = [0 0];
shift_b = [i j] - S_temp;
R((1:S_full(1))+shift_a(1), (1:S_full(2))+shift_a(2)) = full;
R((1:S_temp(1))+shift_b(1), (1:S_temp(2))+shift_b(2)) = template;
subplot(2,2,4), imagesc(R);
However, for this method to work properly the patch (template) and the full image should be scaled to the same resolution.
A more detailed example can also be found here.
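If you are working in Python rather than Matlab, roughly the same steps can be done with scikit-image; a minimal sketch (assuming the same hypothetical files a.jpg and b.jpg as above):
import numpy as np
from skimage import io, color
from skimage.feature import match_template

full = color.rgb2gray(io.imread('a.jpg'))
template = color.rgb2gray(io.imread('b.jpg'))

X = match_template(full, template)              # normalized cross-correlation map
i, j = np.unravel_index(np.argmax(X), X.shape)  # location of the best match
print("best match with top-left corner at row %d, column %d" % (i, j))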

Enumerate matrix combinations with fixed row and column sums

I'm attempting to find an algorithm (not a matlab command) to enumerate all possible NxM matrices, with the constraints of containing only non-negative integers in each cell and having fixed sums for each row and column (these are the parameters of the algorithm).
Example:
Enumerate all 2x3 matrices with row totals 2, 1 and column totals 0, 1, 2:
| 0 0 2 | = 2
| 0 1 0 | = 1
0 1 2
| 0 1 1 | = 2
| 0 0 1 | = 1
0 1 2
This is a rather simple example, but as N and M increase, as well as the sums, there can be a lot of possibilities.
Edit 1
I might have a valid arrangement to start the algorithm:
matrix = new Matrix(N, M) // NxM matrix filled with 0s
FOR i FROM 0 TO matrix.rows().count() - 1
    FOR j FROM 0 TO matrix.columns().count() - 1
        a = target_row_sum[i] - matrix.rows[i].sum()
        b = target_column_sum[j] - matrix.columns[j].sum()
        matrix[i, j] = min(a, b)
    END FOR
END FOR
target_row_sum[i] being the expected sum on row i.
In the example above it gives the 2nd arrangement.
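For concreteness, here is the same greedy fill as runnable Python (a direct transcription of the pseudocode above; on the example it indeed produces the 2nd arrangement):
def initial_matrix(row_sums, col_sums):
    # Greedy fill: each cell takes the largest value that does not
    # overshoot its row total or its column total
    n, m = len(row_sums), len(col_sums)
    M = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a = row_sums[i] - sum(M[i])                       # remaining row budget
            b = col_sums[j] - sum(M[r][j] for r in range(n))  # remaining column budget
            M[i][j] = min(a, b)
    return M

print(initial_matrix([2, 1], [0, 1, 2]))  # [[0, 1, 1], [0, 0, 1]]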
Edit 2:
(based on j_random_hacker's last statement)
Let M be any matrix verifying the given conditions (fixed row and column sums, non-negative cell values).
Let (a, b, c, d) be 4 cell values in M where (a, b) and (c, d) are on the same row, and (a, c) and (b, d) are on the same column.
Let Xa be the row number of the cell containing a and Ya be its column number.
Example:
| 1 a b |
| 1 2 3 |
| 1 c d |
-> Xa = 0, Ya = 1
-> Xb = 0, Yb = 2
-> Xc = 2, Yc = 1
-> Xd = 2, Yd = 2
Here is an algorithm to get all the combinations verifying the initial conditions, making only a, b, c and d vary (note that a and d must move in the same direction, and b and c in the opposite one, so that all row and column sums are preserved):
// A matrix array initially containing a single element, M
// It will be filled with all possible combinations
matrices = [M]
I = min(a, d)
J = min(b, c)
FOR i FROM 1 TO I
    tmp_matrix = copy(M)
    tmp_matrix[Xa, Ya] = a - i
    tmp_matrix[Xb, Yb] = b + i
    tmp_matrix[Xc, Yc] = c + i
    tmp_matrix[Xd, Yd] = d - i
    matrices.add(tmp_matrix)
END FOR
FOR j FROM 1 TO J
    tmp_matrix = copy(M)
    tmp_matrix[Xa, Ya] = a + j
    tmp_matrix[Xb, Yb] = b - j
    tmp_matrix[Xc, Yc] = c - j
    tmp_matrix[Xd, Yd] = d + j
    matrices.add(tmp_matrix)
END FOR
It should then be possible to find every possible combination of matrix values:
Apply the algorithm to the first matrix for every possible group of 4 cells;
recursively apply the algorithm to each sub-matrix obtained in the previous iteration, for every possible group of 4 cells except any group already used in a parent execution.
The recursion depth should be (N*(N-1)/2)*(M*(M-1)/2), each execution producing ((N*(N-1)/2)*(M*(M-1)/2) - depth)*(I+J+1) sub-matrices. But this creates a LOT of duplicate matrices, so it could probably be optimized.
Do you need this to calculate Fisher's exact test? Because that requires exactly what you're doing, and based on that page it seems there will in general be a vast number of solutions, so you probably can't do better than a brute-force recursive enumeration if you want every solution. OTOH, it seems Monte Carlo approximations are successfully used by some software instead of a full-blown enumeration.
I asked a similar question, which might be helpful. Although that question deals with preserving frequencies of letters in each row and column rather than sums, some results can be translated across. E.g. if you find any submatrix (pair of not-necessarily-adjacent rows and pair of not-necessarily-adjacent columns) with numbers
x y
y x
Then you can rearrange these to
y x
x y
without changing any row or column sums. However:
mhum's answer proves that there will in general be valid matrices that cannot be reached by any sequence of such 2x2 swaps. This can be seen by taking his 3x3 matrices and mapping A -> 1, B -> 2, C -> 4 and noticing that, because no element appears more than once in a row or column, frequency preservation in the original matrix is equivalent to sum preservation in the new matrix. However...
someone's answer links to a mathematical proof that it actually will work for matrices whose entries are just 0 or 1.
More generally, if you have any submatrix
a b
c d
where the (not necessarily unique) minimum is d, then you can replace it with any of the d+1 matrices
e f
g h
where h = d-i, g = c+i, f = b+i and e = a-i, for any integer 0 <= i <= d.
For an NxM matrix you have N*M unknowns and N+M equations (one per row sum and one per column sum). Put arbitrary numbers in the top-left (N-1)x(M-1) sub-matrix, except for the (N-1, M-1) element. Now you can find the closed form for the remaining N+M elements trivially.
More details: there are T = N*M elements in total.
There are R = (N-1)*(M-1) - 1 freely chosen elements.
Remaining number of unknowns: T - R = N*M - ((N-1)*(M-1) - 1) = N+M.
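For completeness, a minimal brute-force enumerator in Python (not the closed-form idea above, just the straightforward row-by-row recursion, pruning with the remaining column budgets; on the 2x3 example it yields exactly the two arrangements shown):
def enumerate_matrices(row_sums, col_sums):
    # Yield every matrix of non-negative integers with the given row and column sums
    n = len(row_sums)

    def rows_summing_to(total, caps):
        # all ways to split `total` over len(caps) cells, cell k capped at caps[k]
        if len(caps) == 1:
            if total <= caps[0]:
                yield (total,)
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in rows_summing_to(total - first, caps[1:]):
                yield (first,) + rest

    def fill(i, remaining_cols, acc):
        if i == n:
            if all(c == 0 for c in remaining_cols):  # column budgets exactly used up
                yield [list(r) for r in acc]
            return
        for row in rows_summing_to(row_sums[i], remaining_cols):
            yield from fill(i + 1,
                            [c - x for c, x in zip(remaining_cols, row)],
                            acc + [row])

    yield from fill(0, list(col_sums), [])

for M in enumerate_matrices([2, 1], [0, 1, 2]):
    print(M)  # [[0, 0, 2], [0, 1, 0]] and [[0, 1, 1], [0, 0, 1]]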

How to approach Vertical Sticks challenge?

This problem is taken from interviewstreet.com
Given array of integers Y=y1,...,yn, we have n line segments such that
endpoints of segment i are (i, 0) and (i, yi). Imagine that from the
top of each segment a horizontal ray is shot to the left, and this ray
stops when it touches another segment or it hits the y-axis. We
construct an array of n integers, v1, ..., vn, where vi is equal to
length of ray shot from the top of segment i. We define V(y1, ..., yn)
= v1 + ... + vn.
For example, if we have Y=[3,2,5,3,3,4,1,2], then v1, ..., v8 =
[1,1,3,1,1,3,1,2], as shown in the picture below:
For each permutation p of [1,...,n], we can calculate V(yp1, ...,
ypn). If we choose a uniformly random permutation p of [1,...,n], what
is the expected value of V(yp1, ..., ypn)?
Input Format
First line of input contains a single integer T (1 <= T <= 100). T
test cases follow.
First line of each test-case is a single integer N (1 <= N <= 50).
Next line contains positive integer numbers y1, ..., yN separated by a
single space (0 < yi <= 1000).
Output Format
For each test-case output expected value of V(yp1, ..., ypn), rounded
to two digits after the decimal point.
Sample Input
6
3
1 2 3
3
3 3 3
3
2 2 3
4
10 2 4 4
5
10 10 10 5 10
6
1 2 3 4 5 6
Sample Output
4.33
3.00
4.00
6.00
5.80
11.15
Explanation
Case 1: We have V(1,2,3) = 1+2+3 = 6, V(1,3,2) = 1+2+1 = 4, V(2,1,3) =
1+1+3 = 5, V(2,3,1) = 1+2+1 = 4, V(3,1,2) = 1+1+2 = 4, V(3,2,1) =
1+1+1 = 3. Average of these values is 4.33.
Case 2: No matter what the permutation is, V(yp1, yp2, yp3) = 1+1+1 =
3, so the answer is 3.00.
Case 3: V(y1, y2, y3) = V(y2, y1, y3) = 5, V(y1, y3, y2) = V(y2, y3, y1) = 4, V(y3, y1, y2) = V(y3, y2, y1) = 3, and the average of these values is 4.00.
A naive solution to the problem will run forever for N = 50. I believe the problem can be solved by independently calculating a value for each stick, but I still need to know whether there is another efficient approach. On what basis can we calculate the value for each stick independently?
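For reference, the naive brute force over all permutations is easy to write in Python and is handy for checking the answers below on small inputs, even though it is hopeless for N = 50:
from itertools import permutations
from math import factorial

def V(perm):
    # sum of ray lengths: each ray stops at the first stick at least as high,
    # or spans all the way to the y-axis (j == -1)
    total = 0
    for i, y in enumerate(perm):
        j = i - 1
        while j >= 0 and perm[j] < y:
            j -= 1
        total += i - j
    return total

def expected_V(ys):
    return sum(V(p) for p in permutations(ys)) / factorial(len(ys))

print(round(expected_V([1, 2, 3]), 2))      # 4.33
print(round(expected_V([10, 2, 4, 4]), 2))  # 6.0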
We can solve this problem by figuring out:
if the k-th stick is put in the i-th position, what is the expected ray length of this stick?
Then the problem can be solved by adding up the expected lengths for all sticks in all positions.
Let expected[k][i] be the expected ray length of the k-th stick put in the i-th position, and let num[k][i][length] be the number of permutations in which the k-th stick, put in the i-th position, has ray length equal to length. Then
expected[k][i] = sum( num[k][i][length] * length ) / N!
How to compute num[k][i][length]? For example, for length = 3, consider the following picture:
...GxxxI...
where I is the position, the 3 'x' are 3 sticks that are strictly lower than I, and G is a stick that is at least as high as I.
Let s_i be the number of sticks that are smaller than the k-th stick, and g_i the number of sticks that are greater than or equal to the k-th stick (not counting the k-th stick itself). We can choose any one of the g_i sticks for the G position and any ordered choice of length out of the s_i smaller sticks to fill the x positions; the remaining n - length - 2 sticks can be arranged freely, so we have:
num[k][i][length] = P(s_i, length) * g_i * (n - length - 2)!
where P(m, r) = m! / (m - r)! counts ordered selections. In case all the positions before I are smaller than I, we don't need a greater stick at G, i.e. xxxI...., and we have:
num[k][i][length] = P(s_i, length) * (n - length - 1)!
And here's a piece of Python code that solves the problem. Note that in this code length counts the smaller sticks in front of the current one, i.e. the ray length minus one, which is why n (one unit per stick) is added at the end:
from math import comb, factorial

def solve(n, ys):
    ret = 0
    for y_i in ys:
        s_i = sum(1 for x in ys if x < y_i)       # sticks strictly smaller than y_i
        g_i = sum(1 for x in ys if x >= y_i) - 1  # sticks at least as high, excluding y_i itself
        for i in range(n):                        # position of the stick
            for length in range(1, i + 1):        # smaller sticks directly in front of it
                if length == i:
                    # the ray reaches the y-axis; no blocking stick G is needed
                    t_ret = comb(s_i, length) * factorial(length) * factorial(n - length - 1)
                else:
                    t_ret = comb(s_i, length) * factorial(length) * g_i * factorial(n - length - 2)
                ret += t_ret * length
    return ret / factorial(n) + n
This is the same question as https://cs.stackexchange.com/questions/1076/how-to-approach-vertical-sticks-challenge and my answer there (which is a little simpler than those given earlier here) was:
Imagine a different problem: if you had to place k sticks of equal heights in n slots then the expected distance between sticks (and the expected distance between the first stick and a notional slot 0, and the expected distance between the last stick and a notional slot n+1) is (n+1)/(k+1) since there are k+1 gaps to fit in a length n+1.
Returning to this problem, a particular stick is interested in how many sticks (including itself) are as high or higher. If this is k, then the expected gap before it is also (n+1)/(k+1).
So the algorithm is simply to find this value for each stick and add up the expectation. For example, starting with heights of 3,2,5,3,3,4,1,2, the number of sticks with a greater or equal height is 5,7,1,5,5,2,8,7 so the expectation is 9/6+9/8+9/2+9/6+9/6+9/3+9/9+9/8 = 15.25.
This is easy to program: for example a single line in R
V <- function(Y){(length(Y) + 1) * sum(1 / (rowSums(outer(Y, Y, "<=")) + 1) )}
gives the values in the sample output in the original problem
> V(c(1,2,3))
[1] 4.333333
> V(c(3,3,3))
[1] 3
> V(c(2,2,3))
[1] 4
> V(c(10,2,4,4))
[1] 6
> V(c(10,10,10,5,10))
[1] 5.8
> V(c(1,2,3,4,5,6))
[1] 11.15
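The same computation in Python, for comparison:
def V(Y):
    n = len(Y)
    # for each stick, count the sticks as high or higher (including itself)
    return sum((n + 1) / (sum(1 for x in Y if x >= y) + 1) for y in Y)

print(V([1, 2, 3]))            # 4.333...
print(V([10, 10, 10, 5, 10]))  # 5.8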
As you correctly noted, we can solve the problem independently for each stick.
Let F(i, len) be the number of permutations in which the ray from stick i has length exactly len.
Then the answer is
(Sum over i, len of F(i, len) * len) / n!
All that is left is to count F(i, len). Let a(i) be the number of sticks j such that y_j <= y_i, and b(i) the number of sticks such that y_j > y_i.
In order to get a ray of length len, we need to have a situation like this:
B, l, ..., l, O (with len - 1 l's between B and O)
where O is stick #i, B is a stick with bigger height (or the beginning, i.e. the y-axis), and l is a stick with height less than the i-th stick's.
This gives us 2 cases:
1) B is the beginning; this can be achieved in P(a(i), len-1) * (b(i) + a(i) - (len-1))! ways.
2) B is a bigger stick; this can be achieved in P(a(i), len-1) * b(i) * (b(i) + a(i) - len)! * (n - len) ways.
Edit: corrected b(i) as the 2nd factor of the product, in place of a(i), in case 2.

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked (namely the initialization), as the rest is like in the original K-Means algorithm.
Is the probability function used based on distance, or Gaussian?
At the same time, the most distant point (from the other centroids) is picked as a new centroid.
I would appreciate a step-by-step explanation and an example. The one in Wikipedia is not clear enough. Well-commented source code would also help. If you are using 6 arrays, then please tell us which one is for what.
Interesting question. Thank you for bringing this paper to my attention - K-Means++: The Advantages of Careful Seeding
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x is high if x is not near any previously chosen centers.
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1-x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
I've coded the initialization procedure in Python; I don't know if this helps you.
import numpy as np

def initialize(X, K):
    # k-means++ seeding: the first center is X[0]; each subsequent center is
    # drawn with probability proportional to the squared distance to the
    # nearest center chosen so far
    C = [X[0]]
    for k in range(1, K):
        D2 = np.array([min(np.inner(c - x, c - x) for c in C) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        r = np.random.rand()
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C
EDIT with clarification: The output of cumsum gives us boundaries to partition the interval [0,1]. These partitions have length equal to the probability of the corresponding point being chosen as a center. So then, since r is uniformly chosen between [0,1], it will fall into exactly one of these intervals (because of break). The for loop checks to see which partition r is in.
Example:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
    # this event has probability 0.1
    i = 0
elif r < cumprobs[1]:
    # this event has probability 0.2
    i = 1
elif r < cumprobs[2]:
    # this event has probability 0.3
    i = 2
elif r < cumprobs[3]:
    # this event has probability 0.4
    i = 3
One Liner.
Say we need to select 2 cluster centers. Instead of selecting them all randomly (as in simple k-means), we select the first one randomly, then find the points that are farthest from the first center (these points most probably do not belong to the first cluster, as they are far from it) and choose the second cluster center from among those far points.
I have prepared a full source implementation of k-means++, based on the book "Programming Collective Intelligence" by Toby Segaran and the k-means++ initialization provided here.
There are indeed two distance functions here. For the initial centroids a standard one based on numpy.inner is used, and then for fixing the centroids the Pearson one is used. Maybe the Pearson one can also be used for the initial centroids. They say it is better.
from __future__ import division
from math import sqrt
import numpy
from numpy.random import rand

def readfile(filename):
    lines = [line for line in file(filename)]
    rownames = []
    data = []
    for line in lines:
        p = line.strip().split(' ')  # single space as separator
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames, data

def pearson(v1, v2):
    # Simple sums
    sum1 = sum(v1)
    sum2 = sum(v2)
    # Sums of the squares
    sum1Sq = sum([pow(v, 2) for v in v1])
    sum2Sq = sum([pow(v, 2) for v in v2])
    # Sum of the products
    pSum = sum([v1[i] * v2[i] for i in range(len(v1))])
    # Calculate r (Pearson score)
    num = pSum - (sum1 * sum2 / len(v1))
    den = sqrt((sum1Sq - pow(sum1, 2) / len(v1)) * (sum2Sq - pow(sum2, 2) / len(v1)))
    if den == 0: return 0
    return 1.0 - num / den

def initialize(X, K):
    # k-means++ seeding, with numpy.inner as the distance
    C = [X[0]]
    for _ in range(1, K):
        D2 = numpy.array([min([numpy.inner(numpy.array(c) - numpy.array(x),
                                           numpy.array(c) - numpy.array(x))
                               for c in C]) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        r = rand()
        i = -1
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C

def kcluster(rows, distance=pearson, k=4):
    # k-means clustering with k-means++ seeding
    # (iteration loop adapted from the kcluster routine in the book)
    clusters = initialize(rows, k)
    lastmatches = None
    for t in range(100):
        bestmatches = [[] for i in range(k)]
        # Find which centroid is the closest for each row
        for j in range(len(rows)):
            row = rows[j]
            bestmatch = 0
            for i in range(k):
                d = distance(clusters[i], row)
                if d < distance(clusters[bestmatch], row): bestmatch = i
            bestmatches[bestmatch].append(j)
        # If the results are the same as last time, this is complete
        if bestmatches == lastmatches: break
        lastmatches = bestmatches
        # Move the centroids to the averages of their members
        for i in range(k):
            avgs = [0.0] * len(rows[0])
            if len(bestmatches[i]) > 0:
                for rowid in bestmatches[i]:
                    for m in range(len(rows[rowid])):
                        avgs[m] += rows[rowid][m]
                for j in range(len(avgs)):
                    avgs[j] /= len(bestmatches[i])
                clusters[i] = avgs
    return bestmatches

rows, data = readfile('/home/toncho/Desktop/data.txt')
kclust = kcluster(data, k=4)
print "Result:"
for c in kclust:
    out = ""
    for r in c:
        out += rows[r] + ' '
    print "[" + out[:-1] + "]"
print 'done'
data.txt:
p1 1 5 6
p2 9 4 3
p3 2 3 1
p4 4 5 6
p5 7 8 9
p6 4 5 4
p7 2 5 6
p8 3 4 5
p9 6 7 8
