Grouping connected pairs of values - algorithm

I have a list containing unique pairs of values x and y; for example:
x y
-- --
1 A
2 A
3 A
4 B
5 A
5 C
6 D
7 D
8 C
8 E
9 B
9 F
10 C
10 G
I want to divide this list of pairs as follows:
Group 1
1 A
2 A
3 A
5 A
5 C
8 C
10 C
8 E
10 G
Group 2
4 B
9 B
9 F
Group 3
6 D
7 D
Group 1 contains
all pairs where y = 'A' (1-A, 2-A, 3-A, 5-A)
any additional pairs where x = any of the x's above (5-C)
any additional pairs where y = any of the y's above (8-C, 10-C)
any additional pairs where x = any of the x's above (8-E, 10-G)
The pairs in Group 2 can't be reached in such a manner from any pairs in Group 1, nor can the pairs in Group 3 be reached from either Group 1 or Group 2.
As suggested in Group 1, the chain of connections can be arbitrarily long.
I'm exploring solutions using Perl, but any sort of algorithm, including pseudocode, would be fine. For simplicity, assume that all of the data can fit in data structures in memory.
[UPDATE] Because I need to apply this approach to 5.3 billion pairs, scalability is important to me.

Pick a starting point. Find all points reachable from it, removing them from the master list. Repeat for all added points, until no more can be reached. Move on to the next group, starting with another remaining point. Continue until you have no more remaining points.
pool = [(1 A), (2 A), (3 A), (4 B), ... (10 G)]
group_list = []
while pool is not empty
    group = [ remove first point from pool ]   // start with the next available point
    pos = 0
    while pos < size(group)                    // while there are unprocessed points in the group
        group_point = group[pos]               // grab the next unprocessed point
        pos += 1
        for point in copy of pool              // find all remaining points reachable from it
            if point and group_point have a coordinate in common
                remove point from pool
                add point to group
    // we've reached closure with that starting point
    add group to group_list
return group_list

You can think of the letters and numbers as nodes of a graph, and the pairs as edges. Divide this graph into connected components, which can be done in linear time (for example with a breadth-first or depth-first search).
The connected component containing 'A' forms group 1. The other connected components form the other groups.
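A minimal Python sketch of the connected-components idea. It uses union-find (disjoint sets), which is one way to find components in near-linear time; a BFS or DFS over the same graph would work equally well. The function name group_pairs is hypothetical, and x and y values are tagged by side so that an x value equal to some y value is not merged by accident. For 5.3 billion pairs, Python dicts like these would not fit in memory; the same idea would need a more compact representation, such as integer node ids in flat arrays.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Split (x, y) pairs into groups of connected pairs.

    Every x value and every y value is a graph node and every pair is an edge;
    each returned group is one connected component, listed in input order.
    """
    parent = {}  # node -> parent node in the union-find forest
    size = {}    # root -> number of nodes in its tree

    def find(node):
        if node not in parent:  # first time we see this node: it is its own root
            parent[node] = node
            size[node] = 1
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # attach the smaller tree under the larger one
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for x, y in pairs:
        # Tag each value with its side so an x never collides with an equal y.
        union(("x", x), ("y", y))

    groups = defaultdict(list)
    for x, y in pairs:
        groups[find(("x", x))].append((x, y))
    return list(groups.values())
```

On the example from the question, this returns three groups: the nine pairs reachable from 'A', then 4-B, 9-B, 9-F, then 6-D, 7-D.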

Related

What is N in this given scenario

I am trying to implement this code, and this website has kindly provided its algorithm, but I am trying to find out what "N" is. I understand what "I" and "M" are, but not "N". Is "N" the total input (in the example below, 5, because there are 5 letters)?
Algorithm:
Combinations are generated in lexicographical order. The algorithm uses indexes of the elements of the set. Here is how it works on an example: suppose we have a set of 5 elements with indexes 1 2 3 4 5 (starting from 1), and we need to generate all combinations of size m = 3.
First, we initialize the first combination of size m, with indexes in ascending order:
1 2 3
Then we check the last element (i = 3). If its value is less than n - m + i, it is incremented by 1.
1 2 4
Again we check the last element, and since it is still less than n - m + i, it is incremented by 1.
1 2 5
Now it has the maximum allowed value: n - m + i = 5 - 3 + 3 = 5, so we move on to the previous element (i = 2).
If its value is less than n - m + i, it is incremented by 1, and all following elements are set to the value of their previous neighbor plus 1:
1 (2+1)3 (3+1)4 = 1 3 4
Then we again start from the last element i = 3
1 3 5
Back to i = 2
1 4 5
Now it finally equals n - m + i = 5 - 3 + 2 = 4, so we move on to the first element (i = 1): (1+1)2 (2+1)3 (3+1)4 = 2 3 4
And then,
2 3 5
2 4 5
3 4 5
and it is the last combination since all values are set to the maximum possible value of n - m + i.
Input:
A
B
C
D
E
Output:
A B C
A B D
A B E
A C D
A C E
A D E
B C D
B C E
B D E
C D E
Take a look at the very first paragraph of the link you provided.
It states that
This combinations calculator generates all possible combinations of m elements from the set of n elements.
So yes, n is the number of elements or letters that the algorithm needs to use.
N here is the size of the set from which you generate the combinations. In the given example, "Suppose we have a set of 5 elements with indexes 1 2 3 4 5 (starting from 1)", N is 5.
Combinations are usually symbolized with nCm, or n choose m. So n is the total set size(in this example 5) and m is the number chosen(3).
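To make the roles of n and m concrete, here is a minimal Python sketch of the increment rule described in the question; combinations_lex is a name chosen for this illustration. n is the size of the set (5 letters) and m is the size of each combination (3), so the generator yields n choose m = 10 combinations.

```python
def combinations_lex(n, m):
    """Yield every m-element combination of the indexes 1..n in lexicographic
    order, following the increment rule described in the question."""
    if not 0 <= m <= n:
        return
    c = list(range(1, m + 1))  # first combination: 1 2 ... m
    while True:
        yield tuple(c)
        # Find the rightmost position i (1-based) still below its maximum n - m + i.
        i = m
        while i >= 1 and c[i - 1] == n - m + i:
            i -= 1
        if i == 0:  # every position holds its maximum: that was the last combination
            return
        c[i - 1] += 1  # increment that element...
        for j in range(i, m):  # ...and set each following element to its left neighbor + 1
            c[j] = c[j - 1] + 1


letters = "ABCDE"
for combo in combinations_lex(len(letters), 3):  # n = 5, m = 3
    print(" ".join(letters[i - 1] for i in combo))
```

The printed lines are the ten combinations from the Output list, from A B C to C D E.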

Assignment regarding dynamic programming: making my code more efficient?

I've got an assignment regarding dynamic programming.
I'm to design an efficient algorithm that does the following:
There is a path, covered in spots. The user can move forward to the end of the path using a series of push buttons. There are 3 buttons. One moves you forward 2 spots, one moves you forward 3 spots, one moves you forward 5 spots. The spots on the path are either black or white, and you cannot land on a black spot. The algorithm finds the smallest number of button pushes needed to reach the end (past the last spot, can overshoot it).
The user inputs "n", the number of spots, and fills the array with n values of B or W (black or white). The first spot must be white. Here's what I have so far (it's only meant to be pseudocode):
int x = 0
int totalSteps = 0
n = user input
int countAtIndex[n] <- set all values to -1   // I'll do the nitty gritty stuff like this after
int spots[n] = user input
pressButton(totalSteps, x) {
    if (countAtIndex[x] != -1 AND totalSteps >= countAtIndex[x]) {
        FAILED   // test whether the value has already been set (not -1) and is not beaten
    }
    else if (spots[x] == "B") {
        countAtIndex[x] = -2   // indicator of an invalid spot
        FAILED
    }
    else if (x >= n-5) {   // within 5 of the end: press 5 to take a step and win
        GIVE VALUE OF totalSteps + 1 AS A SUCCESSFUL SHORTEST OUTPUT
        FINISH
    }
    else {
        countAtIndex[x] = totalSteps
        pressButton(totalSteps + 1, x+5)   // take 5 steps
        pressButton(totalSteps + 1, x+3)   // take 3 steps
        pressButton(totalSteps + 1, x+2)   // take 2 steps
    }
}
I appreciate this may look quite bad, but I hope it comes across okay; I just want to make sure the theory is sound before I write it out properly. I'm wondering whether this is the most efficient way of solving this problem. In addition, where there are capitals, I'm unsure how to "Fail" the program, or how to return the "Successful" value.
Any help would be greatly appreciated.
I should add, in case it's unclear, that I'm using countAtIndex[] to store the number of moves needed to get to that index in the path, i.e. position 3 (countAtIndex[2]) could have the value 1, meaning it has taken 1 move to get there.
I'm converting my comment into an answer since this will be too long for a comment.
There are always two ways to solve a dynamic programming problem: top-down with memoization, or bottom-up by systematically filling an output array. My intuition says that the implementation of the bottom-up approach will be simpler. And my intent with this answer is to provide an example of that approach. I'll leave it as an exercise for the reader to write the formal algorithm, and then implement the algorithm.
So, as an example, let's say that the first 11 elements of the input array are:
index: 0 1 2 3 4 5 6 7 8 9 10 ...
spot: W B W B W W W B B W B ...
To solve the problem, we create an output array (aka the DP table), to hold the information we know about the problem. Initially all values in the output array are set to infinity, except for the first element which is set to 0. So the output array looks like this:
index: 0 1 2 3 4 5 6 7 8 9 10 ...
spot: W B W B W W W B B W B
output: 0 - x - x x x - - x -
where - is a black space (not allowed), and x is being used as the symbol for infinity (a spot that's either unreachable, or hasn't been reached yet).
Then we iterate from the beginning of the table, updating entries as we go.
From index 0, we can reach 2 and 5 with one move. We can't move to 3 because that spot is black. So the updated output array looks like this:
index: 0 1 2 3 4 5 6 7 8 9 10 ...
spot: W B W B W W W B B W B
output: 0 - 1 - x 1 x - - x -
Next, we skip index 1 because the spot is black. So we move on to index 2. From 2, we can reach 4,5, and 7. Index 4 hasn't been reached yet, but now can be reached in two moves. The jump from 2 to 5 would reach 5 in two moves. But 5 can already be reached in one move, so we won't change it (this is where the recurrence relation comes in). We can't move to 7 because it's black. So after processing index 2, the output array looks like this:
index: 0 1 2 3 4 5 6 7 8 9 10 ...
spot: W B W B W W W B B W B
output: 0 - 1 - 2 1 x - - x -
After skipping index 3 (black) and processing index 4 (can reach 6 and 9), we have:
index: 0 1 2 3 4 5 6 7 8 9 10 ...
spot: W B W B W W W B B W B
output: 0 - 1 - 2 1 3 - - 3 -
Processing index 5 won't change anything because 7,8,10 are all black. Index 6 doesn't change anything because 8 is black, 9 can already be reached in three moves, and we aren't showing index 11. Indexes 7 and 8 are skipped because they're black. And all jumps from 9 are into parts of the array that aren't shown.
So if the goal was to reach index 11, the number of moves would be 4, and the possible paths would be 2,4,6,11 or 2,4,9,11. If the array continued, we would simply keep iterating through it, and then check the last five elements of the array to see which has the smallest number of moves; one more press (the 5 button) then takes us past the end.
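A minimal Python sketch of the bottom-up approach walked through above. It assumes the path is given as a string of 'W'/'B' characters and that any jump landing at index n or beyond counts as finishing; min_presses is a hypothetical name, and returning -1 for an unreachable end is a choice made for this sketch.

```python
import math


def min_presses(spots, jumps=(2, 3, 5)):
    """Fewest button presses to move past the last spot, or -1 if impossible.

    spots is a string such as "WBWBW"; spot 0 is the (white) starting spot and
    a jump that lands at index len(spots) or beyond counts as finishing.
    """
    n = len(spots)
    dp = [math.inf] * n  # dp[i] = fewest presses needed to land on spot i
    dp[0] = 0
    best = math.inf
    for i in range(n):
        if spots[i] == "B" or dp[i] == math.inf:
            continue  # black or unreachable spots propagate nothing
        for jump in jumps:
            k = i + jump
            if k >= n:  # overshooting the end finishes the path
                best = min(best, dp[i] + 1)
            elif spots[k] == "W":  # only white spots can be landed on
                dp[k] = min(dp[k], dp[i] + 1)
    return -1 if best == math.inf else best
```

With the 11 spots from the example, min_presses("WBWBWWWBBWB") returns 4, matching the paths 2,4,6,11 and 2,4,9,11. Each spot is visited once and tries three jumps, so the table is filled in O(n) time.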

How to represent clusters in MATLAB?

Suppose I have the following data sets:
A:
1 8 9 12
2 1 0 35
7 0 0 23
B:
6 3
1 9
0 7
What I want to do is, for each row in B, find the smallest value and get the index of the column in which it appears. For example, for row 1 of B, the smallest value is 3, which comes from column 2. Therefore, add row 1 of A to Cluster 2.
For row 2 from B, the smallest value is 1, which comes from column 1. Therefore add row 2 from A to Cluster 1. And so on...
Now I want to make an array called C (this will represent my clusters) with 2 items. Item 1 contains the matrix of all rows from A that should be in Cluster 1, and Item 2 contains the matrix of all rows from A that should be in Cluster 2. This is where I'm having problems. This is my current attempt:
function clusterSet = buildClusters(A, B)
clusterSet = zeros(size(B, 2)); % Number of clusters = number of columns in B
for i = 1:size(A, 1)
[value, index] = min(B(i,:)); % Get the minimum value of B in row i, and its index (column number)
clusterSet(index) = A(i,:); % Add row i from A to its corresponding cluster's matrix.
end
end
I'm getting the following error on the last line (note: this is not explicitly referring to my data sets 'A' and 'B', but talks about a general A and B):
In an assignment A(I) = B, the number of elements in B and I must
be the same.
If the minimum value of B in row 1 comes from column 2, then row 1 from A should be added to a matrix Cluster 2 (row of B corresponds to which row of A to add to the cluster, and the column of B represents which cluster to add it to). This is what I want that line to do but I get the above error.
Any suggestions?
Here's a way without loops:
[~, cluster] = min(B,[],2); %// get cluster index of each row
[clusterSort, indSort] = sort(cluster); %// sort cluster indices
sz = accumarray(clusterSort,1); %// size of each cluster
C = mat2cell(A(indSort,:), sz); %// split A into cell array based on clusters

Strategy with regard to how to approach this algorithm?

I was asked this question in a test and I need help with regards to how I should approach the solution, not the actual answer. The question is
You have been given a 7 digit number(with each digit being distinct and 0-9). The number has this property
product of first 3 digits = product of last 3 digits = product of central 3 digits
Identify the middle digit.
Now, I can do this on paper by brute force(trial and error), the product is 72 and digits being
8,1,9,2,4,3,6
Now how do I approach the problem in a no brute force way?
Let the number be: a b c d e f g
So, as per the rule (1):
axbxc = cxdxe = exfxg
Moreover, cancelling c and e (which are nonzero), we have (2):
axb = dxe and
cxd = fxg
This question can be solved with factorization and a little bit of trial and error.
Out of the digits from 1 to 9, 5 and 7 can be rejected straight away, since they are primes with no other multiple among the digits, so they would not fit in the above two equations.
The digits 1 to 9 can be factored as:
1 = 1, 2 = 2, 3 = 3, 4 = 2X2, 6 = 2X3, 8 = 2X2X2, 9 = 3X3
After factorization we are left with a total of seven 2's, four 3's and the number 1.
For rule 2, we are left with only 5 possibilities; these equations can be found with factorization logic, since we know we have seven 2's and four 3's overall.
1: 1X8(2x2x2) = 2X4(2x2)
2: 1X6(3x2) = 3X2
3: 4(2x2)X3 = 6(3x2)X2
4: 9(3x3)X2 = 6(3x2)X3
5: 3X8(2x2x2) = 4(2x2)X6(3x2)
Skipping 5 and 7 we are left with 7 digits.
Each of the above equations uses 4 digits, leaving 3 remaining digits that can be tested by trial and error. For example, if we consider the first case we have:
1X8 = 2X4, and we are left with 3, 6, 9.
We have axbxc = cxdxe, and c can be any of these 3 options; the products would then be 24, 48 and 72.
24 can't be correct, since for the last three digits we are left with 6, 9, 4 (=216)
48 can't be correct, since for the last three digits we are left with 3, 9, 4 (=108)
72 could be a valid option, since the last three digits in that case would be 3, 6, 4 (=72)
This question is good to solve with Relational Programming. I think it very clearly lets the programmer see what's going on and how the problem is solved. While it may not be the most efficient way to solve problems, it can still bring desired clarity and handle problems up to a certain size. Consider this small example from Oz:
fun {FindDigits}
D1 = {Digit}
D2 = {Digit}
D3 = {Digit}
D4 = {Digit}
D5 = {Digit}
D6 = {Digit}
D7 = {Digit}
L = [D1 D2 D3] M = [D3 D4 D5] E= [D5 D6 D7] TotL in
TotL = [D1 D2 D3 D4 D5 D6 D7]
{Unique TotL} = true
{ProductList L} = {ProductList M} = {ProductList E}
TotL
end
(This could be parameterized further, but it is left unoptimized to illustrate the point.)
Here you first pick 7 digits with a function Digit/0. Then you create three lists, L, M and E, consisting of the segments, as well as a total list to return (you could also return the concatenation, but I found this better for illustration).
Then comes the key point: you specify relations that have to hold. First, that TotL is unique (distinct, in your task's wording). Then, that the segment products have to be equal.
What happens now is that a search is conducted for your answers. The search strategy here is depth-first, but it could also be breadth-first, and a solver is called to collect all solutions. The search strategy is found inside the SolveAll/1 function.
{Browse {SolveAll FindDigits}}
Which in turn returns this list of answers:
[[1 8 9 2 4 3 6] [1 8 9 2 4 6 3] [3 6 4 2 9 1 8]
[3 6 4 2 9 8 1] [6 3 4 2 9 1 8] [6 3 4 2 9 8 1]
[8 1 9 2 4 3 6] [8 1 9 2 4 6 3]]
At least this way forward is not using brute force. Essentially you are searching for answers here. There might be heuristics that let you find the correct answer sooner (some mathematical magic, perhaps), or you can use genetic algorithms to search the space or other well-known strategies.
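For readers without Oz, here is a rough Python analogue: a plain depth-first backtracking search (not Oz's constraint solver) that checks each product as soon as its triple of digits is complete, so most branches are cut early. It is still a search over candidates, only a pruned one; find_numbers is a name chosen for this sketch, and it allows 0 anywhere except the leading digit.

```python
def find_numbers():
    """Depth-first search for 7-digit numbers abcdefg with distinct digits
    (a != 0) such that a*b*c == c*d*e == e*f*g."""
    solutions = []

    def extend(digits, used):
        k = len(digits)
        # Prune as soon as the middle triple is complete (after placing e).
        if k == 5 and digits[2] * digits[3] * digits[4] != digits[0] * digits[1] * digits[2]:
            return
        if k == 7:
            if digits[4] * digits[5] * digits[6] == digits[0] * digits[1] * digits[2]:
                solutions.append("".join(map(str, digits)))
            return
        for d in range(1 if k == 0 else 0, 10):  # no leading zero
            if d not in used:
                extend(digits + [d], used | {d})

    extend([], frozenset())
    return solutions
```

It finds the same 8 numbers as the Oz version, and the middle digit is 2 in every one of them.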
Prime factorization of each distinct digit (where possible):
0 = 0
1 = 1
2 = 2
3 = 3
4 = 2 x 2
5 = 5
6 = 2 x 3
7 = 7
8 = 2 x 2 x 2
9 = 3 x 3
In total:
seven 2's, four 3's, one 5 and one 7
Since the three products are equal (A = B = C), the prime factorization of A must be the same as that of B and that of C. So 0, 5 and 7 are excluded: their factors can never be matched in all three products.
Hence seven 2's and four 3's are left, and we have 7 digits (1,2,3,4,6,8,9). As there are only 7 digits, the number is formed from exactly these digits.
Recall that A, B and C must have the same prime factorization. This implies that A, B and C contain the same number of 2's and the same number of 3's. So, counting over A, B and C together, we should try to achieve:
9 OR 12 2's AND
6 3's
(Each total must be a multiple of 3; the lower bound is the total number of that prime factor across all seven digits, and the upper bound is twice the lower bound, since a digit is counted at most twice.)
Consider the 3's first, since there is only one possibility: each of A, B and C has two 3's. To raise the total number of 3's from 4 to 6, the digits containing 3 must go in the connecting positions between two products (the third or fifth digit), where they are counted twice. Split the digits with prime factor 3 into two groups, {3,6} and {9}, and put one group into the connecting positions. The only possible way is to put 9 in a connecting position and 3 and 6 in an unconnected pair. That means xx9xx36 or 36xx9xx (the order of 3 and 6 is not important).
With this result, we get 9 x middle x connection digit = connection digit x 3 x 6. Thus, middle = (3 x 6) / 9 = 2
My answer actually extends Ansh's answer.
Let abcdefg be the digits of the number. Then
ab=de
cd=fg
From these relations we can exclude 0, 5 and 7, because there are no other multiples of these numbers between 0 and 9. So we are left with seven digits, and each digit is used exactly once in each answer. We are going to examine how we can pair the digits (ab, de, cd, fg).
What happens with 9? It can't be paired with 3 or 6, since their product would then contain the factor 3 three times, and we have only 4 factors of 3 in total. Similarly, 3 and 6 must be paired together at least once, to balance the two factors of 3 in 9. This gives a product of 18, and so 9 must be paired at least once with 2.
Now if 9x2 is in a corner, then 3x6 must be in the middle, meaning that the other corner must contain another multiple of 3. So 9 and 2 are in the middle.
Let's suppose ab = 3x6 (the other case is symmetric). Then d must be 9 or 2. But if d is 9, then f or g must be a multiple of 3. So d is 2 and e is 9. We can stop here and answer that the middle digit is
2
Now we have 2c = fg, and the remaining choices are 1, 4, 8. We see that the only solutions are c = 4, f = 1, g = 8 and c = 4, f = 8, g = 1.
So if 3x6 is in the left corner, we have the following solutions:
3642918, 3642981, 6342918, 6342981
If 3x6 is in the right corner we have the following solutions which are the reverse of the above:
8192463, 1892463, 8192436, 1892436
Here is how you can consider the problem:
Let's write the final solution as N1 N2 N3 N4 N5 N6 N7, with the 3 groups N1N2N3, N3N4N5 and N5N6N7.
5 and 7 must be excluded because they are prime and no other digit is a multiple of them: if one of them divided one of the 3 products, no other digit could make the other products divisible by it. 0 is excluded too, since no position belongs to all three groups, so a single 0 cannot make all three products zero.
So we get the 7 remaining digits: 1234689
whose product is 2^7*3^4
(N1*N2*N3) and (N5*N6*N7) are equal, so their product is a square number. Dividing the product of all seven digits by N4 must therefore leave a square number (i.e. even exponents of both 2 and 3).
N4 can't be 1, 3, 4, 6, 9.
We conclude N4 is 2 or 8
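The square argument above can be written out as a short derivation (P, the common product of each group of three, is notation introduced here):

$$
N_1 N_2 \cdots N_7 = 1\cdot 2\cdot 3\cdot 4\cdot 6\cdot 8\cdot 9 = 2^7 \cdot 3^4
$$

$$
P^2 = (N_1 N_2 N_3)(N_5 N_6 N_7) = \frac{N_1 N_2 \cdots N_7}{N_4} = \frac{2^7 \cdot 3^4}{N_4}
$$

For the right-hand side to be a perfect square, N4 must contain an odd power of 2 and an even power of 3. Among 1, 2, 3, 4, 6, 8, 9, only 2 = 2^1 and 8 = 2^3 qualify. With N4 = 2, P^2 = 2^6 * 3^4, so P = 2^3 * 3^2 = 72.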
If N4 is 8, it divides (N3*N4*N5), and we can't use the remaining even numbers (2, 4, 6) to make both (N1*N2*N3) and (N5*N6*N7) divisible by 8. So N4 is 2, and 8 does not belong to the second group (let's put it in N1).
Now, we have: 1st grp: 8XX, 2nd group: X2X 3rd group: XXX
Note: at this point we know that the product is 72, because it is 2^3*3^2 (the square root of 2^6*3^4), but the value itself is not really important. We have done the difficult part: knowing the 7 digits and the middle position.
Then, we know that we have to give 2^3 to each of (N1*N2*N3), (N3*N4*N5) and (N5*N6*N7), because 2^3*2*2^3 = 2^7.
We already gave 8 to N1 and 2 to N4; we place 6 at N6 and 4 at N5, so each of the 3 products is a multiple of 8.
Now, we have: 1st grp: 8XX, 2nd group: X24 3rd group: 46X
The same reasoning applies to the factor 3: we distribute 3^2 to each group, knowing that the last group already has a 6.
The last group therefore gets the 3, and the first and second groups share the 9.
Now, we have: 1st grp: 8X9, 2nd group: 924 3rd group: 463
And, then 1 at N2, which is the remaining position.
This problem is pretty easy if you look at the number 72 more carefully.
We have our number with this form abcdefg
and abc = cde = efg, with those digits 8,1,9,2,4,3,6
So, first, we can conclude that 8, 1, 9 must form one of the triples, because there is no other way for 1 to combine with two of the digits to give 72.
We can also conclude that 1 must be at the start or end of the whole number, or in the middle of its triple.
So now we have 819defg or 918defg ...
Working through the remaining digits, only 819defg is possible: it needs d*e = 72/9 = 8, so d and e are 2 and 4, while 918defg would need d*e = 72/8 = 9, which cannot be made from 2, 4, 3, 6. So we have 81924fg or 81942fg, and 819 must be the triple that starts or ends the number.
The rest is easy: the last two digits must multiply to either 72/4 = 18 or 72/2 = 36, and with only 3 and 6 left, that means e = 4 and fg = 18. So the answers are 8192436 or 8192463.
The 7 digits: 8, 1, 9, 2, 4, 3, 6
Say X x Y x Z = 72.
1) Pick any two of the 7 digits above, say X and Y.
2) Divide 72 by X and then by Y; the result is the third digit, Z.
We have found a set XYZ of 3 digits whose product is 72.
Now repeat 1) and 2) with the remaining 4 digits.
This time we find ABC, whose product is 72.
Let's say the 7th digit left over is I.
3) Divide 72 by I; call the result R.
4) Divide R by one of X, Y, Z and check whether the result is in ABC.
If not, try step 4) with another of X, Y, Z.
If yes, you have found the third triple (assume you divided R by Y and the result is B).
YIB is the third triple.
So the solution will be:
XZYIBAC
You have your 7 digits. Instead of looking at them in groups of 3, divide the number up like this:
AB | C | D | E | FG
Get the value of AB and use it to get the value of C like so: C = ABC/AB
Next, do the same thing with the trailing 2 digits to find E using FG: E = EFG/FG
Now that you have C and E, you can solve for D.
Since CDE = ABC, D = ABC/CE.
Remember your formulas: instead of looking at numbers, create a formula (an algorithm) that you know will work every time.
ABC = CDE = EFG. However, remember that your = signs have to balance. You can see that D = ABC/CE = EFG/CE. Once you know that, you can figure out what you need in order to solve the problem.
Made a quick example in a fiddle of the code:
http://jsfiddle.net/4ykxx9ve/1/
var findMidNum = function() {
    var num = [8, 1, 9, 2, 4, 3, 6];
    var ab = num[0] * num[1];
    var fg = num[5] * num[6];
    var abc = num[0] * num[1] * num[2];
    var cde = num[2] * num[3] * num[4];
    var efg = num[4] * num[5] * num[6];
    var c = abc / ab;
    var e = efg / fg;
    var ce = c * e;
    var d = abc / ce;
    console.log(d); // 2
}();
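Use LINQ's Skip and Take to pick out each group of 3 digits (for example array.Skip(3).Take(3)) inside a loop:
using System.Linq;
var digits = new[] { 8, 1, 9, 2, 4, 3, 6 };
for (int i = 0; i + 3 <= digits.Length; i += 2)
{
    // product of the 3 digits starting at i (i = 0, 2, 4)
    var product = digits.Skip(i).Take(3).Aggregate(1, (p, d) => p * d);
}
// untested code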

Determining the Longest Contiguous Subsequence

There are N nodes (1 <= N <= 100,000) at various positions along a
long one-dimensional line. The ith node is at position x_i (an
integer in the range 0...1,000,000,000) and has a node type b_i (an integer in
the range 1..8). No two nodes are at the same position.
You want to find a range on this line in which all of the node types are fairly represented. That is, for whatever node types are present in the range, there must be an equal number of each type (for example, a range with 27 each of types 1 and 3 is ok, a range with 27 each of types 1, 3, and 4 is
ok, but 9 of type 1 and 10 of type 3 is not ok). You also want
at least K (K >= 2) types (out of the 8 total) to be represented in the
range. Find the maximum size of a range that satisfies these constraints. The size of a range is the difference between the maximum and minimum positions of the nodes in the range.
If there are no ranges satisfying the constraints, output -1 instead.
INPUT:
* Line 1: N and K separated by a space
* Lines 2..N+1: Each line contains a description of a node as two
integers separated by a space; x(i) and its node type.
SAMPLE INPUT:
9 2
1 1
5 1
6 1
9 1
100 1
2 2
7 2
3 3
8 3
INPUT DETAILS:
Node types: 1 2 3 - 1 1 2 3 1 - ... - 1
Locations: 1 2 3 4 5 6 7 8 9 10 ... 99 100
OUTPUT:
* Line 1: A single integer indicating the maximum size of a fair
range. If no such range exists, output -1.
SAMPLE OUTPUT:
6
OUTPUT DETAILS:
The range from x = 2 to x = 8 has 2 each of types 1, 2, and 3. The range
from x = 9 to x = 100 has 2 of type 1, but this is invalid because K = 2
and so you need at least 2 distinct types of nodes.
Could you please suggest an algorithm to solve this? I have thought about using some sort of priority queue or stack data structure, but am really unsure how to proceed.
Thanks, Todd
It's not too difficult to come up with an almost linear-time algorithm, because a similar problem was recently discussed on CodeChef: "ABC-Strings".
Sort nodes by their positions.
Enumerate all subsets of node types with at least K members (for example, we could expect types 1,2,4,5,7 to be present in the resulting interval and all other types to be absent). For K=2 there are only 256-8-1=247 such subsets. For each subset, perform the remaining steps:
Initialize 8 type counters to [0,0,0,0,0,0,0,0].
For each node perform remaining steps:
Increment counter for current node type.
Take the L counters for the types included in the current subset and subtract the first of them from the other L-1 counters, which produces L-1 values. Take the remaining 8-L counters and combine them with those L-1 values into a tuple of 7 values.
Use this tuple as a key into a hash map that stores, for each key, the position of the node right after the prefix where that key first occurred. Concretely, compute the key before incrementing the counter; if the hash map has no value for it, store the position of the current node. After incrementing, look up the new key; if it is present, subtract the stored position from the position of the current node and (possibly) update the best result.
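A minimal Python sketch of this approach, assuming nodes are given as (position, type) pairs with types 1..8. For each key it stores the position of the first node after that prefix, so a later match covers exactly the nodes added since then. max_fair_range and prefix_key are hypothetical names. For K = 2 the work is roughly 247 subsets x N nodes x 8 counters, plus the initial sort.

```python
from itertools import combinations


def prefix_key(counts, subset, first, num_types):
    """Key of a prefix: counts of subset types relative to the first subset
    type, plus raw counts of all other types (7 values when num_types == 8)."""
    return tuple(counts[t] - counts[first] if t in subset else counts[t]
                 for t in range(1, num_types + 1) if t != first)


def max_fair_range(nodes, k, num_types=8):
    """nodes: (position, type) pairs with distinct positions, types 1..num_types.
    Returns the largest max-min position over ranges in which every present
    type occurs equally often and at least k types occur, or -1 if none."""
    nodes = sorted(nodes)  # process nodes by increasing position
    best = -1
    for size in range(k, num_types + 1):
        for chosen in combinations(range(1, num_types + 1), size):
            subset, first = set(chosen), chosen[0]
            counts = [0] * (num_types + 1)  # index 0 unused
            start_of = {}  # prefix key -> position of the node right after that prefix
            for x, t in nodes:
                # A range could start at this node: remember it for the current key.
                start_of.setdefault(prefix_key(counts, subset, first, num_types), x)
                counts[t] += 1
                start = start_of.get(prefix_key(counts, subset, first, num_types))
                if start is not None:  # nodes from `start` to x balance this subset
                    best = max(best, x - start)
    return best


sample = [(1, 1), (5, 1), (6, 1), (9, 1), (100, 1), (2, 2), (7, 2), (3, 3), (8, 3)]
print(max_fair_range(sample, 2))  # 6: the range from x = 2 to x = 8
```

Storing the position of the node after the prefix (rather than the node that ends it) is what makes the subtraction give the size of the balanced range itself.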

Resources