Parallel algorithm for set intersections

I have n sets (distributed across n ranks) of data representing the nodes of a mesh, and I want an efficient parallel algorithm to find the intersections of these sets, i.e., the common nodes. An intersection exists as soon as any two sets share a node.
For example:
Input:
Rank 0: Set 1 - [0, 1, 2, 3, 4]
Rank 1: Set 2 - [2, 4, 5, 6]
Rank 2: Set 3 - [0, 5, 6, 7, 8]
Run the parallel algorithm --> Result (after finding intersections):
Rank 0: [0, 2, 4]
Rank 1: [2, 4, 5, 6]
Rank 2: [0, 5, 6]
The algorithm needs to run on n ranks, with one set on each rank.

You should be able to do this fast, O(N), in parallel, with hash tables.
For each set S_i, for each member m_x (all of which can be processed in parallel), put the set member into a hash table, associated with the name of the set it came from. Any time you get a hit in the hash table on m_x from set S_j, you retrieve the stored set number S_i, and you know immediately that S_i intersects S_j. You can then put m_x into the derived intersection sets.
You need a parallel-safe hash table. That's easy; lock the buckets during updates.
(Another answer suggested sorting the sets. With most sort algorithms, that would be O(N ln N) time, which is not as fast.)
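A minimal sketch of this scheme, using threads as a stand-in for ranks, a fixed bucket count, and illustrative names of my own (the table maps each node to the set of ranks that have inserted it):

import threading
from collections import defaultdict

NUM_BUCKETS = 64  # illustrative; tune to the data

class ConcurrentMemberTable:
    # Hash table with one lock per bucket, so inserts from
    # different sets can proceed in parallel.
    def __init__(self):
        self.buckets = [dict() for _ in range(NUM_BUCKETS)]  # member -> set of owner ranks
        self.locks = [threading.Lock() for _ in range(NUM_BUCKETS)]

    def insert(self, member, rank):
        # Insert `member` for `rank`; return the ranks that already had it.
        b = hash(member) % NUM_BUCKETS
        with self.locks[b]:
            owners = self.buckets[b].setdefault(member, set())
            hits = set(owners)
            owners.add(rank)
        return hits

def find_intersections(sets_by_rank):
    table = ConcurrentMemberTable()
    shared = defaultdict(set)            # rank -> members shared with another rank
    shared_lock = threading.Lock()

    def worker(rank, members):
        for m in members:
            hits = table.insert(m, rank)
            if hits:                     # m was already inserted by another set
                with shared_lock:
                    shared[rank].add(m)
                    for other in hits:
                        shared[other].add(m)

    threads = [threading.Thread(target=worker, args=(r, s))
               for r, s in sets_by_rank.items()]
    for t in threads: t.start()
    for t in threads: t.join()
    return dict(shared)

print(find_intersections({0: [0, 1, 2, 3, 4], 1: [2, 4, 5, 6], 2: [0, 5, 6, 7, 8]}))
# -> {0: {0, 2, 4}, 1: {2, 4, 5, 6}, 2: {0, 5, 6}}

With one set per MPI rank, the same idea would apply with a distributed hash table (members hashed to owner ranks); the sketch above only shows the core bucket-locking logic.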

Related

How would I code an algorithm that does the following?

I have three arrays of six red numbers, each ranging from 1 to 6. The same number can appear multiple times.
For example, [1, 4, 3, 3, 6, 2], [5, 5, 2, 1, 3, 4] and [2, 4, 3, 1, 1, 6]
The goal of the algorithm is to turn numbers blue in those arrays following these rules:
Every blue number has to be unique (1, 2, 3, 4, 5, 6)
Each array should have 2 blue numbers
The algorithm should warn me if it isn't possible to do so
In this example, the blue numbers could be (1, 4) in array one, (2, 5) in array two and (3, 6) in array three.
Is it realistic to code an algorithm that could do that? I'm looking for the logic that I could code to make it work.
You can reduce this problem to bipartite matching.
If we consider two sets of vertices, one for the arrays and one for the numbers, and make an edge between the array A and number n if n is an element of A, then we have a bipartite graph on which we can use any matching algorithm. Then, each match between an array and a number indicates that that number is blue in that array.
This only works for making a single number blue for each array, but can be expanded by adding every array as a vertex twice, thus getting two matches per array and therefore two blue numbers per array.
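A small sketch of this reduction (the encoding and all names are mine): each array contributes two left vertices, the numbers are the right vertices, and Kuhn's augmenting-path algorithm, one concrete choice of matching algorithm, finds a maximum matching. If every slot gets matched, each array receives two globally unique blue numbers; otherwise no valid assignment exists.

def solve(arrays, slots_per_array=2):
    # Left vertices: (array index, slot); right vertices: the numbers.
    left = [(i, s) for i in range(len(arrays)) for s in range(slots_per_array)]
    match_of_number = {}                  # number -> index into `left`

    def try_augment(u, visited):
        arr_idx, _ = left[u]
        for n in set(arrays[arr_idx]):    # candidate numbers for this slot
            if n in visited:
                continue
            visited.add(n)
            if n not in match_of_number or try_augment(match_of_number[n], visited):
                match_of_number[n] = u
                return True
        return False

    for u in range(len(left)):
        if not try_augment(u, set()):
            return None                   # warn: no valid assignment exists
    blue = [[] for _ in arrays]
    for n, u in match_of_number.items():
        blue[left[u][0]].append(n)
    return blue

print(solve([[1, 4, 3, 3, 6, 2], [5, 5, 2, 1, 3, 4], [2, 4, 3, 1, 1, 6]]))
# one valid assignment, e.g. [[1, 4], [2, 5], [3, 6]]

Because the matching assigns each number to at most one slot globally, the two slots of the same array automatically receive distinct numbers.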

Query the number of intersected segments in a range

I have a large dataset of segments (ai, bi), where ai < bi, and many queries. Each query asks for the number of segments intersecting a given range (b, e). The number of queries can be very large. A naive algorithm searches all segments per query, which takes O(N) time per query. Is there a faster way to do this? I imagine sorting the segment dataset in ascending order of ai may help, but I don't know what to do with the other endpoint.
segments: [1, 3], [2, 6], [4, 7], [7, 8]
query 1: [2, 5] => output: [1, 3], [2, 6], [4, 7]
...
Make list B of sorted start points, as you wrote.
Make list P of structures containing all points - both start and end points - together with a field SE = +1/-1 for start and end respectively. Sort it by point coordinate.
Set Active = 0. Walk through P, adding each SE to Active and building a new list A that contains each point position together with the running Active count.
For every query, binary-search A for the last position at or before the query start; its Active value is the number of segments open at that moment.
Then binary-search B for the indexes corresponding to the query start and the query end; the index difference is the number of segments starting inside the query interval.
The sum of these two values is the desired number of intersected segments (you don't need the segments themselves, according to the problem statement).
Time per query is O(log(N)).
[1, 3], [2, 6], [4, 7], [7, 8]                              initial list
[1, 2, 4, 7]                                                list B
(1,+1),(2,+1),(3,-1),(4,+1),(6,-1),(7,-1),(7,+1),(8,-1)     list P
(1,1), (2,2), (3,1), (4,2), (6,1), (7,0), (7,1), (8,0)      list A
Query start 2 falls on entry (2,2) in A, so Active = 2 (two open intervals).
Searching 2 in B gives index 1, searching 5 gives index 2; the difference is 1.
Result = 2 + 1 = 3.
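A compact sketch of the above using Python's bisect (variable names are mine; at equal coordinates, ends sort before starts, as in the example):

import bisect

def preprocess(segments):
    B = sorted(a for a, b in segments)                       # list B: sorted starts
    P = sorted([(a, +1) for a, b in segments] +
               [(b, -1) for a, b in segments])               # list P: (point, SE)
    # tuples sort (coordinate, SE), so -1 precedes +1 at equal coordinates
    A_pos, A_cnt, active = [], [], 0
    for x, se in P:                                          # build list A
        active += se
        A_pos.append(x)
        A_cnt.append(active)
    return B, A_pos, A_cnt

def count_intersecting(B, A_pos, A_cnt, qb, qe):
    i = bisect.bisect_right(A_pos, qb) - 1                   # last event at or before qb
    open_at_start = A_cnt[i] if i >= 0 else 0                # segments already open
    started_inside = bisect.bisect_right(B, qe) - bisect.bisect_right(B, qb)
    return open_at_start + started_inside

B, A_pos, A_cnt = preprocess([(1, 3), (2, 6), (4, 7), (7, 8)])
print(count_intersecting(B, A_pos, A_cnt, 2, 5))             # -> 3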

Is there a data structure for effective implementation of this encryption algorithm?

input -> alphabet -> output (the index of the number in the alphabet) -> new alphabet (the number is moved to the beginning of the alphabet):
3 -> [1, 2, 3, 4, 5] -> 3 -> [3, 1, 2, 4, 5]
2 -> [3, 1, 2, 4, 5] -> 3 -> [2, 3, 1, 4, 5]
1 -> [2, 3, 1, 4, 5] -> 3 -> [1, 2, 3, 4, 5]
1 -> [1, 2, 3, 4, 5] -> 1 -> [1, 2, 3, 4, 5]
4 -> [1, 2, 3, 4, 5] -> 4 -> [4, 1, 2, 3, 5]
5 -> [4, 1, 2, 3, 5] -> 5 -> [5, 4, 1, 2, 3]
Input: n - the number of numbers in the alphabet, m - the length of the text to be encrypted, then the text itself.
5, 6
3 2 1 1 4 5
Answer: 3 2 1 1 4 5 -> 3 3 3 1 4 5
Is there any data structure or algorithm to do this efficiently, faster than O(n*m)?
I'd appreciate any ideas. Thanks.
Use an order statistics tree to store the pairs (1,1)...(n,n), ordered by their first elements.
Look up the translation for a character c by selecting the c-th smallest element of the tree and taking its second element.
Then update the tree by removing the node that you looked up and inserting it back into the tree with the first element of the pair set to -t, where t is the position in the message (or some other steadily decreasing counter).
Lookup, removal and insertion can be done in O(ln n) time worst-case if a self-balancing search tree (e.g. a red-black tree) is used as the underlying tree structure for the order statistics tree.
Given that the elements for the initial tree are inserted in order, the tree structure can be built in O(n).
So the whole algorithm will be O(n + m ln n) time, worst-case.
You can further improve this for the case where n is larger than m by storing only one node for any contiguous range of nodes in the tree, but counting it, for the purpose of rank in the order statistics tree, according to the number of nodes it stands for.
Starting from just one actually stored node, whenever the tree is rearranged you split the range-representing node into three: one node for the range before the found value, one for the range after the found value, and one for the value itself. These three nodes are then inserted back - the range nodes only if they are non-empty, and with the first pair element equal to the second; the single-value node with the negative counter value as described before. If a node with a negative first entry is found, it is not split in this way.
The result is that the tree will contain at most O(m) nodes, so the algorithm has a worst-case time complexity of O(m ln min(n, m)).
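For what it's worth, here is a sketch of the same rank-maintenance idea using a Fenwick (binary indexed) tree over a timeline of length n + m rather than an explicit order statistics tree; the slot layout (front positions filled right to left, mirroring the decreasing counter -t) and all names are my own:

class Fenwick:
    def __init__(self, size):
        self.n = size
        self.tree = [0] * (size + 1)
    def add(self, i, d):                  # 0-based point update
        i += 1
        while i <= self.n:
            self.tree[i] += d
            i += i & -i
    def prefix(self, i):                  # number of occupied slots in 0..i
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

def encrypt(n, text):
    m = len(text)
    fen = Fenwick(m + n)
    pos_of = {}                           # symbol -> current slot
    for v in range(1, n + 1):             # initial alphabet in slots m..m+n-1
        fen.add(m + v - 1, 1)
        pos_of[v] = m + v - 1
    out, front = [], m                    # `front` walks leftward on each move-to-front
    for c in text:
        p = pos_of[c]
        out.append(fen.prefix(p))         # 1-based rank = index in current alphabet
        fen.add(p, -1)                    # remove from the old slot...
        front -= 1
        fen.add(front, 1)                 # ...and reinsert at the new front
        pos_of[c] = front
    return out

print(encrypt(5, [3, 2, 1, 1, 4, 5]))     # -> [3, 3, 3, 1, 4, 5]

Each per-character operation is O(log(n + m)); the initial fill is n point updates.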
Maybe a hashmap with letter/index pairs? I believe that element lookup in a hashmap is usually O(1), unless you have a lot of collisions (which is unlikely).

Finding a group of subsets that does not overlap

I am reviewing for an upcoming programming contest and was working on the following problem:
Given a list of integers, an integer t, an integer r, and an integer p, determine if the list contains t sets of 3, r runs of 3, and p pairs of numbers. For each of these subsets, the numbers must be adjacent and any given number can only exist in one subset, if any at all.
Currently, I am solving the problem by simply finding all sets of 3, runs of 3, and pairs and then checking all permutations until finding one which has no overlapping subsets. This seems inefficient, however, and I was wondering if there was a better solution to the problem.
Here are two examples of the problem:
{1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 1, 0}, t = 1, r = 1, p = 2.
This works because we have the triple {4 4 4}, the run {1 2 3}, and the pairs {1 1} and {5 5}
{1, 1, 1, 2, 3, 3}, t = 1, r = 1, p = 1
This does not work because the only triple is {1 1 1} and the only run is {1 2 3} and the two overlap (They share a 1).
I am looking for a more efficient approach to this problem.
There is probably a faster way, but you can solve this with dynamic programming. Compute a recursive function F(t, r, p, n) which decides whether it is possible to have t triples, r runs, and p pairs in the sequence starting at position 1 and ending at position n, storing the last subset of the solution ending at position n when it is possible. If a triple, run, or pair can end at position n, then you have a recursive case - either F(t-1, r, p, n-3), F(t, r-1, p, n-3), or F(t, r, p-1, n-2) - and you store the last subset; otherwise you have the recursive case F(t, r, p, n-1). This looks like fourth-power complexity, but it really isn't, because the value of n is always decreasing, so the complexity is actually O(n + TRP), where T is the total desired number of triples, R is the total desired number of runs, and P is the total desired number of pairs. So it is O(n^3) in the worst case.
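A memoized sketch of this recursion (my own encoding of "triple", "run", and "pair" over adjacent elements, with runs checked in ascending order):

from functools import lru_cache

def solvable(seq, t, r, p):
    @lru_cache(maxsize=None)
    def F(t, r, p, n):                    # n = length of the prefix considered
        if t == 0 and r == 0 and p == 0:
            return True                   # leftover elements may stay unused
        if n <= 1:
            return False
        ok = False
        if p > 0 and n >= 2 and seq[n-1] == seq[n-2]:          # pair ends at n
            ok = ok or F(t, r, p - 1, n - 2)
        if n >= 3:
            a, b, c = seq[n-3], seq[n-2], seq[n-1]
            if t > 0 and a == b == c:                          # triple ends at n
                ok = ok or F(t - 1, r, p, n - 3)
            if r > 0 and b == a + 1 and c == b + 1:            # run ends at n
                ok = ok or F(t, r - 1, p, n - 3)
        return ok or F(t, r, p, n - 1)    # or skip the element at position n
    return F(t, r, p, len(seq))

print(solvable([1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 1, 0], 1, 1, 2))  # -> True
print(solvable([1, 1, 1, 2, 3, 3], 1, 1, 1))                    # -> False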

How to distribute a vector of n elements across p processors

Suppose I have a vector of n elements and I want to distribute it over p processes, where n isn't necessarily a multiple of p. Each process has a rank from 0 to p-1. How do I determine how many elements will be on each process, so that the data is distributed as evenly as possible?
For example, if n=14 and p=4, I want a distribution like [3, 3, 4, 4] or [3, 4, 3, 4], but not [3, 3, 3, 5] nor [4, 4, 4, 2].
I want a function f(n, p, r) that returns the number of elements for the process with rank r.
Does
(n + r) / p
(with integer division) work for you?
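With integer division, these per-rank counts differ by at most one and sum to n; a quick check on the example (names mine):

def f(n, p, r):
    return (n + r) // p              # integer division

n, p = 14, 4
counts = [f(n, p, r) for r in range(p)]
print(counts, sum(counts))           # -> [3, 3, 4, 4] 14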
This seems to be a special case of the Bin Packing problem. There are some very good approximation algorithms, but in theory it is NP-hard.
If you can't be bothered to read the wiki page, I'll cut it down to a few lines. If you want to look deeper for possibly better solutions, or for an analysis of how well the approximation schemes work, by all means do.
Step 1: sort the elements by priority.
Step 2: grab the element with the highest remaining priority and put it on the least burdened process.
Step 3: if there are more elements, go to Step 2; else return.
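A sketch of these steps with a min-heap of process loads (this is essentially the LPT greedy heuristic; names are mine):

import heapq

def greedy_assign(sizes, p):
    heap = [(0, r) for r in range(p)]         # (current load, rank)
    heapq.heapify(heap)
    loads = [0] * p
    for s in sorted(sizes, reverse=True):     # Step 1: sort by priority (size)
        load, r = heapq.heappop(heap)         # Step 2: least burdened process
        loads[r] = load + s
        heapq.heappush(heap, (loads[r], r))   # Step 3: repeat while elements remain
    return loads

print(greedy_assign([1] * 14, 4))             # -> [4, 4, 3, 3]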
