Create groups from sets of nodes - algorithm

I have a list of sets (a,b,c,d,e in below example). Each of the sets contains a list of nodes in that set (1-6 below). I was wondering that there probably is a general known algorithm for achieving the below, and I just do not know about it.
sets[
a[1,2,5,6],
b[1,4,5],
c[1,2,5],
d[2,5],
e[1,6],
]
I would like to generate a new structure, a list of groups, with each group having
all the (sub)sets of nodes that appear in multiple sets
references to the original sets those nodes belong to
So the above data would become (order of groups irrelevant).
group1{nodes[2,5],sets[a,c,e]}
group2{nodes[1,2,5],sets[a,c]}
group3{nodes[1,6],sets[a,e]}
group4{nodes[1,5],sets[a,b,c]}
I am assuming I can get the data in as an array/object structure and manipulate that, and then spit the resulting structure out in whatever format needed.
It would be a plus if:
all groups had a minimum of 2 nodes and 2 sets.
when a subset of nodes is contained in a bigger set that forms a group, then only the bigger set gets a group: in this example, nodes 1,2 do not have a group of their own since all the sets they have in common already appear in group2.
(The sets are stored in XML, which I have also managed to convert to JSON so far, but this is irrelevant. I can understand procedural (pseudo)code but also something like a skeleton in XSLT or Scala could help to get started, I guess.)

Go through the list of sets. For each set S
Go through the list of groups. For each group G
If S can be a member of G (i.e. if G's set is a subset of S), add S to G.
If S cannot be a member of G but the intersection of S ang G's set contains more than one node, make a new group for that intersection and add it to the list.
Give S a group of its own and add it to the list.
Combine any groups that have the same set.
Delete any group with only one member set.
For example, with your example sets, after reading a and b the list of groups is
[1,2,5,6] [a]
[1,5] [a,b]
[1,4,5] [b]
And after reading c it's
[1,2,5,6] [a]
[1,5] [a,b,c]
[1,4,5] [b]
[1,2,5] [a,c]
There are slightly more efficient algorithms, if speed is a problem.

/*
Pseudocode algorithm for creating groups data from a set dataset, further explained in the project documentation. This is based on
http://stackoverflow.com/questions/1644387/create-groups-from-sets-of-nodes
I am assuming
- Group is a structure (class) the objects of which contain two lists: a list of sets and a list of nodes (group.nodes). Its constructor accepts a list of nodes and a reference to a Set object
- Set is a list structure (class), the objects (set) of which contain the nodes of the list in set.nodes
- groups and sets are both list structures that can contain arbitrary objects which can be iterated with foreach().
- you can get the objects two lists have in common as a new list with intersection()
- you can count the number of objects in a list with length()
*/
//Create groups, going through the original sets
foreach(sets as set){
if(groups.nodes.length==0){
groups.addGroup(new Group(set.nodes, set));
}
else{
foreach (groups as group){
if(group.nodes.length() == intersection(group.nodes,set.nodes).length()){
// the group is a subset of the set, so just add the set as a member the group
group.addset(set);
if (group.nodes.length() < set.nodes.length()){
// if the set has more nodes than the group that already exists,
// create a new group for the nodes of the set, with set as a member of that group
groups.addGroup(new Group(set.nodes, set));
}
}
// If group is not a subset of set, and the intersection of the nodes of the group
// and the nodes of the set
// is greater than one (they have more than one person in common), create a new group with
// those nodes they have in common, with set as a member of that group
else if(group.nodes.length() > intersection(group.nodes,set.nodes).length()
&& intersection(group.nodes,set.nodes).length()>1){
groups.addGroup(new Group(intersection(group.nodes,set.nodes), set);
}
}
}
}
// Cleanup time!
foreach(groups as group){
//delete any group with only one member set (for it is not really a group then)
if (group.sets.length<2){
groups.remove(group);
}
// combine any groups that have the same set of nodes. Is this really needed?
foreach(groups2 as group2){
//if the size of the intersection of the groups is the same size as either of the
//groups, then the groups have the same nodes.
if (intersection(group.nodes,group2.nodes).length == group.nodes.length){
foreach(group2.sets as set2){
if(!group.hasset(set)){
group.addset(set2);
}
}
groups.remove(group2);
}
}
}

Related

How to get level(depth) number of two connected nodes in neo4j

I'm using neo4j as a graph database to store user's connections detail into this. here I want to show the level of one user with respect to another user in their connections like Linkedin. for example- first layer connection, second layer connection, third layer and above the third layer shows 3+. but I don't know how this happens using neo4j. i searched for this but couldn't find any solution for this. if anybody knows about this then please help me to implement this functionality.
To find the shortest "connection level" between 2 specific people, just get the shortest path and add 1:
MATCH path = shortestpath((p1:Person)-[*..]-(p2:Person))
WHERE p1.id = 1 AND p2.id = 2
RETURN LENGTH(path) + 1 AS level
NOTE: You may want to put a reasonable upper bound on the variable-length relationship pattern (e.g., [*..6]) to avoid having the query taking too long or running out of memory in a large DB). You should probably ignore very distant connections anyway.
it would be something like this
// get all persons (or users)
MATCH (p:Person)
// create a set of unique combinations , assuring that you do
// not do double work
WITH COLLECT(p) AS personList
UNWIND personList AS personA
UNWIND personList AS personB
WITH personA,personB
WHERE id(personA) < id(personB)
// find the shortest path between any two nodes
MATCH path=shortestPath( (personA)-[:LINKED_TO*]-(personB) )
// return the distance ( = path length) between the two nodes
RETURN personA.name AS nameA,
personB.name AS nameB,
CASE WHEN length(path) > 3 THEN '3+'
ELSE toString(length(path))
END AS distance

What is the efficient algorithm / way to find intersection between 2 array lists. ( I am using java 8 )

I have 2 array lists which contains a custom object Stock.
public class Stock{
private String companyName;
private double stockPrice;
// getters and setters
}
List1 contains Stock objects . List2 also contains stock objects.
List 1 and list 2 are same in size. There are some stock objects in list 1 which are same as present in list 2. I need to get those same objects which re present in list 1 out of list 2. i.e. in another words get the intersection of list 1 and list 2 in list 2. I am trying to find out if there is any direct way in Java 8 which gives this result in an efficient way .Or if not , how to construct an efficient algorithm in terms of time complexity and space complexity ? Help is highly appreciated.
List<T> intersect = list1.stream()
.filter(list2::contains)
.collect(Collectors.toList());
CREDIT: Fat_FS answer at Intersection and union of ArrayLists in Java.
Make sure you override equals and hashcode methods (assuming you wanted to look at the contents of the objects for equality comparison)
Convert list to set for better performance

How to implement semi-random groups with Ruby?

suppose I've got groups:
{1=>[1,1,1,1,1,1], 2=>[2,2,2], 3=>[3,3,3,3,3,3], 4=>[4,4,4,4,4,4]}
the keys represent teams, and the values within the arrays represent employees. Imagine I wish to match employees in a semi-random way. I want to make groups of 3's-5's like this:
[1,1,2,3,5], [1,2,3,4], [1,2,3,3], [1,3,4,1,4]
I have the wish to create groups and have a bias for matching team members of opposite teams, but not an absolute bias. Also you must match every member of each team with a group.
How would you solve this?
This is how I've done it:
group_by_team = records.group_by {|x| x.team_id}.values
mixed_groups = group_by_team.each{|x| x.shuffle!}
# take 1 element from each team and mix
# the number of teams is defined as a constant so we don't have to hit the db with a count
for index in (1..(TEAMS-1))
zipped_groups ||= mixed_groups[0]
zipped_groups = zipped_groups.zip(mixed_groups[index])
end
# flatten the arrays to produce one large Array
# remove nil values from ziped steps with compact!
zipped_groups = zipped_groups.flatten!.compact!
lunch_groups = zipped_groups.each_slice(3)
# we can no longer reduce table size, so lets join two small lunch groups
if lunch_groups.any?{|x| x.size<3}
lunch_groups = self.merge_last_two(lunch_groups)
end
But the problems with my implementation are vast. Groups size is fixed at 3. And its no exactly elegant, or efficient.
How would you make semi-random groups happen?

Data structure to check if number of elements is equal in O(1) time?

What is the best data structure to check if the number of elements of different types of objects is the same?
For example, if I have
2 a's
3 b's
3 c's
The number of elements of the different types of objects is not the same.
If I have
2 a's
2 b's
2 c's
then this is the same.
What is the best data structure that allows you do this in O(1) time and how would you implement it?
One method is to use two dictionaries to be able to do it in O(1) dynamically.
The first maps each type to a count, {a:2,b:3,c:3}. The second maps each count to a set of types with that count. {2:{a},3:{b,c}}. If the size of the second dictionary is less than 2 (0 or 1) then clearly all types have the same count as if that was not the case then there would be at least two key-item pairs in that dictionary, presuming that the dictionary is updated when the counts change.
Adding a type just means adding it to each dictionary.
Removing a type just means removing it from each dictionary.
Updating a type requires first updating the second dictionary by removing the previous count (obtained from the first dictionary) and adding the current count, after which the first dictionary is updated.
Dictionary<Type, int> typeCounts = new Dictionary<Type, int>();
// read and store type counts
Type type = typeof(A);
if (typeCounts.Contains(type))
{
typeCounts[type]++;
}
else
{
typeCounts.Add(type, 1);
}
// populate type counts by type and finally:
if (typeCounts[typeA] == typeCounts[typeB])
{
// so on
}

Join of two datasets in Mapreduce/Hadoop

Does anyone know how to implement the Natural-Join operation between two datasets in Hadoop?
More specifically, here's what I exactly need to do:
I am having two sets of data:
point information which is stored as (tile_number, point_id:point_info) , this is a 1:n key-value pairs. This means for every tile_number, there might be several point_id:point_info
Line information which is stored as (tile_number, line_id:line_info) , this is again a 1:m key-value pairs and for every tile_number, there might be more than one line_id:line_info
As you can see the tile_numbers are the same between the two datasets. now what I really need is to join these two datasets based on each tile_number. In other words for every tile_number, we have n point_id:point_info and m line_id:line_info. What I want to do is to join all pairs of point_id:point_info with all pairs of line_id:line_info for every tile_number
In order to clarify, here's an example:
For point pairs:
(tile0, point0)
(tile0, point1)
(tile1, point1)
(tile1, point2)
for line pairs:
(tile0, line0)
(tile0, line1)
(tile1, line2)
(tile1, line3)
what I want is as following:
for tile 0:
(tile0, point0:line0)
(tile0, point0:line1)
(tile0, point1:line0)
(tile0, point1:line1)
for tile 1:
(tile1, point1:line2)
(tile1, point1:line3)
(tile1, point2:line2)
(tile1, point2:line3)
Use a mapper that outputs titles as keys and points/lines as values. You have to differentiate between the point output values and line output values. For instance you can use a special character (even though a binary approach would be much better).
So the map output will be something like:
tile0, _point0
tile1, _point0
tile2, _point1
...
tileX, *lineL
tileY, *lineK
...
Then, at the reducer, your input will have this structure:
tileX, [*lineK, ... , _pointP, ...., *lineM, ..., _pointR]
and you will have to take the values separate the points and the lines, do a cross product and output each pair of the cross-product , like this:
tileX (lineK, pointP)
tileX (lineK, pointR)
...
If you can already easily differentiate between the point values and the line values (depending on your application specifications) you don't need the special characters (*,_)
Regarding the cross-product which you have to do in the reducer:
You first iterate through the entire values List, separate them into 2 list:
List<String> points;
List<String> lines;
Then do the cross-product using 2 nested for loops.
Then iterate through the resulting list and for each element output:
tile(current key), element_of_the_resulting_cross_product_list
So basically you have two options here.Reduce side join or Map Side Join .
Here your group key is "tile". In a single reducer you are going to get all the output from point pair and line pair. But you you will have to either cache point pair or line pair in the array. If either of the pairs(point or line) are very large that neither can fit in your temporary array memory for single group key(each unique tile) then this method will not work for you. Remember you don't have to hold both of key pairs for single group key("tile") in memory, one will be sufficient.
If both key pairs for single group key are large , then you will have to try map-side join.But it has some peculiar requirements. However you can fulfill those requirement by doing some pre-processing your data through some map/reduce jobs running equal number of reducers for both data.

Resources