Groovy tree-like sort - sorting

I have a problem sorting rows from a database that form a tree-like hierarchy. Each row has three columns relevant to this problem: id, parent, and lp. id is a String, parent is another row, and lp is a number used to sort rows that have no parent-child relationship. Each row can have any number of children and only one parent (null at the top level).
There are three situations I see:
when the first row is the parent of the other: -1 is returned
when the first row is a child of a parent with a lower lp than the other row: -1 is returned
when none of those relations exist (also when the rows have the same parent and are on the same level): the lps of the rows are compared
I managed to write this code, which I think should solve the problem, but it doesn't work for rows that are deep in the hierarchy and it messes up the order:
dane = dane.sort {it1, it2 ->
it1 == it2.parent ? -1 :
it1.parent && it1.parent.lp < it2.lp ? -1 :
it1.lp - it2.key.lp
}
I'd appreciate any suggestions. Thx in advance!

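Your comparison should be consistent regardless of the order of the arguments: comparing a with b should give the opposite sign of comparing b with a. It doesn't look like that's the case here. For example, take the case where it1.parent == it2. With the arguments swapped, the first branch returns -1. In this order, neither of the first two branches matches, so the result falls through to the lp difference, which can also be negative.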
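One way to get a consistent ordering is to stop comparing rows pairwise and instead sort on a key built from each row's whole ancestor chain. The sketch below is in Python rather than Groovy, and it is a different technique from the comparator in the question, not a fix of it. The Row class and the names path_key and tree_sort are made up for illustration, and it assumes the parent links contain no cycles. The id is included in each key element only as a tie-breaker for siblings that share an lp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Row:
    # Hypothetical stand-in for a database row: id, lp and an optional parent row.
    id: str
    lp: int
    parent: Optional["Row"] = None


def path_key(row):
    """Return the (lp, id) pairs from the top-level ancestor down to the row."""
    key = []
    node = row
    while node is not None:  # assumes the parent chain has no cycles
        key.append((node.lp, node.id))
        node = node.parent
    return tuple(reversed(key))


def tree_sort(rows):
    # Tuples compare element by element, and a prefix sorts before any
    # longer tuple that extends it, so a parent precedes its descendants.
    return sorted(rows, key=path_key)


root_a = Row("a", 1)
root_b = Row("b", 2)
child_a1 = Row("a1", 5, root_a)
child_a2 = Row("a2", 3, root_a)
grandchild = Row("a2x", 9, child_a2)

ordered = tree_sort([root_b, grandchild, child_a1, root_a, child_a2])
print([r.id for r in ordered])  # ['a', 'a2', 'a2x', 'a1', 'b']
```

Because every row is reduced to a key that is compared with ordinary tuple ordering, the result does not depend on the order in which the sort happens to compare two rows.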

Related

organization find all children algorithm

So I am creating a system where users can build their own organization structure, meaning that all organizations will most likely be different.
My setup is that an organization consists of different divisions. In my division table I have a value called parent_id that points to the division that is the current division's parent.
A setup might look something like this (Paint drawing)
As you can see from the drawing, divisions 2 and 3 are children of division 1, therefore they both have the value parent_id = 1
Division 4 is a child of division 2 and has two children (5 & 6)
Now to the tricky part: because of the structure of my system I need access to all children, and the children's children, depending on a root node.
So for example, if I want to know all of the children of division 1 the result should be [2,3,4,5,6]
Now my question is: how will I find all connected children?
At first I thought of something like this
root = 1;
while(getChildren(root) != null)
{
}
function getChildren(root)
{
var result = 'select * from division where parent_id = '+root;
if(result != null)
{
root = result;
}
return result;
}
Please note this is only an example of using a while loop to get through the list.
However, this would not work when the statement returns two children.
So my question is: how do I find all children of any root id with the above setup?
You could use a recursive function. Be careful, and keep track of the children you have found so if you run into them again you stop and error - otherwise you will end up in an infinite loop.
I don't know what language you are using, so here's some pseudocode:
create dictionaryOfDivisions
dictionaryOfDivisions.Add(currentDivision)
GetChildren(currentDivision)
Function GetChildren(thisDivision) {
    theseChildren = GetChildrenFromDB(thisDivision)
    For each child in theseChildren
        If dictionaryOfDivisions.Exists(child)
            'Oops, here's a loop! Error
            Exit
        Else
            dictionaryOfDivisions.Add(child)
            GetChildren(child)
        End If
    Next
}
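The pseudocode above can be sketched in Python as follows. This version makes two assumptions that are not in the original: the division rows have already been loaded into memory as (id, parent_id) pairs, and the traversal uses an explicit stack instead of recursion. The names build_children_index and all_descendants are made up for illustration.

```python
from collections import defaultdict


def build_children_index(divisions):
    """Map each parent_id to the list of its direct children."""
    children = defaultdict(list)
    for division_id, parent_id in divisions:
        if parent_id is not None:
            children[parent_id].append(division_id)
    return children


def all_descendants(root, children):
    """Collect every division below root, failing if a division is seen twice."""
    seen = {root}
    result = []
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child in seen:
                # In a tree every division has one parent, so a repeat means a loop.
                raise ValueError(f"cycle detected at division {child}")
            seen.add(child)
            result.append(child)
            stack.append(child)
    return result


# The example from the question: 2 and 3 under 1, 4 under 2, 5 and 6 under 4.
divisions = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, 4)]
children = build_children_index(divisions)
descendants = sorted(all_descendants(1, children))
print(descendants)  # [2, 3, 4, 5, 6]
```

Loading the table once and walking it in memory avoids one query per division. If the table is too large for that, the same loop works with a database lookup in place of children.get.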

hadoop cascading how to get top N tuples

I'm new to Cascading and trying to find a way to get the top N tuples based on a sort order. For example, I'd like to know the top 100 first names people are using.
Here's how I can do something similar in Teradata SQL:
select top 100 first_name, num_records
from
(select first_name, count(1) as num_records
from table_1
group by first_name) a
order by num_records DESC
Here's something similar in Hadoop Pig:
a = load 'table_1' as (first_name:chararray, last_name:chararray);
b = foreach (group a by first_name) generate group as first_name, COUNT(a) as num_records;
c = order b by num_records DESC;
d = limit c 100;
It seems very easy to do in SQL or Pig, but I'm having a hard time finding a way to do it in Cascading. Please advise!
Assuming you just need the Pipe set up on how to do this:
In Cascading 2.1.6,
Pipe firstNamePipe = new GroupBy("topFirstNames", InPipe,
    new Fields("first_name"));
// count the tuples in each first_name group
firstNamePipe = new Every(firstNamePipe, new Fields("first_name"),
    new Count(new Fields("num_records")), Fields.ALL);
firstNamePipe = new GroupBy(firstNamePipe,
    new Fields("first_name"),
    new Fields("num_records"),
    true); // where true is descending order
firstNamePipe = new Every(firstNamePipe, new Fields("first_name", "num_records"),
    new First(Fields.ARGS, 100), Fields.ALL);
Where InPipe is formed with your incoming tap that holds the tuple data that you are referencing above. Namely, "first_name". "num_records" is created when new Count() is called.
If you have the "num_records" and "first_name" data in separate taps (tables or files) then you can set up two pipes that point to those two Tap sources and join them using CoGroup.
The definitions I used are from Cascading 2.1.6:
GroupBy(String groupName, Pipe pipe, Fields groupFields, Fields sortFields, boolean reverseOrder)
Count(Fields fieldDeclaration)
First(Fields fieldDeclaration, int firstN)
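For reference, the computation that the SQL, Pig and Cascading versions are all aiming at (count records per first name, order by count descending, keep the first N) can be sketched in plain Python. This is not Cascading code and says nothing about how the pipe assembly executes. The function name top_first_names and the sample records are made up for illustration.

```python
from collections import Counter


def top_first_names(records, n):
    """Count records per first name and return the n most frequent (name, count) pairs."""
    counts = Counter(first_name for first_name, _last_name in records)
    # most_common sorts by count in descending order and truncates to n entries
    return counts.most_common(n)


# Hypothetical (first_name, last_name) records standing in for table_1.
records = [
    ("Ann", "Lee"),
    ("Bob", "Ray"),
    ("Ann", "Kim"),
    ("Cy", "Fox"),
    ("Ann", "Poe"),
    ("Bob", "Day"),
]

top_two = top_first_names(records, 2)
print(top_two)  # [('Ann', 3), ('Bob', 2)]
```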
Method 1
Use a GroupBy to group on the required columns and make use of the secondary sorting that Cascading provides. By default the sort is ascending; if you want descending order, use the reverseOrder flag.
To get the top N tuples or rows:
It's quite simple: use a static count variable in a Filter, increment it by 1 for each tuple, and check whether it is greater than N.
Return true when the count is greater than N, otherwise return false.
This will produce output with the first N tuples.
Method 2
Cascading provides a built-in function, Unique, which returns firstNbuffer.
See the link below:
http://docs.cascading.org/cascading/2.2/javadoc/cascading/pipe/assembly/Unique.html

linq - parent/child query to select only lowest level

Given a common parent/child table:
Table A
Column Id int
Column Parent_Id int
Column Description text
I would like to get only the nodes that do not have any child nodes.
1,null,"PARENT A"
2,null,"PARENT B"
3,null,"PARENT C"
100,1,"CHILD A1"
101,1,"CHILD A2"
102,2,"CHILD B1"
So for my result set I would like to get only:
Parent C (as it does not have any child elements), and child A1, A2, B1.
You don't say what exactly you are querying with LINQ, but the general idea is
var leafNodes = nodes.Where(n => nodes.Count(n1 => n1.Parent_Id == n.Id) == 0);
You might prefer the Any() method over Count() == 0. See "Which method performs better: .Any() vs .Count() > 0?"
var itemsWithoutChildren = nodes.Where(item=>!nodes.Any(innerItem=>innerItem.Parent_Id==item.Id))
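The same idea can be sketched in Python against the sample rows from the question. Instead of scanning the whole list once per node, as both LINQ queries do when run in memory, this version first collects every Parent_Id into a set and then keeps the rows whose Id is not in it. The function name leaf_nodes is made up for illustration.

```python
def leaf_nodes(rows):
    """Return the rows whose id is never used as another row's parent_id."""
    parent_ids = {parent_id for _id, parent_id, _description in rows if parent_id is not None}
    return [row for row in rows if row[0] not in parent_ids]


# (Id, Parent_Id, Description) rows from the question.
rows = [
    (1, None, "PARENT A"),
    (2, None, "PARENT B"),
    (3, None, "PARENT C"),
    (100, 1, "CHILD A1"),
    (101, 1, "CHILD A2"),
    (102, 2, "CHILD B1"),
]

leaves = leaf_nodes(rows)
print([description for _id, _parent_id, description in leaves])
# ['PARENT C', 'CHILD A1', 'CHILD A2', 'CHILD B1']
```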

Recursive Linq Grouping

Scenario:
I have a database table that stores the hierarchy of another table's many-to-many relationship. An item can have multiple children and can also have more than one parent.
Items
------
ItemID (key)
Hierarchy
---------
MemberID (key)
ParentItemID (fk)
ChildItemID (fk)
Sample hierarchy:
Level1  Level2  Level3
X       A       A1
                A2
        B       B1
                X1
Y       C
I would like to group all of the child nodes by each parent node in the hierarchy.
Parent  Child
X       A1
        A2
        B1
        X1
A       A1
        A2
B       B1
        X1
Y       C
Notice how there are no leaf nodes in the Parent column, and how the Child column only contains leaf nodes.
Ideally, I would like the results to be in the form of IEnumerable<IGrouping<Item, Item>> where the key is a Parent and the group items are all Children.
Ideally, I would like a solution that the entity provider can translate in to T-SQL, but if that is not possible then I need to keep round trips to a minimum.
I intend to Sum values that exist in another table joined on the leaf nodes.
Since you are always going to be returning ALL of the items in the table, why not just make a recursive method that gets all children for a parent and then use that on the in-memory Items:
partial class Items
{
    public IEnumerable<Item> GetAllChildren()
    {
        //recursively or otherwise get all the children (using the Hierarchy navigation property?)
    }
}
then:
var items =
    from item in Items.ToList()
    group new
    {
        item.itemID,
        // a member initialized from a method call needs an explicit name
        Children = item.GetAllChildren()
    } by item.itemID;
Sorry for any syntax errors...
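As a rough illustration of the "get all children recursively, in memory" idea, here is a Python sketch that groups leaf items under every non-leaf ancestor. It assumes the Hierarchy rows have been loaded as (parent, child) pairs and that the hierarchy has no cycles. The edge list is one reading of the sample hierarchy in the question, and the name leaves_by_parent is made up for illustration.

```python
from collections import defaultdict


def leaves_by_parent(edges):
    """Map every item that has children to the sorted list of leaf items below it."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    children = dict(children)  # freeze: lookups must not create new keys

    cache = {}

    def leaves_under(item):
        if item not in children:
            return {item}  # an item without children is itself a leaf
        if item not in cache:
            found = set()
            for child in children[item]:
                found |= leaves_under(child)
            cache[item] = found  # an item with several parents is computed once
        return cache[item]

    return {parent: sorted(leaves_under(parent)) for parent in children}


# One reading of the sample hierarchy: X -> A, B; A -> A1, A2; B -> B1, X1; Y -> C.
edges = [
    ("X", "A"), ("X", "B"),
    ("A", "A1"), ("A", "A2"),
    ("B", "B1"), ("B", "X1"),
    ("Y", "C"),
]

groups = leaves_by_parent(edges)
for parent, leaves in groups.items():
    print(parent, leaves)
```

With these edges the groups match the Parent/Child table in the question: no leaf appears as a key, and every value contains only leaves.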
Well, if the hierarchy is strictly 2 levels you can always union them and let LINQ sort out the SQL (it ends up being a single trip, though it remains to be seen how fast it will run on your volume of data):
var hlist = from h in Hierarchies
select new {h.Parent, h.Child};
var slist = from h in Hierarchies
join h2 in hlist on h.Parent equals h2.Child
select new {h2.Parent, h.Child};
hlist = hlist.Union(slist);
This gives you a flat IEnumerable<{Item, Item}> list, so if you want to group them you just follow on:
var glist = from pc in hlist.AsEnumerable()
group pc.Child by pc.Parent into g
select new { Parent = g.Key, Children = g };
I used AsEnumerable() here because grouping a Union goes beyond what the LINQ to SQL provider can translate. If you try it against IQueryable it will run a basic Union for eligible parents and then do a round trip for every parent (which is what you want to avoid). Whether or not it's OK for you to use regular LINQ for the grouping is up to you; the same volume of data would have to come through the pipe either way.
EDIT: Alternatively you could build a view linking parent to all its children and use that view as a basis for tying Items. In theory this should allow you/L2S to group over it with a single trip.

How can I merge two outputs of two Linq queries?

I'm trying to merge these two objects but I'm not totally sure how. Can you help me merge these two result objects?
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative3 = (from c in ctx.CognosSecurities
where c.SecurityType == 3 || c.SecurityType == 0
select new {c.SecurityType , c.LoginName }).Distinct();
I think I see where to go with this... but to answer the question the types of the objects are int, string, string for SecurityType, LoginName , and SecurityName respectively
If you're wondering why I have them broken up like this, it's because I want to ignore one column when doing a distinct. Here are the SQL queries that I'm converting to LINQ.
select distinct SecurityType, LoginName, 'Segment'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =1
select distinct SecurityType, LoginName, 'Business Line'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =2
select distinct SecurityType, LoginName, SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType in (1,2)
You can't join these because the types are different (first has 3 properties in the resulting type, second has two).
If you can tolerate putting a null value in for the third property of the second query, this will help. I would then suggest you just do userListAuthoritative.Concat(userListAuthoritative3), BUT I think this will not work, as the anonymous types generated by LINQ will not be the same class even though the structure is the same. To solve that you can either define a CustomType to encapsulate the tuple and do select new CustomType{ ... } in both queries, or post-process the results using Select() in a similar fashion.
Actually, the latter Select() approach will also let you solve the property count mismatch by filling in a null when projecting to CustomType in the post-process step.
EDIT: According to the comment below once the structures are the same the anonymous types will be the same.
I assume that you want to keep the results distinct:
var merged = userListAuthoritative.Concat(userListAuthoritative3).Distinct();
And, as Mike Q pointed out, you need to make sure that your types match, either by giving the anonymous types the same signature, or by creating your own POCO class specifically for this purpose.
Edit
If I understand your edit, you want your Distinct to ignore the SecurityName column. Is that correct?
var userListAuthoritative = from c in ctx.CognosSecurities
    where new[]{0,1,2,3}.Contains(c.SecurityType)
    group new {c.SecurityType, c.LoginName, c.SecurityName}
    by new {c.SecurityType, c.LoginName} into g
    select g.FirstOrDefault();
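The two steps discussed above (pad the two-column results so both shapes match, then keep one row per SecurityType and LoginName while ignoring SecurityName) can be sketched in Python. This is an in-memory illustration, not LINQ, and the function name merge_distinct and the sample rows are made up.

```python
def merge_distinct(with_names, without_names):
    """Concatenate both result sets and keep the first row per (type, login) pair."""
    # Pad the two-column rows with None so every row has the same three fields.
    padded = [(security_type, login, None) for security_type, login in without_names]
    merged = {}
    for security_type, login, name in list(with_names) + padded:
        # setdefault keeps the first row seen for a key and ignores later ones
        merged.setdefault((security_type, login), (security_type, login, name))
    return list(merged.values())


# Hypothetical rows: (SecurityType, LoginName, SecurityName) and (SecurityType, LoginName).
with_names = [(1, "alice", "East"), (1, "alice", "West"), (2, "bob", "Retail")]
without_names = [(3, "carol"), (3, "carol"), (0, "dave")]

result = merge_distinct(with_names, without_names)
print(result)
# [(1, 'alice', 'East'), (2, 'bob', 'Retail'), (3, 'carol', None), (0, 'dave', None)]
```

Which SecurityName survives for a duplicated key depends on input order here, just as FirstOrDefault() on a group returns whichever element the provider yields first.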
I'm not exactly sure what you mean by merge, since you're returning different (anonymous) types from each one. Is there a reason the following doesn't work for you?
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2 || c.SecurityType == 3 || c.SecurityType == 0)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
Edit: This assumed they were of the same type -- but they're not.
userListAuthoritative.Concat(userListAuthoritative3);
Try below code, you might need to implement IEqualityComparer<T> in your ctx type.
var merged = userListAuthoritative.Union(userListAuthoritative3);
