Neo4j: Expand node with children's children (Performance)

I thought this would be very easy, because I have a typical graph use case: expanding a node.
This is easy if there are no additional requirements:
MATCH (s:Entity)-[]-(dest) WHERE s._id = 'xxx'
RETURN dest
Problem 1: Sometimes there are many children, so I want to limit the number of results returned.
MATCH (s:Entity)-[]-(dest) WHERE s._id = 'xxx'
RETURN dest
LIMIT 100
Additional requirement: also return the ids of each child's children.
MATCH (s:Entity)-[]-(dest) WHERE s._id = 'xxx'
WITH collect(dest) as childrenSource
LIMIT 100
MATCH (childrenSource)-[]-(childDestination)
RETURN childrenSource as expandNode, collect(childDestination) as childrenIds
LIMIT 100
Problem 2: The limits are in the wrong place, because collect() has already collected everything before the limit is applied.
Possible solution:
MATCH (s:Entity)-[]-(dest) WHERE s._id = 'xxx'
WITH collect(dest)[..100] AS children
UNWIND children AS childrenSource
MATCH (childrenSource)-[]-(childDestination)
RETURN childrenSource AS expandNode, collect(childDestination)[..100] AS childrenIds
But I don't think this is a performant solution, because it takes quite a lot of time.
Exact problem description: If I have 1 node with 1000 children, and each child has another 1000 children, I want to execute a query that returns 100 children, each with 100 child ids:
-------------------------------------------------
| node 1 | child id 1_1,.... child id 1_100 |
| node 2 | child id 2_1,.... child id 2_100 |
| ... | ... |
| node 100 | child id 100_1,.. child id 100_100 |
-------------------------------------------------
Other solution: I do a simple expand for the node, and then I call an expand on each child node. But doing 101 queries instead of 1 does not sound very performant either.

EDIT
As usual, APOC Procedures to the rescue. Using apoc.cypher.run(), you can use LIMIT within a subquery, which lazily loads the expansion only up to your limit.
MATCH (s:Entity)-[]-(dest) WHERE s._id = 'xxx'
WITH dest
LIMIT 100
CALL apoc.cypher.run('
MATCH (dest)-[]-(childDestination)
RETURN childDestination LIMIT 100
', {dest:dest}) YIELD value
RETURN dest as expandNode, COLLECT(value.childDestination) as childrenIds
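
To illustrate why limiting inside the subquery helps, here is a minimal Python sketch. It is not Cypher and it does not claim to show how Neo4j works internally; it only counts how many neighbors a lazy, per-child limit has to touch. The adjacency dict, the scaled-down graph size (300 children with 300 children each) and the names neighbors and expand_limited are all assumptions made for the illustration.

```python
from itertools import islice


def neighbors(graph, node, touched):
    """Lazily yield the neighbors of `node`, logging each one actually produced."""
    for neighbor in graph[node]:
        touched.append(neighbor)
        yield neighbor


def expand_limited(graph, root, limit, touched):
    """Return at most `limit` children of `root`, each with at most `limit` child ids."""
    result = {}
    for child in islice(neighbors(graph, root, touched), limit):
        # islice stops pulling from the generator once `limit` items were produced
        result[child] = list(islice(neighbors(graph, child, touched), limit))
    return result


# Hypothetical toy graph: one root, 300 children, each with 300 children of its own.
graph = {"root": [f"c{i}" for i in range(300)]}
for i in range(300):
    graph[f"c{i}"] = [f"c{i}_{j}" for j in range(300)]

touched = []
rows = expand_limited(graph, "root", 100, touched)
print(len(rows), len(rows["c0"]), len(touched))
```

With a limit of 100 the sketch touches 100 + 100 * 100 neighbors in total, no matter how many children each node really has, whereas collecting everything first and slicing afterwards would visit every neighbor.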

Using Cypher with apoc.cypher.run() would help you:
MATCH (entity1)-[rel]-(entity2) WHERE entity1.title = "something"
WITH entity2
LIMIT 100
CALL apoc.cypher.run('
MATCH (entity2)-[]-(childDestination)
RETURN childDestination
LIMIT 100
', {entity2:entity2}) YIELD value
RETURN entity2 as expandNode, COLLECT(value.childDestination) as childrenId

Related

Powershell csv calculate total sum

I am currently working with PowerShell. PowerShell is new to me, so it is kind of hard to figure this one out.
I have three headers in my CSV file: Name, MessageCount and Direction.
The names are email addresses, and those addresses are all the same. Direction is either "Inbound" or "Outbound". MessageCount holds a bunch of different numbers.
I want to sum those numbers so that I get the "Inbound" and "Outbound" totals, along with the email address, on each row.
I am trying to loop over MessageCount with foreach and add the values together, but it only gives me output like this:
MessageCount
Try something like this:
$data = Import-Csv "path-to-your-csv-file"
# One output row per Name, with the Inbound and Outbound sums as calculated properties
$data | group Name |
select Name,
@{n = "Inbound"; e = {
(($_.Group | where Direction -eq "Inbound").MessageCount | Measure-Object -Sum).Sum }
},
@{n = "Outbound"; e = {
(($_.Group | where Direction -eq "Outbound").MessageCount | Measure-Object -Sum).Sum }
}
Code explanation
group Name groups the results by the Name property, in this case the email address.
select picks properties from an object, or creates custom ones with @{n="";e={}}.
($_.Group | where Direction -eq "Outbound").MessageCount takes the rows in the group, keeps those whose Direction equals Outbound, and then reads MessageCount from the matching rows.
Measure-Object -Sum takes the array and creates an object with properties such as the sum of its values, so we get the sum of MessageCount and return it as a custom property on the object.
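
For readers who want to cross-check the totals outside PowerShell, the same group-and-sum logic can be sketched in Python. The sample rows and the address user@example.com are made up for illustration; only the three column names come from the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data using the three headers from the question.
sample_csv = """Name,MessageCount,Direction
user@example.com,3,Inbound
user@example.com,5,Outbound
user@example.com,7,Inbound
"""

# totals[name][direction] accumulates the sum of MessageCount
totals = defaultdict(lambda: defaultdict(int))
for row in csv.DictReader(io.StringIO(sample_csv)):
    totals[row["Name"]][row["Direction"]] += int(row["MessageCount"])

for name, by_direction in totals.items():
    print(name, by_direction["Inbound"], by_direction["Outbound"])
```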

Optimizing neo4j query with multiple merges

What I need to do
For each row I need to find the user by device_id, or create a new node if one doesn't exist. For each user, I also need to update or create a few edges if the row contains certain properties. The load is massive (about 20k per second) and Neo4j slows down. Each batch contains exactly 20k rows. Here is my query:
UNWIND {batch} as row
MERGE (m:User {device_id: row.device_id})
FOREACH (ignore IN CASE WHEN row.type IS NOT NULL THEN [1] ELSE [] END |
MERGE (e:Event {type: row.type})
MERGE (m) -[r:REL]-> (e)
SET r.count = ( CASE r.count WHEN NULL THEN 1 ELSE r.count + 1 END)
)
FOREACH (ignore IN CASE WHEN row.country IS NOT NULL THEN [1] ELSE [] END |
MERGE (c:Country {id: row.country})
MERGE (m) -[:Belongs]-> (c)
)
WITH m, ( CASE row.user_id WHEN NULL THEN m.user_id ELSE row.user_id END) AS user_id
SET m.user_id = user_id
I solved this issue by decreasing the batch size to 5k and running a few of those inside a transaction in parallel before committing.
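
One possible reading of this fix is sketched below in Python. It only shows splitting a 20k batch into 5k chunks and dispatching them in parallel; write_chunk is a hypothetical placeholder for the call that would run the UNWIND ... MERGE query with the chunk as its batch parameter, and the worker count of 4 is an assumption, not something stated in the answer.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_chunk(chunk):
    # Hypothetical placeholder: a real implementation would send `chunk`
    # to the database as the batch parameter of the query above.
    return len(chunk)


batch = [{"device_id": i} for i in range(20000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    written = list(pool.map(write_chunk, chunked(batch, 5000)))

print(written)
```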

organization find all children algorithm

I am creating a system where users can build their own organization structure, meaning that all organizations will most likely be different.
My setup is that an organization consists of different divisions. In my division table I have a column called parent_id that points to the division that is the current division's parent.
A setup might look something like this (Paint drawing):
As you can see from the drawing, divisions 2 and 3 are children of division 1, so they both have parent_id = 1.
Division 4 is a child of division 2 and has two children (5 and 6).
Now to the tricky part: because of the structure of my system, I need access to all children, and the children's children, starting from a given root node.
For example, if I want to know all of the children of division 1, the result should be [2,3,4,5,6].
Now my question is: how do I find all connected children?
At first I thought of something like this:
root = 1;
while(getChildren(root) != null)
{
}
function getChildren(root)
{
var result = 'select * from division where parent_id = '+root;
if(result != null)
{
root = result;
}
return result;
}
Please note this is only an example of using a while loop to get through the list.
However, this would not work when the statement returns two children.
So my question is: how do I find all children of any root id with the above setup?
You could use a recursive function. Be careful to keep track of the children you have already found, so that if you run into one of them again you stop with an error; otherwise you will end up in an infinite loop.
I don't know what language you are using, so here's some pseudocode:
create dictionaryOfDivisions
dictionaryOfDivisions.Add(currentDivision)
GetChildren(currentDivision)
Function GetChildren(thisDivision) {
theseChildren = GetChildrenFromDB(thisDivision)
For each child in theseChildren
If dictionaryOfDivisions.Exists(child)
'Oops, here's a loop! Error
Exit
Else
dictionaryOfDivisions.Add(child)
GetChildren(child)
End If
Next
}
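
The same idea can be sketched as runnable Python. This version swaps the recursion for an explicit queue (a breadth-first walk) but keeps the visited-set check from the pseudocode. The parent_of mapping stands in for the division table from the question, and children_of is a hypothetical replacement for the database lookup.

```python
from collections import deque


def find_descendants(root_id, children_of):
    """Return all descendants of `root_id`, raising if a division is reached twice."""
    seen = {root_id}
    found = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children_of(current):
            if child in seen:
                # With one parent per division, a repeat means the data contains a loop.
                raise ValueError(f"loop detected at division {child}")
            seen.add(child)
            found.append(child)
            queue.append(child)
    return found


# Division id -> parent_id, matching the example in the question.
parent_of = {2: 1, 3: 1, 4: 2, 5: 4, 6: 4}


def children_of(division_id):
    # Hypothetical stand-in for: select id from division where parent_id = ?
    return sorted(d for d, parent in parent_of.items() if parent == division_id)


print(find_descendants(1, children_of))
```

For the example structure this returns [2, 3, 4, 5, 6] for root 1.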

Neo4j Cypher complicated query: sort, count, sum before collect

I am a newbie with the Neo4j database and have just started learning it. I am stuck and looking for some help. Is it possible to get this in one Cypher query, and how?
My graph structure looks like this:
(s:Store)-[r:RELEASED]->(m:Movie)<-[r1:ASSIGNED]-(cat:MovieCategorie)
How could I get this data?
Movie store (got)
Movie (got)
The 5 most common categories of movies in that store (I don't know how to sort them before using collect(cat.name)[0..5])
Can anyone suggest how to get this data? I have tried many times and failed; this is what I have, and it doesn't work:
match (s:Store)
with s
match (s)-[r:RELEASED]->(m:Movie)
with s,m
match (m)<-[r1:ASSIGNED]-(cat:MovieCategorie)
with s, m, count(r1) as stylesCount, cat
order by stylesCount
return distinct s as store, collect(cat.name)[0..5] as topCategories
order by store.name
Thank you!
OK, now that I have my query right and am developing it further, I have run into a problem combining the aggregation functions COUNT and SUM.
My query, which works well for finding the top 5 categories per store:
MATCH (s:Store)
OPTIONAL MATCH (s)-[:RELEASED]->(m:Movie)<-[r:ASSIGNED]-(cat:MovieCategorie)
WITH s, COUNT(r) AS count, cat
ORDER BY count DESC
RETURN s AS Store, COLLECT(distinct cat.name) AS `Top Categories`
ORDER BY Store.name
On top of this query I need to count how many views the store has: sum(m.viewsCount) as the total store views. I tried adding it to the same WITH clause as COUNT, and I tried putting it in RETURN; in both cases it doesn't work the way I would like. Any suggestions or examples? I am still confused about how WITH works with aggregation functions... :(
Create example database:
CREATE (s1:Store) SET s1.name = 'Store 1'
CREATE (s2:Store) SET s2.name = 'Store 2'
CREATE (s3:Store) SET s3.name = 'Store 3'
CREATE (m1:Movie) SET m1.title = 'Movie 1', m1.viewsCount = 50
CREATE (m2:Movie) SET m2.title = 'Movie 2', m2.viewsCount = 50
CREATE (m3:Movie) SET m3.title = 'Movie 3', m3.viewsCount = 50
CREATE (m4:Movie) SET m4.title = 'Movie 4', m4.viewsCount = 50
CREATE (m5:Movie) SET m5.title = 'Movie 5', m5.viewsCount = 50
CREATE (c1:MovieCategorie) SET c1.name = 'Cat 1'
CREATE (c2:MovieCategorie) SET c2.name = 'Cat 2'
CREATE (c3:MovieCategorie) SET c3.name = 'Cat 3'
CREATE (m1)<-[:ASSIGNED]-(c1)
CREATE (m1)<-[:ASSIGNED]-(c3)
CREATE (m2)<-[:ASSIGNED]-(c2)
CREATE (m3)<-[:ASSIGNED]-(c1)
CREATE (m3)<-[:ASSIGNED]-(c2)
CREATE (m3)<-[:ASSIGNED]-(c3)
CREATE (m4)<-[:ASSIGNED]-(c1)
CREATE (m4)<-[:ASSIGNED]-(c3)
CREATE (m5)<-[:ASSIGNED]-(c3)
CREATE (s1)-[:RELEASED]->(m1)
CREATE (s1)-[:RELEASED]->(m3)
CREATE (s1)-[:RELEASED]->(m4)
CREATE (s1)-[:RELEASED]->(m5)
CREATE (s2)-[:RELEASED]->(m1)
CREATE (s2)-[:RELEASED]->(m2)
CREATE (s2)-[:RELEASED]->(m3)
CREATE (s2)-[:RELEASED]->(m4)
CREATE (s2)-[:RELEASED]->(m5)
CREATE (s3)-[:RELEASED]->(m1)
SOLVED!! FINALLY I DID IT! The trick was to use one more MATCH after everything else. Great, now I can sleep in peace. Thank you.
MATCH (s:Store)-[:RELEASED]->(m:Movie)<-[r:ASSIGNED]-(cat:MovieCategorie)
with s,count(r) as catCount, cat
order by catCount desc
with s, collect( distinct cat.name)[0..5] as TopCategories
match (s)-[:RELEASED]->(m:Movie)
return s as Store, TopCategories, sum(m.viewsCount) as TotalViews
Ok, that was fast :D I finally got it!
match (s:Store)
with s
match (s)-[r:RELEASED]->(m:Movie)
with s, m
match (m)<-[r2:ASSIGNED]-(cat:MovieCategorie)
with s, count(r2) as stylesCount, cat
order by stylesCount desc
return distinct s, collect(distinct cat.name)[0..5] as topCategories
order by s.name
So the trick is to first count() in WITH, then ORDER BY on that WITH, and collect DISTINCT in RETURN. I am not so sure about these multiple WITH statements; I will try to clean it up. ;)
MATCH (s:Store)-[:RELEASED]->(:Movie)<-[:ASSIGNED]-(cat:MovieCategorie)
WITH s, COUNT(cat) AS count, cat
ORDER BY s.name, count DESC
RETURN s.name AS Store, COLLECT(cat.name)[0..5] AS `Top Categories`
And if you want the sum of the viewsCount property from the Movie nodes per store:
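// Sum the views per store first, so each movie is counted once and m does not split the category counts
MATCH (s:Store)-[:RELEASED]->(m:Movie)
WITH s, SUM(m.viewsCount) AS totalViews
MATCH (s)-[:RELEASED]->(:Movie)<-[:ASSIGNED]-(cat:MovieCategorie)
WITH s, totalViews, COUNT(cat) AS count, cat
ORDER BY s.name, count DESC
RETURN s.name AS Store, COLLECT(cat.name)[0..5] AS `Top Categories`, totalViews AS `Total Views`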
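
To sanity-check what the result should look like for the example database, the two aggregations can be reproduced in plain Python. This is only a sketch that mirrors the sample CREATE statements above as dictionaries; the names released, assigned, views and summarize are assumptions made for the illustration. Category counts are taken per (store, category), while views are summed once per movie, because summing over movie-category pairs would count a movie once for every category it has.

```python
from collections import Counter

# Store -> released movies, mirroring the sample RELEASED relationships.
released = {
    "Store 1": ["m1", "m3", "m4", "m5"],
    "Store 2": ["m1", "m2", "m3", "m4", "m5"],
    "Store 3": ["m1"],
}
# Movie -> assigned categories, mirroring the sample ASSIGNED relationships.
assigned = {
    "m1": ["Cat 1", "Cat 3"],
    "m2": ["Cat 2"],
    "m3": ["Cat 1", "Cat 2", "Cat 3"],
    "m4": ["Cat 1", "Cat 3"],
    "m5": ["Cat 3"],
}
views = {movie: 50 for movie in assigned}


def summarize(store, top_n=5):
    """Return (top categories by frequency, total views) for one store."""
    counts = Counter(cat for movie in released[store] for cat in assigned[movie])
    top = [name for name, _ in counts.most_common(top_n)]
    total = sum(views[movie] for movie in released[store])  # once per movie
    return top, total


for store in released:
    print(store, *summarize(store))
```

For Store 1 this gives Cat 3 (4 movies), Cat 1 (3 movies) and Cat 2 (1 movie), with 200 total views.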

Groovy tree-like sort

I have a problem with sorting rows from a database that have a tree-like hierarchy. Each row contains three columns relevant to this problem: id, parent and lp. id is a String, parent is another row, and lp is a number used to sort rows that have no parent-child relationship. Each row can have any number of children and only one parent (null at the top level).
There are three situations I see:
when the first row is the parent of the other: -1 is returned
when the first row is a child of a parent with a lower lp than the other row: -1 is returned
when none of those relations exist (also when the rows have the same parent and are on the same level): the lps of the rows are compared
I managed to write this code, which I think should solve the problem, but it doesn't work for rows that are deep in the hierarchy and it messes up the order:
dane = dane.sort {it1, it2 ->
it1 == it2.parent ? -1 :
it1.parent && it1.parent.lp < it2.lp ? -1 :
it1.lp - it2.lp
}
I'd appreciate any suggestions. Thx in advance!
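Your comparison should be consistent regardless of the order of the arguments. If compare(a, b) is called with a = it1 and b = it2, the result should be the negation of the call with the arguments swapped. It doesn't look like that's the case here. For example, the case where it1.parent == it2 is not handled: when it1 is the child of it2 the comparator should return 1, mirroring the -1 it returns when it1 == it2.parent.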
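
One way to get a consistent ordering, sketched here in Python rather than Groovy, is to avoid a pairwise comparator altogether and sort by a key built from the chain of ancestors. This is a different technique from the closure in the question, not a fix of it. The Row class and tree_key function are hypothetical names; the id is added to each key element as a tie-breaker, which is an assumption beyond what the question states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    id: str
    lp: int
    parent: Optional["Row"] = None


def tree_key(row):
    """Return the (lp, id) pairs from the top-level ancestor down to `row`."""
    path = []
    node = row
    while node is not None:
        path.append((node.lp, node.id))
        node = node.parent
    return tuple(reversed(path))


# Made-up sample hierarchy: two top-level rows with nested children.
a = Row("A", 2)
b = Row("B", 1)
a1 = Row("A1", 1, a)
a2 = Row("A2", 0, a)
a2x = Row("A2X", 9, a2)
b1 = Row("B1", 5, b)

rows = [a1, a2x, b1, a, a2, b]
ordered = [row.id for row in sorted(rows, key=tree_key)]
print(ordered)
```

Because a parent's key is a prefix of every descendant's key, tuple comparison places each parent directly before its subtree, and siblings are ordered by lp at every depth.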
