Neo4j query performance with memory tuning

I have a graph (User)-[:LIKES]->(Item) with millions of nodes and billions of relationships (roughly 50 GB on disk), built on a powerful machine with 256 GB RAM and 40 cores. Currently, I'm computing allShortestPaths() between two items.
To improve the Cypher query performance, I set dbms.pagecache.memory=100g and wrapper.java.additional=-Xmx32g, hoping that the whole Neo4j database could be loaded into memory. However, when I execute the shortest-path query, CPU usage is 1625% while memory usage is only 5.7%, and I don't see any performance improvement in the Cypher query. Am I missing something in the settings? Or is there something I can configure to run the query faster? I have read the Performance Tuning guide in the developer manual but didn't find a solution.
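For reference, those two settings live in different files and control different memory pools (file names per the 2.x packaging; treat this as a sketch and adjust to your version):

# conf/neo4j.properties -- off-heap page cache holding the store files
dbms.pagecache.memory=100g

# conf/neo4j-wrapper.conf -- JVM heap used for query execution
wrapper.java.additional=-Xmx32g

Depending on the version, the page cache may only fill as pages are touched, so low memory usage right after startup does not by itself mean the settings were ignored.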
EDIT1:
The Cypher query counts the number of unique users who like both items. The full pattern would be (Brand)-[:Has]->(Item)<-[:LIKES]-(User)-[:LIKES]->(Item)<-[:Has]-(Brand)
PROFILE
MATCH p = allShortestPaths((p1:Brand {FID:'001'})-[*..4]-(p2:Brand {FID:'002'}))
WITH [r IN rels(p) | type(r)] AS relationshipPath,
     [n IN nodes(p) | id(n)][2] AS user, p1, p2
RETURN p1.FID, p2.FID, count(DISTINCT user);
EDIT2:
Below is a sample query plan. It now seems that I'm not using shortestPath efficiently (380,556,69 db hits). I use shortestPath to get the common user nodes between the start/end nodes, and then use count(distinct) to get the unique users. Is it possible to tell Cypher to eliminate paths that contain nodes that have already been visited?

Can you try to run this instead:
MATCH (p1:Brand {FID:'001'}),(p2:Brand {FID:'002'})
MATCH (u:User)
WHERE (p1)-[:Has]->()<-[:LIKES]-(u) AND
(p2)-[:Has]->()<-[:LIKES]-(u)
RETURN p1,p2,count(u);
This starts at the user and checks against both brands; the explain plan looks much better:
+----------------------+----------------+------------------------------------------+---------------------------+
| Operator | Estimated Rows | Variables | Other |
+----------------------+----------------+------------------------------------------+---------------------------+
| +ProduceResults | 0 | count(u), p1, p2 | p1, p2, count(u) |
| | +----------------+------------------------------------------+---------------------------+
| +EagerAggregation | 0 | count(u) -- p1, p2 | p1, p2 |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p2 -- p1, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[78] -- anon[87], anon[89], p1, u | (p1)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[87], anon[89] -- p1, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p1, u | |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p1 -- p2, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[119] -- anon[128], anon[130], p2, u | (p2)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[128], anon[130] -- p2, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p2, u | |
| | +----------------+------------------------------------------+---------------------------+
| +CartesianProduct | 0 | u -- p1, p2 | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +CartesianProduct | 0 | p2 -- p1 | |
| | |\ +----------------+------------------------------------------+---------------------------+
| | | +Filter | 0 | p1 | p1.FID == { AUTOSTRING0} |
| | | | +----------------+------------------------------------------+---------------------------+
| | | +NodeByLabelScan | 0 | p1 | :Brand |
| | | +----------------+------------------------------------------+---------------------------+
| | +Filter | 0 | p2 | p2.FID == { AUTOSTRING1} |
| | | +----------------+------------------------------------------+---------------------------+
| | +NodeByLabelScan | 0 | p2 | :Brand |
| | +----------------+------------------------------------------+---------------------------+
| +NodeByLabelScan | 0 | u | :User |
+----------------------+----------------+------------------------------------------+---------------------------+
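One more thing worth checking: the plan starts with NodeByLabelScan on :Brand followed by a property Filter, which means the brand lookup is not using an index. If you don't already have one, an index (or uniqueness constraint) on :Brand(FID) turns those scans into seeks:

CREATE INDEX ON :Brand(FID);

(2.x syntax; if FID is meant to be unique, prefer CREATE CONSTRAINT ON (b:Brand) ASSERT b.FID IS UNIQUE.)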

Related

Another Combination

Very similar to my last question; now I want only the "full combination" for each group, in order of priority. So, from this source table:
+-------+-------+----------+
| GROUP | State | Priority |
+-------+-------+----------+
| 1     | MI    | 1        |
| 1     | IA    | 2        |
| 1     | CA    | 3        |
| 1     | ND    | 4        |
| 1     | AZ    | 5        |
| 2     | IA    | 2        |
| 2     | NJ    | 1        |
| 2     | NH    | 3        |
+-------+-------+----------+
And so on...
I need a query that returns:
+-------+---------------------+
| GROUP | COMBINATION         |
+-------+---------------------+
| 1     | MI, IA, CA, ND, AZ  |
| 2     | NJ, IA, NH          |
+-------+---------------------+
Thanks for the help, again!
Use listagg() ordering by priority within the group.
SELECT "GROUP",
listagg("STATE", ', ') WITHIN GROUP (ORDER BY "PRIORITY")
FROM "ELBAT"
GROUP BY "GROUP";
db<>fiddle
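As an aside, listagg() is the Oracle spelling; most other engines have an equivalent ordered string aggregate. In PostgreSQL, for example, the same result comes from string_agg (a sketch, assuming the same table and column names):

SELECT "GROUP",
       string_agg("STATE", ', ' ORDER BY "PRIORITY")
FROM "ELBAT"
GROUP BY "GROUP";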

Slow aggregation on big neo4j graph

Configuration:
Windows 8.1
neo4j-enterprise-2.2.0-M03
cache type: hpc
8Gb RAM
6Gb for JVM Heap (wrapper.java.initmemory=6144 wrapper.java.maxmemory=6144)
5Gb out of 6Gb of JVM Heap for mapped memory (dbms.pagecache.memory=5G)
Model:
Model represents how users navigate through website.
27 522 896 nodes (394Mb)
111 294 796 relationships (3609Mb)
33 906 363 properties (1326Mb)
293 (:Page) nodes
27522603 (:PageView) nodes
0 (:User) nodes (not load yet)
each (:PageView) node connected with (:Page) node
each (:PageView) node connected with next (:PageView) node
each (:PageView) node connected with (:User) node (not yet)
Query
match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
195363 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID105, FRESHID110, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID105, FRESHID110 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID105, FRESHID110 | |
| Projection(1) | 776404 | 35941815 | 71883630 | FRESHID105, p | |
| Filter(0) | 776404 | 35941815 | 35941815 | p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
| Expand(All)(0) | 776404 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Filter(1) | 384001 | 13345621 | 13345621 | | hasLabel(anon[67]:PageView) |
| Expand(All)(1) | 384001 | 13345621 | 19478500 | | ()-[:Next]->() |
| Filter(2) | 189923 | 6132879 | 6132879 | | hasLabel(anon[46]:PageView) |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
Total database accesses: 202202762
Query without unnecessary labels
match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
166751 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID82, FRESHID87, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID82, FRESHID87 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID82, FRESHID87 | |
| Projection(1) | 776388 | 35941815 | 71883630 | FRESHID82, p | |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
Total database accesses: 146782447
Message.log
Question
How can I perform this query much faster? (more RAM, refactor query, distributed cache, use another language/shell/method, ...)
UPD:
Profile info for last query in answer
neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx" | 5172680 |
| "DocumentPreview.aspx" | 3846455 |
| "Dashboard.aspx" | 3579022 |
| "ActualTasks.aspx" | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx" | 1373928 |
| "PaymentsList.aspx" | 1338063 |
| "Login.aspx" | 1285447 |
| "ReportingRequisites.aspx" | 884077 |
| "ContractorInfo.aspx" | 759665 |
+------------------------------------------------+
10 rows
151328 ms
Compiler CYPHER 2.2
Planner COST
Projection
|
+Top
|
+EagerAggregation
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Projection | 881 | 10 | 20 | count, p, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | count, p | { AUTOINT1}; count |
| EagerAggregation | 881 | 173 | 0 | count, p | p |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+
Total database accesses: 74898837
As I mentioned in your other question, if you can write a Java-based server extension you can do this pretty easily.
// labels and relationship types used below
Label Page = DynamicLabel.label("Page");
RelationshipType At = DynamicRelationshipType.withName("At");
RelationshipType Next = DynamicRelationshipType.withName("Next");

// initialize one counter per page node (assumes an open transaction)
Map<Node,AtomicInteger> pageCounts = new HashMap<>(300);
for (Node p : GlobalGraphOperations.at(graphDb).getAllNodesWithLabel(Page))
    pageCounts.put(p, new AtomicInteger());

// find the start page by its unique name (on 2.2 use findNodesByLabelAndProperty)
Node page = graphDb.findNode(Page, "Name", pageName);

// follow the incoming page-view relationships
for (Relationship at : page.getRelationships(At, Direction.INCOMING)) {
    // follow the singular next relationship, if any
    Relationship next = at.getStartNode().getSingleRelationship(Next, Direction.OUTGOING);
    if (next == null) continue;
    // follow the singular page-view relationship to the end page
    Node page2 = next.getEndNode().getSingleRelationship(At, Direction.OUTGOING).getEndNode();
    // increment that page's counter
    pageCounts.get(page2).incrementAndGet();
}

// sort pages by count, descending
List<Map.Entry<Node,AtomicInteger>> sorted = new ArrayList<>(pageCounts.entrySet());
Collections.sort(sorted, new Comparator<Map.Entry<Node,AtomicInteger>>() {
    public int compare(Map.Entry<Node,AtomicInteger> e1, Map.Entry<Node,AtomicInteger> e2) {
        return Integer.compare(e2.getValue().get(), e1.getValue().get());
    }
});

// return the top 10
return sorted.subList(0, 10);
For Cypher I would try something like this:
match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
with distinct pv
match (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name, count(*) as count
order by count desc
limit 10;
Update
I wrote a test for it and ran it on my bigger Linux machine; the results there are much more sensible: between 1.6 s for Java and at most 5 s for Cypher.
Here is the code and the results: https://gist.github.com/jexp/94f75ddb849f8c41c97c
In Cypher:
-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name | count |
+-------------------+
| "Page169" | 975 |
| "Page125" | 959 |
| "Page106" | 955 |
| "Page274" | 951 |
| "Page176" | 947 |
| "Page241" | 944 |
| "Page30" | 942 |
| "Page44" | 938 |
| "Page1" | 938 |
| "Page118" | 938 |
+-------------------+
10 rows
in 3212 ms
[Compiler CYPHER 2.2
Planner COST
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Top | 488 | 10 | 0 | FRESHID71, FRESHID76 | { AUTOINT1}; |
| EagerAggregation | 488 | 300 | 0 | FRESHID71, FRESHID76 | |
| Projection | 238460 | 264828 | 529656 | FRESHID71, p | |
| Filter | 238460 | 264828 | 0 | p | NOT(anon[29] == anon[51]) |
| Expand(All)(0) | 238460 | 264828 | 529656 | p | ()-[:At]->(p) |
| Expand(All)(1) | 238460 | 264828 | 778522 | | ()-[:Next]->() |
| Expand(All)(2) | 476922 | 513694 | 513695 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
Total database accesses: 2351530]
And in Java:
-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938
Something you can also do to speed up your Cypher query is to aggregate on the nodes only and read the p.Name property just for the final 10 rows, which is much faster:
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count

Is it normal query performance?

I have the following graph model:
(:PageView {Number:int, Page:string}), (:Page {Name:string})
(:PageView)-[:At]->(:Page)
(:PageView)-[:Next]->(:PageView)
Schema:
Indexes
ON :Page(Name) ONLINE (for uniqueness constraint)
ON :PageView(Page) ONLINE
ON :PageView(Revision) ONLINE (for uniqueness constraint)
Constraints
ON (pageview:PageView) ASSERT pageview.Number IS UNIQUE
ON (page:Page) ASSERT page.Name IS UNIQUE
I want to do something similar to this post
I have tried to find popular paths of this structure, without loops:
(:PageView)-[:Next*2]->(:PageView)
Here are my attempts:
1. Nicole White's method from the post
MATCH p = (:PageView)-[:Next*2]->(:PageView)
WITH p, EXTRACT(v IN NODES(p) | v.Page) AS pages
UNWIND pages AS views
WITH p, COUNT(DISTINCT views) AS distinct_views
WHERE distinct_views = LENGTH(NODES(p))
RETURN EXTRACT(v in NODES(p) | v.Page), count(p)
ORDER BY count(p) DESC
LIMIT 10;
profile output:
10 rows
177270 ms
Compiler CYPHER 2.2-rule
ColumnFilter(0)
|
+Extract(0)
|
+ColumnFilter(1)
|
+Top
|
+EagerAggregation(0)
|
+Extract(1)
|
+ColumnFilter(2)
|
+Filter(0)
|
+Extract(2)
|
+ColumnFilter(3)
|
+EagerAggregation(1)
|
+UNWIND
|
+ColumnFilter(4)
|
+Extract(3)
|
+ExtractPath
|
+Filter(1)
|
+TraversalMatcher
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| ColumnFilter(0) | 10 | 0 | EXTRACT(v in NODES(p) | v.Page), count(p) | keep columns EXTRACT(v in NODES(p) | v.Page), count(p) |
| Extract(0) | 10 | 0 | FRESHID225, FRESHID258, EXTRACT(v in NODES(p) | v.Page), count(p) | EXTRACT(v in NODES(p) | v.Page), count(p) |
| ColumnFilter(1) | 10 | 0 | FRESHID225, FRESHID258 | keep columns , |
| Top | 10 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | { AUTOINT0}; Cached( INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 of type Integer) |
| EagerAggregation(0) | 212828 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | |
| Extract(1) | 1749120 | 10494720 | FRESHID225, distinct_views, p | |
| ColumnFilter(2) | 1749120 | 0 | distinct_views, p | keep columns distinct_views, p |
| Filter(0) | 1749120 | 0 | FRESHID196, distinct_views, p | CoercedPredicate(anon[196]) |
| Extract(2) | 2115766 | 0 | FRESHID196, distinct_views, p | |
| ColumnFilter(3) | 2115766 | 0 | distinct_views, p | keep columns p, distinct_views |
| EagerAggregation(1) | 2115766 | 0 | INTERNAL_AGGREGATEb0939c81-a40c-4012-afd6-4852b17cf2e4, p | p |
| UNWIND | 6347298 | 0 | p, pages, views | |
| ColumnFilter(4) | 2115766 | 0 | p, pages | keep columns p, pages |
| Extract(3) | 2115766 | 12694596 | p, pages | pages |
| ExtractPath | 2115766 | 0 | p | |
| Filter(1) | 2115766 | 2115766 | | hasLabel(anon[34]:PageView(0)) |
| TraversalMatcher | 2115766 | 16926150 | | , , , |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
Total database accesses: 42231232
2.
match (p1:PageView)-[:Next]->(p2:PageView)-[:Next]->(p3:PageView)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
28660 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection(0) | 1241 | 10 | 0 | FRESHID146, [p1.Page,p2.Page,p3.Page], count | [p1.Page,p2.Page,p3.Page], count |
| Top | 1241 | 10 | 0 | FRESHID146, count | { AUTOINT0}; count |
| EagerAggregation | 1241 | 212828 | 0 | FRESHID146, count | |
| Projection(1) | 1542393 | 1749120 | 10494720 | FRESHID146, p1, p2, p3 | |
| Filter(0) | 1542393 | 1749120 | 17872173 | p1, p2, p3 | (((hasLabel(p3:PageView(0)) AND NOT(Property(p1,Page(3)) == Property(p3,Page(3)))) AND NOT(anon[20] == anon[43])) AND NOT(Property(p2,Page(3)) == Property(p3,Page(3)))) |
| Expand(0) | 1904189 | 1985797 | 3971596 | p1, p2, p3 | (p2)-[:Next]->(p3) |
| Filter(1) | 1904191 | 1985799 | 10578840 | p1, p2 | (NOT(Property(p1,Page(3)) == Property(p2,Page(3))) AND hasLabel(p2:PageView(0))) |
| Expand(1) | 2115767 | 2115768 | 4231538 | p1, p2 | (p1)-[:Next]->(p2) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | p1 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3. (With loops!? And I don't know why! I assumed that if the identifiers are different, then the nodes are different.)
match (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
(pv1)-[:At]->(p1),(pv2)-[:At]->(p2),(pv3)-[:At]->(p3)
RETURN [p1.Name,p2.Name,p3.Name], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
27678 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+Filter(2)
|
+Expand(2)
|
+Filter(3)
|
+Expand(3)
|
+Expand(4)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Projection(0) | 1454 | 10 | 0 | FRESHID139, [p1.Name,p2.Name,p3.Name], count | [p1.Name,p2.Name,p3.Name], count |
| Top | 1454 | 10 | 0 | FRESHID139, count | { AUTOINT0}; count |
| EagerAggregation | 1454 | 223557 | 0 | FRESHID139, count | |
| Projection(1) | 2115760 | 2115764 | 12694584 | FRESHID139, p1, p2, p3, pv1, pv2, pv3 | |
| Filter(0) | 2115760 | 2115764 | 0 | p1, p2, p3, pv1, pv2, pv3 | (NOT(anon[116] == anon[80]) AND NOT(anon[80] == anon[98])) |
| Expand(0) | 2115760 | 2115764 | 4231530 | p1, p2, p3, pv1, pv2, pv3 | (pv1)-[:At]->(p1) |
| Filter(1) | 2115762 | 2115766 | 2115766 | p2, p3, pv1, pv2, pv3 | (hasLabel(pv1:PageView(0)) AND NOT(anon[21] == anon[45])) |
| Expand(1) | 2115762 | 2115766 | 4231532 | p2, p3, pv1, pv2, pv3 | (pv2)<-[:Next]-(pv1) |
| Filter(2) | 2115764 | 2115766 | 0 | p2, p3, pv2, pv3 | NOT(anon[116] == anon[98]) |
| Expand(2) | 2115764 | 2115766 | 4231534 | p2, p3, pv2, pv3 | (pv2)-[:At]->(p2) |
| Filter(3) | 2115766 | 2115768 | 2115768 | p3, pv2, pv3 | hasLabel(pv2:PageView(0)) |
| Expand(3) | 2115765 | 2115768 | 4231536 | p3, pv2, pv3 | (pv3)<-[:Next]-(pv2) |
| Expand(4) | 2115767 | 2115768 | 4231538 | p3, pv3 | (pv3)-[:At]->(p3) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | pv3 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
System info:
windows 8.1
250G ssd
neo4j enterprise 2.2.0-M02
cache: hpc
ram: 8G
jvm heap size: 4G
memory mapping: 50%
149 (:Page) nodes
2115770 (:PageView) nodes
Why is even the fastest of these three methods so slow? (I assume all my data is in RAM.)
What is the best way to filter out paths with loops?
By specifying labels for all identifiers, you force Cypher to open the node headers and check every node's labels.
This is where the names of your relationships become important. Relationships are meant to drive you through the graph, so for performance there is no need to specify the labels everywhere. If you're sure all nodes along the path have the PageView label, just omit it except at the start of your query:
match (p1:PageView)-[:Next]->(p2)-[:Next]->(p3)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;
I posted some query plan results related to your question in this answer: Neo4j: label vs. indexed property?

Creating a linkages map

I have an interesting programming problem to solve for an iPhone app I am currently building. It is really a logic problem that isn't specific to any particular programming language.
The app needs to produce a linkages map (apologies if this isn't the right terminology, but it makes sense to me). You have the following data:
A=C
B=A
C=O
D=F
E=F
F=G
G=D
H=J
I=L
J=N
K=A
L=O
M=C
N=H
O=E
The letters A through O can each be linked to any other letter. The app needs to follow the links to create a map: starting with A, A links to C, C links to O, O links to E, E links to F, and so on.
When complete, this map would look like the attached photo.
http://i.stack.imgur.com/TEfAs.jpg
The problem is that I need to write code that will output any map from any combination of links. For example, another link list might look like:
A=B
B=A
C=A
D=A
E=A
F=A
G=A
H=A
I=A
J=A
K=A
L=A
M=A
N=A
O=A
I can't get my head around the pseudocode/logic for drawing the map. There are always 15 letters, A-O, and a letter can never be linked to itself, so A can never equal A.
Can anyone help me come up with the logic for drawing the map?
What you want is to draw a graph. There is no canonical graphical representation of a graph, so if you have no constraints on how the graph should be drawn, you can simply lay the letters out in a row and then draw arcs between the letters according to your map.
A little like this (ASCII art):
Example
+-----------------------------------------+
+--------------------------------------+ |
+-----------------------------------+ | |
+--------------------------------+ | | |
+-----------------------------+ | | | |
+--------------------------+ | | | | |
+-----------------------+ | | | | | |
+--------------------+ | | | | | | |
+-----------------+ | | | | | | | |
+--------------+ | | | | | | | | |
+-----------+ | | | | | | | | | |
+--------+ | | | | | | | | | | |
+-----+ | | | | | | | | | | | |
| | | | | | | | | | | | | |
A B C D E F G H I J K L M N O
| |
+--+
Example
+-----------------------------+
+-----------------------------+ |
+--+ +-----------------------------------+
| | | | +--------+ | |
A B C D E F G H I J K L M N O
| | | | | | | | | | | |
+-----+ | +--+ | +-----+ | +--------+
| +-----+ | | +-----------+
| | +--+ +-----------------+
| +--------+ |
+-----------------------------+
Looks a bit confusing, but you cannot always avoid crossings. [In this example you could, but I did not try to avoid crossings, because they cannot be avoided in the general case.]
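As for the logic of following the links (independent of how you draw them), here is a small Java sketch (my own names, assuming the pairs are loaded into a Map) that walks each chain until it hits a letter it has already seen, which also exposes cycles such as D=F, F=G, G=D:

import java.util.*;

public class LinkageMap {
    public static void main(String[] args) {
        // the A=C, B=A, ... pairs from the question
        Map<Character, Character> links = new LinkedHashMap<>();
        String[] pairs = {"A=C","B=A","C=O","D=F","E=F","F=G","G=D","H=J",
                          "I=L","J=N","K=A","L=O","M=C","N=H","O=E"};
        for (String p : pairs) links.put(p.charAt(0), p.charAt(2));

        // walk from every letter not visited yet; a previously seen letter ends the chain
        Set<Character> seen = new HashSet<>();
        for (char start : links.keySet()) {
            if (seen.contains(start)) continue;
            StringBuilder chain = new StringBuilder();
            char cur = start;
            while (!seen.contains(cur)) {
                seen.add(cur);
                chain.append(cur).append(" -> ");
                cur = links.get(cur);
            }
            chain.append(cur); // this letter closes a loop or joins an earlier chain
            System.out.println(chain);
        }
    }
}

Starting from A this prints A -> C -> O -> E -> F -> G -> D -> F, then the shorter chains (B -> A, H -> J -> N -> H, and so on); the chains and their join points are exactly the arcs you would draw.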

Need help with credit expiration algorithm

So I'm stuck. I am working on a credit system with expirations, similar to credit-card miles but not exactly. By the way, I am sorry for the book ahead, but I needed enough detail to give the whole picture.
What I need is a system where a user accumulates credits for doing activities, but can also spend those credits on activities. The credits should expire 30 days after they are earned if they are not used. I'm stuck on how to accurately calculate this in a batch that runs every night. Ideas in any language would be greatly appreciated, as I'm hung up on just one detail I can't get around. Here is an example of the data:
7/1: +5 - user signs up
7/2: +5 - user interacts with system
7/2: -3 - user purchases activity
7/3: +5 - user interacts with system
So at this point the user has received 15 credits and has spent 3, leaving him with a total of 12 credits. (At least I've got basic math down. :P)
I should add that we are currently playing with the idea of having two fields: last processed and next processed. Assuming a new sign-up, these values would currently be:
Last Processed Date: 7/1
Next Process Date: 8/1
So now 8/1 comes around. The batch starts and looks at all credits older than 30 days, which at this point is 5 credits.
This is where it starts to get fuzzy.
Then the system should look at all the credits spent in the last 30 days, because credits should only expire if they haven't been used. There are 3, so I deduct 2 credits from the user: the difference between the credits earned more than 30 days ago and what has been spent. I finish the batch and set the dates accordingly for the next run. Now, assuming they haven't spent any more, I start the calculation over: credits earned more than 30 days ago is 5, and credits spent is again 3. But I obviously don't want to count the 3 credits I already considered yesterday. What is a good approach to avoid counting those 3 credits again?
That is where I am stuck.
We are thinking about writing a debit record for the expired credits so we can track them but having a hard time seeing how I can use it in this calculation.
If you read this far, thank you. If you make even a modest effort in your answer, I will at a minimum give you an upvote for the effort.
EDIT:
OK, @Greg mentioned something that I forgot to address: the idea of putting a flag on the credits already considered. A valid point, but not one that can work, because of the following scenario:
Let's say that on a particular day a user spends 10 credits, but the expired credits the batch is considering only amount to 5. He should still have 5 more credits that don't expire, because he spent more than a single expiration covers. The flag wouldn't work, because we would have skipped those 5 extra credits. Hope that makes sense?
For every user of the system, keep an array that stores the amount of credits available to the user on each of the next 30 consecutive days.
For example the data for some user might look like this
8 |
7 | |
6 | | | |
5 | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
Every time the user earns some credits, you increase the amounts for all days by the number of credits earned. For example, if the user earns 2 credits, the table changes as follows. It's like raising the whole graph up.
10 |
9 | |
8 | | | |
7 | | | | | | | | | | |
6 | | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
If the user has x credits today and spends y credits, you decrease the amount of credits available to him to x - y on every day where he has an amount greater than x - y. On days where he has no more than x - y, the amount stays the same. It's like cutting the top off the graph. For example, if the user spends 3 credits, the graph changes to
7 | | | | | | | | | | |
6 | | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
Every day you shift the graph one position to the left to model expiring credits. The user will have the following amounts tomorrow:
7 | | | | | | | | | |
6 | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
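A minimal sketch of this idea in Java (the class and method names are mine; it assumes a fixed 30-day horizon and that the nightly batch calls shiftDay() once per day):

public class CreditWindow {
    static final int HORIZON = 30;
    // days[i] = credits available i days from today
    private final int[] days = new int[HORIZON];

    // earning raises the whole graph: new credits are usable on every remaining day
    void earn(int amount) {
        for (int i = 0; i < HORIZON; i++) days[i] += amount;
    }

    // spending cuts the top off: no day may exceed today's new balance x - y
    void spend(int amount) {
        int remaining = days[0] - amount;
        if (remaining < 0) throw new IllegalStateException("insufficient credits");
        for (int i = 0; i < HORIZON; i++) days[i] = Math.min(days[i], remaining);
    }

    // nightly batch: shift the graph left; nothing earned so far survives 30 more days
    void shiftDay() {
        System.arraycopy(days, 1, days, 0, HORIZON - 1);
        days[HORIZON - 1] = 0;
    }

    int balance() { return days[0]; }

    public static void main(String[] args) {
        CreditWindow w = new CreditWindow();
        w.earn(5);                // 7/1: signs up
        w.shiftDay();
        w.earn(5); w.spend(3);    // 7/2: interacts, then purchases
        w.shiftDay();
        w.earn(5);                // 7/3: interacts
        System.out.println(w.balance()); // 12, matching the example
    }
}

Expiry then needs no separate bookkeeping at all; it falls out of the daily shift.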
I wouldn't consider trying to process the data as you present it. Instead, you should keep track of how many credits the user has, and when they expire. That way you keep track of which credits were used when the purchase is made, instead of trying to work it all out later.
So when the user signs up, they have:
5 credits expiring on 8/1
After interacting with the system the next day:
5 credits expiring on 8/1
5 credits expiring on 8/2
After purchasing something:
2 credits expiring on 8/1
5 credits expiring on 8/2
And so on.
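A sketch of that bookkeeping in Java (my own names; spending consumes the earliest-expiring grant first, and the nightly batch simply drops grants whose date has passed):

import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.Deque;

public class CreditAccount {
    // one bucket per grant, kept in expiry order (credits are earned in expiry order)
    private static final class Bucket {
        int amount;
        final LocalDate expires;
        Bucket(int amount, LocalDate expires) { this.amount = amount; this.expires = expires; }
    }

    private final Deque<Bucket> buckets = new ArrayDeque<>();

    // credits earned today expire 30 days later
    void earn(int amount, LocalDate today) {
        buckets.addLast(new Bucket(amount, today.plusDays(30)));
    }

    // spend from the earliest-expiring buckets first
    void spend(int amount) {
        while (amount > 0) {
            Bucket b = buckets.peekFirst();
            if (b == null) throw new IllegalStateException("insufficient credits");
            int used = Math.min(b.amount, amount);
            b.amount -= used;
            amount -= used;
            if (b.amount == 0) buckets.removeFirst();
        }
    }

    // nightly batch: buckets that have reached their expiry date simply disappear
    void expire(LocalDate today) {
        while (!buckets.isEmpty() && !buckets.peekFirst().expires.isAfter(today)) {
            buckets.removeFirst();
        }
    }

    int balance() {
        int sum = 0;
        for (Bucket b : buckets) sum += b.amount;
        return sum;
    }
}

This reproduces the example above: after the purchase of 3, the first bucket holds 2 credits expiring 8/1 and the second still holds 5 expiring 8/2.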
Assuming you run this batch daily, you can have a table that keeps track of all the credits they earned and the credits they used (as negative credits).
At the beginning of the next month, your job is simply to find out which of the credits earned on the first day were not spent during the month:
the number of credits earned on the first day minus the credits they spent over the whole month. If the number is positive, they have some credits that need to expire, so simply add a record to the table with a negative credit. This will zero out the unused credits.
The next day, repeat the process for the credits earned on the second day minus the sum of all the credits they spent in the last month, taking into account the negative-credit record you created the previous day.
How about adding a flag to the expenditures? If the flag is not set, you can include that expenditure in the batch if necessary. If you do use an expenditure to offset an expiration, you set the flag; next time through, you'll ignore that expenditure because the flag is set.
Use a debit record to record normal expenditures. When the monthly batch job runs, it can calculate the total debits which are less than or equal to the expiring credits. If there are credits to expire, simply insert an appropriate debit record (appropriate == to cancel the excess, in your application). In this way, any 'running total' code which examines only credits and debits will reach the same balance that your batch code intended.
One approach to this problem is to store only the transactions, not the balance. Then you always calculate the balance in real time when needed. Here's the data:
Date : Amount : Expires
7/1  :     +5 : 7/31
7/2  :     +5 : 8/1
7/2  :     -3 : never
7/3  :     +5 : 8/2
The balance at any time is simply the total of all transactions that have not yet expired. No need to run any batch processes.
Regarding Julian's reply (which I can't comment on yet): I'm dealing with exactly the same problem, and Julian's approach won't work, because it would allow the account balance to go negative.
If the user didn't use the service for one month, on 8/4 the account balance would be -3, and one activity worth 5 credits would bring the balance to 2, not to 5 as it should.
