Elixir: assigning to a variable inside a for comprehension (variable scope?)

I'm solving Project Euler problem 3: find the largest prime factor of a number.
The following Elixir code emits warnings, and I think the assignment inside the if block is never applied:
num = 13195
range = num
  |> :math.sqrt
  |> Float.floor
  |> round
for dv <- 2..range do
  if rem(num, dv) == 0 and div(num, dv) != 1 do
    num = div(num, dv)
  end
end
num
|> IO.puts
Warnings are:
$ elixir 3.exs
warning: variable "num" is unused
3.exs:10
warning: the result of the expression is ignored (suppress the warning by assigning the expression to the _ variable)
3.exs:10
13195
$ elixir -v
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.5.3
How can I update (reassign) num?
(The following Python and JavaScript code works for the same problem.)
# 3.py
from math import ceil, sqrt
num = 600851475143
for div in range(2, ceil(sqrt(num)) + 1):
    if num%div == 0 and num/div != 1:
        num /= div
assert int(num) == 6857
// 3.js
var num = 600851475143;
var range = Array.from({length: Math.trunc(Math.sqrt(num))}, (x, i) => i + 2)
for (const div of range) {
  if (num%div === 0 && num/div != 1) {
    num /= div;
  }
}
var assert = require('assert');
assert(num === 6857)

You are actually creating a new variable inside the comprehension, which shadows the one from the outer scope; the outer num is never reassigned.
You can rewrite it like this:
num = 13195
range =
  num
  |> :math.sqrt()
  |> Float.floor()
  |> round
num =
  2..range
  |> Enum.reduce(num, fn elem, acc ->
    if rem(acc, elem) == 0 and div(acc, elem) != 1 do
      div(acc, elem)
    else
      acc
    end
  end)
IO.puts num
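For comparison with the Python version in the question, the Enum.reduce rewrite above can be sketched with functools.reduce. This is only an illustration: the function name fold_divisors is made up here, and the logic deliberately mirrors the original code (each candidate divisor is tried at most once), so it is not a general-purpose factorization routine.

```python
from functools import reduce
from math import isqrt


def fold_divisors(num):
    """Fold over 2..floor(sqrt(num)), threading the running value through
    an accumulator instead of reassigning an outer variable."""
    def step(acc, dv):
        # Same condition as the Elixir version: divide only when dv divides
        # acc and the quotient is not 1.
        if acc % dv == 0 and acc // dv != 1:
            return acc // dv
        return acc

    return reduce(step, range(2, isqrt(num) + 1), num)


print(fold_divisors(13195))
```

The accumulator plays the role of num: every step returns the next value rather than mutating anything outside the function.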
More on shadowing:
+------------------------------------------------------------+
| Top level                                                  |
|                                                            |
| +------------------------+     +------------------------+  |
| | Module                 |     | Module                 |  |
| |                        |     |                        |  |
| | +--------------------+ |     | +--------------------+ |  |
| | | Function clause    | |     | | Function clause    | |  |
| | |                    | |     | |                    | |  |
| | | +----------------+ | |     | | +----------------+ | |  |
| | | | Comprehension  | | |     | | | Comprehension  | | |  |
| | | +----------------+ | |     | | +----------------+ | |  |
| | | +----------------+ | | ... | | +----------------+ | |  |
| | | | Anon. function | | |     | | | Anon. function | | |  |
| | | +----------------+ | |     | | +----------------+ | |  |
| | | +----------------+ | |     | | +----------------+ | |  |
| | | | Try block      | | |     | | | Try block      | | |  |
| | | +----------------+ | |     | | +----------------+ | |  |
| | +--------------------+ |     | +--------------------+ |  |
| +------------------------+     +------------------------+  |
|                                                            |
+------------------------------------------------------------+
Any variable in a nested scope whose name coincides with a variable from the surrounding scope will shadow that outer variable. In other words, the variable inside the nested scope temporarily hides the variable from the surrounding scope, but does not affect it in any way.
source

Related

How can I use the functions of the "survey" package in the "expss" package in R

I am trying to use the expss package for survey data analysis, but its standard errors, variances and confidence intervals differ from the results of the survey package.
In survey:
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svyby(~api99, ~stype, dclus1, svymean)
stype api99 se
E E 607.7917 22.81660
H H 595.7143 41.76400
M M 608.6000 32.56064
In expss:
apiclus1 %>% tab_cells(api99) %>%
tab_rows(stype) %>% tab_weight(pw) %>%
tab_stat_fun(w_mean,w_se) %>% tab_pivot()
| | | | | #Total |
| ----- | -- | ----- | ------ | ------ |
| stype | E | api99 | mean | 607.8 |
| | | | se | 1.6 |
| | H | api99 | mean | 595.7 |
| | | | se | 4.7 |
| | M | api99 | mean | 608.6 |
| | | | se | 3.7 |
How can I use the functions of the survey package within expss?

Enumerating Cartesian product while minimizing repetition

Given two sets, e.g.:
{A B C}, {1 2 3 4 5 6}
I want to generate the Cartesian product in an order that puts as much space as possible between equal elements. For example, [A1, A2, A3, A4, A5, A6, B1…] is no good because all the As are next to each other. An acceptable solution would be going "down the diagonals" and then every time it wraps offsetting by one, e.g.:
[A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3…]
Expressed visually:
| | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | | | | | | |
| 2 | | 2 | | | | | | | | | | | | | | | | |
| 3 | | | 3 | | | | | | | | | | | | | | | |
| 4 | | | | 4 | | | | | | | | | | | | | | |
| 5 | | | | | 5 | | | | | | | | | | | | | |
| 6 | | | | | | 6 | | | | | | | | | | | | |
| 1 | | | | | | | | | | | | | | | | | | |
| 2 | | | | | | | 7 | | | | | | | | | | | |
| 3 | | | | | | | | 8 | | | | | | | | | | |
| 4 | | | | | | | | | 9 | | | | | | | | | |
| 5 | | | | | | | | | | 10| | | | | | | | |
| 6 | | | | | | | | | | | 11| | | | | | | |
| 1 | | | | | | | | | | | | 12| | | | | | |
| 2 | | | | | | | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | | 13| | | | | |
| 4 | | | | | | | | | | | | | | 14| | | | |
| 5 | | | | | | | | | | | | | | | 15| | | |
| 6 | | | | | | | | | | | | | | | | 16| | |
| 1 | | | | | | | | | | | | | | | | | 17| |
| 2 | | | | | | | | | | | | | | | | | | 18|
or, equivalently but without repeating the rows/columns:
| | A | B | C |
|---|----|----|----|
| 1 | 1 | 17 | 15 |
| 2 | 4 | 2 | 18 |
| 3 | 7 | 5 | 3 |
| 4 | 10 | 8 | 6 |
| 5 | 13 | 11 | 9 |
| 6 | 16 | 14 | 12 |
I imagine there are other solutions too, but that's the one I found easiest to think about. But I've been banging my head against the wall trying to figure out how to express it generically—it's a convenient thing that the cardinality of the two sets are multiples of each other, but I want the algorithm to do The Right Thing for sets of, say, size 5 and 7. Or size 12 and 69 (that's a real example!).
Are there any established algorithms for this? I keep getting distracted thinking of how rational numbers are mapped onto the set of natural numbers (to prove that they're countable), but the path it takes through ℕ×ℕ doesn't work for this case.
It so happens the application is being written in Ruby, but I don't care about the language. Pseudocode, Ruby, Python, Java, Clojure, Javascript, CL, a paragraph in English—choose your favorite.
Proof-of-concept solution in Python (soon to be ported to Ruby and hooked up with Rails):
import sys
letters = sys.argv[1]
MAX_NUM = 6
letter_pos = 0
for i in xrange(MAX_NUM):
    for j in xrange(len(letters)):
        num = ((i + j) % MAX_NUM) + 1
        symbol = letters[letter_pos % len(letters)]
        print "[%s %s]"%(symbol, num)
        letter_pos += 1
String letters = "ABC";
int MAX_NUM = 6;
int letterPos = 0;
for (int i=0; i < MAX_NUM; ++i) {
    for (int j=0; j < MAX_NUM; ++j) {
        int num = ((i + j) % MAX_NUM) + 1;
        // String.length() is a method, not a field
        char symbol = letters.charAt(letterPos % letters.length());
        String output = symbol + "" + num;
        ++letterPos;
    }
}
What about using something fractal/recursive? This implementation divides a rectangular range into four quadrants then yields points from each quadrant. This means that neighboring points in the sequence differ at least by quadrant.
#python3
import sys
import itertools
def interleave(*iters):
    # round-robin over the iterators, skipping the padding of exhausted ones
    for elements in itertools.zip_longest(*iters):
        for element in elements:
            if element is not None:
                yield element
def scramblerange(begin, end):
    # split a 1-D range in half and interleave the two halves recursively
    width = end - begin
    if width == 1:
        yield begin
    else:
        first = scramblerange(begin, int(begin + width/2))
        second = scramblerange(int(begin + width/2), end)
        yield from interleave(first, second)
def scramblerectrange(top=0, left=0, bottom=1, right=1, width=None, height=None):
    if width is not None and height is not None:
        yield from scramblerectrange(bottom=height, right=width)
        # plain return: raising StopIteration inside a generator is a RuntimeError since Python 3.7
        return
    if right - left == 1:
        if bottom - top == 1:
            yield (left, top)
        else:
            for y in scramblerange(top, bottom):
                yield (left, y)
    else:
        if bottom - top == 1:
            for x in scramblerange(left, right):
                yield (x, top)
        else:
            halfx = int(left + (right - left)/2)
            halfy = int(top + (bottom - top)/2)
            quadrants = [
                scramblerectrange(top=top, left=left, bottom=halfy, right=halfx),
                reversed(list(scramblerectrange(top=top, left=halfx, bottom=halfy, right=right))),
                scramblerectrange(top=halfy, left=left, bottom=bottom, right=halfx),
                reversed(list(scramblerectrange(top=halfy, left=halfx, bottom=bottom, right=right)))
            ]
            yield from interleave(*quadrants)
if __name__ == '__main__':
    letters = 'abcdefghijklmnopqrstuvwxyz'
    output = []
    indices = dict()
    for i, pt in enumerate(scramblerectrange(width=11, height=5)):
        indices[pt] = i
        x, y = pt
        output.append(letters[x] + str(y))
    table = [[indices[x,y] for x in range(11)] for y in range(5)]
    print(', '.join(output))
    print()
    pad = lambda i: ' ' * (2 - len(str(i))) + str(i)
    header = ' |' + ' '.join(map(pad, letters[:11]))
    print(header)
    print('-' * len(header))
    for y, row in enumerate(table):
        print(pad(y)+'|', ' '.join(map(pad, row)))
Outputs:
a0, i1, a2, i3, e0, h1, e2, g4, a1, i0, a3, k3, e1,
h0, d4, g3, b0, j1, b2, i4, d0, g1, d2, h4, b1, j0,
b3, k4, d1, g0, d3, f4, c0, k1, c2, i2, c1, f1, a4,
h2, k0, e4, j3, f0, b4, h3, c4, j2, e3, g2, c3, j4,
f3, k2, f2
| a b c d e f g h i j k
-----------------------------------
0| 0 16 32 20 4 43 29 13 9 25 40
1| 8 24 36 28 12 37 21 5 1 17 33
2| 2 18 34 22 6 54 49 39 35 47 53
3| 10 26 50 30 48 52 15 45 3 42 11
4| 38 44 46 14 41 31 7 23 19 51 27
If your sets X and Y have sizes n and m respectively, and Xi is the index of the element from X that's in the ith pair of your Cartesian product (and similarly Yi for Y), then
Xi = i mod n;
Yi = (i mod n + i div n) mod m;
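A minimal Python sketch of these two index formulas, assuming X has n elements and Y has m elements (which is what the modular arithmetic requires). The function name diagonal_pairs and the sample sets are made up for illustration.

```python
def diagonal_pairs(xs, ys):
    """Yield every (x, y) pair once, walking down the diagonals.

    xi = i mod n and yi = (i mod n + i div n) mod m, with n = len(xs)
    and m = len(ys).
    """
    n, m = len(xs), len(ys)
    for i in range(n * m):
        xi = i % n
        yi = (i % n + i // n) % m
        yield xs[xi], ys[yi]


pairs = list(diagonal_pairs([1, 2, 3, 4, 5, 6], "ABC"))
print(pairs[:7])
```

Each full pass over X (one value of i div n) shifts the starting element of Y by one, so for every element of X all m shifts occur exactly once. That is why every pair appears once even when the sizes are not multiples of each other, such as 5 and 7 or 12 and 69.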
You could get your diagonals a little more spread out by filling out your matrix like this:
for (int i = 0; i < m*n; i++) {
    int xi = i % n;
    int yi = i % m;
    while (matrix[yi][xi] != 0) {
        yi = (yi+1) % m;
    }
    matrix[yi][xi] = i+1;
}

Slow aggregation on big neo4j graph

Configuration:
Windows 8.1
neo4j-enterprise-2.2.0-M03
cache type: hpc
8Gb RAM
6Gb for JVM Heap (wrapper.java.initmemory=6144 wrapper.java.maxmemory=6144)
5Gb out of 6Gb of JVM Heap for mapped memory (dbms.pagecache.memory=5G)
Model:
The model represents how users navigate through the website.
27 522 896 nodes (394Mb)
111 294 796 relationships (3609Mb)
33 906 363 properties (1326Mb)
293 (:Page) nodes
27522603 (:PageView) nodes
0 (:User) nodes (not loaded yet)
each (:PageView) node is connected to a (:Page) node
each (:PageView) node is connected to the next (:PageView) node
each (:PageView) node is connected to a (:User) node (not yet)
Query
match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
195363 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID105, FRESHID110, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID105, FRESHID110 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID105, FRESHID110 | |
| Projection(1) | 776404 | 35941815 | 71883630 | FRESHID105, p | |
| Filter(0) | 776404 | 35941815 | 35941815 | p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
| Expand(All)(0) | 776404 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Filter(1) | 384001 | 13345621 | 13345621 | | hasLabel(anon[67]:PageView) |
| Expand(All)(1) | 384001 | 13345621 | 19478500 | | ()-[:Next]->() |
| Filter(2) | 189923 | 6132879 | 6132879 | | hasLabel(anon[46]:PageView) |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
Total database accesses: 202202762
Query without unnecessary labels
match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
166751 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID82, FRESHID87, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID82, FRESHID87 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID82, FRESHID87 | |
| Projection(1) | 776388 | 35941815 | 71883630 | FRESHID82, p | |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
Total database accesses: 146782447
Message.log
Question
How can I make this query run much faster? (more RAM, refactor query, distributed cache, use another language/shell/method, ...)
UPD:
Profile info for last query in answer
neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx" | 5172680 |
| "DocumentPreview.aspx" | 3846455 |
| "Dashboard.aspx" | 3579022 |
| "ActualTasks.aspx" | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx" | 1373928 |
| "PaymentsList.aspx" | 1338063 |
| "Login.aspx" | 1285447 |
| "ReportingRequisites.aspx" | 884077 |
| "ContractorInfo.aspx" | 759665 |
+------------------------------------------------+
10 rows
151328 ms
Compiler CYPHER 2.2
Planner COST
Projection
|
+Top
|
+EagerAggregation
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Projection | 881 | 10 | 20 | count, p, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | count, p | { AUTOINT1}; count |
| EagerAggregation | 881 | 173 | 0 | count, p | p |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+
Total database accesses: 74898837
As I mentioned before, in your other question, if you can write a Java based server extension you can do it pretty easily.
// label used for the lookups below (declared before its first use)
Label Page = DynamicLabel.label("Page");
// initialize counters
Map<Node,AtomicInteger> pageCounts = new HashMap<>(300);
for (Node page : graphDb.findNode(Page)) pageCounts.put(page,new AtomicInteger());
// find start page
Node page = graphDb.findNode(Page,"Name",pageName).iterator().next();
// follow page-view relationships
for (Relationship at : page.getRelationships(At, INCOMING)) {
    // follow singular next relationship
    Relationship next = at.getStartNode().getSingleRelationship(Next,OUTGOING);
    if (next==null) continue;
    // follow singular page-view relationship from the next page view to its end-page
    Node page2 = next.getEndNode().getSingleRelationship(At,OUTGOING).getEndNode();
    // increment counter
    pageCounts.get(page2).incrementAndGet();
}
// sort pages by count descending
List<Map.Entry<Node,AtomicInteger>> pages = new ArrayList<>(pageCounts.entrySet());
Collections.sort(pages,new Comparator<Map.Entry<Node,AtomicInteger>>() {
    public int compare(Map.Entry<Node,AtomicInteger> e1, Map.Entry<Node,AtomicInteger> e2) {
        return Integer.compare(e2.getValue().get(),e1.getValue().get());
    }
});
// return top 10
return pages.subList(0,10);
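The shape of that traversal can be sketched on plain in-memory dictionaries in Python, without any Neo4j API. Everything here is a hypothetical stand-in: views_at maps a page to the page views pointing at it (incoming At), next_view maps a page view to its single successor (outgoing Next), and page_of maps a page view to its page (outgoing At).

```python
from collections import Counter


def next_page_counts(start_page, views_at, next_view, page_of):
    """Count which pages are visited directly after start_page."""
    counts = Counter()
    for view in views_at.get(start_page, []):  # incoming At relationships
        nxt = next_view.get(view)              # at most one outgoing Next
        if nxt is None:
            continue                           # last page view of a chain
        counts[page_of[nxt]] += 1              # single outgoing At
    return counts


# Tiny made-up chain of page views: 1 -> 2 -> 3 -> 4
page_of = {1: "A", 2: "B", 3: "A", 4: "C"}
next_view = {1: 2, 2: 3, 3: 4}
views_at = {"A": [1, 3], "B": [2], "C": [4]}
print(next_page_counts("A", views_at, next_view, page_of).most_common(10))
```

The point of the design is the same as in the Java version: one pass over the page views of the start page, with a counter per target page and no intermediate rows.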
For Cypher I would try something like this:
match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
WITH distinct pv
MATCH (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Update
I wrote a test for it and ran it on my bigger linux machine, the results there are much more sensible: between 1.6s in Java and 5s max in Cypher.
Here is the code and the results: https://gist.github.com/jexp/94f75ddb849f8c41c97c
In Cypher:
-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name | count |
+-------------------+
| "Page169" | 975 |
| "Page125" | 959 |
| "Page106" | 955 |
| "Page274" | 951 |
| "Page176" | 947 |
| "Page241" | 944 |
| "Page30" | 942 |
| "Page44" | 938 |
| "Page1" | 938 |
| "Page118" | 938 |
+-------------------+
10 rows
in 3212 ms
[Compiler CYPHER 2.2
Planner COST
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Top | 488 | 10 | 0 | FRESHID71, FRESHID76 | { AUTOINT1}; |
| EagerAggregation | 488 | 300 | 0 | FRESHID71, FRESHID76 | |
| Projection | 238460 | 264828 | 529656 | FRESHID71, p | |
| Filter | 238460 | 264828 | 0 | p | NOT(anon[29] == anon[51]) |
| Expand(All)(0) | 238460 | 264828 | 529656 | p | ()-[:At]->(p) |
| Expand(All)(1) | 238460 | 264828 | 778522 | | ()-[:Next]->() |
| Expand(All)(2) | 476922 | 513694 | 513695 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
Total database accesses: 2351530]
And in Java:
-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938
Another way to speed up your Cypher query is to aggregate on the nodes only and read the Name property just for the final 10 rows, which is much faster:
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count
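The same idea, stripped of Cypher, is to group by a cheap key first and look up the expensive attribute only for the rows that survive the limit. A small Python sketch, where the page ids, the names table and the lookup counter are all made up for illustration:

```python
from collections import Counter


def top_pages(page_ids, names, n=10):
    """Aggregate on ids, then resolve names only for the top n rows."""
    counts = Counter(page_ids)  # grouping key is the id, not the name
    lookups = 0
    result = []
    for page_id, count in counts.most_common(n):
        result.append((names[page_id], count))  # deferred property read
        lookups += 1
    return result, lookups


hits = [1, 1, 2, 3, 3, 3]
names = {1: "Login.aspx", 2: "Dashboard.aspx", 3: "EditDocument.aspx"}
print(top_pages(hits, names, n=2))
```

However many rows are aggregated, the number of name lookups is bounded by n.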

Is this normal query performance?

I have the following graph model:
(:PageView {Number:int, Page:string}), (:Page {Name:string})
(:PageView)-[:At]->(:Page)
(:PageView)-[:Next]->(:PageView)
Schema:
Indexes
ON :Page(Name) ONLINE (for uniqueness constraint)
ON :PageView(Page) ONLINE
ON :PageView(Revision) ONLINE (for uniqueness constraint)
Constraints
ON (pageview:PageView) ASSERT pageview.Number IS UNIQUE
ON (page:Page) ASSERT page.Name IS UNIQUE
I want to do something similar to this post
I have tried to find popular paths without loops of this structure:
(:PageView)-[:Next*2]->(:PageView)
Here are my attempts:
1. Nicole White's method from the post
MATCH p = (:PageView)-[:Next*2]->(:PageView)
WITH p, EXTRACT(v IN NODES(p) | v.Page) AS pages
UNWIND pages AS views
WITH p, COUNT(DISTINCT views) AS distinct_views
WHERE distinct_views = LENGTH(NODES(p))
RETURN EXTRACT(v in NODES(p) | v.Page), count(p)
ORDER BY count(p) DESC
LIMIT 10;
profile output:
10 rows
177270 ms
Compiler CYPHER 2.2-rule
ColumnFilter(0)
|
+Extract(0)
|
+ColumnFilter(1)
|
+Top
|
+EagerAggregation(0)
|
+Extract(1)
|
+ColumnFilter(2)
|
+Filter(0)
|
+Extract(2)
|
+ColumnFilter(3)
|
+EagerAggregation(1)
|
+UNWIND
|
+ColumnFilter(4)
|
+Extract(3)
|
+ExtractPath
|
+Filter(1)
|
+TraversalMatcher
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| ColumnFilter(0) | 10 | 0 | EXTRACT(v in NODES(p) | v.Page), count(p) | keep columns EXTRACT(v in NODES(p) | v.Page), count(p) |
| Extract(0) | 10 | 0 | FRESHID225, FRESHID258, EXTRACT(v in NODES(p) | v.Page), count(p) | EXTRACT(v in NODES(p) | v.Page), count(p) |
| ColumnFilter(1) | 10 | 0 | FRESHID225, FRESHID258 | keep columns , |
| Top | 10 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | { AUTOINT0}; Cached( INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 of type Integer) |
| EagerAggregation(0) | 212828 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | |
| Extract(1) | 1749120 | 10494720 | FRESHID225, distinct_views, p | |
| ColumnFilter(2) | 1749120 | 0 | distinct_views, p | keep columns distinct_views, p |
| Filter(0) | 1749120 | 0 | FRESHID196, distinct_views, p | CoercedPredicate(anon[196]) |
| Extract(2) | 2115766 | 0 | FRESHID196, distinct_views, p | |
| ColumnFilter(3) | 2115766 | 0 | distinct_views, p | keep columns p, distinct_views |
| EagerAggregation(1) | 2115766 | 0 | INTERNAL_AGGREGATEb0939c81-a40c-4012-afd6-4852b17cf2e4, p | p |
| UNWIND | 6347298 | 0 | p, pages, views | |
| ColumnFilter(4) | 2115766 | 0 | p, pages | keep columns p, pages |
| Extract(3) | 2115766 | 12694596 | p, pages | pages |
| ExtractPath | 2115766 | 0 | p | |
| Filter(1) | 2115766 | 2115766 | | hasLabel(anon[34]:PageView(0)) |
| TraversalMatcher | 2115766 | 16926150 | | , , , |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
Total database accesses: 42231232
2.
match (p1:PageView)-[:Next]->(p2:PageView)-[:Next]->(p3:PageView)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
28660 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection(0) | 1241 | 10 | 0 | FRESHID146, [p1.Page,p2.Page,p3.Page], count | [p1.Page,p2.Page,p3.Page], count |
| Top | 1241 | 10 | 0 | FRESHID146, count | { AUTOINT0}; count |
| EagerAggregation | 1241 | 212828 | 0 | FRESHID146, count | |
| Projection(1) | 1542393 | 1749120 | 10494720 | FRESHID146, p1, p2, p3 | |
| Filter(0) | 1542393 | 1749120 | 17872173 | p1, p2, p3 | (((hasLabel(p3:PageView(0)) AND NOT(Property(p1,Page(3)) == Property(p3,Page(3)))) AND NOT(anon[20] == anon[43])) AND NOT(Property(p2,Page(3)) == Property(p3,Page(3)))) |
| Expand(0) | 1904189 | 1985797 | 3971596 | p1, p2, p3 | (p2)-[:Next]->(p3) |
| Filter(1) | 1904191 | 1985799 | 10578840 | p1, p2 | (NOT(Property(p1,Page(3)) == Property(p2,Page(3))) AND hasLabel(p2:PageView(0))) |
| Expand(1) | 2115767 | 2115768 | 4231538 | p1, p2 | (p1)-[:Next]->(p2) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | p1 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3. (With loops!? And I don't know why! I assumed that if the identifiers are different, then the nodes are different.)
match (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
(pv1)-[:At]->(p1),(pv2)-[:At]->(p2),(pv3)-[:At]->(p3)
RETURN [p1.Name,p2.Name,p3.Name], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
27678 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+Filter(2)
|
+Expand(2)
|
+Filter(3)
|
+Expand(3)
|
+Expand(4)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Projection(0) | 1454 | 10 | 0 | FRESHID139, [p1.Name,p2.Name,p3.Name], count | [p1.Name,p2.Name,p3.Name], count |
| Top | 1454 | 10 | 0 | FRESHID139, count | { AUTOINT0}; count |
| EagerAggregation | 1454 | 223557 | 0 | FRESHID139, count | |
| Projection(1) | 2115760 | 2115764 | 12694584 | FRESHID139, p1, p2, p3, pv1, pv2, pv3 | |
| Filter(0) | 2115760 | 2115764 | 0 | p1, p2, p3, pv1, pv2, pv3 | (NOT(anon[116] == anon[80]) AND NOT(anon[80] == anon[98])) |
| Expand(0) | 2115760 | 2115764 | 4231530 | p1, p2, p3, pv1, pv2, pv3 | (pv1)-[:At]->(p1) |
| Filter(1) | 2115762 | 2115766 | 2115766 | p2, p3, pv1, pv2, pv3 | (hasLabel(pv1:PageView(0)) AND NOT(anon[21] == anon[45])) |
| Expand(1) | 2115762 | 2115766 | 4231532 | p2, p3, pv1, pv2, pv3 | (pv2)<-[:Next]-(pv1) |
| Filter(2) | 2115764 | 2115766 | 0 | p2, p3, pv2, pv3 | NOT(anon[116] == anon[98]) |
| Expand(2) | 2115764 | 2115766 | 4231534 | p2, p3, pv2, pv3 | (pv2)-[:At]->(p2) |
| Filter(3) | 2115766 | 2115768 | 2115768 | p3, pv2, pv3 | hasLabel(pv2:PageView(0)) |
| Expand(3) | 2115765 | 2115768 | 4231536 | p3, pv2, pv3 | (pv3)<-[:Next]-(pv2) |
| Expand(4) | 2115767 | 2115768 | 4231538 | p3, pv3 | (pv3)-[:At]->(p3) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | pv3 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
System info:
windows 8.1
250G ssd
neo4j enterprise 2.2.0-M02
cache: hpc
ram: 8G
jvm heap size: 4G
memory mapping: 50%
149 (:Page) nodes
2115770 (:PageView) nodes
Why is even the fastest of these three methods so slow? (I assume that all my data is in RAM.)
What is the best way to filter out paths with loops?
By specifying labels for all identifiers, you force Cypher to open the node headers and check the labels of every node.
This is where the names of your relationships matter. Relationships are meant to drive the traversal through the graph, so for performance there is no need to specify the labels. If you're sure all nodes along the path have the PageView label, omit it everywhere except at the start of your query:
match (p1:PageView)-[:Next]->(p2)-[:Next]->(p3)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;
I posted some query plan results in this answer related to your question : Neo4j: label vs. indexed property?
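What the WHERE clause in that query filters can be illustrated outside the database with a short Python sketch. It assumes each chain of Next relationships is available as a list of page names; the sessions data and the function name are hypothetical.

```python
from collections import Counter


def top_loop_free_triples(sessions, n=10):
    """Count consecutive page triples whose three pages are all different."""
    counts = Counter()
    for pages in sessions:
        # sliding window of three consecutive page views
        for a, b, c in zip(pages, pages[1:], pages[2:]):
            if a != b and a != c and b != c:  # same test as the WHERE clause
                counts[(a, b, c)] += 1
    return counts.most_common(n)


sessions = [["A", "B", "C", "A"], ["A", "B", "C"], ["A", "A", "B"]]
print(top_loop_free_triples(sessions))
```

The comparison is on page names, not on page-view identity, which is why distinct page views of the same page are still treated as a loop.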

Should we use _In_ instead of __in?

I read sal.h, which is in VS2010, today.
I was a little surprised.
// This section contains the deprecated annotations
|------------|------------|---------|--------|----------|----------|---------------|
| Level | Usage | Size | Output | NullTerm | Optional | Parameters |
|------------|------------|---------|--------|----------|----------|---------------|
| <> | <> | <> | <> | _z | <> | <> |
| _deref | _in | _ecount | _full | _nz | _opt | (size) |
| _deref_opt | _out | _bcount | _part | | | (size,length) |
| | _inout | | | | | |
| | | | | | | |
|------------|------------|---------|--------|----------|----------|---------------|
I have always used these annotations.
I can't believe that they are deprecated. Is it true?
If so, why?
Should we use the following annotations from now on? They are not familiar to me :(
|--------------|----------|----------------|-----------------------------|
| Usage | Nullness | ZeroTerminated | Extent |
|--------------|----------|----------------|-----------------------------|
| _In_ | <> | <> | <> |
| _Out_ | opt_ | z_ | [byte]cap_[c_|x_]( size ) |
| _Inout_ | | | [byte]count_[c_|x_]( size ) |
| _Deref_out_ | | | ptrdiff_cap_( ptr ) |
|--------------| | | ptrdiff_count_( ptr ) |
| _Ret_ | | | |
| _Deref_ret_ | | | |
|--------------| | | |
| _Pre_ | | | |
| _Post_ | | | |
| _Deref_pre_ | | | |
| _Deref_post_ | | | |
|--------------|----------|----------------|-----------------------------|
By the way, there is no SAL tag on SO.
Please create one if you can.
It seems that you should get used to the new "attribute" SAL format; see the comment in red in the middle of this post:
Link

Resources