Understanding CockroachDB Replicas - cockroachdb

I"m having hard time understanding why the CockroachDB admin console my single node setup has 37 replicas. Based on what I've read
CockroachDB replicates each range (3 times by default) and stores each replica on a different node.
This is directly from the docs https://www.cockroachlabs.com/docs/v20.2/architecture/overview#glossary
Running the command \l I see
database_name
-----------------
defaultdb
postgres
system
test2
(4 rows)
Running the command SHOW ALL ZONE CONFIGURATIONS; I get
target | raw_config_sql
---------------------------------------------------+------------------------------------------------------------------------------
RANGE default | ALTER RANGE default CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 3,
| constraints = '[]',
| lease_preferences = '[]'
DATABASE system | ALTER DATABASE system CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 5,
| constraints = '[]',
| lease_preferences = '[]'
RANGE meta | ALTER RANGE meta CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 3600,
| num_replicas = 5,
| constraints = '[]',
| lease_preferences = '[]'
RANGE system | ALTER RANGE system CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 5,
| constraints = '[]',
| lease_preferences = '[]'
RANGE liveness | ALTER RANGE liveness CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 600,
| num_replicas = 5,
| constraints = '[]',
| lease_preferences = '[]'
TABLE system.public.replication_constraint_stats | ALTER TABLE system.public.replication_constraint_stats CONFIGURE ZONE USING
| gc.ttlseconds = 600,
| constraints = '[]',
| lease_preferences = '[]'
TABLE system.public.replication_stats | ALTER TABLE system.public.replication_stats CONFIGURE ZONE USING
| gc.ttlseconds = 600,
| constraints = '[]',
| lease_preferences = '[]'
(7 rows)
I'm not sure where the 37 comes from. Shouldn't it just be 3, since I only created the database test2? Or, if it replicates even the default databases, wouldn't it still only be 3 * 4 = 12? None of my databases exceeds 512 MB, so each database should take at most one range. I must be misunderstanding something; can someone give me a hand? Thank you.

Cockroach has a relatively large number of internal system ranges. It maintains internal system tables as well as other bootstrapping metadata. Cockroach splits ranges on table boundaries as well as some other hard-coded split points. You can discover the set of ranges by running a query like:
> SELECT start_pretty, end_pretty, database_name, table_name, replicas FROM crdb_internal.ranges_no_leases;
start_pretty | end_pretty | database_name | table_name | replicas
--------------------------------+-------------------------------+---------------+---------------------------------+-----------
/Min | /System/NodeLiveness | | | {1,2,3}
/System/NodeLiveness | /System/NodeLivenessMax | | | {1,2,3}
/System/NodeLivenessMax | /System/tsd | | | {1,2,3}
/System/tsd | /System/"tse" | | | {1,2,3}
/System/"tse" | /Table/SystemConfigSpan/Start | | | {1,2,3}
/Table/SystemConfigSpan/Start | /Table/11 | | | {1,2,3}
/Table/11 | /Table/12 | system | lease | {1,2,3}
/Table/12 | /Table/13 | system | eventlog | {1,2,3}
/Table/13 | /Table/14 | system | rangelog | {1,2,3}
/Table/14 | /Table/15 | system | ui | {1,2,3}
/Table/15 | /Table/16 | system | jobs | {1,2,3}
/Table/16 | /Table/17 | | | {1,2,3}
/Table/17 | /Table/18 | | | {1,2,3}
/Table/18 | /Table/19 | | | {1}
/Table/19 | /Table/20 | system | web_sessions | {1,2,3}
/Table/20 | /Table/21 | system | table_statistics | {1,2,3}
/Table/21 | /Table/22 | system | locations | {1,2,3}
/Table/22 | /Table/23 | | | {1,2,3}
/Table/23 | /Table/24 | system | role_members | {1,2,3}
/Table/24 | /Table/25 | system | comments | {1,2,3}
/Table/25 | /Table/26 | system | replication_constraint_stats | {1,2,3}
/Table/26 | /Table/27 | system | replication_critical_localities | {1,2,3}
/Table/27 | /Table/28 | system | replication_stats | {1,2,3}
/Table/28 | /Table/29 | system | reports_meta | {1}
/Table/29 | /NamespaceTable/30 | | | {1,2,3}
/NamespaceTable/30 | /NamespaceTable/Max | system | namespace2 | {1,2,3}
/NamespaceTable/Max | /Table/32 | system | protected_ts_meta | {1,2,3}
/Table/32 | /Table/33 | system | protected_ts_records | {1,2,3}
/Table/33 | /Table/34 | system | role_options | {1,2,3}
/Table/34 | /Table/35 | system | statement_bundle_chunks | {1,2,3}
/Table/35 | /Table/36 | system | statement_diagnostics_requests | {1,2,3}
/Table/36 | /Table/37 | system | statement_diagnostics | {1}
/Table/37 | /Table/38 | system | scheduled_jobs | {1,2,3}
/Table/38 | /Table/39 | | | {1,2,3}
/Table/39 | /Max | system | sqlliveness | {1,2,3}
(35 rows)
Each entry in the replicas field represents a replica. Hope this gives you some insight into what ranges exist.
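On a single-node cluster each range can only place one replica, so the replica count in the admin console is effectively the range count; the 37 you see is roughly this table's worth of ranges, nearly all of them internal. You can check the arithmetic with a query like the following (a minimal sketch against the same internal table; replicas is an array, so array_length counts the entries per range):
> SELECT count(*) AS ranges, sum(array_length(replicas, 1)) AS total_replicas FROM crdb_internal.ranges_no_leases;
On a multi-node cluster the two numbers diverge, since each range is up-replicated to its zone's num_replicas.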

Related

NiFi CaptureChangeMySQL converts varchar columns to nulls

I have a problem with Apache NiFi 1.12.1. For some reason unknown to me, CaptureChangeMySQL returns many nulls. Basically, only columns that are int return correct values. I'm new to NiFi, so I might be missing something obvious in the configuration.
I have the following table:
create table inventory.abc
(
id int auto_increment
primary key,
first_name varchar(100) not null,
last_name varchar(100) not null,
age int not null
);
Processor config:
Bin logs settings:
mysql> show variables like '%bin%';
+--------------------------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------------------------+--------------------------------+
| bind_address | * |
| binlog_cache_size | 32768 |
| binlog_checksum | CRC32 |
| binlog_direct_non_transactional_updates | OFF |
| binlog_error_action | ABORT_SERVER |
| binlog_format | ROW |
| binlog_group_commit_sync_delay | 0 |
| binlog_group_commit_sync_no_delay_count | 0 |
| binlog_gtid_simple_recovery | ON |
| binlog_max_flush_queue_time | 0 |
| binlog_order_commits | ON |
| binlog_row_image | FULL |
| binlog_rows_query_log_events | OFF |
| binlog_stmt_cache_size | 32768 |
| binlog_transaction_dependency_history_size | 25000 |
| binlog_transaction_dependency_tracking | COMMIT_ORDER |
| innodb_api_enable_binlog | OFF |
| innodb_locks_unsafe_for_binlog | OFF |
| log_bin | ON |
| log_bin_basename | /var/lib/mysql/mysql-bin |
| log_bin_index | /var/lib/mysql/mysql-bin.index |
| log_bin_trust_function_creators | OFF |
| log_bin_use_v1_row_events | OFF |
| log_statements_unsafe_for_binlog | ON |
| max_binlog_cache_size | 18446744073709547520 |
| max_binlog_size | 1073741824 |
| max_binlog_stmt_cache_size | 18446744073709547520 |
| sql_log_bin | ON |
| sync_binlog | 1 |
+--------------------------------------------+--------------------------------+
29 rows in set (0.00 sec)
And I get results like this:
Any idea why I get so many nulls in the output? I thought it might be related to the Distributed Map Cache Client, but since that option is not mandatory, I don't think that's the problem.

DAX Query with multiple filters in powerbi

I have two tables, 'locations' and 'markets', with a many-to-many relationship between them on the column 'market_id'. A report-level filter has been applied on the column 'entity' from the 'locations' table. Now I need to distinctly count the 'location_id' from the 'markets' table where 'active = TRUE'. How can I write a DAX query such that the distinct count of location_id changes dynamically with the selection made in the report-level filter?
Below is an example of the tables:
locations:
| location_id | market_id | entity | active |
|-------------|-----------|--------|--------|
| 1 | 10 | nyc | true |
| 2 | 20 | alaska | true |
| 2 | 20 | alaska | true |
| 2 | 30 | miami | false |
| 3 | 40 | dallas | true |
markets:
| location_id | market_id | active |
|-------------|-----------|--------|
| 2 | 20 | true |
| 2 | 20 | true |
| 5 | 20 | true |
| 6 | 20 | false |
I'm fairly new to Power BI. Any help will be appreciated.
Here you go:
DistinctLocations = CALCULATE(DISTINCTCOUNT(markets[location_id]), markets[active] = TRUE())
The Boolean filter argument only replaces any existing filter on markets[active]; the report-level filter on locations[entity] still reaches markets through the relationship (provided its cross-filter direction lets locations filter markets), so the distinct count updates with the selection.

Mariadb 2 explains plan : with Using join buffer and without

I run the same query in two environments with a huge performance difference: 0.015 sec vs 25 sec.
Explain plan:
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | company1_ | const | PRIMARY | PRIMARY | 152 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | user2_ | ref | PRIMARY | PRIMARY | 152 | const | 1032 | 100.00 | Using where |
| 1 | SIMPLE | vacationpr5_ | eq_ref | PRIMARY | PRIMARY | 304 | user2_.ID_COMPANY_VACATION_PROFILE,.user2_.ID_VACATION_PROFILE | 1 | 100.00 | Using index |
| 1 | SIMPLE | vacationac0_ | ref | PRIMARY,I_VACATION_ACCUMULATION_EA | PRIMARY | 304 | const,.user2_.ID_USER | 4 | 100.00 | Using where |
| 1 | SIMPLE | vacationty3_ | eq_ref | PRIMARY | PRIMARY | 304 | const,.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | vacationst6_ | eq_ref | PRIMARY | PRIMARY | 608 | user2_.ID_COMPANY_VACATION_PROFILE,.user2_.ID_VACATION_PROFILE,const,.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | translatio9_ | eq_ref | PRIMARY | PRIMARY | 919 | vacationty3_.ID_COMPANY_TRANSLATION,.vacationty3_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio10_ | eq_ref | PRIMARY, | PRIMARY | 951 | vacationty3_.ID_COMPANY_TRANSLATION,.vacationty3_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
| 1 | SIMPLE | listvalue4_ | ALL | NULL | NULL | NULL | NULL | 5284 | 100.00 | Using where |
| 1 | SIMPLE | translatio7_ | eq_ref | PRIMARY | PRIMARY | 919 | listvalue4_.ID_COMPANY_TRANSLATION,.listvalue4_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio8_ | eq_ref | PRIMARY | PRIMARY | 951 | listvalue4_.ID_COMPANY_TRANSLATION,.listvalue4_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+------+----------+---------------------------------+
Second explain plan:
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
| 1 | SIMPLE | company1_ | const | PRIMARY | PRIMARY | 152 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | user2_ | ref | PRIMARY, | PRIMARY | 152 | const | 1050 | 100.00 | Using where |
| 1 | SIMPLE | vacationpr5_ | eq_ref | PRIMARY | PRIMARY | 304 | validation2.user2_.ID_COMPANY_VACATION_PROFILE,validation2.user2_.ID_VACATION_PROFILE | 1 | 100.00 | Using index |
| 1 | SIMPLE | vacationac0_ | ref | PRIMARY,I_VACATION_ACCUMULATION_EA | PRIMARY | 304 | const,validation2.user2_.ID_USER | 5 | 100.00 | Using where |
| 1 | SIMPLE | vacationty3_ | eq_ref | PRIMARY | PRIMARY | 304 | const,validation2.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | vacationst6_ | eq_ref | PRIMARY | PRIMARY | 608 | validation2.user2_.ID_COMPANY_VACATION_PROFILE,validation2.user2_.ID_VACATION_PROFILE,const,validation2.vacationac0_.ID_VACATION_TYPE | 1 | 100.00 | Using where |
| 1 | SIMPLE | translatio9_ | eq_ref | PRIMARY | PRIMARY | 919 | validation2.vacationty3_.ID_COMPANY_TRANSLATION,validation2.vacationty3_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio10_ | eq_ref | PRIMARY, | PRIMARY | 951 | validation2.vacationty3_.ID_COMPANY_TRANSLATION,validation2.vacationty3_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
| 1 | SIMPLE | listvalue4_ | ALL | NULL | NULL | NULL | NULL | 5282 | 100.00 | Using where; Using join buffer (flat, BNL join) |
| 1 | SIMPLE | translatio7_ | eq_ref | PRIMARY | PRIMARY | 919 | validation2.listvalue4_.ID_COMPANY_TRANSLATION,validation2.listvalue4_.ID_TRANSLATION | 1 | 100.00 | Using index |
| 1 | SIMPLE | descriptio8_ | eq_ref | PRIMARY, | PRIMARY | 951 | validation2.listvalue4_.ID_COMPANY_TRANSLATION,validation2.listvalue4_.ID_TRANSLATION,const | 1 | 100.00 | Using where |
+------+-------------+---------------+--------+------------------------------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------------------+------+----------+-------------------------------------------------+
How can I force the first environment to use the join buffer (flat, BNL join)? The first environment is the production one and has more memory and CPU.
In first environment :
join_buffer_size............ 16777216
join_buffer_space_limit..... 2097152
In second environment :
join_buffer_size............ 262144
join_buffer_space_limit..... 2097152
Is there any link/ratio between join_buffer_size and join_buffer_space_limit?
We configured join_buffer_size to 16 MB because of a MySQLTuner hint.
I set join_buffer_space_limit to 128 MB and it resolved the performance issue.
So MySQLTuner doesn't give a hint for this configuration key.
SET GLOBAL join_buffer_space_limit = 1024 * 1024 * 128;
It took some time (about an hour) for performance to improve.
https://mariadb.com/kb/en/library/multi-range-read-optimization/
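A plausible explanation (my reading of the MariaDB docs linked above, not something verified beyond these two environments): join_buffer_size sets the size of a single join buffer, while join_buffer_space_limit caps the combined size of all join buffers one query may allocate; under the default 2 MB space limit, a 16 MB join_buffer_size can never actually be used for block nested-loop joins. A quick way to compare both settings on each environment:
SHOW GLOBAL VARIABLES LIKE 'join_buffer%';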

Indexes hints in a Subquery

I have a SQL statement that has performance issues.
Adding the following index and a SQL hint to use the index improves performance tenfold, but I do not understand why.
BUS_ID is part of the primary key (T1.REF is the other part of the key) and clustered index on the T1 table.
The T1 table has about 100,000 rows. BUS_ID has only 6 distinct values. Similarly, the T1.STATUS column can only have a limited number of
values, and the majority of these (99%) will be the same value.
If I run the query without the hint (/*+ INDEX ( T1 T1_IDX1) NO_UNNEST */) it takes 5 seconds; with the hint it takes 0.5 seconds.
I don't understand how the index helps the subquery, as T1.STATUS isn't used in any of the WHERE or JOIN clauses in the subquery.
What am I missing?
SELECT
/*+ NO_UNNEST */
t1.bus_id,
t1.ref,
t2.cust,
t3.cust_name,
t2.po_number,
t1.status_old,
t1.status,
t1.an_status
FROM t1
LEFT JOIN t2
ON t1.bus_id = t2.bus_id
AND t1.ref = t2.ref
JOIN t3
ON t3.cust = t2.cust
AND t3.bus_id = t2.bus_id
WHERE (
status IN ('A', 'B', 'C') AND status_old IN ('X', 'Y'))
AND EXISTS
( SELECT /*+ INDEX ( T1 T1_IDX1) NO_UNNEST */
*
FROM t1
WHERE ( EXISTS ( SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq IN ( '0', '2' )
AND t1.bus_id = t6.bus_id)
OR (EXISTS
(SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq = '1'
AND (an_status = 'Y'
OR
an_status = 'X')
AND t1.bus_id = t6.bus_id))
AND t2.ref = t1.ref))
AND USER IN ('FRED')
AND ( t2.status != '45'
AND t2.status != '20')
AND NOT EXISTS ( SELECT
/*+ NO_UNNEST */
*
FROM t4
WHERE EXISTS
(
SELECT
/*+ NO_UNNEST */
*
FROM t5
WHERE pd IN ( '1',
'0' )
AND appl = 'RYP'
AND appl_id IN ( 'RL100')
AND t4.id = t5.id)
AND t2.ref = p.ref
AND t2.bus_id = p.bus_id);
Edited to include Explain Plan and index.
Without Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 20 | 1 | 211 | 15534188 | 19 |
view | | 20 | 1 | 211 | 15534188 | 19 |
count | | | | | | |
view | | 20 | 1 | 198 | 15534188 | 19 |
sort | ORDER BY | 20 | 1 | 114 | 15534188 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 2 | 1 | 15 | 7433 | 2 |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
With Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 21 | 1 | 211 | 15549142 | 19 |
view | | 21 | 1 | 211 | 15549142 | 19 |
count | | | | | | |
view | | 21 | 1 | 198 | 15549142 | 19 |
sort | ORDER BY | 21 | 1 | 114 | 15549142 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 3 | 1 | 15 | 22387 | 2 |
index-t1_idx#1 | FULL SCAN | 2 |97k| | 14643 | |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
Table Index
CREATE INDEX T1_IDX#1 ON T1 (BUS_ID, STATUS)
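For anyone reproducing this, plans like the ones above can be generated with Oracle's EXPLAIN PLAN and DBMS_XPLAN (a generic sketch, not the poster's exact session; the index name is taken from the CREATE INDEX statement above):
EXPLAIN PLAN FOR
SELECT /*+ INDEX(t1 T1_IDX#1) NO_UNNEST */ * FROM t1 WHERE bus_id = :b;
-- then display the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);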

Slow aggregation on big neo4j graph

Configuration:
Windows 8.1
neo4j-enterprise-2.2.0-M03
cache type: hpc
8Gb RAM
6Gb for JVM Heap (wrapper.java.initmemory=6144 wrapper.java.maxmemory=6144)
5Gb out of 6Gb of JVM Heap for mapped memory (dbms.pagecache.memory=5G)
Model:
The model represents how users navigate through the website.
27 522 896 nodes (394Mb)
111 294 796 relationships (3609Mb)
33 906 363 properties (1326Mb)
293 (:Page) nodes
27522603 (:PageView) nodes
0 (:User) nodes (not loaded yet)
each (:PageView) node connected with (:Page) node
each (:PageView) node connected with next (:PageView) node
each (:PageView) node connected with (:User) node (not yet)
Query
match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
195363 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID105, FRESHID110, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID105, FRESHID110 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID105, FRESHID110 | |
| Projection(1) | 776404 | 35941815 | 71883630 | FRESHID105, p | |
| Filter(0) | 776404 | 35941815 | 35941815 | p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
| Expand(All)(0) | 776404 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Filter(1) | 384001 | 13345621 | 13345621 | | hasLabel(anon[67]:PageView) |
| Expand(All)(1) | 384001 | 13345621 | 19478500 | | ()-[:Next]->() |
| Filter(2) | 189923 | 6132879 | 6132879 | | hasLabel(anon[46]:PageView) |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
Total database accesses: 202202762
Query without unnecessary labels
match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
Profile info:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
166751 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID82, FRESHID87, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID82, FRESHID87 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID82, FRESHID87 | |
| Projection(1) | 776388 | 35941815 | 71883630 | FRESHID82, p | |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
Total database accesses: 146782447
Message.log
Question
How can I make this query run much faster? (more RAM, refactoring the query, a distributed cache, another language/shell/method, ...)
UPD:
Profile info for last query in answer
neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx" | 5172680 |
| "DocumentPreview.aspx" | 3846455 |
| "Dashboard.aspx" | 3579022 |
| "ActualTasks.aspx" | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx" | 1373928 |
| "PaymentsList.aspx" | 1338063 |
| "Login.aspx" | 1285447 |
| "ReportingRequisites.aspx" | 884077 |
| "ContractorInfo.aspx" | 759665 |
+------------------------------------------------+
10 rows
151328 ms
Compiler CYPHER 2.2
Planner COST
Projection
|
+Top
|
+EagerAggregation
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Projection | 881 | 10 | 20 | count, p, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | count, p | { AUTOINT1}; count |
| EagerAggregation | 881 | 173 | 0 | count, p | p |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+
Total database accesses: 74898837
As I mentioned before in your other question, if you can write a Java-based server extension you can do it pretty easily.
// assumes a GraphDatabaseService graphDb and a String pageName, running inside a transaction
// labels and relationship types used below
Label Page = DynamicLabel.label("Page");
RelationshipType At = DynamicRelationshipType.withName("At");
RelationshipType Next = DynamicRelationshipType.withName("Next");
// initialize one counter per (:Page) node (there are only ~300 of them)
Map<Node, AtomicInteger> pageCounts = new HashMap<>(300);
try (ResourceIterator<Node> allPages = graphDb.findNodes(Page)) {
    while (allPages.hasNext()) pageCounts.put(allPages.next(), new AtomicInteger());
}
// find the start page
Node page = graphDb.findNode(Page, "Name", pageName);
// follow the incoming page-view relationships
for (Relationship at : page.getRelationships(At, Direction.INCOMING)) {
    // follow the page view's single outgoing next relationship
    Relationship next = at.getStartNode().getSingleRelationship(Next, Direction.OUTGOING);
    if (next == null) continue;
    // follow the next page view's single page-view relationship to the end page
    Node page2 = next.getEndNode().getSingleRelationship(At, Direction.OUTGOING).getEndNode();
    // increment that page's counter
    pageCounts.get(page2).incrementAndGet();
}
// sort pages by count, descending
List<Map.Entry<Node, AtomicInteger>> pages = new ArrayList<>(pageCounts.entrySet());
Collections.sort(pages, new Comparator<Map.Entry<Node, AtomicInteger>>() {
    public int compare(Map.Entry<Node, AtomicInteger> e1, Map.Entry<Node, AtomicInteger> e2) {
        return Integer.compare(e2.getValue().get(), e1.getValue().get());
    }
});
// return the top 10
return pages.subList(0, 10);
For Cypher I would try something like this (the WITH DISTINCT steps deduplicate the intermediate page-view rows before each expansion):
match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
WITH distinct pv
MATCH (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
Update
I wrote a test for it and ran it on my bigger Linux machine; the results there are much more sensible: 1.6 s in Java and at most 5 s in Cypher.
Here is the code and the results: https://gist.github.com/jexp/94f75ddb849f8c41c97c
In Cypher:
-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name | count |
+-------------------+
| "Page169" | 975 |
| "Page125" | 959 |
| "Page106" | 955 |
| "Page274" | 951 |
| "Page176" | 947 |
| "Page241" | 944 |
| "Page30" | 942 |
| "Page44" | 938 |
| "Page1" | 938 |
| "Page118" | 938 |
+-------------------+
10 rows
in 3212 ms
[Compiler CYPHER 2.2
Planner COST
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Top | 488 | 10 | 0 | FRESHID71, FRESHID76 | { AUTOINT1}; |
| EagerAggregation | 488 | 300 | 0 | FRESHID71, FRESHID76 | |
| Projection | 238460 | 264828 | 529656 | FRESHID71, p | |
| Filter | 238460 | 264828 | 0 | p | NOT(anon[29] == anon[51]) |
| Expand(All)(0) | 238460 | 264828 | 529656 | p | ()-[:At]->(p) |
| Expand(All)(1) | 238460 | 264828 | 778522 | | ()-[:Next]->() |
| Expand(All)(2) | 476922 | 513694 | 513695 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
Total database accesses: 2351530]
And in Java:
-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938
Something you can also do to speed up your Cypher query is to aggregate only on the nodes and fetch the p.Name property just for the final 10 rows, which is much faster.
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count