Cypher performance - dbHits vs execution time

After managing to get everything working with the spatial plugin, I'm now optimising my query to get the best out of it.
At the beginning the query took about 10 seconds.
After finding this question:
Improve Neo4j Cypher Performance On Lengthy Match
I got the query to run in 3 seconds, but it still had about 140k dbHits.
I tried to break the query into even more matches, and after that I got it down to 50k dbHits, but strangely the execution time actually increased.
Can someone explain whether there's a direct relation between dbHits and execution time?
Here is an example of my query (it's a short version; the real query is a lot bigger):
START user=node(1), friend=node:userPosition("withinDistance:[0.003000,0.003000,10.000000]")
WHERE friend.birthYear >= 1952 AND friend.birthYear <= 1977
WITH DISTINCT friend, user
MATCH (user)-[:SPEAKS]->(:Language)<-[:SPEAKS]-(friend)
WITH friend, user
MATCH (user)-[:EXPECTS]->(expectedObligatoryAnswers:Answer {isOptional: 0})<-[:ANSWERED]-(friend)
WITH friend, user, COUNT(DISTINCT expectedObligatoryAnswers.questionId) AS countExpectedObligatoryAnswers
WHERE (countExpectedObligatoryAnswers = 2)
RETURN friend
LIMIT 3;
In the bigger query this gave 3 seconds of execution time but 140k dbHits.
After breaking up the first two matches:
MATCH (user)-[:SPEAKS]->(lang:Language)
WITH user, friend, lang
MATCH (lang)<-[:SPEAKS]-(friend)
WITH friend, user
MATCH (user)-[:EXPECTS]->(expectedObligatoryAnswers:Answer {isOptional: 0})
WITH friend, user, expectedObligatoryAnswers
MATCH (expectedObligatoryAnswers)<-[:ANSWERED]-(friend)
This takes about a third of the dbHits (50k), but the execution time increases by about 2 seconds.
Is there any in-progress documentation on profiling Cypher?
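For reference, the dbHits figures above come from the profiler: prefixing a query with PROFILE makes Cypher return the execution plan with a dbHits count per operator. A minimal sketch against a fragment of the query above:
// Profile a fragment to see dbHits per operator
PROFILE
START user=node(1)
MATCH (user)-[:SPEAKS]->(lang:Language)
RETURN count(lang);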
Thanks

Related

Improving query execution time

I am working with Spring Data Mongo. I have around 2,000 documents stored (this will probably reach 10,000 in the upcoming 2-3 months) and I would like to extract them all. However, the query takes around 2.5 seconds, which is pretty bad in my opinion. I am using the MongoRepository default findAll().
I tried increasing the cursor batch size to 500, 1000 and 2000 without much improvement (the best result was 2.13 seconds).
Currently I'm using a workaround: I store the documents in a different collection used as a cache, and extracting that data takes around 0.25 seconds. But I would like to figure out how to fix the original query's execution time.
I would like the answer to return in less than 1 second; less is even better.
Without knowing the exact details I cannot recommend a specific method, but for data-selection queries, indexing will help you.
Please try indexing the DB:
https://docs.mongodb.com/manual/indexes/
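A minimal sketch in the mongo shell (the collection name documents and the field createdAt are placeholders; index the fields your queries actually filter or sort on):
// Build an ascending index on a field used for filtering/sorting
db.documents.createIndex({ createdAt: 1 })
// Confirm the index is picked up by the query planner
db.documents.find().sort({ createdAt: 1 }).explain("executionStats")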

elapsed_time_delta in DBA_HIST_SQLSTAT is not the actual query run time?

I've been trying to explore the DBA_HIST_SQLSTAT table and ran into ambiguity with one column (a lot more, in fact): ELAPSED_TIME_DELTA. I ran a simple DELETE query and noted the time it took, but when I query DBA_HIST_SQLSTAT, the ELAPSED_TIME_DELTA column (I know the units are ms) shows a different time than what I captured manually. What all goes into ELAPSED_TIME_DELTA in the DBA_HIST_SQLSTAT table? Any explanation with an example is much appreciated.
(Assuming you mean ELAPSED_TIME columns. There is no EXECUTION_TIME column in DBA_HIST_SQLSTAT).
The elapsed_time_delta is the difference between the elapsed_time_total of the current snap and that of the prior snap.
The elapsed_time_total is the total time spent executing that query since it was brought into the library cache. That will not necessarily equal the "wall-clock" time of any single execution of that query, except possibly for the very 1st execution of the query by the 1st user -- and that only if you grabbed the snap_id after that 1st execution and before any subsequent executions.
That's hard to do and not always possible. Generally speaking, you cannot use DBA_HIST_SQLSTAT to tell how long Oracle spent running a particular execution of a particular query.
What you can tell is how long Oracle spent running that query on average -- by finding the latest snap_id of interest and dividing elapsed_time_total by nullif(executions_total,0).
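A minimal sketch of that calculation (the sql_id value is a placeholder; note that the elapsed_time columns are recorded in microseconds, so the result is converted to seconds here):
-- Average elapsed time per execution, snapshot by snapshot;
-- the latest snap_id (top row) gives the overall average so far
SELECT snap_id,
       elapsed_time_total / NULLIF(executions_total, 0) / 1e6 AS avg_elapsed_sec
  FROM dba_hist_sqlstat
 WHERE sql_id = '0abc123def456'  -- placeholder sql_id
 ORDER BY snap_id DESC;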

How to get the network cost in one MariaDB query using JDBC?

When I query using HeidiSQL, the console gives info like this:
/* Affected rows: 0 Found rows: 2,632,206 Warnings: 0 Duration for 1 query: 0.008 sec. (+ 389.069 sec. network) */
I want to use JDBC to do performance testing on our database, so distinguishing the network cost from the actual query cost is important in my case.
How can I get the network cost of one MariaDB query using JDBC? Is it possible?
In HeidiSQL, I define the query duration as the time mysql_real_query takes to execute.
The "network" duration is the time mysql_store_result takes afterwards.
See also:
https://mariadb.com/kb/en/mariadb/mysql_real_query/
https://mariadb.com/kb/en/mariadb/mysql_store_result/
I guess JDBC has methods similar to the C API, so the above logic from HeidiSQL should be easy to adapt.
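A minimal sketch of that adaptation in JDBC (the URL, credentials and query are placeholders; note that some drivers buffer the whole result set inside executeQuery() unless streaming is enabled, which shifts where the "network" time shows up):
import java.sql.*;

public class QueryVsNetworkTiming {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mariadb://localhost:3306/test"; // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement st = conn.createStatement()) {
            long t0 = System.nanoTime();
            try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) { // placeholder query
                long t1 = System.nanoTime(); // roughly the mysql_real_query duration
                int rows = 0;
                while (rs.next()) rows++;    // fetching rows: the mysql_store_result/network part
                long t2 = System.nanoTime();
                System.out.printf("query: %.3f s, network/fetch: %.3f s, rows: %d%n",
                        (t1 - t0) / 1e9, (t2 - t1) / 1e9, rows);
            }
        }
    }
}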

A map that sometimes run fast, and sometimes slow

I built a map with this logic:
SOURCES -> SORTER -> AGG(FIRST BY GROUP) -> 2 LOOKUPS -> FILTER -> TARGET
Now, when I manually run the query generated by the sources, adding the 2 lookups with a LEFT JOIN and sorting, the query takes about 30 seconds.
I ran the same map in my DEV environment to try to debug it, but suddenly it ran in 2 minutes (connected to the same connection as in PRODUCTION, and the map is truncate/insert).
I looked up the history of this session, and its running time varies between 6 minutes and over an hour, with the same amount of data every day!
I've tried adding statistics and increasing the commit interval, but nothing seems to help.
Any suggestions?
Thanks in advance.
First thing: the query from the source (with lookups) returning data within 30 seconds doesn't mean you get all the data within 30 seconds. The SQL client tool shows only the first 50 to 500 records; extracting the complete data set may need more time.
Now, I don't see many reasons for slowness. Here are my thoughts:
Did you find any pattern to the slowness, like during month end or month start? Mainly the source and lookup (if it's a table) data may be the reason for the slowness. When a table's size varies rapidly, or the table is not analyzed, or it undergoes a lot of delete/load operations, its cost varies and the SQL becomes slower. Make sure stats are gathered periodically for the lookup and source tables (see the sketch after this list).
Maybe some other operation running in parallel to your map is eating up all your resources, so it's taking 1 hour+ to complete the map.
How much data does it process: thousands, millions or billions of rows? Depending on that, you can re-arrange the map like this to improve performance: source > source qualifier > lookup > filter > sorter + aggregator > target.
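A minimal sketch of the stats-gathering suggestion, assuming the source and lookup tables live in Oracle (the schema and table names are placeholders):
-- Gather fresh optimizer statistics on a source/lookup table
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'ETL_SCHEMA',    -- placeholder schema
    tabname => 'LOOKUP_TABLE'   -- placeholder table
  );
END;
/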

neo4j count all data of one node very slowly

I have a node label "Event" under which I put all events performed by my web portal.
Now I have 1,500,000 Events!
So when I count the number of Events, I run this query:
MATCH (e:Event) RETURN count(e) AS numberOfEvent
But it's extremely slow: 25,000 ms!
The same query in a classical RDBMS like Postgres executes in 200 ms!
Is this normal, or is my query written incorrectly?
Regards
Olivier
Is that the first time you run the query? Can you run it again?
I think Postgres stores the total count, while Neo4j will actually load the data.
So you are measuring your disk speed for loading the data.
We will work on improving that by using our database statistics internally for this kind of query.
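A minimal sketch of that check (run the same count twice; if the second run is much faster, the bottleneck is disk I/O rather than the query itself):
// Cold run: includes loading the 1,500,000 nodes from disk
MATCH (e:Event) RETURN count(e) AS numberOfEvent;
// Warm run: the data is now cached, so this measures the query cost alone
MATCH (e:Event) RETURN count(e) AS numberOfEvent;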
