Neo4j cypher query is really slow - performance

I have the following graph structure
4m nodes
23m properties
13m relationships
Java version
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Neo4j version
neo4j-community-2.0.0-M03
Machine
Mac OS X 10.8.4
2.5 GHz Intel Core i5
8 GB 1600 MHz DDR3
Problem
I am doing some experiments with three queries. #1 takes 16 seconds, #2 takes 8 minutes and #3 is "crashing". Both #2 and #3 push all the available CPU cores to ~90% usage. I am using the web interface to evaluate those queries (and I will be using the REST API to integrate the app with Neo4j).
I would like to know what is wrong with those queries and how I could optimise them. I am currently using the default settings.
Cypher Queries
Query #1 (Currently taking 16 seconds (after warm-up))
START root=node:source(id="2")
MATCH root-[]->movies<-[]-others
WITH COUNT(movies) as movie_count, others as others
RETURN others.id, movie_count
ORDER BY movie_count DESC
LIMIT 10
Query #2 (8 minutes)
START root=node:source(id="2")
MATCH
root-[]->stuff<-[]-others
WITH DISTINCT(others) as dothers
MATCH dothers-[]->different
RETURN different.id, COUNT(different) as count
ORDER BY count DESC
LIMIT 10
Query #3 (OutOfMemoryError - GC overhead limit exceeded)
START root=node:source(id="2")
MATCH root-[*1..1]->stuff<-[*1..1]-other-[*1..1]->different
WHERE stuff.id <> different.id
WITH COUNT(different) as different_count, different as different
RETURN different.id, different_count
ORDER BY different_count DESC
LIMIT 10

Disclaimer: This advice is for 1.8 and 1.9. If you're using 2.0 or 2.1, these comments may no longer be valid.
Query 1: Make your WITH your RETURN, and skip that extra step.
Query 2: Don't do DISTINCT in the WITH as you are now. Go as far as you can without doing DISTINCT. This looks like a premature optimization in the query that prevents it from being lazy and forces it to store many more intermediate results to calculate the WITH results.
Query 3: Don't do -[*1..1]->; that's the same as -[]-> or -->, but it uses a slower matcher for variable-length paths when it really just needs adjacent nodes and can use a fast matcher. Make the WITH your RETURN and take out that extra pipe it needs to go through so it can be lazier (although the ORDER BY kind of makes it hard to be lazy). See if you can get it to complete without the ORDER BY. Sketches of the rewritten Query 1 and Query 3 are below.
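As sketches only (assuming the same source index, and untested against your data), Query 1 with the WITH folded into the RETURN:

START root=node:source(id="2")
MATCH root-[]->movies<-[]-others
RETURN others.id, COUNT(movies) as movie_count
ORDER BY movie_count DESC
LIMIT 10

and Query 3 with plain adjacency instead of -[*1..1]->:

START root=node:source(id="2")
MATCH root-->stuff<--other-->different
WHERE stuff.id <> different.id
RETURN different.id, COUNT(different) as different_count
ORDER BY different_count DESC
LIMIT 10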
If you need faster responses and can't squeeze it out of your queries with my recommendations, you may need to turn to the Java API until the Cypher performance improvements arrive in 2.x. The unmanaged extension method makes these easy to call from the REST interface.

When looking for performance, please go with the latest stable version of Neo4j (1.9.x at the time of writing this answer).
2.0.0-M03 is a milestone build and not yet optimized. So far the focus has been on feature completeness with regard to the new concept of labels and label-based indexing.

Related

CouchDB Erlang Replication Filter - Slower than Javascript

We have Couchbase Lite replicating from CouchDB 1.6.1 using replication filters. The problem is that as the number of documents increases (we have at least 100k documents in a single database), replication becomes slower and slower, because the filter has to go through each of the documents (and revisions) to check whether it matches the replication filter.
We initially had our filter written in JavaScript, and it was suggested in many places to replace that with Erlang, because we can skip the JSON serialization/de-serialization that has to happen in the case of JavaScript, and also the sandboxing. Writing filters in Erlang bypasses all this because it runs in the same VM.
With all this and after rewriting our filter in Erlang, we find that the Erlang replication filters are slower than the JavaScript ones by a factor of 2, which is very strange. I've been reading up on Erlang and trying things out based on what I've read, so I'm not very experienced by any means. Our filters are relatively straightforward, so I'm struggling to understand why it's so slow.
The filters take a query string that contains a list of IDs. The filter then goes through the documents, matches that ID against a few different fields, and returns the matching documents. For example, a Customer may have contacts, jobs, estimates, etc.; when the query contains the IDs of customers, it brings back all their jobs, estimates, contacts, and so on.
I'm attaching the JavaScript and Erlang versions of our filter:
Javascript version:http://pastebin.com/c7AqstWy
Erlang version: http://pastebin.com/fta9JShM and http://pastebin.com/mseYiUaR
(The second link was another attempt to see if it would be faster because it doesn't have to go through the case statement at all.)
Both Erlang versions were 1.8 to 2 times slower than the JavaScript one.
Can someone please take a look and highlight why the Erlang filters could be slower?

Performance decays exponentially when inserting bulk data into grails app

We need to seed an application with 3 million entities before running performance tests.
The 3 million entities should be loaded through the application to simulate 3 years of real data.
We are inserting 1-5000 entities at a time. In the beginning response times are very good. But after a while they decay exponentially.
We use a Groovy script to hit a URL to start each round of insertions.
Restarting the application resets the response time, i.e. it fixes the problem temporarily.
Reruns of the script, without restarting the app, have no effect.
We use the following to enhance performance:
1) Clean up GORM after every 100 insertions:
def session = sessionFactory.currentSession
session.flush()
session.clear()
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
(old Ted Naleid trick: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql)
2) We use GPars for parallel insertions:
GParsPool.withPool {
    (0..<1000).eachParallel {
        def entity = new Entity(...)
        insertionService.insert(entity)
    }
}
Notes
When looking at the log output, I've noticed that the processing time for each entity is the same, but the system seems to pause longer and longer between iterations.
The exact number of entities inserted is not important, just around 3 million, so if some fail we can ignore them.
Tuning the number of entities inserted at a time has little or no effect.
Help
I'm really hoping somebody has a good idea on how to fix the problem.
Environment
Grails: 2.4.2 (GRAILS_OPTS=-Xmx2G -Xms512m -XX:MaxPermSize=512m)
Java: 1.7.0_55
MBP: OS X 10.9.5 (2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3)
The pausing would make me think it's the JVM doing garbage collection. Have you used a profiler such as VisualVM to see what time is being spent doing garbage collection? Typically this will be the best approach to understanding what is happening with your application within the JVM.
Also, it's far better to load the data directly into the database rather than going through your application if you are trying to "seed" the application. Performance-wise, of course.
(Added as answer per comment)

Neo4j 2.0.1 enterprise edition: Performance issue

I was happily using neo4j 1.8.1 community edition for a while on my system with the following configuration.
System Specs:
OS: 32-bit Ubuntu 12.04.3 LTS. Kernel version 3.2.0-52-generic-pae #78-Ubuntu
Memory: 4GB
Swap: 8GB (swapfile - not a partition)
Processor: Intel® Core™ i5-2430M CPU @ 2.40GHz - Quad Core
Harddisk: 500GB Seagate ATA ST9500420AS. Dual boot - Ubuntu uses 100GB and the rest by the almighty Windows 7.
When I switched to neo4j 2.0.1 enterprise edition, my application's response time became 4x slower. So, as advised in http://docs.neo4j.org/chunked/stable/embedded-configuration.html, I started tuning my filesystem, virtual memory, I/O scheduler and JVM configurations.
Performance Tuning
Started Neo4j as a server with highest scheduling priority (nice value = -20)
Set vm.dirty_background_ratio=50 and vm.dirty_ratio=80 in /etc/sysctl.conf to reduce frequent flushing of dirty memory pages to disk.
Increased maximum number of open files from 1024 to 40,000 as suggested in Neo4j startup.
Set noatime,nodiratime for the neo4j ext4 partition in /etc/fstab so that inodes don't get updated every time there is a file/directory access.
Changed the I/O scheduler from "cfq" to "noop" as mentioned in
http://www.cyberciti.biz/faq/linux-change-io-scheduler-for-harddisk/
JVM parameters: In short, max heap size is 1GB and neostore memory mapped files size is 425 MB.
Xms and Xmx to 1GB.
GC to Concurrent-Mark-Sweep.
neostore.nodestore.db.mapped_memory=25M,
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M
Sadly, this didn't make any difference. I wrote a simple script which creates N nodes and M random relationships among these nodes to get a better picture.
Neo4j 1.8.1 community edition with oracle java version "1.6.0_45":
new-sys-admin@ThinkPad:~/temp$ php perftest.php
Creating 1000 Nodes with index
Time taken : 67.02s
Creating 4000 relationships
Time taken : 201.27s
Neo4j 2.0.1 enterprise edition with oracle java version "1.7.0_51":
new-sys-admin@ThinkPad:~/temp$ php perftest.php
Creating 1000 Nodes with index
Time taken : 75.14s
Creating 4000 relationships
Time taken : 206.52s
The above results are after 2 warm-up runs. 2.0.1 results seem slower than 1.8.1. Any suggestions on adjusting the relevant configurations to boost up neo4j 2.0.1 performance would be highly appreciated.
EDIT 1
All queries are issued using Gremlin via Everyman Neo4j wrapper.
http://grokbase.com/p/gg/neo4j/143w1fen8c/gremlin-plugin-extremely-slow-on-neo4j-2-0-1
In the meantime, I moved to neo4j-enterprise-edition-1.9.6 (the most recent stable release before 2.0.1) and things were back to normal.
From the fact that you're using PHP, and seeing that creating just 1000 nodes takes 67 seconds, I assume you're using the regular REST API (e.g. POST /db/data/node). If this is correct, you may be right that 2.0.1 is some percentage slower than 1.8 for these CRUD operations. In 2.0 we focused on optimizing Cypher and the new transactional endpoint.
As such, for best performance, I'd suggest these things:
Use the new transactional endpoint, /db/data/transaction
Use cypher, and use it to send as much work as possible in "one go" over to the server
When possible, send multiple Cypher queries in the same HTTP request; you can do this as well through the transactional endpoint (see the sketch below).
Make sure you re-use TCP connections if you can. I'm not sure exactly how this works in PHP, but sending a "Connection: Keep-Alive" header and ensuring you re-use the same TCP connection saves significant overhead, since you don't have to re-establish TCP connections over and over.
Creating a thousand nodes in one Cypher query shouldn't take more than a few milliseconds. In terms of how many Cypher statements you can send per second: on my laptop, from Python (using https://github.com/jakewins/neo4jdb-python), I get about 10,000 Cypher statements per second in a concurrent setup (10 clients).
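For example, a single request to the transactional commit endpoint can carry several parameterized statements at once (a rough sketch only; the property values here are made up for illustration):

POST /db/data/transaction/commit
{
  "statements": [
    { "statement": "CREATE (n {props})", "parameters": { "props": { "name": "node-1" } } },
    { "statement": "CREATE (n {props})", "parameters": { "props": { "name": "node-2" } } }
  ]
}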

Neo4j 1.9.4 (REST Server,CYPHER) performance issue

I have Neo4j 1.9.4 installed on a 24-core, 24 GB RAM (CentOS) machine, and for most queries CPU usage spikes to 200% with only a few concurrent requests.
Domain:
Some sort of social application with a few types of nodes (profiles) with 3-30 text/array properties, and 36 relationship types with at least 3 properties each. Most nodes currently have ~300-500 relationships.
Current data set footprint(from console):
LogicalLogSize=4294907 (32MB)
ArrayStoreSize=1675520 (12MB)
NodeStoreSize=1342170 (10MB)
PropertyStoreSize=1739548 (13MB)
RelationshipStoreSize=6395202 (48MB)
StringStoreSize=1478400 (11MB)
which is IMHO really small.
Most queries look like this one (with more or fewer WITH .. MATCH .. statements; a few queries use variable-length relations, but those are often fast):
START
targetUser=node({id}),
currentUser=node({current})
MATCH
targetUser-[contact:InContactsRelation]->n,
n-[:InLocationRelation]->l,
n-[:InCategoryRelation]->c
WITH
currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites
MATCH
n<-[followers?:InContactsRelation]-()
WITH
currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers
RETURN
id(n) as id,
n.name? as name,
n.title? as title,
n._class as _class,
n.avatar? as avatar,
n.avatar_type? as avatar_type,
l.name as location__name,
c.name as category__name,
true as isInContacts,
inFavorites as isInFavorites,
numFollowers
It runs in ~1s-3s for the first run and ~70ms-1s for consecutive runs (it depends on the query), and about 5-10 queries run for each impression. Another interesting behavior: when I run a query from the Neo4j console on my local machine many consecutive times (just pressing Ctrl+Enter for a few seconds), it has an almost constant execution time, but when I do the same on the server it gets exponentially slower, and I guess this is somehow related to my problem.
Problem:
So my problem is that Neo4j is very CPU greedy (for a 24-core machine this may not be an issue, but it's obviously overkill for a small project). At first I used an AWS EC2 m1.large instance, but overall performance was bad; during testing, CPU was always over 100%.
Some relevant parts of configuration:
neostore.nodestore.db.mapped_memory=1280M
wrapper.java.maxmemory=8192
Note: I already tried a configuration where all memory-related parameters were set HIGH and it didn't work (no change at all).
Question:
Where should I dig? Configuration? Schema? Queries? What am I doing wrong?
If you need more info (logs, configs), just ask ;)
The fact that subsequent invocations of the same query are much faster is easily explained by the use of caches. A common strategy is to run a cache warm-up query upon startup, e.g.
start n=node(*) match n--m return count(n)
200% CPU usage on a 24-core machine means the machine is pretty lazy, as only 2 cores are busy. While a query is in progress it's normal for CPU to go to 100%.
The Cypher statement above uses an optional match (in the 2nd MATCH clause). These optional matches are known to be potentially slow. Check whether the runtime changes if you make this a non-optional match; a sketch follows below.
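For comparison only (a sketch; note that making the match non-optional drops nodes that have no followers, so the results will differ), the second MATCH would become:

MATCH
n<-[followers:InContactsRelation]-()
WITH
currentUser, targetUser, n, l, c, inFavorites, COUNT(followers) as numFollowers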
When returning a larger result set, consider that transferring the response is driven by network speed. Consider using streaming in that case; see http://docs.neo4j.org/chunked/milestone/rest-api-streaming.html.
You also should set wrapper.java.minmemory to the same value as wrapper.java.maxmemory.
Another approach for your rather small graph is to switch off MMIO caching and use cache_type=strong to keep the full dataset in the object cache. In this case you might need to increase wrapper.java.minmemory and wrapper.java.maxmemory.
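A sketch of the corresponding settings, assuming the standard conf/neo4j.properties and conf/neo4j-wrapper.conf layout of a 1.9.x server (the heap figures are illustrative; check the exact key names against your neo4j-wrapper.conf):

# conf/neo4j.properties
cache_type=strong

# conf/neo4j-wrapper.conf
wrapper.java.minmemory=8192
wrapper.java.maxmemory=8192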

Identify which processor (core) is used by a specific thread

I would like to know if it is possible to identify the physical processor (core) used by a thread with a specific thread ID.
For example, I have a multithreaded application that has two (2) threads (thread-id = 10 and thread-id = 20, for instance). I run the application on a system that has a dual-core processor (core 1 and core 2). So, how do I get the core number used by the thread with thread-id = 20?
P.S. Windows platforms.
Thank you,
Denis.
Unless you use thread affinity, threads are not assigned to specific cores. With every time slice, the thread can be executed on a different core. This means that even if there were a function to get the core of a thread, by the time you got the return value, there's a big chance that the thread would already be executing on another core.
If you are using thread-affinity, you could take a look at the Windows thread-affinity functions (http://msdn.microsoft.com/en-us/library/ms684847%28v=VS.85%29.aspx).
There are functions called GetCurrentProcessorNumber (available since Server 2003 and Vista) and GetCurrentProcessorNumberEx (available since Server 2008 R2 and Windows 7).
See also this question's answers for more related options and considerations (including Windows XP - primarily this answer describing the use of cpuid instruction).
Of course the core number can be changed at any time by the scheduler, so if you need to be sure, it may help to check the core number both before and after something you measure or execute for a short amount of time; if the core number is still the same, then you know on which core the intermediate code most likely executed.
