I have Neo4j 1.9.4 installed on a 24-core, 24 GB RAM (CentOS) machine, and for most queries CPU usage spikes to 200% with only a few concurrent requests.
Domain:
A social-type application with a few types of nodes (profiles) carrying 3-30 text/array properties, and 36 relationship types with at least 3 properties each. Most nodes currently have ~300-500 relationships.
Current data set footprint (from the console):
LogicalLogSize=4294907 (32MB)
ArrayStoreSize=1675520 (12MB)
NodeStoreSize=1342170 (10MB)
PropertyStoreSize=1739548 (13MB)
RelationshipStoreSize=6395202 (48MB)
StringStoreSize=1478400 (11MB)
which is IMHO really small.
Most queries look like this one (with more or fewer WITH .. MATCH .. clauses; a few queries use variable-length relationships, but those are usually fast):
START
targetUser=node({id}),
currentUser=node({current})
MATCH
targetUser-[contact:InContactsRelation]->n,
n-[:InLocationRelation]->l,
n-[:InCategoryRelation]->c
WITH
currentUser, targetUser, n, l, c, contact.fav is not null as inFavorites
MATCH
n<-[followers?:InContactsRelation]-()
WITH
currentUser, targetUser, n, l, c, inFavorites, COUNT(followers) as numFollowers
RETURN
id(n) as id,
n.name? as name,
n.title? as title,
n._class as _class,
n.avatar? as avatar,
n.avatar_type? as avatar_type,
l.name as location__name,
c.name as category__name,
true as isInContacts,
inFavorites as isInFavorites,
numFollowers
It runs in ~1-3 s on the first run and ~70 ms-1 s on consecutive runs (depending on the query), and about 5-10 queries run for each impression. Another interesting behavior: when I run a query from the Neo4j console on my local machine many consecutive times (just pressing Ctrl+Enter for a few seconds), the execution time stays almost constant, but when I do the same on the server it gets exponentially slower, and I guess this is somehow related to my problem.
Problem:
So my problem is that Neo4j is very CPU-greedy (for a 24-core machine that may not be an issue, but it's obviously overkill for a small project). At first I used an AWS EC2 m1.large instance, but overall performance was bad; during testing, CPU was always over 100%.
Some relevant parts of the configuration:
neostore.nodestore.db.mapped_memory=1280M
wrapper.java.maxmemory=8192
Note: I already tried a configuration where all memory-related parameters were set high, and it didn't work (no change at all).
Question:
Where should I dig? Configuration? Schema? Queries? What am I doing wrong?
If you need more info (logs, configs), just ask ;)
The reason subsequent invocations of the same query are much faster is easily explained by caching. A common strategy is to run a cache warm-up query upon startup, e.g.
start n=node(*) match n--m return count(n)
200% CPU usage on a 24-core machine means the machine is pretty idle, as only 2 cores are busy. While a query is in progress, it's normal for CPU to go to 100%.
The Cypher statement above uses an optional match (in the 2nd MATCH clause). Optional matches are known to be potentially slow. Check whether the runtime changes if you make this a non-optional match.
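For example, the second MATCH of the query above without the optional marker (a sketch; note that any n without incoming InContactsRelation relationships would then drop out of the result):
MATCH
n<-[followers:InContactsRelation]-()
WITH
currentUser, targetUser, n, l, c, inFavorites, COUNT(followers) as numFollowers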
When returning a larger result set, keep in mind that transferring the response is bound by network speed. Consider using streaming in that case; see http://docs.neo4j.org/chunked/milestone/rest-api-streaming.html.
You should also set wrapper.java.minmemory to the same value as wrapper.java.maxmemory.
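For instance, in conf/neo4j-wrapper.conf (a sketch using the property names from this answer and the 8192 MB value from the question's config):
wrapper.java.minmemory=8192
wrapper.java.maxmemory=8192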
Another approach for your rather small graph is to switch off MMIO caching and use cache_type=strong to keep the full dataset in the object cache. In this case you might need to increase wrapper.java.minmemory and wrapper.java.maxmemory.
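A sketch of what that could look like in conf/neo4j.properties, assuming the Neo4j 1.9 property names:
# keep the whole (small) dataset in the object cache
cache_type=strong
# and reduce the store-file memory mapping accordingly
neostore.nodestore.db.mapped_memory=0M
neostore.relationshipstore.db.mapped_memory=0M
neostore.propertystore.db.mapped_memory=0M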
Related
At startup our application checks for the presence of certain tables, sequences, and a few other things. We had programmed that straightforwardly like so:
...
MetaData meta = connection.getMetaData();
...
ResultSet tables = meta.getTables(...);
... <checking for the presence of specific tables>
ResultSet sequences = meta.getSequences(...);
... <checking for the presence of specific sequences>
etc.
...
While so far the initial connection.getMetaData() call had always had a sub-second duration, after moving to a bigger, more powerful, shared Oracle DB server this call now reproducibly takes more than 5 minutes(!). This time goes directly into the startup time of our application, which has more than quadrupled because of it, and that is of course a big no-go!
Any idea why this JDBC call takes so long on one system but not on another? And are there any options or settings that could speed this up? Both databases report as "Oracle Database 11g Release 11.2.0.4.0 - 64bit Production". Both servers are in our intranet, so network-wise they should be similarly reachable. The new one is CPU- and RAM-wise much more powerful and is configured in a fail-over setup (i.e. the connection URL contains two servers in case one is down or not reachable). The old one was a simple one-machine setup.
Anything else that could be relevant to this or explain why that call now takes that much longer?
Addendum:
We tried to debug into the method (but didn't get very far). The culprit seems to be DatabaseMetadata.initSequences(), i.e. it seems that the fetching of the sequences is the part that takes so long on this server, while it took split seconds on the other. Any wisdom on what could be causing this?
We found the culprit of our slow Metadata query!
The reason is that during initialization we set the comparison mode to LINGUISTIC, i.e. we execute:
alter session set nls_comp=LINGUISTIC;
With that setting active, the retrieval of sequences (as part of getMetaData()) takes ~5 minutes! If we leave it at the default (which is alter session set nls_comp=BINARY), then the fetching of the metadata takes only ~1 second!
Apparently that comparison mode leads to a full table scan, which causes this crazy query duration. However, we need this comparison mode, since otherwise many of our queries don't yield matches on values (in our case names, company names, addresses, etc.) that contain accented characters.
We "fixed" this issue by switching the comparison mode at a later point in the application after we have completed the startup checks that verify the presence of certain tables and sequences, etc.
If someone knows an approach for speeding up the metadata creation even in the presence of a non-default comparison mode (e.g. can one create a special index or similar), please let me know!
Obviously there must be some additional setting involved here, because, as mentioned, on our previous DB server the metadata fetch had taken only ~1 second even with that mode set to LINGUISTIC.
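A minimal sketch of that workaround, in the same style as the startup code above (the ordering is the point; the checks themselves are unchanged):
...
MetaData meta = connection.getMetaData();   // still runs with the default nls_comp=BINARY
...
ResultSet tables = meta.getTables(...);
... <checking for the presence of specific tables and sequences as before>
...
// only after the startup checks have completed do we switch the comparison mode
Statement stmt = connection.createStatement();
stmt.execute("alter session set nls_comp=LINGUISTIC");
stmt.close();
...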
We need to seed an application with 3 million entities before running performance tests.
The 3 million entities should be loaded through the application to simulate 3 years of real data.
We are inserting 1-5000 entities at a time. In the beginning, response times are very good, but after a while they degrade exponentially.
We use a Groovy script that hits a URL to start each round of insertions.
Restarting the application resets the response times, i.e. fixes the problem temporarily.
Reruns of the script, without restarting the app, have no effect.
We use the following to enhance performance:
1) Clean up GORM after every 100 insertions:
def session = sessionFactory.currentSession
session.flush()   // push pending inserts/updates to the database
session.clear()   // detach persisted instances from the Hibernate session
// clear Grails' per-thread domain instance map (the Ted Naleid trick linked below)
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
(old Ted Naleid trick: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql)
2) We use GPars for parallel insertions:
GParsPool.withPool {
    (0..<1000).eachParallel {
        def entity = new Entity(...)
        insertionService.insert(entity)
    }
}
Notes
When looking at the log output, I've noticed that the processing time for each entity is the same, but the system seems to pause longer and longer between each iteration.
The exact number of entities inserted is not important, just around 3 million, so if some fail we can ignore them.
Tuning the number of entities inserted at a time has little or no effect.
Help
I'm really hoping somebody has a good idea of how to fix this problem.
Environment
Grails: 2.4.2 (GRAILS_OPTS=-Xmx2G -Xms512m -XX:MaxPermSize=512m)
Java: 1.7.0_55
MBP: OS X 10.9.5 (2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3)
The pausing would make me think it's the JVM doing garbage collection. Have you used a profiler such as VisualVM to see how much time is being spent doing garbage collection? Typically this is the best approach to understanding what is happening with your application inside the JVM.
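If a full profiler isn't handy, a quick first check is to turn on GC logging (a sketch; these flags apply to the Java 7 HotSpot JVM and can be appended to the GRAILS_OPTS shown in the environment section) and watch whether the pauses line up with the slowdowns:
GRAILS_OPTS="-Xmx2G -Xms512m -XX:MaxPermSize=512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"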
Also, if you are trying to "seed" the application, it's far better (performance-wise, of course) to load the data directly into the database rather than going through your application.
(Added as answer per comment)
The short question is in the title: I work with the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
Long question, for those wanting to know the context:
I am working on a huge set of data like
{
    _id: ObjectId("azertyuiopqsdfghjkl"),
    stringdate: "2008-03-08 06:36:00"
}
plus some other fields. There are about 250M documents like that (the whole database with its indexes weighs 36 GB). I want to convert the date into a real ISODate field. I searched a bit for how to make an update query like
db.data.update({},{$set:{date:new Date("$stringdate")}},{multi:true})
but did not find a way to make this work, and resigned myself to writing a script that takes the documents one after the other and issues an update to set a new field whose value is new Date(stringdate). The query uses the _id, so the default index is used.
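For reference, a sketch of such a one-document-at-a-time script in the mongo shell (field names taken from the sample document above; the $exists filter is only there so the script can be re-run and skip already-converted documents):
db.data.find({date: {$exists: false}}, {stringdate: 1}).forEach(function(doc) {
    db.data.update({_id: doc._id}, {$set: {date: new Date(doc.stringdate)}});
});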
The problem is that it takes a very long time. I already figured out that if only I had inserted empty date objects when I created the database, I would now get better performance, since there is the problem of document relocation when a new field is added. I also set an index on a relevant field to process the database chunk by chunk. Finally, I ran several concurrent mongo clients on both the server and my workstation to make sure that the limiting factor is database lock availability and not any other factor like CPU or network cost.
I monitored the whole thing with mongotop, mongostat and the web monitoring interfaces, which confirmed that the write lock is held 70% of the time. I am a bit disappointed that MongoDB does not have finer granularity on its write lock; why not allow concurrent write operations on the same collection as long as there is no risk of interference? Now that I think about it, I should have sharded the collection across a dozen shards, even while staying on the same server, because then there would have been an individual lock on each shard.
But since I can't do anything right now about the current database structure, I looked into how to improve performance so that I at least spend 90% of my time writing to mongo (up from 70% currently). I figured out that since I run my script in the default mongo shell, every time I make an update a getLastError() is also called afterwards, and I don't want it, because there is a 99.99% chance of success, and even in case of failure I can still run an aggregation query after the big process finishes to retrieve the individual exceptions.
I don't think I would gain that much performance by deactivating the getLastError calls, but I think it is worth trying.
I took a look at the documentation and found confirmation of the default behavior, but not the procedure for changing it. Any suggestions?
I work with the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
You can use db.getLastError({w:0}) (http://docs.mongodb.org/manual/reference/method/db.getLastError/) to do what you want, but it won't help.
This is because, for one:
write a script that takes the documents one after the other and issues an update to set a new field whose value is new Date(stringdate)
When using the shell in a non-interactive mode, such as within a loop, it doesn't actually call getLastError(). As such, lowering your write concern to 0 will do nothing.
I already figured out that if only I had inserted empty date objects when I created the database, I would now get better performance, since there is the problem of document relocation when a new field is added.
I did tell people, when they asked about this stuff, to add those fields in case of movement, but instead they listened to the guy who said "leave them out! They use space!"
I shouldn't feel smug, but I do. That's an unfortunate side effect of being right when you were told you were wrong.
mongostat and the web monitoring interfaces, which confirmed that the write lock is held 70% of the time
That's because of all the movement of your documents; that's kinda hard to fix.
I am a bit disappointed that MongoDB does not have finer granularity on its write lock
The write lock doesn't actually reflect the concurrency of MongoDB; this is another common misconception that stems from transactional SQL technologies.
For one, write locks in MongoDB are mutexes.
Not only that, but there are numerous rules which dictate that operations will yield to queued operations under certain circumstances: one being how many operations are waiting, another being whether the data is in RAM or not, and more.
Unfortunately, I believe you have got yourself stuck between a rock and a hard place, and there is no easy way out. This does happen.
I have the following graph structure:
4m nodes
23m properties
13m relationships
Java version
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Neo4j version
neo4j-community-2.0.0-M03
Machine
Mac OS X 10.8.4
2.5 GHz Intel Core i5
8 GB 1600 MHz DDR3
Problem
I am doing some experiments with three queries. #1 takes 16 seconds, #2 takes 8 minutes, and #3 "crashes". Both #2 and #3 put all available CPU cores at ~90% usage. I am using the web interface to evaluate these queries (and I will be using the REST API to integrate the app with Neo4j).
I would like to know what is wrong with those queries and how I could optimise them. I am currently using the default settings.
Cypher Queries
Query #1 (Currently taking 16 seconds (after warm-up))
START root=node:source(id="2")
MATCH root-[]->movies<-[]-others
WITH COUNT(movies) as movie_count, others as others
RETURN others.id, movie_count
ORDER BY movie_count DESC
LIMIT 10
Query #2 (8 minutes)
START root=node:source(id="2")
MATCH
root-[]->stuff<-[]-others
WITH DISTINCT(others) as dothers
MATCH dothers-[]->different
RETURN different.id, COUNT(different) as count
ORDER BY count DESC
LIMIT 10
Query #3 (OutOfMemoryError - GC overhead limit exceeded)
START root=node:source(id="2")
MATCH root-[*1..1]->stuff<-[*1..1]-other-[*1..1]->different
WHERE stuff.id <> different.id
WITH COUNT(different) as different_count, different as different
RETURN different.id, different_count
ORDER BY different_count DESC
LIMIT 10
Disclaimer: This advice is for 1.8 and 1.9. If you're using 2.0 or 2.1, these comments may no longer be valid.
Query 1: Make your WITH your RETURN, and skip that extra step.
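For example, Query #1 could be collapsed to something like this (a sketch; the aggregation moves straight into the RETURN):
START root=node:source(id="2")
MATCH root-[]->movies<-[]-others
RETURN others.id, COUNT(movies) as movie_count
ORDER BY movie_count DESC
LIMIT 10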
Query 2: Don't do DISTINCT in the WITH as you are now. Go as far as you can without doing DISTINCT. This looks like a premature optimization in the query; it prevents the query from being lazy and forces it to store many more intermediate results to calculate the WITH results.
Query 3: Don't do -[*1..1]->; that's the same as -[]-> or -->, but it uses a slower matcher for variable-length paths when it really just needs adjacent nodes and can use a fast matcher. Make the WITH your RETURN and take out that extra pipe it needs to go through, so it can be lazier (although the ORDER BY kind of makes it hard to be lazy). See if you can get it to complete without the ORDER BY.
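Applying those suggestions to Query #3 gives roughly this sketch (plain --> patterns, WITH folded into the RETURN, and the ORDER BY left out to see whether it completes at all):
START root=node:source(id="2")
MATCH root-->stuff<--other-->different
WHERE stuff.id <> different.id
RETURN different.id, COUNT(different) as different_count
LIMIT 10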
If you need faster responses and can't squeeze it out of your queries with my recommendations, you may need to turn to the Java API until Cypher performance improves in 2.x. The unmanaged extension mechanism makes these easy to call from the REST interface.
When looking for performance, please go with the latest stable version of Neo4j (1.9.x at the time of writing this answer).
2.0.0.M03 is a milestone build and not yet optimized. So far the focus has been on feature completeness with regard to the new concept of labels and label-based indexing.
[Prescript: I know that nothing here is specific to Delayed::Job. But it helps establish the context.]
update
I believe the SQL queries are not being garbage collected. My application generates many large SQL insert/update operations (160K bytes each, about 1 per second) and sends them to PostgreSQL via:
ActiveRecord::Base.connection.execute(my_large_query)
When I perform these db operations, my application slowly grows without bound. When I stub out the db operations (but perform all the other functions in my app) the bloating stops.
So: any ideas on why this is happening, how I can pinpoint it, or how I can make it stop?
original question
I have delayed tasks that slurp data from the web and create records in a PostgreSQL database. They seem to be working okay, but they start at vmemsize=100M and within ten minutes bulk up to vmemsize=500M and just keep growing. My MacBook Pro with 8 GB of RAM starts thrashing when the VM runs out.
How can I find where the memory is going?
Before you refer me to other SO posts on the topic:
I've added the following to my #after(job) method:
def after(job)
  clss = [Object, String, Array, Hash, ActiveRecord::Base, ActiveRecord::Relation]
  clss.each {|cls| object_report(cls, " pre-gc")}
  ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)
  GC.start
  clss.each {|cls| object_report(cls, "post-gc")}
end

def object_report(cls, msg)
  log(sprintf("%s: %9d %s", msg, ObjectSpace.each_object(cls).count, cls))
end
It reports usage for the fundamental classes, explicitly resets ActiveRecord::Relation objects (suggested by this SO post), explicitly does a GC (as suggested by this SO post), and reports how many Objects / Strings / Arrays / Hashes, etc. there are (as suggested by this SO post). For what it's worth, none of those classes are growing significantly. (Are there other classes I should be looking at? But wouldn't that be reflected in the number of Objects anyway?)
I can't use memprof because I'm running Ruby 1.9.
And there are other tools that I'd consider if I were running on Linux, but I'm on OS X.
update
I'm afraid this was all a red herring: left running long enough, each Ruby job grows to a vmsize of about 1.2 GB (yeah, that big, but not huge by today's standards), then shrinks back down to 850 MB and bounces between those two values thereafter without continuing to grow.
My real problem was that I was trying to run more than four such processes on my machine with 8 GB of RAM, which filled up all available RAM and then sent it into swapping hypoxia. Running only four processes almost fills up available memory, so the system doesn't start swapping.
update 2
Nope, still a problem -- I didn't let the jobs run long enough: the jobs grow continually (albeit slowly). Even running just two external jobs eventually consumes all VM and my machine starts thrashing.
I tried running the jobs in production mode (thinking that dev mode might cache things that don't get freed), but it didn't make any appreciable difference.