Neo4j Cypher queries really slow after upgrade to 2.1.3 - performance

This morning, with some struggles (see: Upgrading a neo4j database from 2.0.1 to 2.1.3 fails), I upgraded my database from version 2.0.1 to 2.1.3. My main goal with the upgrade was to gain performance on certain queries (see: Cypher SORT performance).
Everything seems to be working, except for the fact that all Cypher queries - without exception - have become much, much, much slower. Queries that used to take 75ms now take nearly 2000ms.
As I was running on an A1 (1x CPU, ~2GB RAM) VM in Azure, I thought that giving Neo4j some more RAM and an extra core would help, but after upgrading to an A2 VM I get more or less the same results.
I'm now wondering: did I lose my indexes by backing up the database and upgrading/using that backup? I have perhaps 50K nodes in my database, so it's nothing spectacular, right?
I'm now still running on an A2 VM (2xCPU, ~4GB RAM), but had to downgrade to 2.0.1 again.
UPDATE: #1 2014-08-12
After reading Michael's first comment on how to inspect my indexes using the shell, I did the following:
With my 2.0.1 database service running (and performing well), I executed Neo4jShell.bat and ran the schema command. This yielded the following response:
I uninstalled the 2.0.1 service using the Neo4jInstall.bat remove command.
I installed the 2.1.3 service using the Neo4jInstall install command.
With my 2.1.3 database service running, I again executed Neo4jShell.bat and ran the schema command. This yielded the following response:
I think it is safe to conclude that either the migration process (in 2.1.3) or the backup process (in 2.0.1) has removed the indexes from my database. This does explain why my backed up database is much smaller (~110MB) than the online database (~380MB). After migration to 2.1.3, my database became even smaller (~90MB).
The question now is: is it just a matter of recreating my indexes and being done with it?
UPDATE: #2 2014-08-12
I guess I have answered my own question. After recreating the constraints and indexes, my queries perform like they used to (some even faster, as expected).

Eventually, it turned out that in the process of backing up my database (in version 2.0.1) or during the migration at startup (in version 2.1.3) I lost my indexes and constraints. The obvious solution is to recreate them manually (http://docs.neo4j.org/chunked/stable/cypher-schema.html) and be on your way.
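For reference, recreating them boils down to a couple of one-liners; the label and property names below are only examples, so substitute whatever your old schema output listed:

CREATE CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE;   // recreates a uniqueness constraint (and its backing index)
CREATE INDEX ON :User(name);                              // recreates a plain index
schema                                                    // in Neo4jShell: check that both are listed and come ONLINE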

Related

CouchDB Performance: 1.6.1 vs 2.1.1

We are looking at upgrading our CouchDB on our RHEL servers from 1.6.1 to 2.1.1. Before we do that, though, we wanted to run a performance test. So we created a JMeter test that goes directly against the database. It does not use any random values, so that the test would be exactly the same and we could compare the two results. This is just a standalone server; we are not using clustering. I ran the tests the exact same way for both: I ran the tests for 1.6.1, then installed 2.1.1 on the same machine, and I created the database fresh for each test run. [I also updated Erlang to R19.3.]
The results were very shocking:
Average response times:
1.6.1: 271.15 ms
2.1.1: 494.32 ms
POSTs and PUTs were really bad...
POST:
1.6.1: 38.25 ms
2.1.1: 250.18 ms
PUT:
1.6.1: 37.33 ms
2.1.1: 358.76 ms
We are just using the default values for all the config options, except that we changed 1.6.1 to have delayed_commits = false (that is now the default in 2.1.1). I'm wondering if there's some default that changed that would make 2.1.1 so bad.
When I ran the CouchDB setup from the Fauxton UI, it added the following to my local.ini:
[cluster]
n = 1
Is that causing CouchDB to try to use clustering, or is that the same as if there were no entries here at all? One other thing, I deleted the _global_changes database, since it seemed as if it would add extra processing that we didn't need.
Is that causing CouchDB to try to use clustering, or is that the same as if there were no entries here at all?
It's not obvious from your description. If you set up CouchDB 2.0 as clustered, then that's how it will work. Whether you did should be clear from the setup instructions you followed: http://docs.couchdb.org/en/2.1.1/install/setup.html
You can tell by locating the files on disk and seeing if they are in a shards directory or not.
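For example, on a typical package install (the data directory path here is an assumption; adjust it to your setup):

ls /var/lib/couchdb/shards/        # clustered databases are split into shard files under shards/<range>/
ls /var/lib/couchdb/*.couch        # node-local, non-clustered databases sit directly in the data directory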
n is the number of copies kept of each document, and I'm pretty sure you want at least two, so setting n = 1 doesn't seem like something you should be doing.
If you're trying to run a single node, follow the instructions I linked above to do that.
One other thing, I deleted the _global_changes database, since it seemed as if it would add extra processing that we didn't need.
You probably don't want to delete random parts of your database unless there are instructions saying it's OK to do so.

Sinatra + Chartkick + Sequel gem, chart not updating

I'm running a very basic Sinatra server, which simply shows a Chartkick graph of some data I have through the Sequel gem. I'm noticing that the data on the chart doesn't seem to update unless I quit the Sinatra server script and rerun it. I don't really understand how that would be possible... the only non-standard option I'm using when reading my database with Sequel is the read-only option... would that cause this?
It turns out, from reading another post on here:
First, by default, multiple processes can have the same SQLite database open at the same time, and several read accesses can be satisfied in parallel.
In the case of writing, a single write to the database locks the database for a short time, during which nothing, not even reading, can access the database file at all.
Beginning with version 3.7.0, a new "Write Ahead Logging" (WAL) option is available, in which reading and writing can proceed concurrently.
By default, WAL is not enabled. To turn WAL on, refer to the SQLite documentation.
I currently have script A, which maintains a connection to the DB file and writes to it regularly, and script B, which is my Sinatra server that reads information from that DB file. I worked around this issue by using a block-style connection in my Sinatra script (so the connection is opened and closed around each read). I don't know how to turn on WAL with Sequel though...
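From the Sequel side, I believe running the pragma directly should do it (untested sketch, and the path is just an example):

require 'sequel'

DB = Sequel.sqlite('path/to/data.db')          # the same file script A writes to
DB.run "PRAGMA journal_mode=WAL;"              # switch the database file to Write-Ahead Logging
puts DB.fetch("PRAGMA journal_mode;").first    # should print {:journal_mode=>"wal"}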

mongo shell not showing all dbs

Good Day.
I've been developing with MeteorJS, which uses MongoDB. No problems there. I've been using the mongo shell to access the database on my dev machine (OS X 10.11). This is my first project with Mongo; when the shell loaded, it would connect to test, and I'd run show dbs to get the list of databases, then use myApp.
Since yesterday, whenever I go into the shell and type show dbs, the only one shown is local 0.078GB. However, my app is still working, pulling and pushing data to the database.
I've checked the dbpath in mongod.conf and that seems OK. I'm not entirely sure about the exact order of events, but two things were different (I'm not sure whether these happened before or after show dbs stopped showing everything, nor which came first):
when loading the mongo shell I was getting this error:
WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
I followed these directions, which seemed to stop that error from appearing (https://github.com/basho/basho_docs/issues/1402).
I use Meteor Toys, and for the first time I updated user.profile.companyName (a custom field within the standard profile) from within the Meteor Toys widget.
It's just odd that the app can still access the database and collections, but the mongo shell doesn't show them. I've updated mongod via brew upgrade mongodb from 3.0.2 to 3.0.7, to no avail.
Any ideas?
If you want to use the regular mongo console, you have to specify port 3001 for Meteor apps instead of the default 27017. Otherwise, it's much simpler to just type meteor mongo and connect that way. Then you can type 'show collections' and it will show them all as normal.
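For example, assuming the app itself runs on the default port 3000:

mongo --port 3001        # Meteor's bundled MongoDB listens on the app port + 1
> use meteor             # the Meteor development data lives in a database called "meteor"
> show collections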
MongoDB does not show a database unless it contains at least one collection with at least one document in it.
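A quick way to see this in the shell (database and collection names are just examples):

> use mydb
> show dbs                      // "mydb" is not listed yet - nothing has been written
> db.things.insert({ x: 1 })
> show dbs                      // now "mydb" appears, because it holds a collection with a document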

Pentaho CassandraInput slow response time

I have a problem that affects the response time of the queries from CassandraInput. I use Datastax Enterprise 3.2.4 - Cassandra 1.2.13.2.
If I run the same query (any query) directly from the Cassandra client, the response is considerably faster than the same query executed from the CassandraInput step in Pentaho Data Integration.
What can cause this?
And above all, is there a way to improve the response time of the CassandraInput step in Pentaho?
I hope that some of you might have some suggestions.
Thank you
Federica
Generally this should not happen.
Try the following and check whether performance improves.
Open spoon.bat or spoon.sh, depending on the OS you are using, and change the setting below.
The value has to be adjusted according to the amount of RAM in your machine.
PENTAHO_DI_JAVA_OPTIONS="-Xmx2g"
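For example, you can set it near the top of the launcher script, or export it before starting Spoon (the heap sizes are only examples; size them to your machine):

# spoon.sh (Linux/macOS)
export PENTAHO_DI_JAVA_OPTIONS="-Xms1g -Xmx4g"

:: spoon.bat (Windows)
set PENTAHO_DI_JAVA_OPTIONS=-Xms1g -Xmx4g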

MySQL database backup: performance issues

Folks,
I'm trying to set up a regular backup of a rather large production database (half a gig) that has both InnoDB and MyISAM tables. I've been using mysqldump so far, but I find that it's taking increasingly longer periods of time, and the server is completely unresponsive while mysqldump is running.
I wanted to ask for your advice: how do I either
Make the mysqldump backup non-blocking - assign a low priority to the process or something like that (see the sketch right after this list), OR
Find another backup mechanism that will be better/faster/non-blocking.
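(For the first option, this is roughly what I have in mind; user, database and paths are placeholders, and as I understand it --single-transaction avoids locking the InnoDB tables, though the MyISAM ones would still be an issue:)

nice -n 19 ionice -c2 -n7 \
  mysqldump --single-transaction --quick --user=backup --password mydb \
  | gzip > /backups/mydb.sql.gz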
I know the MySQL Enterprise Backup product exists (http://www.mysql.com/products/enterprise/backup.html) - it's expensive, and that is not an option for this project.
I've read about setting up a second server as a "replication slave", but that's not an option for me either (this requires hardware, which costs $$).
Thank you!
UPDATE: more info on my environment: Ubuntu, latest LAMPP, Amazon EC2.
If replication to a slave isn't an option, you could leverage the filesystem, depending on the OS you're using:
Consistent backup with Linux Logical Volume Manager (LVM) snapshots.
MySQL backups using ZFS snapshots.
The joys of backing up MySQL with ZFS...
I've used ZFS snapshots on a quite large MySQL database (30GB+) as a backup method and it completes very quickly (never more than a few minutes) and doesn't block. You can then mount the snapshot somewhere else and back it up to tape, etc.
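A rough sketch of the LVM variant (volume group, snapshot name and paths are placeholders; the point is that the read lock is held only for the instant it takes to create the snapshot, in the same mysql session):

mysql> FLUSH TABLES WITH READ LOCK;
mysql> system lvcreate --snapshot --size 5G --name mysql_snap /dev/vg0/mysql_data
mysql> UNLOCK TABLES;

# back in the shell: mount the frozen copy and back it up at leisure
mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
tar czf /backups/mysql-$(date +%F).tar.gz -C /mnt/mysql_snap .
umount /mnt/mysql_snap && lvremove -f /dev/vg0/mysql_snap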
Edit: (my previous answer suggested a slave DB to back up from, but then I noticed Alex ruled that out in his question.)
There's no reason your replication slave can't run on the same hardware, assuming the hardware can keep up. Grab a source tarball, ./configure --prefix=/dbslave; make; make install; and you'll have a second mysql server living completely under /dbslave.
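If you go that route, the main thing is to give the second instance its own port, socket, data directory and server-id, e.g. in /dbslave/my.cnf (values are only examples):

[mysqld]
port      = 3307                  # production instance keeps 3306
socket    = /dbslave/mysql.sock
datadir   = /dbslave/data
server-id = 2                     # must differ from the master's server-id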
EDIT2: Replication has a bunch of other benefits as well. For instance, with replication running, you may be able to recover the binlog and replay it on top of your last backup to recover the extra data after certain kinds of catastrophes.
EDIT3: You mention you're running on EC2. Another, somewhat contrived idea to keep costs down is to try setting up another instance with an EBS volume. Then use the AWS API to spin this instance up long enough for it to catch up with writes from the binary log, dump/compress/send the snapshot, and then spin it down. Not free, and labor-intensive to set up, but considerably cheaper than running the instance 24x7.
Try the mk-parallel-dump utility from Maatkit (http://www.maatkit.org/).
Something you might consider is using binary logs here, through a method called 'log shipping'. Just before every backup, issue a command to flush the binary logs, and then you can copy all except the current binary log out via your regular file system operations.
The advantage of this method is that you're not locking up the database at all: when MySQL opens the next binary log in the sequence, it releases the file locks on the prior logs, so processing shouldn't be affected. Tar them, zip them in place, do as you please, then copy the result out as one file to your backup system.
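A minimal sketch of that flush-and-copy step (the binlog base name and backup path are placeholders; SHOW MASTER STATUS tells you which log is currently being written):

mysql -e "FLUSH BINARY LOGS;"                                 # close the current binlog and open a new one
CURRENT=$(mysql -N -e "SHOW MASTER STATUS;" | awk '{print $1}')
for f in /var/lib/mysql/mysql-bin.[0-9]*; do                  # copy everything except the active log
  [ "$(basename "$f")" = "$CURRENT" ] || cp "$f" /backups/binlogs/
done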
Another advantage of using binary logs is that you can restore up to any point in time, as long as the logs are available. E.g. you have last year's full backup and every log from then to now, but you want to see what the database looked like on Jan 1st, 2011. You can issue a restore 'until 2011-01-01', and when it stops, you're at Jan 1st, 2011 as far as the database is concerned.
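The replay itself is done with mysqlbinlog: restore the full backup first, then apply the logs up to the cut-off (dates and file names are only examples):

mysql -u root -p mydb < /backups/full-2010.sql                 # restore last year's full backup
mysqlbinlog --stop-datetime="2011-01-01 00:00:00" \
  /backups/binlogs/mysql-bin.[0-9]* | mysql -u root -p         # replay the binlogs up to Jan 1st, 2011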
I've had to use this once to reverse the damage a hacker caused.
It is definitely worth checking out.
Please note... binary logs are USUALLY used for replication, but nothing says you HAVE to use them for that.
Adding to what Rich Adams and timdev have already suggested, write a cron job that gets triggered during a low-usage period to perform the slaving task as suggested, to avoid high CPU utilization.
Check mysql-parallel-dump also.
