Improve performance of Neo4j

I'm using Neo4j Enterprise Edition and have created a database with around 20,200 relationships and some 15,000 nodes. When I try to find the 3 shortest paths between two random nodes, it takes around 3-15 seconds, which is quite slow. How can I improve the response time from Neo4j?
I'm using Ubuntu 16.x and have 4 GB of RAM configured.
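A quick sanity check is to make sure the endpoint lookups are indexed and the path length is bounded; a minimal sketch with the official neo4j Python driver (the bolt URI, credentials, :Node label, and id property are placeholders, and LIMIT 3 only approximates a true k-shortest-paths search, which would need a library procedure):

```python
# Sketch using the official neo4j Python driver; the bolt URI, credentials,
# :Node label, and `id` property are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Bounding the relationship depth (*..6 here) and looking up both endpoints by
# an indexed property keeps the traversal from touching most of the graph.
query = """
MATCH (a:Node {id: $src}), (b:Node {id: $dst}),
      p = allShortestPaths((a)-[*..6]-(b))
RETURN p
LIMIT 3
"""

with driver.session() as session:
    for record in session.run(query, src=1, dst=2):
        print(record["p"])

driver.close()
```

On a 4 GB machine, checking that the store fits in the page cache (dbms.memory.pagecache.size in Neo4j 3.x) and that the heap is sized sensibly is usually the next step after the query itself.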

Related

Apache Solr 6.6.2 configuration update to utilize maximum system resources

I have set up Solr 6.x on a dedicated system with 8 cores and 16 GB RAM. There are about 8 million documents in Solr, each with 20 fields, and most fields are both indexed and stored. I captured a plot of the last 2 hours of CPU usage.
The plot shows that CPU usage is very low. Even under a stress test, CPU usage only goes to 40-50%, and RAM usage shows a similar pattern. The question is how to configure Solr so that it uses the maximum system resources, i.e., CPU up to 70-80%.
I don't think it's fair to ask my boss for a new machine when Solr performance starts to degrade without using the maximum system resources. I know Solr can run in cloud mode, but my question remains the same: how do I make Solr use the maximum resources?
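Single-threaded, one-document-at-a-time indexing (or one query at a time) will leave most of an 8-core box idle; batching documents and sending requests from several threads is the usual way to drive utilization up. A rough sketch with pysolr (the core name, field names, and batch sizes are placeholders):

```python
# Sketch using pysolr; the core name "mycore", field names, and sizes below are
# placeholders. Batching adds and running them from several worker threads is
# what pushes CPU usage up compared to feeding Solr one document at a time.
from concurrent.futures import ThreadPoolExecutor
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycore", always_commit=False, timeout=30)

def index_batch(start):
    docs = [{"id": str(i), "title": f"doc {i}"} for i in range(start, start + 1000)]
    solr.add(docs)  # one HTTP round trip per 1,000 docs instead of per doc

with ThreadPoolExecutor(max_workers=8) as pool:  # roughly one worker per core
    pool.map(index_batch, range(0, 100_000, 1000))

solr.commit()  # one commit at the end rather than committing every batch
```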

Cassandra integration with Hadoop for read performance

I am using Apache Cassandra to store around 100 million records. There is a single node with the following specifications:
32 GB RAM, 2 TB HDD, Intel quad-core processor.
I have a read performance problem with Cassandra: some queries take around 40 minutes to produce output. After searching for ways to improve read performance, I came across the following factors:
compaction strategy, compression techniques, key cache, increasing the heap space, and turning off swap for Cassandra.
After applying these optimizations, the performance remains the same. Further searching suggested integrating Hadoop with Cassandra. Is that the correct way to run these queries in Cassandra, or are there other factors I am missing here?
Thanks.
It looks like your data model could be improved. 40 minutes is unreasonable; I can download all data from 6 million records (around 10 GB) within a few minutes, and that is while converting the data as I download and store it. Trivial selects should take milliseconds.
Did you design your tables based on the queries you need to run?
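As a concrete illustration of query-driven modeling, here is a sketch with the DataStax Python driver (keyspace, table, and column names are placeholders): the table is partitioned by the value the query filters on, so each read becomes a single-partition slice instead of a cluster-wide scan.

```python
# Sketch with the DataStax Python driver; keyspace, table, and column names are
# placeholders. The table is laid out for the query, so a read touches a single
# partition rather than scanning the node.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo_ks")

# Partition by the value the query filters on (customer_id) and cluster by
# event_time, so "recent events for customer X" is one partition slice.
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_customer (
        customer_id text,
        event_time  timestamp,
        payload     text,
        PRIMARY KEY (customer_id, event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

rows = session.execute(
    "SELECT event_time, payload FROM events_by_customer "
    "WHERE customer_id = %s LIMIT 100",
    ("customer-42",),
)
for row in rows:
    print(row.event_time, row.payload)
```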

Which one is faster, OpenTSDB or KairosDB?

OpenTSDB is super fast. KairosDB is known as a rewrite of OpenTSDB and is claimed to be even faster than OpenTSDB (see here). However, I did some tests with a pseudo-distributed cluster (1 master, 1 slave, locally) for OpenTSDB and a 1-node cluster for KairosDB on my VirtualBox (5 GB RAM, 3 cores). The insertion speed was around 100,000 records/sec for OpenTSDB and 30,000 records/sec for KairosDB. Did I configure something wrong with KairosDB, or is OpenTSDB actually faster?
I don't have measurements on OpenTSDB. We use KairosDB and it's quite fast.
What database did you use for KairosDB? H2 is for test only and is desperately slow.
What interface did you use for pushing data? And if you used the REST API, how did you build your queries?
On a single bare-metal node we were above 50,000 samples per second using Telnet (limited by the acquisition agent), and about 3 times that speed for gzipped JSON using the REST API (this is for batch inserts of historical data, and the JSON is built to insert data with one array of datapoints per series, with up to 10,000,000 samples per document).
Maybe VirtualBox slows everything down too much (guest VMs have very poor performance).
On the other hand, the last time I spoke with the OpenTSDB people they were well under 100,000 points per second for insertion... so they may have improved performance since.
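For reference, a minimal sketch of the batch-insert pattern described above, pushing one array of datapoints per series through the KairosDB REST API (the host, metric name, and tags are placeholders; large historical batches like the ones mentioned above would additionally gzip the payload):

```python
# Sketch of a batched insert through the KairosDB REST API; host, metric name,
# and tags are placeholders.
import time
import requests

now_ms = int(time.time() * 1000)
batch = [{
    "name": "sensor.temperature",
    "tags": {"host": "server1"},
    # one array of [timestamp_ms, value] datapoints per series
    "datapoints": [[now_ms + i * 1000, 20.0 + i * 0.01] for i in range(10_000)],
}]

resp = requests.post("http://localhost:8080/api/v1/datapoints", json=batch)
resp.raise_for_status()  # KairosDB answers 204 No Content when the insert succeeds
```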

Cassandra massive write performance problem

I have a server with 4 GB RAM and two quad-core CPUs. When I start performing massive writes in Cassandra, everything works fine initially, but after a couple of hours at 10K inserts per second the database grows to 25+ GB and performance drops to 500 inserts per second!
I found out this happens because compaction is very slow, but I don't understand why. I set 8 concurrent compaction threads, but Cassandra doesn't use 8 threads; only 2 cores are loaded.
Appreciate any help.
We've seen similar problems with Cassandra out-the-box, see:
http://www.acunu.com/blogs/richard-low/cassandra-under-heavy-write-load-part-ii/
One solution to this sort of performance degradation (but by no means the only one) is to consider a different storage engine, like Castle, used in the blog post above. It's open source (GPL v2), has much better performance, and degrades much more gracefully. The code is here (I've just pushed up a branch for Cassandra 0.8 support):
https://bitbucket.org/acunu/fs.hg
And instructions on how to get started are here:
http://support.acunu.com/entries/20216797-castle-build-instructions
(Full disclosure: I work for Acunu, so I may be a little biased ;-)
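Whatever engine you end up with, it helps to measure the sustained insert rate over time so the drop from 10K/s is visible as compactions pile up. A minimal sketch with the DataStax Python driver (the keyspace, table, and row sizes are placeholders):

```python
# Throughput-measurement sketch with the DataStax Python driver; keyspace,
# table, and row sizes are placeholders. It assumes a table like:
#   CREATE TABLE events (id bigint PRIMARY KEY, payload text)
import time
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

session = Cluster(["127.0.0.1"]).connect("stress_ks")
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

batch_size, written, window_start = 1000, 0, time.time()
for batch in range(10_000):
    args = [(batch * batch_size + i, "x" * 200) for i in range(batch_size)]
    execute_concurrent_with_args(session, insert, args, concurrency=100)
    written += batch_size
    elapsed = time.time() - window_start
    if elapsed >= 10:  # report roughly every 10 seconds
        print(f"{written / elapsed:.0f} inserts/sec")
        written, window_start = 0, time.time()
```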

memcached limitations

Has anyone experienced memcached limitations in terms of:
Number of objects in the cache store - is there a point where it loses performance?
Amount of allocated memory - what are the basic numbers to work with?
I can give you some metrics from our environment. We run memcached for Win32 on 12 boxes (as a cache for a very database-heavy ASP.NET web site). These boxes each have their own other responsibilities; we just spread the memcached nodes across all machines with memory to spare. Each node has at most 512 MB allocated to memcached.
Our nodes have on average 500-1000 connections open. A typical node has 60,000 items in cache and handles 1,000 requests per second (!). All of this runs fairly stably and requires little maintenance.
We have run into 2 kinds of limitations:
1. CPU use on the client machines. We use .NET serialization to store and retrieve objects in memcached. It works seamlessly, but CPU use can get very high under our loads. We found that some objects are better first converted to strings (or HTML fragments) and then cached.
2. We have had some problems with memcached boxes running out of TCP/IP connections. Spreading across more boxes helped.
We run memcached 1.2.6 and use the .NET client from http://www.codeplex.com/EnyimMemcached/
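The "cache the rendered string, not the object" idea from point 1 isn't specific to .NET; a rough Python sketch with pymemcache shows the same trade-off (the key names and the render function are placeholders):

```python
# Sketch with pymemcache; key names and the render step are placeholders.
# Storing a pre-rendered string means a cache hit is just bytes -> str, with
# no object deserialization burning client CPU.
from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))

def render_product_html(product_id):
    # stand-in for the expensive render/serialize work described in point 1
    return f"<div class='product'>Product {product_id}</div>"

def get_product_html(product_id):
    key = f"product-html:{product_id}"
    cached = client.get(key)
    if cached is not None:
        return cached.decode("utf-8")
    html = render_product_html(product_id)
    client.set(key, html.encode("utf-8"), expire=300)
    return html

print(get_product_html(42))
```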
I can't vouch for the accuracy of this claim, but at a Linux/developer meetup a few months ago an engineer talked about how his company scaled memcached back to using 2 GB chunks, 3-4 per memcached box. They found that throughput was fine, but with very large memcached daemons they were getting 4% more misses. He said they couldn't figure out why there was a difference, but decided to just go with what works.
