Redis performance issues?

I was trying to put some heavy load on my Redis instance for testing purposes and find its upper limits. First I loaded it with 50,000 and then 100,000 keys of 32 characters, with values of around 32 characters. It took no more than 8-15 seconds in both cases. Now I am trying to put 4 KB of data as the value for each key. The first 10,000 keys take 800 milliseconds to set, but from that point it slows down gradually, and setting all 50,000 keys takes around 40 minutes. I am loading the database using Node.js with node_redis (mranney). Is there any mistake I am making, or is Redis just that slow with big values of 4 KB?
One more thing I found: when I run a second client in parallel with the first one and update the same keys, that second client finishes loading the 50,000 keys with 4 KB values within 8 seconds, while the first client is still churning away. Is this a bug in Node or in the redis library? This is alarming and not acceptable for production.

You'll need to get some kind of back pressure for doing bulk writes from node into Redis. By default, node will queue all writes and does not enforce an upper bound on the outgoing queue size.
node_redis has a "drain" event that you can listen for to implement some rudimentary back pressure.
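As a rough sketch of what that can look like (not the author's exact code; the batch size, key names and 4 KB payload below are made up, and you could equally drive this off the "drain" event), the idea is to only issue the next batch of SETs once the previous batch has been acknowledged:

var redis = require("redis");
var client = redis.createClient();

// Hypothetical test payload and sizes; tune for your own setup.
var payload4k = new Array(4096 + 1).join("x");
var TOTAL = 50000;
var BATCH = 1000;
var written = 0;

function writeBatch() {
  var inFlight = 0;
  for (var i = 0; i < BATCH && written < TOTAL; i++, written++) {
    inFlight++;
    client.set("key:" + written, payload4k, function (err) {
      if (err) throw err;
      inFlight--;
      // Start the next batch only when every write in this one has been
      // acknowledged, so the outgoing queue never grows without bound.
      if (inFlight === 0) {
        if (written < TOTAL) {
          writeBatch();
        } else {
          client.quit();
        }
      }
    });
  }
}

writeBatch();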

The default redis configuration is not optimized for that sort of usage. I suspect you have it swapping to disk with a page size of 32 bytes, which means that each key added has to find 128 contiguous free pages and may end up using system VM or needing to expand the swap file a lot.
When you update a key, the space is already allocated so you don't see any performance issues.
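If that is what is happening, the relevant knobs are the (long since deprecated) virtual memory settings in redis.conf from the 2.0-2.4 era; roughly, and only as an illustration of what to check:

vm-enabled no          # keep this off for an in-memory benchmark
vm-page-size 32        # the 32-byte default page size mentioned above
vm-max-memory 0
vm-swap-file /tmp/redis.swap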

Since I was doing a lot of SET (key, value) operations in Node.js, which are issued asynchronously, a lot of writes were outstanding on the socket at the same time. The Node.js socket write buffer was probably overloaded, and GC might come in and fiddle with the Node process.
PS: I changed the Redis memory configuration as Tom suggested, but it still performed the same.

Related

Rejecting new http requests when memory usage is high in GoLang

I have a simple HTTP server written in Go that accepts big chunks of data (up to 5 MB per single request in some cases, but it can also be tens of KB, depending on the usage pattern). The server then processes the received data asynchronously by adding it to a buffer from which one of the workers (goroutines) picks up a task. This server runs as a container in Kubernetes and has a memory limit set. Also, unfortunately, I'm not allowed to use HPA, as only one pod is allowed per client.
The problem occurs when someone sends a lot of big chunks of data to my server: because of the memory limit, the kubelet kills my container, and as a result all the data stored in the buffer is lost.
I have tried the following ways to mitigate the problem:
Removing the memory limit in the pod spec. Unfortunately, my server runs in a multi-tenant environment and I'm forced to set a memory limit.
Limiting the number of requests processed in flight by adding a buffered channel, with a timeout when a request can't be added to it within 10 seconds. This partially mitigated the problem, but first, it's quite tricky to find a good balance between buffer size and timeout, and second, if a client sends a lot of small requests, the server drops some of them even when it has plenty of free memory.
I have found that I can get the current memory usage of my binary by calling runtime.ReadMemStats. So my next idea is to drop requests if, for example, memory goes above some threshold (80%). Is this the only way to resolve the problem?
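One possible shape for that idea, as a rough sketch rather than a complete solution (the limit, threshold, port and handler below are made up, and note that MemStats only reflects the Go heap, not the full RSS the kubelet enforces):

package main

import (
    "io"
    "net/http"
    "runtime"
)

// Hypothetical values: in practice derive the limit from the pod spec
// (an env var or the cgroup memory files) rather than hard-coding it.
const memLimitBytes = 512 << 20
const rejectAbove = 0.8

// withMemoryGuard rejects new requests with 503 while the Go heap is close to the limit.
func withMemoryGuard(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        var m runtime.MemStats
        runtime.ReadMemStats(&m) // briefly stops the world; avoid on extremely hot paths
        // HeapAlloc undercounts what the kubelet sees (stacks, off-heap memory,
        // fragmentation), so keep extra headroom below the real pod limit.
        if float64(m.HeapAlloc) > rejectAbove*float64(memLimitBytes) {
            http.Error(w, "temporarily overloaded, retry later", http.StatusServiceUnavailable)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    ingest := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        body, _ := io.ReadAll(r.Body) // hand the chunk to the worker buffer (omitted here)
        _ = body
        w.WriteHeader(http.StatusAccepted)
    })
    http.ListenAndServe(":8080", withMemoryGuard(ingest))
}

Since Go 1.19, runtime/debug.SetMemoryLimit is another option: it gives the runtime a soft memory limit so the GC works harder before you ever hit the pod limit, instead of you dropping requests yourself.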

Cassandra client code with high read throughput with row_cache optimization

Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over? I believe row_cache_size_in_mb is supposed to cache frequently used records in memory, but setting it to say 10MB seems to make no difference.
I tried cassandra-stress of course, but the highest read throughput it achieves with 1KB records (-col size=UNIFORM\(1000..1000\)) is ~15K/s.
With low numbers like above, I can easily write an in-memory hashmap based cache that will give me at least a million reads per second for a small working set size. How do I make cassandra do this automatically for me? Or is it not supposed to achieve performance close to an in-memory map even for a tiny working set size?
There are some solutions for this scenario.
One idea is to use the row cache, but be careful: any update/delete to a single column will invalidate the whole partition in the cache, so you lose all the benefit. The row cache is best used for small datasets that are read frequently but almost never modified.
Are you sure that your cassandra-stress scenario never updates or writes to the same partition over and over again?
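If you do want to exercise the row cache, keep in mind that row_cache_size_in_mb in cassandra.yaml only sets the global capacity; caching also has to be enabled on the table itself. On Cassandra 2.1+ the table option looks roughly like this (keyspace/table names are placeholders; older versions use a string value such as 'rows_only' instead of the map):

ALTER TABLE mykeyspace.mytable WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};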
Here are my findings: when I set row_cache, counter_cache, and key_cache all to sizable values, I am able to verify using "top" that Cassandra does no disk I/O at all; all three seem necessary to ensure no disk activity. Yet, despite zero disk I/O, the throughput is <20K/s even when reading a single record over and over. This likely confirms (as also alluded to in my comment) that Cassandra incurs the cost of serialization and deserialization even if its operations are completely in-memory, i.e., it is not designed to compete with native hashmap performance. So, if you want native hashmap speeds for a small-working-set workload but the ability to spill to disk if the map grows big, you would need to write your own cache on top of Cassandra (or any of the other key-value stores like Mongo, Redis, etc. for that matter).
For those interested, I also verified that redis is the fastest among cassandra, mongo, and redis for a simple get/put small-working-set workload, but even redis gets at best ~35K/s read throughput (largely independent, by design, of the request size), which hardly comes anywhere close to native hashmap performance that simply returns pointers and can do so comfortably at over 2 million/s.

How much load can cassandra handle on m1.xlarge instance?

I setup 3 nodes of Cassandra (1.2.10) cluster on 3 instances of EC2 m1.xlarge.
Based on the default configuration, with several guidelines applied, like:
datastax_clustering_ami_2.4
not using EBS; RAID 0 XFS on ephemeral disks instead,
commit logs on separate disk,
RF=3,
6GB heap, 200MB new size (also tested with greater new size/heap values),
enhanced limits.conf.
With 500 writes per second, the cluster works for only a couple of hours. After that it seems unable to respond because of CPU overload (mainly GC + compactions).
Nodes remain up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 7
BINARY 0
READ 2
MUTATION 4072827
_TRACE 0
REQUEST_RESPONSE 1769
Is 500 wps too much for a 3-node cluster of m1.xlarge, and should I add nodes? Or is it possible to tune GC further somehow? What load are you able to serve with 3 nodes of m1.xlarge? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29,000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number, even for a single node.
However, beware that there is also a limit on how fast data can be flushed to disk, and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much, if those writes are big enough.
So first: what is the average size of a write? What is your replication factor? Multiply the number of writes by the replication factor and by the average write size, and you'll know approximately the required write throughput of the cluster. You should also keep some safety margin for other I/O-related tasks like compaction. There are various benchmarks on the Internet saying a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
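As a purely illustrative calculation: 500 writes/s × 10 KB per write × RF 3 ≈ 15 MB/s of raw write throughput before compaction rewrites, which is already a meaningful share of the lower end of that 20-100 MB/s range.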
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try to:
reduce memtable_total_space_mb (this will cause C* to flush smaller memtables, more often, freeing heap earlier)
lower write_request_timeout to e.g. 2 seconds instead of 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off row_cache (if you ever enabled it)
lower size of the key_cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured gen.
if you're sure memtable flushing lags behind, make sure sstable compression is enabled; this will reduce the amount of data physically saved to disk, at the cost of additional CPU cycles
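To make a few of the suggestions above concrete, these are the corresponding knobs in cassandra.yaml and cassandra-env.sh (names as of Cassandra 1.2; double-check your version and treat the values as illustrative starting points, not recommendations):

# cassandra.yaml
memtable_total_space_mb: 1024        # flush smaller memtables, more often
write_request_timeout_in_ms: 2000    # default 10000; shed big writes earlier
row_cache_size_in_mb: 0              # row cache off
key_cache_size_in_mb: 50             # smaller key cache

# cassandra-env.sh
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="800M"                  # larger new gen to reduce pressure on the tenured gen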

How slow is Redis when full and evicting keys? (LRU algorithm)

I am using Redis in a Java application where I read log files and store/retrieve some info in Redis for each log line. Keys are IP addresses from my log file, which means there are always new keys coming in, even though the same addresses appear regularly.
At some point, Redis reaches its maxmemory limit (3 GB in my case) and starts evicting keys. I use the "allkeys-lru" setting as I want to keep the most recently used keys.
The whole application then slows down a lot, taking 5 times longer than at the beginning.
So I have three questions:
is it normal to have such a dramatic slowdown (5 times longer)? Has anybody experienced such a slowdown? If not, I may have another issue in my code (improbable, as the slowdown appears exactly when Redis reaches its limit)
can I improve my config? I tried changing the maxmemory-samples setting without much success
should I consider an alternative for my particular problem? Is there an in-memory DB that could handle evicting keys with better performance? I may consider a plain Java object (HashMap...), even if it doesn't look like a good design.
edit 1:
we use 2 DBs in Redis
edit 2:
We use Redis 2.2.12 (Ubuntu 12.04 LTS). Further investigation explained the issue: we are using db0 and db1 in Redis. db1 is used much less than db0, and the keys are totally different. When Redis reaches maxmemory (and the LRU algorithm starts evicting keys), it removes almost all db1 keys, which drastically slows down all calls. This is strange behavior, probably unusual and maybe linked to our application. We fixed the issue by moving to another (better) memory mechanism for the keys that were loaded in db1.
thanks !
I'm not convinced Redis is the best option for your use case.
Redis "LRU" is only a best effort algorithm (i.e. quite far from an exact LRU). Redis tracks memory allocations and knows when it has to free some memory. This is checked before the execution of each command. The mechanism to evict a key in "allkeys-lru" mode consists in choosing maxmemory-samples random keys, comparing their idle time, and select the most idle key. Redis repeats these operations until the used memory is below maxmemory.
The higher maxmemory-samples, the more CPU consumption, but the more accurate result.
Provided you do not explicitly use the EXPIRE command, there is no other overhead associated with key eviction.
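For reference, the directives involved live in redis.conf (the values here are only illustrative):

maxmemory 3gb
maxmemory-policy allkeys-lru
# each eviction samples this many keys and removes the most idle one;
# higher is closer to true LRU but costs more CPU per eviction
maxmemory-samples 5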
Running a quick test with Redis benchmark on my machine results in a throughput of:
145 Kops/s when no eviction occurs
125 Kops/s when 50% eviction occurs (i.e. 1 key out of 2 is evicted).
I cannot reproduce the 5 times factor you experienced.
The obvious recommendation to reduce the overhead of eviction is to decrease maxmemory-samples, but it also means a dramatic decrease of the accuracy.
My suggestion would be to give memcached a try. The LRU mechanism is different. It is still not exact (it applies only on a per-slab basis), but it will likely give better results than Redis for this use case.
Which version of Redis are you using? The 2.8 version (quite recent) improved the expiration algorithm and if you are using 2.6 you might give it a try.
http://download.redis.io/redis-stable/00-RELEASENOTES

Cassandra running out of memory (heap space)

We have been experimenting a bit with Cassandra lately (version 1.0.7) and we seem to have some problems with memory. We use EC2 as our test environment and we have three nodes with 3.7 GB of memory and 1 core @ 2.4 GHz, all running Ubuntu Server 11.10.
The problem is that the node we hit from our Thrift interface dies regularly (approximately after we store 2-2.5 GB of data). The error message is OutOfMemoryError: Java heap space, and according to the log it did in fact use all of the allocated memory.
The nodes are under relatively constant load and store about 2000-4000 row keys a minute, which are batched through the Thrift interface 10-30 row keys at a time (with about 50 columns each). The number of reads is very low, around 1000-2000 a day, and only requests the data of a single row key. There is currently only one column family in use.
The initial thought was that something was wrong in the cassandra-env.sh file. So, we specified the variables 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage Collection). Unfortunately, that did not solve the issue and the node we hit via thrift keeps on dying regularly.
In case you find this useful: swap is off, and unevictable memory seems to be very high on all 3 servers (2.3 GB; on other Linux servers we usually observe around 0-16 KB of unevictable memory). We are not quite sure how the unevictable memory ties into Cassandra; it's just something we observed while looking into the problem. The CPU is pretty much idle the entire time. The heap memory is clearly being reduced once in a while according to nodetool, but obviously grows over the limit as time goes by.
Any ideas? Thanks in advance.
The cassandra-env.sh defaults are perfect for almost all workloads, so until you know why this is happening it is best to put them back to their defaults, or you may be making things worse without realizing it.
I see concurrent reads and writes of 2k/sec/node on our cluster, so 2k-4k writes per minute is very little, although the fact that it's only the node accepting your connections that is dying is a little strange.
If you connect your app to the thrift endpoint on one of the other nodes is it then that one that dies?
Client connections use memory so might be worth double checking you're not connecting too many at a time. "netstat -A inet | grep 9160" on the dying cassandra node should tell you how many client connections you have. Depending heavily on your application you'd expect 10s or 100s rather than 1000s.
What do the writes look like?
Are you writing the same row keys repeatedly and if so are you appending new column names or overwriting the same ones?
How big is each write? Anything else you can tell me?
If you're overwriting the same column names in the same row keys constantly compaction may be struggling.
If you're appending new column names to the same row keys constantly you might be growing your rows too large to fit into memory.
the output of "nodetool -h localhost tpstats" on the dying node might also give some clues as to where you're falling down. Anything constantly pending is probably bad news, especially at such a low write rate.
If you're going to use cassandra in production you should get graphing of the internals to better understand what's going on. jmxtrans and graphite should be your new best friends.
There are some things you can try tweaking. First, make sure you don't have row caching on your column family. It is also worthwhile checking the log for errors, and tpstats, in case something died due to an error and is getting backed up in a queue. The stack trace of the exception could be meaningful too, since there are actually different types of OOMs that might just call for kernel tweaks.
If you're simply using too much memory per node for the size of your data set, try checking cfstats; you can identify roughly how much space is spent on bloom filters. As you add more rows to a CF this can grow linearly and is part of the base minimum memory your nodes are going to require.
nodetool cfstats | grep Bloom.*Used | awk '{ SUM += $5} END { print SUM " bytes" }'
Since you don't read very often, you can probably increase the false-positive rate on them. Each SSTable has a bloom filter it uses to check whether a row exists in it or not. You can change this with cqlsh:
ALTER TABLE MyColumnFamily WITH bloom_filter_fp_chance = 0.1;
After that, run an upgrade on that CF (this will be slow) on each node:
nodetool upgradesstables MyKeyspace MyColumnFamily
The consequence is that reads may take longer, since there is a roughly 10% (the 0.1) chance it will check SSTables for rows that don't exist in them, resulting in extra disk seeks.
Another major memory sink, if you have column families with a large number of rows, is the sampling rate of the index. This can be modified at the node level in cassandra.yaml:
http://www.datastax.com/docs/1.1/configuration/node_configuration#index-interval
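In the cassandra.yaml of that era the setting is index_interval (default 128); raising it trades a little read latency for less heap, for example:

index_interval: 512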
If you have it set up to take heap dumps on OOM (-XX:+HeapDumpOnOutOfMemoryError, on by default I believe), there should be some heap dumps available in the /var/lib/cassandra/data directory. You can open these up in VisualVM or whatever tool you like to identify which parts of the heap are used where.

Resources