I am looking to deploy a single-node Cassandra cluster on an AWS m4.large instance. Our use case is read-oriented, i.e. there will be many more reads than writes. We have around 1 GB of data now. I am wondering about the latency of each read and write. Also, how many concurrent reads can a single node handle? And I am confused about when to scale, i.e. when to deploy another node. Does this depend solely on data size, or do we have to scale once read/write requests reach a certain limit?
Cassandra can handle quite a large number of requests per node. You will want to look at cassandra-stress (https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html) and YCSB (https://github.com/brianfrankcooper/YCSB/wiki) for some testing.
Cassandra can be scaled out to handle more data (more disk space, same replication factor), to handle more requests (more replicas), or both.
1 GB of data is so small that an m4.large instance can keep all your data in memory. If you need really low latency you can enable the row cache with a proper value of row_cache_size_in_mb and maybe row_cache_save_period (see https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSetCaching.html for caching details). Then all your data will be cached in memory all the time and you can get really low latencies. Fast disks (io1) will also give you lower latency.
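If it helps, here is a minimal sketch of opting a table into the row cache, using the DataStax Java driver. The keyspace and table names are hypothetical, and row_cache_size_in_mb itself still has to be set in cassandra.yaml; the per-table option below only makes the table eligible for that cache.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EnableRowCache {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Cache all rows of each partition of this (hypothetical) table.
            session.execute(
                "ALTER TABLE my_ks.my_table "
                + "WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}");
        }
    }
}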
But try for yourself with some testing.
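For a quick sanity check in addition to cassandra-stress, a rough client-side probe like the sketch below (again using the DataStax Java driver; the contact point, keyspace, table, and key are placeholders) can give you a feel for per-read latency as seen from your application.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class ReadLatencyProbe {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.5").build();
             Session session = cluster.connect("my_ks")) {
            PreparedStatement stmt = session.prepare("SELECT * FROM my_table WHERE id = ?");
            final int n = 10_000;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                session.execute(stmt.bind("some-key")); // hypothetical key, read over and over
            }
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.printf("avg read latency: %.1f us, ~%.0f reads/s%n",
                    (double) elapsedMicros / n, n * 1_000_000.0 / elapsedMicros);
        }
    }
}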
Related
I want to understand what the benefit is of running an in-memory cache instance on a separate server to look up data in distributed caching. The application server will have to make a network call to get the data from the cache. Isn't the network call adding to the latency while reading the data? Wouldn't it make more sense to get the data directly from the database instance?
Network calls are an order of magnitude faster than disk look-ups (less than 100 microseconds RTT within a data center). A look-up from memory is also fairly fast (10-20 microseconds per read). On the other hand, databases often have to read from disk, and they maintain extra transaction metadata and locks.
So caches provide higher throughput as well as better latencies. The final design depends on the type of database and the data access patterns.
I am working on an application with a web job and an Azure function app. The web job populates the Redis cache for the function app to consume. The cache size is around 10 megabytes. I am using lazy loading and so on, as per the recommendations. I still find that the overall cache operation is slow. Depending on the size of the file I am processing, I may end up calling the Redis cache up to 100,000 times. I am wondering if I need to hold the cache data in a local variable instead of reading it every time from Redis. Has anyone experienced latency in accessing Redis? Does it make sense to create a singleton object in the C# function app and refresh it based on a timer or other logic?
Could you consider these points in your usage? These are some good practices for Azure Redis Cache:
Redis works best with smaller values, so consider chopping up bigger data into multiple keys. In this Redis discussion, 100kb is considered "large". Read this article for an example problem that can be caused by large values.
Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor", etc.
Remember that Redis is an in-memory data store, so be aware of the scenarios where data loss can occur.
Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible (see the sketch after this list). If you choose to create new connections, make sure to close the old connections before you release them (even in managed-memory languages like .NET or Java).
Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported, but not recommended especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
Avoid Expensive Commands - Some Redis operations, like the "KEYS" command, are VERY expensive and should be avoided.
Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.
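To illustrate the connection-reuse point above, and the singleton idea from the question: the sketch below keeps one process-wide connection, created once when the class is first used, and reuses it for every read. It is shown with the Lettuce Java client purely for illustration; the same pattern applies to StackExchange.Redis in C# (a shared Lazy<ConnectionMultiplexer>). The host name is a placeholder.

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

public final class RedisHolder {
    // One client and one connection, created once and shared by all callers.
    private static final RedisClient CLIENT =
            RedisClient.create("redis://mycache.example.net:6379"); // placeholder host
    private static final StatefulRedisConnection<String, String> CONNECTION = CLIENT.connect();

    private RedisHolder() {}

    public static String get(String key) {
        // The connection is thread-safe, so it can be reused instead of
        // opening a new one per request.
        RedisCommands<String, String> commands = CONNECTION.sync();
        return commands.get(key);
    }
}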
Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over? I believe row_cache_size_in_mb is supposed to cache frequently used records in memory, but setting it to say 10MB seems to make no difference.
I tried cassandra-stress of course, but the highest read throughput it achieves with 1KB records (-col size=UNIFORM\(1000..1000\)) is ~15K/s.
With low numbers like the above, I can easily write an in-memory hashmap-based cache that will give me at least a million reads per second for a small working-set size. How do I make Cassandra do this automatically for me? Or is it not supposed to achieve performance close to an in-memory map, even for a tiny working set?
Can someone point me to cassandra client code that can achieve a read throughput of at least hundreds of thousands of reads/s if I keep reading the same record (or even a small number of records) over and over?
There are some solutions for this scenario.
One idea is to use the row cache, but be careful: any update/delete to a single column will invalidate the whole partition in the cache, so you lose all the benefit. The row cache is best used for small datasets that are frequently read but almost never modified.
Are you sure that your cassandra-stress scenario never updates or writes to the same partition over and over again?
Here are my findings: when I enable row_cache, counter_cache, and key_cache, all with sizable values, I am able to verify using "top" that Cassandra does no disk I/O at all; all three seem necessary to ensure no disk activity. Yet, despite zero disk I/O, the throughput is <20K/s even for reading a single record over and over. This likely confirms (as also alluded to in my comment) that Cassandra incurs the cost of serialization and deserialization even if its operations are completely in-memory, i.e., it is not designed to compete with native hashmap performance. So, if you want native hashmap speeds for a small-working-set workload but the ability to spill to disk if the map grows big, you would need to write your own cache on top of Cassandra (or any of the other key-value stores like MongoDB, Redis, etc. for that matter).
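For what it's worth, a minimal read-through cache of the kind described above can be as small as the sketch below (DataStax Java driver; the keyspace, table, and column names are hypothetical, and there is no eviction or invalidation on writes, which a real cache would need).

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ReadThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Session session;
    private final PreparedStatement select;

    public ReadThroughCache(Session session) {
        this.session = session;
        // Hypothetical schema: my_ks.my_table(id text PRIMARY KEY, value text)
        this.select = session.prepare("SELECT value FROM my_ks.my_table WHERE id = ?");
    }

    public String get(String id) {
        // Serve repeated reads from the local map; go to Cassandra only on a miss.
        return cache.computeIfAbsent(id, key -> {
            Row row = session.execute(select.bind(key)).one();
            return row == null ? null : row.getString("value");
        });
    }
}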
For those interested, I also verified that Redis is the fastest among Cassandra, MongoDB, and Redis for a simple get/put small-working-set workload, but even Redis gets at best ~35K/s read throughput (largely independent, by design, of the request size), which comes nowhere close to native hashmap performance, which simply returns pointers and can comfortably do over 2 million reads/s.
I'm trying to figure out how to determine the IOPS my application is driving so I can properly size our cloud infrastructure components. I understand what IOPS are between a database and the storage layer, but I'd like to understand how to go about calculating what my application drives. Here are some of my application's characteristics:
1) 90% write and 10% read
2) We have a Java-based application that ultimately inserts into an HBase database
3) Process about 50 msg/sec where each message results in probably 2 HBase inserts
Here is what I'm not sure about:
1) Is the only way to calculate the IOPS to run iostat or something similar on the actual server under load?
2) Is there a general way I can calculate what is needed from the incoming data volume/size, rather than measuring on the actual storage unit?
3) Is there any relationship between the # of transactions and the # of bytes in each transaction? (I read somewhere that an IO is usually 3K; most inserts don't contain that much data, so maybe it doesn't matter.)
Any help would be greatly appreciated.
I am not very familiar with HBase, but from the documentation it uses a log-structured design, which means the writes will be sequential. It also has compactions, which cause multi-MB sequential reads and writes. Read queries will cause random reads on the storage layer.
So here are the answers to your questions:
As far as I know, yes, the only way to get IOPS is to run iostat. You can probably get some compaction stats at the application level, but it is hard to extract IOPS-level details from there.
Compaction will use more storage than the raw data size. And if your application is write-heavy (compaction might not keep up with the rate of inserts), the actual on-disk volume will be much larger. Given the 50 msg/sec in your question, this should not be the case. I would provision disks at double the size of the expected data volume per instance.
As mentioned above, HBase is log-structured: writes are accumulated in memory and flushed to disk together, so the size of each individual transaction does not matter much.
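As a rough back-of-the-envelope check (assuming ~1 KB per HBase put, which is an assumption rather than a number from your question): 50 msg/sec x 2 puts x ~1 KB is about 100 KB/s of raw write bandwidth, and even with a generous 3x allowance for the write-ahead log and compaction rewrites that stays well under 1 MB/s of mostly sequential I/O, which almost any volume can sustain.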
I set up a 3-node Cassandra (1.2.10) cluster on 3 EC2 m1.xlarge instances.
It is based on the default configuration, with several guidelines applied, like:
datastax_clustering_ami_2.4
not using EBS, but RAID 0 XFS on ephemeral disks instead,
commit logs on separate disk,
RF=3,
6GB heap, 200MB new size (also tested with greater new size/heap values),
enhanced limits.conf.
With 500 writes per second, the cluster works for only a couple of hours. After that it seems unable to respond because of CPU overload (mainly GC + compactions).
Nodes remain Up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 7
BINARY 0
READ 2
MUTATION 4072827
_TRACE 0
REQUEST_RESPONSE 1769
Is 500 wps too much for a 3-node cluster of m1.xlarge instances, and should I add nodes? Or is it possible to tune GC further somehow? What load are you able to serve with 3 nodes of m1.xlarge? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number, even for a single node.
However, beware that there is also a limit on how fast data can be flushed to disk, and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much if those writes are big enough.
So first: what is the average size of a write? What is your replication factor? Multiply the number of writes by the replication factor and by the average write size - then you'll know approximately the write throughput required of the cluster. But you should keep some safety margin for other I/O-related tasks like compaction. There are various benchmarks on the Internet suggesting a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
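For example, plugging in the 500 writes/s and RF=3 from the question together with an assumed average write size (the 10 KB below is only an illustration, not a measured value), the arithmetic looks like this:

public class RequiredWriteThroughput {
    public static void main(String[] args) {
        int writesPerSec = 500;        // from the question
        int replicationFactor = 3;     // RF=3 from the question
        int avgWriteBytes = 10 * 1024; // assumption: 10 KB average write
        int nodes = 3;                 // 3 x m1.xlarge
        double safetyFactor = 3.0;     // headroom for compaction and other I/O

        double clusterBytesPerSec = (double) writesPerSec * replicationFactor * avgWriteBytes;
        double perNodeMBps = clusterBytesPerSec / nodes / (1024 * 1024);
        System.out.printf("per-node write rate: %.1f MB/s, with safety margin: %.1f MB/s%n",
                perNodeMBps, perNodeMBps * safetyFactor);
    }
}

With those assumed numbers the per-node rate (roughly 5 MB/s, or ~15 MB/s with the safety margin) is still within the quoted 20-100 MB/s range, so the bigger your average write, the closer you get to the limit.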
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try to:
reduce memtable_total_space_in_mb (this will cause C* to flush smaller memtables more often, freeing heap earlier)
lower write_request_timeout_in_ms to e.g. 2 seconds instead of the default 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off row_cache (if you ever enabled it)
lower size of the key_cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured gen.
if you're sure memtable flushing lags behind, make sure sstable compression is enabled - this will reduce the amount of data physically written to disk, at the cost of additional CPU cycles