How to implement a 50M key-value pair memcache with 4M qps? - performance

The business scenario requires:
50M key-value pairs, 2K each, 100G of memory in total.
About 40% of the key-value pairs change every second.
The Java application needs to Get() once and Set() once for each changed pair, i.e. 50M*40%*2 = 4M qps (queries per second).
We tested memcached, which showed very limited qps.
Our benchmark results are very similar to the ones shown here:
http://xmemcached.googlecode.com/svn/trunk/benchmark/benchmark.html
Around 10,000 qps seems to be the limit of a single memcached server.
That means we would need 40 partitioned memcached servers for our business scenario, which seems very uneconomical and unrealistic.
In your experience, is that benchmark an accurate reflection of memcached's designed performance?
Any suggestions for tuning the memcached setup (client or server)?
Or is there another in-memory store that can meet the requirement more economically?
Many thanks in advance!

If you look at the graphs in the benchmark you mentioned, you need to realize that in many of those instances the limit was the network, not memcached. For instance, if all of your items have 2k values, then your maximum throughput on a GigE network is about 65k ops/sec (1024*1024*128/2048 = 65536). Memcached can do a lot more operations per second than this. I have personally hit 200k ops/sec with (I think) 512-byte values, and I have heard of others getting much higher throughput than I did. This all depends heavily on the network, though.
Also, memcached is barely doing anything at 10k ops/sec. My guess is you aren't taking advantage of concurrency in your benchmarks.
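For what it's worth, the concurrency point is easy to check: share one client across many threads and measure aggregate gets. Below is a rough sketch using the spymemcached client - the choice of client, the thread count, the key, and the localhost address are all my assumptions, not details from the question. spymemcached multiplexes requests over a single connection per server, so sharing one client instance is the intended usage.

```java
import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentGetBench {
    public static void main(String[] args) throws Exception {
        final int threads = 32;              // hypothetical concurrency level; tune for your hardware
        final int opsPerThread = 100_000;
        byte[] value = new byte[2048];       // 2k values, matching the question's item size

        final MemcachedClient client =
                new MemcachedClient(AddrUtil.getAddresses("127.0.0.1:11211"));

        // Preload one key so every get below is a cache hit.
        client.set("bench-key", 0, value).get();

        final AtomicLong ops = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    client.get("bench-key");
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d gets in %.1f s = %.0f ops/sec%n",
                ops.get(), seconds, ops.get() / seconds);
        client.shutdown();
    }
}
```

If a single-threaded loop is what produced the ~10k qps figure, a run like this usually shows very quickly whether the bottleneck is the server or the benchmark itself.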

Related

Redis vs memcached vs Scylla Cache - Which one to choose?

I'm designing an application where I want to cache a million records, each around 10 KB. I did some analysis and I'm on the fence between Redis, memcached, and Scylla as the cache. Can some experts suggest which might best suit my needs?
Highly performant
High availability
High Throughput
Low pricing?
Full disclosure - I work on the Scylla project.
I think it is a question of latency and HA vs cost. As a RAM-based system, Redis will be the lowest latency. If you need < 1 millisecond response, then Redis or memcached are the choice.
Scylla is a disk-based system. Values that are in Scylla's RAM will be low latency, but those that need to be pulled from disk will be slower, so your p99 latency is likely to be higher. How much higher depends on your disk: NVMe can give a p99 of 3-5 ms, SSD maybe 5-10 ms. If that is an acceptable latency, then Scylla will be much less expensive, as even NVMe is much cheaper than RAM.
As for HA - Redis and memcached are intended as caches. While there are some features and frameworks you can use to replicate data around, these are all bolt-ons and increase complexity. Scylla is a distributed system by design, so the replication that allows for multiple layers of HA is built in (node, rack, and DC availability).
Redis (and to a lesser extent, memcached) are phenomenal caches. But, depending upon your use case, Scylla might be the right choice.
All three options you mentioned are open-source software, so the pricing is the same - zero :-) However, both Scylla and Redis are written and backed by companies (ScyllaDB and RedisLabs, respectively), so if your use case is mission-critical you may choose to pay these companies for enterprise-level support; you can inquire with them about their prices.
The more interesting difference between the three is in the technology.
You described a use case where you have 10 GB of data in the cache. This amount can easily be held in memory, so a completely in-memory database like Memcached or Redis is a natural choice. However, there are still questions you need to ask yourself, and depending on your answers they may lead you to a distributed database such as Scylla:
Would you be using powerful many-core machines? If so, you should probably rule out Memcached - my experience (and others' - see "Can memcached make full use of multi-core?") suggests that it does not scale well with many cores. On an 8-core machine you will not get anywhere close to 8 times the performance of a one-core machine.
Redis is also not really meant for multi-core use - https://redis.io/topics/benchmarks says that Redis "is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed." (A client-side sharding sketch along those lines follows this list.) Scylla, on the other hand, thrives on multi-core machines. You should probably test the performance of all three products on your use case before making a decision.
How much of a disaster would it be to suddenly lose the entire content of your cache? In some use cases, it just means you would need to query a slightly slower backend server, so losing the cache on reboot is acceptable. In such cases, a memory-only cache like Memcached or Redis is probably exactly what you need. However, in other cases there may be a big penalty for starting from scratch with an empty cache - the backend server might be very slow, or maybe the original content is stored on a far-away server behind a slow and expensive WAN link. In such a case you would want a disk-backed cache, so if the in-memory cache is lost you can refresh it from disk rather than from the backend server. Redis has a disk-backing option, and in Scylla disk backing is the main mode of operation.
You mentioned a working set of 10 GB, which easily fits in the memory of a single server. But is it possible this will grow, and in a year you'll find yourself needing to cache 100 GB or 1 TB, which no longer fits in the memory of a single server? With memcached you'll be out of luck. Redis used to have a "virtual memory" solution for this purpose, but it is deprecated, and https://redis.io/topics/virtual-memory now states that Redis is "without considering at least for now the support for databases bigger than RAM". Scylla handles this issue in two ways. First, your cache is stored on disk, which can be much larger than memory (whatever memory you do have is used to further speed up that cache, but the data set doesn't need to fit in memory). Second, Scylla is a distributed server: it can distribute a 100 GB working set across 10 different nodes. Redis also has "replication", but it copies the entire data set to all nodes, while Scylla can store different subsets of the data on different nodes.
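Picking up the Redis point from the multi-core question above: "several Redis instances" in practice usually means client-side sharding, one single-threaded Redis per core (or per node). A minimal sketch with the Jedis client - the host, ports, and naive modulo hashing are illustrative assumptions; a real deployment would more likely use Redis Cluster or consistent hashing:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class ShardedRedisCache {
    private final JedisPool[] shards;

    // Hypothetical setup: one Redis instance per core, e.g. ports 6379-6382 on localhost.
    public ShardedRedisCache(String host, int... ports) {
        shards = new JedisPool[ports.length];
        for (int i = 0; i < ports.length; i++) {
            shards[i] = new JedisPool(host, ports[i]);
        }
    }

    // Naive modulo sharding; adding a shard remaps most keys, which is why
    // production systems prefer consistent hashing or Redis Cluster.
    private JedisPool shardFor(String key) {
        return shards[Math.floorMod(key.hashCode(), shards.length)];
    }

    public void put(String key, String value) {
        try (Jedis jedis = shardFor(key).getResource()) {
            jedis.set(key, value);
        }
    }

    public String get(String key) {
        try (Jedis jedis = shardFor(key).getResource()) {
            return jedis.get(key);
        }
    }
}
```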
In-memory is actually a drawback here, since RAM is expensive and not persistent, so Scylla will be a better option for K/V or columnar workloads.
Scylla also has a limited Redis API that gives good results [1], though using the CQL API will give better results.
[1] https://medium.com/@siddharthc/redis-on-nvme-with-scylladb-5e12afd38dbc
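For context on "using the CQL API": a key-value workload maps to a trivial CQL table. A minimal sketch with the DataStax Java driver (which speaks CQL to Scylla as well); the keyspace, table, contact point, and datacenter name are invented for illustration:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;

public class ScyllaKvExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")   // adjust to your cluster's DC name
                .build()) {

            // SimpleStrategy keeps the sketch short; use NetworkTopologyStrategy per DC in production.
            session.execute("CREATE KEYSPACE IF NOT EXISTS cache WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
            session.execute("CREATE TABLE IF NOT EXISTS cache.kv (key text PRIMARY KEY, value blob)");

            PreparedStatement insert =
                    session.prepare("INSERT INTO cache.kv (key, value) VALUES (?, ?)");
            PreparedStatement select =
                    session.prepare("SELECT value FROM cache.kv WHERE key = ?");

            session.execute(insert.bind("user:42", ByteBuffer.wrap(new byte[10_240]))); // ~10 KB value
            Row row = session.execute(select.bind("user:42")).one();
            System.out.println("fetched " + row.getByteBuffer("value").remaining() + " bytes");
        }
    }
}
```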

Loading PetaBytes of data at scale

I need to load petabytes of text data into storage (RAM/SSD) within a second.
Below are some of the questions around this problem.
1) Is it practically/theoretically possible to load petabytes of data in a second?
2) What would be the best design approach in order to achieve loading of petabyte-scale data in under a second?
3) Is there any benchmarking approach available?
I am okay with implementing this using any kind of technology, e.g. Hadoop, Spark, HPCC, etc.
"petabytes .... within a second". seriously? Please check wikipedia Petabyte: it is 1.000.000 GB!
Also check the Wikipedia article on memory bandwidth. Even the fastest RAM cannot handle more than a few tens of GB/s (and in practice it is far lower).
Just curious: what is your use-case?
No, it is not technically possible at this time. Not even RAM is fast enough (not to mention the obvious capacity constraints). The fastest SSDs (M.2 drives) offer write speeds around 1.2 GB/s, and with RAID 0 you might reach around 3 GB/s at most. There are also economic constraints, as those drives are quite expensive by themselves. So to answer your question, those speeds are technically impossible at the current time.
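To put rough numbers on why, here is a back-of-envelope sketch using the per-device figures quoted above (the bandwidths are the approximate ones from this thread, not measurements):

```java
public class PetabyteEnvelope {
    public static void main(String[] args) {
        double petabyte = 1e15; // bytes

        String[] labels             = {"single M.2 SSD", "RAID-0 NVMe", "~10 GB/s per node"};
        double[] bytesPerSecPerNode = {1.2e9,            3e9,           10e9};

        for (int i = 0; i < labels.length; i++) {
            double secondsOnOneNode = petabyte / bytesPerSecPerNode[i];
            // To land the whole petabyte within one second you would need this many
            // nodes writing in parallel (ignoring network fan-out, which is its own problem).
            double nodesForOneSecond = Math.ceil(secondsOnOneNode);
            System.out.printf("%-18s: ~%,.0f nodes for 1 PB in 1 s (or ~%.0f hours on one node)%n",
                    labels[i], nodesForOneSecond, secondsOnOneNode / 3600);
        }
    }
}
```

Even at an optimistic 10 GB/s per node, landing a petabyte in one second would take on the order of 100,000 nodes writing in parallel, which is why the answers above call it impossible in practice.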
From HPCC perspective...
Thor is designed to load data and to scale across multiple servers; however, the biggest cluster I have heard about is around 4,000 servers. Thor is designed to load a lot of data over a long time (even a week).
Roxie, on the other hand, is designed to serve data quickly, but that is not what you are asking for... nor could it serve petabytes in under a second.

Rethinkdb Scalability

How scalable is RethinkDB? Can it be used for TBs of data?
I have around 400 GB of data, which is bound to increase by 10-25 GB per week. Any suggestions would be of great help.
RethinkDB is very scalable and you should be able to use it with TBs of data, no problem; that's what it's designed for. As for stability, it should be very stable as of version 2.0 and fully production-ready.
http://www.rethinkdb.com/stability/
Note that a NoSQL database can increase the amount of storage required massively, so be sure to have enough disk space on your nodes.
But a few terabytes are no issue for RethinkDB; it is designed to handle a nearly unbounded amount of data, and if your nodes have plenty of RAM it also provides close to in-memory performance, because its caching algorithms are very good.
You can easily spread your data across many nodes by sharding, which also increases the absolute throughput of your cluster but slightly increases query latency because of the round trips between cluster members.

Redis Vs. Memcached

I am using memcached right now as an LRU cache for big data. I've set the max object size to 128 MB (I know this is inefficient and not recommended) and total memcached memory to 1 GB. But 128 MB is not enough for my purposes, so I am planning to move to Redis. A couple of questions:
memcached is extremely slow - My current memcached setup is taking 3-4 seconds to return just one request. This is extremely slow. I sometimes need to make up to 30 memcached requests to serve one user request. And just doing this takes 90 seconds!! Am I doing something wrong or is memcached actually this slow?
Would Redis be faster? - I plan to use Redis lists to cache the data and fetch full lists using the range 0 to -1. I hope Redis is faster, because I might as well not use any cache if it's going to take 90 seconds!
Thanks!
I'd recommend doing a little profiling to see where the bottleneck is. My uninformed guess is that with such large objects you may be limited by the connection between your app server and memcached, and thus you'll see similar results with Redis. It could also be that your app is spending a lot of time marshaling and unmarshaling objects. If it's easy, it might be worth trying a caching scheme where you just cache the response being sent down to the client (which I'm sure is much smaller than 128 MB).
Another thing to try would be turning on compression. It adds latency for compressing/decompressing, but it would cut network transfer time if that is indeed the issue.
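If the app is on spymemcached (an assumption on my part - the question doesn't say which client is in use), both suggestions take only a few lines: time a single large get in isolation, and set a compression threshold on the transcoder so big values are compressed before they hit the wire. The sizes and threshold below are made up; the sketch also assumes memcached was started with a 128 MB item limit, as in the question:

```java
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.transcoders.SerializingTranscoder;

public class BigValueTiming {
    public static void main(String[] args) throws Exception {
        // Compress anything bigger than 16 KB before it goes on the wire.
        SerializingTranscoder transcoder = new SerializingTranscoder(128 * 1024 * 1024); // allow 128 MB objects
        transcoder.setCompressionThreshold(16 * 1024);

        MemcachedClient client = new MemcachedClient(
                new ConnectionFactoryBuilder().setTranscoder(transcoder).build(),
                AddrUtil.getAddresses("127.0.0.1:11211"));

        byte[] big = new byte[64 * 1024 * 1024];   // hypothetical 64 MB payload
        client.set("big-key", 0, big).get();       // wait for the set to complete

        long start = System.nanoTime();
        byte[] fetched = (byte[]) client.get("big-key");
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.printf("fetched %d bytes in %.1f ms%n", fetched.length, ms);

        client.shutdown();
    }
}
```

A 3-4 second round trip for one get smells like network transfer of a huge object or serialization cost; timing the get by itself like this separates those from the rest of the request handling.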

Duplicate Key Filtering

I am looking for a distributed solution to screen/filter a large volume of keys in real-time. My application generates over 100 billion records per day, and I need a way to filter duplicates out of the stream. I am looking for a system to store a rolling 10 days’ worth of keys, at approximately 100 bytes per key. I was wondering how this type of large scale problem has been solved before using Hadoop. Would HBase be the correct solution to use? Has anyone ever tried a partially in-memory solution like Zookeeper?
I can see a number of solutions to your problem, but the real-time requirement really narrows it down. By real-time, do you mean you want to see whether a key is a duplicate as it's being created?
Let's talk about queries per second. You say 100B/day (that's a lot, congratulations!). That's 1.15 Million queries per second (100,000,000,000 / 24 / 60 / 60). I'm not sure if HBase can handle that. You may want to think about something like Redis (sharded perhaps) or Membase/memcached or something of that sort.
If you were to do it in HBase, I'd simply push the upwards of a trillion keys (10 days x 100B keys/day) in as the row keys of a table, with some dummy value stored alongside (because you have to store something). Then you can just do a get to see whether a key is already there. This is kind of hokey and doesn't really exploit HBase, since it only uses the keyspace; effectively, HBase becomes a B-tree lookup service in this case. I don't think this is a good idea.
If you relax the constraint so you don't have to do it in real time, you could use MapReduce in batch to dedup. That's pretty easy: it's just Word Count without the counting. You group by the key you have, and you'll see the dups in the reducer if multiple values come back. With enough nodes and enough latency, you can solve this problem efficiently. Here is some example code for this from the MapReduce Design Patterns book: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch3/DistinctUserDriver.java
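In the same spirit as the DistinctUserDriver example linked above, a batch dedup job really is just "Word Count without the counting". A minimal sketch (class and job names are mine):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class DistinctKeys {

    // Emit each record's key with a null value; duplicates all land on the same reducer.
    public static class DistinctMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        protected void map(Object offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(line, NullWritable.get());
        }
    }

    // Write each distinct key exactly once, no matter how many times it appeared.
    public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distinct keys");
        job.setJarByClass(DistinctKeys.class);
        job.setMapperClass(DistinctMapper.class);
        job.setCombinerClass(DistinctReducer.class);   // cuts shuffle volume dramatically
        job.setReducerClass(DistinctReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```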
ZooKeeper is for distributed process communication and synchronization. You don't want to be storing trillions of records in zookeeper.
So, in my opinion, you're better served by an in-memory key/value store such as Redis, but you'll be hard pressed to store that much data in memory.
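If you do try the sharded-Redis route for the real-time path, the per-key check is a single atomic SET with NX (only if absent) and EX (expire), which also gives you the rolling 10-day window for free. A minimal sketch with the Jedis client; the host, port, and single-shard setup are illustrative only:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class DuplicateFilter {
    private static final int TEN_DAYS_SECONDS = 10 * 24 * 60 * 60;

    // One shard for the sketch; a real deployment would hash the key to pick one of many shards.
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    /** Returns true the first time a key is seen within the rolling 10-day window. */
    public boolean firstTimeSeen(String key) {
        // SET key "1" NX EX 864000: records the key only if it is absent, and lets
        // Redis expire it after 10 days so the window rolls forward on its own.
        String reply = jedis.set(key, "1", SetParams.setParams().nx().ex(TEN_DAYS_SECONDS));
        return "OK".equals(reply);
    }
}
```

Keep in mind the point above still stands: with around a trillion live keys, the memory bill dominates whichever store you pick.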
I am afraid this is impossible with traditional systems :|
Here is what you have described:
100 billion per day means approximately 1 million per second.
Each key is 100 bytes.
You want to check for duplicates against a 10-day working set, which means 1 trillion items.
These assumptions amount to looking up keys in a set of 1 trillion objects totaling roughly 90 terabytes!
Any real-time solution to this problem has to look up 1 million items per second against this volume of data.
I have some experience with HBase, Cassandra, Redis, and memcached. I am sure you cannot achieve this performance on any disk-based store like HBase, Cassandra, or HyperTable (and add any RDBMS like MySQL, PostgreSQL, and so on to that list). The best Redis or memcached performance I have heard of in practice is around 100k operations per second on a single machine, which means you would need about 90 machines, each with 1 terabyte of RAM!
Even a batch processing system like Hadoop cannot do this job in less than an hour, and I would guess it will take hours or days even on a big cluster of 100 machines.
You are talking about very, very big numbers (90 TB, 1M per second). Are you sure about this?
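For what it's worth, here is the same arithmetic written out (decimal units, so the working set lands closer to 100 TB than 90, and per-key memory overhead is ignored):

```java
public class CapacityCheck {
    public static void main(String[] args) {
        double keysPerDay = 100e9;   // 100 billion records/day
        double keyBytes   = 100;     // bytes per key
        int    windowDays = 10;      // rolling window

        double qps             = keysPerDay / (24 * 60 * 60);      // ~1.16M lookups/sec
        double workingSetKeys  = keysPerDay * windowDays;           // 1 trillion keys
        double workingSetBytes = workingSetKeys * keyBytes;         // 1e14 bytes

        double perNodeOps      = 100_000;   // rough Redis/memcached single-node figure from above
        double perNodeRamBytes = 1e12;      // 1 TB of RAM per machine

        System.out.printf("lookups per second : ~%.2fM%n", qps / 1e6);
        System.out.printf("working set        : ~%.0f TB (~%.0f TiB)%n",
                workingSetBytes / 1e12, workingSetBytes / Math.pow(2, 40));
        System.out.printf("nodes for qps      : ~%.0f%n", Math.ceil(qps / perNodeOps));
        System.out.printf("nodes for capacity : ~%.0f (at 1 TB RAM each)%n",
                Math.ceil(workingSetBytes / perNodeRamBytes));
    }
}
```

The throughput alone needs only a dozen or so cache nodes; it is the in-memory capacity that pushes the node count toward one hundred.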
