I am looking for an in-memory cache solution that can handle big data (<5 GB). For a user-inputted search term, the database (Elasticsearch) will return a large amount of data, which the tool will analyze and show across different pages of the tool. My problem is that I want to cache this big data temporarily until the user's session ends, so that I don't have to fetch it from Elasticsearch again every time the user opens a new page. It has to be in-memory, because a disk-based cache would take over a minute, which would be far too slow.
I initially thought of memcached, but it has a maximum size limit of 128 MB. After reading quite a bit, Redis seems suitable, but it is unclear to me whether a bunch of Redis nodes can work in tandem. Is it possible to set up a pool of many Redis nodes so that a suitable node is automatically chosen on SET and the data is returned on GET, without me having to specify the node?
TL;DR
Problem: Cache big data (<5GB) in an in-memory cache
Possible solution: Redis
Question: Can I pool a bunch of Redis nodes so that I can fetch a key stored in any of them without specifying a particular node? I don't need to distribute my data, since the data for a single user will fit into the RAM of a single node.
A Redis Cluster sounds like a good fit for your use case!
Redis Cluster provides a mechanism for data sharding by means of hash slots. These slots are distributed equally over the nodes in your cluster when you set it up.
Whenever you store a value in the cluster, the corresponding hash slot for the given key is calculated and the data is forwarded to the responsible node, and you query your data back in the same way. So the answer to your question is definitely yes.
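For example, with a cluster-aware client such as Jedis (used here purely as an illustration; the host/port and key names are assumptions), SET and GET look exactly as they would against a single node:

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class RedisClusterExample {
    public static void main(String[] args) throws Exception {
        // Connect to any known node; the client discovers the rest of the
        // cluster and routes each key to the node owning its hash slot.
        try (JedisCluster cluster = new JedisCluster(new HostAndPort("127.0.0.1", 7000))) {

            // SET: the key is hashed to a slot and forwarded to the owning node.
            cluster.set("session:42:results", "...serialized search results...");

            // GET: same hashing, so you never have to name a node yourself.
            String cached = cluster.get("session:42:results");
            System.out.println(cached);
        }
    }
}
```

The cluster-aware client keeps a local map of hash slots to nodes and follows redirects, so no node ever has to be specified by hand.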
However, the maximum value size per key is 512 MB. I'm not sure I understood your storage requirement correctly; I assume 5 GB is the estimated total across all users.
Check out the Redis Cluster tutorial.
You can also look into NCache (.NET) / TayzGrid (Java) by Alachisoft.
Both of these solutions provide distributed caching with dynamic clustering, which allows you to add or remove nodes in the cluster at runtime without losing any data. An intelligent client also makes sure requests go to the appropriate node to fetch or store a record for any key.
Related
Apache Ignite has two concepts: one of them is the near cache, and the other is the CacheMode enumeration.
What is the main difference between the two concepts?
A near cache is a local hot cache that keeps frequently accessed data. It significantly speeds up data processing by saving time on network round-trips.
CacheMode defines how your data will be stored. It can be LOCAL for a single node, which means the data is not distributed across the grid. The other two, PARTITIONED and REPLICATED, mean, respectively, that the cache data is divided between nodes in roughly equal parts (called partitions), or that each node keeps the full data of that cache.
PARTITIONED allows you to keep more data in the grid than fits on a single machine; REPLICATED gives 100% data survivability (if all nodes crash except one, you will not lose your data).
More details can be found in the documentation: https://apacheignite.readme.io/docs/near-caches and https://apacheignite.readme.io/docs/cache-modes
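As a rough sketch of how the two concepts fit together in the Java API (the cache name, key/value types, and host setup are made up; all other options are left at their defaults):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class IgniteModesExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // CacheMode: data is split into partitions and spread across the grid nodes
        // (could also be REPLICATED or LOCAL, as described above).
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("myCache");
        cfg.setCacheMode(CacheMode.PARTITIONED);

        // Near cache: keeps a local copy of hot entries on the node that reads them,
        // avoiding network round-trips on repeated reads.
        NearCacheConfiguration<Integer, String> nearCfg = new NearCacheConfiguration<>();

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cfg, nearCfg);
        cache.put(1, "value");
        System.out.println(cache.get(1));
    }
}
```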
I have a Hazelcast cluster that populates a distributed IMap with data from a separate, remote (REST) service. I want to keep a local copy of the IMap data for HA/DR purposes, so I implemented a file-based MapStore.
It didn't work out like I expected. I noticed that each node stores what are probably only the items in that node's partitions, which isn't necessarily a problem, but I also noticed that after a restart of all the nodes, the IMap only contains the items from the disk of the first node that starts up.
I couldn't find a good explanation in the docs about how the MapStore is used throughout the lifecycle of the nodes in a cluster. Can someone explain?
The MapStore, as you have figured out already, is called on the nodes where the partition resides. Since the partition table is randomized on startup, there is a very low chance of ending up with the same partition distribution as before the restart.
One way to work around this and still use your implementation is to introduce a distributed filesystem, such as Ceph, to store the data files in. That way, no matter what the partition table looks like after the restart, every node can read its partitions.
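A minimal sketch of such a file-backed MapStore, assuming Hazelcast 3.x and a hypothetical shared mount point (/mnt/shared/imap-data), might look like this; it would then be registered on the map via a MapStoreConfig in the map's configuration:

```java
import com.hazelcast.core.MapStore;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// File-backed MapStore. SHARED_DIR points at a distributed filesystem
// (e.g. a Ceph or NFS mount) so that any node can load any partition's
// entries after a restart.
public class SharedFileMapStore implements MapStore<String, String> {
    private static final Path SHARED_DIR = Paths.get("/mnt/shared/imap-data");

    @Override
    public void store(String key, String value) {
        try {
            Files.createDirectories(SHARED_DIR);
            Files.write(SHARED_DIR.resolve(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void storeAll(Map<String, String> map) {
        map.forEach(this::store);
    }

    @Override
    public void delete(String key) {
        try {
            Files.deleteIfExists(SHARED_DIR.resolve(key));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void deleteAll(Collection<String> keys) {
        keys.forEach(this::delete);
    }

    @Override
    public String load(String key) {
        Path file = SHARED_DIR.resolve(key);
        try {
            return Files.exists(file)
                    ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                    : null;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Map<String, String> loadAll(Collection<String> keys) {
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            String value = load(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        if (!Files.isDirectory(SHARED_DIR)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.list(SHARED_DIR)) {
            return files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```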
I am working on HBase. I have a query regarding how HBase stores data in sorted order with an LSM tree.
As per my understanding, HBase uses an LSM tree for handling large-scale data processing. When data comes from a client, it is first stored sequentially in memory, then sorted and written as a B-tree in a store file. That store file is then merged with the on-disk B-tree (of keys). Is that correct? Am I missing something?
If yes, then in a cluster environment there are multiple RegionServers that take client requests. In that case, how do all the HLogs (of each RegionServer) get merged with the on-disk B-tree, given that the existing keys are spread across all the DataNode disks?
Or does an HLog only merge its data with the HFiles of the same RegionServer?
You can take a look at these two articles, which describe exactly what you want:
http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
http://blog.cloudera.com/blog/2012/06/hbase-write-path/
In brief:
The client sends data to the region server responsible for handling the key
(.META. contains the key ranges for each region)
The user operation (e.g. a put) is written to the Write-Ahead Log (WAL, the HLog)
(The log is used just for "safety": if the region server crashes, the log is replayed to recover data not yet written to disk)
After writing to the log, the data is also written to the MemStore
...once the memstore reaches a threshold (conf property)
the memstore is flushed to disk, creating a single hfile
...when the number of hfiles grows too large (conf property), compaction kicks in (merge)
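For reference, here is a minimal sketch of the client side of that path using the standard HBase Java API (the table name, column family, and values are made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {

            // The client locates the region server responsible for this row key;
            // that server appends the mutation to its WAL and then to its MemStore.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}
```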
In terms of the on-disk data structure:
http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
The article above covers the hfile format...
It's an append-only format that can be seen as a B+tree (keeping in mind that this B+tree cannot be modified in place).
The HLog is only used for "safety"; once the data is written to the hfiles, the logs can be thrown away.
According to the LSM-tree model, in HBase the data consists of two parts: an in-memory tree, which contains the most recent updates to the data, and a disk-store tree, which arranges the rest of the data as an immutable, sequential B-tree on the hard drive. From time to time, the HBase service decides that it has accumulated enough changes in memory to flush them to file storage. It then performs a rolling merge of the data from memory to disk, executing an operation similar to the merge step of the merge sort algorithm.
In the HBase infrastructure, this data model is built on several components that organize all the data across the cluster as a collection of LSM-trees located on slave servers and driven by the main master service. The system consists of the following components:
HMaster - the primary HBase service, which maintains the correct state of the slave Region Server nodes by managing and balancing the data among them. It also drives changes to metadata in the storage, such as table or column creations and updates.
Zookeeper - a distributed coordination service used by HBase services and their clients to store reconciled, up-to-date information about naming and configuration.
Region Servers - HBase worker nodes which manage and store pieces of the information in LSM-tree fashion
HDFS - used by Region Servers behind the scenes for the actual storage of the data
At a low level, most of the HBase functionality lives inside the Region Server, which performs the read/write work on the tables. Every table can technically be distributed across different Region Servers as a collection of separate pieces called HRegions. A single Region Server node can hold several HRegions of one table. Each HRegion holds a certain range of rows, shared between memory and disk space and sorted by the key attribute. These ranges do not intersect between different regions, so we can rely on their sequential behavior across the cluster. An individual Region Server HRegion includes the following parts:
Write-Ahead Log (WAL) file - the first place where data is persisted on every write operation, before it reaches memory. As mentioned earlier, the first part of the LSM-tree is kept in memory, which means it can be affected by external factors such as power loss. Keeping a log of these operations in a separate place allows that part to be restored easily, without any losses.
Memstore - keeps a sorted collection of the most recent updates to the information in memory. It is the actual implementation of the first part of the LSM-tree structure described earlier. Periodically it performs rolling merges into store files called HFiles on the local drives
HFile - represents a small piece of the data received from the Memstore and saved in HDFS. Each HFile contains a sorted KeyValue collection and a B+tree index, which allows seeking to the data without reading the whole file. Periodically, HBase performs merge-sort operations on these files to make them fit the configured size of a standard HDFS block and to avoid the small-files problem
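As an illustration of how these parts cooperate on the read side, a simple Get with the HBase Java client (the table and column names are made up) is answered by the owning HRegion, which merges the Memstore with the HFile indexes to return the latest cell:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {

            // The Region Server owning this row key checks its Memstore and the
            // B+tree indexes of its HFiles, merging them to produce the result.
            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```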
You can walk through these elements manually by pushing data in and following it through the whole LSM-tree process. I described how to do this in my recent article:
https://oyermolenko.blog/2017/02/21/hbase-as-primary-nosql-hadoop-storage/
I have an Oracle database with around 7 million records/day, and I want to switch to MongoDB (~300 GB).
To set up a POC, I'd like to know how many nodes I need. I think 2 shards, each a 3-node replica set, will be enough, but I want to know what you think about it :)
I'd like to have an HA setup :)
Thanks in advance!
For MongoDB to work efficiently, you need to know your working set size. You need to know how much data 7 million records/day amounts to; this is the active data that will need to stay in RAM for high performance.
Also, be very sure WHY you are migrating to Mongo. I'm guessing that in your case it is scalability,
but know your data well before doing so.
For your POC, keeping two shards means roughly 150 GB on each. If you have that much disk available, no problem.
Give some consideration to your shard keys: which fields does it make sense to shard your data set on? This will impact the decision of how many shards to deploy versus the capacity of each shard. You might go with relatively few shards, maybe two or three big, deep shards if your data can easily be segmented into halves or thirds, or several lighter, thinner shards if you can shard on a more diverse key.
It is relatively straightforward to upgrade from a MongoDB replica set configuration to a sharded cluster (each shard is actually a replica set). Rather than predetermining that sharding is the right solution to start with, I would think about what your reasons for sharding are (eg. will your application requirements outgrow the resources of a single machine; how much of your data set will be active working set for queries, etc).
It would be worth starting with replica sets and benchmarking this as part of planning your architecture and POC.
Some notes to get you started:
MongoDB's journaling, which is enabled by default as of 1.9.2, provides crash recovery and durability in the storage engine.
Replica sets are the building block for high availability, automatic failover, and data redundancy. Each replica set needs a minimum of three nodes (for example, three data nodes or two data nodes and an arbiter) to enable failover to a new primary via an election.
Sharding is useful for horizontal scaling once your data or writes exceed the resources of a single server.
Other considerations include planning your documents based on your application usage. For example, if your documents will be updated frequently and grow in size over time, you may want to consider manual padding to prevent excessive document moves.
If this is your first MongoDB project you should definitely read the FAQs on Replica Sets and Sharding with MongoDB, as well as for Application Developers.
Note that choosing a good shard key for your use case is an important consideration. A poor choice of shard key can lead to "hot spots" for data writes, or unbalanced shards if you plan to delete large amounts of data.
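If sharding does turn out to be the right fit, the setup commands are small. A sketch with the MongoDB Java driver (the mongos host, database, collection, and shard key below are all assumptions for illustration):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ShardingSetupExample {
    public static void main(String[] args) throws Exception {
        // Connect to a mongos router of the sharded cluster.
        try (MongoClient client = MongoClients.create("mongodb://mongos-host:27017")) {
            MongoDatabase admin = client.getDatabase("admin");

            // Enable sharding for the database...
            admin.runCommand(new Document("enableSharding", "mydb"));

            // ...and shard the collection on a hashed key to spread writes evenly
            // and avoid "hot spots" on a monotonically increasing field.
            admin.runCommand(new Document("shardCollection", "mydb.events")
                    .append("key", new Document("userId", "hashed")));
        }
    }
}
```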
We're looking for a good solution to a caching problem. We'd like to distribute a relatively small amount of data (perhaps 10's of GBs) among a cluster of web servers such that:
The data is replicated to all nodes
The data is persistent
The data can be accessed locally
Our motivation for a caching solution is that we currently have a single point of failure: a SQL Server database. We're unable to set up a fail-over cluster for this database, unfortunately. We're already using Memcached to a large extent, but we want to avoid the problem where if a Memcached node goes down, we'd suddenly have a large amount of cache misses and therefore experience a massive amount of requests to one endpoint.
We'd prefer instead to have local persistent caches on each web server node so that the resulting load would be distributed. When a retrieval is made, it would pass through the following:
Check for data in Memcached. If it's not there...
Check for data in local persistent storage. If it's not there...
Retrieve data from the database.
When data changes, the cache key is invalidated at both caching layers.
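In code, the retrieval flow we have in mind is roughly the following (a sketch only; the CacheLayer interface is a placeholder, not any particular library's API):

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical two-layer read-through cache in front of the database.
public class TieredCacheReader<K, V> {

    public interface CacheLayer<K, V> {
        Optional<V> get(K key);
        void put(K key, V value);
        void invalidate(K key);
    }

    private final CacheLayer<K, V> memcached;       // shared, in-memory layer
    private final CacheLayer<K, V> localPersistent; // per-node, on-disk layer
    private final Function<K, V> database;          // source of truth (SQL Server)

    public TieredCacheReader(CacheLayer<K, V> memcached,
                             CacheLayer<K, V> localPersistent,
                             Function<K, V> database) {
        this.memcached = memcached;
        this.localPersistent = localPersistent;
        this.database = database;
    }

    public V get(K key) {
        // 1. Check Memcached.
        Optional<V> hit = memcached.get(key);
        if (hit.isPresent()) return hit.get();

        // 2. Check local persistent storage, and re-warm Memcached on a hit.
        hit = localPersistent.get(key);
        if (hit.isPresent()) {
            memcached.put(key, hit.get());
            return hit.get();
        }

        // 3. Fall back to the database and populate both layers.
        V value = database.apply(key);
        localPersistent.put(key, value);
        memcached.put(key, value);
        return value;
    }

    // When data changes, invalidate the key at both caching layers.
    public void invalidate(K key) {
        memcached.invalidate(key);
        localPersistent.invalidate(key);
    }
}
```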
We've been looking at a bunch of potential solutions, but none of them seem to match exactly what we need:
CouchDB
This is pretty close; the data model we'd like to cache is very document-oriented. However, its replication model isn't exactly what we're looking for. It seems to me as though replication is an action you have to perform rather than a permanent relationship among nodes. You can set up continuous replication, but this doesn't persist between restarts.
Cassandra
This solution seems to be mostly geared toward those with large storage requirements. We have a large number of users, but small amounts of data. Cassandra looks able to support any number of fail-over nodes, but 100% replication among nodes doesn't seem to be what it's intended for; it seems geared more toward distribution than toward full replication.
SAN
One attractive idea is that we can store a bunch of files on a SAN or similar type of appliance. I haven't worked with these before, but it seems like this would still be a single point of failure; if the SAN goes down, we'd suddenly be going to the database for all cache misses.
DFS Replication
A simple Google search revealed this. It seems to do what we want; it synchronizes files across all nodes in a replication cluster. But the marketing text makes it look like it's more of a system for ensuring documents are copied to different office locations. Also, it has limits, like a file count maximum, that wouldn't work well for us.
Have any of you had similar requirements to ours and found a good solution that meets your needs?
We've been using Riak successfully in production for several months now for a problem that's somewhat similar to what you describe. We too have evaluated CouchDB and Cassandra before.
The advantage of Riak for this sort of problem, IMO, is that distribution and data replication are at the core of the system. You define how many replicas of the data you want across the cluster, and it takes care of the rest (it's a bit more complicated than that, of course, but that's the essence). We have gone through adding nodes, removing nodes, and having nodes crash, and it has proven surprisingly resilient.
In other respects it's a lot like Couch: document-oriented, REST interface, Erlang.
You can check out Hazelcast.
It does not persist the data, but it provides a fail-over system. Each node can have a number of other nodes back up its data in case the node fails.
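For example, the number of backups is configured per map (a minimal sketch assuming Hazelcast 3.x; the map name is made up):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class HazelcastBackupExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Keep one synchronous backup of every partition on another member,
        // so the data survives the failure of a single node.
        config.getMapConfig("web-cache").setBackupCount(1);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
        IMap<String, String> cache = instance.getMap("web-cache");
        cache.put("key", "value");
    }
}
```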