Firebase with classic data structures

Firebase allows you to store data in a remote JSON tree that can be nested up to 32 levels deep.
That's cool, but is there any way (or service) to store your data in lists, sets, or hashes the way Redis does, but remotely, the way Firebase does?

A list is a collection of ordered data? If so, see Firebase's documentation on saving lists of data. If you're used to arrays, you might also want to read the two blog posts on arrays in Firebase and real-time synchronized arrays.
In JSON (and thus in Firebase) any associative array is essentially a set: you can associate one value with each key. So I'd map sets to regular Firebase set operations.
As you can see, there are quite a few links to the Firebase guide in the above.
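To make the mapping concrete, here is a minimal sketch (mine, not from the answer) of how a list, a set, and a hash could be laid out in Firebase's JSON tree using the Python Admin SDK; the credential file, database URL, and node paths are placeholders.

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://example-db.firebaseio.com"})

# "List": push() generates chronologically ordered keys, so the children of
# this node behave like an append-only list.
messages = db.reference("messages")
messages.push({"text": "hello"})

# "Set": store each member as a key with a throwaway value; duplicates are
# impossible and a membership check is a single read of that key.
tags = db.reference("users/alice/tags")
tags.update({"redis": True, "firebase": True})

# "Hash": any node with named children is already a key -> value map.
db.reference("users/alice/profile").set({"name": "Alice", "city": "Budapest"})

Because child keys are unique by construction, storing members as keys with a throwaway value is a common way to get set semantics out of the JSON tree.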

Related

Is it bad practice to store JSON members with Redis GEOADD?

My application has to handle a lot of entities (100,000 or more) with a location and needs to display only those within a given radius. I basically store everything in SQL but use Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities as in the following example (not exactly this; I use the Laravel framework with its built-in Redis facade, but it does the same thing in the background):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is it bad practice? Or will it have a negative impact on performance? Should I store only IDs as members and make a follow-up query to get the corresponding entities?
This is a matter of requirements. There's nothing wrong with storing the raw data as members as long as it is unique (and it is unique, given the "id" field). In fact, this is both simple and performant, as all data is returned with a single query (assuming that's what's actually needed).
That said, there are at least two considerations for storing the data outside the Geoset, and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to try to update (i.e. ZREM and then GEOADD) small parts of it. Having everything outside, perhaps in a Hash (or maybe something like RedisJSON), makes more sense then.
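As an illustration of that second approach (IDs as Geo Set members, raw data kept elsewhere), here is a rough redis-py sketch; the key names places:geo and place:<id>, and the radius query, are made up for the example.

import redis

r = redis.Redis(decode_responses=True)

def add_place(place):
    # The Geo Set member is just the ID; the raw data lives in its own Hash.
    r.execute_command("GEOADD", "places:geo", place["lon"], place["lat"], place["id"])
    r.hset("place:%d" % place["id"], mapping={
        "name": place["name"],
        "address": place["address"],
        "lat": place["lat"],
        "lon": place["lon"],
    })

def places_within(lon, lat, radius_km):
    ids = r.georadius("places:geo", lon, lat, radius_km, unit="km")
    # Fetch all matching Hashes in a single round trip with a pipeline.
    pipe = r.pipeline()
    for place_id in ids:
        pipe.hgetall("place:%s" % place_id)
    return pipe.execute()

add_place({"id": 1, "name": "Foo", "address": "Budapest, Astoria",
           "lat": 47.494338, "lon": 19.059982})
print(places_within(19.06, 47.49, 5))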

How to store two different cache "tables" in Redis under the same database/index?

I'm trying to build a data set of two cache tables (which are currently stored in SQL Server): one is the actual cache table (CacheTBL); the other is the staging table (CacheTBL_Staging).
The table structure has two columns - "key", "value"
So I'm wondering how to implement this in Redis as I'm a total noob to this NoSQL stuff. Should I use a SET or LIST? Or something else?
Thank you so much in advance!
You need to decide whether you want separate Redis keys for all entries, using SET and GET, or to put them into hashes with HSET and HGET. If you use the first approach, your keys should include a prefix to distinguish between main and staging. If you use hashes, this is not necessary, because the hash name itself can distinguish them. You probably also need to decide how you want to check for cache validity, and what your cache-flushing strategy should be. This normally requires some additional data structures in Redis.
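A short redis-py sketch of both options; the key and hash names (cache:main, cache:staging) are only illustrative.

import redis

r = redis.Redis(decode_responses=True)

# Option 1: separate keys per entry, with a prefix to tell the tables apart.
r.set("cache:main:user:42", "cached value")
r.set("cache:staging:user:42", "candidate value")
value = r.get("cache:main:user:42")

# Option 2: one Hash per table; the hash name plays the role of the prefix.
r.hset("cache:main", "user:42", "cached value")
r.hset("cache:staging", "user:42", "candidate value")
value = r.hget("cache:main", "user:42")

# With the Hash layout, promoting staging to main can be a single RENAME,
# which atomically replaces the old main table on the server.
r.rename("cache:staging", "cache:main")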

How can I enumerate all keys in our Redis database?

We have a huge Redis database containing about 100 million keys, which maps phone numbers to hashes of data.
Once in a while all this data needs to be aggregated and saved to an SQL database. During aggregation we need to iterate over all the stored keys and look at the stored values.
Using Redis.keys is not a good option because it retrieves and stores the whole list of keys in memory, and it takes a loooong time to complete. We need something that gives back an enumerator that can be used to iterate over all the keys, like so:
redis.keys_each { |k| agg(k, redis.hgetall(k)) }
Is this even possible with Redis?
This would prevent Ruby from constructing an array of 100 million elements in memory, and would probably be way faster. Profiling shows us that using the Redis.keys command makes Ruby hog the CPU at 100%, but the Redis process seems to be idle.
I know that using keys is discouraged in favor of building a set of the keys, but even if we construct a set out of the keys and retrieve it using smembers, we'll have the same problem.
Incremental enumeration of all the keys is not possible with the current Redis version (at the time of writing).
Instead of trying to extract all the keys of a live Redis instance, you could just dump the database (BGSAVE) and convert the resulting dump to a JSON file, to be processed with any Ruby tool you want.
See https://github.com/sripathikrishnan/redis-rdb-tools
Alternatively, you can use the redis-rdb-tools API to write a parser in Python directly and extract the required data (without generating a JSON file).
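For illustration, here is roughly what such a parser could look like, modeled on the example in the redis-rdb-tools README; the dump path is a placeholder, the agg call refers to the asker's aggregation step, and constructor details can vary between versions of the library.

from rdbtools import RdbParser, RdbCallback

class AggregateCallback(RdbCallback):
    def __init__(self):
        super(AggregateCallback, self).__init__(string_escape=None)
        self.fields = {}

    def start_hash(self, key, length, expiry, info):
        self.fields = {}

    def hset(self, key, field, value):
        # Called once per field of every hash found in the dump.
        self.fields[field] = value

    def end_hash(self, key):
        agg(key, self.fields)  # the asker's aggregation step, e.g. an SQL insert

parser = RdbParser(AggregateCallback())
parser.parse("/var/redis/6379/dump.rdb")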

What's the differences between Object Storage and Key-Value Database?

From a user's perspective they seem like the same thing.
A key-value database does not care about the contents or format of the value. It just allows you to store stuff under keys, and get it back again, and iterate keys.
Object Storage or Document Databases can look at the contents of the data you store in them, and allow you to query or index on something other than the key.
That would be one distinction to draw. But googling around for Object Storage, it seems that it is a rather ill-defined buzzword.
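To make the distinction concrete, a small sketch contrasting the two, using redis-py and pymongo; the connection details, keys, and field names are invented.

import redis
from pymongo import MongoClient

r = redis.Redis(decode_responses=True)
users = MongoClient().example_db.users

# Key-value store: the value is opaque, so the only lookup path is the key.
r.set("user:42", '{"name": "Alice", "city": "Budapest"}')
doc = r.get("user:42")              # fine
# There is no r.get(city == "Budapest"): Redis cannot see inside the value.

# Document database: the store understands the document's fields,
# so you can query (and index) on something other than the key.
users.insert_one({"_id": 42, "name": "Alice", "city": "Budapest"})
match = users.find_one({"city": "Budapest"})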

How to sort/order data?

I already have experience with MongoDB, CouchDB, Redis, Tokyo Cabinet, and other NoSQL databases. Recently I stumbled upon Riak and it looks very interesting to me. To get started with it, I decided to write a small Twitter clone, the "hello world" of the NoSQL world. To get a fully working clone, it's necessary to order the tweets chronologically. After reading the Riak docs I discovered that Map-Reduce is the right tool for this job. In my development environment it works quite well, but how's the performance in production, with hundreds of parallel queries? Are there other, maybe faster, methods for sorting data, or is it possible to store data in an ordered form (like Cassandra)?
I think I've found another solution to this problem: a simple linked list. One possible implementation is that every user gets his or her own "timeline" bucket, which stores links to the tweet data itself (the tweets are stored separately in a "tweets" bucket). This timeline bucket must contain a key named "first", which links to the latest timeline object and is the starting point of the list. To insert a new tweet into the timeline, just insert a new item into the timeline bucket, set the "next" link of the new item to the current "first" item, and then make the new item the "first".
In short: Insert an item as you would do in a linked list...
As with Twitter, the personal timeline only holds the 20 tweets shown to the user. To retrieve the last 20 tweets, only 2 queries are necessary. To speed things up, the first query uses Riak's link-walking ability to get the latest 20 objects, tagged by "next". Finally, the second and last query uses the keys computed by the first query to retrieve the tweets themselves (using map/reduce).
To remove the tweets of users you've just unfollowed, I would use the secondary index ability of Riak 1.0 to retrieve the related timeline objects/tweets.
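Here is a rough Python sketch of the insert and read paths of that linked-list timeline, using the official Riak Python client; bucket and key names are invented, concurrent updates and error handling are ignored, and the link-walking/map-reduce read described above is replaced by plain sequential gets for brevity.

import uuid
import riak

client = riak.RiakClient(pb_port=8087)
tweets = client.bucket("tweets")
timeline = client.bucket("timeline_alice")   # one timeline bucket per user

def post_tweet(text):
    # Store the tweet itself in the "tweets" bucket.
    tweet_key = str(uuid.uuid4())
    tweets.new(tweet_key, data={"text": text}).store()

    # Prepend a timeline item: its "next" link points at the current head.
    head = timeline.get("first")
    item_key = str(uuid.uuid4())
    timeline.new(item_key, data={
        "tweet": tweet_key,
        "next": head.data["item"] if head.exists else None,
    }).store()

    # Finally, move "first" so it references the new item.
    timeline.new("first", data={"item": item_key}).store()

def latest_tweets(limit=20):
    # Walk the list: follow "next" pointers, fetching each referenced tweet.
    head = timeline.get("first")
    key = head.data["item"] if head.exists else None
    result = []
    while key and len(result) < limit:
        item = timeline.get(key).data
        result.append(tweets.get(item["tweet"]).data)
        key = item["next"]
    return result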
It is not possible to store data in an ordered form in Riak without resorting to re-writing portions of the Riak core. Data is stored, roughly, in bucket + key order. The actual order depends on the backend storage mechanism that you're using for Riak.
Riak 1.0 has some features that might help you, too. There's support for secondary indexes as well as improvements to Map Reduce operations - in particular, they perform much better in highly concurrent scenarios.
Alexander Siculars wrote an article about Pagination with Riak. It outlines the problem pretty well. Yammer also makes extensive use of Riak, and two of their engineers put together a presentation about Riak at Yammer. It doesn't go into a lot of implementation details, but you can learn a lot about how they designed their solution.
Combining secondary index queries and Map Reduce makes it possible to solve your problem very easily.
As Jeremiah says, it's not possible to store the data in sorted order, but you can still make Riak return sorted results by using secondary indexes and map/reduce. The problem, as described, is that you can't efficiently limit the query in a sorted way.
Here is an example using a range query to list all keys and then sorting them with the built-in functions in *riak_kv_mapreduce*:
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
riakc_pb_socket:mapred(Pid
, {index, colonel_riak:bucket(context), <<"$key">>, <<0>>, <<255>>}
, [{reduce, {modfun, riak_kv_mapreduce, reduce_sort}, none, true}])
You can use functions from the lists module in Erlang, or the native JavaScript sort function. Reversing the order can be achieved with lists:reverse/1 in Erlang.

Resources