Is there sorted set functionality in Tarantool? - tarantool

I'm starting work on a project which will require a lot of work with sorted sets. I need to keep some sets sorted and perform CRUD operations as fast as possible. Is there any Tarantool functionality that allows inserting data into a sorted set, like the Redis ZADD command? Or do I have to sort the data on my own (using C or Lua scripts), or are sorted selects from Tarantool fast enough? Please give me some opinions or advice.

In Tarantool, a TREE index automatically sorts your data. Create a simple space with a TREE primary key on the first field. You can store any JSON data in the second, third, fourth, ... field, or you can format the space to reflect your schema, and inserted values will have to conform to it, just like in a relational database.
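For illustration, here is a minimal sketch using the tarantool-python connector. The space and field names are made up for the example, and the schema creation is done server-side via eval, which assumes you connect as a user with eval permissions:

import tarantool

conn = tarantool.connect('localhost', 3301)  # assumes a local Tarantool instance

# create a space with a TREE primary key on the first field (server-side Lua)
conn.eval("""
    box.schema.space.create('scores', {if_not_exists = true})
    box.space.scores:create_index('primary',
        {type = 'TREE', parts = {{field = 1, type = 'string'}}, if_not_exists = true})
""")

conn.insert('scores', ('banana', 20))
conn.insert('scores', ('apple', 10))
conn.insert('scores', ('cherry', 30))

# a select over the TREE index returns tuples already sorted by the key
print(conn.select('scores'))  # apple, banana, cherry

Because the TREE index walks the keys in order, there is no extra sort step at select time, which is what makes it a reasonable substitute for the Redis ZADD/ZRANGE pattern.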

Related

HBASE versus HIVE: What is more suitable for data that is uniquely defined by multiple fields?

We are building a DB infrastructure on top of Hadoop systems. We will be paying a vendor to do that, and I don't think we are getting the right answers from the first vendor. So I need help from some experts to validate whether I am right or I am missing something:
1. We have about 1600 fields in the data. A unique record is identified by those 1600 fields.
2. We want to be able to search records in a particular timeframe (i.e., records for a given time frame).
3. There are some fields that change over time (monthly).
The vendor stated that the best way to go is HBASE and that they have two choices: (1) optimize the search for machine learning, or (2) optimize it for ad hoc queries. Option (1) would require a concatenated key with all the fields of interest, and the key length would determine how slow or fast the search runs.
I do not think this is correct:
1. We do not need to use HBASE. We can use HIVE.
2. We do not need to concatenate field names. We can translate them to numbers and use a number as the key.
3. I do not think we need to choose one or the other.
Could you let me know what you think about that?
It all depends on what your use case is. In simple terms, Hive alone is not good for interactive queries, but it is one of the best options for analytics.
HBase, on the other hand, is really good for interactive queries, but doing analytics with it is not as easy as with Hive.
We have about 1600 fields in the data. A unique record is identified by those 1600 fields.
HBase
Hbase is a NoSQL, columner database which stores information in Map(Dictionary) like format. Where each row needs to have one column which uniquely identifies the row. This is called key.
You can have key as a combination of multiple columns as well if you don't have a single column which can uniquely identifies the row. And then you can search record using partial key. However this is going to affect the performance ( as compared to have single column key).
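As a sketch of the composite-key idea, using the happybase client (the table name, column family, and key layout are invented for the example):

import happybase

connection = happybase.Connection('hbase-host')  # hypothetical Thrift host
table = connection.table('records')

# composite row key: put the components you filter on most often first
rowkey = b'2020-01|customer42|record001'
table.put(rowkey, {b'cf:payload': b'...'})

# partial-key search: scan on a prefix of the leading key components
for key, data in table.scan(row_prefix=b'2020-01|'):
    print(key, data)

The scan is only efficient when the components you filter on form a prefix of the key, which is why the key design (and its length) matters so much.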
Hive:
Hive has a SQL-like language (HQL) for querying data on HDFS, which you can use for analytics. However, it doesn't require a primary key, so you can insert duplicate records if needed.
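For example, a time-window analytics query over such a table might look like this via the PyHive client (the host, table, and column names are assumptions):

from pyhive import hive

conn = hive.Connection(host='hive-host', port=10000)  # assumes a running HiveServer2
cur = conn.cursor()
# no primary key: duplicates are allowed, and scan-based analytics are the norm
cur.execute("""
    SELECT field_1, COUNT(*) AS n
    FROM records
    WHERE event_month = '2020-01'
    GROUP BY field_1
""")
print(cur.fetchall())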
The vendor stated that the best way to go is HBASE and that they have two choices: (1) optimize the search for machine learning, or (2) optimize it for ad hoc queries. Option (1) would require a concatenated key with all the fields of interest, and the key length would determine how slow or fast the search runs.
In a way your vendor is correct, as I explained earlier.
1. We do not need to use HBASE. We can use HIVE. 2. We do not need to concatenate field names. We can translate them to numbers and use a number as the key. 3. I do not think we need to choose one or the other.
Whether you can use HBase or Hive depends on your use case. However, if you are planning to use Hive, then you don't even need to generate a pseudo-key (the row numbers you are talking about).
There is one more option if you have a Hortonworks deployment: use Hive for analytics and Hive LLAP for interactive queries.

How to store two different cache "tables" in Redis under the same database/index?

I'm trying to build a data set of two cache tables (which are currently stored in SQL Server): one is the actual cache table (CacheTBL); the other is the staging table (CacheTBL_Staging).
The table structure has two columns: "key" and "value".
So I'm wondering how to implement this in Redis, as I'm a total noob to this NoSQL stuff. Should I use a SET or a LIST? Or something else?
Thank you so much in advance!
You need to decide whether you want separate Redis keys for all entries, using SET and GET, or to put them into hashes, with HSET and HGET. If you use the first approach, your keys should include a prefix to distinguish between main and staging. If you use hashes, this is not necessary, because the hash name itself distinguishes them. You probably also need to decide how you will check for cache validity and what your cache-flushing strategy should be; that normally requires some additional data structures in Redis.
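A minimal sketch of both options with the redis-py client (the key and hash names are just examples):

import redis

r = redis.Redis(host='localhost', port=6379)

# option 1: flat keys, with a prefix to separate the two "tables"
r.set('cache:main:some_key', 'value')
r.set('cache:staging:some_key', 'staged value')
print(r.get('cache:staging:some_key'))

# option 2: one hash per "table"; the hash name does the separating
r.hset('cache:main', 'some_key', 'value')
r.hset('cache:staging', 'some_key', 'staged value')
print(r.hget('cache:staging', 'some_key'))

The hash variant keeps the keyspace tidy and lets you drop an entire "table" with a single DEL on the hash name.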

Data structure functioning like Database in C or C++

Is there a data structure which gives you the functions of a database (insert, update, delete, etc.)? For example:
create a struct like a database table
store data in it and query it
selectively delete it
I know that you can do this with a hashtable (e.g., the uthash library). But as far as I know, updating just one column of an entry is not easy in a hash table.
Look at SQLite. Rather than a relational database system, it is essentially a connectionless, file-based database library that supports SQL. You link your program against it, and it provides functions to perform SQL queries over data files.
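In C you would call sqlite3_open() and sqlite3_exec() directly; for brevity, and consistent with the other examples here, the same flow sketched with Python's built-in sqlite3 module (table and column names are invented):

import sqlite3

conn = sqlite3.connect('data.db')  # the whole database lives in this one file
conn.execute('CREATE TABLE IF NOT EXISTS people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)')
conn.execute('INSERT INTO people (name, age) VALUES (?, ?)', ('alice', 30))
conn.execute('UPDATE people SET age = ? WHERE name = ?', (31, 'alice'))  # update a single column
print(conn.execute('SELECT * FROM people WHERE age > ?', (25,)).fetchall())
conn.execute('DELETE FROM people WHERE name = ?', ('alice',))
conn.commit()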
Look at NoSQL databases; that is the kind of system Facebook uses.
Use C structs to represent rows of data, and then trees (or maybe hashes) for indexes. There are a lot of little problems you will need to solve, especially to make all the operations efficient, but this forms the basis of an in-memory table.
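A compact sketch of the idea (in Python for brevity; in C the dict would be your collection of structs and the sorted list a balanced tree keyed on the indexed column):

import bisect

rows = {}         # rowid -> row (the "structs")
name_index = []   # sorted (name, rowid) pairs (the "tree" on the name column)

def insert(rowid, name, age):
    rows[rowid] = {'name': name, 'age': age}
    bisect.insort(name_index, (name, rowid))

def update_age(rowid, new_age):
    rows[rowid]['age'] = new_age  # non-indexed column: no index maintenance

def delete(rowid):
    row = rows.pop(rowid)
    name_index.remove((row['name'], rowid))  # keep the index consistent

def find_by_name(name):
    i = bisect.bisect_left(name_index, (name,))
    while i < len(name_index) and name_index[i][0] == name:
        yield rows[name_index[i][1]]
        i += 1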
For simple things, a tree structure may be enough.

Best-performing method for associating arbitrary key/value pairs with a table row in a Postgres DB?

I have an otherwise perfectly relational data schema in place for my Postgres 8.4 DB, but I need the ability to associate arbitrary key/value pairs with several of my tables, with the assigned keys varying by row. Key/value pairs are user-generated, so I have no way of predicting them ahead of time or wrangling orderly schema changes.
I have the following requirements:
Key/value pairs will be read often, written occasionally. Reads must be reasonably fast.
No (present) need to query off of the keys or values. (But it might come in handy some day.)
I see the following possible solutions:
The Entity-Attribute-Value pattern/antipattern. Annoying, but the annoyance would be generally offset by my ORM.
Storing key/value pairs as serialized JSON data on a text column. A simple solution, and again the ORM comes in handy, but I can kiss my future self's need for queries good-bye.
Storing key/value pairs in some other NoSQL db--probably a key/value or document store. ORM is no help here. I'll have to manage the separate queries (and looming data integrity issues?) myself.
I'm concerned about query performance, as I hope to have a lot of these some day. I'm also concerned about programmer performance, as I have to build, maintain, and use the darned thing. Is there an obvious best approach here? Or something I've missed?
That's precisely what the hstore datatype is for in PostgreSQL.
http://www.postgresql.org/docs/current/static/hstore.html
It's really fast (you can index it) and quite easy to handle. The only drawback is that you can only store character data, but you'd have that problem with the other solutions as well.
Indexes support the "exists" operator, so you can query quite quickly for rows where a certain key is present, or where a specific attribute has a specific value.
And with 9.0 it got even better because some size restrictions were lifted.
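A small sketch with psycopg2 on a modern Postgres (the table and key names are invented; register_hstore maps hstore to and from Python dicts, and CREATE ... IF NOT EXISTS needs 9.5+):

import psycopg2
import psycopg2.extras

conn = psycopg2.connect('dbname=mydb')  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
conn.commit()
psycopg2.extras.register_hstore(conn)   # hstore <-> Python dict

with conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS things (id serial PRIMARY KEY, attrs hstore)")
    cur.execute("CREATE INDEX IF NOT EXISTS things_attrs_idx ON things USING gin (attrs)")
    cur.execute("INSERT INTO things (attrs) VALUES (%s)", ({'color': 'red', 'size': 'XL'},))
    # the "exists" operator: rows where a given key is present
    cur.execute("SELECT id, attrs FROM things WHERE attrs ? %s", ('color',))
    print(cur.fetchall())
conn.commit()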
hstore is generally a good solution for that, but personally I prefer plain key:value tables: one table with definitions, another table with values, a relation binding values to definitions, and a relation binding values to a particular record in the other table.
Why am I against hstore? Because it is like the registry pattern, often cited as an anti-pattern. You can put anything in it; it is hard to verify whether a given entry is still needed; and when loading a whole row (especially through an ORM), the entire hstore is loaded, which may contain a lot of junk and very little of use. Not to mention that the hstore has to be converted into your language's types and converted back again on save, so you get some type-conversion overhead.
So I'm actually trying to convert all the hstores at the company I work for into simple key:value tables. It's not that hard a task, though, because the structures kept in hstore here are huge (or at least big), and reading/writing such an object creates a large overhead of function calls. That makes even a simple query like "select * from base_product where id = 1;" make the server sweat and hurts performance badly. I want to point out that the performance issue is not caused by the DB, but by Python having to convert the results received from Postgres several times, whereas key:value tables require no such conversion.
As you do not control the data, do not try to overcomplicate this.
create table sometable_attributes (
  sometable_id int not null references sometable(sometable_id),
  attribute_key varchar(50) not null check (length(attribute_key) > 0),
  attribute_value varchar(5000) not null,
  primary key (sometable_id, attribute_key)
);
This is like EAV, but without the attribute_keys table, which adds no value if you do not control what will be stored there.
For speed, you should periodically run "cluster sometable_attributes using sometable_attributes_pkey" (clustering on the primary-key index), so that all attributes for one row are stored physically close together.
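Reading and writing the pairs is then plain SQL; for example, with psycopg2 (the DSN and values are placeholders):

import psycopg2

conn = psycopg2.connect('dbname=mydb')  # hypothetical DSN
with conn.cursor() as cur:
    # write one user-supplied pair for row 1
    cur.execute(
        "INSERT INTO sometable_attributes (sometable_id, attribute_key, attribute_value) "
        "VALUES (%s, %s, %s)", (1, 'color', 'red'))
    # read all pairs for a row; the primary key makes this a single index scan
    cur.execute(
        "SELECT attribute_key, attribute_value FROM sometable_attributes "
        "WHERE sometable_id = %s", (1,))
    attrs = dict(cur.fetchall())
conn.commit()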

Is there anything like memcached, but for sorted lists?

I have a situation where I could really benefit from a system like memcached, but with the ability to store (for each key) a sorted list of elements, and to modify the list by adding values.
For example:
something.add_to_sorted_list( 'topics_list_sorted_by_title', 1234, 'some_title')
something.add_to_sorted_list( 'topics_list_sorted_by_title', 5436, 'zzz')
something.add_to_sorted_list( 'topics_list_sorted_by_title', 5623, 'aaa')
Which I then could use like this:
something.get_list_size( 'topics_list_sorted_by_title' )
// returns 3
something.get_list_elements( 'topics_list_sorted_by_title', 1, 10 )
// returns: 5623, 1234, 5436
The required system would allow me to easily get the item count of every list, and to fetch any number of values from the list, with the assumption that the values are kept sorted using the attached value.
I hope that the description is clear. And the question is relatively simple: is there any such system?
Take a look at MongoDB. It uses memory-mapped files, so it is incredibly fast and should perform at a level comparable to memcached.
MongoDB is a schemaless database that should support what you're looking for (indexing/sorting).
Redis supports both lists and sorted sets. You can disable saving to disk and use it like memcached, instead of going for MongoDB, which saves data to disk.
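For the example in the question, Redis sorted sets are the closest match. A sketch with redis-py (the key name and "title:id" member encoding are assumptions; with equal scores, members sort lexicographically, which gives the sort-by-title behaviour, and ZRANGEBYLEX needs Redis 2.8.9+):

import redis

r = redis.Redis()

# encode "title:id" as the member; equal scores -> lexicographic order
r.zadd('topics_by_title', {'some_title:1234': 0, 'zzz:5436': 0, 'aaa:5623': 0})

print(r.zcard('topics_by_title'))                  # 3
print(r.zrangebylex('topics_by_title', '-', '+'))  # aaa:5623, some_title:1234, zzz:5436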
MongoDB will fit. What's important is that it has indexes, so you can add an index by title on the topics collection and then retrieve items sorted by that index:
db.topics.ensureIndex({"title": 1})
db.topics.find().sort({"title": 1})
Why not just store an array in memcached? At least in Python and PHP, the memcached APIs support this (I think Python uses pickle, but I don't recall for sure).
If you need permanent data storage or backup, MemcacheDB uses the same API.
A basic Python example (using the python-memcached client; itemKey and newItemsDict stand for whatever you are looking up or adding):

import memcache

cache = memcache.Client(['127.0.0.1:11211'])
TTL = 3600  # seconds to live

# get previously stored data (the client pickles/unpickles Python objects)
stored = cache.get('storedDataName')

# initialize the dict if nothing has been stored previously
if stored is None:
    stored = {}

# find a stored item
try:
    already_have_item = stored[itemKey]
except KeyError:
    print('no result in cache')

# add new items
for item in newItemsDict:
    stored[item] = newItemsDict[item]

# save the result back to the cache
cache.set('storedDataName', stored, TTL)
