Is it possible to save data in both vinyl and memtx at the same time?

In my project I want to keep some data in fast storage, but other data is too large to fit in RAM. Can I store some data in memtx and some in vinyl?

Tarantool works just fine with memtx and vinyl spaces in the same process. You just create a space, specify the engine type (memtx or vinyl), and you are good to go. See the documentation for more info.
box.cfg{}
box.schema.create_space('memtx_test', {engine = 'memtx'})
box.space.memtx_test:create_index('primary')
box.schema.create_space('vinyl_test', {engine = 'vinyl'})
box.space.vinyl_test:create_index('primary')
However, there may be more complex scenarios. For example, you may want to expire your memtx (hot/fast) data into vinyl (cold/slow) storage, and you may want to do it atomically. Tarantool doesn't yet support cross-engine transactions, so you won't be able to delete a tuple from a memtx space and insert it into a vinyl space in a single transaction. For such a use case you may need to build a workaround, such as maintaining a third (memtx) space for temporary data.
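As a rough illustration (not Tarantool's own recommendation), here is a minimal client-side sketch in Python using the tarantool connector. It moves a tuple from the hot memtx space to the cold vinyl space in two separate steps, copying first and deleting second, so a crash in between leaves a duplicate rather than a lost tuple. The space names reuse the example above; the host, port, key, and helper name are assumptions.

import tarantool

# Connect to the Tarantool instance (host and port are assumptions).
conn = tarantool.connect("localhost", 3301)

def move_to_cold(key):
    # Hypothetical helper: move one tuple from the hot memtx space to the
    # cold vinyl space. There is no cross-engine transaction, so the two
    # steps are separate: copy first, delete second, and keep it idempotent
    # so it can be retried after a failure.
    rows = conn.select("memtx_test", key)
    if len(rows) == 0:
        return
    row = rows[0]
    conn.replace("vinyl_test", list(row))   # step 1: copy into vinyl
    conn.delete("memtx_test", key)          # step 2: drop from memtx

move_to_cold(42)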

Related

Solutions for Recording Google Protocol Buffer Messages

My project is using protobufs for our data types. We need to be able to record our data so it can be played back later. Our use case is to recreate the event or to reprocess the same data but with new algorithms and check for improvements.
As data flows through our system it is all protobufs. These are easily serialized to a byte array, which could be recorded to files or stored as blobs in a database. Playback would simply mean reading the byte array, converting it back to a protobuf, and sending it off into our software again.
Are there any existing technologies used for recording protobufs?
Even though the initial use case is very simple, eventually the solution will get more complex. It will probably need to:
Farm out recording to multiple hosts to keep up with the input data rate
Allow querying to find out how much data exists during a specific time period
Play back only those data records where some field has a specific value
Save the data for long term storage, e.g. never delete a record but instead move it to a tape backup
I think the above is best accomplished using a database which stores some subset of metadata along with the protobuf byte array itself. Before I go reinventing the wheel, I would like opinions on anything that already exists that might do this job.
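As a rough illustration of the "metadata plus serialized bytes" idea described above, here is a minimal sketch using the official Python protobuf runtime and SQLite. The generated module event_pb2 (with an Event message) and the table layout are assumptions, not an existing tool.

import sqlite3
import time

import event_pb2  # hypothetical module generated from your .proto

db = sqlite3.connect("recordings.db")
db.execute("""CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recorded_at REAL,    -- metadata used for time-range queries
    message_type TEXT,   -- metadata used for filtered playback
    payload BLOB         -- the serialized protobuf bytes
)""")

def record(msg):
    # Serialize the protobuf to bytes and store it with queryable metadata.
    db.execute(
        "INSERT INTO messages (recorded_at, message_type, payload) VALUES (?, ?, ?)",
        (time.time(), msg.DESCRIPTOR.full_name, msg.SerializeToString()),
    )
    db.commit()

def playback(start, end):
    # Replay messages recorded in [start, end) by deserializing the blobs.
    rows = db.execute(
        "SELECT payload FROM messages WHERE recorded_at >= ? AND recorded_at < ?",
        (start, end),
    )
    for (payload,) in rows:
        msg = event_pb2.Event()
        msg.ParseFromString(payload)
        yield msg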

Serializable In-Memory Full-Text Index Tool for Ruby

I am trying to find a way to build a full-text index stored in-memory in a format that can be safely passed through Marshal.dump/Marshal.load so I can take the index and encrypt it before storing it to disk.
My rationale for needing this functionality: I am designing a system where a user's content needs to be both encrypted using their own key, and indexed for full text searching. I realize there would be significant overhead and memory usage if for each user of the system I had to un-marshal and load the entire index of their content into memory. For this project security is far more important than efficiency.
A full text index would maintain far too many details about a user's content to leave unencrypted, and simply storing the index on an encrypted volume is insufficient as each user's index would need to be encrypted using the unique key for that user to maintain the level of security desired.
User content will be encrypted and likely stored in a traditional RDBMS. My thought is that loading/unloading the serialized index would be less overhead for a user with large amounts of content than decrypting all the DB rows belonging to them and doing a full scan for every search.
My trials with Ferret got me to the point of successfully creating an in-memory index. However, the index failed a Marshal.dump due to its use of a Mutex. I am also evaluating Xapian and Solr but seem to be hitting roadblocks there as well.
Before I go any further I would like to know if this approach is even a sane one, and what alternatives I might want to consider if it's not. I also want to know if anyone has had success serializing a full-text index in this manner, what tool you used, and any pointers you can provide.
Why not use a standard full-text search engine and keep each client's index on a separate encrypted disk image, like TrueCrypt? Each client's disk image could have a unique key; it would use less RAM and would probably take less time to implement.
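For what it's worth, the serialize-then-encrypt-then-store shape the question describes is small in any language. Below is a hedged sketch in Python; pickle and the cryptography package's Fernet are stand-ins for Marshal and whatever Ruby cipher you choose, and the toy dict stands in for a real full-text index.

import pickle
from cryptography.fernet import Fernet

def save_index(index, key, path):
    # Serialize the in-memory index, encrypt it with the user's key,
    # and write only the ciphertext to disk.
    ciphertext = Fernet(key).encrypt(pickle.dumps(index))
    with open(path, "wb") as f:
        f.write(ciphertext)

def load_index(key, path):
    # Read the ciphertext, decrypt with the user's key, and rebuild
    # the in-memory index.
    with open(path, "rb") as f:
        ciphertext = f.read()
    return pickle.loads(Fernet(key).decrypt(ciphertext))

# Usage: each user gets their own key.
user_key = Fernet.generate_key()
index = {"hello": [1, 7], "world": [2]}   # toy postings-list index
save_index(index, user_key, "user_index.enc")
print(load_index(user_key, "user_index.enc"))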

How is the WordWeb English dictionary implemented?

We need some in-memory data structure to hold an English word dictionary in memory.
When the computer/WordWeb starts, we need to read the dictionary from disk into that in-memory data structure.
This question asks: how do typical real-world dictionaries, say WordWeb, populate the in-memory data structure from disk?
Ideally we would like to keep the dictionary on disk in the same layout we need in memory, so that we don't have to spend time building the in-memory data structure and can just read it off the disk. But for linked lists, pointers, etc., how do we store the same image on disk? Would relative addresses or something similar help here?
Typically, is the entire dictionary read and stored in memory, or are only parts read (handles, leaf-page I/Os) when searching for a specific word?
If somebody wants to explain what that in-memory data structure typically is, please go ahead.
Thanks,
You mentioned pointers, so I'm assuming you're using C++; if that's the case and you want to read directly from disk into memory without having to "rebuild" your data structure, then you might want to look into serialization: How do you serialize an object in C++?
However, you generally don't want to load the entire dictionary anyway, especially if it's a user application. If the user is looking up dictionary words, then reading from disk happens so fast that the user will never notice the "delay." If you're servicing hundreds or thousands of requests, then it might make sense to cache the dictionary into memory.
So how many users do you have?
What kind of load are you expecting to have on the application?
WordWeb uses an SQLite database as its backend. It makes sense to me to use a database system to store the content, so it is easy to quickly fetch the content the user is looking for.
WordWeb has word prediction as well, so that becomes a query like
SELECT word FROM words WHERE word LIKE 'ab%';
and, on the other hand, when the user presses Enter for the word,
SELECT meaning FROM words WHERE word = 'abandon';
You do not want to be deserializing the whole content from disk to memory while the user is typing or after they have pressed Enter to search. Since the data is large (a dictionary), deserialization will probably take more time per word search than the user will tolerate.
Alternatively, why not create a JSON file containing all the meanings, as a compact form of the dictionary?
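For example, with Python's built-in sqlite3 module, the two queries above each become one statement, and only the matching rows are read from disk. The table name words and the word/meaning columns follow the queries above and are assumptions about the real schema.

import sqlite3

db = sqlite3.connect("dictionary.db")

def complete(prefix):
    # Word prediction: return words starting with the typed prefix.
    rows = db.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word LIMIT 10",
        (prefix + "%",),
    )
    return [word for (word,) in rows]

def define(word):
    # Exact lookup after the user presses Enter.
    row = db.execute(
        "SELECT meaning FROM words WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None

print(complete("ab"))
print(define("abandon"))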

Effect of a Large CoreData Store on Time Machine

A project I'm working on potentially entails storing large amounts (e.g. ~5 GB) of binary data in Core Data. I'm wondering if this would negatively impact the user's Time Machine backups. From reading the documentation, Core Data's persistent store uses a single file (e.g. XML, SQLite DB, etc.), so it would seem that any time the user changes a piece of data in the store, Time Machine would copy the data store in its entirety to the backup drive.
Does CoreData offer a different datastore format that is more Time Machine friendly?
Or is there a better way to do this?
You can use configurations in your data model to separate the larger entities into a different persistent store. You will need to create the persistent store coordinator yourself, using addPersistentStoreWithType:configuration:URL:options:error: to add each store with the correct configuration.
To answer your question directly, the only thing I can think of is to put your Core Data store in a sparsebundle disk image, so only the changed bands would be backed up by Time Machine. But really, I think if you're trying to store this much data in SQLite/Core Data you'd run into other problems. I'd suggest you try using a disk-based database such as PostgreSQL.

When do we really need a key/value database instead of a key/value cache server?

Most of the time, we just get the result from the database and then save it in a cache server with an expiration time.
When do we need to persist that key/value pair, and what is the significant benefit of doing so?
If you need to persist the data, then you would want a key/value database. In particular, as part of the NoSQL movement, many people have suggested replacing traditional SQL databases with Key/Value pair databases - but ultimately, the choice remains with you which paradigm is a better fit for your application.
Use a key/value database when you are using a key/value cache and you don't need a sql database.
When you use memcached/mysql or similar, you need to write two sets of data access code: one for getting objects from the cache, and another for the database. If the cache is your database, you only need the one method, and it is usually simpler code.
You do lose some functionality by not using SQL, but in a lot of cases you don't need it. Only the worst applications actually leave constraint checking to the database. Ad-hoc queries become impractical at scale. The occasional lost or inconsistent record simply doesn't matter if you are working with tweets rather than financial data. How do you justify the added complexity of using a SQL database?
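To make the "two sets of data access code" point concrete, here is a hedged Python sketch: a plain dict stands in for memcached, SQLite stands in for the SQL database, and a toy class stands in for a persistent key/value store. The names are assumptions; the point is one code path versus two.

import json
import sqlite3

cache = {}  # stand-in for memcached

sql_db = sqlite3.connect(":memory:")
sql_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, data TEXT)")

def get_user_cache_aside(user_id):
    # Path 1: cache-aside. Check the cache, fall back to SQL, refill the cache.
    if user_id in cache:
        return cache[user_id]
    row = sql_db.execute(
        "SELECT data FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    user = json.loads(row[0]) if row else None
    cache[user_id] = user
    return user

class KVStore:
    # Stand-in for a persistent key/value database (e.g. Redis with
    # persistence enabled): it is both the cache and the storage.
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

kv = KVStore()

def get_user_kv(user_id):
    # Path 2: the key/value store is the database, so there is one code path.
    return kv.get(user_id)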
