LevelDB: Iterate keys by insertion order

What's a good strategy for generating auto-incrementing keys in LevelDB? My goal is to be able to iterate over the keys in the order that they were inserted.

There are two approaches:
Use the default comparator, but run the index through a function that converts the key '1' to something like '000000001' and '20' to '000000020'; the zero-padded keys then sort numerically, so LevelDB keeps them in insertion order.
Define a custom comparator that converts the key from a string back to an integer and compares the integers.
With either approach, you also need to persist the counter itself, either as a key-value pair in the LevelDB (current_id ----> integer) or in a separate file accessed via mmap.
Then, in your own Add() function, read the current id from the current_id key, insert the new key-value pair (id ----> value), and increment current_id by one.
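A rough sketch of the first approach in Java, assuming the org.iq80.leveldb binding (the class name, the nine-digit width and the current_id key are illustrative choices, not anything LevelDB prescribes):

import java.io.File;
import java.io.IOException;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.iq80.leveldb.impl.Iq80DBFactory.asString;
import static org.iq80.leveldb.impl.Iq80DBFactory.bytes;
import static org.iq80.leveldb.impl.Iq80DBFactory.factory;

public class InsertionOrderStore {
    private final DB db;
    private long currentId;   // next id to hand out

    public InsertionOrderStore(File path) throws IOException {
        db = factory.open(path, new Options().createIfMissing(true));
        // The counter is persisted as an ordinary key so it survives restarts.
        byte[] stored = db.get(bytes("current_id"));
        currentId = (stored == null) ? 0 : Long.parseLong(asString(stored));
    }

    // Zero-pad so the default lexicographic comparator sorts keys numerically.
    private static String pad(long id) {
        return String.format("%09d", id);
    }

    public void add(String value) {
        db.put(bytes(pad(currentId)), bytes(value));
        currentId++;
        db.put(bytes("current_id"), bytes(Long.toString(currentId)));
    }
}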

Since a LevelDB instance can only be accessed by one application at a time, you might as well use a 64-bit long and increment it in the application. When opening the DB (and before allowing any writes), you can find the last inserted key with the iterator's SeekToLast() method.

As I just pointed out in a question on integer keys, if you want to use binary integers you need to create a custom Comparator for the database, otherwise you don't get them in ascending binary order. It's not hard but you may have overlooked the need.
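As an illustration, with the org.iq80.leveldb Java binding (an assumption about the binding's comparator interface; the native API takes a leveldb::Comparator subclass instead), a comparator for fixed 8-byte keys could look roughly like this:

import java.nio.ByteBuffer;
import org.iq80.leveldb.DBComparator;

// Orders fixed 8-byte keys by their long value instead of raw byte order.
// The separator/successor hooks just return their input, which is legal but
// gives up a key-compression optimization.
class LongKeyComparator implements DBComparator {
    public String name() { return "long-key-comparator.v1"; }

    public int compare(byte[] a, byte[] b) {
        return Long.compare(ByteBuffer.wrap(a).getLong(), ByteBuffer.wrap(b).getLong());
    }

    public byte[] findShortestSeparator(byte[] start, byte[] limit) { return start; }

    public byte[] findShortSuccessor(byte[] key) { return key; }
}

// Hypothetical usage when opening the database:
// new Options().createIfMissing(true).comparator(new LongKeyComparator());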
I'm not quite sure what you're asking. If the only data you are adding is keys which are supposed to record an entry as a log then yes, just use an integer key.
However, if you are inserting keys you are going to search for some other reason PLUS you want to later iterate them in insertion order, it gets a bit more complex.
Basically you want to insert two keys for each key value, using a prefix to determine whether keys are "value keys" or "ordering keys". e.g., say you have Frank, John, Sally and Amy as keys and use prefix ~N for Name keys and ~I for Iterator keys.
The database would look like the following. Note that the "Iterator keys" don't need a value associated with them, since the name can be read straight out of the key. I've shown it as if you used a string of two digits for the number, rather than using an integer value and needing a special Comparator.
~I00Frank
~I01John
~I02Sally
~I03Amy
~NAmy => Amy's details
~NFrank => frank's details
~NJohn => John's details
~NSally => Sally's details
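A small sketch of how an Add() routine could write both keys, again assuming the org.iq80.leveldb Java binding; the prefixes and the two-digit width mirror the example above and are purely illustrative:

import org.iq80.leveldb.DB;
import static org.iq80.leveldb.impl.Iq80DBFactory.bytes;

class PrefixedStore {
    private final DB db;
    private int nextSeq = 0;   // in-memory insertion counter

    PrefixedStore(DB db) { this.db = db; }

    void add(String name, String details) {
        // Ordering key: carries the sequence number plus the name; no value needed.
        db.put(bytes(String.format("~I%02d%s", nextSeq, name)), bytes(""));
        // Value key: lookup by name.
        db.put(bytes("~N" + name), bytes(details));
        nextSeq++;
    }
}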

Related

Redis: Get all keys by providing one of the values in the values list

In Redis I'm planning to store the key as a unique string and the value as a list.
I have a use case where I need to do 2 things.
First, I need to get all the values associated with a key by providing the key as input.
Second, I want to get all the keys associated with a value by providing one of the values in the list.
The second part is where I need advice: how can we achieve this?
I cannot get all the keys or key value pair and loop through because I will have millions of entries in Redis.
As mentioned in the comment above, retrieving all keys with their associated values will likely cause performance problems at times, since it means running through a very large number of entries. As also suggested in the official documentation on retrieving data, you can try the following Redis commands to fetch values and see whether they meet your purpose.
GET
MGET
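For example, with the Jedis client (an assumption; any Redis client exposes the same commands, and this assumes the keys hold plain string values):

import java.util.List;
import redis.clients.jedis.Jedis;

public class RedisGetExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String one = jedis.get("user:1");                     // value for a single key
            List<String> many = jedis.mget("user:1", "user:2");   // several keys in one round trip
            System.out.println(one + " " + many);
        }
    }
}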

hash table index design

I want to use a hash table to store words.
For example, I have two words, aba and aab. Because they are made up of the same letters, just in a different order, I want to store them at the same index and chain them in a linked list there, which makes a certain kind of search easy. The words are built only from the 26 letters. How do I design a proper index for the hash table, and how should I organize the table?
So the question you want to answer with your hash table is: which words can be built with the letters I have?
I assume you are reading some dictionary and want to put all the values into the hash table. Then you could use, as the key, an int array counting how many times each letter occurs (e.g. 'a' would be index 0 and 'z' index 25), and for the value you would have to use a list, so that you can add more than one word to that entry.
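One caveat with that in Java: a raw int[] doesn't work as a HashMap key, because arrays compare by identity, so you'd turn the counts into something with value equality first. A sketch (the method name is just illustrative):

import java.util.Arrays;

// Build a letter-count key with value semantics; e.g. "aba" and "aab"
// both produce the same string of 26 counts.
static String countKey(String word) {
    int[] counts = new int[26];
    for (char c : word.toLowerCase().toCharArray()) {
        if (c >= 'a' && c <= 'z') {
            counts[c - 'a']++;
        }
    }
    return Arrays.toString(counts);
}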
But the simplest solution is probably just to use the sorted word as the key (e.g. 'aba' gets key 'aab', and 'aab' obviously does too). Because words are not very long, the sort isn't expensive (avoid creating new strings all the time by working with the character array).
So in Java you could get the key like this:
char[] key = word.toCharArray();
Arrays.sort(key);
// and if you want a string
String myKey = new String(key);
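Building on that, grouping the words of a dictionary under their sorted key could look like this (computeIfAbsent is a standard Map method; the variable names are illustrative):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Map<String, List<String>> index = new HashMap<>();
for (String word : dictionary) {   // dictionary: your collection of words
    char[] key = word.toCharArray();
    Arrays.sort(key);
    index.computeIfAbsent(new String(key), k -> new ArrayList<>()).add(word);
}
// index.get("aab") now holds every word made of the letters a, a, b.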

Alternative to ORA_HASH?

We are working with a table in a 3rd party database that does not have a primary key but does have a unique index.
I have therefore been looking at using the ORA_HASH function to produce a de facto unique Id by passing in the values of the columns in the unique index.
Unfortunately, I can already see that we have a few collisions, which means that we can't derive a unique id using this method.
Is there an alternative to ORA_HASH that would provide a unique id for a unique input?
I suppose I could generate an Id using DBMS_CRYPTO.Hash but I'd ideally like to get a numeric value.
Edit
The added complication is that I then need to store these records in another (SQL Server) database and then compare the records from the original and the replica tables. So rank doesn't help me here since records can be added or deleted in the original table.
DBMS_CRYPTO.HASH could be used to generate a hash with a high enough bit length to give you a very low, but not zero, chance of collisions, but it returns RAW, not NUMBER.
To guarantee no collisions ever, you need a one-to-one hash function. As far as I know, Oracle does not provide one.
A practical approach would be to create a new table to map unique keys to a newly generated primary key. E.g., unique value ("ABC",123, 888) maps to 838491 (where you generated 838491 using a sequence).
You'd have to update the mapping table periodically, to account for inserted rows, and that would be a pain, but it would let you generate your own PKs and keep track of them without a lot of complication.
Have you tried:
DBMS_UTILITY.GET_HASH_VALUE (
name VARCHAR2,
base NUMBER,
hash_size NUMBER)
RETURN NUMBER;

How does RethinkDB generate auto ids?

I'm writing a script which is supposed to merge some data from a SQL-based DB. Each row has a long integer as a primary key (incremental). I was thinking about hashing these ids so that they'll somehow 'look' like the other ids already in my RethinkDB table. What I'm trying to achieve here is to avoid duplicates in case the same data is merged again, but keeping the original integers as ids alongside the generated ids of the data saved directly to RethinkDB's table feels weird.
Can I do that?
How does RethinkDB generate auto ids anyways?
And am I approaching this correctly..?
RethinkDB uses a string encoding of 128-bit UUIDs (essentially hashed integers).
The string format looks like this: "HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH" where every 'H' is a hexadecimal digit of the 128 bit integer. The characters 0-9 and a-f (lower case) are used.
If you want to generate such UUIDs from an existing integer, I recommend hashing the integer first. This will give you an even distribution over the whole key space (this makes sharding easier and avoids hotspots).
As a second step you have to format the hash value in a string of the format shown above. If you don't have enough digits, it's fine to leave some of the last 'H' as constant 0.
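A sketch in Java of turning an existing integer id into a string of that shape, hashing with MD5 here simply because it yields 128 bits (the choice of hash is an assumption, not what RethinkDB itself uses):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hash the id and format the 16 digest bytes as
// "HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH" in lower-case hex.
static String idToUuidString(long id) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");   // 128-bit digest
    byte[] digest = md.digest(Long.toString(id).getBytes(StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder(36);
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            sb.append('-');
        }
        sb.append(String.format("%02x", digest[i] & 0xff));
    }
    return sb.toString();
}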
If you really want to go into the details of UUID generation, here are two links for further reading:
RFC 4122 "A Universally Unique IDentifier (UUID) URN Namespace" https://www.rfc-editor.org/rfc/rfc4122
RethinkDB's implementation of UUID generation and formatting https://github.com/rethinkdb/rethinkdb/blob/next/src/containers/uuid.cc

Algorithm and data structure to store First name and last name

Is there an efficient way to store first and last names in a data structure so that we can look up entries by either first or last name? I would consider a binary search tree keyed on first name; that makes searching by first name efficient, but not searching by last name. We could also maintain a second BST keyed on last name. Any ideas for implementing this efficiently?
What if the question is
String names[] = { "A B","C D"};
A requirement is to be able to extend this directory dynamically at runtime,
without persistent storage. The directory can eventually grow to hundreds or
thousands of names and must be searchable by first or last name.
Now suppose we can't use hash tables for storage. Any ideas?
Two hash tables: one from first name to person, and one from last name to person.
Simple is best.
Why not put both first and last names in a trie?
As a bonus, this way you can even get suggestions on partial names by traversing all leaves after current node (maybe on an asynchronous call)
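A minimal trie sketch in Java, inserting each full name under both its first and its last name and collecting matches for a (possibly partial) prefix; the names and class names are illustrative:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NameTrie {
    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> fullNames = new ArrayList<>();   // names whose key ends here
    }

    private final Node root = new Node();

    // Insert the same full name twice: once keyed by first name, once by last name.
    void insert(String key, String fullName) {
        Node node = root;
        for (char c : key.toLowerCase().toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.fullNames.add(fullName);
    }

    // Return every full name stored at or below the node reached by the prefix.
    List<String> startingWith(String prefix) {
        Node node = root;
        for (char c : prefix.toLowerCase().toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return Collections.emptyList();
            }
        }
        List<String> result = new ArrayList<>();
        collect(node, result);
        return result;
    }

    private void collect(Node node, List<String> out) {
        out.addAll(node.fullNames);
        for (Node child : node.children.values()) {
            collect(child, out);
        }
    }
}

// Usage: trie.insert("frank", "Frank Smith"); trie.insert("smith", "Frank Smith");
//        trie.startingWith("fr") then returns ["Frank Smith"].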
Your idea is pretty good, but here's another option: how about implementing two hash tables?
The first hash table would use first names as a key, and the associated value would either be the last name or a pointer to a Name object. The second hash table would use last names as keys, with the first names or pointers to Name as the values.
Personally, for choosing the values, I would go for a pointer to a Name object, since this method would be more applicable in case you'd like to store even more information (e.g. date of birth, etc.).
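A sketch of that design in Java, with both maps pointing at the same Name object (the class and field names are illustrative; a real directory would map to a list of Name objects if duplicate first or last names are possible):

import java.util.HashMap;
import java.util.Map;

class Name {
    final String first;
    final String last;
    // Further fields (date of birth, ...) would live here, stored only once.
    Name(String first, String last) { this.first = first; this.last = last; }
}

class NameDirectory {
    private final Map<String, Name> byFirst = new HashMap<>();
    private final Map<String, Name> byLast = new HashMap<>();

    void add(Name n) {
        byFirst.put(n.first, n);
        byLast.put(n.last, n);
    }

    Name findByFirst(String first) { return byFirst.get(first); }
    Name findByLast(String last) { return byLast.get(last); }
}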
Also, see Does Java have a HashMap with reverse lookup?…, which is specific to Java but the discussion on the data structures is relevant to any language.
Note that structures such as Bidirectional Sorted Maps also allow range searches (which dual hash tables don't).
If you need to search only by first name or only by last name then yes, two hash maps are best (and notice you're not duplicating the data, you're partitioning it). But if you don't mind, put both first and last names in a single hash map and don't differentiate between the two.
