I need to develop a cache system for tree-based key-value data (very similar to the Windows Registry Editor).
In that cache, keys are strings that represent the path in the tree to the value, which can be a primitive type (int, string, bool, double, etc.) or a subtree itself.
For example:
key = root\x\y\z\w , value = the whole subtree under w
key = root\x\y\z\w\t , value = integer
I thought about using Redis as a simple cache implementation, but a naive key-value store would miss the point of the tree hierarchy.
In addition, with this naive approach, suppose I already have in the cache
key = root\x\y, value = the whole subtree under y
and I am looking for
key = root\x\y\z
The naive key-value store won't find it, although it already exists in the cache.
The best data structure I can think of is a prefix tree (trie), which can handle the keys more efficiently and can easily find the sub-path cases I mentioned above.
I could not find any Redis implementation that can handle this data structure yet.
Can Redis handle this kind of cache? If not, is there an alternative structure to use?
Redis doesn't do trees (yet). If you must have a tree-like structure stored in Redis, I recommend you look at http://rejson.io.
Alternatively, you can develop a Redis module that does anything you want/need.
Disclaimer: I am one of ReJSON's authors.
xref: https://groups.google.com/d/msg/redis-db/ROSocq9sQ34/NmxeF0QFAQAJ
Try this Redis tree module. It provides a native polytree data structure in Redis.
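The lookup the question asks for (finding root\x\y\z when only root\x\y is cached) can be sketched as a small in-process trie keyed by path segments. This is only an illustration of the idea, not a Redis feature: the class name PathCache, the backslash separator, and the choice to represent a subtree as nested Python dicts are all assumptions made for the example.

```python
_MISSING = object()  # marks trie nodes that hold no cached value


class _Node:
    def __init__(self):
        self.children = {}
        self.value = _MISSING


class PathCache:
    """Hypothetical cache: a trie over path segments such as root\\x\\y."""

    def __init__(self, sep="\\"):
        self.sep = sep
        self.root = _Node()

    def put(self, path, value):
        node = self.root
        for part in path.split(self.sep):
            node = node.children.setdefault(part, _Node())
        node.value = value

    def get(self, path):
        parts = path.split(self.sep)
        node = self.root
        best = None  # (value, segments consumed) of the deepest cached ancestor
        for i, part in enumerate(parts):
            node = node.children.get(part)
            if node is None:
                break
            if node.value is not _MISSING:
                best = (node.value, i + 1)
        if best is None:
            raise KeyError(path)
        value, used = best
        # Walk the remaining segments inside the cached subtree (nested dicts).
        for part in parts[used:]:
            if not isinstance(value, dict) or part not in value:
                raise KeyError(path)
            value = value[part]
        return value


cache = PathCache()
cache.put(r"root\x\y", {"z": {"w": {"t": 7}}})
subtree = cache.get(r"root\x\y\z")      # served from the cached parent
leaf = cache.get(r"root\x\y\z\w\t")
```

A real cache would also need invalidation, since a cached ancestor subtree can go stale when a deeper key is updated.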
I was reading an answer to a question asked here:
Why does hashcode() returns an integer and not long?
My question is: why do hashcode-based data structures use an array to create bins?
Because an array is a low-level data structure that allows random access to its elements.
You need a "low-level" data structure to base a "higher-level" data structure on.
You need random access so that you can address bins very fast.
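As a rough sketch of that point, the table below keeps its bins in a plain Python list and turns the integer hash code into a list index, so reaching the right bin is a single indexed access. The class name ArrayHashTable and the fixed bin count are assumptions for illustration; real implementations also resize the array as it fills.

```python
class ArrayHashTable:
    """Toy hash table: an array (list) of bins, each bin a list of (key, value)."""

    def __init__(self, n_bins=8):
        self.bins = [[] for _ in range(n_bins)]
        self.size = 0

    def _index(self, key):
        # The integer hash code is reduced to a valid array index.
        return hash(key) % len(self.bins)

    def put(self, key, value):
        bucket = self.bins[self._index(key)]  # random access to the bin
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)      # overwrite existing key
                return
        bucket.append((key, value))
        self.size += 1

    def get(self, key):
        for k, v in self.bins[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```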
Because an array uses integer-based indexes. You might be curious why arrays use integer indexing. One way to see it: if you could use other types (real numbers, say) instead of integers, think how many dimensions you would be able to add.
For example:
under index 1 you could add sub-indexes such as 1.1, 1.2, 1.1.2, 1.1.1.1.2, and so on.
Doing so would create more overhead rather than giving us the solution we want.
I wonder what is the best way to store a huge number of strings and check for duplicates.
We have to think about our priorities:
duplicate check speed
insertion time for new strings
storage space on hard disk
random access time
What is the best solution when our targets are fast duplicate checking and fast insertion of new strings (random access time and storage space do not matter)?
I am thinking about an SQL database, but which DB is best for this?
If we use an SQL DB such as MySQL, which storage engine would be best? (Of course, we have to exclude the MEMORY engine because of the amount of data.)
Use a hash function on the input string. The output hash would be the primary key/id of the record.
Then you can check if the DB has this hash/id/primary key:
If it doesn't: this is a new string; you add a new record including the string and hash as id.
If it does: check that the string from the loaded record is the same as the input string.
If the string is the same: it is a duplicate.
If the string is different: this is a collision. Use a collision resolution scheme to resolve it. (A couple of examples are below.)
You will have to consider which hash function/scheme/strength to use based on speed and expected number of strings and hash collision requirements/guarantees.
A couple of ways to resolve collisions:
Use a 2nd hash function to come up with a new hash in the same table.
Mark the record (e.g. with NULL) and repeat with a stronger 2nd hash function (with wider domain) on a secondary "collision" table. On query, if the string is marked as collided (e.g. NULL) then do the lookup again in the collision table. You might also want to use dynamic perfect hashing to ensure that this second table does not have further collisions.
Of course, depending on how persistent this needs to be and how much memory you are expecting to take up/number of strings, you could actually do this without a database, directly in memory which would be a lot faster.
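A minimal sketch of the hash-then-compare check, using Python's built-in sqlite3 module as a stand-in for whichever database is chosen. Note the swapped-in collision handling: instead of the rehashing schemes listed above, this version indexes the hash without making it unique and keeps several rows per hash (chaining), comparing the full string on every hit. The table name, the column names, the 8-byte truncated SHA-256 digest, and the function names are all assumptions for illustration.

```python
import hashlib
import sqlite3


def make_store(path=":memory:"):
    """Open the store; the hash column is indexed but not unique."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strings (h BLOB NOT NULL, s TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_strings_h ON strings (h)")
    return conn


def digest(s, nbytes=8):
    # Truncated SHA-256; width is a speed/collision trade-off (assumed value).
    return hashlib.sha256(s.encode("utf-8")).digest()[:nbytes]


def add_if_new(conn, s, hash_fn=digest):
    """Insert s and return True, or return False if s is already stored."""
    h = hash_fn(s)
    rows = conn.execute("SELECT s FROM strings WHERE h = ?", (h,)).fetchall()
    if any(row[0] == s for row in rows):
        return False  # same hash and same string: a duplicate
    # Either a new hash or a collision; both are stored as a new row.
    conn.execute("INSERT INTO strings (h, s) VALUES (?, ?)", (h, s))
    return True
```

Passing a deliberately weak hash_fn is an easy way to exercise the collision path.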
You may want to consider a NoSQL solution:
Redis. Some of the use cases solved using Redis:
http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html
http://dr-josiah.blogspot.com/2011/02/some-redis-use-cases.html
(Josiah L. Carlson is the author of Redis in Action)
http://www.paperplanes.de/2010/2/16/a_collection_of_redis_use_cases.html
memcached. Some comparisons between memcached and Redis:
http://www.quora.com/What-are-the-differences-between-memcached-and-redis
Is memcached a dinosaur in comparison to Redis?
http://coder.cl/2011/06/concurrency-in-redis-and-memcache/
Membase/Couchbase, which counts OMGPOP's Draw Something as one of its success stories. Comparison between Redis and Membase:
What is the major difference between Redis and Membase?
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Some questions:
how large is the set of strings?
will the application be read heavy or write heavy? or both?
how often would you like data to be persisted to disk?
is there an N most recent strings requirement?
Hope this helps.
Generate suffix trees to store the strings. Ukkonen's algorithm, as in http://www.daimi.au.dk/~mailund/slides/Ukkonen-2005.pdf, will give some insight into how to create a suffix tree. There are a number of ways to store this suffix tree, but once it is generated, the lookup time is very low.
Suppose you want to write a program that implements a simple phone book. Given a particular name, you want to be able to retrieve that person's phone number as quickly as possible. What data structure would you use to store the phone book, and why?
The text below answers your question.
In computer science, a hash table or hash map is a data structure that
uses a hash function to map identifying values, known as keys (e.g., a
person's name), to their associated values (e.g., their telephone
number). Thus, a hash table implements an associative array. The hash
function is used to transform the key into the index (the hash) of an
array element (the slot or bucket) where the corresponding value is to
be sought.
The text is from the Wikipedia article on hash tables.
There are some further topics, such as collisions and hash functions; check the wiki page for details.
I respect & love hashtables :) but even a balanced binary tree would be fine for your phone book application, giving you logarithmic complexity in the worst case and sparing you from needing good hash functions, handling collisions, etc., which makes it more suitable for huge amounts of data.
When I talk about huge data, what I mean is something related to storage. Every time you fill all of the buckets in a hash table you will need to allocate new storage and re-hash everything. This can be avoided if you know the size of the data ahead of time. Balanced trees won't run into these problems. The domain needs to be considered too when designing data structures; for example, on small devices storage matters a lot.
I was wondering why tries didn't come up in any of the answers.
Tries are suitable for phone-book-style data.
They also save space compared to a hash table at (almost) the same retrieval cost, assuming a constant-size alphabet and constant-length names.
Tries also facilitate the prefix matches that are sometimes required while searching.
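To illustrate the prefix-match point, here is a small character trie built from nested dicts. It is a sketch under assumptions: the class name PhoneTrie, its method names, and the use of an empty-string key to mark the end of a name are all made up for the example.

```python
_END = ""  # sentinel key; cannot clash with single-character child keys


class PhoneTrie:
    """Toy phone book: one trie level per character of the name."""

    def __init__(self):
        self.root = {}

    def add(self, name, number):
        node = self.root
        for ch in name:
            node = node.setdefault(ch, {})
        node[_END] = number

    def _walk(self, text):
        node = self.root
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, name):
        node = self._walk(name)
        if node is None or _END not in node:
            raise KeyError(name)
        return node[_END]

    def with_prefix(self, prefix):
        """Return sorted (name, number) pairs for names starting with prefix."""
        results = []
        node = self._walk(prefix)
        if node is not None:
            self._collect(node, prefix, results)
        return sorted(results)

    def _collect(self, node, name, out):
        for key, child in node.items():
            if key == _END:
                out.append((name, child))
            else:
                self._collect(child, name + key, out)
```

An exact lookup costs one step per character of the name, and with_prefix returns every entry below the prefix node, which a plain hash table cannot do without scanning all keys.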
A dictionary is both dynamic and fast.
You want a dictionary, where you use the name as the key, and the number as the data stored. Check this out: http://en.wikipedia.org/wiki/Dictionary_%28data_structure%29
Why not use a singly linked list? Each node will have the name, number and link information.
One drawback is that your search might take some time since you'll have to traverse the entire list from link to link. You might order the list at the time of node insertion itself!
PS: To make the search a tad bit faster, maintain a link to the middle of the list. Search can continue to the left or right of the list based on the value of the "name" field at this node. Note that this requires a doubly linked list.
I'm trying to translate an idea I had from OOP concepts to FP concepts, but I'm not quite sure how to best go about it. I want to have multiple collections of records, but have individual records linked across the collections. In C# I would probably use multiple Dictionary objects with an Entity-specific ID as a common key, so that given any set of the dictionaries, a method could extract a particular Entity using its ID/Name.
I guess I could do the same thing in F#, owing to its hybrid nature, but I'd prefer to be more purely functional. What is the best structure to do what I'm talking about here?
I had considered maybe a trie or a patricia trie, but I shouldn't need very deep name searching, and I'm more likely to have one or two of some things and lots of other things. It's a game design idea, so, for example, you'd only have one "Player" but could have tons of "Enemy1", "Enemy2" etc.
Is there a really good data structure for fast keyed lookup in FP, or should I just stick to Dictionary/Hashmaps?
A usual functional data structure for representing dictionaries that's available in F# is Map (as pointed out by larsmans). Under the cover, this is implemented as a balanced binary tree, so the complexity of lookup is O(log N) for a tree containing N elements. This is slower than a hash-based dictionary (which has O(1) lookup for good hash keys), but it allows adding and removing elements without copying the whole collection - only a part of the tree needs to be changed.
From your description, I have the impression that you'll be creating the data structure only once and then using it for a long time without modifying it. In this case, you could implement a simple immutable wrapper type that uses Dictionary<_, _> under the cover, but takes all elements as a sequence in the constructor and doesn't allow modifications:
type ImmutableMap<'K, 'V when 'K : equality>(data:seq<'K * 'V>) =
  // Store data passed in constructor in hash-based dictionary
  let dict = new System.Collections.Generic.Dictionary<_, _>()
  do for k, v in data do dict.Add(k, v)
  // Provide read-only access
  member x.Item with get(k) = dict.[k]

let f = new ImmutableMap<_, _>([1, "Hello"; 2, "Ahoj"])
let str = f.[1]
This should be faster than using F# Map as long as you don't need to modify the collection (or, more precisely, create copies with elements added/removed).
Use the F# module Collections.Map. My bet is that it implements balanced binary search trees, the data structure of choice for this task in functional programming.
Tries are hard to program and mostly useful in specialized applications such as search engine indexing, where they are commonly used as a secondary store on top of an array/database/etc. Don't use them unless you know you need to.
What Data Structure could I use to find the Phone number of a person given the person's name?
Assuming you will only ever query using the person's name, the best option is to use an associative data structure. This is basically a data structure, usually implemented as a hashtable or a balanced binary search tree, that stores data as key=>value (or, stated in another way, as (key,value) pairs). You query the data structure by using the key and it returns the corresponding value. In your case, the key would be the name of the person and the value would be the phone number.
Rather than implementing a hashtable or a binary search tree for this yourself, check to see if your language has something like this already in its library; most languages these days do. Python has dict, Perl has hashes, Java has Map, C# has Dictionary, and C++ has the STL map.
Things can get a little trickier if you have several values for the same key (e.g. the same person having multiple phone numbers), but there are workarounds like using a list/vector as the value, or using a slightly different structure that supports multiple values for the same key (e.g. STL multimap). But you probably don't need to worry about that anyway.
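The list-as-value workaround can be sketched with Python's collections.defaultdict. The helper names add_number and numbers_for and the sample phone numbers are hypothetical, chosen only for the example.

```python
from collections import defaultdict

# name -> list of phone numbers
phone_book = defaultdict(list)


def add_number(book, name, number):
    """Attach number to name, skipping exact repeats."""
    if number not in book[name]:
        book[name].append(number)


def numbers_for(book, name):
    """Return a copy of the numbers for name (empty list if unknown)."""
    return list(book.get(name, []))


add_number(phone_book, "Alice", "555-0100")
add_number(phone_book, "Alice", "555-0101")
add_number(phone_book, "Bob", "555-0199")
```

Using book.get in numbers_for avoids creating an empty entry when an unknown name is queried.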
An associative array, such as a hashtable.
Really, anything that maps keys to values. The specific data structure will depend on the language you are using (unless you want to implement your own, in which case you have free rein).