I'm creating a dictionary app for WP.
One of its features is real-time lookup. While the user is typing a word, the dictionary automatically finds recommended results that are close to what the user has typed.
For example, when the user types "bility", the dictionary must find "reusability" and "ability" as recommended results.
My question is: which data structure fits my need?
A hashtable or a tree structure does not work in this case: a hashtable can only perform the lookup once the word is fully typed, and a tree structure can only find words that start with the input, i.e. "userinput*". (Suppose the user types "n": a dictionary using a tree structure can find "nice" or "night", but not "and" or "ten".)
Sample input : sample recommended results
"bility" => "reusability"; "responsibility"...
"n" => "and"; "ten"; "nice"; "night"
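A suffix trie can do this: it indexes every suffix of a word, so it can find the words that contain the typed text anywhere inside them (it is the same structure used for longest-common-substring searches).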
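As a rough sketch of the suffix trie idea, assuming the word list fits in memory: every suffix of every word is inserted into a trie, so walking the trie with the typed text reaches all words that contain it. The names `SuffixIndex`, `add`, and `lookup` are made up for this illustration and do not come from any library.

```python
class SuffixIndex:
    """Naive suffix trie: every suffix of every word is inserted."""

    def __init__(self):
        # Each node is a dict: child nodes live under single-character keys,
        # and the set of words passing through the node lives under None.
        self.root = {None: set()}

    def add(self, word):
        for start in range(len(word)):
            node = self.root
            for ch in word[start:]:
                node = node.setdefault(ch, {None: set()})
                node[None].add(word)

    def lookup(self, typed):
        """Return all indexed words that contain `typed` as a substring."""
        node = self.root
        for ch in typed:
            if ch not in node:
                return set()
            node = node[ch]
        return set(node[None])


index = SuffixIndex()
for w in ["reusability", "ability", "and", "ten", "nice", "night"]:
    index.add(w)

print(sorted(index.lookup("bility")))  # ['ability', 'reusability']
print(sorted(index.lookup("n")))       # ['and', 'nice', 'night', 'ten']
```

This naive version uses memory roughly quadratic in word length; a compressed suffix tree or suffix array would be more compact for a large dictionary.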
I have the following data structure (a dictionary of lists, with unique keys) and a sentence like the one below it:
{'cat': ['feline', 'kitten'], 'brave': ['courageous', 'fearless'], 'little': ['mini']}
A courageous feline was recently spotted in the neighborhood protecting her mini kitten
How would I efficiently process this text to convert each synonym to its key word (for example, the synonyms of "cat" to "cat"), so that the output looks like this:
A fearless cat was recently spotted in the neighborhood protecting her little cat
I want an algorithm that processes the initial text and converts each synonym into its root word (its key in the dictionary). The lists of keywords and synonyms will also grow over time.
So, first, I want to ask whether the data structure I am using can do this efficiently, and whether there is a more efficient structure.
For now, I can only think of looping through each list in the dictionary, searching for the synonyms, and then mapping them back to their keyword.
edit: Refined the question
Your dictionary is organised in the wrong way. It will allow you to quickly find a target word, but that is not helpful when you have an input that does not have the target word, but some synonyms of it.
So organise your dictionary in the opposite sense:
d = {
'feline': 'cat',
'kitten': 'cat'
}
To make the replacements, you could create a regular expression and call re.sub with a callback function that will look up the translation of the found word:
import re
regex = re.compile(rf"\b(?:{ '|'.join(map(re.escape, d)) })\b")
s = "A feline was recently spotted in the neighborhood protecting her little kitten"
print(regex.sub(lambda match: d[match[0]], s))
The regular expression makes sure that the match is with a complete word, and not with a substring -- "cafeline" as input will not give a match for "feline".
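If the data already exists in the question's layout (root word mapped to a list of synonyms), the inverted dictionary does not have to be typed by hand. A minimal sketch, where the variable name `synonyms` is only an illustration:

```python
synonyms = {
    "cat": ["feline", "kitten"],
    "brave": ["courageous", "fearless"],
    "little": ["mini"],
}

# Map every synonym back to its root word (the key it was listed under).
d = {syn: root for root, syns in synonyms.items() for syn in syns}

print(d["kitten"])    # cat
print(d["fearless"])  # brave
```

This assumes each synonym appears under only one root word; if a synonym were listed twice, the later root would silently win.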
My question is theoretical.
I'm trying to design a MapReduce example for big data processing.
My case requires a pair of keys to be mapped to a pair of values.
For example, given the text below:
"Bachelors in Engineering has experience of 5 years"
I am trying to count the words Engineering and Experience in such a way that I get a separate value for each word.
So my key would be (Engineering,Experience) and my value would be (1,1) for the example text above.
Note that in my homework there is a relationship between the two keys, so I want both in a single key-value pair to determine whether both keys are mentioned in a text file, only one is, or neither is.
Please let me know whether this is possible in MapReduce.
Having a string key of "(Engineering,Experience)" is no different than just having a String of one of those words.
If you want a more custom type, you will want to implement the Writable interface, and WritableComparable if the type is used as a key.
Similarly, for the value, you could put the entire tuple in a Text and parse it later, or you can create your own Writable implementation that stores two integers.
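To illustrate the tuple-valued idea without Hadoop itself, here is a plain-Python simulation of the map and reduce steps. It does not use the Hadoop API; `map_document` and `reduce_counts` are hypothetical names, the case-insensitive matching is an assumption, and the second document is made up for the example.

```python
KEYWORDS = ("engineering", "experience")

def map_document(text):
    """Emit one (key, value) pair: the keyword pair and a per-keyword count tuple."""
    words = [w.strip('.,"').lower() for w in text.split()]
    counts = tuple(words.count(k) for k in KEYWORDS)
    return KEYWORDS, counts

def reduce_counts(values):
    """Sum the count tuples element-wise."""
    return tuple(sum(column) for column in zip(*values))

docs = [
    "Bachelors in Engineering has experience of 5 years",
    "Masters in Engineering",  # made-up second document
]
pairs = [map_document(doc) for doc in docs]
total = reduce_counts(value for _, value in pairs)

print(pairs[0])  # (('engineering', 'experience'), (1, 1))
print(total)     # (2, 1)
```

A zero in either position of a document's value tuple tells you which of the two words was missing from that document.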
Thanks for the Answer, but I figured I could use "Engineering Experience" as a string for the key.
I'm a bit new to Redis, so please forgive if this is basic.
I'm working on an app that sends automatic replies to users for certain events. I would like to use Redis to store who has received what event.
Essentially, in Ruby, the data structure could look like this: a map from users to events and the dates on which each event was sent.
{
"mary#example.com" => {
"sent_comment_reply" => ["12/12/2014", "3/6/2015"],
"added_post_reply" => ["1/4/2006", "7/1/2016"]
}
}
What is the best way to represent this in a Redis data structure so that you can ask: did Mary get a sent_comment_reply, and if so, when was the latest?
In short, the question is how (if possible) you can have a hash that holds an array in Redis.
The rationale for this, as opposed to using a set or list with a compound key, is that hashes have O(1) lookup time, whereas lookups on lists (LRANGE) and sets (SMEMBERS) are O(S+N) and O(N), respectively.
One way of structuring it in Redis, assuming you know the user's events and want the latest to be fresh in memory:
A sorted set per user. The members of the sorted set are the event codes (sent_comment_reply, added_post_reply), scored so that the latest event has the highest score. You can use ZRANK (which returns nil when the member is absent) to answer the question:
Did Mary get a sent_comment_reply?
A hash for the user as well. This time the field is the event (sent_comment_reply) and the value is its content, which should be updated with the latest data, including the body, date, etc. This answers the question:
and if so, when was the latest?
Note: Sorted sets are really fast, and in this example we rely on the event codes themselves as the stored members.
With sorted sets you can add, remove, or update elements in a very
fast way (in a time proportional to the logarithm of the number of
elements). Since elements are taken in order and not ordered
afterwards, you can also get ranges by score or by rank (position) in
a very fast way. Accessing the middle of a sorted set is also very
fast, so you can use Sorted Sets as a smart list of non repeating
elements where you can quickly access everything you need: elements in
order, fast existence test, fast access to elements in the middle!
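A simplified variation on the above, using only the sorted set: if the score is the send time, ZSCORE answers both questions at once. This is a sketch, not the exact layout described in the answer. It assumes the redis-py client, Unix timestamps as scores, and a made-up key scheme `events:<user>`; the function names are hypothetical.

```python
def record_event(r, user, event, timestamp):
    """Remember that `event` was sent to `user` at `timestamp`."""
    # ZADD overwrites the score of an existing member, so the score holds
    # the time of the most recently recorded send.
    r.zadd(f"events:{user}", {event: timestamp})


def latest_event(r, user, event):
    """Return the recorded timestamp, or None if the event was never sent."""
    # ZSCORE returns None (nil) when the member is absent.
    return r.zscore(f"events:{user}", event)


# Usage, assuming a reachable Redis server and the redis-py package:
#   import redis
#   r = redis.Redis(decode_responses=True)
#   record_event(r, "mary", "sent_comment_reply", 1425600000)
#   latest_event(r, "mary", "sent_comment_reply")
```

Only the latest date per event is kept this way; keeping the full history of dates would need an extra structure, such as one list or sorted set per user and event.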
A possible approach to use a hash to map an array is as follows:
add_element(key , value):
len := redis.hlen(key)
redis.hset(key , len , value)
This maps the element array[i] to field i of the hash stored at key.
This will work in some cases, but I would probably go with the answer suggested in https://stackoverflow.com/a/34886801/2868839
Design a data structure for a telephone directory that stores names and phone numbers, so that we can search for a number given a name and vice versa.
We can use 2 hash maps as follows:
Map<String,int>
Map<int,String>
But this requires twice the memory. Can anyone suggest another solution?
One person can have more than one number, and one number can belong to more than one person (members of a family). And as Nick said, a telephone number can, in the general case, contain non-numeric characters. All things considered, instead of Map<String,int> you might use Map<String,List<String>>, or, to avoid redundancy, store only pointers to strings (in C++ terms): Map<String*,List<String*>>.
A bimap (or "bidirectional map") is a
map that preserves the uniqueness of
its values as well as that of its
keys. This constraint enables bimaps
to support an "inverse view", which is
another bimap containing the same
entries as this bimap but with
reversed keys and values.
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/BiMap.html
BiMap<String, Integer> biMap = HashBiMap.create();
biMap.put("Mike", 123);
biMap.put("John", 456);
biMap.put("Steven", 789);
BiMap<Integer, String> invertedBiMap = biMap.inverse();
Edit: Multimaps
Multimap<String, String> multimap = HashMultimap.create();
multimap.put("John", "111111111");
multimap.put("John", "222222222");
multimap.put("Mike", "333333333");
System.out.println(multimap.get("John")); //[222222222, 111111111]
for(Map.Entry<String, String> entry : multimap.entries()){
if(entry.getValue().equals("222222222")){
System.out.println(entry.getKey()); //John
}
}
//or
Multimap<String, String> inverted = HashMultimap.create();
Multimaps.invertFrom(multimap, inverted);
System.out.println(inverted.get("222222222")); //[John]
Another possibility is to store the strings in a trie, with a '$' sign indicating the end of each string. Use doubly linked pointers for each step, and keep a doubly linked pointer from each '$' (end of name) to its number (in an array or a list).
Now, when you want to get a phone number from a name:
Find the '$' indicating the end of the word for this string.
It is connected to a number in the list: that is your number.
When you want to get a name from a phone number:
Find the number; it is connected to a '$' sign.
Follow the up-links all the way to the root; this gives you the name (reversed).
Reverse it and you are done.
Also, as I said in the comment (regarding the double-map approach): assuming your strings are fairly large and the map holds a pointer/reference to each string (not the actual string), the storage needed will not be double but considerably less.
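The trie with up-links from this answer could be sketched roughly as follows. The class and method names are made up, each name is assumed to have a single number, and a plain dict from number to terminal node stands in for "find the number", which the answer leaves open.

```python
class Node:
    def __init__(self, char=None, parent=None):
        self.char = char        # character on the edge into this node
        self.parent = parent    # up-link used to rebuild the name
        self.children = {}
        self.number = None      # set only on '$' terminal nodes


class Directory:
    def __init__(self):
        self.root = Node()
        self.by_number = {}     # phone number -> terminal '$' node (assumption)

    def add(self, name, number):
        node = self.root
        for ch in name + "$":
            if ch not in node.children:
                node.children[ch] = Node(ch, node)
            node = node.children[ch]
        node.number = number
        self.by_number[number] = node

    def number_of(self, name):
        node = self.root
        for ch in name + "$":
            node = node.children.get(ch)
            if node is None:
                return None
        return node.number

    def name_of(self, number):
        node = self.by_number.get(number)
        if node is None:
            return None
        chars = []
        node = node.parent      # skip the '$' marker itself
        while node.parent is not None:
            chars.append(node.char)
            node = node.parent
        return "".join(reversed(chars))


book = Directory()
book.add("mike", "123")
book.add("mikael", "456")
print(book.number_of("mike"))  # 123
print(book.name_of("456"))     # mikael
```

Names that share a prefix ("mike", "mikael") share trie nodes, so each name's characters are stored once and serve both lookup directions.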
Using a binary search tree is a good way to implement the telephone directory. Think about how the contact list on a mobile phone works in practice: it is sorted alphabetically. A hash-based map does not give us a sorted list, and sorting its entries separately on every change is inefficient.
A binary search tree avoids this, because each new entry is inserted in its ordered position, so no further sorting is needed. Remember that in a binary search tree, left_tree < root and root < right_tree.
I need to be able to look up entries based on the full key or on part of the key.
E.g. I might store keys like 10,20,30,40; 11,12,30,40; 12,20,30,40.
I want to be able to search for 10,20,30,40 or 20,30,40.
What is the best data structure for achieving this, optimizing for time?
Our programming language is Java. Any pointers to open source projects will be appreciated.
Thanks in advance.
If those were the actual numbers I'd be working with, I'd use an array where a given index contains an array of all records that contain the index. If the actual numbers were larger, I'd use a hash table employed the same way.
So the structure would look like (empty indexes elided, in the case of the array implementation):
10 => ((10,20,30,40)),
11 => ((11,12,30,40)),
12 => ((11,12,30,40), (12,20,30,40)),
20 => ((10,20,30,40), (12,20,30,40)),
30 => ((10,20,30,40), (11,12,30,40), (12,20,30,40)),
40 => ((10,20,30,40), (11,12,30,40), (12,20,30,40)),
It's not clear to me whether your searches are inclusive (OR-based) or exclusive (AND-based), but either way you look up the record groups for each element of the search set; for the inclusive search you find their union, and for the exclusive search you find their intersection.
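The structure above is essentially an inverted index. A small sketch of it, written in Python rather than Java for brevity, with hypothetical function names `search_any` and `search_all`:

```python
from collections import defaultdict

records = [(10, 20, 30, 40), (11, 12, 30, 40), (12, 20, 30, 40)]

# element -> set of all records that contain it
index = defaultdict(set)
for rec in records:
    for element in rec:
        index[element].add(rec)


def search_any(elements):
    """Inclusive (OR) search: records containing at least one element."""
    return set().union(*(index.get(e, set()) for e in elements))


def search_all(elements):
    """Exclusive (AND) search: records containing every element."""
    groups = [index.get(e, set()) for e in elements]
    return set.intersection(*groups) if groups else set()


print(sorted(search_all([20, 30, 40])))  # [(10, 20, 30, 40), (12, 20, 30, 40)]
print(sorted(search_any([10, 11])))      # [(10, 20, 30, 40), (11, 12, 30, 40)]
```

This version ignores the position of an element within the key; if order matters, the index key could be an (element, position) pair instead.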
Since you seem to care about retrieval time over other concerns (such as space), I suggest you use a hashtable and enter your items several times, once per subkey. So you'd put("10,20,30,40",mydata), then put("20,30,40",mydata) and so on (of course this would be wrapped in a method; you're not going to call put manually that many times).
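A sketch of such a helper, again in Python for brevity. It assumes that a "partial key" always means a trailing portion of the full key, as in the question's "20,30,40" example, and it stores a list per subkey because several full keys can share the same tail. The name `put_with_suffixes` is made up.

```python
table = {}


def put_with_suffixes(key, data):
    """Store `data` under the full key and under every suffix of it."""
    parts = key.split(",")
    for start in range(len(parts)):
        subkey = ",".join(parts[start:])
        table.setdefault(subkey, []).append(data)


put_with_suffixes("10,20,30,40", "record A")
put_with_suffixes("12,20,30,40", "record B")

print(table["10,20,30,40"])  # ['record A']
print(table["20,30,40"])     # ['record A', 'record B']
```

Each record is stored once per key component, so a four-part key costs four table entries in exchange for a single hash lookup per query.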
Use a tree structure. Here is an open source project that might help ... written in Java :-)
http://suggesttree.sourceforge.net/