What data structure should I use for caching ordered entities?

I need to cache some items, such as chat messages. I also need to slice these items by a range of key values. For example (back to chat messages), the most common operation on the cache will be getting the chat messages between a begin date and an end date.
What data structure should I be considering? I was thinking of a simple array, but that works in O(n). Is there any data structure that will work faster?

You can use a self-balancing binary search tree such as a red-black tree, which stores the entries in order and provides insert, delete, and search in O(log n) in both the average and worst case.
So when you need the chat messages within a date interval, you can search the tree for that range of keys, which are already ordered.
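For illustration, here is a minimal Python sketch of the ordered-map idea. Python has no built-in red-black tree, so this assumes the third-party sortedcontainers package, which exposes the same ordered-map interface; the message data is invented.
```python
# A minimal sketch of the ordered-map idea, assuming the third-party
# "sortedcontainers" package (pip install sortedcontainers). SortedDict is not
# literally a red-black tree, but it gives the same sorted-mapping interface.
from datetime import datetime
from sortedcontainers import SortedDict

messages = SortedDict()  # key: timestamp, value: message text

messages[datetime(2024, 1, 1, 9, 0)] = "good morning"
messages[datetime(2024, 1, 1, 9, 5)] = "hi there"
messages[datetime(2024, 1, 1, 12, 30)] = "lunch?"

# Range query: all messages between a begin and end date, already in order.
begin = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 10, 0)
for ts in messages.irange(begin, end):   # locates the range boundaries quickly
    print(ts, messages[ts])
```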

Use an associative structure: store your data in an array&lt;data&gt;, but use a hash table to store pair&lt;data, arrayIndex&gt;. This way you can search, insert, and delete in O(1).
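A minimal Python sketch of the array-plus-hash-index idea follows; the swap-with-last deletion trick is an assumption beyond the answer's text, and note that this layout does not by itself support ordered range queries.
```python
# Array of items plus a hash table mapping each item to its array index.
items = []          # the array<data>
index_of = {}       # hash table: data -> array index

def insert(item):
    index_of[item] = len(items)
    items.append(item)

def lookup(item):
    i = index_of.get(item)
    return items[i] if i is not None else None

def delete(item):
    # O(1) trick (an assumption here): swap the last element into the freed slot.
    i = index_of.pop(item)
    last = items.pop()
    if i < len(items):
        items[i] = last
        index_of[last] = i

insert("hello")
insert("world")
delete("hello")
print(lookup("world"), items)   # -> world ['world']
```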

Related

Best data structure to implement an Excel spreadsheet

How can we implement an Excel spreadsheet that supports creating and deleting rows and cells, and also modifying the data inside any cell?
I am looking for the best data structure to implement this.
The problem statement is a little vague in my opinion. We do not have any information about which operations will be most frequent, or even about the amount of data this data structure is going to hold.
So let's assume there can be a fair amount of data, and that the operations are the addition and deletion of rows and cells.
If I had to implement an Excel spreadsheet with a custom data structure, I would make each row a node of a linked list. This is helpful because, as opposed to an (n-dimensional) array, memory can be assigned in a non-contiguous manner, and it also makes adding and deleting rows much easier.
Inside each node, we can have an array of strings to hold the cell values and an id field to hold the id of the row.
The head node of the structure holds the column names as the values of its string array, so in a way each column is mapped to an index of the array.
To add a row: it is an insert into the linked list; make a new node and append it at the end.
To delete a row: same as deleting a node from a linked list.
To add/update a cell value: you know the row id and the column name, so you can get the column's index in the array from the head node. Once you have the node corresponding to the row, access that index of its string array to add/read/update/delete the cell value.
To optimize node access you can keep indexes on the linked list to locate a node by row id easily. A further optimization would be to store a row-id-to-node-pointer mapping in an auxiliary map or array so that inserting rows in the middle is also fast.
However, I would reiterate that the implementation should be chosen on a use-case basis. If there are heavy column addition/deletion operations, for example, it will be quite slow. There are different trade-offs for each kind of use case.
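For illustration, a minimal Python sketch of the row-node layout described above; the class and field names (RowNode, Spreadsheet, by_id) are invented, and column operations and row deletion are omitted for brevity.
```python
# Linked list of row nodes; the head node's string array holds the column names.
class RowNode:
    def __init__(self, row_id, num_cols):
        self.row_id = row_id
        self.cells = [""] * num_cols   # array of strings holding cell values
        self.next = None

class Spreadsheet:
    def __init__(self, column_names):
        self.head = RowNode(0, len(column_names))
        self.head.cells = list(column_names)   # head node stores column names
        self.tail = self.head
        self.by_id = {}                        # auxiliary map: row id -> node
        self.next_id = 1

    def add_row(self):
        node = RowNode(self.next_id, len(self.head.cells))
        self.tail.next = node                  # append at the end of the list
        self.tail = node
        self.by_id[node.row_id] = node
        self.next_id += 1
        return node.row_id

    def set_cell(self, row_id, column, value):
        col_index = self.head.cells.index(column)   # column name -> array index
        self.by_id[row_id].cells[col_index] = value

    def get_cell(self, row_id, column):
        return self.by_id[row_id].cells[self.head.cells.index(column)]

sheet = Spreadsheet(["Name", "Age"])
r = sheet.add_row()
sheet.set_cell(r, "Age", "30")
print(sheet.get_cell(r, "Age"))    # -> 30
```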
I think the easy way to go about this is simply to use a JSON structure to hold each row: column names as keys and the cell values as values. This handles null/empty values quite easily.
A spreadsheet is essentially similar to a table; changes can be made to any cell in any row. Hence going with a simple list structure would not be too bad. The downside is that deleting and inserting rows in the middle is not performant, but inserting rows at the end, which is the most common use case, and modifying cells are both quite easy.
A linked-list structure would facilitate faster insertion and deletion, but it would adversely affect random access, so a simple list of JSON objects is the better choice.
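A tiny Python sketch of the list-of-JSON-rows idea; the column names and values are invented.
```python
# Each row is a dict keyed by column name; the sheet is just a list of rows.
import json

rows = []                                         # the spreadsheet
rows.append({"Name": "Alice", "Age": 30})         # append a row at the end
rows.append({"Name": "Bob", "Age": None})         # empty cells are simply null
rows[1]["Age"] = 42                               # modify a cell by row index
del rows[0]                                       # delete a row (O(n) shift)

print(json.dumps(rows))
```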

How to implement dynamic indexes?

I know the title may be a little confusing; however, my actual question is basic, I think.
I'm working on a new LRU implementation, for which I use an Index Table that maps the name of an incoming packet to the index where the packet's content is stored in the content store (CS).
Each incoming packet is stored in the CS and can be addressed through the Index Table.
Now suppose a new packet arrives. Under LRU, its index must be set to the top of the CS (zero), and every other index then needs to be incremented as a result.
One obvious solution is to loop over all entries in the Index Table and increment them.
Is there a better solution, or a data structure commonly used for this kind of problem?
I don't see how you are establishing the order of your cache in the description. But to answer your question, it's possible to reduce the LRU store method to O(1) time complexity.
The classical way to do it is to have these two data structures:
A doubly linked list: maintains the order of the cache. Each node stores a data element (this plays the role of your content store).
A hash map that associates each key with a pointer to its node in the linked list (this plays the role of your index table).
When you access data that is already in the cache, it must move to the top of the list, so you unlink the corresponding node (in O(1) time, because you have access to its previous and next nodes) and reinsert it at the head.
For new data it is even simpler: just insert a node at the head of the list and store the key together with a pointer to that node in the hash map.
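For illustration, a minimal Python sketch of the doubly-linked-list plus hash-map design described above; the class and method names are invented, and Python's collections.OrderedDict implements the same idea internally.
```python
# Doubly linked list for recency order + dict mapping key -> node ("index table").
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                       # key -> node (the index table)
        self.head = Node(None, None)        # sentinel on the most-recently-used side
        self.tail = Node(None, None)        # sentinel on the least-recently-used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):                # O(1): we know prev and next
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):            # insert right after the head sentinel
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)                  # move to the front on access
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._unlink(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:   # evict the least recently used entry
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now most recently used
cache.put("c", 3)       # evicts "b"
print(cache.get("b"))   # -> None
```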

Suitable data structure for finding a person's phone number, given their name?

Suppose you want to write a program that implements a simple phone book. Given a particular name, you want to be able to retrieve that person's phone number as quickly as possible. What data structure would you use to store the phone book, and why?
The text below answers your question.
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.
The text is from the Wikipedia article on hash tables.
There are further topics to consider, such as collisions and hash functions; check that article for details.
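A trivial Python sketch, using the built-in dict (which is a hash table); the entries are invented.
```python
# Hash table mapping names to phone numbers.
phone_book = {}
phone_book["Alice Smith"] = "555-0100"
phone_book["Bob Jones"] = "555-0199"

print(phone_book.get("Alice Smith"))  # O(1) average lookup -> 555-0100
print(phone_book.get("Carol"))        # missing name -> None
```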
I respect & love hash tables :) but even a balanced binary tree would be fine for your phone book application, giving you logarithmic complexity in the worst case and sparing you from needing a good hash function, handling collisions, etc., which matter more for huge amounts of data.
When I talk about huge data, I mean something related to storage: every time you fill all of the buckets in a hash table, you need to allocate new storage and re-hash everything. This can be avoided if you know the size of the data ahead of time. Balanced trees do not run into these problems. The domain also needs to be considered when designing data structures; for example, on small devices storage matters a lot.
I was wondering why tries didn't come up in one of the answers.
A trie is well suited to phone-book-style data.
It also saves space compared to a hash table at (almost) the same retrieval cost, assuming a constant-size alphabet and constant-length names.
Tries also support the prefix matches sometimes required while searching.
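For illustration, a minimal Python sketch of a trie keyed by name, including the prefix lookup mentioned above; the class and method names are invented.
```python
# Trie over the characters of the name; a number is stored at the end of each name.
class TrieNode:
    def __init__(self):
        self.children = {}       # character -> TrieNode
        self.number = None       # set only at the end of a complete name

class PhoneTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, number):
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.number = number

    def lookup(self, name):
        node = self.root
        for ch in name:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.number

    def prefix_matches(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return
        stack = [(node, prefix)]
        while stack:                       # walk the subtree under the prefix
            cur, name = stack.pop()
            if cur.number is not None:
                yield name, cur.number
            for ch, child in cur.children.items():
                stack.append((child, name + ch))

book = PhoneTrie()
book.insert("alice", "555-0100")
book.insert("alan", "555-0123")
print(book.lookup("alice"))               # -> 555-0100
print(list(book.prefix_matches("al")))    # -> both entries
```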
A dictionary is both dynamic and fast.
You want a dictionary, where you use the name as the key, and the number as the data stored. Check this out: http://en.wikipedia.org/wiki/Dictionary_%28data_structure%29
Why not use a singly linked list? Each node would hold the name, the number, and the link.
One drawback is that a search might take some time, since you have to traverse the list link by link. You could keep the list ordered at insertion time.
PS: To make the search a bit faster, maintain a link to the middle of the list; the search can then continue to the left or right of that node based on its "name" field. Note that this requires a doubly linked list.
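A small Python sketch of the sorted singly linked list idea; the middle-pointer refinement from the PS is omitted, and the names and entries are invented.
```python
# Singly linked list kept sorted by name at insertion time; lookup is linear.
class Entry:
    def __init__(self, name, number):
        self.name, self.number = name, number
        self.next = None

class LinkedPhoneBook:
    def __init__(self):
        self.head = None

    def insert(self, name, number):
        node = Entry(name, number)
        if self.head is None or name < self.head.name:
            node.next, self.head = self.head, node
            return
        cur = self.head
        while cur.next is not None and cur.next.name < name:
            cur = cur.next               # walk until the ordered position is found
        node.next, cur.next = cur.next, node

    def lookup(self, name):              # O(n): traverse link by link
        cur = self.head
        while cur is not None and cur.name <= name:
            if cur.name == name:
                return cur.number
            cur = cur.next
        return None

book = LinkedPhoneBook()
book.insert("Bob", "555-0199")
book.insert("Alice", "555-0100")
print(book.lookup("Alice"))              # -> 555-0100
```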

Best data structure for a given set of operations - Add, Retrieve Min/Max and Retrieve a specific object

I am looking for the optimal (in both time and space) data structure to support the following operations:
Add Persons (name, age) to a global data store of persons
Fetch Person with minimum and maximum age
Search for Person's age given the name
Here's what I could think of:
Keep an array of Persons, and keep adding to end of array when a new Person is to be added
Keep a hash from Person name to age, to assist in fetching a person's age given their name
Maintain two objects, minPerson and maxPerson, for the Persons with minimum and maximum age, and update them if needed when a new Person is added.
Now, although I keep a hash for better performance on (3), I think it may not be the best way if there are many collisions in the hash. Also, adding a Person means the overhead of inserting into the hash.
Is there anything that can be further optimized here?
Note: I am looking for the best (balanced) approach to support all these operations in minimum time and space.
You can get rid of the array as it doesn't provide anything that the other two structures can't do.
Otherwise, a hashtable + min/max is likely to perform well for your use case. In fact, this is precisely what I would use.
As to getting rid of the hashtable because a poor hash function might lead to collisions: well, don't use a poor hash function. I bet that the default hash function for strings that's provided by your programming language of choice is going to do pretty well out of the box.
It looks like you need a data structure that supports fast inserts and also fast queries on two different keys (name and age).
I would suggest keeping two data structures, one a sorted data structure (e.g. a balanced binary search tree) where the key is the age and the value is a pointer to the Person object, the other a hashtable where the key is the name and the value is a pointer to the Person object. Notice we don't keep two copies of the same object.
A balanced binary search tree would provide O(log n) inserts and max/min queries, while the hashtable would give us O(1) (amortized) inserts and lookups.
When we add a new Person, we just add a pointer to it to both data structures. For a min/max age query, we can retrieve the Object by querying the BST. For a name query we can just query the hashtable.
Your question does not ask for updates/deletes, but those are also doable by suitably updating both data structures.
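For illustration, a minimal Python sketch of this two-structure design. It assumes the third-party sortedcontainers package in place of a hand-written balanced BST, and keys the sorted structure by (age, name) since ages may repeat; the names are invented.
```python
# Sorted structure ordered by age + hash table keyed by name, both holding
# references to the same Person objects.
from sortedcontainers import SortedList

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

by_name = {}                                          # hash table: name -> Person
by_age = SortedList(key=lambda p: (p.age, p.name))    # ordered by age, then name

def add(name, age):
    p = Person(name, age)
    by_name[name] = p        # both structures reference the same object
    by_age.add(p)

def age_of(name):
    return by_name[name].age          # hash lookup

def youngest_and_oldest():
    return by_age[0], by_age[-1]      # smallest and largest age

add("Alice", 30)
add("Bob", 25)
print(age_of("Alice"))                # -> 30
young, old = youngest_and_oldest()
print(young.name, old.name)           # -> Bob Alice
```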
It sounds like you're expecting the name to be the unique identifier; otherwise your operation 3 is ambiguous (what is the correct result if you have two entries for John Smith?).
Assuming that the uniqueness of a name is guaranteed, I would go with a plain hashtable keyed by names. Operations 1 and 3 are trivial to execute. Operation 2 could be done in O(N) time if you want to search through the data structure manually, or you can do as you suggest and keep track of the min/max and update it as you add/delete entries in the hash table.
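A short Python sketch of the plain-hashtable approach with min/max tracked on insert (add-only, matching the three operations asked for); the names are invented.
```python
# Dict keyed by name, with the min/max person updated on every addition.
ages = {}                     # name -> age
min_person = max_person = None

def add_person(name, age):
    global min_person, max_person
    ages[name] = age                               # operation 1: add
    if min_person is None or age < ages[min_person]:
        min_person = name
    if max_person is None or age > ages[max_person]:
        max_person = name

def age_of(name):
    return ages.get(name)                          # operation 3: O(1) average

add_person("Alice", 30)
add_person("Bob", 25)
print(age_of("Bob"), min_person, max_person)       # -> 25 Bob Alice
```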

Redis - Seeking Data Modeling Suggestions

I'm using Redis to store data logs from many analog sensors. My goal is to sort the data by log timestamp and extract the data for a specific datetime range. My original data model was to use the sensor name as the key and keep a hash whose fields are timestamps, with the sensor reading stored as the field's value.
So, if I have SensorA, SensorB and SensorC, doing a KEYS * would return 1. SensorA, 2. SensorB and 3. SensorC. Doing HGET SensorB 20110111172900 would return, let's say, 25.
The problem with the current model is that it doesn't allow sorting by timestamp, or so I think, since everything I've tried has failed.
Would someone be able to suggest a data model that allows sorting and extracting ranges of data, or suggest the proper sort arguments that would make this possible with the model above?
A sorted set is probably a better fit than a hash in this case.
The member would be a combination of the timestamp and the sensor value, and the score would be the timestamp. Use ZRANGEBYSCORE to retrieve the values. Both reads and writes go from O(1) to O(log N), but you gain the ability to return a range of values.
You could also use a list to get O(1) insertion. Reading would be O(N) for retrieving a specific entry, but getting the most recent entries would be O(1).
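For illustration, a minimal sketch using the redis-py client; it assumes a running Redis server, and the key name and readings follow the question's example.
```python
# Sorted set per sensor: member = "timestamp:value", score = timestamp.
import redis

r = redis.Redis()

r.zadd("SensorB", {"20110111172900:25": 20110111172900})
r.zadd("SensorB", {"20110111173000:26": 20110111173000})

# All readings between two timestamps, returned ordered by time.
for member in r.zrangebyscore("SensorB", 20110111172900, 20110111175959):
    ts, value = member.decode().split(":")
    print(ts, value)
```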
