Efficient data structure to get an ID - data-structures

I need an efficient data structure to generate IDs. The IDs should be able to be released using a method in the data structure. After an ID was released it can be generated again. The data structure must always retrieve the lowest unused ID.
What efficient data structure can be used for this?

Can't you just increment an integer and return that, with appropriate currency control. If someone releases an integer back store that in another sorted data structure and return that. If the list of returned integers is empty then your return is a simple as read, increment, write, return. If the list of returned integers is not empty then just read, return and remove the first int from the returned integers list

Related

Redux/React state normalization - why maintain a separate array of IDs?

Following the tutorial by Dan Abramov here: https://egghead.io/lessons/javascript-redux-normalizing-the-state-shape
He doesn't seem to explain the benefit of maintaining an extra reducer with an array of todo IDs (allIds), would it not be easier to have just the one byId reducer and user Object.keys or Object.values to iterate over it?
The sample Todo app shows a list of todos, in the order in which they were created. It's not possible to retrieve that ordered list in a way that is guaranteed to work across browsers using an Object and Object.keys.
JS Object properties are unordered, but arrays have an order. So the ordering of the output of Object.keys() is not guaranteed to have any relationship to the order in which the keys were added. The array allows the reducer to display the todos in the order in which they were added.
Theoretically you could use a Map, as the keys in a Map are ordered. However, there's no way to re-order the contents of a Map. With an array you could re-order the IDs without needing to touch the todo objects themselves.
In other words, the array data structure is better suited to storing ordered lists than both Object and Map.

dataloader facebook optimization

everybody!
I'm trying to use Dataloader by facebook in my graphql project.
So, now I'm faced to the next problem. When I ask my database for data by ids for example: select * from books where books.author in (4,5,6,7) I got an Error: "function did not return a Promise of an Array of the same length as the Array of keys". Cause by id 4 I can fetch more then just one book.
Does anybody know how to fix it?
Dataloader is expecting you to return an array of the same length as the input to your loader. So, if the loader gets [4,5,6,7] as an input, it will need to return an array with a length of 4. Also keep in mind that the results returned from the loader need to be in the same order as the input ids. This may or may not be something you have to worry about depending on how the data is returned from your database.
You should return an array for each id - array of arrays. You have to convert sql result - flat list with duplicates into 'groupped' arrays of records preserving input ids (amount and order).

RethinkDB: count unique rows within grouped data

I'm trying to count all unique rows within grouped data, i.e, how many unique rows exist within each group.
Although groupedData.distinct().count() works for relatively small amounts of rows, running it on ~200k rows, such as in my case, ends with "over size limit".
I understand why it happens, yet I can't come up with more efficient way of doing it - is there a way?
Count is an expensive thing in RethinkDB to my experience. Especially for count operation that require iterating the whole data set. I myself struggle with this for a bit before.
To my understanding, when you pass groupData to distinct, it creates an array, because groupData will be a sequence, therefore it has 100,000 element limits.
To solve this, I think we have to use a stream, and count the stream instead. We cannot use group because it returns a group of stream, or in other words, an array of stream to my understanding again.
So here is how I solve it:
Create an index on the field I want to groups
Call distnct on that table with the index.
Map the stream, passing value into a count function with getAll, using index
An example query
r.table('t').distinct({index: 'index_name'})
.map(function(value) {
return {group: value, total: r.table('t').getAll(value, {index: 'index_name'}).count()}
})
With this, everything is a stream and we can lazily iterator over result set to get the count of each group.

couchdb - retrieve unique documents for a view that emits non-unique two array keys

I have an map function in a view in CouchDB that emits non-unique two array keys, for documents of type message, e.g.
The first position in the array key is a user_id, the second position represents whether or not the user has read the message.
This works nicely in that I can set include_docs=true and retrieve the actual documents. However, I'm retrieving duplicate documents in that case, as you can see above in the view results. I need to be able to write a view that can be queried to return unique messages that have been read by a given user. Additionally, I need to be able to efficiently paginate the resultset.
notice in the image above that [66, true] is emitted twice for doc id 26a9a271de3aac494d37b17334aaf7f3. As far as I can tell, with the keys in my map function, I cannot reduce in such a way that unique documents will be returned.
the next idea I had was to emit doc._id also in the map function and reduce with group_level=exact the result being:
now I am able to get unique document ids, but I cannot get the documents without doing a second query. And even in the case of a second query, it will require a lot of complexity to do pagination like this (at least I think so).
the last idea I came up with is to emit the entire document rather than the doc._id in the third position in the array key, then I can access the entire document and likely paginate. This seems really brutish.
So my question is:
Is #3 above a terrible idea? Is there something I'm missing? Is there a better approach?
Thanks in advance.
See #WickedGrey's comment to the question. The solution is to ensure that I never emit the same key twice for one document. I do this in the map function by keeping track of the keys as I emit them in an array, then skipping the emit if the key exists in the array.

Algorithm and data structure to store First name and last name

Is there a efficient way to store first name and last name in data structure so that we can lookup using either first or last name? I would consider a binary search tree with first name. It would be efficient to search first name. But wouldnt be efficient when trying to search last name. we can also consider one more BST with last name. Any ideas to implement it efficiently?
What if the question is
String names[] = { "A B","C D"};
A requirement is to be able to extend this directory dynamically at runtime,
without persistent storage. The directory can eventually grow to hundreds or
thousands of names and must be searchable by first or last name.
Now we can't have hash tables to store. Any ideas?
Two hash tables: one from first name to person, and one from last name to person.
Simple is best.
Why not put both first and last names in a trie?
As a bonus, this way you can even get suggestions on partial names by traversing all leaves after current node (maybe on an asynchronous call)
You're idea is pretty good, but here's another option: how about implementing to hash tables?
The first hash table would use first names as a key, and the associated value would either be the last name or a pointer to a Name object. The second hash table would use last names as keys, with the first names or pointers to Name as the values.
Personally, for choosing the values, I would go for a pointer to a Name object, since this method would be more applicable in case you'd like to store even more information (e.g. data of birth, etc.)
Also, see Does Java have a HashMap with reverse lookup?…, which is specific to Java but the discussion on the data structures is relevant to any language.
Note that structures such as Bidirectional Sorted Maps also allow range searches (which dual hash tables don't).
if you need to search only by first name or only by last name then yes, two hashmaps are the best (and notice you're not duplicating the data, you're partitioning it) but if you don't mind then put both first and last names in a single hashmap and don't differentiate between the two.

Resources