Why does SecureRandom.uuid create a unique string? [closed] - ruby

Why does SecureRandom.uuid create a unique string?
SecureRandom.uuid
# => "35cb4e30-54e1-49f9-b5ce-4134799eb2c0"
Is the string that SecureRandom.uuid creates guaranteed never to repeat?

The string is not in fact guaranteed unique. There is a very small but finite chance of a collision.
However, in practice you will never see two ids generated by this mechanism that are the same, because the probability is so low.
You may safely treat a call to SecureRandom.uuid as generating a globally unique string in code that needs to manage many billions of database entities.
Collision probabilities for version 4 UUIDs follow the birthday problem; a rough estimate is sketched in the code below.
Opinion: If I were to pick an arbitrary limit where you might start to see one or two collisions across the entire dataset with a realistic probability, I would go for around 10**16 ids. Assuming you create a million ids per second in your system, it would take roughly 300 years to reach that size, and even then the probability of seeing any collision over the whole life of the project would be roughly 1 in 100,000.
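For a sense of scale, here is a rough sketch (not part of the original answer) of the birthday-problem estimate behind those figures. A version 4 UUID, which is what SecureRandom.uuid returns, carries 122 random bits, so for n ids the chance of any collision is approximately n**2 / (2 * 2**122):
# Birthday-problem approximation for version 4 UUIDs (122 random bits).
def uuid_collision_probability(n)
  keyspace = 2**122
  Rational(n) * n / (2 * keyspace)
end

puts uuid_collision_probability(10**16).to_f  # ~9.4e-06, about 1 in 100,000
puts uuid_collision_probability(10**9).to_f   # ~9.4e-20, negligible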

Related

Searching through a list [closed]

I'm reading about AI and in the notes it is mentioned
A lookup table in chess would have roughly 35^100 entries.
But what does this mean? Is there any way we could find out how long it would take a computer to search through the table and find its entry? Would we assume there is some order, or that there is no order?
The number of atoms in the known universe is estimated to be around 10^80, which is far less than 35^100 (roughly 10^154). With current technology, at least a few thousand atoms are required to store a single bit, and each entry of your table would presumably take multiple bits. You would need some extraordinarily advanced technology just to implement the memory of such a computer.
So the answer is: With current technology it is not a matter of time, it is simply impossible.
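As a quick back-of-the-envelope check of that arithmetic (an illustrative snippet, not part of the original answer; Ruby integers have arbitrary precision, so the numbers fit comfortably):
entries        = 35**100
atoms_universe = 10**80

puts entries.to_s.length       # => 155, i.e. 35**100 is roughly 10**154
puts entries / atoms_universe  # ~10**74 table entries for every atom in the universe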

Algorithm for finding similar words [closed]

In order to support users learning English, I want to make a multiple-choice quiz using the vocabulary that the user is studying.
For example, if the user is learning "angel", then I need an algorithm to produce some similar words such as "angle" and "angled".
As another example, if the user is learning "accountant", then I need an algorithm to produce some similar words such as "accounttant", "acountant", and "acounttant".
You could compute the Levenshtein distance from the target word to each word in your vocabulary and pick the 2 or 3 with the smallest distance.
Depending on how many words are in your dictionary this might take a long time, so I would recommend bailing out early: once a candidate's distance is known to exceed a small bound (say 3 edits), stop computing it and move on to the next word; see the sketch below.
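Here is a minimal sketch of that idea; the word list and the 3-edit cutoff are illustrative, not taken from the question:
# Plain dynamic-programming Levenshtein distance with an early bail-out
# once the distance is known to exceed `max` edits.
def levenshtein(a, b, max = 3)
  return 0 if a == b
  return [a.length, b.length].max if a.empty? || b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] + Array.new(b.length)
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr[j] = [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    return max + 1 if curr.min > max  # no later row can drop below this minimum
    prev = curr
  end
  prev[b.length]
end

vocabulary = %w[angle angled anger orange range grant]
target     = "angel"

distractors = vocabulary
  .map     { |w| [w, levenshtein(target, w)] }
  .sort_by { |_, d| d }
  .first(3)

p distractors  # pairs of [word, distance], closest first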

Is using unique data better than random data for a hash? [closed]

I need to generate global unique ids by hashing some data.
On the one hand, I could use a combination of timestamp and network address, which is unique since every computer can only create one id at a time. But since this data is too long, I'd need to hash it, and thus collisions could occur. (As a side note, we could also throw in a random number if the timestamp is not exact enough.)
On the other hand, I could just use a random number and hash that. Shouldn't that bring exactly the same hash collision probability as the first approach? It is interesting because this approach would be faster and is much easier to implement.
Is there a difference in terms of hash collisions when using unique data rather than random data? (By the way, I will not use real GUIDs as described by the standard but mine will only be 64 bits long. But that shouldn't affect the question.)
Why bother to hash a random number? Hashing is designed to map inputs uniformly onto a keyspace, but a cryptographic random number generator already gives you uniformly distributed output. All you're doing is creating more work.
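To illustrate (a minimal sketch, not from the original answer): if you need a 64-bit id, draw the 64 random bits directly instead of hashing some longer value down to 64 bits; the birthday-bound collision odds are the same either way.
require 'securerandom'

id  = SecureRandom.bytes(8).unpack1("Q>")  # unsigned 64-bit integer (unpack1 needs Ruby 2.4+)
hex = SecureRandom.hex(8)                  # another 64 random bits, as a 16-character hex string

puts id
puts hex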

Ruby Object manipulation [closed]

We have an algorithm that compares Ruby objects coming from MongoDB. The majority of the time is spent taking the results (~1000), assigning a weight to each, and comparing them to a base object. This process takes ~2 seconds for 1000 objects. Afterwards, we order the objects by weight and take the top 10.
Given that the number of initial matches will continue to grow, I'm looking for more efficient ways to compare and sort matches in Ruby.
I know this is kind of vague, but let's assume they are User objects that have arrays of data about the person and we're comparing them to a single user to find the best match for that user.
Have you considered storing/caching the weight? This works well if the weight depends only on the attributes of each user and not on values external to that user.
Also, how complex is the calculation involving the weight associated with a user and the "base" user? If it's complex you may want to consider using a graph database, which can store data that is specific to the relation between 2 nodes/objects.
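A hypothetical sketch of the caching idea, purely for illustration (the User class, the tag-overlap weighting, and the attribute names are invented, not taken from the question):
class User
  attr_reader :tags

  def initialize(tags)
    @tags = tags
  end

  # Cache the weight per base user so repeated comparisons don't recompute it.
  def weight_against(base)
    @weights ||= {}
    @weights.fetch(base.object_id) do
      @weights[base.object_id] = (tags & base.tags).size  # stand-in for the real weighting
    end
  end
end

def top_matches(candidates, base, n = 10)
  candidates.max_by(n) { |user| user.weight_against(base) }
end

base       = User.new(%w[ruby mongodb search])
candidates = 1000.times.map { User.new(%w[ruby mongodb search go lisp sql].sample(3)) }
p top_matches(candidates, base).size  # => 10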

Joining very large lists [closed]

Let's put some numbers down first:
The largest of the lists is about 100M records (but is expected to grow up to 500M). The other lists (5-6 of them) are in the millions but will stay below 100M for the foreseeable future.
These are always joined on a single id, and never on any other parameter.
What's the best algorithm to join such lists?
I was thinking along the lines of distributed computing: use a good hash function (consistent hashing, where you can add a node without a lot of data movement) and split these lists into several smaller files. Since they are always joined on the common id (which I will be hashing), the problem boils down to joining small files, perhaps with the *nix join command.
A DB (at least MySQL) would join using a merge join (since it would be on the primary key). Is that going to be more efficient than my approach?
I know it's best to test and see, but given the magnitude of these files that is pretty time-consuming, and I would like to do some theoretical calculation first and then see how it fares in practice.
Any insights on these or other ideas would be helpful. I don't mind if it takes slightly longer, but I would prefer the best utilization of the resources I have. Don't have a huge budget :)
Use a Database. They are designed for performing joins (with the right indexes of course!)
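As an illustrative sketch of that advice, assuming the sqlite3 gem is installed (the table and column names are invented); the index on the join id is what lets the engine do the heavy lifting:
require 'sqlite3'

db = SQLite3::Database.new(':memory:')

db.execute_batch <<~SQL
  CREATE TABLE big_list   (id INTEGER PRIMARY KEY, payload TEXT);
  CREATE TABLE other_list (id INTEGER, extra TEXT);
  CREATE INDEX idx_other_id ON other_list (id);
SQL

db.execute("INSERT INTO big_list VALUES (1, 'a'), (2, 'b'), (3, 'c')")
db.execute("INSERT INTO other_list VALUES (2, 'x'), (3, 'y')")

rows = db.execute(<<~SQL)
  SELECT b.id, b.payload, o.extra
  FROM big_list b
  JOIN other_list o ON o.id = b.id
  ORDER BY b.id
SQL

p rows  # => [[2, "b", "x"], [3, "c", "y"]]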
