What is the probability that a UUID, stripped of all of its letters and dashes, is unique? - probability

Say I have a UUID a9318171-2276-498c-a0d6-9d6d0dec0e84.
I then remove all the letters and dashes to get 9318171227649806960084.
What is the probability that this is unique, given a set of ID's that are generated in the same way? How does this compare to a normal set of UUID's?

UUIDs are represented as 32 hexadecimal (base-16) digits, displayed in 5 groups separated by hyphens. The issue with your question is that each position of a generated UUID can hold any hexadecimal digit from the set [0-9, a-f].
This leaves us with a dilemma, since we don't know beforehand how many of the hexadecimal digits generated for each UUID will be alpha characters [a-f]. The only thing we can be certain of is that each randomly generated character has a 6/16 (3/8) chance of being an alpha character, since six of the sixteen hexadecimal digits are letters. This makes it difficult to answer the question exactly, since removing the hyphens and alpha characters leaves us with an ID of a different length for each generated UUID.
With that being said, to give you something to think about: we know that each UUID is 36 characters in length, including the hyphens. If we simplify and say we have no hyphens, each UUID is exactly 32 characters in length. Building on this, if we further simplify and say that each of the 32 characters can only be a numeric character [0-9], we can give an accurate probability for the uniqueness of each generated, simplified UUID (under the simplifications mentioned above):
Assume a UUID is represented by 32 characters, where each character is a numeric character from the set [0-9]. We need to generate 32 digits in order to create a valid simplified UUID. The chance of selecting any given digit from [0-9] is 1/10. Another way to think about this: each digit has an equal opportunity of being generated, and since there are 10 digits, each one has a 10% chance of being generated.
Furthermore, each digit is generated independently of the previously generated digits, i.e. the outcome for one position doesn't depend on the outcome for any other position. The possible outcomes for a single position (the ten digits 0-9) are also mutually exclusive: a position holds exactly one of them.
Knowing these facts, we can take advantage of the Product Rule, which states that the probability of two independent events both occurring is the product of their individual probabilities. For example, the probability of getting two heads on two coin tosses is 0.5 x 0.5 = 0.25. Therefore, the probability that a second generated UUID is identical to a given one would be:
1/10 * 1/10 * 1/10 * .... * 1/10 where the number of 1/10s would be 32.
This simplifies to 1/(10^32), or in general to 1/(10^n) where n is the length of your UUID. So with all that being said, the probability of generating two identical UUIDs, given our assumptions, is vanishingly small.
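As a rough illustration of both points (the stripping step and the simplified 32-digit calculation), here is a minimal Python sketch. The helper name `strip_uuid` is made up for this example, and the final probability rests on the simplifying assumption above that every ID is exactly 32 independent decimal digits.

```python
import uuid
from fractions import Fraction

def strip_uuid(text):
    """Keep only the decimal digits of a UUID string (drops a-f and hyphens)."""
    return "".join(ch for ch in text if ch.isdigit())

# The example from the question.
print(strip_uuid("a9318171-2276-498c-a0d6-9d6d0dec0e84"))

# Stripped IDs come out with different lengths, because the number of
# letters in a random UUID varies from one UUID to the next.
lengths = {len(strip_uuid(str(uuid.uuid4()))) for _ in range(1000)}
print(sorted(lengths))

# Simplified model: 32 independent decimal digits, each matching with chance 1/10.
p_match = Fraction(1, 10) ** 32
print(p_match)
```

The spread of lengths printed in the middle is the reason the real question is harder than the simplified one: shorter stripped IDs have far fewer possible values than longer ones.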
Hopefully that helps!

Related

Description of the algorithm for sorting ASCII characters alphabetically

I am a beginner web developer and as a test task I got the following:
An unordered array of printed ASCII characters is given. Describe in your own words (without code or pseudocode) a sorting algorithm that allows you to sort this array alphabetically in linear time. It is necessary to describe the actions at each step of the algorithm. Is a stable version of such a sorting algorithm possible?
I'm not very good at algorithms, because I have just started studying, so I do not understand how to approach this task.
Thanks for the help.
printed ASCII characters
I suppose they mean printable ASCII characters, which are characters with ASCII code in the range 32-126, so 95 characters.
Describe in your own words
For each relevant ASCII code, count how many times that character occurs in the input. The idea is that you do this in one pass over the input: for each encountered character increment the relevant counter.
Iterate the above (95) counters in order of ASCII code, and output the corresponding character that many times. So if the counter is zero, don't output the character; if the counter is 3, output that character three times.
Is a stable version of such a sorting algorithm possible?
Yes. This is only relevant when each character in the input is accompanied by some related data (payload). In that case we should not only maintain a counter per ASCII code, but collect the associated payloads in an array that is associated with that ASCII code.
For more information, see Counting Sort on Wikipedia
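The task itself asks for a description without code, but for checking your understanding, the two ideas above can be sketched in Python roughly as follows. The function names are invented for this example, and it assumes every input character is printable ASCII (codes 32 to 126). Note that the result is in ASCII order, so all uppercase letters come before lowercase ones.

```python
FIRST, LAST = 32, 126            # printable ASCII range
SIZE = LAST - FIRST + 1          # 95 possible characters

def counting_sort_ascii(chars):
    """Sort a list of printable ASCII characters in linear time."""
    counts = [0] * SIZE
    for ch in chars:                          # one pass: count occurrences
        counts[ord(ch) - FIRST] += 1
    result = []
    for offset, count in enumerate(counts):   # fixed 95 steps, in code order
        result.extend(chr(FIRST + offset) * count)
    return result

def stable_sort_by_char(pairs):
    """Stable variant: pairs are (character, payload) tuples."""
    buckets = [[] for _ in range(SIZE)]
    for ch, payload in pairs:                 # append keeps input order per key
        buckets[ord(ch) - FIRST].append((ch, payload))
    return [item for bucket in buckets for item in bucket]

print(counting_sort_ascii(list("hello")))
print(stable_sort_by_char([("b", 1), ("a", 2), ("b", 3), ("a", 4)]))
```

In the stable variant, items with the same character stay in their original relative order because each bucket is only ever appended to.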
This can be done with counting sort:
Create an array of counters indexed by ASCII code. Its size depends on the range of possible character codes, not on the length of the input.
In one pass over the input, increment the counter that corresponds to each character's ASCII code.
A character that occurs more than once therefore increments its counter once per occurrence.
Walk through the letter codes from 65 to 90 and from 97 to 122, and print each character as many times as it was counted.
Yes, a stable variant of such a sorting algorithm is possible: it places elements with the same value in the output array in the same order as they appeared in the input.

Return the count of all prime numbers in range [a,b] such that all the digits are from set {1,5,9} . 1<=a<=b<=10⁹

Return the count of all prime numbers in range [a,b] such that all the digits are from set {1,5,9} . 1<=a<=b<=10⁹.
My approach -
I am generating all the numbers whose digits come from the set {1,5,9}, which comes out to 3^9 (19683) numbers, and after that I check whether each one is prime.
Can I do this in a better time complexity?
Never generate a large set and only afterwards check all of its elements, ruling out most of them. That requires a lot of memory to store things you'll be discarding. Instead, find a single number with "valid" digits, check it for primality, and only then store it in a set. Accessing large arrays in memory is very time-consuming on modern computers compared to doing math.
"I produced all the numbers": I hope you're doing this smartly! You never have to check a number with a last digit being 5 for primeness (there's only a single prime that ends in 5; that's 5 itself!), for example. Also, you hopefully don't just build all combinations of digits "manually". Say, you find a number 19551, then 19559 is also a candidate, you never have to manually "combine" digits to try out the last digit.
Of course, your prime-checking algorithm needs to match your kind of problem: you can remove the initial check for divisibility by 2 (you never produce even numbers), for example. You never need to check for divisibility by 5, because you never use 5 or 0 as the last digit. Depending on your prime-checking algorithm, you would also want to save the factor that "killed" the xxxx1; that's one factor you don't have to check xxxx9 against. Do your divisibility-by-3 check based on the count of 1s, 5s and 9s in your number; you can directly infer the digit sum, and hence divisibility by 3, from that.
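A minimal Python sketch of the generate-then-test idea, assuming plain trial division is fast enough (candidates are below 10^9, so divisors only go up to about 31623). The function names are made up for this example, and it applies only the simplest of the shortcuts above: candidates ending in 5 are skipped, except 5 itself.

```python
from itertools import product

def is_prime(n):
    """Trial division using the 6k +/- 1 pattern."""
    if n < 2:
        return False
    if n < 4:
        return True                      # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

def count_159_primes(a, b):
    """Count primes in [a, b] whose digits all come from {1, 5, 9}."""
    count = 0
    for length in range(1, len(str(b)) + 1):
        for digits in product("159", repeat=length):
            if length > 1 and digits[-1] == "5":
                continue                 # multiple of 5 and larger than 5
            n = int("".join(digits))
            if a <= n <= b and is_prime(n):
                count += 1
    return count

print(count_159_primes(1, 100))          # counts 5, 11, 19 and 59
```

Each candidate is tested and counted on the spot, so nothing has to be stored. There are only a few tens of thousands of candidates in total, so the run time is dominated by the trial divisions.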

LC-3 How to store a number larger than 16-bit and print it out to console?

I'm having difficulty storing and displaying numbers greater than 32767 in LC-3 since a register can only hold values from -32768 to 32767. My apology for not being able to come up with any idea for the algorithm. Please give me some suggestion. Thanks!
You'll need a representation to store the larger number in a pair or more of words.
There are several approaches to how big integers are stored: in a fixed number of words, and in a variable number of words or bytes.  The critical part is being able to detect the presence and amount of overflow/carry on mathematical operations like *10.
For that reason, one simple approach is to use a variable number of words/bytes (for a single number), and store only one decimal digit in each of the words/bytes.  That way multiplication by 10 means simply adding a digit on the end (which has the effect of moving each existing digit to the next higher power-of-ten position).  Adding numbers of this form is fairly easy as well: we line up the digits and add them up, and when a digit sum is >= 10 there is a carry (of 1) to be added to the next higher-order digit of the sum.  (If adding two such (variable length) numbers is desired, I would store the decimal digits in reverse order, because then the low-order digits are already lined up for addition.)  See also https://en.wikipedia.org/wiki/Binary-coded_decimal .  (In some sense, this is like storing numbers in a form like a string, but using binary values instead of ASCII characters.)
To simplify this approach for your needs, you can fix the number of words to use, e.g. at 7, for 7 digits.
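To make the one-digit-per-word idea concrete before translating it to LC-3 assembly, here is a sketch of the logic in Python. The function names are invented for this example, and digits are stored least significant first (the reverse order suggested above), with each list element standing in for one memory word.

```python
def times_ten(digits):
    """Multiply by 10: every digit moves up one position, a 0 fills the low end."""
    return [0] + digits

def add(a, b):
    """Add two digit lists (least significant digit first) with carry."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        total = carry
        if i < len(a):
            total += a[i]
        if i < len(b):
            total += b[i]
        if total >= 10:                  # detect the carry, as described above
            result.append(total - 10)
            carry = 1
        else:
            result.append(total)
            carry = 0
    if carry:
        result.append(1)
    return result

def to_text(digits):
    """Printing is just emitting the digits from most to least significant."""
    return "".join(str(d) for d in reversed(digits))

# 32767 + 1 does not fit in a signed 16-bit register, but works here.
print(to_text(add([7, 6, 7, 2, 3], [1])))
```

Every operation only ever handles values below 20, so nothing comes close to overflowing a 16-bit register, and printing needs no division at all: in LC-3 you would add the ASCII code of '0' to each digit and output it.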
A variation on (unpacked) Binary-coded Decimal is to pack two decimal digits per byte.  It's a bit more complicated but saves some storage.
Another approach is to store as many full decimal digits as will fit in a word, minus 1.  That is, since the largest value 16 bits can hold is 65535, a word holds only 4 full decimal digits, which means putting 3 digits at a time into a word.  You'd need 3 words for 9 digits.  Multiplication by 10 means multiplying each word by 10 numerically and then checking whether the result is larger than 999; if it is, carry the excess (the result divided by 1,000) to the next higher-order word and keep only the remainder in the overflowing word.
This approach will require actual multiplication and division by 10 on each of the individual words.
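A sketch of the three-digits-per-word variant in Python, again only to show the logic. The function name is made up, words are stored least significant first, and the carry is computed as the product divided by 1,000 (three digits per word), which can be any value from 0 to 9 for a multiplication by 10.

```python
BASE = 1000                              # three decimal digits per word

def times_ten_base1000(words):
    """Multiply a number stored as base-1000 words (low word first) by 10."""
    result = []
    carry = 0
    for word in words:
        value = word * 10 + carry        # at most 9990 + 9, fits in 16 bits
        result.append(value % BASE)      # keep the low three digits
        carry = value // BASE            # pass the rest to the next word
    if carry:
        result.append(carry)
    return result

# 45123 is stored as [123, 45]; ten times that is 451230.
print(times_ten_base1000([123, 45]))
```

The largest intermediate value is 9999, which is why leaving one decimal digit of headroom per word keeps everything inside a signed 16-bit register.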
There are other approaches, such as using all 16-bits in a word as magnitude, but the difficulty there is determining the amount of overflow/carry on *10 operations.  It is not a monumental task but will require work.  See https://stackoverflow.com/a/1815371/471129, for example.
(If you also want to store negative numbers, that is also an issue for representation.  We can either store the sign separately, known as sign-magnitude form (stored in its own word/byte or packed into the highest byte), or store the number in a complement form.  The former is better for variable-length implementations and the latter can be made to work for fixed-length implementations.)

Find numbers that differ by 1 digit from a set of 15,000 12-digit numbers

I have a list of ~15,000 12-digit barcoded tickets. Most of the time they are scanned off paper or phone screens, but sometimes they are typed in (cracked screens, etc.) How would I go about finding if we have any sets of codes that differ by 1 digit, so typing the first one with a mis-typed digit might end up with another valid code?
The code numbers are 12-digit integers that are fairly random in the range 100000000000 to 999999999999 (we don't want leading zeroes to give problems with other systems)
e.g. given the three code numbers
123456789012
123456789013
223456789012
The first and second differ by only one digit, and so do the first and third. The second and third differ by 2 digits, so that pair is ignored.
Use a hash set. Go through each of your 15,000 numbers in turn, and for each one, generate the 108 different numbers that differ from it in one place (12 digits times 9 possible alternate digits in each place). Check if each of those 108 numbers exists in the hash set (without inserting them). If any one of them does then you have a hit. If not then add the unmodified number to the hash set and move onto the next one.
You could also try transpositions of adjacent digits, which would give you another 11 candidates to try on top of the 108.
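A minimal Python sketch of the hash-set approach, treating each code as a 12-character string. The function names are made up for this example. Unlike the description above, it adds every code to the set whether or not it produced a hit, so that all clashing pairs are reported rather than just the first.

```python
def one_digit_variants(code):
    """Yield every string that differs from code in exactly one position."""
    for i, original in enumerate(code):
        for digit in "0123456789":
            if digit != original:
                yield code[:i] + digit + code[i + 1:]

def find_one_digit_clashes(codes):
    """Return pairs (earlier_code, later_code) that differ by a single digit."""
    seen = set()
    clashes = []
    for code in codes:
        code = str(code)
        for variant in one_digit_variants(code):
            if variant in seen:          # lookup only, variants are never inserted
                clashes.append((variant, code))
        seen.add(code)
    return clashes

tickets = [123456789012, 123456789013, 223456789012]
for pair in find_one_digit_clashes(tickets):
    print(pair)
```

Variants with a leading zero are generated too, but they can never be in the set of valid codes, so they are harmless. For 15,000 codes this is about 1.6 million set lookups.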

Convert string to perfect number

Given a string, we need to find the largest square which can be obtained by replacing its characters with digits (leading zeros are not allowed), where the same character always maps to the same digit and different characters always map to different digits. If there is no solution, return -1.
Consider the string "ab". If we replace character a with 8 and b with 1, we get 81, which is a square.
How can I find it for a given string? The string length is at most 11.
Please help me find a suitable and efficient way.
Sorry, I can't comment (not enough reputation), so I'll answer here.
#mat7, about what you said in the question comments: no, you don't have to do it for every letter from a to z. You only have to do it for the letters present in your string (so at most 12 letters, not 26).
The first thing I would check is how many different letters you have. If there are 11 or 12 different letters you can directly return -1, since two different letters can't map to the same digit.
Now, supposing the input string is "fdsadrtas", you build a new array containing each different letter once => "fdsart"
With this array you try all possibilities, excluding the obviously conflicting options: if you set 'f' to 4 and 'd' to 5, 's' can only be one of 1, 2, 3, 6, 7, 8, 9, 0 (and 'f' can never be 0). This way you exclude lots of possibilities, with a worst case of 10! instead of 12^10 (actually 9*9! once you account for the first letter never being 0, but it's close enough).
EDIT 2 : +1 samgak nice idea !
The last digit of a square can only be 0, 1, 4, 5, 6 or 9, so the worst-case number of tests drops even further, to 9*6*8!
10! is easily small enough to brute-force; keep the highest square value you find and you are done.
EDIT :
Actually, it would work (in a finite, reasonable amount of time), but it is the wrong approach now that I have thought about it.
You will spend less time looking at all the square numbers that could be a solution for your string. Using the example I gave above, which is a string of length 9, that means checking each square of length 9 to see whether it can be successfully mapped onto the string.
For a string of length 12 (the worst case) you will have to check the squares of 316'228 to 999'999, which is far fewer than the more than 2 million checks of the previous proposition. The other proposition might become faster if you start accepting longer strings, but with only 12 characters you are faster this way.
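A sketch of this second approach in Python: walk the squares of the right length from largest to smallest and return the first one whose digits map one-to-one onto the letters. The function names are invented for this example, and `math.isqrt` needs Python 3.8 or newer.

```python
from math import isqrt

def fits_pattern(pattern, digits):
    """True if letters and digits correspond one-to-one, position by position."""
    letter_to_digit = {}
    digit_to_letter = {}
    for letter, digit in zip(pattern, digits):
        if letter_to_digit.setdefault(letter, digit) != digit:
            return False                 # same letter, different digits
        if digit_to_letter.setdefault(digit, letter) != letter:
            return False                 # different letters, same digit
    return True

def largest_square(pattern):
    """Largest square with len(pattern) digits that fits the pattern, or -1."""
    n = len(pattern)
    if n == 0 or len(set(pattern)) > 10:
        return -1                        # more distinct letters than digits
    low = isqrt(10 ** (n - 1) - 1) + 1   # smallest root giving an n-digit square
    high = isqrt(10 ** n - 1)            # largest root giving an n-digit square
    for root in range(high, low - 1, -1):
        square = root * root
        if fits_pattern(pattern, str(square)):
            return square
    return -1

print(largest_square("ab"))              # 81, as in the question
```

Because only squares with exactly n digits are tried, the no-leading-zero rule is satisfied automatically, and searching downwards means the first match is the largest one.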
