LC-3: How to store a number larger than 16 bits and print it out to the console? - lc3

I'm having difficulty storing and displaying numbers greater than 32767 in LC-3, since a register can only hold values from -32768 to 32767. My apologies for not being able to come up with an idea for the algorithm. Please give me some suggestions. Thanks!

You'll need a representation that stores the larger number across two or more words.
There are several approaches to how big integers are stored: in a fixed number of words, and in a variable number of words or bytes.  The critical part is being able to detect the presence and amount of overflow/carry on mathematical operations like *10.
For that reason, one simple approach is to use a variable number of words/bytes for a single number, and to store only one decimal digit in each word/byte.  That way multiplication by 10 simply means adding a digit on the end (which has the effect of moving each existing digit to the next higher power-of-ten position).  Adding numbers in this form is fairly easy as well: we line up the digits, add them up, and whenever a digit sum is >= 10 there is a carry (of 1) into the next higher-order digit of the sum.  (If adding two such variable-length numbers is desired, I would store the decimal digits in reverse order, because then the low-order digits are already lined up for addition.)  See also https://en.wikipedia.org/wiki/Binary-coded_decimal .  (In some sense, this is like storing the number as a string, but using binary values instead of ASCII characters.)
To simplify this approach for your needs, you can fix the number of words to use, e.g. at 7, for 7 digits.
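Here's a minimal sketch of that digit-per-word idea. It's written in Python only so the logic is easy to follow; translating it into LC-3 loops over an array of words is the actual exercise, and the fixed length of 7 and the names are just illustrative:

    NUM_DIGITS = 7   # fixed-length variant: 7 words, one decimal digit (0-9) in each

    def multiply_by_10(digits):
        # Shift every digit one place up: insert a 0 at the low end and drop
        # the highest digit (digits are stored low-order first).
        return [0] + digits[:NUM_DIGITS - 1]

    def add(a, b):
        # Ripple-carry addition, one decimal digit at a time; the carry into
        # the next position is never more than 1.  A carry out of the top
        # digit would be overflow in this fixed-length version.
        result, carry = [], 0
        for da, db in zip(a, b):
            s = da + db + carry
            carry = 1 if s >= 10 else 0
            if carry:
                s -= 10
            result.append(s)
        return result

    def to_int(digits):
        return sum(d * 10 ** i for i, d in enumerate(digits))

    n = [5, 4, 3, 2, 1, 0, 0]           # 12,345 with the low-order digit first
    print(to_int(multiply_by_10(n)))    # 123450
    print(to_int(add(n, n)))            # 24690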
A variation on (unpacked) Binary-Coded Decimal is to pack two decimal digits per byte.  It's a bit more complicated but saves some storage.
Another approach is to store as many full decimal digits as fit in a word, minus one.  A 16-bit word can hold 65,536 distinct values (0..65535), which is only 4 full decimal digits, so we put 3 digits at a time into each word (the spare digit leaves room for the result of a multiply by 10, plus any carry, to still fit in the word).  You'd need 3 words for 9 digits.  Multiplication by 10 means multiplying each word by 10 numerically, then checking whether the result is larger than 999; if it is, the thousands part (the result divided by 1,000) is carried into the next higher-order word, and the overflowing word keeps the remainder (the result modulo 1,000).
This approach will require actual multiplication and division by 10 on each of the individual words.
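A quick sketch of this variant, again in Python purely to show the word-level arithmetic (the constants and names are illustrative):

    BASE = 1000       # three decimal digits per 16-bit word
    NUM_WORDS = 3     # 9 digits total, lowest-order group first

    def multiply_by_10(words):
        result, carry = [], 0
        for w in words:
            v = w * 10 + carry       # at most 9990 + 9 = 9999, still fits in a word
            carry = v // BASE        # digit carried into the next higher-order word
            result.append(v % BASE)  # keep the low three digits
        return result                # a non-zero carry left over here means overflow

    n = [345, 12, 0]                 # represents 12,345
    print(multiply_by_10(n))         # [450, 123, 0], i.e. 123,450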
There are other approaches, such as using all 16 bits of a word as magnitude, but the difficulty there is determining the amount of overflow/carry on *10 operations.  It is not a monumental task but will require work.  See https://stackoverflow.com/a/1815371/471129, for example.
(If you also want to store negative numbers, that is also an issue for representation.  We can either store the sign separately, known as sign-magnitude form (stored in its own word/byte, or packed into the highest byte), or store the number in a complement form.  The former is better for variable-length implementations and the latter can be made to work for fixed-length implementations.)

Related

What is the probability that a UUID, stripped of all of its letters and dashes, is unique?

Say I have a UUID a9318171-2276-498c-a0d6-9d6d0dec0e84.
I then remove all the letters and dashes to get 9318171227649806960084.
What is the probability that this is unique, given a set of IDs that are generated in the same way? How does this compare to a normal set of UUIDs?
UUIDs are represented as 32 hexadecimal (base-16) digits, displayed in 5 groups separated by hyphens. The issue with your question is that for any generated UUID, each position could be any valid hexadecimal digit from the set [0-9, A-F].
This leaves us with a dilemma, since we don't know beforehand how many of the hexadecimal digits generated for each UUID will be alpha characters: [A-F]. The only thing we can be certain of is that each generated character of the UUID has a 6/16 (3/8) chance of being an alpha character: [A-F]. Knowing this makes it impossible to answer the question exactly, since removing the hyphens and alpha characters leaves us with a variable-length string for each generated UUID...
With that being said, to give you something to think about: we know that each UUID is 36 characters in length, including the hyphens. If we simplify and say we have no hyphens, each UUID is only 32 characters in length. Building on this, if we further simplify and say that each of the 32 characters can only be a numeric character [0-9], we can give an accurate probability of uniqueness for each generated, simplified UUID (according to the above-mentioned simplifications):
Assume a UUID is represented by 32 characters, where each character is a numeric character from the set [0-9]. We need to generate 32 digits in order to create a valid simplified UUID. Now, the chance of selecting any given digit [0-9] is 1/10. Another way to think about this is the following: each digit has an equal opportunity of being generated, and since there are 10 digits, each digit has a 10% chance of being generated.
Furthermore, each digit is generated independently of the previously generated digits, i.e. each digit generated doesn't depend on the outcome of the previous one. Therefore, the 32 numeric characters are independent of one another, and since the outcome at any one position is a digit, and only a digit, from [0-9], the ten possible outcomes for that position are mutually exclusive.
Knowing these facts, we can take advantage of the product rule, which states that the probability of the occurrence of two independent events is the product of their individual probabilities. For example, the probability of getting two heads on two coin tosses is 0.5 x 0.5, or 0.25. Therefore, the probability of generating two identical UUIDs would be:
1/10 * 1/10 * 1/10 * .... * 1/10 where the number of 1/10s would be 32.
This simplifies to 1/(10^32), or in general to 1/(10^n), where n is the length of your UUID. So, with all that being said, the probability of generating two identical UUIDs, given our assumptions, is infinitesimally small.
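If it helps to see the arithmetic, here's the same product-rule calculation under the simplified model above (32 independent, uniformly random decimal characters); it's a sketch of the reasoning, not of real UUID generation:

    from fractions import Fraction

    p_match_one_position = Fraction(1, 10)           # two independent digits agree with probability 1/10
    p_identical_uuids = p_match_one_position ** 32   # product rule across all 32 positions
    print(p_identical_uuids)                         # -> 1/(10**32), written out in full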
Hopefully that helps!

Find numbers that differ by 1 digit from a set of 15,000 12-digit numbers

I have a list of ~15,000 12-digit barcoded tickets. Most of the time they are scanned off paper or phone screens, but sometimes they are typed in (cracked screens, etc.) How would I go about finding if we have any sets of codes that differ by 1 digit, so typing the first one with a mis-typed digit might end up with another valid code?
The code numbers are 12-digit integers that are fairly random in the range 100000000000 to 999999999999 (we don't want leading zeroes to give problems with other systems)
e.g. given the three code numbers
123456789012
123456789013
223456789012
The first and second differ by only one digit, and so do the second and third. The first and third differ by two digits, so that pair is ignored.
Use a hash set. Go through each of your 15,000 numbers in turn, and for each one, generate the 108 different numbers that differ from it in one place (12 digits times 9 possible alternate digits in each place). Check if each of those 108 numbers exists in the hash set (without inserting them). If any one of them does then you have a hit. If not then add the unmodified number to the hash set and move onto the next one.
You could also try transpositions of adjacent digits, which would give you another 11 candidates on top of the 108 to try.
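A short Python sketch of the hash-set approach (the names are illustrative; it assumes the codes arrive as 12-digit strings or integers, and it reports every conflicting pair rather than stopping at the first hit):

    def one_digit_collisions(codes):
        seen = set()
        hits = []
        for code in codes:
            s = str(code)
            for i, ch in enumerate(s):
                for d in "0123456789":
                    if d == ch:
                        continue
                    variant = s[:i] + d + s[i + 1:]   # one of the 108 one-digit variants
                    if variant in seen:
                        hits.append((variant, s))
            seen.add(s)
        return hits

    print(one_digit_collisions(["123456789012", "123456789013", "223456789012"]))
    # [('123456789012', '123456789013'), ('123456789012', '223456789012')]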

Best way to represent numbers of unbounded length?

What's the optimal (most space-efficient) way to represent integers of unbounded length?
(The numbers range from zero to positive-infinity)
Some sample number inputs can be found here (each number is shown on its own line).
Is there a compression algorithm that is specialized in compressing numbers?
You've basically got two alternatives for variable-length integers:
Use 1 bit of every k as an end terminator. That's the way Google protobuf does it, for example (in their case, one bit from every byte, so there are 7 useful bits in every byte).
Output the bit-length first, and then the bits. That's how ASN.1 works, except for OIDs which are represented in form 1.
If the numbers can be really big, Option 2 is better, although it's more complicated and you have to apply it recursively, since you may have to output the length of the length, and then the length, and then the number. A common technique is to use Option 1 (bit markers) for the length field.
For smallish numbers, option 1 is better. Consider the case where most numbers would fit in 64 bits. The overhead of storing them 7 bits per byte is 1/7; with eight bytes, you'd represent 56 bits. Using even the 7/8 representation for length would also represent 56 bits in eight bytes: one length byte and seven data bytes. Any number shorter than 48 bits would benefit from the self-terminating code.
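For concreteness, here's a small sketch of option 1 in the style protobuf uses for unsigned integers (base-128 "varints"): 7 payload bits per byte, with the high bit of each byte as the continuation marker.

    def encode_varint(n):
        # Unsigned only; low 7 bits per byte, high bit set on all but the last byte.
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)   # more bytes follow
            else:
                out.append(byte)          # final byte terminates the number
                return bytes(out)

    def decode_varint(data):
        n = 0
        for shift, byte in enumerate(data):
            n |= (byte & 0x7F) << (7 * shift)
            if not byte & 0x80:
                break
        return n

    assert encode_varint(300) == b"\xac\x02"
    assert decode_varint(encode_varint(10**30)) == 10**30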
"Truly random numbers" of unbounded length are, on average, infinitely long, so that's probably not what you've got. More likely, you have some idea of the probability distribution of number sizes, and could choose between the above options.
Note that none of these "compress" (except relative to the bloated ASCII-decimal format). The asymptote of log n/n is 0, so as the numbers get bigger the size of the size of the numbers tends to occupy no (relative) space. But it still needs to be represented somehow, so the total representation will always be a bit bigger than log2 of the number.
You cannot compress per se, but you can encode, which may be what you're looking for. You have files with sequences of ASCII decimal digits separated by line feeds. You should simply Huffman encode the characters. You won't do much better than about 3.5 bits per character.
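As a rough check on that figure, here's a sketch that measures the per-character entropy of such a file, which is a lower bound on what a per-character Huffman code can achieve. The sample data below is made up; for a long file of roughly uniform digits plus newlines, the value comes out a little under log2(11) ≈ 3.46:

    import math
    from collections import Counter

    def bits_per_char(text):
        # Shannon entropy of the character distribution, in bits per character.
        counts = Counter(text)
        total = len(text)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    sample = "\n".join(str(n) for n in [982451653, 57885161, 43112609, 30402457])
    print(round(bits_per_char(sample), 2))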

A good starting number for the middle square method

I want to generate, using the middle-square method, 10,000 (ten thousand) numbers with 6 decimal digits, for both higher-than-1 (for example 785633) and lower-than-1 (for example 0.434367) starting numbers. Is there any starting number for the two situations that can generate 10,000 distinct numbers?
You generally want a pretty big number for middle-square, say fifty digits or so. When you pick six digits (they can be any portion of the middle part of the number), you can use them as a six-digit number or divide by a million and use them as a decimal number.
You should be aware that middle-square is no longer considered a good method for generating random numbers. A simple linear congruential generator is faster and better, and there are many other types of random number generators also.
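Here is a small sketch of the middle-square generator described above (the 50-digit state size, the seed, and the names are all just illustrative choices, not something prescribed by the method):

    def middle_square(seed, digits=50):
        # Keep a `digits`-digit state; each step squares it, keeps the middle
        # `digits` digits as the next state, and hands back the six digits
        # from the middle of that state.
        state = seed
        while True:
            sq = str(state * state).zfill(2 * digits)
            start = (len(sq) - digits) // 2
            state = int(sq[start:start + digits])
            mid = str(state).zfill(digits)
            six = mid[(digits - 6) // 2:(digits - 6) // 2 + 6]
            yield six                 # six digits as a string; use int(six) or int(six) / 1e6 as needed

    gen = middle_square(int("1234567890" * 5))   # an arbitrary 50-digit seed
    print([next(gen) for _ in range(5)])

Whether a given seed yields 10,000 distinct outputs before the sequence cycles or collapses is something you would have to test empirically, which is part of why middle-square is rarely used anymore.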

Algorithm Question on File Search Indexing

There is one question, and I have the solution to it as well, but I couldn't understand the solution. Kindly help with some examples and share some of your experience.
Question
Given a file containing roughly 300 million social security numbers (9-digit numbers), find a 9-digit number that is not in the file. You have unlimited drive space but only 2MB of RAM at your disposal.
Answer
In the first step, we build an array of 2^16 integers initialized to 0, and for every number in the file, we take its 16 most significant bits to index into this array and increment the count stored there.
Since there are fewer than 2^32 numbers in the file, there is bound to be (at least) one count in the array that is less than 2^16. This tells us that at least one number is missing among the possible numbers with those upper bits.
In the second pass, we can focus only on the numbers that match this criterion and use a bit vector of size 2^16 to identify one of the missing numbers.
To make the explanation simpler, let's say you have a list of two-digit numbers, where each digit is between 0 and 3, but you can't spare the 16 bits to remember, for each of the 16 possible numbers, whether you have already encountered it. What you do is create an array a of four 3-bit integers, and in a[i] you store how many numbers with first digit i you have encountered. (Two-bit integers wouldn't be enough, because you need the values 0, 4, and all numbers between them.)
If you had the file
00, 12, 03, 31, 01, 32, 02
your array would look like this:
4, 1, 0, 2
Now you know that all numbers starting with 0 are in the file, but for each of the remaining first digits, there is at least one number missing. Let's pick 1. We know there is at least one number starting with 1 that is not in the file. So, create an array of 4 bits; for each number in the file starting with 1, set the appropriate bit, and at the end pick one of the bits that wasn't set. In our example it could be 0. Now we have the solution: 10.
In this case, using this method is the difference between 12 bits and 16 bits. With your numbers, it's the difference between 32 kB and 119 MB.
In round terms, you have about 1/3 of the numbers that could exist in the file, assuming no duplicates.
The idea is to make two passes through the data. Treat each number as a 32-bit (unsigned) number. In the first pass, keep track of how many numbers share each value of the most significant 16 bits. In practice, there will be a number of prefixes whose count is zero (all those corresponding to 10-digit values, for example; quite likely, all those with a zero for the first digit are missing too). But of the ranges with a non-zero count, most will not have 65536 entries, which is how many would appear if there were no gaps in the range. So, with a bit of care, you can choose one of the ranges to concentrate on in the second pass.
If you're lucky, you can find a range in the 100,000,000..999,999,999 with zero entries - you can choose any number from that range as missing.
Assuming you aren't quite that lucky, choose the range with the lowest count (or any of them with fewer than 65536 entries); call it the target range. Reset the array to all zeroes. Reread the data. If the number you read is not in your target range, ignore it. If it is in the range, record the number by setting the array value to 1 for the low-order 16 bits of the number. When you've read the whole file, any of the numbers with a zero in the array represents a missing SSN.
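Here's a compact sketch of the two-pass idea described above, in Python for brevity; the function takes the data twice because the method reads the file twice, and the names and toy data are illustrative. (In real Python these structures cost more memory than the raw byte counts suggest; the point is the algorithm, not the exact footprint.)

    def find_missing_ssn(first_pass, second_pass):
        # Pass 1: count how many numbers fall under each 16-bit prefix
        # (the 16 most significant bits of the number viewed as a 32-bit value).
        counts = [0] * (1 << 16)
        for n in first_pass:
            counts[n >> 16] += 1

        # Any prefix with a count below 2**16 is missing at least one of its
        # 2**16 possible low halves; pick the emptiest one as the target range.
        target = min(range(1 << 16), key=lambda p: counts[p])

        # Pass 2: mark which low 16-bit values occur under the target prefix.
        seen = bytearray(1 << 16)       # one byte per value for simplicity; a true bit vector would be 8 KB
        for n in second_pass:
            if n >> 16 == target:
                seen[n & 0xFFFF] = 1

        return (target << 16) | seen.index(0)   # a value that never appeared

    data = [123456789, 987654321, 500000000]    # toy stand-in for the 300M-number file
    print(find_missing_ssn(iter(data), iter(data)))
    # prints 0 here: with so little data, the very first prefix is already empty, so 000000000 is missing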
