Likeliest way to compute validation using MOD 7 check on the last number - algorithm

I am attempting to perform number validation for a proprietary ID implementation. I was anticipating a spec document for the algorithm in detail, but this is all that they sent:
The IDs are 9 digits.
The check digit is a MOD 7 check on the last number.
I think I am to assume that "MOD 7" means to apply modulus 7 to something--I suspect that "the last number" means more than just applying mod 7 to the last digit, otherwise every valid ID would end in 0 or 7.
Meanwhile someone in a separate conversation said that it was actually "a CRC MOD 7 check on the last number" (throwing in "CRC"), but I think that this was a misspoken detail and that CRC is not used at all, but what do I know?
I am having difficulty finding information on some standard way of reading this specification and interpreting this for some "standard algorithm". Most of the samples I've found consist of many different varieties of algorithms, such as weighted or unweighted, etc.
Does anyone know how I am most likely to interpret this, or if I am supposed to ask for more information? If I am supposed to ask for more information, what questions should I ask?

As I said in my comment, this is horrible documentation, but here's the only way to read this that makes sense to me:
You do a MOD 7 check on the whole card number (all nine digits), and then check if that result equals the last digit in the card number.

Slight variation on Briguy37's answer, this variation so far appears to actually be the correct answer in my case since initial tests seem to pass.
You do a MOD 7 check on the first eight digits of the 9-digit number as a single 8-digit integer, and then check if that result equals the last digit in the card number.

You should ask for more information. Think of the consequences if you don't and it turns out the interpretation you received from the internets is wrong.

Related

Is there any alternative explanations why 'Luhn algorithm' is good?

Good Evening guys,
I built a simple Java program that validates credit card numbers using Luhn's algorithm / (mod 10).
I found a lot of explanations for why it is right to use this formula but still did not understand it fully.
Can anyone explain to me why it is a good formula to use for such validation?
It is just because all the companies agreed to follow this formula and they do not allow any credit card/bank account to exist if it does not follow the Luhn format.
Reason for that is to limit chance of human error.
If you make single typo, it will not pass Luhn alghoritm. Why single typo? Because it is mod 10, if you do typo to one number the difference is from 1 to 9 so the mod 10 covers it.
If you do typo to two numbers and you are unlucky (only 10% of combinations is valid, so there is 10% chance) you can send money to wrong account anyway.

Converting PI digits into text strings

It's kind of interesting that pi's decimal representation never ends and never settles into a permanent repeating pattern. Meaning it's highly possible that pi contains every possible combination of numbers.
This guy calculated 5 trillions 5x(10^12) numbers of pi :D
http://www.numberworld.org/misc_runs/pi-5t/details.html
From the internet: "Converted into ASCII text, somewhere in that infinite string of digits is the name of every person you will ever love, the date, time and manner of your death, and the answers to all the great questions of the universe."
Wondering if somebody has already converted and analyzed the resulting string for known sequences of letters (words/sentences)?
Check out this page: http://pi.nersc.gov/.
It allows you to search for both character strings and hexadecimal sequences. Note that this search engine only has indexed the first 4 billion decimals of pi, and uses a formula for arbitrarily positioned binary or hexadecimal digits after those indexed.
The idea that Pi contains everything ever is a nice idea, but if it's correct, that means there is also an infinite amount of false things about everything ever. For example, if Pi contains a list of all the people you will ever love, then it will also have a list of people that seems that it is a list of people you will love, but in reality it's just a mix of names in a pattern that makes it look legit.
Following the same idea, the date, time, and manner of your death could also be "falsified". For example, let's say you are a man named Jason Delara, and you die at the age of 83 at 11:35 PM in your sleep. In Pi somewhere it can say in ASCII text "Jason Delara will die at age 83, 11:35 PM, passed in his sleep." It would also say somewhere else that "Jason Delara will die at age 35, at 6:00 AM, passed in a car accident." There could be an "infinite" amount of these false predictions.
There's also the fact that, if following the idea from above, all but one the answers to one of the great questions of the universe in that digit are wrong, even if many of the answers make sense. I've thought about this a lot, and I thought "What if there's part of the digit that states which facts are correct and which are not?" The answer is "Then there is an infinite amount of false lists in the digit claiming to do the same as the real list." In short, it would be pointless to convert Pi to ASCII text to try and figure everything out.
I know I'm a little late the party, but I wrote this for anybody who comes here looking for the answers to the universe in an endless, non-repeating decimal.
It is massively convenient that pi is an irrational number we're still finding digits for as if you can't find what you want in the sequence then by definition it just happens to be later on.
As for it containing hidden information - if you create any random sequence long enough, you'll be able to create simple words from the resulting output.
Conspiracy theorists just love to see patterns where there are none. They forget the other noise and are endlessly fascinated by mere coincidences.
Would just like to provide further context this question. Yes, the point is that PI goes on infinitely. That means there are endless possibilities for sentence structure and letter combination. This means every single combination of letters will happen and is happening in PI. So technically, everything in PI could apply to everything in the observable world around us.

Reverse engineering a check digit algorithm

I am trying to reverse engineer an algorithm used to generate a check digit.
Numbers are 8 digits long and the last digit is the check digit. I have thousands of valid numbers to test it on.
I have tried a standard Luhn, Verhoeff and modulo-10 algorithms (brute force checking of all possible weights), but could not find an answer!
Is it possible to calculate this? Any ideas?
Here is some examples of valid numbers:
1002784-5
1000514-7
1001602-8
1001255-2
1001707-1
1003355-5
1005579-1
1004535-0
1004273-1
1001695-9
1004565-9
1000541-9
1001291-1
1005866-1
1004352-7
EDIT:
Thanks guys - I don't have access to the code unfortunately. The number is a tax number, I need to be able to verify that the number was typed in correctly. From my research is looks like most countries use a pretty standard modulo-10 type system. I've got access to about 60 000 numbers.
I understand that the problem could be impossible to solve, it was more of academic concern.
First check your context:
If context is credit cards, driver's licenses, government licensing numbers (not SSN) think Luhn or Mod 10. If some other industry, does that industry have a defacto standard? If not, is the developer of the system using the numbers also a player in an industry that has a de facto standard?
Nobody likes to reinvent the wheel if they don't have to.
If that doesn't help keep in mind:
Don't assume that all the numbers in the keys you are testing against are used to arrive at the check digit. It's possible only 4 or the 8 digits are being used to calculate the check digit (or any other combination). It's also possible there is some external PREFIX number that is used with the other digits to arrive at the check digit. So... line up all your numbers with the same check digit, and see what the similarities are. Can you add a number to them and then always reach check digit? Can you test only the first few digits? Last few digits? every other digit?
Good luck.
Count how many times in your data (60 thousand) there are digits 0,1,2,3,4,5,6,7,8,9 as a check digit. If the digit 0 occurs twice as often as other digits, it means that the algorithm uses the modulo 11 operation. In this algorithm, if the sum mod 11 = 10, then the check digit is 0.

What methods can I use to analyse and guess 4-bit checksum algorithm?

[Background Story]
I am working with a 5 year old user identification system, and I am trying to add IDs to the database. The problem I have is that the system that reads the ID numbers requires some sort of checksum, and no-one working here now has ever worked with it, so no-one knows how it works.
I have access to the list of existing IDs, which already have correct checksums. Also, as the checksum only has 16 possible values, I can create any ID I want and run it through the authentication system up to 16 times until I get the correct checksum (but this is quite time consuming)
[Question]
What methods can I use to help guess the checksum algorithm of used for some data?
I have tried a few simple methods such as XORing and summing, but these have not worked.
So my question is: if I have data (in hexadecimal) like this:
data checksum
00029921 1
00013481 B
00026001 3
00004541 8
What methods can I use work out what sort of checksum is used?
i.e. should I try sequential numbers such as 00029921,00029922,00029923,... or 00029911,00029921,00029931,... If I do this what patterns should I look for in the changing checksum?
Similarly, would comparing swapped digits tell me anything useful about the checksum?
i.e. 00013481 and 00031481
Is there anything else that could tell me something useful? What about inverting one bit, or maybe one hex digit?
I am assuming that this will be a common checksum algorithm, but I don't know where to start in testing it.
I have read the following links, but I am not sure if I can apply any of this to my case, as I don't think mine is a CRC.
stackoverflow.com/questions/149617/how-could-i-guess-a-checksum-algorithm
stackoverflow.com/questions/2896753/find-the-algorithm-that-generates-the-checksum
cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html
[ANSWER]
I have now downloaded a much larger list of data, and it turned out to be simpler than I was expecting, but for completeness, here is what I did.
data:
00024901 A
00024911 B
00024921 C
00024931 D
00042811 A
00042871 0
00042881 1
00042891 2
00042901 A
00042921 C
00042961 0
00042971 1
00042981 2
00043021 4
00043031 5
00043041 6
00043051 7
00043061 8
00043071 9
00043081 A
00043101 3
00043111 4
00043121 5
00043141 7
00043151 8
00043161 9
00043171 A
00044291 E
From these, I could see that when just one value was increased by a value, the checksum was also increased by the same value as in:
00024901 A
00024911 B
Also, two digits swapped did not change the checksum:
00024901 A
00042901 A
This means that the polynomial value (for these two positions at least) must be the same
Finally, the checksum for 00000000 was A, so I calculated the sum of digits plus A mod 16:
( (Σxi) +0xA )mod16
And this matched for all the values I had. Just to check that there was nothing sneaky going on with the first 3 digits that never changed in my data, I made up and tested some numbers as Eric suggested, and those all worked with this too!
Many checksums I've seen use simple weighted values based on the position of the digits. For example, if the weights are 3,5,7 the checksum might be 3*c[0] + 5*c[1] + 7*c[2], then mod 10 for the result. (In your case, mod 16, since you have 4 bit checksum)
To check if this might be the case, I suggest that you feed some simple values into your system to get an answer:
1000000 = ?
0100000 = ?
0010000 = ?
... etc. If there are simple weights based on position, this may reveal it. Even if the algorithm is something different, feeding in nice, simple values and looking for patterns may be enlightening. As Matti suggested, you/we will likely need to see more samples before decoding the pattern.

How to generate a verification code/number?

I'm working on an application where users have to make a call and type a verification number with the keypad of their phone.
I would like to be able to detect if the number they type is correct or not. The phone system does not have access to a list of valid numbers, but instead, it will validate the number against an algorithm (like a credit card number).
Here are some of the requirements :
It must be difficult to type a valid random code
It must be difficult to have a valid code if I make a typo (transposition of digits, wrong digit)
I must have a reasonable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Given these requirements, how would you generate such a number?
EDIT :
#Haaked: The code has to be numerical because the user types it with its phone.
#matt b: On the first step, the code is displayed on a Web page, the second step is to call and type in the code. I don't know the user's phone number.
Followup : I've found several algorithms to check the validity of numbers (See this interesting Google Code project : checkDigits).
After some research, I think I'll go with the ISO 7064 Mod 97,10 formula. It seems pretty solid as it is used to validate IBAN (International Bank Account Number).
The formula is very simple:
Take a number : 123456
Apply the following formula to obtain the 2 digits checksum : mod(98 - mod(number * 100, 97), 97) => 76
Concat number and checksum to obtain the code => 12345676
To validate a code, verify that mod(code, 97) == 1
Test :
mod(12345676, 97) = 1 => GOOD
mod(21345676, 97) = 50 => BAD !
mod(12345678, 97) = 10 => BAD !
Apparently, this algorithm catches most of the errors.
Another interesting option was the Verhoeff algorithm. It has only one verification digit and is more difficult to implement (compared to the simple formula above).
For 1M combinations you'll need 6 digits. To make sure that there aren't any accidentally valid codes, I suggest 9 digits with a 1/1000 chance that a random code works. I'd also suggest using another digit (10 total) to perform an integrity check. As far as distribution patterns, random will suffice and the check digit will ensure that a single error will not result in a correct code.
Edit: Apparently I didn't fully read your request. Using a credit card number, you could perform a hash on it (MD5 or SHA1 or something similar). You then truncate at an appropriate spot (for example 9 characters) and convert to base 10. Then you add the check digit(s) and this should more or less work for your purposes.
You want to segment your code. Part of it should be a 16-bit CRC of the rest of the code.
If all you want is a verification number then just use a sequence number (assuming you have a single point of generation). That way you know you are not getting duplicates.
Then you prefix the sequence with a CRC-16 of that sequence number AND some private key. You can use anything for the private key, as long as you keep it private. Make it something big, at least a GUID, but it could be the text to War and Peace from project Gutenberg. Just needs to be secret and constant. Having a private key prevents people from being able to forge a key, but using a 16 bit CR makes it easier to break.
To validate you just split the number into its two parts, and then take a CRC-16 of the sequence number and the private key.
If you want to obscure the sequential portion more, then split the CRC in two parts. Put 3 digits at the front and 2 at the back of the sequence (zero pad so the length of the CRC is consistent).
This method allows you to start with smaller keys too. The first 10 keys will be 6 digits.
Does it have to be only numbers? You could create a random number between 1 and 1M (I'd suggest even higher though) and then Base32 encode it. The next thing you need to do is Hash that value (using a secret salt value) and base32 encode the hash. Then append the two strings together, perhaps separated by the dash.
That way, you can verify the incoming code algorithmically. You just take the left side of the code, hash it using your secret salt, and compare that value to the right side of the code.
I must have a reasonnable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Well, if you want it to have at least one million combinations, then you need at least six digits. Is that short enough?
When you are creating the verification code, do you have access to the caller's phone number?
If so I would use the caller's phone number and run it through some sort of hashing function so that you can guarantee that the verification code you gave to the caller in step 1 is the same one that they are entering in step 2 (to make sure they aren't using a friend's validation code or they simply made a very lucky guess).
About the hashing, I'm not sure if it's possible to take a 10 digit number and come out with a hash result that would be < 10 digits (I guess you'd have to live with a certain amount of collision) but I think this would help ensure the user is who they say they are.
Of course this won't work if the phone number used in step 1 is different than the one they are calling from in step 2.
Assuming you already know how to detect which key the user hit, this should be doable reasonably easily. In the security world, there is the notion of a "one time" password. This is sometimes referred to as a "disposable password." Normally these are restricted to the (easily typable) ASCII values. So, [a-zA-z0-9] and a bunch of easily typable symbols. like comma, period, semi colon, and parenthesis. In your case, though, you'd probably want to limit the range to [0-9] and possibly include * and #.
I am unable to explain all the technical details of how these one-time codes are generated (or work) adequately. There is some intermediate math behind it, which I'd butcher without first reviewing it myself. Suffice it to say that you use an algorithm to generate a stream of one time passwords. No matter how mnay previous codes you know, the subsequent one should be impossibel to guess! In your case, you'll simply use each password on the list as the user's random code.
Rather than fail at explaining the details of the implementation myself, I'll direct you to a 9 page article where you can read up on it youself: https://www.grc.com/ppp.htm
It sounds like you have the unspoken requirement that it must be quickly determined, via algorithm, that the code is valid. This would rule out you simply handing out a list of one time pad numbers.
There are several ways people have done this in the past.
Make a public key and private key. Encode the numbers 0-999,999 using the private key, and hand out the results. You'll need to throw in some random numbers to make the result come out to the longer version, and you'll have to convert the result from base 64 to base 10. When you get a number entered, convert it back to base64, apply the private key, and see if the intereting numbers are under 1,000,000 (discard the random numbers).
Use a reversible hash function
Use the first million numbers from a PRN seeded at a specific value. The "checking" function can get the seed, and know that the next million values are good. It can either generate them each time and check one by one when a code is received, or on program startup store them all in a table, sorted, and then use binary search (maximum of compares) since one million integers is not a whole lot of space.
There are a bunch of other options, but these are common and easy to implement.
-Adam
You linked to the check digits project, and using the "encode" function seems like a good solution. It says:
encode may throw an exception if 'bad' data (e.g. non-numeric) is passed to it, while verify only returns true or false. The idea here is that encode normally gets it's data from 'trusted' internal sources (a database key for instance), so it should be pretty usual, in fact, exceptional that bad data is being passed in.
So it sounds like you could pass the encode function a database key (5 digits, for instance) and you could get a number out that would meet your requirements.

Resources