3-byte output hashing algorithm

So for a project that I am working on, I am trying to find a hashing algorithm, but I don't know anything about hashing algorithms. The outcome I would like to achieve is to input a 6-byte value and get 3 unique bytes as my output.
My alternative is an algorithm that takes a 2-byte value as input and outputs 1 unique byte.
Is this possible?
Edit: I would need this in C if possible, or in pseudocode.

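Most hash functions can take an arbitrary number of bytes, since they are by nature compression functions. As for the output, you can just take the first 3 bytes of the digest. Any cryptographically secure hash function will output bytes that are suitable for this. The 3 bytes cannot be unique for every input, though: there are 2^48 possible 6-byte inputs but only 2^24 possible 3-byte outputs (and 2^16 inputs versus 2^8 outputs for the 2-byte variant), so collisions are unavoidable; a good hash only spreads them evenly and unpredictably.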
For example, in Python it would be:
from hashlib import sha256
data = bytes([1, 2, 3, 4, 5, 6])  # your 6-byte input
output = sha256(data).digest()[:3]  # first 3 bytes of the digest
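The 2-byte to 1-byte variant works the same way. A minimal sketch, assuming SHA-256 as the underlying hash; `truncated_hash` is a hypothetical helper name. Hashing all 65,536 possible 2-byte inputs also shows why the outputs cannot be unique: at most 256 distinct 1-byte outputs exist, so on average each one is shared by 256 inputs.

```python
from collections import Counter
from hashlib import sha256


def truncated_hash(data: bytes, out_len: int) -> bytes:
    """Hash `data` with SHA-256 and keep only the first `out_len` bytes."""
    return sha256(data).digest()[:out_len]


# 6-byte input -> 3-byte output (the original question).
three_bytes = truncated_hash(bytes([1, 2, 3, 4, 5, 6]), 3)

# 2-byte input -> 1-byte output: hash every possible 2-byte input
# and count how many inputs land on each 1-byte output.
counts = Counter(truncated_hash(i.to_bytes(2, "big"), 1) for i in range(1 << 16))

print(three_bytes.hex())
print(len(counts), "distinct outputs for", sum(counts.values()), "inputs")
```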

Related

Need a function to create a smaller hash from a larger hash

Basically, I have 24-digit hexadecimal id values. I need a function that can take one of these ids and turn it into a 10-digit decimal value. It does not need to be cryptographically secure. I just need to ensure that the same input will always produce the same output, and that the resulting value has as low a chance as possible of being the same for different inputs.
Something like this should do what you are asking:
import hashlib
def shorter_hash(hex_id):
    digest = hashlib.md5(hex_id.encode("ascii")).digest()  # 16-byte digest
    return int.from_bytes(digest, "big") % 10**10  # fits in 10 decimal digits
There should be standard MD5 implementations available for your language.
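How likely is a clash once the ids are squeezed into 10 decimal digits? A rough sketch using the standard birthday-bound approximation, assuming the reduced hash behaves like a uniformly random value in [0, 10^10); `collision_probability` is a hypothetical helper. Note that 24 hex digits carry 96 bits while 10 decimal digits hold only about 33 bits, so uniqueness can only be made likely, never guaranteed.

```python
import math


def collision_probability(n_ids: int, space: int = 10**10) -> float:
    """Approximate P(at least one collision) among n_ids uniformly random values.

    Birthday bound: P ~= 1 - exp(-n(n-1) / (2 * space)).
    """
    # -expm1(-t) computes 1 - exp(-t) accurately even for tiny t.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


for n in (1_000, 100_000, 1_000_000):
    print(n, round(collision_probability(n), 6))
```

With 100,000 ids the approximation already gives roughly a 39% chance of at least one duplicate, so for large id sets some collisions should be expected and handled.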

Looking for an one-way function with small input and long output

I'm looking for an algorithm that is a one-way function, like a hash function. The algorithm should accept a small input (several bits, less than 512 bits) and map it to a long output (1 KB or more). Do you know of an algorithm or function like this?
From Shannon's theory, you don't gain any security by having a ciphertext larger than your plaintext, unless the key (or the procedure used to create the ciphertext) is different for each input. Even in that case, you must assign exactly one key (or mechanism) to each input x; otherwise you violate the definition of a function. So if you apply an encryption mechanism f: X (set of inputs) -> Y (set of outputs actually produced), then |Y| <= |X|.
In short, if your input is less than 512 bits, you gain nothing by producing a 1 KB output. I recommend using one of the functions listed on the Wikipedia page on one-way functions.
Keccak has variable-length output (although this was not evaluated for SHA-3); its "security claim is disentangled from the output length. There is a minimum output length..." The Skein hash function also has a variable output length of up to 16 exabytes.
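Keccak's variable-length mode was later standardized in FIPS 202 as the SHAKE128 and SHAKE256 extendable-output functions, which Python's `hashlib` exposes. A minimal sketch producing a 1 KB output from a short input (the input value is only an example):

```python
import hashlib

small_input = b"several bits of input"

# SHAKE256 is an extendable-output function (XOF) built on Keccak:
# digest(n) returns exactly n bytes of output.
long_output = hashlib.shake_256(small_input).digest(1024)

# Shorter requests are prefixes of longer ones for the same input.
assert hashlib.shake_256(small_input).digest(32) == long_output[:32]
print(len(long_output))  # 1024
```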
Whatever your reasons are, you can calculate hashes of the same small data using different algorithms, then concatenate those hashes. If the output is not large enough, calculate hashes of hashes and append them.
As pointed out in other answers, this doesn't make much sense from a security perspective.
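The concatenate-then-rehash idea can be sketched as follows. The choice of MD5, SHA-1, SHA-256 and SHA-512 for the first step and SHA-512 for re-hashing are assumptions for illustration, and `expand_by_concatenation` is a hypothetical name.

```python
import hashlib


def expand_by_concatenation(data: bytes, out_len: int) -> bytes:
    """Concatenate several digests of `data`, then append hashes of hashes
    until at least `out_len` bytes are available."""
    # Step 1: hash the same data with different algorithms and concatenate.
    output = b"".join(
        hashlib.new(name, data).digest() for name in ("md5", "sha1", "sha256", "sha512")
    )
    # Step 2: if that is not long enough, keep hashing the previous block.
    block = output
    while len(output) < out_len:
        block = hashlib.sha512(block).digest()
        output += block
    return output[:out_len]


print(len(expand_by_concatenation(b"small input", 1024)))  # 1024
```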

How does the md5 hashing algorithm compress data to a fixed length?

I know that MD5 produces a 128-bit digest. My question is: how does it produce this fixed-length output from a message longer than 128 bits?
EDIT:
I now have a greater understanding of hash functions. After reading this article I have realized that hash functions are one-way, meaning that you can't convert the hash back to plaintext. I was under the mistaken impression that you could, due to all the online services converting them back to strings, but I have realized that that's just rainbow tables (collections of strings mapped to precomputed hashes).
When you generate an MD5 hash, you're not compressing the input data in the usual sense. Compression implies that you'll be able to decompress it back to its original state. MD5, on the other hand, is a one-way process. This is why it has been used for password storage; ideally, you have to know the original input string to be able to generate the same MD5 result again.
This page provides a nice graphic-equipped explanation of MD5 and similar hash functions, and how they're used: An Illustrated Guide to Cryptographic Hashes
Consider something like starting with a 128-bit value, and taking input 128 bits at a time, and XORing each of those input blocks with the existing value.
MD5 is considerably more complex than that, but the general idea is the same: the input is processed in fixed-size 512-bit blocks, each of which updates a 128-bit internal state. Each input block can change the value of the result, but has no effect on its length.
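The XOR idea can be written out as a toy. This is a sketch of the simplified scheme only, not MD5 and not secure; `xor_fold` is a hypothetical name, and it uses 16-byte (128-bit) blocks as in the toy description rather than MD5's 512-bit blocks. It shows that input of any length is reduced to a fixed 16-byte state, just as `hashlib.md5` always returns 16 bytes.

```python
import hashlib

BLOCK_SIZE = 16  # 128 bits, as in the toy scheme


def xor_fold(data: bytes) -> bytes:
    """Toy fixed-length 'hash': XOR every 16-byte block into a 16-byte state."""
    state = bytearray(BLOCK_SIZE)
    # Zero-pad the input to a whole number of blocks.
    padded = data + b"\x00" * (-len(data) % BLOCK_SIZE)
    for offset in range(0, len(padded), BLOCK_SIZE):
        for i, byte in enumerate(padded[offset:offset + BLOCK_SIZE]):
            state[i] ^= byte
    return bytes(state)


for message in (b"", b"short", b"x" * 1000):
    # Both outputs are always 16 bytes, whatever the input length.
    print(len(xor_fold(message)), len(hashlib.md5(message).digest()))
```

Plain zero padding makes b"a" and b"a\x00" collide here; MD5 avoids this particular weakness by appending a 1 bit and the 64-bit message length before processing the final block.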
It has nothing (or rather, little) to do with compression. There is an algorithm that produces a new state from each combination of current state and input chunk. This state is more or less unique to that combination of inputs.
In short, it splits the input into many parts and performs operations on each one in turn.
If you are wondering about collisions, consider that your messages are usually only readable text.
The space of all bit strings is much bigger than the space of readable character strings.

A function where small changes in input always result in large changes in output

I would like an algorithm for a function that takes n integers and returns one integer. For small changes in the input, the resulting integer should vary greatly. Even though I've taken a number of courses in math, I have not used that knowledge very much and now I need some help...
An important property of this function should be that if it is used with coordinate pairs as input and the result is plotted (as a grayscale value for example) on an image, any repeating patterns should only be visible if the image is very big.
I have experimented with various algorithms for pseudo-random numbers with little success and finally it struck me that md5 almost meets my criteria, except that it is not for numbers (at least not from what I know). That resulted in something like this Python prototype (for n = 2, it could easily be changed to take a list of integers of course):
import hashlib
def uniqnum(x, y):
    return int(hashlib.md5((str(x) + ',' + str(y)).encode()).hexdigest()[-6:], 16)
But obviously it feels wrong to go over strings when both input and output are integers. What would be a good replacement for this implementation (in pseudo-code, python, or whatever language)?
A "hash" is exactly the tool created to solve the problem you are describing. See Wikipedia's article on hash functions.
Any standard hash function you use should work well; hash functions tend to be judged on these criteria:
The degree to which they prevent collisions (two separate inputs producing the same output) -- a by-product of this is the degree to which the function minimizes outputs that may never be reached from any input.
The uniformity of the distribution of its outputs, given a uniformly distributed set of inputs.
The degree to which small changes in the input create large changes in the output.
(see perfect hash function)
Given how hard it is to create a hash function that maximizes all of these criteria, why not just use one of the most commonly used and relied-on existing hash functions there already are?
Turning the integers into strings first almost seems like another layer of encryption! (which is good for your purposes, I'd assume)
However, your question asks for hash functions that deal specifically with numbers, so here we go.
Hash functions that work over the integers
If you want to borrow existing algorithms, you may want to dabble in pseudo-random number generators.
One simple one is the middle-square method:
Take an n-digit number.
Square it, padding the result with leading zeros to 2n digits.
Chop off the outer digits, keeping the middle n digits (the same length as your original).
For example:
1111 => 01234321 => 2343
so 1111 would be "hashed" to 2343 by the middle-square method.
This approach isn't very effective, but for a small number of hashes it has very low collision rates, a uniform distribution, and great chaos potential (small changes => big changes). If you have many values, though, it's time to look for something else...
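The middle-square steps can be sketched like this; `middle_square` is a hypothetical name, and the digit count `n` defaults to 4 to match the 1111 example.

```python
def middle_square(x: int, n: int = 4) -> int:
    """Middle-square 'hash' of a non-negative number with at most n digits."""
    if not 0 <= x < 10**n:
        raise ValueError(f"x must have at most {n} digits")
    # Square, then left-pad with zeros to exactly 2n digits.
    squared = str(x * x).zfill(2 * n)
    # Keep the middle n digits.
    start = n // 2
    return int(squared[start:start + n])


print(middle_square(1111))  # 2343
```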
The grand-daddy of all feasibly efficient and simple random number generators is the Mersenne Twister (http://en.wikipedia.org/wiki/Mersenne_twister). In fact, an implementation is probably out there for every programming language imaginable. Your hash "input" is what their terminology calls a "seed".
In conclusion
Nothing wrong with string-based hash functions
If you want to stick with the integers and be fancy, try using your number as a seed for a pseudo-random number generator.
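Following the seed suggestion, a minimal sketch using Python's `random.Random`, whose documentation names the Mersenne Twister as its core generator. The helper name `seeded_hash`, the 24-bit output and the packing of x and y as signed 64-bit values are assumptions. The integers are packed into a bytes seed, so no string conversion is needed.

```python
import random


def seeded_hash(x: int, y: int, bits: int = 24) -> int:
    """Seed a Mersenne Twister with the pair (x, y) and draw `bits` random bits.

    Assumes x and y fit in signed 64-bit integers.
    """
    # Pack the integers as bytes. A bytes seed keeps the sign, whereas
    # CPython uses the absolute value of an int seed (so x and -x would match).
    seed = x.to_bytes(8, "big", signed=True) + y.to_bytes(8, "big", signed=True)
    rng = random.Random(seed)  # fresh, deterministically seeded generator
    return rng.getrandbits(bits)


print(seeded_hash(10, 20), seeded_hash(10, 21))
```

Creating a new generator per call is slow compared with a dedicated hash function, but it keeps each output a pure function of (x, y).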
Hashing fits your requirements perfectly. If you really don't want to use strings, find a hash library that takes numbers or binary data. But using strings here looks OK to me.
Bob Jenkins' mix function is a classic choice, especially when n = 3.
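A Python sketch based on the 96-bit mix() macro in Bob Jenkins' lookup2.c; verify the shift constants against his published source before relying on it. The wrapper `hash_pair`, which feeds the golden-ratio constant 0x9E3779B9 in as the third value, is a hypothetical illustration. Each step updates one variable from the other two, so the whole mix is reversible: distinct input triples always give distinct output triples.

```python
MASK = 0xFFFFFFFF  # emulate C's unsigned 32-bit arithmetic


def jenkins_mix(a: int, b: int, c: int) -> tuple:
    """Mix three 32-bit values, following the mix() macro from lookup2.c."""
    a, b, c = a & MASK, b & MASK, c & MASK
    a = (a - b - c) & MASK
    a ^= c >> 13
    b = (b - c - a) & MASK
    b ^= (a << 8) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 13
    a = (a - b - c) & MASK
    a ^= c >> 12
    b = (b - c - a) & MASK
    b ^= (a << 16) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 5
    a = (a - b - c) & MASK
    a ^= c >> 3
    b = (b - c - a) & MASK
    b ^= (a << 10) & MASK
    c = (c - a - b) & MASK
    c ^= b >> 15
    return a, b, c


def hash_pair(x: int, y: int) -> int:
    """Hypothetical 2-D wrapper: mix (x, y, golden-ratio constant), return c."""
    return jenkins_mix(x, y, 0x9E3779B9)[2]


print(hash_pair(0, 0), hash_pair(0, 1), hash_pair(1, 0))
```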
As others point out, hash functions do exactly what you want. Hashes take bytes - not character strings - and return bytes, and converting between integers and bytes is, of course, simple. Here's an example Python function that works on 32-bit integers and outputs a 32-bit integer:
import hashlib
import struct
def intsha1(ints):
    data = struct.pack('>%di' % len(ints), *ints)  # big-endian signed 32-bit ints
    output = hashlib.sha1(data).digest()
    return struct.unpack('>i', output[:4])[0]  # unpack returns a tuple; take the int
It can, of course, be easily adapted to work with different length inputs and outputs.
Have a look at this; maybe it will inspire you:
Chaotic system
In chaotic dynamics, small changes vary results greatly.
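A standard textbook example of chaotic dynamics, chosen here for illustration, is the logistic map x -> r*x*(1 - x) with r = 4. This sketch only demonstrates sensitivity to initial conditions: two starting points 1e-12 apart end up unrelated after a few dozen iterations. `logistic_orbit` is a hypothetical name, and a floating-point chaotic map is no replacement for a proper hash function.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> float:
    """Iterate the logistic map x -> r * x * (1 - x); chaotic for r = 4."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


a = logistic_orbit(0.3, 50)
b = logistic_orbit(0.3 + 1e-12, 50)  # starting points differ by 1e-12
print(a, b)
```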
An x-bit block cipher takes a number and effectively converts it into another number. You could combine (sum/multiply?) your input numbers and encipher them, or iteratively encipher each number, similar to CBC or another chained mode. Search for 'format-preserving encryption'. It is possible to create a 32-bit block cipher (not widely 'available') and use it to create a 'hashed' output. The main difference between a hash and encryption is that a hash is irreversible.
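A toy 32-bit block cipher can be built as a Feistel network with truncated SHA-256 as its round function. This is a sketch only: the key, the round count of 8 and every name below are assumptions, and the construction has not been analyzed for security. Because a block cipher is a bijection, each 32-bit input maps to a distinct output; `chained_hash` combines several inputs CBC style.

```python
import hashlib

MASK16 = 0xFFFF


def _round(half: int, key: bytes, i: int) -> int:
    """Round function: 16 bits taken from SHA-256(key, round index, half)."""
    digest = hashlib.sha256(key + bytes([i]) + half.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")


def feistel32_encrypt(x: int, key: bytes, rounds: int = 8) -> int:
    """Toy 32-bit Feistel cipher: a keyed bijection on 0 .. 2**32 - 1."""
    left, right = (x >> 16) & MASK16, x & MASK16
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right


def feistel32_decrypt(y: int, key: bytes, rounds: int = 8) -> int:
    """Inverse of feistel32_encrypt: undo the rounds in reverse order."""
    left, right = (y >> 16) & MASK16, y & MASK16
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right


def chained_hash(numbers, key: bytes) -> int:
    """CBC-style chaining: XOR each 32-bit input into the state, then encipher."""
    state = 0
    for n in numbers:
        state = feistel32_encrypt((state ^ n) & 0xFFFFFFFF, key)
    return state


print(chained_hash([3, 4], b"example key"), chained_hash([3, 5], b"example key"))
```

Unlike a hash, each individual enciphering step can be undone with the key, which matches the distinction drawn in the answer.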

Can I identify a hash algorithm based on the initial key and output hash?

If I have both the initial key and the hash that was created, is there any way to determine what hashing algorithm was used?
For example:
Key: higher
Hash: df072c8afcf2385b8d34aab3362020d0
Algorithm: ?
By looking at the length, you can decide which algorithms to try. MD5 and MD2 produce 16-byte digests. SHA-1 produces 20 bytes of output. Etc. Then perform each hash on the input and see if it matches the output. If so, that's your algorithm.
Of course, if more than the "key" was hashed, you'll need to know that too. And depending on the application, hashes are often applied iteratively. That is, the output of the hash is hashed again, and that output is hashed… often thousands of times. So if you know in advance how many iterations were performed, that can help too.
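The length-then-try approach can be automated. A sketch, assuming the key was hashed without a salt and that iterated hashing feeds each raw digest back into the same algorithm; `matching_algorithms` is a hypothetical helper. It only tries what `hashlib.algorithms_guaranteed` offers (MD5, SHA-1, SHA-2, SHA-3, BLAKE2), so MD2, MD4, HAVAL and RIPEMD-128 are not covered.

```python
import hashlib


def matching_algorithms(key: bytes, target_hex: str, max_iterations: int = 3) -> list:
    """Return (algorithm, iterations) pairs whose output equals target_hex."""
    target_hex = target_hex.lower()
    matches = []
    for name in sorted(hashlib.algorithms_guaranteed):
        if name.startswith("shake"):
            continue  # SHAKE digests need an explicit output length
        try:
            digest = hashlib.new(name, key).digest()
        except ValueError:
            continue  # e.g. MD5 may be disabled on FIPS-restricted builds
        # Only algorithms whose digest length matches are worth checking.
        if len(digest) * 2 != len(target_hex):
            continue
        for rounds in range(1, max_iterations + 1):
            if digest.hex() == target_hex:
                matches.append((name, rounds))
            digest = hashlib.new(name, digest).digest()  # hash the hash again
    return matches


print(matching_algorithms(b"higher", hashlib.sha256(b"higher").hexdigest()))
```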
There's nothing besides the length in the output of a cryptographic hash that would help narrow down the algorithm that produced it.
Well, given that there are a finite number of popular hash algorithms, maybe what you propose is not so ridiculous.
But suppose I asked you this:
If I have an input and an output, can I determine the function?
Generally speaking, no, you cannot determine the inner-workings of any function simply from knowing one input and one output, without any additional information.
// very, very basic illustration
if (unknownFunction(2) == 4) {
// what does unknownFunction do?
// return x + 2?
// or return x * 2?
// or return Math.Pow(x, 2)?
// or return Math.Pow(x, 3) - 4?
// etc.
}
The hash seems to contain only hexadecimal characters (each character represents 4 bits).
There are 32 characters in total, so this is a 128-bit hash.
Standard hashing algorithms that fit these specs are HAVAL, MD2, MD4, MD5 and RIPEMD-128.
Highest probability is that MD5 was used.
md5("higher") != df072c8afcf2385b8d34aab3362020d0
Highest probability is that some salt was used.
Highest probability still remains MD5.
Didn't match any of the common hashing algorithms:
http://www.fileformat.info/tool/hash.htm?text=higher
Perhaps a salt was added prior to hashing...
Not other than trying a bunch of algorithms you know and seeing if any of them match.

Resources