Does the MD5 hash change after encryption? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I have something I've been researching and I can't find an answer, or maybe I just don't understand. When encrypting a file WITHOUT changing the contents, does that change the MD5 sum/hash of the file? For example, if a Word file with the same unchanged string of characters is encrypted, does the encrypted file have the same MD5 sum as the original, unencrypted Word file?

Yes, encrypting a file should substantially change any hash of a file.
Cryptographic hash functions are constructed so that hashing any two different strings produces wildly different results, even if the original strings are closely related. For example, the MD5 hash of "hello" is
5d41402abc4b2a76b9719d911017c592
while the MD5 hash of "hello?" is
3809718a10a0f59bcf6d4939c10fd28d
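The same comparison is easy to reproduce with Ruby's standard `digest` library. This small sketch prints both digests and also counts how many of the 128 output bits differ, which shows how a one-character change spreads through the whole output.

```ruby
require 'digest'

a = Digest::MD5.hexdigest("hello")
b = Digest::MD5.hexdigest("hello?")
puts a, b

# Count how many of the 128 output bits differ between the two digests.
differing_bits = (a.to_i(16) ^ b.to_i(16)).to_s(2).count("1")
puts differing_bits
```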
Encrypting a file should, with good encryption, make the resulting file look statistically random. Consequently, if you were to hash an encrypted file, it should give a hash that's statistically indistinguishable from the hash of a random string. That means that the probability that you should get the same hash output would, roughly speaking, be about 1 / N, where N is the number of possible hash outputs. For even a decently good hash function, this should be astronomically small.

That depends: if you decrypt the ciphertext (the encrypted plaintext) back to exactly the same plaintext and hash that, the MD5 sum will be the same. If you just hash the ciphertext, the hash will be different.
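You can check both cases with Ruby's `openssl` and `digest` libraries. This sketch assumes AES-256-CBC with a random key and IV and uses a hypothetical string in place of the document's contents. The ciphertext hashes to a different value, and the decrypted bytes hash back to the original value.

```ruby
require 'openssl'
require 'digest'

plaintext = "The same unchanged string of characters" # hypothetical file contents

# Encrypt with AES-256-CBC under a random key and IV (assumed cipher choice).
cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
key = cipher.random_key
iv = cipher.random_iv
ciphertext = cipher.update(plaintext) + cipher.final

# Decrypt with the same key and IV to recover the original bytes.
decipher = OpenSSL::Cipher.new('aes-256-cbc')
decipher.decrypt
decipher.key = key
decipher.iv = iv
decrypted = decipher.update(ciphertext) + decipher.final

puts Digest::MD5.hexdigest(plaintext)  # hash of the original
puts Digest::MD5.hexdigest(ciphertext) # different: the bytes changed
puts Digest::MD5.hexdigest(decrypted)  # same as the original
```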
The outputs of a cryptographically secure hash should differ substantially whenever the input changes, even by a single bit. There are infinitely many messages that hash to the same value, but it should still be computationally infeasible to find two messages that hash to the same value (this is called a collision).
Note that the MD5 hash function is broken. If an attacker can generate the files to be hashed, it is possible to generate two different files with the same hash. This makes it very easy to create two programs that do different things but have the same MD5 hash. Use a hash function that has not been broken instead; SHA-256 or SHA-512, for example, would be considered good options.
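In Ruby, switching to SHA-256 or SHA-512 takes one line with the same `digest` library. A short sketch:

```ruby
require 'digest'

data = "hello"
sha256 = Digest::SHA256.hexdigest(data) # 256 bits = 64 hex characters
sha512 = Digest::SHA512.hexdigest(data) # 512 bits = 128 hex characters
puts sha256, sha512
```

For files, `Digest::SHA256.file(path).hexdigest` works the same way as its MD5 counterpart.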

Encrypting a file changes the contents that are stored on disk. The MD5 hash of a file's contents does not know (or care) whether the file is encrypted; it just reads bytes from the disk. Since the bytes differ between the plaintext and the encrypted file, the MD5 hash will be different.
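A file hash is simply a hash of the bytes stored on disk. The sketch below writes some bytes to a temporary file (the contents are a hypothetical example). It then checks that `Digest::MD5.file` gives the same result as hashing the bytes read back with `File.binread`. Encrypting the file changes those bytes, and therefore the result.

```ruby
require 'digest'
require 'tempfile'

contents = "some file contents" # hypothetical file data

same = Tempfile.create('md5-demo') do |f|
  f.binmode
  f.write(contents)
  f.flush
  # Hash the file on disk, then hash the raw bytes read back from it.
  from_file  = Digest::MD5.file(f.path).hexdigest
  from_bytes = Digest::MD5.hexdigest(File.binread(f.path))
  from_file == from_bytes
end

puts same # true
```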

Related

How does MD5 hashing not run out of hashes?

If I am not mistaken, an MD5 hash is 32 characters long. If MD5 is only 32 characters long and we can make a string infinitely long, how is every hash different? What is the upper limit of MD5, and how exactly is it completely unpredictable?
MD5, like all cryptographic hash functions, does not guarantee that every hash is different. It is only designed to make it highly unlikely and difficult to find two inputs that produce the same output.
An MD5 digest is actually 16 bytes of 8 bits each, i.e. 128 bits. The 32 characters you see are those bytes written in hexadecimal, two characters per byte. Because of the short output (128 bits) and some internal weaknesses, MD5 is no longer considered sufficient for most uses, and SHA-256 is generally a good replacement.
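The relationship between the 16 raw bytes and the 32-character string can be checked directly. This sketch uses Ruby's `digest` library and also prints the number of possible MD5 outputs, which is where the pigeonhole argument comes from.

```ruby
require 'digest'

raw = Digest::MD5.digest("hello")    # the digest as 16 raw bytes
hex = Digest::MD5.hexdigest("hello") # the same bytes as 32 hex characters

puts raw.bytesize             # 16
puts hex.length               # 32
puts raw.unpack1("H*") == hex # true
# Number of distinct MD5 outputs; any larger set of inputs must contain a collision.
puts 2**128                   # 340282366920938463463374607431768211456
```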
You can have collisions.
Both
d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
and
d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
give the same hash:
79054025255fb1a26e4bc422aef54eb4
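You can check this collision yourself. The sketch decodes the two hex dumps (with whitespace removed) into 128-byte strings using `pack("H*")` and hashes both.

```ruby
require 'digest'

# The two 128-byte messages quoted above, as hex dumps.
HEX_A = <<~HEX
  d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
  55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
  d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
  e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
HEX

HEX_B = <<~HEX
  d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
  55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
  d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
  e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
HEX

# Strip whitespace and decode the hex digits into raw bytes.
def decode(hex)
  [hex.gsub(/\s+/, "")].pack("H*")
end

a = decode(HEX_A)
b = decode(HEX_B)

puts a == b                                               # false: the inputs differ
puts Digest::MD5.hexdigest(a) == Digest::MD5.hexdigest(b) # true: same MD5
puts Digest::MD5.hexdigest(a)
```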
Every hash function has collisions; the question is how likely you are to run into one.
Shorter hashes, such as 32-bit ones, collide far more readily. For example, these word pairs collide under one 32-bit hash function:
cataract collides with periti
roquette collides with skivie
shawl collides with stormbound
dowlases collides with tramontane
cricketings collides with twanger
longans collides with whigs
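You can reproduce the effect of a short output with a birthday search. This sketch rests on an explicit assumption: it uses MD5 truncated to its first 32 bits as a stand-in for a 32-bit hash. That is not the function that produced the word pairs above. The inputs are generated strings of the hypothetical form `input-N`. Because of the birthday bound, a collision typically appears after some tens of thousands of attempts (on the order of 2^16). The same search against full 128-bit MD5 would need around 2^64 attempts.

```ruby
require 'digest'

# Stand-in 32-bit hash: the first 8 hex digits (32 bits) of MD5.
def hash32(s)
  Digest::MD5.hexdigest(s)[0, 8]
end

# Birthday search: remember every truncated hash seen until two inputs share one.
def find_collision(limit = 2_000_000)
  seen = {}
  limit.times do |i|
    candidate = "input-#{i}"
    h = hash32(candidate)
    return [seen[h], candidate, h] if seen.key?(h)
    seen[h] = candidate
  end
  nil
end

a, b, h = find_collision
puts "#{a} and #{b} both map to #{h}"
```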
You are wrong: MD5 is a deterministic hashing algorithm, and there is nothing random or randomized in it.
Of course, if you apply MD5 to some data, there may be other data that produces the same MD5 value. This is known as a collision.

Hash an integer by another integer [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'd like a way to hash an integer using another integer. It should produce a new hashed integer. It should accept an integer input and a key, the input is then hashed by the key and produced as an integer. The method would look like hash_method(input, key). Collisions won't matter here, I am not using them for security or comparison. I'm pretty sure this is possible seeing how some security algorithms that use challenges do something similar. How would I go about this in ruby?
Hash routines are many and varied, and usually picked to match details of expected input distribution, purpose of the hash value, and speed to generate them.
However, you can make use of existing hash routines in Ruby's standard library. The output of most cryptographic hash functions is a string of bytes that can easily be interpreted as an integer. For your purpose, you just need to decide on a suitable maximum value by restricting the length.
Cryptographic hashes also have the advantage in your case that they make high quality pseudo-random functions - given two inputs that differ by only a single bit, the results will not correlate.
The HMAC construct combines a hash function with two inputs - a message and a secret. Using your input number as the message, and key as the secret allows you to use the function as-is.
There is nothing special about using Integer inputs for most standard hashing functions, which on the whole just munge bytes and ignore data types. Given that you don't seem to care about the specific values the hash outputs, you can simply convert your numbers to String values and feed them into a standard hashing function. This is perfectly OK; there is no reason not to do it unless you need to differentiate between 1 and "1" using the same hashing function.
Like this:
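require 'openssl'
input = 25
key = 106
# HMAC-SHA1 keyed with `key`, computed over the decimal string form of `input`
full_hash = OpenSSL::HMAC.hexdigest(
  OpenSSL::Digest.new('sha1'), key.to_s, input.to_s)
# The first 8 hex digits (32 bits), read as an unsigned 32-bit integer
result = full_hash[0..7].to_i(16)
# => 2746028024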
This result is suitable for checksums, or for algorithms that want a pseudo-random redistribution of values. Its drawback is speed: it will be relatively slow if you need to generate many values.
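The same idea can be wrapped in the `hash_method(input, key)` shape the question asks for. This is a minimal sketch that assumes HMAC-SHA256 in place of the SHA-1 used above and adds a configurable output width. `hash_method` and its `bits:` keyword are hypothetical names, not part of any library.

```ruby
require 'openssl'

# Hypothetical helper in the question's hash_method(input, key) shape.
# Uses HMAC-SHA256 (swapped in for SHA-1) and keeps the top `bits` bits (1..256).
def hash_method(input, key, bits: 32)
  mac = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key.to_s, input.to_s)
  # Read the 32-byte MAC as one big-endian integer, then drop the low bits.
  mac.unpack1("H*").to_i(16) >> (256 - bits)
end

puts hash_method(25, 106)           # an unsigned 32-bit integer
puts hash_method(25, 106, bits: 16) # a smaller range, 0...65536
```

Keeping the leading bits works like `full_hash[0..7]` in the answer above, and the output stays roughly uniform over the chosen range.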
You could make this a lot simpler if you were happy with a lower quality of randomness - you could use a linear congruential generator for example. This would likely be faster than the above, but might exhibit unwanted patterns in the output.
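For comparison, here is a sketch of the faster, lower-quality option. It assumes a single LCG step over the input mixed with the key, using the Numerical Recipes multiplier and increment (1664525 and 1013904223) modulo 2^32. `lcg_hash` is a hypothetical name. The output shows the kind of pattern mentioned above. Inputs that differ only in their lowest bit, such as 24 and 25 under key 106, produce outputs exactly 1664525 apart.

```ruby
MASK32 = 0xFFFF_FFFF

# Hypothetical fast keyed mixer: one linear congruential step modulo 2**32,
# with the Numerical Recipes constants a = 1664525, c = 1013904223.
def lcg_hash(input, key)
  x = (input ^ key) & MASK32
  (1_664_525 * x + 1_013_904_223) & MASK32
end

puts lcg_hash(25, 106) # 1205324598
puts lcg_hash(24, 106) # 1203660073
```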

How does the md5 hashing algorithm compress data to a fixed length?

I know that MD5 produces a 128-bit digest. My question is: how does it produce this fixed-length output from a message longer than 128 bits?
EDIT:
I now have a better understanding of hash functions. After reading this article, I realized that hash functions are one-way, meaning that you can't convert a hash back to plaintext. I mistakenly thought you could because of all the online services that convert hashes back to strings. I now realize those are just rainbow tables (collections of strings mapped to precomputed hashes).
When you generate an MD5 hash, you're not compressing the input data. Compression implies that you'll be able to decompress it back to its original state. MD5, on the other hand, is a one-way process. This is why it's used for password storage: ideally, you have to know the original input string to generate the same MD5 result again.
This page provides a nice graphic-equipped explanation of MD5 and similar hash functions, and how they're used: An Illustrated Guide to Cryptographic Hashes
Consider something like starting with a 128-bit value, and taking input 128 bits at a time, and XORing each of those input blocks with the existing value.
MD5 is considerably more complex than that, but the general idea is the same. Input is processed in fixed-size blocks (512 bits each for MD5), and each block updates a 128-bit internal state. Each input block can change the value of the result, but has no effect on its length.
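The toy scheme from the first paragraph (not MD5 itself) fits in a few lines. For simplicity, this sketch assumes zero-padding to a multiple of 16 bytes. It XORs each 128-bit block into the state, so the output is always 16 bytes (32 hex characters) regardless of input length. Its weaknesses are also easy to see: padding-only differences and repeated blocks cancel out. That is why real hash functions use far more elaborate mixing.

```ruby
BLOCK = 16 # 128 bits

# Toy fixed-length fold: XOR every 16-byte block of the message into a 16-byte state.
def toy_fold(message)
  state = Array.new(BLOCK, 0)
  bytes = message.bytes
  # Zero-pad to a whole number of blocks (real hashes pad far more carefully).
  bytes += [0] * ((-bytes.length) % BLOCK)
  bytes.each_slice(BLOCK) do |block|
    block.each_with_index { |b, i| state[i] ^= b }
  end
  state.pack("C*").unpack1("H*")
end

puts toy_fold("hello")
puts toy_fold("x" * 1000) # still 32 hex characters
```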
It has nothing (or rather, little) to do with compression. An algorithm produces a new state from the current state and each piece of input. This state is more or less unique to that combination of inputs.
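Ruby's `Digest::MD5` exposes this state-machine view directly: the digest object keeps its internal state between `update` calls. The sketch below feeds the input piece by piece and compares the result with hashing it in one call. The repeated sample sentence and the 100-byte piece size are arbitrary choices. Both approaches give the same fixed-length result.

```ruby
require 'digest'

message = "The quick brown fox jumps over the lazy dog. " * 50

# Feed the message to MD5 in 100-byte pieces; the digest object
# carries its internal state forward between update calls.
md5 = Digest::MD5.new
message.bytes.each_slice(100) { |piece| md5.update(piece.pack("C*")) }
streamed = md5.hexdigest

one_shot = Digest::MD5.hexdigest(message)
puts streamed == one_shot # true
puts streamed.length      # 32
```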
In short, it splits the input into many parts and applies the same operations to each.
If you are wondering about collisions, consider that your messages are usually only readable text.
The space of all bit strings is much bigger than the space of readable character strings.

What is password hashing? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
What does it mean to hash a password?
Definition:
Hashing is the application of a function f() to a variable-sized input to produce a constant-sized output.
A => f() => X
B => f() => Y
C => f() => Z
A hash is also a one-way function, which means that there isn't a function to reverse or undo a hash. Likewise, re-applying the hash, f(f(x)), isn't going to produce x again.
The Details:
A hash function can be as simple as "add 13 to the input" or as complex as a cryptographic hash such as MD5 or SHA-1. Many things make a good hash function, such as:
Low Cost: Easy to compute
Deterministic: If I hash the same input multiple times, I am going to get the same output each time
Uniformity: Inputs will be evenly distributed among the possible outputs. This ties in with the pigeonhole principle: there are only a limited number of outputs, so some inputs must share an output, and we want f() to spread inputs evenly instead of piling them into the same bucket. When two inputs compute to the same output, this is known as a collision. A good hash function produces fewer collisions.
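A small experiment makes the determinism and uniformity properties visible. This sketch assumes MD5 as the hash, 8 buckets chosen by the first digest byte, and hypothetical inputs `user1` to `user8000`. With a uniform hash, each bucket should end up with roughly 1,000 entries.

```ruby
require 'digest'

BUCKETS = 8
counts = Array.new(BUCKETS, 0)

# Hash 8,000 similar-looking inputs and place each one in a bucket
# chosen by the first byte of its MD5 digest.
(1..8000).each do |i|
  first_byte = Digest::MD5.digest("user#{i}").getbyte(0)
  counts[first_byte % BUCKETS] += 1
end

p counts # each entry should be close to 1,000

# Determinism: the same input always gives the same digest.
puts Digest::MD5.hexdigest("user1") == Digest::MD5.hexdigest("user1") # true
```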
Hashing applied to Passwords:
The hashing of passwords is the same process as described above, however it comes with some special considerations. Many of the properties that make up a good hash function are not beneficial when it comes to passwords.
Take determinism, for example. Because hashes are deterministic, when two people use the same password, their hashes look the same in the password store. This is a bad thing! It is mitigated by something called a salt: a random value combined with each password before hashing, so that identical passwords produce different hashes.
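Here is a sketch of salted password hashing with Ruby's `openssl` library. It swaps in PBKDF2 (`OpenSSL::KDF.pbkdf2_hmac`), a deliberately slow key-derivation function, in place of a single MD5 or SHA pass. The iteration count, the salt size, and the helper names `hash_password` and `verify_password` are illustrative assumptions. Two users with the same password end up with different stored hashes because each one gets a random salt.

```ruby
require 'openssl'
require 'securerandom'

ITERATIONS = 100_000 # illustrative work factor, not a recommendation
SALT_BYTES = 16

# Hypothetical helper: derive a 32-byte hash from the password and a salt.
# A fresh random salt is generated unless one is passed in (for verification).
def hash_password(password, salt = SecureRandom.random_bytes(SALT_BYTES))
  hash = OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: ITERATIONS,
                                            length: 32, hash: "SHA256")
  [salt, hash]
end

# Hypothetical helper: re-derive with the stored salt and compare.
# Production code should use a constant-time comparison here.
def verify_password(password, salt, stored_hash)
  _, candidate = hash_password(password, salt)
  candidate == stored_hash
end

salt_a, hash_a = hash_password("same password") # user A
salt_b, hash_b = hash_password("same password") # user B
puts hash_a == hash_b                                 # false: different salts
puts verify_password("same password", salt_a, hash_a) # true
puts verify_password("wrong guess", salt_a, hash_a)   # false
```

The salt is stored alongside the hash. It does not need to be secret, only unique.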
Uniformity on the other hand is beneficial because the desire is for the algorithm to limit collisions.
Because a hash is one-way, the input cannot be determined from the output, which is why hashing is great for passwords!
A hash takes a block of data and returns a string from which you can't get the original block of data back.
Wikipedia Article
Hashing a password takes a cleartext string and runs an algorithm on it (which depends on the hash type) to get a completely different value. This value will be the same every time. You can therefore store the hashed password in a database and check a password the user enters by hashing it and comparing the result with the stored hash.
This prevents you from storing the cleartext passwords in the database (bad idea).
Here is a list of hash functions.
A hash is simply a one-way function that takes a string or other data source and creates an encrypted-looking string.
There are various hashing algorithms. The most popular is MD5, but there are many others. Many experts in the industry use the SHA-256 algorithm for better security.
MD5 Hash for the words:
password is 5f4dcc3b5aa765d61d8327deb882cf99
PASSWORD is 319f4d26e3c536b5dd871bb2c52e3178
The character length of the result will be the same regardless of how many characters you try to hash. Hashes are commonly used to store passwords to prevent them from being viewed.
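The `digest` library shows both points. Changing only the letter case gives an unrelated digest, and the digest stays 32 hex characters long whatever the input size (the longer input is a hypothetical example).

```ruby
require 'digest'

inputs = ["password", "PASSWORD", "a much longer passphrase " * 100]

inputs.each do |input|
  digest = Digest::MD5.hexdigest(input)
  puts "#{input.length} chars in -> #{digest} (#{digest.length} chars out)"
end
```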

Knowing the plaintext, how to discover the encryption scheme used? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 14 years ago.
I have some char() fields in a DBF table that were left encrypted by a past developer in the project.
However, I know the plaintext result of the decryption of several records. How can I determine the function/algorithm/scheme to decrypt the original data?
These are some sample fields:
For cryptext:
b5 01 02 c1 e3 0d 0a
plaintext should be:
3543921 or 3.543.921
And for cryptext:
41 c3 c5 07 17 0d 0a
plaintext should be
1851154 or 1.851.154
I believe 0d 0a is just padding. The data was gathered in Windows-1252 encoding (I don't know whether that matters).
EDIT: It's for the sake of curiosity and learning. I want to understand the encryption used (it seems simple, although it is binary data) so that I can recover the field values for the tuples whose plaintext I don't know.
EDIT 2: Added a couple samples.
There is no easy way in the general case; this question is too general. Try posting these plaintext and encrypted strings.
EDIT:
for the sake of learning you can read this article : Cryptography on Wikipedia
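If you really believe the encryption is simple, check whether it's a byte-level (or word-level) XOR with a fixed key, as in this Ruby sketch (CRYPT_BYTE stands in for the unknown key byte):
CRYPT_BYTE = 0x5A # placeholder value; try all 256 possible key bytes
original_string = "3543921"
# XOR every byte of the input with the same key byte
new_string = original_string.bytes.map { |b| b ^ CRYPT_BYTE }.pack('C*')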
Assume it's not something as simple as a substitution cipher (try frequency analysis) or a poorly applied XOR. For XOR with a reused key, try XORing two ciphertexts with known plaintexts and see whether the result is the XOR of the plaintexts, or try XORing the ciphertext with itself shifted by some number of bytes. In that case, you should probably assume it's a well-known stream or block cipher with an unknown key (which most likely consists of ASCII characters). If you have a big enough sample of ciphertext-plaintext pairs, you could start by checking whether plaintexts with the same first few characters/bytes have ciphertexts with the same first characters/bytes. This might also show whether it's a block or a stream cipher and whether there is any feedback mechanism involved. Padding, if present, might also suggest that it's a block cipher rather than a stream cipher.
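You can try the key-reuse check on synthetic data first. This sketch assumes a random 7-byte keystream (a hypothetical key) used twice, on the two plaintexts from the question. The key cancels out, so XORing the two ciphertexts gives exactly the XOR of the two plaintexts.

```ruby
require 'securerandom'

# XOR two equal-length byte strings.
def xor_bytes(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

keystream = SecureRandom.random_bytes(7) # hypothetical key, wrongly reused
p1 = "3543921"
p2 = "1851154"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: c1 ^ c2 == p1 ^ p2, with no knowledge of the key.
puts xor_bytes(c1, c2) == xor_bytes(p1, p2) # true
```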
Depending on how much effort you want to put into it, you should be able to get somewhere. Start by reading up on cryptanalysis, in particular the methods of cryptanalysis.
The things that will determine how easy this task will be are:
how good the encryption method used is; if it's a recent, well-regarded method such as RSA or AES, you're probably out of luck
how much ciphertext and plaintext you have -- the more the better
what kind of data it is -- simple text is the easiest, while random data would be the hardest
whether the data is all encrypted with the same key, or whether multiple keys have been used.
The key to success is not to be disheartened. The history of cryptanalysis is filled with stories of supposedly unbreakable codes being cracked. Perhaps the most famous is the Enigma machine from World War II, whose cracking contributed to the development of modern computers.
We can tell a few things from what you've provided:
With a ciphertext length of 7 bytes in each case, it's unlikely to be a block cipher (since block ciphers encrypt a block at a time, their length will be a multiple of the blocksize, and a blocksize of 56 bits is pretty unlikely*).
The length of the ciphertext and the number of characters in the plaintext are the same in each case, so it could be a straightforward encoding of numbers as ASCII with a stream cipher applied.
XORing the plaintext (as ASCII) and the ciphertext together gives neither a single repeated octet nor the same keystream for each sample, so it's not a trivial cipher. It's also not a simple stream cipher using the same key for both, unless some of the ciphertext bytes are an IV.
The last two bytes are identical in ciphertext but not in plaintext. This could be a coincidence but also could be indicative of padding as you suggest. If they are padding, some other encoding mechanism must be used.
Do you know if all the encrypted values are integers, or are other values also possible?
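The XOR observation can be reproduced on the two samples. This sketch assumes the plaintext is the 7 ASCII digits without separators. It also assumes all 7 ciphertext bytes, including `0d 0a`, belong to the ciphertext. It derives the keystream each sample would need under a plain XOR stream cipher.

```ruby
# The two samples from the question: expected plaintext => ciphertext bytes.
SAMPLES = {
  "3543921" => [0xb5, 0x01, 0x02, 0xc1, 0xe3, 0x0d, 0x0a],
  "1851154" => [0x41, 0xc3, 0xc5, 0x07, 0x17, 0x0d, 0x0a],
}

# Keystream each sample would need if it were a plain XOR stream cipher:
# keystream = ciphertext XOR plaintext (as ASCII).
keystreams = SAMPLES.map do |plain, cipher|
  plain.bytes.zip(cipher).map { |p, c| p ^ c }
end

keystreams.each { |ks| puts ks.map { |b| format("%02x", b) }.join(" ") }
# 86 34 36 f2 da 3f 3b
# 70 fb f0 36 26 38 3e
# A single-byte key would give one repeated value per line;
# a reused stream key would give two identical lines.
```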
Determining the algorithm used without the corresponding key may not be entirely useful.
If the text is small enough and you already have the plaintext, why would you want to figure it out, other than for curiosity's sake?
There's no deterministic way to tell, but often there are hints in the ciphertext. Is it really encrypted (with some sort of key)? Or is it just hashed and (possibly) salted.
If it's hashed, you could get lucky and just google for a matching pair (assuming you have any that are dictionary words) because there are pre-hashed dictionaries already online.
If you have an example of the ciphertext, you could post it, someone might recognize the cipher format...
I think it's a misconception that XOR is an easily decryptable scheme. The theoretically strongest form of encryption is a one-time pad: simply a string of predetermined random bits that you XOR your plaintext with...
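This sketch shows why a one-time pad resists analysis. It uses Ruby's `securerandom` for the pad, which is an assumption: a real pad must be truly random, as long as the message, and never reused. Decryption with the right pad recovers the plaintext. For any other plaintext of the same length, though, there is also a pad that "decrypts" the ciphertext to it. The ciphertext alone therefore reveals nothing beyond its length.

```ruby
require 'securerandom'

# XOR two equal-length byte strings.
def xor_bytes(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

plaintext = "3543921".b
pad = SecureRandom.random_bytes(plaintext.bytesize) # random, same length, used once
ciphertext = xor_bytes(plaintext, pad)
puts xor_bytes(ciphertext, pad) == plaintext # true

# Without the pad, every same-length guess is equally consistent with the
# ciphertext: this "fake" pad turns it into a different plaintext.
candidate = "1851154".b
fake_pad = xor_bytes(ciphertext, candidate)
puts xor_bytes(ciphertext, fake_pad) == candidate # true
```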
Finite XORs, on the other hand...
