Knowing the plaintext, how to discover the encryption scheme used? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 14 years ago.
I have some char() fields in a DBF table that were left encrypted by a past developer in the project.
However, I know the plaintext result of the decryption of several records. How can I determine the function/algorithm/scheme to decrypt the original data?
These are some sample fields:
For cryptext:
b5 01 02 c1 e3 0d 0a
plaintext should be:
3543921 or 3.543.921
And for cryptext:
41 c3 c5 07 17 0d 0a
plaintext should be
1851154 or 1.851.154
I believe 0d 0a is just padding. The data was gathered in win-1252 encoding (I don't know if that matters).
EDIT: It's for the sake of curiosity and learning. I want to understand the encryption used (it seems a simple one, although the data is binary) so that I can recover the value of the fields for the tuples whose plaintext I don't know.
EDIT 2: Added a couple samples.

There is no easy way in the general case. This question is too general. Try posting these plain + encrypted strings.
EDIT:
For the sake of learning, you can read this article: Cryptography on Wikipedia
If you really believe the encryption is simple, check whether it's a byte-level (or word-level) XOR; see the following pseudocode:
for (i in originalString) {
newString[i] = originalString[i] ^ CRYPT_BYTE;
}
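To try the single-byte XOR hypothesis against the two samples from the question, a minimal Python sketch could look like this. The helper name `single_byte_xor_key` is made up for illustration, and the plaintext is assumed to be stored as ASCII digits, which the question does not confirm.

```python
def single_byte_xor_key(cipher: bytes, plain: bytes):
    """Return the one-byte key k with cipher[i] == plain[i] ^ k for every i, or None."""
    for k in range(256):
        if bytes(p ^ k for p in plain) == cipher:
            return k
    return None


# Samples from the question; the ASCII-digit encoding of the plaintext is an assumption.
samples = [
    (bytes.fromhex("b50102c1e30d0a"), b"3543921"),
    (bytes.fromhex("41c3c507170d0a"), b"1851154"),
]

for cipher, plain in samples:
    print(cipher.hex(), "->", single_byte_xor_key(cipher, plain))
```

If the function returns None for a pair, no single repeated byte maps that plaintext onto that ciphertext under the assumed encoding, and a word-level or position-dependent scheme would be the next thing to test.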

Assuming it's not something as simple as a substitution cipher (try frequency analysis) or a poorly applied XOR (e.g., reusing the key; try XORing two ciphertexts with known plaintexts and then see whether the result is the XOR of the plaintexts; or try XORing the ciphertext with itself shifted by some number of bytes), you should probably assume it's a well-known stream/block cipher with an unknown key (which most likely consists of ASCII characters). If you have a big enough sample of ciphertext-plaintext pairs, you could start by checking whether plaintexts with the same first few characters/bytes have ciphertexts with the same first characters/bytes. There you might also see whether it's a block or a stream cipher and whether there is any feedback mechanism involved. Padding, if present, might also suggest that it's a block cipher rather than a stream cipher.

Depending on how much effort you want to put into it, you should be able to get somewhere. Start by reading up on cryptanalysis, in particular the methods of cryptanalysis.
The things that will determine how easy this task will be are:
how good the encryption method used is; if it's a recent, well-regarded method such as RSA or AES, you're probably out of luck
how much ciphertext and plaintext you have -- the more the better
what kind of data it is -- simple text is the easiest, while random data would be the hardest
whether the data is all encrypted with the same key, or whether multiple keys have been used.
The key to success is not to be disheartened; the history of cryptanalysis is filled with stories of supposedly unbreakable codes being cracked; perhaps the most famous is the Enigma machine from World War II, the cracking of which contributed to the development of modern computers.

We can tell a few things from what you've provided:
With a ciphertext length of 7 bytes in each case, it's unlikely to be a block cipher (since block ciphers encrypt a block at a time, the ciphertext length will be a multiple of the block size, and a block size of 56 bits is pretty unlikely).
The length of the ciphertext and the number of characters in the plaintext are the same in each case, so it could be a straightforward encoding of the numbers as ASCII with a stream cipher applied.
XORing the plaintext (as ASCII) and the ciphertext together gives neither a single repeated octet nor the same keystream for each, so it's not a trivial cipher. It's also not a simple stream cipher using the same key for both, unless some of the ciphertext bytes are an IV.
The last two bytes are identical in ciphertext but not in plaintext. This could be a coincidence but also could be indicative of padding as you suggest. If they are padding, some other encoding mechanism must be used.
Do you know if all the encrypted values are integers, or are other values also possible?

Determining the algorithm used without the corresponding key may not be entirely useful.
If the text is small enough, and you have the plaintext, why would you want to figure it out? Other than, of course, for curiosity's sake?

There's no deterministic way to tell, but often there are hints in the ciphertext. Is it really encrypted (with some sort of key)? Or is it just hashed and (possibly) salted?
If it's hashed, you could get lucky and just google for a matching pair (assuming you have any that are dictionary words) because there are pre-hashed dictionaries already online.
If you have an example of the ciphertext, you could post it, someone might recognize the cipher format...

I think it's a misconception that XOR is an easily decryptable scheme. The theoretically strongest form of encryption is a one-time pad: simply a string of predetermined bits which you xor your plaintext with...
Finite XORs, on the other hand...

Related

How does the aes256 encryption algorithm deal with keys whose length is not equal to 32 bytes?

The reason why I ask this question is that we all know that this algorithm will fill the plaintext data into a multiple of 32 bytes,
so how will the key with less than 32 bytes or more be handled?
Because aes256 encryption algorithm is used in many websites or programs,
and usually we don't set a 32 byte password.
In that case, how should the algorithm go on?
Or is there any place where I can perfectly read the algorithms of all modes of aes256?
I am willing to check the source code of the algorithm by myself.
(this is not an advertisement)
but before that, I wrote an encryption algorithm myself.
I named it "sn_aes2048", Its function is:
"if the plaintext data is not a multiple of 256 bytes,
it will be filled with a multiple of 256 bytes, and the key is the same operation.
16 rounds of encryption will be performed by default,
and the data of the key will be updated in each round of encryption.
You may not believe that this algorithm is both symmetric encryption and asymmetric encryption.
Yes, yes, it is an encryption algorithm similar to aes256."
"The reason why I ask this question is that we all know that this algorithm will fill the plaintext data into a multiple of 32 bytes,"
AES is a block cipher, which must be combined with a mode of operation to be used as a general cipher. Some modes of operation (ECB, CBC) require padding (or ciphertext stealing) to be able to operate. So AES, the block cipher algorithm, doesn't do that itself, and many other modes of operation (CTR, GCM) don't require padding at all. Note also that the AES block is 128 bits (16 bytes) regardless of the key size, so any padding would be to a multiple of 16 bytes, not 32.
"so how will the key with less than 32 bytes or more be handled?"
AES, the block cipher, supports key sizes of 128, 192 and 256 bits, and that's it. It doesn't perform any actions on the key itself.
"and usually we don't set a 32 byte password."
Yes, but a password is not a key. Both are secrets, but there are different requirements for keys and passwords. You can indeed use a password-based key derivation function (PBKDF), as has been commented below. Other methods exist as well, such as PAKE schemes.
"You may not believe that this algorithm is both symmetric encryption and asymmetric encryption."
I don't believe it can be any good if you don't even understand the concepts of a symmetric key and a password, or the concept of padding which you're trying to re-invent, but feel free to publish a paper.
"Or is there any place where I can perfectly read the algorithms of all modes of aes256?"
Try "block cipher mode of operation" and "Padding" on Wikipedia for a start. Then buy a book or follow a course on cryptography. It is an academic field; creating your own algorithm from scratch is like screwing together your own automobile.
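To make the password-versus-key distinction concrete, here is a minimal sketch of deriving a 32-byte AES-256 key from a password with PBKDF2, using only Python's standard library. The function name `derive_aes256_key`, the iteration count, and the 16-byte salt length are illustrative assumptions rather than recommendations taken from the answer.

```python
import hashlib
import os


def derive_aes256_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a password of any length into exactly 32 bytes (256 bits) of key material."""
    return hashlib.pbkdf2_hmac(
        "sha256",                  # underlying hash for the HMAC
        password.encode("utf-8"),  # the password can be short or long
        salt,                      # random, stored alongside the ciphertext
        iterations,                # assumed work factor; tune for your hardware
        dklen=32,                  # AES-256 needs a 32-byte key
    )


salt = os.urandom(16)
key = derive_aes256_key("a short password", salt)
print(len(key), key.hex())
```

The salt is not secret, but it has to be kept so that the same key can be derived again at decryption time.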

How does the md5 hashing algorithm compress data to a fixed length?

I know that MD5 produces a 128-bit digest. My question is, how does it produce this fixed-length output from a message of more than 128 bits?
EDIT:
I now have a greater understanding of hash functions. After reading this article I have realized that hash functions are one-way, meaning that you can't convert the hash back to plaintext. I was under the misimpression that you could, because of all the online services converting them back to strings, but I have realised that those are just rainbow tables (collections of strings mapped to pre-computed hashes).
When you generate an MD5 hash, you're not compressing the input data. Compression implies that you'll be able to uncompress it back to its original state. MD5, on the other hand, is a one-way process. This is why it's used for password storage; you ideally have to know the original input string to be able to generate the same MD5 result again.
This page provides a nice graphic-equipped explanation of MD5 and similar hash functions, and how they're used: An Illustrated Guide to Cryptographic Hashes
Consider something like starting with a 128-bit value, and taking input 128 bits at a time, and XORing each of those input blocks with the existing value.
MD5 is considerably more complex than that, but the general idea is the same: input is processed one fixed-size block at a time (512-bit blocks in MD5's case, folded into a 128-bit internal state). Each input block can change the value of the result, but has no effect on its length.
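The XOR-folding idea described above can be sketched in a few lines. `toy_xor_hash` is a made-up name for a deliberately naive illustration; it is not MD5 and has no security value. The real `hashlib.md5` is shown next to it only to compare output lengths.

```python
import hashlib


def toy_xor_hash(data: bytes, width: int = 16) -> bytes:
    """Toy illustration: XOR every input byte into a fixed 16-byte (128-bit) state."""
    state = bytearray(width)
    for i, byte in enumerate(data):
        state[i % width] ^= byte  # each byte changes the value, never the length
    return bytes(state)


for msg in (b"", b"abc", b"x" * 10_000):
    print(len(msg), len(toy_xor_hash(msg)), len(hashlib.md5(msg).digest()))
```

Whatever the input length, both functions return 16 bytes, because the input only ever updates a state of fixed size.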
It has nothing (or, better, little) to do with compression. There is an algorithm which produces a new state for every combination of current state and input byte. This state is more or less unique to this combination of inputs.
In short, it splits the input into many parts and performs an operation on each.
If you are wondering about collisions, consider that your messages consist only of readable text.
The space of all bit strings is much bigger than the space of readable characters.

Simple integer encryption

Is there a simple algorithm to encrypt integers? That is, a function E(i,k) that accepts an n-bit integer and a key (of any type) and produces another, unrelated n-bit integer that, when fed into a second function D(E(i,k),k) (along with the key), produces the original integer?
Obviously there are some simple reversible operations you can perform, but they all seem to produce clearly related outputs (e.g. consecutive inputs lead to consecutive outputs). Also, of course, there are cryptographically strong standard algorithms, but they don't produce small enough outputs (e.g. 32-bit). I know any 32-bit cryptography can be brute-forced, but I'm not looking for something cryptographically strong, just something that looks random. Theoretically speaking it should be possible; after all, I could just create a dictionary by randomly pairing every integer. But I was hoping for something a little less memory-intensive.
Edit: Thanks for the answers. Simple XOR solutions will not work because similar inputs will produce similar outputs.
Wouldn't this amount to a block cipher with a block size of 32 bits?
Not very popular, because it's easy to break. But theoretically feasible.
Here is one implementation in Perl :
http://metacpan.org/pod/Crypt::Skip32
UPDATE: See also Format preserving encryption
UPDATE 2: RC5 supports block sizes of 32, 64 or 128 bits
I wrote an article some time ago about how to generate a 'cryptographically secure permutation' from a block cipher, which sounds like what you want. It covers using folding to reduce the size of a block cipher, and a trick for dealing with non-power-of-2 ranges.
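As a rough illustration of what a 32-bit block cipher can look like, here is a small balanced Feistel network in Python. This is a generic construction swapped in for illustration, not Skip32, RC5, or the method from the article mentioned above; the round function (HMAC-SHA256 truncated to 16 bits), the round count, and the names `encrypt32`/`decrypt32` are all assumptions, and the result should be treated as obfuscation rather than vetted cryptography.

```python
import hashlib
import hmac


def _round(half: int, key: bytes, rnd: int) -> int:
    """Keyed round function: 16-bit half in, 16-bit value out."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def encrypt32(i: int, key: bytes, rounds: int = 4) -> int:
    """Map a 32-bit integer to another 32-bit integer (a keyed permutation)."""
    left, right = (i >> 16) & 0xFFFF, i & 0xFFFF
    for r in range(rounds):
        left, right = right, left ^ _round(right, key, r)
    return (left << 16) | right


def decrypt32(c: int, key: bytes, rounds: int = 4) -> int:
    """Undo encrypt32 by running the rounds in reverse order."""
    left, right = (c >> 16) & 0xFFFF, c & 0xFFFF
    for r in reversed(range(rounds)):
        left, right = right ^ _round(left, key, r), left
    return (left << 16) | right


key = b"demo key"
for i in (0, 1, 2, 3):
    c = encrypt32(i, key)
    print(i, "->", c, "->", decrypt32(c, key))
```

Because a Feistel network is a permutation by construction, every 32-bit input has exactly one 32-bit output and no lookup table is needed.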
A simple one:
rand = new Random(k);
return (i xor rand.Next())
(the point of XORing with rand.Next() rather than with k is that otherwise, given i and E(i,k), you can get k by k = i xor E(i,k))
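A Python rendering of the snippet above might look like the following sketch. It assumes 32-bit integers and an integer key, and it uses `random.Random`, which is not a cryptographically secure generator.

```python
import random

MASK32 = 0xFFFFFFFF


def E(i: int, k: int) -> int:
    """XOR a 32-bit integer with the first 32-bit output of a PRNG seeded with k."""
    rand = random.Random(k)  # seeding with the key makes the mask reproducible
    return (i ^ rand.getrandbits(32)) & MASK32


def D(c: int, k: int) -> int:
    """XOR with the same mask is its own inverse."""
    return E(c, k)


k = 42
print(E(1, k), E(2, k), D(E(1, k), k))
```

Note that the mask is the same for every input under a given key, so E(i,k) xor E(j,k) equals i xor j: the key is hidden, but the relationship between inputs is not.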
Ayden is an algorithm that I developed. It is compact, fast and looks very secure. It is currently available for 32- and 64-bit integers. It is in the public domain and you can get it from http://github.com/msotoodeh/integer-encoder.
You could take an n-bit hash of your key (assuming it's private) and XOR that hash with the original integer to encrypt, and with the encrypted integer to decrypt.
Probably not cryptographically solid, but depending on your requirements, may be sufficient.
If you just want it to look random and don't care about security, how about just swapping bits around? You could simply reverse the bit string, so the high bit becomes the low bit, the second highest becomes the second lowest, etc., or you could do some other random permutation (e.g. 1 to 4, 2 to 7, 3 to 1, etc.).
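A bit reversal of this kind is short to write. The sketch below assumes 32-bit unsigned integers; `reverse_bits32` is a name chosen here for illustration.

```python
def reverse_bits32(i: int) -> int:
    """Reverse the order of the 32 bits of i (the high bit becomes the low bit)."""
    bits = f"{i & 0xFFFFFFFF:032b}"  # zero-padded 32-character binary string
    return int(bits[::-1], 2)


for i in (1, 2, 3):
    print(i, "->", reverse_bits32(i), "->", reverse_bits32(reverse_bits32(i)))
```

Reversal is its own inverse, so the same function both scrambles and restores the value.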
How about XORing it with a prime or two? Swapping bits around seems very random when trying to analyze it.
Try something along the lines of XORing it with a prime and itself after bit shifting.
How many integers do you want to encrypt? How much key data do you want to have to deal with?
If you have few items to encrypt, and you're willing to deal with key data that's just as long as the data you want to encrypt, then the one-time pad is super simple (just an XOR operation) and mathematically unbreakable.
The drawback is that the problem of keeping the key secret is about as large as the problem of keeping your data secret.
It also has the flaw (one that is run into time and again whenever someone decides to try to use it) that if you take any shortcuts, such as using a non-random key or, most commonly, using a limited-length key and recycling it, it becomes about the weakest cipher in existence. Well, maybe ROT13 is weaker.
But in all seriousness, if you're encrypting an integer, what are you going to do with the key no matter which cipher you decide on? Keeping the key secret will be a problem about as big (or bigger) than keeping the integer secret. And if you're encrypting a bunch of integers, just use a standard, peer reviewed cipher like you'll find in many crypto libraries.
RC4 will produce as little output as you want, since it's a stream cipher.
XOR it with /dev/random

Encryption algorithm that output byte by byte based on password and offset

Is there a well-known (and well-regarded) algorithm that can encrypt/decrypt any arbitrary byte inside a file based on the password entered and the byte's offset inside the file?
(Databyte, Offset, Password) => EncryptedByte
(EncryptedByte, Offset, Password) => DataByte
And is there some fundamental weakness in this approach, or is it still theoretically possible to make it strong enough?
Update:
More details: any cryptographic algorithm has an input and an output. Many existing ones operate on large blocks of input. I want to operate on only one byte, but a system that looks only at the byte itself can do no more than remap byte values and is weak by default. If we also take the position of the byte in the file into account, we could, for example, interpret the bits of the position value as the operation to apply at each step (0: xor, 1: shift) and produce the encrypted byte that way. But that is too simple; I'm looking for something stronger.
Maybe it's not very efficient but how about this:
for encryption use:
encryptedDataByte = Encrypt(offset,key) ^ dataByte
for decryption use:
dataByte = Encrypt(offset,key) ^ encryptedDataByte
Where Encrypt(offset,key) might be e.g. 3DES or AES (with padding the offset, if needed, and throwing away all but one result bytes)
If you can live with block sizes of 16 byte, you can try the XTS-mode described in the wikipedia article about Disk encryption theory (the advantage being that some good cryptologists already looked at it).
If you really need byte-wise encryption, I doubt that there is an established solution. In the conference Crypto 2009 there was a talk about How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle. In your case the domain is a byte, and as this is a power of 2, a Thorp Shuffle corresponds to a maximally unbalanced Feistel network. Maybe one can build something using the position and the password as key, but I'd be surprised if a home-made solution will be secure.
You can use AES in Counter Mode, where you divide your input into blocks of 16 bytes (128 bits) and then basically encrypt a counter based on the block number to get 16 pseudo-random bytes that you can XOR with the plaintext. It is critically important never to use the same counter start value (and/or initialization vector) with the same key again, or you will open yourself to an easy attack where an attacker can use a simple XOR of two ciphertexts to cancel out the keystream and recover information about the plaintexts.
You mention that you want to only operate on individual bytes, but this approach would give you that flexibility. Output Feedback Mode is another common one, but you have to be careful in its use.
You might consider using the EAX mode for better security. Also, make sure you're using something like PBKDF-2 or scrypt to generate your encryption key from the password.
However, as with most cryptography related issues, it's much better to use a rigorously tested and evaluated library rather than rolling your own.
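The danger of reusing a counter or IV can be shown without any AES library. In the sketch below the keystream comes from SHA-256 over key, nonce and block counter, which is only a stand-in for AES in counter mode; the names `keystream` and `xor_bytes` and the sample messages are invented for the demonstration.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream: hash of key, nonce and a running block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key, nonce = b"demo key", b"same nonce"  # the mistake: one nonce for two messages
p1 = b"attack at dawn"
p2 = b"retreat at ten"
c1 = xor_bytes(p1, keystream(key, nonce, len(p1)))
c2 = xor_bytes(p2, keystream(key, nonce, len(p2)))

# The keystream cancels out: the attacker learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))

# Knowing (or guessing) one plaintext then reveals the other.
print(xor_bytes(leak, p1))
```

With a fresh nonce per message the two keystreams differ and the XOR of the ciphertexts no longer cancels to the XOR of the plaintexts.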
Basically what you need to do is generate some value X (probably 1 byte) based on the offset and password, and use this to encrypt/decrypt the byte at that offset. We'll call it
X = f(offset,password)
The problem is that an attacker that "knows something" about the file contents (e.g. the file is English text, or a JPEG) can come up with an estimate (or sometimes be certain) of what an X could be. So he has a "rough idea" about many X values, and for each of these he knows what the offset is. There is a lot of information available.
Now, it would be nice if all that information were of little use to the attacker. For most purposes, using a cryptographic hash function (like SHA-1) will give you a reasonable assurance of decent security.
But I must stress that if this is something critical, consult an expert.
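A minimal sketch of X = f(offset, password) along these lines is shown below. It makes several assumptions not stated in the answers: the password is first stretched with PBKDF2, HMAC-SHA256 is used as the keyed function instead of a bare SHA-1 hash, and the salt and iteration count are demo values. Treat it as an illustration of the idea, not a reviewed design.

```python
import hashlib
import hmac


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a key (assumed parameters, demo only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def pad_byte(key: bytes, offset: int) -> int:
    """X = f(offset, key): first byte of HMAC-SHA256 over the 8-byte offset."""
    return hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()[0]


def crypt_byte(data_byte: int, offset: int, key: bytes) -> int:
    """Encrypts or decrypts one byte; XOR makes the operation its own inverse."""
    return data_byte ^ pad_byte(key, offset)


key = derive_key("my password", b"per-file salt")
data = b"hello, world"
encrypted = bytes(crypt_byte(b, off, key) for off, b in enumerate(data))
decrypted = bytes(crypt_byte(b, off, key) for off, b in enumerate(encrypted))
print(encrypted.hex(), decrypted)
```

Any single byte can be processed independently of the rest of the file, but rewriting a byte at the same offset under the same key reuses the same X, which is the keystream-reuse weakness discussed in the other answers.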
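One possibility is a one-time pad, possibly using the password to seed some pseudo-random number generator. One-time pads theoretically achieve perfect secrecy, but there are some caveats: strictly speaking, a pad generated from a password-seeded PRNG is a stream cipher rather than a true one-time pad, so the perfect-secrecy guarantee no longer applies. It should do what you're looking for, though.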

two-way keyed encryption/hash algorithm

I am in no way experienced in this type of thing, so I am not even sure of the keywords (hence the title).
Basically I need a two way function
encrypt(w,x,y) = z
decrypt(z) = w, x, y
Where w = integer
x = string (username)
y = unix timestamp
and z = an 8 digit number (possibly including letters, spec isn't there yet.)
I would like z to be not easily guessable and easily verifiable. Speed isn't a huge concern, security isn't either. Tracking one-to-one relationship is the main requirement.
Any resources or direction would be appreciated.
EDIT
Thanks for the answers, learning a lot. So to clarify, 8 characters is the only hard requirement, along with the ability to link w <-> z. The username (x) and timestamp (y) would be considered icing on the cake.
I would like to do this mathematically rather than doing database lookups, if possible.
If I had to finish this tonight, I could just find a fitting hash algorithm and use a lookup table. I am simply trying to expand my understanding of this type of thing and see if I could do it mathematically.
Encryption vs. Hashing
This is an encryption problem, since the original information needs to be recovered. The quality of a cryptographic hash is judged by how difficult it is to reverse the hash and recover the original information, so hashing is not applicable here.
To perform encryption, some key material is needed. There are many encryption algorithms, but they fall into two main groups: symmetric and asymmetric.
Application
The application here isn't clear. But if you are "encrypting" some information and sending it somewhere, then later getting it back and doing something with it, symmetric encryption is the way to go. For example, say you want to encode a user name, an IP address, and some identifier from your application in a parameter that you include in a link in some HTML. When the user clicks the link, that parameter is passed back to your application and you decode it to recover the original information. That's a great fit for symmetric encryption, because the sender and the recipient are the same party, and key exchange is a no-op.
Background
In symmetric encryption, the sender and recipient need to know the same key, but keep it secret from everyone else. As a simple example, two people could meet in person, and decide on a password. Later on, they could use that password to keep their email to each other private. However, anyone who overhears the password exchange will be able to spy on them; the exchange has to happen over a secure channel... but if you had a secure channel to begin with, you wouldn't need to exchange a new password.
In asymmetric encryption, each party creates a pair of keys. One is public, and can be freely distributed to anyone who wants to send a private message. The other is private. Only the message recipient knows that private key.
A big advantage to symmetric encryption is that it is fast. All well-designed protocols use a symmetric algorithm to encrypt large amounts of data. The downside is that it can be difficult to exchange keys securely—what if you can't "meet up" (virtually or physically) in a secure place to agree on a password?
Since public keys can be freely shared, two people can exchange a private message over an insecure channel without having previously agreed on a key. However, asymmetric encryption is much slower, so it's usually used to encrypt a symmetric key or perform "key agreement" for a symmetric cipher. SSL and most cryptographic protocols go through a handshake where asymmetric encryption is used to set up a symmetric key, which is used to protect the rest of the conversation.
You just need to encrypt a serialization of (w, x, y) with a secret key. Use the same secret key to decrypt it.
In any case, the size of z cannot be simply bounded like you did, since it depends on the size of the serialization (since it needs to be two way, there's a bound on the compression you can do, depending on the entropy).
And you are not looking for a hash function, since it would obviously lose some information and you wouldn't be able to reverse it.
EDIT: Since the size of z is a hard limit, you need to restrict the input to 8 bytes and choose an encryption technique that uses a block size of 64 bits (or less). Blowfish and Triple DES use 64-bit blocks, but remember that those algorithms didn't receive the same scrutiny as AES.
If you want something really simple and quite insecure, just XOR your input with a secret key.
You probably can't.
Let's say that w is 32 bits, x supports at least 8 case-insensitive ASCII chars, so at least 37 bits, and y is 32 bits (gets you to 2038, and 31 bits doesn't even get you to now).
So, that's a total of at least 101 bits of data. You're trying to store it in an 8 digit number. It's mathematically impossible to create an invertible function from a larger set to a smaller set, so you'd need to store more than 12.5 bits per "digit".
Of course if you go to more than 8 characters, or if your characters are 16 bit unicode, then you're at least in with a chance.
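The counting argument above is easy to check numerically. The sketch below takes the 101-bit estimate from this answer as given and compares it with what 8 characters can carry for a few alphabet sizes; the alphabets listed are illustrative assumptions, since the spec for z is not fixed.

```python
import math

NEEDED_BITS = 32 + 37 + 32  # w + x + y, using the estimate from the answer


def capacity_bits(alphabet_size: int, length: int = 8) -> float:
    """Information capacity, in bits, of `length` symbols from the given alphabet."""
    return length * math.log2(alphabet_size)


for name, size in [
    ("decimal digits", 10),
    ("digits + lowercase letters", 36),
    ("digits + mixed-case letters", 62),
]:
    print(f"{name}: {capacity_bits(size):.1f} bits available, {NEEDED_BITS} needed")

# Bits each of the 8 symbols would have to carry for an invertible mapping.
print(NEEDED_BITS / 8)
```

Even a 62-symbol alphabet gives fewer than 48 bits in 8 characters, well short of what an invertible encoding of all three fields would need.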
Let's formalize your problem, to better study it.
Let k be a key from the set K of possible keys, and (w, x, y) a piece of information, from a set I, that we need to crypt. Let's define the set of "crypted-messages" as A^8, where A is the alphabet from which we extract the characters of our crypted message (A = {0, 1, ..., 9, a, b, ..., z, ... }, depending on your specs, as you said).
We define the two functions:
crypt: I * K --> A^8
decrypt: A^8 * K --> I
The problem here is that the size of the set A^8, of crypted-messages, might be smaller than the set of pieces of information (w, x, y). If this is so, it is simply impossible to achieve what you are looking for, unless we try something different...
Let's say that only YOU (or your server, or your application on your server) have to be able to calculate (w, x, y) from z. That is, you might send z to someone, and you don't care that they will not be able to decrypt it.
In this case, what you can do is use a database on your server. You will crypt the information using a well-known algorithm, then you generate a random number z. You define the table:
Id: char[8]
CryptedInformation: byte[]
You will then store z on the Id column, and the crypted information on the corresponding column.
When you need to decrypt the information, someone will give you z, the index of the crypted information, and then you can proceed to decryption.
However, if this works for you, you might not even need to crypt the information, you could have a table:
Id: char[8]
Integer: int
Username: char[]
Timestamp: DateTime
And use the same method, without crypting anything.
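The lookup-table approach can be sketched as follows, with an in-memory dictionary standing in for the database table. The names `issue_id`, `lookup` and `records`, and the choice of lowercase letters plus digits for the 8-character identifier, are assumptions made for this example.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
records = {}  # stand-in for the table: Id -> (Integer, Username, Timestamp)


def issue_id(w: int, username: str, timestamp: int) -> str:
    """Store the tuple under a fresh random 8-character id z and return z."""
    while True:
        z = "".join(secrets.choice(ALPHABET) for _ in range(8))
        if z not in records:  # retry on the (unlikely) collision
            records[z] = (w, username, timestamp)
            return z


def lookup(z: str):
    """Return the stored (w, username, timestamp), or None for an unknown id."""
    return records.get(z)


z = issue_id(7, "alice", 1_700_000_000)
print(z, lookup(z))
```

Since z is random rather than computed from the data, it reveals nothing about w, x or y, and it can only be verified by whoever holds the table.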
This can be applied to an "e-mail verification system" on a subscription process, for example. The link you would send to the user by mail would contain z.
Hope this helps.
I can't tell if you are trying to set this up as a way to store passwords, but if you are, you should not use a two-way hash function.
If you really want to do what you described, you should just concatenate the string and the timestamp (fill in extra spaces with underscores or something). Take that resulting string, convert it to ASCII or UTF-8 or something, and find its value modulo the largest prime less than 10^8.
Encryption or no encryption, I do not think it is possible to pack that much information into an 8 digit number in such a way that you will ever be able to get it out again.
An integer is 4 bytes. Let's assume your username is limited to 8 characters, and that characters are bytes. Then the timestamp is at least another 4 bytes. That's 16 bytes right there. In hex, that will take 32 digits. Base36 or something will be less, but it's not going to be anywhere near 8.
Hashes are by definition one-way only: once a value is hashed, it is very difficult to get the original value back again.
For two-way encryption I would look at TripleDES, which .NET has baked right in with TripleDESCryptoServiceProvider.
A fairly straight forward implementation article.
EDIT
It has been mentioned below that you can not cram a lot of information into a small encrypted value. However, for many (not all) situations this is exactly what Bit Masks exist to solve.

Resources