I need to write a simplified encryption API that can easily deal with symmetric encryption, using either a randomly generated key or a password-derived key.
The key derivation is performed with the PKCS5_PBKDF2_HMAC() function from the OpenSSL library, using EVP_sha256() as the hashing algorithm and a randomly generated 16-byte salt.
The symmetric encryption is performed with the OpenSSL EVP API.
My question is: how (in)secure is it to use the password derivation salt also as the IV for encryption?
The reason behind this question is that this will allow me to simplify the API and the output stream in the following way:
for the encryption routine, a user would have to provide either the password or the secret key; based on whichever is provided, the code can decide if a key needs to be derived from the password or use the provided key as it is;
similarly, for the decryption routine, a user would have to provide either the password or the secret key; based on whichever is provided, the key could be re-derived from the password and the IV, which is also acting as a password salt (and is put first in the output stream, right before the ciphertext);
the output stream will consist only of the IV concatenated with the ciphertext, eliminating a separate salt;
the output stream will have the same format for a randomly generated key or a password-derived key.
Note: the API automatically takes care of the salt/IV generation, which is randomly generated for each encryption session, so even if a password is reused, the key is guaranteed to be different.
Thank you in advance for your answers.
As it happens, I've run into pretty much exactly the same scenario while working on one of my own projects (where a message is encrypted in CBC-mode with a random IV, and the user can either specify a key or a textual password).
Similar questions are discussed here and here. To summarize: the purpose of an IV is to ensure that the ciphertext remains unique even if the key is reused. As long as you're generating a new IV per message like you said you are, the source of the key doesn't matter as much. Which means you're probably safe reusing the salt as the IV, as far as anyone knows right now. It doesn't even seem like it would make sense for it to be an issue, because the salt gets put through a cryptographic hash before deriving the key in a different way; as long as you use a good hashing function in PBKDF2 (i.e. SHA-256 as mentioned above), a key so derived is indistinguishable from one which was randomly generated, which in this case it might have been.
However, people uncover unexpected things in the world of cryptanalysis all the time, and straight-up reusing the same data in two places is considered A Bad Thing in principle even if we don't know of any practical problems right this minute. Should you actually be worried about this? At my level of knowledge on cryptanalysis, I'm somewhere between "maybe" and "I don't know," which is a little too much uncertainty for my tastes, so I'm going with the "technically safer" course of action, which is generating separate IV and salt values. Transmitting both the salt and the IV is a perfectly cromulent security practice, and you have nothing to lose if the user directly inputs the key and the salt goes unused.
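For what it's worth, here is a minimal sketch of the "separate salt and IV" layout, written in Python rather than against the C OpenSSL API from the question. hashlib.pbkdf2_hmac stands in for PKCS5_PBKDF2_HMAC(); the iteration count, the 32-byte key length, and the helper names derive_key, new_header and split_header are my own assumptions, and the actual cipher call is left out.

```python
import hashlib
import secrets

SALT_LEN = 16  # the 16-byte salt from the question
IV_LEN = 16    # AES block size, as used for a CBC IV


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


def new_header() -> bytes:
    """Fresh, independent salt and IV for one encryption session: salt || IV."""
    return secrets.token_bytes(SALT_LEN) + secrets.token_bytes(IV_LEN)


def split_header(stream: bytes) -> tuple[bytes, bytes, bytes]:
    """Split salt || IV || ciphertext back into its three parts."""
    salt = stream[:SALT_LEN]
    iv = stream[SALT_LEN:SALT_LEN + IV_LEN]
    return salt, iv, stream[SALT_LEN + IV_LEN:]
```

The output stream keeps one fixed format either way: when the caller supplies a raw key, the salt bytes are simply ignored on decryption, at a cost of 16 extra bytes per message.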
Hi everyone. I'm learning Laravel and I'm at the start of my journey. I was learning about encryption and decryption in Laravel today and then this thought came into my mind. It could be a stupid one, but I'd like to know the answers.
Let's say I make a database which stores sensitive information about users and I encrypt all the data before storing into the database, let's just say using the Encrypt class of Laravel. Now my questions:
If someone steals that database and luckily finds out that this information was encrypted using the techniques provided by Laravel or any other technique, can't that person decrypt all the data using the same decryption technique that was used to encrypt it? If this can be done, then what's the point of doing this encryption?
If that can be done then how can we make sure that our data is actually encrypted and is safe even if someone steals it?
Thank you guys!
I encrypted my data and then decrypted it, and I want to understand how that encrypted data is actually safe.
You might want to read up on the basics of encryption.
The common approach is that the technique by which you encrypt should be as open as possible - because the more people look at the algorithm, the less likely there might be bugs.
However, even if the algorithm is public, the key is not. Only people who have the key can decrypt properly encrypted data. This is true of the AES algorithm Laravel uses too.
The mathematics are complicated, but essentially the length of the key determines the amount of computer resources required to break the encryption.
The real-world example is that everyone knows how door locks work. There are millions of locks that all work in the same way, but only people who have a key can open the door.
So, if an attacker steals your database, they cannot read your content unless they also have the key, as long as the key length is sufficient.
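To put rough numbers on "the length of the key determines the amount of computer resources", here is a small back-of-the-envelope sketch in Python. The rate of 10^12 guesses per second is a purely hypothetical attacker speed chosen for illustration, and years_to_search is a made-up helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a key by trying, on average, half the keyspace."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical attacker speed: 1e12 guesses per second.
for bits in (56, 128, 256):
    print(f"{bits}-bit key: about {years_to_search(bits, 1e12):.3g} years")
```

Under that assumption a 56-bit key falls in a matter of hours, while a 128-bit key already takes billions of billions of years, which is why brute force is not a realistic threat to a properly generated 256-bit AES key.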
If someone steals that database and luckily finds out that this information was encrypted using the techniques provided by Laravel or any other technique, can't that person decrypt all the data using the same decryption technique that was used to encrypt it? If this can be done, then what's the point of doing this encryption?
If someone steals that database, they will still need the decryption key to decrypt it (that's why strong passwords are recommended), so even if they try to brute-force it, decryption is practically impossible.
From the way you're asking whether decrypting is as easy as encrypting, I think you may be thinking of an encoding like base64, which needs no key and is not encryption.
With AES, brute-forcing their way in is infeasible. Laravel's encrypt helper and Crypt class use AES-256-CBC, which is pretty good at that.
Then there is the Hash library, which provides one-way hashing (not encryption) using bcrypt: a hash can only be verified, not decrypted, so to brute-force it an attacker has to run every candidate password against each stored hash separately. Unlike unsalted MD5, which gives the same hash for the same input every time, bcrypt adds a random salt, so the same input produces a different hash each time.
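A small sketch of that last difference, in Python rather than PHP. bcrypt is not in the Python standard library, so salted PBKDF2-HMAC-SHA256 is used here as a stand-in; the function names and the iteration count are assumptions for illustration, not Laravel's implementation.

```python
import hashlib
import hmac
import secrets


def md5_hex(password: str) -> str:
    """Unsalted MD5: the same input always gives the same digest."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str, iterations: int = 100_000) -> tuple[bytes, bytes]:
    """Salted, deliberately slow hash; returns (salt, digest). Stand-in for bcrypt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 100_000) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

Hashing the same password twice with hash_password gives two different digests because the salts differ, yet verify_password accepts the right password for either one. With md5_hex, identical passwords are visible at a glance in a stolen table.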
Background story:
I'm trying to write my own logging library. It's for hobby purposes. There's one must for me: logged data must be encrypted asymmetrically. The log messages are always directly written into the file, no caching occurs, no waiting for any queue.
This means I'll have to encrypt bunch of small chunks of messages. Even though the bottleneck is probably going to be the lack of caching & IO operations, I'd like to choose the encryption algorithm wisely.
Summary:
I have to encrypt numerous small (<200 bytes) chunks of data
The algorithm MUST be asymmetric: I'd like to encrypt with the public key and only be able to decrypt with my very own private key
What algorithm do you suggest?
It seems that you're interpreting “logged data must be encrypted asymmetrically” literally as a low-level requirement. “Logged data must be encrypted asymmetrically” is not a security requirement, it's an implementation approach. It's a bad implementation approach, because it requires you to design your own cryptographic protocol (you can use standard primitives, but only in a non-standard way), and it would have annoying limitations.
A much more reasonable requirement is “the machine that produces the logs must not be able to decrypt them”. This is a security requirement: it is a requirement on an asset (the logs) concerning its security (specifically their confidentiality).
The way to implement this security requirement is indeed to use asymmetric encryption. But you don't take an asymmetric encryption primitive and pass the logs as input to that. Rather, you use hybrid encryption: generate a symmetric key, encrypt the logs with that, encrypt the symmetric key with the asymmetric key, and erase the symmetric key.
The best way to do this is to use a library that does it well. The crypto_box, crypto_box_easy and crypto_box_seal functions of NaCl or libsodium are the gold standards here. You pass the public key for encryption and the message to encrypt, and you get an encrypted “box” out which can only be decrypted with the private key. crypto_box_easy and crypto_box also take your own private key as an argument, to authenticate the logs, which you might not want in your toy example but is usually important in the real world. crypto_box_easy and crypto_box also take a nonce as an argument; this can be any value that can be public but that you must not use twice with the same keys, for example a random string of crypto_box_NONCEBYTES bytes.
If you don't want to use crypto_box, for example because you want to learn how it's done under the hood, you have to assemble the parts manually, using your chosen low-level cryptographic library. The flow is different depending on which flavor of asymmetric encryption you use. With a key establishment method such as ECIES:
1. Generate a random one-time private key y.
2. Calculate the corresponding public value g^y.
3. Using the recipient's public key g^x, calculate the shared secret g^xy.
4. Apply a key derivation function such as HKDF to the shared secret to deterministically generate a secret symmetric key, for example an AES key or a ChaCha20-Poly1305 key.
5. Use the secret symmetric key to encrypt the log message, for example using AES-GCM or ChaCha20-Poly1305.
6. Optionally, hash the log message and sign it with your private key.
7. Wipe the one-time private key, the shared secret, the secret symmetric key, the plaintext log, and any other intermediate value from memory.
8. Send the ciphertext, the public value g^y and optionally the signature.
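The key-agreement part of the steps above (the one-time key, the shared secret and the key derivation) can be sketched in Python as follows. This is a toy under loudly stated assumptions: it uses modular exponentiation in a small prime field instead of an elliptic curve, the group (P, G) is far too small for real use, the "log-encryption" label is made up, and hkdf_sha256 is a hand-written HKDF following RFC 5869 rather than a library call. The actual AES-GCM or ChaCha20-Poly1305 encryption is left out.

```python
import hashlib
import hmac
import secrets

# Toy group for illustration only: far too small for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3           # an arbitrary base


def hkdf_sha256(secret: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and the default all-zero salt."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                       # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def to_bytes(n: int) -> bytes:
    """Fixed-width encoding of a group element (values are below 2**127)."""
    return n.to_bytes(16, "big")


# Recipient's long-term key pair: private x, public g^x.
x = secrets.randbelow(P - 2) + 1
g_x = pow(G, x, P)

# Sender, per message: one-time private y, public value g^y, shared secret g^xy.
y = secrets.randbelow(P - 2) + 1
g_y = pow(G, y, P)
sender_key = hkdf_sha256(to_bytes(pow(g_x, y, P)), b"log-encryption")

# Recipient recomputes the same secret from the transmitted g^y and its own x.
recipient_key = hkdf_sha256(to_bytes(pow(g_y, x, P)), b"log-encryption")
assert sender_key == recipient_key
```

Only g^y travels with the ciphertext; the logging machine discards y and the derived key, so it cannot decrypt what it has already written.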
With a key encryption method such as RSA-OAEP:
1. Generate a random one-time symmetric key, for example an AES key or a ChaCha20-Poly1305 key.
2. Use the secret symmetric key to encrypt the log message, for example using AES-GCM or ChaCha20-Poly1305.
3. Optionally, hash the log message and sign it with your private key.
4. Encrypt the symmetric key using the recipient's public key.
5. Wipe the one-time secret key, the plaintext log, and any other intermediate value from memory.
6. Send the ciphertext, the encrypted symmetric key and optionally the signature.
Doing the steps manually may have a performance benefit if you decide that it's ok to encrypt multiple log messages with the same symmetric key (each message still needs its own nonce), because asymmetric operations are slower than symmetric operations. There is no long-term security drawback to doing this. The only security drawback is a short-term one: an attacker who breaches your system can still decrypt all the logs encrypted under the current symmetric key, since that key is still in memory. If you decide, for example, that it's ok if an attacker who breaches your system can read the last minute's logs, then you can renew the symmetric key every minute.
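A rough sketch of that renewal policy in Python. The RotatingKey class, its wrap_key callback (which would encrypt the new symmetric key with the recipient's public key) and the injectable clock are all hypothetical names for illustration; the symmetric encryption of the log messages themselves is not shown.

```python
import secrets
import time
from typing import Callable


class RotatingKey:
    """Hands out a symmetric key and replaces it once it is older than max_age seconds."""

    def __init__(self, wrap_key: Callable[[bytes], bytes], max_age: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._wrap_key = wrap_key  # hypothetical: public-key encryption of the new key
        self._max_age = max_age
        self._clock = clock
        self._key = b""
        self._created = 0.0
        self.wrapped_keys: list[bytes] = []  # what would be written to the log stream
        self._renew()

    def _renew(self) -> None:
        self._key = secrets.token_bytes(32)
        self._created = self._clock()
        # One asymmetric operation per rotation, not per log message.
        self.wrapped_keys.append(self._wrap_key(self._key))

    def current(self) -> bytes:
        """Return the key to use for the next log message, renewing it if it expired."""
        if self._clock() - self._created >= self._max_age:
            self._renew()
        return self._key
```

Note that Python gives no reliable way to wipe the old key from memory when it is replaced; in a language with manual memory control you would zero the buffer at rotation time.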
I have a very large block of code (it takes a few seconds to encrypt).
I use KeyA to encrypt it.
Later in the process, I receive a key (not necessarily KeyA)...
but I don't need to open the block yet;
what I really need is to validate that this is really the key that will open the code correctly.
I assumed I could keep a known block and encrypt it,
and, in order to validate the key, only open that block, but it feels like this weakens the cryptography (brute force becomes easier, and one can learn a few things about the key's properties).
Does my approach really weaken the cipher? Why or why not?
Is there a different way to check that a key matches without opening the whole block?
I am assuming you are using Symmetric-Key Cryptography (the kind where the key used to decrypt the file is the same as the one used to encrypt it).
If the cipher is vulnerable to a Known-Plaintext Attack, then the known block of plaintext may reveal information about the key. The legacy stream cipher used for ZIP files suffered from this problem. Because ZIPs are compressed, it was difficult to guess enough plaintext, but the checksum used to verify passwords (among other factors) helped provide sufficient plaintext for a practical attack.
In principle you could publicize the hash of KeyA (assuming that the hash algorithm is strong enough that it cannot be reversed, and that the hash algorithm isn't also used internally by the cipher). This would allow you to quickly reject invalid keys without changing the way the message is encrypted.
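A minimal sketch of that idea in Python, assuming SHA-256 as the hash and a key made of random bytes; key_fingerprint and key_matches are names I made up for illustration.

```python
import hashlib
import hmac


def key_fingerprint(key: bytes) -> bytes:
    """Value that can be stored next to the ciphertext (assumes SHA-256)."""
    return hashlib.sha256(key).digest()


def key_matches(candidate: bytes, stored_fingerprint: bytes) -> bool:
    """Reject a wrong key without touching the encrypted block."""
    return hmac.compare_digest(key_fingerprint(candidate), stored_fingerprint)
```

This is only reasonable when the key is long and random. If the key is really a password, a published hash gives an attacker a fast way to test guesses offline.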
Taking this idea further, you could use a Message authentication code such as HMAC. A message authentication code will validate that the message (in this case your very large block of code, or perhaps just its file path) has not been tampered with, as well as validating that the key is correct.
If you are concerned that this will make brute force easier or expose properties of the key, you could split the key into two parts. The first part of the key could be purely for validation, and the second part purely for decryption. e.g. MyKey = AuthenticationPart,DecryptionPart
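Here is how the split-key variant could look in Python, using HMAC-SHA256 from the standard library. The 64-byte layout (32 bytes for each part) and the function names are my own assumptions; the decryption part would be handed to whatever cipher you use, which is not shown.

```python
import hashlib
import hmac
import secrets

PART_LEN = 32


def new_key() -> bytes:
    """MyKey = AuthenticationPart || DecryptionPart, each 32 random bytes."""
    return secrets.token_bytes(2 * PART_LEN)


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Return (authentication_part, decryption_part)."""
    if len(key) != 2 * PART_LEN:
        raise ValueError("expected a 64-byte key")
    return key[:PART_LEN], key[PART_LEN:]


def make_tag(key: bytes, message: bytes) -> bytes:
    """HMAC over the message, keyed with the authentication part only."""
    auth_part, _ = split_key(key)
    return hmac.new(auth_part, message, hashlib.sha256).digest()


def check_key(key: bytes, message: bytes, tag: bytes) -> bool:
    """True if the key's authentication part reproduces the stored tag."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

The message can be the encrypted block itself (which also detects tampering, at the cost of reading it all) or something short such as its file path. One limitation of the split: a key with the right authentication part but a different decryption part would still pass the check.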
(Disclaimer: This is based on my very incomplete understanding of crypto. You might get better responses from the experts on security.stackexchange.com and/or crypto.stackexchange.com)
To start, I am trying to encrypt very sensitive information on a public website. Users will be able to update their information, and administrators will need access to this information. I am worried that if the encrypted data is somehow compromised, then everyone's information would be as well, because they all use the same salt and key.
I know that using a salt and a key is always preferred. But, as mentioned above, if they reverse engineer the encrypted data, what use is it?
My solution is to have the keys and salts stored in a DB with many rows and columns, any of which can be used for the salt or key. I would have an algorithm that uses "something" fixed in the user's account to figure out which salt and key to use. This way, statistically speaking, no two users will have the same combination of salt and key.
Is this overkill, or good?
I question the value of this second database that holds keys and salts. Consider:
The "something" in the user's data that identifies the salt and key will necessarily have to be encrypted differently from the rest of the user's data. Otherwise, you wouldn't be able to get it without first already having it.
Statistical analysis of the encrypted user data would almost certainly discover that the "something" is encrypted differently. That will be like waving a red flag at a bull, and an attacker will concentrate on figuring out why that's different.
You can assume that if an attacker can get the database of encrypted user information, he can also get the database of salts and keys.
Given that, there are two possible scenarios:
1. The encryption of the "something" that identifies the key and salt is unbreakable. That is, it's so good that the attacker's best efforts fail to reveal the connection between that "something" and the key/salt database.
2. The attacker discovers the encryption of the "something," and therefore is able to decrypt your sensitive data.
If #1 is the case, then you probably should use that encryption algorithm for all of your user data. Why do something in two steps when you can do it just as effectively in one?
If #2 is the case, then all the work you've done just put up a little bump in the road for the attacker.
So, short answer: what you propose looks like either unnecessary work or ineffective road blocking. In either case, it looks to me like a whole lot of work and added complexity for no appreciable gain.
That said, I could have misinterpreted your brief description. If so, please correct me.
I'm using an OpenSSL cipher in Ruby to send text between a client and server and apparently it's a good idea to employ an IV, but for decryption on the server-side, I'm going to need that IV which was generated client-side. My question is will I run into problems sending the IV over the network? I don't know the first thing about cryptography, so I have no idea whether the IV can be used to decrypt the message or not.
The IV is public information; it's totally fine to send it over the network. However, you should use a fresh, cryptographically secure random IV for every single encryption, especially if you are using CBC mode. Using a predictable IV in a situation like that leaves your encryption vulnerable to certain kinds of attacks.
If you are completely new to cryptography and to using Cipher, have a look at the docs; we added some information there that should help you get started. It illustrates some best practices, among them handling the IV correctly.
The IV ensures that even if you were to encrypt two identical plaintexts using the same key, they produce distinct ciphertexts (because a new, randomly generated IV should be used for every encryption).
The IV cannot be used to decrypt the message without the key, and does not need to be transmitted securely, so it can be safely sent over the network along with the encrypted message.
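To see both points in action, here is a small Python sketch (not Ruby, and not the OpenSSL cipher from the question). The cipher is a toy I put together for illustration only: a keystream built from HMAC-SHA256 over the IV and a counter, XORed with the plaintext, with no authentication. The point is only the handling of the IV: a fresh random one per message, sent in the clear in front of the ciphertext.

```python
import hashlib
import hmac
import secrets

IV_LEN = 16


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, iv || counter), block by block."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return IV || ciphertext, with a fresh random IV for every call."""
    iv = secrets.token_bytes(IV_LEN)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, iv, len(plaintext))))
    return iv + body


def decrypt(key: bytes, message: bytes) -> bytes:
    """Read the IV from the front of the message and undo the XOR."""
    iv, body = message[:IV_LEN], message[IV_LEN:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, iv, len(body))))
```

Encrypting the same text twice with the same key gives two different messages, and the server only needs the shared key plus the IV it finds at the front of each message. Someone who sees the IV but lacks the key cannot reproduce the keystream.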