I wanted to encrypt a string from an input using the Encryption Library in CodeIgniter. I wanted it to generate 32 characters regardless of how long the input is, but the number of characters generated by encrypt() depends on how many characters the input...
If you could encrypt any string down to 32 characters, 50 gigabyte games and 8K three hour movies could be compressed down to 32 characters. Obviously, that's not possible.
Consider an MD5 or SHA1 hash of the string. It won't be decryptable, and it isn't guaranteed to be unique, but it will have a fixed, predictable length.
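As a rough illustration of the fixed-length point (in Python rather than CodeIgniter, and with a made-up helper name, fixed_length_id), an MD5 hex digest is always 32 characters no matter how long the input is:

```python
import hashlib


def fixed_length_id(text: str) -> str:
    """Return the MD5 hex digest of text: always 32 hex characters.

    fixed_length_id is a hypothetical helper name used for illustration.
    The result is a one-way hash, so the input cannot be recovered from it.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(len(fixed_length_id("short")))           # 32
    print(len(fixed_length_id("x" * 1_000_000)))   # 32
```

If the value has to be decrypted again later, a hash like this is not an option; it only works as a fingerprint.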
Related
I plan to build a unique file name based on its content. For example, by its SHA256 hash. Files with the same content must have the same name.
The easiest way is to convert the hash to a hex string. The file name will then be 32 bytes * 2 = 64 characters long. That is a pretty long name to work with. How can I make it shorter?
I implemented a sort of "Base32" coding: an alphabet string that includes the digits and 22 letters. I use only five bits of every byte to build a file name of 32 characters. Much better.
I am looking for a balance between file name length and low collision probability. If the number of files is expected to be less than 500K, how long should the filename be? 8? 16? 24? 32?
Is there any recommended method to build short unique filenames at all?
If you use an N-bit cryptographic hash on M files, you can estimate the probability of at least one collision to be M^2/2^(N+1).
For 500K files, that's about 1/2^(N-37).
Using base32, 16 chars gives probability of collision 1/2^43 -- a few trillion to 1 odds.
If that won't do, then 24 chars gives 1/2^83.
If you're willing to check them all and re-generate on collision, then 8 chars is fine.
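The estimate above is easy to check numerically. This is only a sketch of the M^2/2^(N+1) approximation; the function names are mine, and each base32 character is assumed to carry 5 bits of the hash:

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate M^2 / 2^(N+1) for at least one collision."""
    return num_files ** 2 / 2 ** (hash_bits + 1)


def collision_exponent(num_files: int, hash_bits: int) -> float:
    """Return x such that the estimated probability is 1 / 2^x."""
    return -math.log2(collision_probability(num_files, hash_bits))


if __name__ == "__main__":
    files = 500_000
    for chars in (8, 16, 24, 32):
        bits = chars * 5  # base32: 5 bits per character
        p = collision_probability(files, bits)
        print(f"{chars} chars ({bits} bits): p ~ {p:.3g}")
```

With 8 characters (40 bits) the estimate is already around 10%, which is why that length only makes sense if you check for collisions and re-generate.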
The number of collisions depends on the content of the files, the hash algorithm and the length of the hash.
In general: The longer the hash-value is the less likely are collisions (if your content does not especially provoke collisions).
You cannot avoid the possibility of collisions unless you use the content as file-name (or a lossless compression of it).
To shorten the filenames you could allow more different characters in the file name. (But be aware of which characters your OS allows and which you are willing to use.)
I would go for a kind of base32 encoding to avoid problems with filesystems that do not distinguish between upper and lower case characters.
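A minimal sketch of the base32 idea, assuming Python and the standard RFC 4648 alphabet (A-Z plus 2-7, which differs from the custom "digits and 22 letters" alphabet in the question). The helper name short_name and the default length of 16 are my own choices:

```python
import base64
import hashlib


def short_name(content: bytes, length: int = 16) -> str:
    """Build a case-insensitive file name from the SHA-256 of the content.

    Each base32 character carries 5 bits, so `length` characters keep
    5 * length bits of the hash. Same content always gives the same name.
    """
    if not 1 <= length <= 52:  # a 32-byte digest yields 52 base32 characters
        raise ValueError("length must be between 1 and 52")
    digest = hashlib.sha256(content).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[:length]


if __name__ == "__main__":
    print(short_name(b"hello world"))
    print(short_name(b"hello world", length=24))
```

Lower-casing is safe here because the alphabet contains no letters that differ only by case.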
I use STANDARD_HASH in the manner below to hash credit card numbers. It returns hashes with 40 characters. This seems excessive for credit card numbers which have 16 digits. I would like to save space in my export. How can I create shorter hashes while still achieving these goals:
1. Have the same level of security and non-reversibility as STANDARD_HASH
2. Keep the likelihood of two card numbers receiving the same hash very small (though if this happens a few times, it's OK)
3. Have the shortest possible hash result in terms of characters or space required when exporting to a CSV
4. Perform this operation while using as few database resources as possible
5. Perform this operation using read-only access to the database
If a method exists which achieves goals 2 and 3, then I expect that goal 1 could be achieved by using this method to hash the output of STANDARD_HASH.
SELECT STANDARD_HASH(TRIM(' 123456789123456789 ' )) FROM DUAL;
TRIM removes the spaces and then STANDARD_HASH returns a hash that displays as 40 characters.
Here's the same example on db<>fiddle:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=7cd086f1b60f69eb3bc6f54d4a211844
The database version is "Oracle Database 18c Enterprise Edition".
That length of 40 is not the length of the result, just how it is displayed. STANDARD_HASH returns a RAW value (20 bytes for SHA1), which is displayed as hexadecimal.
You can convert this raw value into something usable using the UTL_RAW functions at https://docs.oracle.com/database/121/TTPLP/u_raw.htm#TTPLP71498
E.g.
SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM(' 123456789123456789 ' ))) FROM DUAL;
Note that when you try this in the fiddle, you’ll find a few ? that represent non-printable characters, so allow for that in your export.
Edit to add: STANDARD_HASH uses SHA1 by default, but SHA1 and MD5 have known vulnerabilities. It is better to just add the extra parameter to STANDARD_HASH to use a longer SHA; see https://docs.oracle.com/database/121/SQLRF/functions183.htm#SQLRF55647
SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM(' 123456789123456789 ' ), 'SHA256')) FROM DUAL;
Edit to address the 5 points :
1. It uses the same STANDARD_HASH, so the security is the same.
2. SHA1 is prone to collisions, so as above swap to SHA256 or higher.
3. STANDARD_HASH uses industry-standard hashing algorithms. It is what it is. Be aware that by its very nature, hashing returns binary values, so it is your responsibility to convert them to an appropriate format, e.g. for CSV files you can convert to Base64 (see Base64 encoding and decoding in oracle).
4. and 5. No additional resources.
Edit to respond to additional comments:
Yes, the full SELECT you stated looks correct:
select utl_raw.cast_to_varchar2(utl_encode.base64_encode(
STANDARD_HASH(TRIM(' 123456789123456789 ' ), 'SHA1'))) FROM dual;
Base64 operates on groups of 3 bytes at a time, and appends "=" for each byte the final group is short. SHA1 hashes are always 20 bytes, so the output is always 1 byte short and always ends in a single "=".
So offhand, you COULD trim that trailing "=" off, though I would advise against it (lean code beats premature optimisation). For example, if you subsequently decided to switch from SHA1 to an algorithm that generates hashes with a different number of bytes, the padding could change: SHA256 (32 bytes) still ends in one "=", but SHA384 (48 bytes) has none and SHA512 (64 bytes) has two, so weird bugs await.
Yes, "+" and "/" are valid characters in the Base64 output (along with 0-9 and upper- and lower-case letters, hence 64 characters in all, plus the "="), but importantly commas and double quotes are not, so yes, Base64 strings are safe to go into a CSV format.
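To see how the "=" padding depends on the digest size, here is a small check in Python (not Oracle). The function name padding_count is made up for this sketch, and the sample card number is just the one from the question:

```python
import base64
import hashlib


def padding_count(algorithm: str, message: bytes = b"123456789123456789") -> int:
    """Count the trailing '=' characters in the Base64 form of a digest."""
    digest = hashlib.new(algorithm, message).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha384", "sha512"):
        print(name, padding_count(name))
```

The count depends only on the digest length modulo 3, not on the message, which is why stripping the padding is only safe while the algorithm never changes.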
FYI, a quick summary of Base64 (since I guess that you, like me, always like to have an overview of what you're dealing with):
Base64 is used to translate a stream of binary data into printable strings. Now 3 bytes of binary data is 24 bits, which of course can be regarded as 4 lots of 6-bits (we can ignore the byte boundaries). Any collection of 6 bits has 2^6 = 64 possible values (hence the Base64 name), which are represented as 64 characters :
Upper-case letters
Lower case letters (so yes, case-sensitive).
digits 0-9
"+" and "/"
Hence each character in the Base64 output represents the next 6 bits of the binary data.
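The 3-bytes-to-4-characters grouping described above can be written out by hand. This is a teaching sketch only (in practice you would call a library routine); base64_encode_manual is a name I made up, and it is compared against Python's base64 module:

```python
import base64
import string

# The 64 symbols: upper-case, lower-case, digits, then "+" and "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_encode_manual(data: bytes) -> str:
    """Encode bytes as Base64 by slicing each 24-bit group into 6-bit values."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short in the last group
        # Pad with zero bytes so the group is a full 24 bits
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Four 6-bit indices, most significant first
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that hold only padding bits become "="
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


if __name__ == "__main__":
    print(base64_encode_manual(b"Man"))                   # TWFu
    print(base64.b64encode(b"Man").decode("ascii"))       # TWFu
```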
Is there a way to compress/encode a string to a specified length (8/10 characters)?
I have a combination of a secret key and a 16-digit numeric value, and I want to create a unique ID from the combination of both. Its length should be between 8 and 12 characters, and it should not change if the combination is the same.
Please suggest a way.
If it's 16 decimal digits and your string can contain any characters, then sure. If you want ten characters out, then you'd need 40 different characters: 40^10 > 10^16. Or for nine characters out, you need 60 different characters: 60^9 > 10^16. E.g. some subset of the upper case letters, lower case letters, and digits (62 to choose 40 or 60 from). Then it is simply a matter of base conversion either way. Convert from base 10 to base 40 or 60, and then back.
Many languages already have Base-64 coding routines, which will get you to nine characters.
Eight is a problem, since you would need 100 characters (100^8 == 10^16), and there are only 95 printable ASCII characters.
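The base conversion could look roughly like this in Python. I am assuming a 62-character alphabet (digits plus both letter cases) and a fixed width of nine characters, which is enough because 62^9 > 10^16; the function names are made up, and the secret key from the question is not involved here:

```python
import string

# 62 symbols: digits, upper-case letters, lower-case letters
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode_number(value: int, width: int = 9) -> str:
    """Convert a number below 10**16 to a fixed-width base-62 string."""
    if not 0 <= value < 10 ** 16:
        raise ValueError("value must have at most 16 decimal digits")
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_number(text: str) -> int:
    """Convert a base-62 string back to the original number."""
    value = 0
    for ch in text:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value


if __name__ == "__main__":
    code = encode_number(1234567890123456)
    print(code, decode_number(code))
```

This is a plain re-encoding, so anyone who knows the alphabet can reverse it.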
You could use a secure hash function, like sha512, and truncate the resulting hex string to the desired length.
If you want more entropy per character, you can Base64-encode the raw digest before truncating instead of using the hex string (6 bits per character instead of 4).
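A sketch of the hash-and-truncate approach in Python. Note the substitutions: I use HMAC-SHA512 to mix in the secret key (rather than concatenating key and value, which the answer does not specify) and the URL-safe Base64 alphabet; short_id and its parameters are hypothetical names:

```python
import base64
import hashlib
import hmac


def short_id(secret_key: bytes, number: str, length: int = 10) -> str:
    """Derive a stable short ID from a secret key and a numeric string.

    The same (key, number) pair always yields the same ID. The ID is not
    reversible, and truncation means collisions are unlikely but possible.
    """
    if not 1 <= length <= 86:  # a 64-byte digest gives 86 unpadded characters
        raise ValueError("length must be between 1 and 86")
    digest = hmac.new(secret_key, number.encode("ascii"), hashlib.sha512).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    print(short_id(b"my-secret-key", "1234567890123456"))
    print(short_id(b"my-secret-key", "1234567890123456", length=12))
```

At 10 Base64 characters the ID keeps 60 bits of the digest, so whether that is enough depends on how many IDs you expect to issue.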
I need to hash a message into a string of 30 chars. What's the best and most secure hash function for this usage?
Thirty characters (bytes) is 240 bits.
If you can't move the goal-post to allow 32 characters, then you will probably end up using SHA-1, which generates 160-bits or 20 bytes. When Base-64 encoded, that will be 28 characters. If you use a hex-encoding, it will be 40 characters, which is nominally out of range. With 32 characters, you could use SHA-256, but Base-64 encoding would increase that size (to 44 characters) and hex-encoding increases the size to 64 characters.
If you must use hex-encoding and can go to 32 bytes, then MD5 - which generates 128 bits - could be used, though it is not recommended for any new systems. With Base-64 encoding, MD5 uses 24 characters. Otherwise, you are using very minimally secure algorithms - not recommended at all.
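The lengths quoted above can be confirmed with a few lines of Python; encoded_lengths is a made-up helper and the message is arbitrary, since digest length does not depend on the input:

```python
import base64
import hashlib


def encoded_lengths(algorithm: str):
    """Return (hex length, Base64 length) of a digest for the given algorithm."""
    digest = hashlib.new(algorithm, b"your message").digest()
    return len(digest.hex()), len(base64.b64encode(digest))


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256"):
        hex_len, b64_len = encoded_lengths(name)
        print(f"{name}: {hex_len} hex chars, {b64_len} Base64 chars")
```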
Just use SHA1 and trim to 30 characters.
import hashlib
# sha1() needs bytes in Python 3; keep the first 30 hex characters (120 bits)
digest = hashlib.sha1(b"your message").hexdigest()[:30]
Cutting characters off the output of a cryptographically secure hash function is generally regarded as safe: the remaining bits are still well distributed, and the strength simply drops to that of the shorter length, here 120 bits (can't find the reference now though).
I want to encrypt some info for a licensing system and I want the result to be able to be typed in by the user.
Update: This operation must be reversible (decrypt-able)
E.g.,
Encrypt ( ComputerID+ProductID) -> (any standard ASCII character that can be typed. Ideally maybe even just A-Z).
So far what I did was to convert the encrypted text to HEX (so it's any character from 0-F) but that doubles the number of characters.
I'm using VB6.
I'm thinking I'd do some operation on each pair of (Input$(x) and Key$(x)) and then do a MOD to keep it within a range of ascii values (maybe 0-9-A-Z)
Any suggestions of a good algorithm?
Look into Base64 "encryption."
Base64 encodes binary data using 64 different ASCII characters, versus hex, which uses only 16. That makes Base64 more compact, which is what you are looking for.
EDIT:
Code to do this in VB6 is available here: http://www.nonhostile.com/howto-encode-decode-base64-vb6.asp
Per Fuzzy Lollipop, below, Base32 looks like an even better option. Bonus points if you can find an example of that.
EDIT: I found an example of Base32 for VB6 although I've not tried it yet. -Clay
Encode the encrypted bytes in hex, Base32 or Base64.
Do you want this to be reversible -- to recover the IDs from the encrypted text? If so then it matters how you combine the key and input strings.
Usually you'd XOR each byte pair (work with byte arrays to avoid Unicode issues), cycling through the key string if it's shorter than the input. You can then use Base N encoding (32, 64 etc.) to generate the license string.
Both operations are reversible: you can recover the XORed strings from the Base N string, then XOR with the key again to get the original IDs.
If you don't care about reversing the operations, then any convolution of key and ID will do. XOR is just the simplest.
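Here is the XOR-then-Base32 round trip sketched in Python rather than VB6. The function names and the sample IDs are invented for illustration, the key is assumed to be non-empty, and a repeating-key XOR is only light obfuscation, not strong encryption:

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def make_license(ids: str, key: str) -> str:
    """XOR the IDs with the key, then Base32-encode into typeable text."""
    mixed = xor_bytes(ids.encode("utf-8"), key.encode("utf-8"))
    return base64.b32encode(mixed).decode("ascii").rstrip("=")


def read_license(code: str, key: str) -> str:
    """Reverse make_license: Base32-decode, then XOR with the same key."""
    padded = code + "=" * (-len(code) % 8)  # restore the stripped padding
    mixed = base64.b32decode(padded)
    return xor_bytes(mixed, key.encode("utf-8")).decode("utf-8")


if __name__ == "__main__":
    code = make_license("PC123-PROD7", "secret")
    print(code)
    print(read_license(code, "secret"))
```

The Base32 output uses only A-Z and 2-7, so it is case-insensitive to type, at the cost of being longer than Base64.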