Derive password string from random bytes - algorithm

I have 32 bytes. I need to derive from them a password string (which will hopefully work on most websites), given certain restrictions.
All characters must be in one of { A-Z, a-z, 0-9, !##$% }.
The string will have at least two characters from each of the above sets.
The string must be exactly 15 characters long.
Currently I'm using the bytes to seed a non-cryptographically-secure PRNG, which I'm then using to:
get two random characters from each of the sets and push them.
fill the rest of the string with randomly chosen characters from any of the sets.
shuffle the string.
Is this valid, and is there a simpler way?
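For illustration, a minimal JavaScript sketch of the procedure described above (the function names and the use of Node's crypto module for the 32 source bytes are assumptions, not part of the question); note that reducing a byte with a plain modulo is slightly biased:

const crypto = require('crypto');

const SETS = [
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    'abcdefghijklmnopqrstuvwxyz',
    '0123456789',
    '!#$%'                        // special characters roughly as listed in the question
];

// Map one byte to an index below n (plain modulo: simple, but slightly biased).
const pick = (byte, n) => byte % n;

function derivePassword(bytes) {          // bytes: a 32-byte Buffer or Uint8Array
    let i = 0;
    const next = () => bytes[i++];
    const chars = [];
    for (const set of SETS) {             // two characters from each required set
        chars.push(set[pick(next(), set.length)]);
        chars.push(set[pick(next(), set.length)]);
    }
    const all = SETS.join('');
    while (chars.length < 15) {           // fill to exactly 15 characters
        chars.push(all[pick(next(), all.length)]);
    }
    for (let j = chars.length - 1; j > 0; j--) {   // Fisher-Yates shuffle
        const k = pick(next(), j + 1);
        [chars[j], chars[k]] = [chars[k], chars[j]];
    }
    return chars.join('');
}

console.log(derivePassword(crypto.randomBytes(32)));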

Related

How to have a 128-bits/32 character encryption in codeigniter?

I wanted to encrypt a string from an input using the Encryption Library in CodeIgniter. I wanted it to generate 32 characters regardless of how long the input is, but the number of characters generated by encrypt() depends on how many characters the input...
If you could encrypt any string down to 32 characters, 50 gigabyte games and 8K three hour movies could be compressed down to 32 characters. Obviously, that's not possible.
Consider an MD5 or SHA1 hash of the string. It won't be decryptable, and it won't be guaranteed to be unique, but it'll be a fixed, predictable length.
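As a minimal Node.js sketch of that idea (the helper name is an assumption): an MD5 digest is always 128 bits, i.e. 32 hex characters, regardless of the input length:

const crypto = require('crypto');

// The digest length depends only on the algorithm, never on the input.
function fixedLengthId(input) {
    return crypto.createHash('md5').update(input).digest('hex');  // always 32 hex characters
}

console.log(fixedLengthId('short'));
console.log(fixedLengthId('a much, much longer input string'));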

Encode string to an specified length using any algo

Is there a way to compress/encode a string to a specified length (8-10 characters)?
I have a combination of a secret key and a 16-digit numeric value, and I want to create a unique ID from the combination of both. Its length should be between 8 and 12 characters, and it should not change if the combination is the same.
Please suggest a way.
If it's 16 decimal digits and your string can contain any characters, then sure. If you want ten characters out, then you'd need 40 different characters: 40^10 > 10^16. Or for nine characters out, you need 60 different characters: 60^9 > 10^16. E.g. some subset of the upper case letters, lower case letters, and digits (62 to choose 40 or 60 from). Then it is simply a matter of base conversion either way. Convert from base 10 to base 40 or 60, and then back.
Many languages already have Base-64 coding routines, which will get you to nine characters.
Eight is a problem, since you would need 100 characters (100^8 == 10^16), and there are only 95 printable ASCII characters.
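A minimal JavaScript sketch of the base-conversion idea (the 60-character alphabet and the function name are assumptions):

// Re-encode a 16-digit decimal value in base 60 using a 60-character subset of [A-Za-z0-9].
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567';  // 60 characters

function encodeBase60(decimalString) {
    let n = BigInt(decimalString);
    const base = BigInt(ALPHABET.length);
    let out = '';
    while (n > 0n) {
        out = ALPHABET[Number(n % base)] + out;   // emit digits most significant first
        n /= base;                                // BigInt division truncates
    }
    return out || ALPHABET[0];
}

console.log(encodeBase60('1234567890123456'));    // at most nine characters, since 60^9 > 10^16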
You could use a secure hash function, like sha512, and truncate the resulting hex string to the desired length.
If you want slightly more entropy, you can base64 encode it before truncating.
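A minimal Node.js sketch of the hash-and-truncate approach (the helper name and the key/value separator are assumptions):

const crypto = require('crypto');

function shortId(secretKey, numericValue, length = 10) {
    // digest('hex') gives 128 hex characters; slicing keeps a stable prefix of the desired length.
    // Using digest('base64') instead packs more entropy into the same number of characters.
    return crypto.createHash('sha512')
        .update(secretKey + ':' + numericValue)
        .digest('hex')
        .slice(0, length);
}

console.log(shortId('my-secret', '1234567890123456'));   // same inputs always yield the same 10-character ID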

Algorithm for compacting hexadecimal GUID in URL-safe way?

I've got a database with rows that are identified by 32-character hexadecimal GUIDs (stored as binary). I'm wondering how to compress these strings dynamically into a shorter, but still user-friendly representation... ideally for use in shared URLs. Since they're 32 characters hexadecimally (and case-insensitive currently) ... I tried hitting the binary representation with base64 encoding. That got them from 32 characters to 22 characters, but I wasn't sure if there was anything better that was common yet straightforward.
I'm also thinking about getting creative, given that even emoji now is technically URL-safe. Not sure if that's a good idea, though.
Has anyone considered cross-platform solutions for this problem before? Is it better just to generate new IDs entirely with a smaller subset?
You are allowed to use 0-9, a-z, A-Z, and !$'()*+,-._~ in a URI (which does not include the characters with special syntax interpretations). That's 74 characters. That is a little better than 64. You can use a simple scheme to pull 6 or 7 bits from your stream of bits and use that to select one of the allowed URI characters.
To encode, pull six bits from your stream. If it is less than 54, then emit the corresponding character in the set of 74. If it is 54 or more, pull one more bit on the bottom of that. You now have a seven bit number in the range of 108..127. Subtract 108 and add 54 to get the range 54..73. Emit that character from the set.
You now have an average number of bits per character of 6*54/74 + 7*20/74 = 6.27. Or 1.276 characters per byte. Your 16-byte ID would then be encoded, on average, in 20.4 characters. Actually a bit more since you will have to stuff a few zero bits at the end to get the last character out. The real-world average is 21.1303, with a minimum of 19 and a maximum of 22.
This is faster and simpler than trying to do a base conversion with large integers, and gives essentially the same performance, 21 characters.
Do your 16-byte IDs tend to have leading or trailing zeros, or other patterns amenable to compression? If so, then you can arrange the encoding scheme to use fewer characters for those cases.
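A JavaScript sketch of the 6-or-7-bit scheme described above (the bit reader, the character ordering, and the names are assumptions; only the encoding direction is shown):

const URI_CHARS =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!$'()*+,-._~";  // 74 characters

function encodeId(bytes) {
    let bitPos = 0;
    const totalBits = bytes.length * 8;
    // Read one bit, most significant first; past the end we stuff zero bits, as noted above.
    const readBit = () => {
        if (bitPos >= totalBits) { bitPos++; return 0; }
        const bit = (bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
        bitPos++;
        return bit;
    };
    let out = '';
    while (bitPos < totalBits) {
        let v = 0;
        for (let i = 0; i < 6; i++) v = (v << 1) | readBit();   // pull six bits
        if (v >= 54) v = ((v << 1) | readBit()) - 108 + 54;     // pull a seventh bit, map 108..127 to 54..73
        out += URI_CHARS[v];
    }
    return out;
}

// Example: a 16-byte ID encodes to roughly 20-22 characters.
console.log(encodeId(Buffer.from('9ec54806c242982ca059661b6db74ab9', 'hex')));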
See this JavaScript implementation:
function toDigits(n, b) {
    // Convert a BigInteger to an array of digits in base b (least significant digit first).
    var digits = [];
    while (n.isPositive()) {
        digits.push(n.remainder(b).valueOf());
        n = n.quotient(b);
    }
    return digits;
}

function fromDigits(digits, b) {
    // Convert an array of digits in base b (most significant digit first) to a BigInteger.
    var n = BigInteger(0);
    for (var i = 0; i < digits.length; i++) {
        var d = parseInt(digits[i], b);
        n = n.multiply(b).add(d);
    }
    return n;
}

function changebase(n, from_base, to_base) {
    var temp = fromDigits(n, from_base);
    return toDigits(temp, to_base);
}

var unreserved_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";
var number_of_unreserved_characters = unreserved_characters.length;

var guid = "9ec54806c242982ca059661b6db74ab9";
var newbase = changebase(guid, 16, number_of_unreserved_characters);

// newbase holds the base-66 digits least significant first; map each digit to its character.
var newurl = "";
for (var i = 0; i < newbase.length; i++) {
    newurl += unreserved_characters[newbase[i]];
}
I used a BigInteger library http://silentmatt.com/biginteger/.
This implementation converts the hex into a new base that is the number of unreserved characters allowed in a URI. This might be a little better than base64, since it has 2 additional characters, for a total of 66 compared to 64 in base64. That might not make much difference, though. So, if you don't mind browser compatibility issues, you can add other ASCII characters to the list.
For instance, using:
var unreserved_characters="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜø£Ø׃áíóúñѪº¿®¬½¼¡«»░▒▓│┤ÁÂÀ©╣║╗╝¢¥┐└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÎÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´≡±‗¾¶§÷¸°¨·¹³²■";
gives many more characters and reduces the size even more, and it might work on your target browsers.

How to encode a number as a string such that the lexicographic order of the generated string is in the same order as the numeric order

For example, if we have the two strings 2 and 10, 10 will come first if we order lexicographically.
A very trivial solution would be to repeat a character n times,
e.g. 2 can be encoded as aa
and 10 as aaaaaaaaaa.
This way the lexicographic order is the same as the numeric one.
But, is there a more elegant way to do this?
When converting the numbers to strings make sure that all the strings have the same length, by appending 0s in the front if necessary. So 2 and 10 would be encoded as "02" and "10".
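A minimal JavaScript sketch of that (the function name and the fixed width are assumptions):

function padKey(n, width) {
    return String(n).padStart(width, '0');   // all keys get the same length
}

console.log(padKey(2, 4) < padKey(10, 4));   // "0002" < "0010", matching 2 < 10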
While kjampani's solution is probably the best and easiest in normal applications, another way which is more space-efficient is to prepend every string with its own length. Of course, you need to encode the length in a way which is also consistently sorted.
If you know all the strings are fairly short, you can just encode their length as a fixed-length base-X sequence, where X is the number of character codes you're willing to use (popular values are 64, 96, 255 and 256.) Note that you have to use the character codes in lexicographical order, so normal base64 won't work.
One variable-length order-preserving encoding is the one used by UTF-8. (Not UTF-8 directly, which has a couple of corner cases which will get in the way, but the same encoding technique. The order-preserving property of UTF-8 is occasionally really useful.) The full range of such compressed codes can encode values up to 42 bits long, with an average of five payload bits per byte. That's sufficient for pretty long strings; four terabyte long strings are pretty rare in the wild; but if you need longer, it's possible, too, by extending the size prefix over more than one byte.
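For illustration, a minimal JavaScript sketch of the fixed-length length-prefix idea (the alphabet, the one-character length cap, and the function name are my simplifications):

// Assumes numbers have at most 61 digits, so the length fits in one character drawn
// from an alphabet whose character codes are already in ascending order.
const LEN_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

function encodeOrdered(n) {
    const s = String(n);
    return LEN_ALPHABET[s.length] + s;   // length prefix, then the digits themselves
}

console.log(encodeOrdered(2) < encodeOrdered(10));      // "12" < "210" → true
console.log(encodeOrdered(999) < encodeOrdered(1000));  // "3999" < "41000" → true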
Break the string into successive substrings of letters and numbers, and then sort by comparing each substring as an integer if it is a numeric string:
"aaa2" ---> aaa + 2
"aaa1000" ---> aaa + 1000
aaa == aaa
Since they're equal, we continue:
1000 > 2
Hence, aaa1000 > aaa2.
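A JavaScript sketch of such a comparator (the name and the regular expression are assumptions):

function naturalCompare(a, b) {
    // Split each string into runs of digits and runs of non-digits.
    const tokenize = s => s.match(/\d+|\D+/g) || [];
    const ta = tokenize(a), tb = tokenize(b);
    for (let i = 0; i < Math.min(ta.length, tb.length); i++) {
        if (ta[i] === tb[i]) continue;
        const bothNumeric = /^\d+$/.test(ta[i]) && /^\d+$/.test(tb[i]);
        if (bothNumeric) return Number(ta[i]) - Number(tb[i]);  // compare numeric runs as integers
        return ta[i] < tb[i] ? -1 : 1;                          // otherwise compare as plain strings
    }
    return ta.length - tb.length;
}

console.log(naturalCompare('aaa2', 'aaa1000') < 0);  // true: aaa == aaa, then 2 < 1000

In modern JavaScript, a.localeCompare(b, undefined, { numeric: true }) behaves similarly.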

how to represent a n-byte array in less than 2*n characters

Given that an n-byte array can be represented as a 2*n-character string using hex, is there a way to represent the n-byte array in fewer than 2*n characters?
For example, an integer (int32) can typically be considered as a 4-byte array of data.
The advantage of hex is that splitting an 8-bit byte into two equal halves is about the simplest thing you can do to map a byte to printable ASCII characters. More efficient methods consider multiple bytes as a block:
Base-64 uses 64 ASCII characters to represent 6 bits at a time. Every 3 bytes (i.e. 24 bits) are split into 4 6-bit base-64 digits, where the "digits" are:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
(and if the input is not a multiple of 3 bytes long, a 65th character, "=", is used for padding at the end). Note that some variant forms of base-64 use different characters for the last two "digits".
Ascii85 is another representation, which is somewhat less well-known, but commonly used: it's often the way that binary data is encoded within PostScript and PDF files. This considers every 4 bytes (big-endian) as an unsigned integer, which is represented as a 5-digit number in base 85, with each base-85 digit encoded as ASCII code 33+n (i.e. "!" for 0, up to "u" for 84) - plus a special case where the single character "z" may be used (instead of "!!!!!") to represent 4 zero bytes.
(Why 85? Because 84^5 < 2^32 < 85^5.)
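For illustration, a JavaScript sketch of one Ascii85 group (the function name is an assumption; the end-of-input padding rules are omitted):

function ascii85Group(bytes /* exactly 4 bytes, big-endian */) {
    let n = ((bytes[0] << 24) >>> 0) + (bytes[1] << 16) + (bytes[2] << 8) + bytes[3];
    if (n === 0) return 'z';                       // special case: four zero bytes
    const out = new Array(5);
    for (let i = 4; i >= 0; i--) {                 // five base-85 digits, most significant first
        out[i] = String.fromCharCode(33 + (n % 85));
        n = Math.floor(n / 85);
    }
    return out.join('');
}

console.log(ascii85Group([0x4d, 0x61, 0x6e, 0x20]));  // "Man " encodes to "9jqo^"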
Yes: using binary (in which case it takes n bytes, not surprisingly), or using any base higher than 16; a common one is base 64.
It might depend on the exact numbers you want to represent. For instance, the number 9223372036854775808, which requires 8 bytes to represent in binary, takes only 4 bytes in ASCII if you use the product-of-primes representation (which is "2^63").
How about base-64?
It all depends on what characters you're willing to use in your encoding (i.e. representation).
Base64 fits 6 bits in each character, which means that 3 bytes will fit in 4 characters.
Using 65536 of the roughly 90000 defined Unicode characters, you can represent a binary string in N/2 characters.
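A JavaScript sketch of that idea (the code-point offset 0x20000 is an assumption: it gives 65536 valid, non-surrogate code points in the supplementary ideographic plane, though not all of them are assigned characters):

function packBytes(bytes /* even number of bytes */) {
    let out = '';
    for (let i = 0; i < bytes.length; i += 2) {
        out += String.fromCodePoint(0x20000 + ((bytes[i] << 8) | bytes[i + 1]));  // two bytes per code point
    }
    return out;
}

function unpackBytes(s) {
    const bytes = [];
    for (const ch of s) {                          // iterates by code point, not by UTF-16 unit
        const v = ch.codePointAt(0) - 0x20000;
        bytes.push(v >> 8, v & 0xff);
    }
    return bytes;
}

console.log(unpackBytes(packBytes([0xde, 0xad, 0xbe, 0xef])));  // [222, 173, 190, 239]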
Yes. Use more characters than just 0-9 and a-f. A single character (assuming 8-bit) can have 256 values, so you can represent an n-byte number in n characters.
If it needs to be printable, you can just choose some set of characters to represent various values. A good option is base-64 in that case.
