Encrypt printable text so result is still printable (can be typed)

Encrypt printable text so result is still printable (can be typed) - vb6

I want to encrypt some info for a licensing system and I want the result to be able to be typed in by the user.
Update: This operation must be reversible (decrypt-able)
E.g.,
Encrypt ( ComputerID+ProductID) -> (any standard ASCII character that can be typed. Ideally maybe even just A-Z).
So far what I did was to convert the encrypted text to HEX (so it's any character from 0-F) but that doubles the number of characters.
I'm using VB6.
I'm thinking I'd do some operation on each pair of (Input$(x) and Key$(x)) and then do a MOD to keep it within a range of ascii values (maybe 0-9-A-Z)
Any suggestions of a good algorithm?

Look into Base64 "encryption."
Base 64 will convert a number into 64 different ASCII characters, verses hex which is only 16 different ASCII characters... Making Base64 more compact and what you are looking for.
EDIT:
Code to do this in VB6 is available here: http://www.nonhostile.com/howto-encode-decode-base64-vb6.asp
Per Fuzzy Lollipop, below, Base32 looks like an even better option. Bonus points if you can find an example of that.
EDIT: I found an example of Base32 for VB6 although I've not tried it yet. -Clay

encode the encrypted bytes in HEX, or Base32 or Base64

Do you want this to be reversible -- to recover the IDs from the encrypted text? If so then it matters how you combine the key and input strings.
Usually you'd XOR each byte pair (work with byte arrays to avoid Unicode issues), circulating on the key string if it's shorter than the input. You can then use Base N encoding (32, 64 etc) to generate the license string.
Both operations are reversible: you can recover the XORed strings from the Base N string, then XOR with the key again to get the original IDs.
If you don't care about reversing the operations, then any convolution of key and ID will do. XOR is just the simplest.

Related

Shorter hashes of credit card numbers in Oracle

I use STANDARD_HASH in the manner below to hash credit card numbers. It returns hashes with 40 characters. This seems excessive for credit card numbers which have 16 digits. I would like to save space in my export. How can I create shorter hashes while still achieving these goals:
Have the same level of security and non-reversibility as
STANDARD_HASH
Keep the likelihood of two card numbers receiving the same hash very small (though if this happens a few times, it's OK)
Have the shortest possible hash result in terms of characters or space required when exporting to a CSV
Perform this operation while using as few database resources as possible
Perform this operation using read-only access to the database
If a method exists which achieves goals 2 and 3, then I expect that goal 1 could be achieved by using this method to hash the output of STANDARD_HASH.
SELECT STANDARD_HASH(TRIM(' 123456789123456789 ' )) FROM DUAL;
TRIM removes the spaces and then STANDARD_HASH returns a hash of length 64.
Here's the same example on db<>fiddle:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=7cd086f1b60f69eb3bc6f54d4a211844
The database version is "Oracle Database 18c Enterprise Edition".

That length of 64 is not the length of the result, but just how it displays. STANDARD_HASH returns a RAW value, that is displayed as hexadecimal.
You can convert this raw value into something usable using the UTL_RAW functions at https://docs.oracle.com/database/121/TTPLP/u_raw.htm#TTPLP71498
Eg
SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM(' 123456789123456789 ' ))) FROM DUAL;
Note that when you try this in the fiddle, you’ll find a few ? that represent non-printable characters, so allow for that in your export.
Edit to add : STANDARD_HASH uses SHA1 by default - but that and MD5 have vulnerabilities - better to just add the extra parameter to STANDARD_HASH to use a longer SHA -see https://docs.oracle.com/database/121/SQLRF/functions183.htm#SQLRF55647
SELECT UTL_RAW.CAST_TO_VARCHAR2 (STANDARD_HASH(TRIM(' 123456789123456789 ' ), ‘SHA256’)) FROM DUAL;
Edit to address the 5 points :
it uses the same STANDARD_HASH so is the same
SHA1 is prone to collisions, so as above swap to SHA256 or higher
STANDARD_HASH uses industry-standard hashing algorithms. It is what it is. Be aware that by its very nature, hashing returns binary values, so it is your responsibility to convert them to appropriate format - eg for CSV files, you can convert to Base64 (see Base64 encoding and decoding in oracle )
and 5. No additional resources
Edit to respond to addition comments :
Yes, full SELECT you stated looks correct :
select utl_raw.cast_to_varchar2(utl_encode.base64_encode(
STANDARD_HASH(TRIM(' 123456789123456789 ' ), 'SHA1'))) FROM dual;
Base64 operates on groups of 3 bytes at a time, and appends "=" for each byte short. SHA1 hashes are always 20 bytes, so is always 1 byte short.
So offhand, you COULD trim that trailing "=" off - though I would advise against it (lean code beats premature optimisation). For example, if you subsequently decided to upgrade from SHA1 to SHA256, that generates hashes with a different number of bytes, and therefore potentially 0 or 2 "=" at the end, so weird bugs await.
Yes, "+" and "/" are valid characters in the Base64 output (along with 0-9, and upper-and lower- case letters - hence 64 characters in all, plus the =), but importantly commas and double-quotes are not - so yes, Base64 strings are safe to go into a CSV format.
FYI, a quick summary of Base64 (since I guess that you like me always like to have an overview of what I'm dealing with)
Base64 is used to translate a stream of binary data into printable strings. Now 3 bytes of binary data is 24 bits, which of course can be regarded as 4 lots of 6-bits (we can ignore the byte boundaries). Any collection of 6 bits has 2^6 = 64 possible values (hence the Base64 name), which are represented as 64 characters :
Upper-case letters
Lower case letters (so yes, case-sensitive).
digits 0-9
"+" and "/"
Hence each character in the Base64 output represents the next 6 bits of the binary data.

Pack/Unpack and base64 in Ruby

I have a string a = "hello". I can convert it to base 2 or base 16 using unpack:
a.unpack('B*')
# => ["0110100001100101011011000110110001101111"]
a.unpack('H*')
# => ["68656c6c6f"]
To convert to base 64, I tried pack:
[a].pack('m0')
# => "aGVsbG8="
but the result is not what I expected. I thought that if I have some binary representation or a string, to represent it in divided parts, I should use unpack. But it turned out not. Please help me understand it.

Per OP's clarified question, "Why do we use #pack to get base64 and #unpack to get other representations of raw data?"
The surface level reason is because Array#pack is a method that returns a String, while String#unpack is a method that returns an Array.
There are stronger conceptual reasons underlying this. The key principle is that base64 is not an array of raw bytes. Rather, it's a 7-bit-ASCII-safe string that can represent arbitrary bytes if properly (de)coded.
Each base64 character maps to a sequence of six bits. At the byte level, that's a 4:3 ratio of characters to raw bytes. Since integer powers of 2 don't divide by 3, we end up with padding more often than not, and you can't slice base64 in arbitrary places to get ranges of bytes out of it (you'd have to figure out which bytes you want in groups of three and go get the associated base64 characters in groups of four).
Arbitrary sequences of data are, fundamentally, arrays of bytes. Base64-encoded sequences are, fundamentally, strings: data sequences constrained to the range of bytes safely transmissible and displayable as text.
Base64 is the encapsulation (or "packing") of a data array into a string.

The encoded text is correct, to validate use below online tool:
https://www.base64encode.org/
text:
hello
Encoded Base64:
aGVsbG8=
Useful resource:
https://idiosyncratic-ruby.com/4-what-the-pack.html

Edit and read binary files?

For a data compression project, I want to be able to edit and read binary files, For this particular project it is very important to get 256 combinations out of 1 byte, I noticed saving one character in notepad resulted in a 1 byte file, this is great, so long as there are 256 characters linked to all 8-bit combinations. ASCII currently offers about 218 typeable characters, the rest are control characters
I know that there are 256 combinations in 8 bits (1 byte) because of 2 ^ 8 = 256 and i want to be able to use all those combinations for data compression. So a binary editor and reader would be perfect!

I'm afraid you're question is not specific enough. What kind of tooling are you looking for?
Hex is not fake binary, it's just another representation of the same data.
If you're on windows, open up the calculator and switch it to 'programming mode'. It'll allow you to convert values from decimal, hexadecimal and binary representation.
Then find yourself an hex-editor and you're in business.
Since you mention 'ASCII', I suggest you read Joel Spolskys post on character encodings. It's an excellent post, which clarifies how difficult it can be to show plain text. The post is from 2003, but still valid.
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Rubyist way to decode this encoded string assuming invariant ASCII encoding

My program is a decoder for a binary protocol. One of the fields in that binary protocol is an encoded String. Each character in the String is printable, and represents an integral value. According to the spec of the protocol I'm decoding, the integral value it represents is taken from the following table, where all possible characters are listed:
Character Value
========= =====
0 0
1 1
2 2
3 3
[...]
: 10
; 11
< 12
= 13
[...]
B 18
So for example, the character = represents an integral 13.
My code was originally using ord to get the ASCII code for the character, and then subtracting 48 from that, like this:
def Decode(val)
val[0].ord - 48
end
...which works perfectly, assuming that val consists only of characters listed in that table (this is verified elsewhere).
However, in another question, I was told that:
You are asking for a Ruby way to use ord, where using it is against
the Ruby way.
It seems to me that ord is exactly what I need here, so I don't understand why using ord here is not a Rubyist way to do what I'm trying to do.
So my questions are:
First and foremost, what is the Rubyist way to write my function above?
Secondary, why is using ord here a non-Rubyist practice?
A note on encoding: This protocol which I'm decoding specifies precisely that these strings are ASCII encoded. No other encoding is possible here. Protocols like this are extremely common in my industry (stock & commodity markets).

I guess the Rubyistic way, and faster, to decode the string into an array of integers is the unpack method:
"=01:".unpack("C*").map {|v| v - 48}
>> [13, 0, 1, 10]
The unpack method, with "C*" param, converts each character to an 8-bit unsigned integer.

Probably ord is entirely safe and appropriate in your case, as the source data should always be encoded the same way. Especially if when reading the data you set the encoding to 'US-ASCII' (although the format used looks safe for 'ASCII-8BIT', 'UTF-8' and 'ISO-8859', which may be the point of it - it seems resilient to many conversions, and does not use all possible byte values). However, ord is intended to be used with character semantics, and technically you want byte semantics. With basic ASCII and variants there is no practical difference, all byte values below 128 are the same character code.
I would suggest using String#unpack as a general method for converting binary input to Ruby data types, but there is not an unpack code for "use this byte with an offset", so that becomes a two-part process.

Is there a good two way hash to convert an email address to a predictable, readable, unix username?

We are working with a number of unix based filesystems, all of which share a similar set of restrictions on that certain characters can't be used in the username fields. One of those restrictions is no "#" , "_", or "." in the names. Being unix there are a number of other restrictions.
So the question is if there is a good known algorithm that can take an email address and turn that into a predictable unix filename. We would need to reverse this at some point to get the email.
I've considered doing thing like "."->"DOT", "#"->"AT", etc. But there are size limitations and other things that are generally problematic. I could also optimize by being able to map the #xyz.com part of the email to a special char or something. Each implementation would only have at most 3 domains it would need to support. I'm hoping someone has found a solution without a huge number of tradeoffs.
UPDATE:
-The two target filesystems are AFS and NFS.
-Base64 doesn't work as it has not compatible characters. "/"
-Readable is preferable.
Seems like the best answer would be to replace the #xyz.com domain to a single non-standard character, and then have a function that could shrink the first part of a name to something that fits in the username length restrictions of the various filesystems. But what is a good function for that?

You could try a modified version of the URL percent (%) encoding scheme used on for URIs.
If the percent symbol isn't allowed on your particular filesystem(s), simply replace it with a different, allowed character (and remember to encode any occurrences of that character properly).
Using this method:
mail.address#server.com
Would become:
mail%2Eaddress%40server%2Ecom
Or, if you had to substitute (for example), the letter a instead of the % symbol:
ma61ila2Ea61ddressa40servera2Ecom
Not exactly humanly-readable perhaps, but easily enough processed through an encoding algorithm. For the best space efficiency, your escape character should be a character allowed by the filesystem, yet one that is not likely to appear frequently in an address.
This encoding scheme has the advantage that there is no size increase for most normal characters. The string length will ONLY go up for characters not supported by the filesystem.

Check out base64. Encoding and decoding is well defined.
I'd prefer this over rolling my own format any day.

Hmm, from your question I'm not totally clear on this point, but since you wanted some conversion I'm assuming that you want something that is at least human readable?
Each OS may have different restrictions, but are you close enough to the platforms that you would be able to find out/test what is acceptable in a username? If you could find three 'special' characters that you could use just to do a replace on '#', '.', '_' you would be good to go. (Is that comprehensive? if not you would need to make sure you know all of them otherwise you could clash.) I searched a bit trying to find whether there was a POSIX standard, but wasn't able to find anything, so that's why I think if you can just test what's valid that would be the most direct route.
With even one special character, you could do URL encoding, either with '%' if it's available, or whatever you choose if not, say '!", then { '#'->'!40", '_'->'!5F', '.'-> '!2E' }. (The spec [RFC1738] http://www.rfc-editor.org/rfc/rfc1738.txt) defines the characters as US-ASCII so you can just find a table, e.g. in wikipedia's ASCII article and look up the correct hex digits there.) Or, you could just do your own simple mapping since you don't need the whole ASCII set, you could just do a map with two characters per escaped character and have, say, '!a','!u','!p' for at, underscore, period.
If you have two special characters, say, '%', and '!', you could delimit text that represents the character, say, %at!, &us!, and '&pd!'. (This is pretty much html-style encoding, but instead of '&' and ';' you are using the available ones, and you're making up your own mnemonics.) Another idea is that you could use runs of a symbol to determine the translated character, where each new character flops which symbol is being used. (This conveniently stops the run if we need to put two of the disallowed characters next to each other.) So assume '%' and '!', with period being 1, underscore 2, and at-sign being three, 'mickey._sample_#fake.out' would become 'mickey%!!sample%%!!!fake%out'. There are other variations but this one is easy to code.
If none of this is an option (e.g. no symbols at all, just [a-zA-Z0-9]), then really I think the Base64 answer sounds about right. Really once we're getting to anything other than a simple replacement (and even that) it's already getting hard to type if that's the goal. But if you really need to try to keep the email mostly readable, what you do is implement some sort of escaping. I'm thinking use '0' as your escape character, so now '0' becomes '00', '#' becomes '01', '.' becomes '02', and '_' becomes '03'. So now, 'mickey01._sample_#fake.out'would become 'mickey0010203sample0301fake02out'. Not beautiful but it should work; since we escaped any raw 0's, just always make sure you define a mapping for whatever you choose as your escape char and you should be fine..
That's all I can think of atm. :) Definitely if there's no need for these usernames to be readable in the raw it seems like apparently Base64 won't work, since it can produce slashes. Heck, ok, just the 2-digit US-ASCII hex value for each character and you're done...] is a good way to go; there's lots of nice debugged, heavily field-tested code out there for it and it solves your problem quite handily. :)

Given...
- the limited set of characters allowed in various file systems
- the desire to keep the encoded email address short (both for human readability and for possible concerns with file system limitations)
...a possible approach may be a two steps encoding logic whereby the email is
first compressed using a lossless compression algorithm such as Lempel-Ziv, effectively turning it into a "binary" form, stored in a shorter array of bytes
then this array of bytes is encoded using a Base64-like algorithm
The idea is to minimize the size of the binary representation, so that the expansion associated with the storage inefficiency of the encoding -which can only store roughly 6 bits (and probably a bit less) per character-, doesn't cause the encoded string to be too long.
Without getting overly sophisticated for the compression nor the encoding, such a system would likely produce encoded strings that are maybe 4/5 of the input string size (the email address): the compression should easily half the size, but the encoding, say Base32, would grow the binary form size by 8/5.
Efforts in improving the compression ratio may allow the selection of more "wasteful" encoding schemes (with smaller character sets) and this may help making the output more human-readable and also more broadly safe on various flavors of file systems. For example whereby a Base64 seems optimal. space-wise, using only uppercase letter (base 26) may ensure portability of the underlying scheme to file systems where the file names are not case sensitive.
Another benefit of the initial generic compression is that few, if any, assumptions need to be made about the syntax of valid input key (email addresses here).
Ideas for compression:
LZ seems like a good choice, 'though one may consider primin its initial buffer with common patterns found in email addresses (example ".com" or even "a.com", "b.com" etc.). This initial buffer would ensure several instances of "citations" per compressed email address, hence a better compression ratio overall). To further squeeze a few bytes, maybe LZH or other LZ-variations could be used.
Aside from the priming of the buffer mentioned above, another customization may be to use a shorter buffer than typical LZ algorithms, since the string we have to compress (email address instances) are themselves very short and would not benefit from say a 512 bytes buffer. (Shorter buffer sizes allow shorter codes for the citations)
Ideas for encoding:
Base64 is not suitable as-is because of the slash (/), plus (+) and equal (=) characters. Alternate characters could be used to replace these; dash (-) comes to mind, but finding three charcters, allowed by all "flavors" of the targeted file systems may be a stretch.
Never the less, Base64 and its 4 output characters per 3 payload bytes ratio provide what is probably the barely achievable upper limit of storage efficiency [for an acceptable character set].
At the lower end of this efficiency, is maybe an ASCII representation of the Hexadeciamal values of the bytes in the array. This format with a doubling of the payload bytes may be acceptable, length-wise, and is interesting because of its simplicity (there is a direct and simple relation between each nibble (4 bits) in the input and characters in the encoded string.
Base32 whereby A thru Z encode 0 thru 25 and 0 thru 5 encode 26 thru 31, respectively, essentially variation of Base64 with an 8 output characters per 5 payload bytes ratio may be a very viable compromise.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio