convert decimal to hexadecimal 2 digits at a time Ruby - ruby

I am trying to convert a string of decimal values to hex, grabbing two digits at a time.
For example, if I were to convert these decimals two digits at a time:
01 67 15 06 01 76 61 73
this would be my result:
01430F06014C3D49
I know that to_s(16) will convert a decimal integer to hex, but as I said, I need this done two digits at a time so the output is correct, and I have no clue how to do this in Ruby.
Here is what I have tried:
str.upcase.chars.each_slice(2).to_s.(16).join

You can use String#gsub with a regular expression and Kernel#sprintf:
"01 67 15 06 01 76 61 73".gsub(/\d{2} */) { |s| sprintf("%02x", s.to_i) }
#=> "01430f06014c3d49"
The regular expression /\d{2} */ matches two digits followed by zero or more spaces (note that 73 is not followed by a space).
The result of the block calculation replaces the two or three characters that were matched by the regular expression.
s.to_i converts the matched base-10 string to an integer (trailing spaces are ignored), and sprintf's formatting directive "%02x" turns that integer into a string of 2 characters, padded on the left with '0's if necessary, giving its representation in base 16 ('x').
Alternatively, one could use String#% (with sprintf's formatting directives):
"01 67 15 06 01 76 61 73".gsub(/\d{2} */) { |s| "%02x" % s.to_i }
#=> "01430f06014c3d49"
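The question's expected output is in uppercase, while "%02x" produces lowercase digits. A minimal variation, assuming uppercase is wanted, is to switch to the "%02X" directive; the sketch below also uses String#scan instead of gsub so it works whether or not the pairs are separated by spaces. The variable names are only for illustration.

```ruby
str = "01 67 15 06 01 76 61 73"

# scan returns every two-digit group; "%02X" formats each as two uppercase hex digits
hex = str.scan(/\d{2}/).map { |pair| format("%02X", pair.to_i) }.join
puts hex

# The same approach on an input without separators
packed = "0167150601766173"
hex_packed = packed.scan(/\d{2}/).map { |pair| format("%02X", pair.to_i) }.join
puts hex_packed
```

String#to_i always parses in base 10 here, so pairs such as "08" or "09" are not mistaken for octal.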

Related

Hexdump: Convert between bytes and two-byte decimal

When I use hexdump on a file with no options, I get rows of hexadecimal bytes:
cf fa ed fe 07 00 00 01 03 00 00 80 02 00 00 00
When I used hexdump -d on the same file, that same data is shown in something called two-byte decimal groupings:
64207 65261 00007 00256 00003 32768 00002 00000
So what I'm trying to figure out here is how to convert between these two encodings. cf and fa in decimal are 207 and 250 respectively. How do those numbers get combined to make 64207?
Bonus question: What is the advantage of using these groupings? The octal display uses 16 groupings of three digits, why not use the same thing with the decimal display?
As commented by @georg:
0xfa * 256 + 0xcf == 0xfacf == 64207
The conversion works exactly like this.
So, if you see man hexdump:
-d, --two-bytes-decimal
Two-byte decimal display. Display the input offset in hexadecimal, followed by eight
space-separated, five-column, zero-filled, two-byte units of input data, in unsigned
decimal, per line.
So, for example:
00000f0 64207 65261 00007 00256 00003 32768 00002 00000
Here, 00000f0 is a hexadecimal offset,
followed by two-byte units of input data, e.g. 64207 in decimal (the first 16 bits, i.e. two bytes, of the file).
The conversion (in your case):
cf fa ----> two-byte unit (the byte ordering depends on your architecture).
0xfa * 256 + 0xcf = 0xfacf (on a little-endian machine the second byte is the high-order one, so the bytes are reordered).
And 0xfacf in decimal is 64207.
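The same regrouping can be sketched in Ruby with pack and unpack. This assumes a little-endian host, which is what the sample output implies; the "v" directive reads unsigned 16-bit little-endian values and "n" reads big-endian ones. The variable names are only for illustration.

```ruby
# The sixteen bytes from the plain hexdump row
bytes = %w[cf fa ed fe 07 00 00 01 03 00 00 80 02 00 00 00].map { |h| h.to_i(16) }

# Reinterpret the byte string as unsigned 16-bit little-endian words
words = bytes.pack("C*").unpack("v*")

# hexdump -d prints each word as five zero-filled decimal columns
row = words.map { |w| format("%05d", w) }.join(" ")
puts row

# For comparison: the first pair read as big-endian is 0xcffa
big_endian_first = bytes.pack("C*").unpack("n*").first
puts big_endian_first
```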
Bonus question: It is a convention to display octal numbers using three digits (unlike hex and decimal), so it uses a triplet for each byte.

Why does `"0a".to_i(16)` return `10`?

I'm confused about the optional argument for to_i.
Specifically, what "base" means, and how it impacts the method in this example:
"0a".to_i(16) #=> 10
I have trouble with the optional argument in regards to the string the method is called on. I thought that the return value would just be an integer value of 0.
Simple answer: It's because 0a or a in Hexadecimal is equal to 10 in Decimal.
And base, in other words radix, means the number of unique digits in a numeral system.
In Decimal, we have 0 to 9, 10 digits to represent numbers.
In Hexadecimal there are 16 digits instead: apart from 0 to 9, we use a to f to represent the values 10 to 15.
You can test it like this:
"a".to_i(16)
#=> 10
"b".to_i(16)
#=> 11
"f".to_i(16)
#=> 15
"g".to_i(16)
#=> 0 # Because it's not a correct hexadecimal digit/number.
'2c'.to_i(16)
#=> 44
'2CH2'.to_i(16)
#=> 44 # Extraneous characters past the end of a valid number are ignored, and it's case insensitive.
9.to_s.to_i(16)
#=> 9
10.to_s.to_i(16)
#=> 16
In other words, 10 in Decimal is equal to a in Hexadecimal.
And 10 in Hexadecimal is equal to 16 in Decimal. (Doc for to_i)
Note that Hexadecimal numbers are usually prefixed with 0x:
"0xa".to_i(16)
#=> 10
"0x100".to_i(16)
#=> 256
Btw, you can just use these representations in Ruby:
num_hex = 0x100
#=> 256
num_bin = 0b100
#=> 4
num_oct = 0o100
#=> 64
num_dec = 0d100
#=> 100
Hexadecimal, binary, octal, decimal (the 0d prefix is superfluous of course; it is just used in some cases for clarification).
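To see why "0a" comes out as 10 rather than 0, here is a minimal sketch of positional parsing. parse_in_base is a hypothetical helper written for illustration, not how Ruby implements to_i, and it ignores signs, underscores and the 0x prefix. Each digit multiplies the running total by the base and adds its own value, and parsing stops at the first character that is not a valid digit for that base.

```ruby
# Hypothetical helper, for illustration only
def parse_in_base(text, base)
  digits = "0123456789abcdefghijklmnopqrstuvwxyz"
  text.downcase.each_char.reduce(0) do |total, char|
    value = digits.index(char)
    # Stop at the first character that is not a digit in this base
    break total if value.nil? || value >= base
    total * base + value
  end
end

puts parse_in_base("0a", 16)    # 0 * 16 + 10
puts parse_in_base("2CH2", 16)  # 2 * 16 + 12, then stops at "H"
puts parse_in_base("10", 16)    # 1 * 16 + 0
```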

Identify characters in a series by a line number

this is my first time on here. I searched and couldn't find anything relevant. Trying to work something out:
Where a=1, b=2, c=3 ... z=26
If you were to create a series where it goes through every possible outcome of letters and using 1 character length in numerical order, the total possible number of outcomes is 26 (26^1). You easily figure "e" would be on line 5 of the series. "y" would be line 25.
If you set the parameters to a 2 character length, the total number of combinations is 676 (26^2), "aa" would be line 1, "az" would be line 26, "ba" would be line 27, "zz" would be line 676. This is easily calculated, and can be done no matter what the character length is, you will always find what line it would be on in the series.
My question is how do you do it in reverse? Using the same parameters, 1 will obviously be "aa", 31 will be "be". How do you work out with a formula that 676 will be "zz"? 676, based on the parameters set, can only be "zz", it can't be any other set of characters. So there should be a way of calculating this, no matter how long the number is, as long as you know the parameters of the series.
If length of characters was 10, what characters would be on line 546,879,866, for example?
Is this even doable? Thanks so much in advance
It is enough to convert 546,879,866 to base 26. For example, in bash:
echo 'obase=26 ; 546879866' | bc
01 20 00 19 03 23 00
And if you prefer 10 characters, you should pad the number with leading zeros:
00 00 00 01 20 00 19 03 23 00
Just note that numbering starts from 0, which means a=00, b=01, … z=25. Since your series starts at line 1 ("aa"), convert the line number minus one if you want the letters for that exact line.
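The base-26 idea can be sketched in Ruby as follows. line_to_letters is a hypothetical helper, and it assumes the question's convention that lines are numbered from 1 (so "aa" is line 1 for length 2), which is why it subtracts one before converting. Integer#digits needs Ruby 2.4 or later.

```ruby
# Hypothetical helper: map a 1-based line number to its letter string
def line_to_letters(line, length)
  raise ArgumentError, "line must be at least 1" if line < 1

  # Base-26 digits of the zero-based index, most significant first
  digits = (line - 1).digits(26).reverse
  raise ArgumentError, "line out of range for this length" if digits.size > length

  # Pad on the left with zeros (the letter "a"), then map 0..25 to a..z
  padded = [0] * (length - digits.size) + digits
  padded.map { |d| ("a".ord + d).chr }.join
end

puts line_to_letters(31, 2)
puts line_to_letters(676, 2)
puts line_to_letters(546_879_866, 10)
```

The examples from the question (line 31 and line 676 for length 2) make a quick sanity check before trusting the result for longer strings.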

regular expression ruby phone number

I'm trying to figure out how to write my own regex.
I made a list of viable phone numbers and non-viable ones and trying to make sure the viable ones are included but I can't figure out how to finish it up.
Allowed list
0665363636 //
06 65 36 36 36 //
06-65-36-36-36 //
+33 6 65 36 36 36
Not allowed
06 65 36 36 //
2336653636 //
+3366536361 //
0065363636
I messed around with it a bit and I currently have this:
[0+][63][6 \-3][56\ ][\d{1}][\d \-]\d{2}[\d{1} \-]\d\d? ?\-?\d?\d? ?\d?\d?$
This blocks out number 2 and 4 of the non allowed but I can't seem to figure out how to block the other ones out.
Should I put a minimum amount of numbers? If so how would I do this.
[Edit: After posting this I see it is very similar to @Lucas' answer. I will let it stand, however, for the alternative presentation.]
I would try constructing a regex for each of the allowed patterns and then take their union to obtain a single regex.
We see that all of the allowable numbers not beginning with + have 10 digits, so I will assume that's a requirement. If different numbers of digits are permitted, that can be dealt with easily.
1. Include 0665363636, exclude 2336653636 and 0065363636
I assume this means the number must begin with the digit 0 and the second digit must not be 0. That's easy:
r1 = /
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9
\d{8} # match 8 digits
$ # match end of string
/x
Test:
'0665363636' =~ r1 #=> 0
'2336653636' =~ r1 #=> nil
'0065363636' =~ r1 #=> nil
That seems to work.
2. Include 06 65 36 36 36, exclude 06 65 36 36
Another easy one:
r2 = /
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9 # or \d if can be zero
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end non-capture group
{4} # match non-capture group 4 times
$ # match end of string
/x
Test:
'06 65 36 36 36' =~ r2 #=> 0
'06 65 36 36' =~ r2 #=> nil
Another apparent success!
We see that 06-65-36-36-36 should also be permitted. That's such a small variant of the above we don't have to bother creating another regex to include in the union; instead we just modify r2 ever-so-slightly:
r2 = /^0[1-9](?:
[\s-] # match one whitespace or a hyphen
\d{2}){4}$
/x
Notice that we don't have to escape the hyphen when it's the last (or first) character in a character class.
Test:
'06 65 36 36 36' =~ r2 #=> 0
'06-65-36-36-36' =~ r2 #=> 0
Yes!
3. Include +33 6 65 36 36 36, exclude +3366536361
It appears that, when the number begins with a +, + must be followed by two digits, a space, one digit, a space, then four pairs of numbers separated by spaces. We can just write that down:
r3 = /
^ # match start of string
\+ # match +
\d\d # match two digits
\s\d # match one whitespace followed by a digit
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end capture group
{4} # match capture group 4 times
$ # match end of string
/x
Test:
'+33 6 65 36 36 36' =~ r3 #=> 0
'+3366536361' =~ r3 #=> nil
Nailed it!
Unionize!
r = Regexp.union(r1, r2, r3)
=> /(?x-mi:
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9
\d{8} # match 8 digits
$ # match end of string
)|(?x-mi:^0[1-9](?:
[\s-] # match one whitespace or a hyphen
\d{2}){4}$
)|(?x-mi:
^ # match start of string
\+ # match +
\d\d # match two digits
\s\d # match one whitespace followed by a digit
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end capture group
{4} # match capture group 4 times
$ # match end of string
)/
Let's try it:
['0665363636', '06 65 36 36 36', '06-65-36-36-36',
'+33 6 65 36 36 36'].any? { |s| (s =~ r).nil? } #=> false
['06 65 36 36', '2336653636', '+3366536361',
'0065363636'].all? { |s| (s =~ r).nil? } #=> true
Bingo!
Efficiency
Unionizing individual regexes may not produce the most efficient single regex. You must decide if the benefits of easier initial construction and testing, and ongoing maintenance, are worth the efficiency penalty. If efficiency is paramount, you might still construct r this way, then tune it by hand.
Looks like you want to limit the allowed phone numbers to French mobile phone numbers only.
You made a list of valid and invalid strings, which is a good starting point. But then, I think you just wanted to write the pattern in one shot, which is error-prone.
Let's follow a simple methodology and go through the allowed list and craft a very simple regex for each one:
0665363636 -> ^06\d{8}$
06 65 36 36 36 -> ^06(?: \d\d){4}$
06-65-36-36-36 -> ^06(?:-\d\d){4}$
+33 6 65 36 36 36 -> ^\+33 6(?: \d\d){4}$
So far so good.
Now, just combine everything into one regex, and factor it a bit (the 06 part is common in the first 3 cases). The outer group matters: without it, ^ would apply only to the first alternative and $ only to the last one:
^(?:06(?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 6(?: \d\d){4})$
Et voilà. Demo here.
As a side note, you should rather use:
^(?:0[67](?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 [67](?: \d\d){4})$
as French mobile phone numbers can start with 07 too.
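A quick way to check a combined pattern like this is to run it against both lists from the question. The sketch below is one possible Ruby rendering, not the answer's exact regex: it wraps the alternatives in a single group and uses \A and \z, because in Ruby ^ and $ match at line boundaries rather than at the ends of the string. The constant name FRENCH_MOBILE is made up for the example.

```ruby
# Assumed variant of the combined pattern, anchored to the whole string
FRENCH_MOBILE = /\A(?:0[67](?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 [67](?: \d\d){4})\z/

allowed     = ["0665363636", "06 65 36 36 36", "06-65-36-36-36", "+33 6 65 36 36 36"]
not_allowed = ["06 65 36 36", "2336653636", "+3366536361", "0065363636"]

all_accepted = allowed.all? { |s| s.match?(FRENCH_MOBILE) }
all_rejected = not_allowed.none? { |s| s.match?(FRENCH_MOBILE) }

puts "allowed list accepted:     #{all_accepted}"
puts "not-allowed list rejected: #{all_rejected}"
```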

Figuring out how to decode obfuscated URL parameters

I have web based system that uses encrypted GET parameters. I need to figure out what encryption is used and create a PHP function to recreate it. Any ideas?
Example URL:
...&watermark=ISpQICAK&width=IypcOysK&height=IypcLykK&...
You haven't provided nearly enough sample data for us to reliably guess even the alphabet used to encode it, much less what structure it might have.
What I can tell, from the three sample values you've provided, is:
There is quite a lot of redundancy in the data — compare e.g. width=IypcOysK and height=IypcLykK (and even watermark=ISpQICAK, though that might be just coincidence). This suggests that the data is neither random nor securely encrypted (which would make it look random).
The alphabet contains a fairly broad range of upper- and lowercase letters, from A to S and from c to y. Assuming that the alphabet consists of contiguous letter ranges, that means a palette of between 42 and 52 possible letters. Of course, we can't tell with any certainty from the samples whether other characters might also be used, so we can't even entirely rule out Base64.
This is not the output of PHP's base_convert function, as I first guessed it might be: that function only handles bases up to 36, and doesn't output uppercase letters.
That, however, is just about all. It would help to see some more data samples, ideally with the plaintext values they correspond to.
Edit: The id parameters you give in the comments are definitely in Base64. Besides the distinctive trailing = signs, they both decode to simple strings of nine printable ASCII characters followed by a line feed (hex 0A):
_Base64___________Hex____________________________ASCII_____
JiJQPjNfT0MtCg==  26 22 50 3e 33 5f 4f 43 2d 0a  &"P>3_OC-.
JikwPClUPENICg==  26 29 30 3c 29 54 3c 43 48 0a  &)0<)T<CH.
(I've replaced non-printable characters with a . in the ASCII column above.) On the assumption that all the other parameters are Base64 too, let's see what they decode to:
_Base64___Hex________________ASCII_
ISpQICAK  21 2a 50 20 20 0a  !*P  .
IypcOysK  23 2a 5c 3b 2b 0a  #*\;+.
IypcLykK  23 2a 5c 2f 29 0a  #*\/).
ISNAICAK  21 23 40 20 20 0a  !#@  .
IyNAPjIK  23 23 40 3e 32 0a  ##@>2.
IyNAKjAK  23 23 40 2a 30 0a  ##@*0.
ISggICAK  21 28 20 20 20 0a  !(   .
IikwICAK  22 29 30 20 20 0a  ")0  .
IilAPCAK  22 29 40 3c 20 0a  ")@< .
So there's definitely another encoding layer involved, but we can already see some patterns:
All decoded values consist of a constant number of printable ASCII characters followed by a trailing line feed character. This cannot be a coincidence.
Most of the characters are on the low end of the printable ASCII range (hex 20 – 7E). In particular, the lowest printable ASCII character, space = hex 20, is particularly common, especially in the watermark strings.
The strings in each URL resemble each other more than they resemble the corresponding strings from other URLs. (But there are resemblances between URLs too: for example, all the decoded watermark values begin with ! = hex 21.)
In fact, the highest numbered character that occurs in any of the strings is _ = hex 5F, while the lowest (excluding the line feeds) is space = hex 20. Their difference is hex 3F = decimal 63. Coincidence? I think not. I'll guess that the second encoding layer is similar to uuencoding: the data is split into 6-bit groups (as in Base64), and each group is mapped to an ASCII character simply by adding hex 20 to it.
In fact, it looks like the second layer might be uuencoding: the first bytes of each string have the right values to be uuencode length indicators. Let's see what we get if we try to decode them:
_Base64___________UUEnc______Hex________________ASCII___re-UUE____
JiJQPjNfT0MtCg==  &"P>3_OC-  0b 07 93 fe f8 cd  ......  &"P>3_OC-
JikwPClUPENICg==  &)0<)T<CH  25 07 09 d1 c8 e8  %.....  &)0<)T<CH
_Base64___UUEnc__Hex_______ASC__re-UUE____
ISpQICAK  !*P    2b        +    !*P``
IypcOysK  #*\;+  2b c6 cb  +..  #*\;+
IypcLykK  #*\/)  2b c3 c9  +..  #*\/)
ISNAICAK  !#@    0e        .    !#@``
IyNAPjIK  ##@>2  0e 07 92  ...  ##@>2
IyNAKjAK  ##@*0  0e 02 90  ...  ##@*0
ISggICAK  !(     20             !(```
IikwICAK  ")0    25 00     %.   ")0``
IilAPCAK  ")@<   26 07     &.   ")@<`
This is looking good:
Uudecoding and re-encoding the data (using Perl's unpack "u" and pack "u") produces the original string, except that trailing spaces are replaced with ` characters (which falls within acceptable variation between encoders).
The decoded strings are no longer printable ASCII, which suggests that we might be closer to the real data.
The watermark strings are now single characters. In two cases out of three, they're prefixes of the corresponding width and height strings. (In the third case, which looks a bit different, the watermark might perhaps have been added to the other values.)
One more piece of the puzzle — comparing the ID strings and corresponding numeric values you give in the comments, we see that:
The numbers all have six digits. The first two digits of each number are the same.
The uudecoded strings all have six bytes. The first two bytes of each string are the same.
Coincidence? Again, I think not. Let's see what we get if we write the numbers out as ASCII strings, and XOR them with the uudecoded strings:
_Num_____ASCII_hex___________UUDecoded_ID________XOR______________
406747 34 30 36 37 34 37 25 07 09 d1 c8 e8 11 37 3f e6 fc df
405174 34 30 35 31 37 34 25 07 0a d7 cb eb 11 37 3f e6 fc df
405273 34 30 35 32 37 33 25 07 0a d4 cb ec 11 37 3f e6 fc df
What is this 11 37 3f e6 fc df string? I have no idea — it's mostly not printable ASCII — but XORing the uudecoded ID with it yields the corresponding ID number in three cases out of three.
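The three layers described so far (Base64, then a uuencode-style line, then XOR with the observed constant) can be sketched in Ruby. This is only a reconstruction from the samples above, not the system's actual algorithm: uudecode_line and decode_id are hypothetical helpers, and XOR_KEY is simply the byte string that fell out of the table. Ruby's String#unpack also has a "u" directive for uuencoded data; the decoding is written out by hand here to show the "subtract hex 20" step.

```ruby
# Hypothetical helper: decode a single uuencoded line by hand
def uudecode_line(line)
  bytes = line.chomp.bytes
  length = (bytes.shift - 0x20) & 0x3f            # first character encodes the byte count
  sextets = bytes.map { |b| (b - 0x20) & 0x3f }   # undo the "add hex 20" mapping
  sextets << 0 until (sextets.size % 4).zero?     # pad to whole 4-character groups
  decoded = sextets.each_slice(4).flat_map do |a, b, c, d|
    word = (a << 18) | (b << 12) | (c << 6) | d   # four 6-bit groups make 24 bits
    [word >> 16, (word >> 8) & 0xff, word & 0xff]
  end
  decoded.first(length)
end

# Constant observed in the XOR column above; its origin is unknown
XOR_KEY = [0x11, 0x37, 0x3f, 0xe6, 0xfc, 0xdf].freeze

# Hypothetical helper: Base64 -> uudecode -> XOR with the observed key
def decode_id(param)
  uu_line = param.unpack1("m")                    # "m" is the Base64 directive
  raw = uudecode_line(uu_line)
  raw.zip(XOR_KEY).map { |byte, key| byte ^ key }.pack("C*")
end

puts decode_id("JikwPClUPENICg==")
puts decode_id("JikwPCpVXE9LCg==")
```

This only covers the ID strings whose first decoded byte is 25; the other encoding of 405174 mentioned next does not decode with this key.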
More to think about: you've provided two different ID strings for the value 405174: JiJQPjNfT0MtCg== and JikwPCpVXE9LCg==. These decode to 0b 07 93 fe f8 cd and 25 07 0a d7 cb eb respectively, and their XOR is 2e 00 99 29 33 26. The two URLs these ID strings came from have decoded watermarks of 0e and 20 respectively, and 0e XOR 20 = 2e, which accounts for the first byte (and the second byte is the same in both, anyway). Where the differences in the remaining four bytes come from is still a mystery to me.
That's going to be difficult. Even if you find the encryption method and keys, the original data is likely salted and the salt is probably varied with each record.
That's the point of encryption.

Resources