crc32 hash integrated with md5 in python - algorithm

I am trying to create a CRC32
But i keep getting this error for crcvalue = zlib.crc32(crcvalue)
builtins.TypeError: a bytes-like object is required, not 'int'

You are passing 0 to crc32 instead of the file name. This just makes no sense.
just encode the filename as bytes using ascii encoding and pass those bytes to the crc method:
>>> import zlib
>>> x = "filename"
>>> zlib.crc32(x.encode('ascii'))
1007413605

Related

How to encode protocol buffer string to binary using protoc

I been trying to encode strings using protoc cli utility.
Noticed that output still contains plain text.
What am i doing wrong?
osboxes#osboxes:~/proto/bin$ cat ./teststring.proto
syntax = "proto2";
message Test2 {
optional string b = 2;
}
echo b:\"my_testing_string\"|./protoc --encode Test2 teststring.proto>result.out
result.out contains:
^R^Qmy_testing_string
protoc versions libprotoc 3.6.0 and libprotoc 2.5.0
Just to formalize in an answer:
The command as written should be fine; the output is protobuf binary - it just resembles text because protobuf uses utf-8 to encode strings, and your content is dominated by a string. However, despite this: the file isn't actually text, and you should usually use a hex viewer or similar if you need to inspect it.
If you want to understand the internals of a file, https://protogen.marcgravell.com/decode is a good resource - it rips an input file or hex string following the protocol rules, and tells you what each byte means (field headers, length prefixes, payloads, etc).
I'm guessing your file is actually:
(hex) 10 11 6D 79 5F etc
i.e. 0x10 = "field 2, length prefixed", 0x11 = 17 (the payload length, encoded as varint), then "my_testing_string" encoded as 17 bytes of UTF8.
protoc --proto_path=${protobuf_path} --encode=${protobuf_message} ${protobuf_file} < ${source_file} > ${output_file}
and in this case:
protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto < ${source.txt} > ./output.bin
or:
cat b:\"my_testing_string\" | protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto > ./output.bin

Ruby SHA1 RSA signing different from command line OpenSSL?

I'm trying to implement RSA SHA1 signature verification.
To my surprise, command line OpenSSL tool doesn't generate the same key as the Ruby OpenSSL.
If I run those commands :
MacBook-Pro-de-Geoffrey:ssl_tests Escaflowne$ cat data.txt
000
MacBook-Pro-de-Geoffrey:ssl_tests Escaflowne$ openssl dgst -sha1 -binary -sign prvkey.pem -out sig.bin data.txt
MacBook-Pro-de-Geoffrey:ssl_tests Escaflowne$ openssl base64 -in sig.bin -out sig64.txt
MacBook-Pro-de-Geoffrey:ssl_tests Escaflowne$ cat sig64.txt
AJEh2kA7O3j624Kdl7UCGN1HiEk/v2LQudB+cjxw1CfmRTjcSPBjUE/EAwy8NEut
K4zYgfRwwTs7NY3AwYiUEtAe5yohUM0Qv17qSDW+G4IWjwe9PKE7Sl00umiMdszA
q/1hqeQlHKgjme7YO7H6i1UcAXmriOOjn+ySRaovsHw=
So final base64 result in command line is :
AJEh2kA7O3j624Kdl7UCGN1HiEk/v2LQudB+cjxw1CfmRTjcSPBjUE/EAwy8NEut
K4zYgfRwwTs7NY3AwYiUEtAe5yohUM0Qv17qSDW+G4IWjwe9PKE7Sl00umiMdszA
q/1hqeQlHKgjme7YO7H6i1UcAXmriOOjn+ySRaovsHw=
Now, if I try signing it through my ruby script :
def sign_message(message)
privkey = OpenSSL::PKey::RSA.new(File.read(Rails.root.join('lib', 'payment', 'prvkey.pem')))
digest = OpenSSL::Digest.new('sha1')
expected_sign = privkey.sign(digest, message)
base_64_expected_sign = [expected_sign].pack('m')
puts "Expected Signature"
puts expected_sign
puts "Base 64 Expected Signature"
puts base_64_expected_sign
return base_64_expected_sign
end
And calling the function like this :
def test_sign
message = "000"
message_signature = sign_message(message)
puts "Message Signature : #{message_signature}"
puts "Valid : #{verify_signature(message_signature, message)}"
end
I get the output :
Expected Signature
??|??n?^~?T_1Y#??BR??u???k x?*????S?L?:.7
t??tc?)崪? ?}DMp?p2??4?D-f??jT;!e
?k??5??
Base 64 Expected Signature
QjhL1zQoUdGFLVCMg06/CKeE/HdhRTOhJ/p09wkWeK0qD/afsxfcU7tMtDou
Nw3rwXw/5W68XhZ+BK1UXwIxWUDbFYlCUpu6HnWTmI5rC3QP+f50Y8kp5bSq
gQkekH1ETXDmcDKvExeSNKVELWYe3uwTalQ7IWUMyWvnF541rvo=
Message Signature : QjhL1zQoUdGFLVCMg06/CKeE/HdhRTOhJ/p09wkWeK0qD/afsxfcU7tMtDou
Nw3rwXw/5W68XhZ+BK1UXwIxWUDbFYlCUpu6HnWTmI5rC3QP+f50Y8kp5bSq
gQkekH1ETXDmcDKvExeSNKVELWYe3uwTalQ7IWUMyWvnF541rvo=
So final ruby OpenSSL signature is :
QjhL1zQoUdGFLVCMg06/CKeE/HdhRTOhJ/p09wkWeK0qD/afsxfcU7tMtDou
Nw3rwXw/5W68XhZ+BK1UXwIxWUDbFYlCUpu6HnWTmI5rC3QP+f50Y8kp5bSq
gQkekH1ETXDmcDKvExeSNKVELWYe3uwTalQ7IWUMyWvnF541rvo=
Versus command line :
AJEh2kA7O3j624Kdl7UCGN1HiEk/v2LQudB+cjxw1CfmRTjcSPBjUE/EAwy8NEut
K4zYgfRwwTs7NY3AwYiUEtAe5yohUM0Qv17qSDW+G4IWjwe9PKE7Sl00umiMdszA
q/1hqeQlHKgjme7YO7H6i1UcAXmriOOjn+ySRaovsHw=
I've been struggling with this for some time now and I don't understand what could be making a difference!
UPDATE :
Well, apparently results match if I replace my message variable with File.read(Rails.root.join('lib', 'payment', 'data.txt'))
So basically, using a string with the same value as what's in the text file doesn't give the same result.
This means it's encoding related right ?
UPDATE 2 :
So the file says its encoded in us-ascii if I run file -I data.txt
However, if I do message.encoding.name it says its loaded as UTF-8
Also, message.encode('ascii') does not alter the result of the generated signature, it still corresponds with the command line openssl.
As soon as I switch to a string "000".encode('utf-8') or "000".encode('ascii'), the signatures don't match anymore.
So encoding doesn't seem to play a role at all.
How come there's a difference between the exact same content whether it comes from reading a file or written as a string ?
The file data.txt has a trailing newline that you are not taking into account in your code. Using
message = "000\n"
should work.
You could also do
message = File.binread("data.txt")
to make sure you get the exact data as the command line.

Why does YAML interpret '0777' as 511?

In my YAML file I have:
foo:
- '0777'
When I load the file in my code (result = YAML.load_file(...)) I get
result[:foo] = [511]
This happens on Ubuntu. On Mac it is correct (["0777"]). When changed to:
foo:
- "'0777'"
it works on Ubuntu but the string consists the quotes: '0777'.
Why?
In Ruby for Integer if the argument is string, and happen to start with 0x, 0b, 0, it is interpreted as hex, binary, octal string respectively.
Therefore here 0777 is being treated as an octal string. Since '0777' octal = '511' decimal, you are getting 511 as result.
reference

Python 3 - GeoPy and encoding

I'm using DictWriter to write a dictionary to a csv after some geolocation work.
location = geolocator.reverse(coords)
row["address"] = location.address
writer.writerow(row)
Which generates this:
File "C:\bin64\python\3.4.3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200e' in
position 118: character maps to <undefined>
My problem was in how I was opening the file. I suppose I should have posted that in the question. I needed to set the encoding upon opening the file.
with open('results.csv', mode='w', encoding='utf-8', newline='') as file:
...

Using ruby SAX parsers for GB2312 encoded xml

Good day,
I have a lot of big xml files that i need to parse, but problems is they have 'gb2312' encoding. I would normaly use SAX parser for this.
So here is in example of xml:
<?xml version="1.0" encoding="gb2312"?>
<Root>
<ValueList Count="112290" FieldCount="11">
<Item1 Value1="23743" Value2="Дипломатия � Пустой кувшин" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item2 Value1="6611" Value2="ДЛ � 018 омела � золотой кинжал" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item3 Value1="6608" Value2="Наука (ДЛ)�круг фей 021�тяпка" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item4 Value1="6612" Value2="Знаки ДЛ � 003руны � разрушение" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
....
</Root>
I'm trying to use Nokogiri SAX (also tried libxml-ruby with same result) parser:
require 'nokogiri'
class SchemaParser < Nokogiri::XML::SAX::Document
def initialize
#cnt = 0
end
def start_element name, attrs =[]
if name == "Item1"
#cnt+= 1
puts #cnt
end
end
end
parser = Nokogiri::XML::SAX::Parser.new(SchemaParser.new)
parser.parse_io(File.open('2_4_EQUIPMENT_ESSENCE.xml'), 'gb2312')
But this gives error "`check_encoding': 'GB2312' is not a valid encoding (ArgumentError)". If I remove encoding declaration and let Nokogiri detect encoding himself, I will receive this error:
encoding error : input conversion failed due to input error, bytes 0xA8 0x43 0x20 0xA7
encoding error : input conversion failed due to input error, bytes 0xA8 0x43 0x20 0xA7
I/O error : encoder error
I also tried to open File with proper encoding, but that didn't help SAX parser:
[3] pry(main)> f = File.open('2_4_EQUIPMENT_ESSENCE.xml', "r:gb2312")
=> #<File:2_4_EQUIPMENT_ESSENCE.xml>
[4] pry(main)> f.external_encoding.name
=> "GB2312"
Did anyone use 'gb2312' encoding with SAX parsers in ruby? Any recommendations how to proceed?
It seems the issue is that Libxml2 does not support the GB2312 encoding (see here for a list of supported encodings).
I'm not sure if you have tried this, but I think you can work around this by removing the encoding declaration from the XML files (so Libxml2 does not try to transcode the data) and set the external encoding of the File object to GB2312, because then Ruby will transcode the file to UTF-8 as it is read, and from then on everything will remain as UTF-8.
So, here is my workaround.
Problems:
Some of characters presented in xml are not 'gb2312' encoding, I have found that 'GB18030' would be a better choice with full Chinese characters.
I converted all xml's to utf8, so i can use SAX parser.
I ended up with this rake task:
desc "convert chinese xml files to utf-8"
task :convert do
rm_rf 'data/utf8'
mkdir 'data/utf8'
Dir.foreach('data') {|f|
if f.end_with?('.xml')
puts "converted:: data/utf8/#{f}" if system("iconv -f GB18030 -t UTF-8 data/#{f} > data/utf8/#{f}")
end
}
#replace encodings for xml files
system("bundle exec ruby -pi -e \"gsub(/gb2312/, 'UTF-8')\" data/utf8/*.xml")
end

Resources