I been trying to encode strings using protoc cli utility.
Noticed that output still contains plain text.
What am i doing wrong?
osboxes#osboxes:~/proto/bin$ cat ./teststring.proto
syntax = "proto2";
message Test2 {
optional string b = 2;
}
echo b:\"my_testing_string\"|./protoc --encode Test2 teststring.proto>result.out
result.out contains:
^R^Qmy_testing_string
protoc versions libprotoc 3.6.0 and libprotoc 2.5.0
Just to formalize in an answer:
The command as written should be fine; the output is protobuf binary - it just resembles text because protobuf uses utf-8 to encode strings, and your content is dominated by a string. However, despite this: the file isn't actually text, and you should usually use a hex viewer or similar if you need to inspect it.
If you want to understand the internals of a file, https://protogen.marcgravell.com/decode is a good resource - it rips an input file or hex string following the protocol rules, and tells you what each byte means (field headers, length prefixes, payloads, etc).
I'm guessing your file is actually:
(hex) 10 11 6D 79 5F etc
i.e. 0x10 = "field 2, length prefixed", 0x11 = 17 (the payload length, encoded as varint), then "my_testing_string" encoded as 17 bytes of UTF8.
protoc --proto_path=${protobuf_path} --encode=${protobuf_message} ${protobuf_file} < ${source_file} > ${output_file}
and in this case:
protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto < ${source.txt} > ./output.bin
or:
cat b:\"my_testing_string\" | protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto > ./output.bin
I'm using DictWriter to write a dictionary to a csv after some geolocation work.
location = geolocator.reverse(coords)
row["address"] = location.address
writer.writerow(row)
Which generates this:
File "C:\bin64\python\3.4.3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200e' in
position 118: character maps to <undefined>
My problem was in how I was opening the file. I suppose I should have posted that in the question. I needed to set the encoding upon opening the file.
with open('results.csv', mode='w', encoding='utf-8', newline='') as file:
...
I am using nokogiri to decode some xml. This xml does have some html as values. I am seeing some strange behavior when parsing this. It appears nokogiri is removing some of the html encoded tags, so when i parse the html I am unable to decode it properly. See examples below:
doc = Nokogiri::XML '<?xml version="1.0"?><manifest
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
identifier="Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2"
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
xmlns:imsmd="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:imsqti="http://www.imsglobal.org/xsd/imsqti_v2p1">
<imsmd:langstring><p>
These are the<strong>instructions</strong> for the pool</p></imsmd:langstring>'
this yields the following value:
"<?xml version=\"1.0\"?>\n<manifest xmlns=\"http://www.imsglobal.org/xsd/imscp_v1p1\" xmlns:imsmd=\"http://www.imsglobal.org/xsd/imsmd_v1p2\" xmlns:imsqti=\"http://www.imsglobal.org/xsd/imsqti_v2p1\" identifier=\"Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2\">\n<imsmd:langstring>p
These are thestrong instructions/strong for the pool/p</imsmd:langstring></manifest>\n"
Notice how the < > tags are missing. However the following works as expected.
doc = Nokogiri::XML '<?xml version="1.0"?><imsmd:langstring><p>
These are the<strong> instructions</strong> for the pool</p></imsmd:langstring>'
and gives the following result
"<?xml version=\"1.0\"?>\n<imsmd:langstring><p>
These are the<strong> instructions</strong> for the pool</p></imsmd:langstring>\n"
I am sure I am missing something but can't figure out what is causing this.
When I try to submit a textarea with Mechanize and Ruby 2.0, I always get an
Encoding::UndefinedConversionError: U+0151 from UTF-8 to ISO-8859-1
Then I tryied to convert the text with Iconv, I got a similar result:
Iconv.iconv("LATIN1", "UTF-8", text)
I get this error message:
Iconv::IllegalSequence: "őzködik, melyet "...
As the text contains east-european characters. What can I do to avoid this kind of inconveniences or how can I convert properly between different encodings?
I have found an elegant solution:
replacements = [["À", "À"], ["Á", "Á"], ["Â", "Â"], ["Ã", "Ã"], ["Ä", "Ä"], ["Å", "Å"], ["Æ", "Æ"], ["Ç", "Ç"], ["È", "È"], ["É", "É"], ["Ê", "Ê"], ["Ë", "Ë"], ["Ì", "Ì"], ["Í", "Í"], ["Î", "Î"], ["Ï", "Ï"], ["Ð", "Ð"], ["Ñ", "Ñ"], ["Ò", "Ò"], ["Ó", "Ó"], ["Ô", "Ô"], ["Õ", "Õ"], ["Ö", "Ö"], ["Ø", "Ø"], ["Ù", "Ù"], ["Ú", "Ú"], ["Û", "Û"], ["Ü", "Ü"], ["Ý", "Ý"], ["Þ", "Þ"], ["ß", "ß"], ["à", "à"], ["á", "á"], ["â", "â"], ["ã", "ã"], ["ä", "ä"], ["å", "å"], ["æ", "æ"], ["ç", "ç"], ["è", "è"], ["é", "é"], ["ê", "ê"], ["ë", "ë"], ["ì", "ì"], ["í", "í"], ["î", "î"], ["ï", "ï"], ["ð", "ð"], ["ñ", "ñ"], ["ò", "ò"], ["ó", "ó"], ["ô", "ô"], ["õ", "õ"], ["ö", "ö"], ["ø", "ø"], ["ù", "ù"], ["ú", "ú"], ["û", "û"], ["ü", "ü"], ["ý", "ý"], ["þ", "þ"], ["ÿ", "ÿ"]]
def replace(str,replacements)
replacements.each {|replacement| str.gsub!(replacement[0], replacement[1])}
return str
end
my_string=replace(my_string,replacements)
I'm having problems in decoding a string using Base64.decode64 in Ruby. As a test, I'm using this site that decodes strings in php: https://rnd.feide.no/simplesaml/module.php/saml2debug/debug.php.
As a test, I'm using this string:
fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcynHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D
The output should be:
<?xml version="1.0" encoding="UTF-8"?>
<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="agdobjcfikneommfjamdclenjcpcjmgdgbmpgjmo" Version="2.0" IssueInstant="2007-04-26T13:51:56Z" ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" ProviderName="google.com" AssertionConsumerServiceURL="https://www.google.com/a/solweb.no/acs" IsPassive="true"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">google.com</saml:Issuer><samlp:NameIDPolicy AllowCreate="true" Format="urn:oasis:names:tc:SAML:2.0:nameid-format:unspecified" /></samlp:AuthnRequest>
But in Ruby I keep getting a strange output:
}\222MO?0\206?H?\a??|\264???jRuA\210J????\233?LS\aۓ?8???IS*?K\257??}????\377\254a;??e|\022\247\234\201SXiWg????~?y~~6#iM+\026]غ'??6L:\022?C?;?J?$\234\264#\"(\261Z?~?8\255ǀ\n\rg??\214˺?u\2436??Z?i\244\255\224\001רV5\266\256?m??"g/G\254?kI???Q\220.\f\2454\275\216ҋhzUN~\210ˉ\270\274z??t???!?)\254????}Y\026Q?*G\201\235\256??\031\2723^#?b\205\226\263\005\021?0?ܠ\243_\201?i\005?O\017\031߆ВH\222\276??\241D&\204\246\207u?0?\212?
I\244w\203v??|ܫ\030\243?o
.\217(<\3772\233%ߤ?????X?h\264zg\vc\260\277? ??\236ݡ\2672\234v?Wt\025m?V?9jA鍆\212\263$?\270\376\177\030ù|\000
The code I use is:
require 'cgi'
require 'base64'
Base64::decode64(CGI::unescape('fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcynHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D'))
What could possibly be wrong? Thanks in advance.
I have no idea where you got the idea that that string of yours is a base64-encoded version of your XML. If you pass the first bit of it (<?x) through Base64.encode64() then CGI.escape(), you get:
PD94
at the start, which is nothing like your string. In fact, your first four characters "fZJN" are values 31, 25, 9 and 13 in base 64 so will give you:
011111 011001 001001 001101
then, grouping them in octets instead of sextets (I guess that's the right word):
01111101 10010010 01001101
7D 92 4D
which are not the characters you're expecting to see.
Putting the whole string in gives you:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4gPHNhbWxw
OkF1dGhuUmVxdWVzdCB4bWxuczpzYW1scD0idXJuOm9hc2lzOm5hbWVzOnRj
OlNBTUw6Mi4wOnByb3RvY29sIiBJRD0iYWdkb2JqY2Zpa25lb21tZmphbWRj
bGVuamNwY2ptZ2RnYm1wZ2ptbyIgVmVyc2lvbj0iMi4wIiBJc3N1ZUluc3Rh
bnQ9IjIwMDctMDQtMjZUMTM6NTE6NTZaIiBQcm90b2NvbEJpbmRpbmc9InVy
bjpvYXNpczpuYW1lczp0YzpTQU1MOjIuMDpiaW5kaW5nczpIVFRQLVBPU1Qi
IFByb3ZpZGVyTmFtZT0iZ29vZ2xlLmNvbSIgQXNzZXJ0aW9uQ29uc3VtZXJT
ZXJ2aWNlVVJMPSJodHRwczovL3d3dy5nb29nbGUuY29tL2Evc29sd2ViLm5v
L2FjcyIgSXNQYXNzaXZlPSJ0cnVlIj48c2FtbDpJc3N1ZXIgeG1sbnM6c2Ft
bD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOmFzc2VydGlvbiI+Z29v
Z2xlLmNvbTwvc2FtbDpJc3N1ZXI+PHNhbWxwOk5hbWVJRFBvbGljeSBBbGxv
d0NyZWF0ZT0idHJ1ZSIgRm9ybWF0PSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FN
TDoyLjA6bmFtZWlkLWZvcm1hdDp1bnNwZWNpZmllZCIgLz48L3NhbWxwOkF1
dGhuUmVxdWVzdD4=
When you escape that, you get:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4gPHNhbWxw%0AOkF1dGhuUmVxdWVzdCB4bWxuczpzYW1scD0idXJuOm9hc2lzOm5hbWVzOnRj%0AOlNBTUw6Mi4wOnByb3RvY29sIiBJRD0iYWdkb2JqY2Zpa25lb21tZmphbWRj%0AbGVuamNwY2ptZ2RnYm1wZ2ptbyIgVmVyc2lvbj0iMi4wIiBJc3N1ZUluc3Rh%0AbnQ9IjIwMDctMDQtMjZUMTM6NTE6NTZaIiBQcm90b2NvbEJpbmRpbmc9InVy%0AbjpvYXNpczpuYW1lczp0YzpTQU1MOjIuMDpiaW5kaW5nczpIVFRQLVBPU1Qi%0AIFByb3ZpZGVyTmFtZT0iZ29vZ2xlLmNvbSIgQXNzZXJ0aW9uQ29uc3VtZXJT%0AZXJ2aWNlVVJMPSJodHRwczovL3d3dy5nb29nbGUuY29tL2Evc29sd2ViLm5v%0AL2FjcyIgSXNQYXNzaXZlPSJ0cnVlIj48c2FtbDpJc3N1ZXIgeG1sbnM6c2Ft%0AbD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOmFzc2VydGlvbiI%2BZ29v%0AZ2xlLmNvbTwvc2FtbDpJc3N1ZXI%2BPHNhbWxwOk5hbWVJRFBvbGljeSBBbGxv%0Ad0NyZWF0ZT0idHJ1ZSIgRm9ybWF0PSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FN%0ATDoyLjA6bmFtZWlkLWZvcm1hdDp1bnNwZWNpZmllZCIgLz48L3NhbWxwOkF1%0AdGhuUmVxdWVzdD4%3D%0A
So, the bottom line is that you're getting junk from the decode because the data is not of the correct format.
It appears that the data is also deflated/compressed.
require 'zlib'
inflated=Base64::decode64(CGI::unescape('fZJNT%2BMwEIbvSPwHy%2Fd8tMvHympSdUGISuwS0cCBm%2BtMUwfbk%2FU4zfLvSVMq2Euv45n3fd7xzOb%2FrGE78KTRZXwSp5yBU1hpV2f8ubyLfvJ5fn42I2lNKxZd2Lon%2BNsBBTZMOhLjQ8Y77wRK0iSctEAiKLFa%2FH4Q0zgVrceACg1ny9uMy7rCdaM2%2Bs0BWrtppK2UAdeoVjW2ruq1bevGImcvR6zpHmtJ1MHSUZAuDKU0vY7Si2h6VU5%2BiMuJuLx65az4dPql3SHBKaz1oYnEfVkWUfG4KkeBna7A%2Fxm6M14j1gZihZazBRH4MODcoKPOgl%2BB32kFz08PGd%2BG0JJIkr7v46%2BhRCaEpod17DCRivYZCkmkd4N28B3wfNyrGKP5bws9DS6PKDz%2FMpsl36Tyz%2F%2Fax1jeFmi0emcLY7C%2F8SDD0Z7dobcyHbbV3QVbcZW0TlqQemNhoqzJD%2B4%2Fn8Yw7l8AA%3D%3D'))
zlib = Zlib::Inflate.new(-Zlib::MAX_WBITS)
zlib.inflate(inflated)