UTF-8 and Chinese Characters - utf-8

I have a function that calls up google API:
def get_lat_long(place):
place = re.sub('\s','+', str(place), flags=re.UNICODE)
url = 'https://maps.googleapis.com/maps/api/geocode/json?address=' + place
content = urllib2.urlopen(url).read()
obj = json.loads(content)
results = obj['results']
lat = long = None
if len(results) > 0:
loc = results[0]['geometry']['location']
lat = float(loc['lat'])
long = float(loc['lng'])
return [lat, long]
However, when I enter 師大附中 as a parameter,I get the error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
I tried doing str(place).encode('utf-8'), but I don't think that's the problem. I think it's because the function cannot read Chinese characters, so it needs to first convert Chinese characters to a unicode string before it reads it? That's just a guess though.

Assuming that place is of unicode type, you need to do something like this:
def get_lat_long(place):
place = urllib.quote_plus(place.encode('utf-8'))
url = 'https://maps.googleapis.com/maps/api/geocode/json?address=' + place

Related

Steganography program - converting python 2 to 3, syntax error in: base64.b64decode("".join(chars))

I have problem with the syntax in the last part of steg program. I tried to convert python 2 version (of the working code) to python 3, and this is the last part of it:
flag = base64.b64decode("".join(chars)) <- error
print(flag)
The program 1. encrypts the message in the Last Significiant Bits of the image as saves it as a new image. Then 2.decrypts the message, which is stored in "flag", and prints it.
* can the error be caused by the wrong type of input?:
message = input("Your message: ")
BELOW: UNHIDING PROGRAM
#coding: utf-8
import base64
from PIL import Image
image = Image.open("after.png")
extracted = ''
pixels = image.load()
#Iterating in 1st row
for x in range(0,image.width):
r,g,b = pixels[x,0]
# Storing LSB of each color
extracted += bin(r)[-1]
extracted += bin(g)[-1]
extracted += bin(b)[-1]
chars = []
for i in range(len(extracted)/8):
byte = extracted[i*8:(i+1)*8]
chars.append(chr(int(''.join([str(bit) for bit in byte]), 2)))
flag = base64.b64decode(''.join(chars))
print flag
BELOW: HIDING PROGRAM:
import bitarray
import base64
from PIL import Image
with Image.open('before.png') as im:
pixels=im.load()
message = input("Your message: ")
encoded_message = base64.b64encode(message.encode('utf-8'))
#Convert the message into an array of bits
ba = bitarray.bitarray()
ba.frombytes(encoded_message)
bit_array = [int(i) for i in ba]
#Duplicate the original picture
im = Image.open("before.png")
im.save("after.png")
im = Image.open("after.png")
width, height = im.size
pixels = im.load()
#Hide message in the first row
i = 0
for x in range(0,width):
r,g,b = pixels[x,0]
#print("[+] Pixel : [%d,%d]"%(x,0))
#print("[+] \tBefore : (%d,%d,%d)"%(r,g,b))
#Default values in case no bit has to be modified
new_bit_red_pixel = 255
new_bit_green_pixel = 255
new_bit_blue_pixel = 255
if i<len(bit_array):
#Red pixel
r_bit = bin(r)
r_last_bit = int(r_bit[-1])
r_new_last_bit = r_last_bit & bit_array[i]
new_bit_red_pixel = int(r_bit[:-1]+str(r_new_last_bit),2)
i += 1
if i<len(bit_array):
#Green pixel
g_bit = bin(g)
g_last_bit = int(g_bit[-1])
g_new_last_bit = g_last_bit & bit_array[i]
new_bit_green_pixel = int(g_bit[:-1]+str(g_new_last_bit),2)
i += 1
if i<len(bit_array):
#Blue pixel
b_bit = bin(b)
b_last_bit = int(b_bit[-1])
b_new_last_bit = b_last_bit & bit_array[i]
new_bit_blue_pixel = int(b_bit[:-1]+str(b_new_last_bit),2)
i += 1
pixels[x,0] = (new_bit_red_pixel,new_bit_green_pixel,new_bit_blue_pixel)
#print("[+] \tAfter: (%d,%d,%d)"%(new_bit_red_pixel,new_bit_green_pixel,new_bit_blue_pixel))
im.save('after.png')
error
ValueError: string argument should contain only ASCII characters
help for base64.b64decode says:
b64decode(s, altchars=None, validate=False)
Decode the Base64 encoded bytes-like object or ASCII string s.
...
Considering that in Python 2 there were "normal" strs and unicode-strs (u-prefixed), I suggest taking closer look at what produce "".join(chars). Does it contain solely ASCII characters?
I suggest adding:
print("Codes:",[ord(c) for c in chars])
directly before:
flag = base64.b64decode("".join(chars))
If there will be number >127 inside codes, that mean it might not work as it is fit only for pure ASCII strs.

TreeView.insert throws UnicodeDecodeError

I'm trying to populate TreeView with data from os.listdir(path).
All is ok until I read a directory name with a non-utf character. In my case 0xf6 which is not utf8.
As I'm running on Windows the charset from os.listdir() is Windows-1252 or ANSI.
How can I solve this problem to achieve correct display in TreeView?
Here some of my code:
def fill_tree(treeview, node):
if treeview.set(node, "type") != 'directory':
return
path = treeview.set(node, "fullpath")
# Delete the possibly 'dummy' node present.
treeview.delete(*treeview.get_children(node))
parent = treeview.parent(node)
for p in os.listdir(path):
ptype = None
p = os.path.join(path, p)
if os.path.isdir(p):
ptype = 'directory'
fname = os.path.split(p)[1].decode('cp1252').encode('utf8')
if ptype == 'directory':
oid = treeview.insert(node, 'end', text=fname, values=[p, ptype])
treeview.insert(oid, 0, text='dummy')
Regards
Göran
The UnicodeDecodeError is due to passing byte strings when the function is expecting Unicode strings. Python 2 attempts to implicitly decode byte strings to Unicode. Use Unicode strings explicitly instead. os.listdir(unicode_path) will return Unicode string, for example os.listdir(u'.').

How to decoding IFC using Ruby

In Ruby, I'm reading an .ifc file to get some information, but I can't decode it. For example, the file content:
"'S\X2\00E9\X0\jour/Cuisine'"
should be:
"'Séjour/Cuisine'"
I'm trying to encode it with:
puts ifcFileLine.encode("Windows-1252")
puts ifcFileLine.encode("ISO-8859-1")
puts ifcFileLine.encode("ISO-8859-5")
puts ifcFileLine.encode("iso-8859-1").force_encoding("utf-8")'
But nothing gives me what I need.
I don't know anything about IFC, but based solely on the page Denis linked to and your example input, this works:
ESCAPE_SEQUENCE_EXPR = /\\X2\\(.*?)\\X0\\/
def decode_ifc(str)
str.gsub(ESCAPE_SEQUENCE_EXPR) do
$1.gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }
end
end
str = 'S\X2\00E9\X0\jour/Cuisine'
puts "Input:", str
puts "Output:", decode_ifc(str)
All this code does is replace every sequence of four characters (/..../) between the delimiters, which will each be a Unicode code point in hexadecimal, with the corresponding Unicode character.
Note that this code handles only this specific encoding. A quick glance at the implementation guide shows other encodings, including an \X4 directive for Unicode characters outside the Basic Multilingual Plane. This ought to get you started, though.
See it on eval.in: https://eval.in/776980
If someone is interested, I wrote here a Python Code that decode 3 of the IFC encodings : \X, \X2\ and \S\
import re
def decodeIfc(txt):
# In regex "\" is hard to manage in Python... I use this workaround
txt = txt.replace('\\', 'µµµ')
txt = re.sub('µµµX2µµµ([0-9A-F]{4,})+µµµX0µµµ', decodeIfcX2, txt)
txt = re.sub('µµµSµµµ(.)', decodeIfcS, txt)
txt = re.sub('µµµXµµµ([0-9A-F]{2})', decodeIfcX, txt)
txt = txt.replace('µµµ','\\')
return txt
def decodeIfcX2(match):
# X2 encodes characters with multiple of 4 hexadecimal numbers.
return ''.join(list(map(lambda x : chr(int(x,16)), re.findall('([0-9A-F]{4})',match.group(1)))))
def decodeIfcS(match):
return chr(ord(match.group(1))+128)
def decodeIfcX(match):
# Sometimes, IFC files were made with old Mac... wich use MacRoman encoding.
num = int(match.group(1), 16)
if (num <= 127) | (num >= 160):
return chr(num)
else:
return bytes.fromhex(match.group(1)).decode("macroman")

In ruby, how do you do a DES encryption with a ECB mode and PKCS7 padding

I am trying to convert some C# code into ruby. Here is a snippet of the C# code:
string Encrypt(string toEncrypt, string key) {
byte[] keyArray;
byte[] toEncryptArray = UTF8Encoding.UTF8.GetBytes(toEncrypt);
keyArray = UTF8Encoding.UTF8.GetBytes(key);
TripleDESCryptoServiceProvider tdes = new TripleDESCryptoServiceProvider();
tdes.Key = keyArray;
tdes.Mode = CipherMode.ECB;
tdes.Padding = PaddingMode.PKCS7;
ICryptoTransform cTransform = tdes.CreateEncryptor();
byte[] resultArray = cTransform.TransformFinalBlock(toEncryptArray, 0, toEncryptArray.Length);
return Convert.ToBase64String(resultArray, 0, resultArray.Length);
My biggest issue seems to be around getting the padding specification right.
Here is what I have so far...
des = OpenSSL::Cipher::Cipher.new('des-ecb')
des.encrypt # OpenSSL::PKCS7 has to passed in somewhere
des.key = '--The Key--'
update_value = des.update(val)
After running through all of the available OpenSSL ciphers and testing to see if any of the outputs resulted in the same encrypted string with no success, I then did the same thing, but this time passing in a padding integer (from 0 - 20), and iterated over all of the ciphers again.
This resulted in a success!
The final code:
def encrypt val
des = OpenSSL::Cipher::Cipher.new 'DES-EDE3'
des.encrypt
des.padding = 1
des.key = '--SecretKey--'
update_value = des.update(val)
up_final = update_value + des.final
Base64.encode64(up_final).gsub(/\n/, "")
end
The biggest thing to note is the I had to remove the newline characters, and had to put in a padding of 1.
I'm still confused on the padding...but, wanted to update everyone on what I found in case someone runs into this in the future
:Update: The padding didn't matter after all...if you take out that line it still encrypts the same as if you had any number in there...the big difference I was missing was taking out the newlines
Try 'des-ede3-ecb' or just '3des' as names instead. 'des-ecb' is unlikely to return a triple DES cipher.
PKCS#7 is normally the default for OpenSSL, so you may not have to specify it.
Make sure that your character-encoding (UTF-8, compatible with ASCII for values up to 7F) and encoding (base 64) matches as well.

How to convert UTF8 byte arrays to string in lua

I have a table like this
table = {57,55,0,15,-25,139,130,-23,173,148,-24,136,158}
it is utf8 encoded byte array by php unpack function
unpack('C*',$str);
how can I convert it to utf-8 string I can read in lua?
Lua doesn't provide a direct function for turning a table of utf-8 bytes in numeric form into a utf-8 string literal. But it's easy enough to write something for this with the help of string.char:
function utf8_from(t)
local bytearr = {}
for _, v in ipairs(t) do
local utf8byte = v < 0 and (0xff + v + 1) or v
table.insert(bytearr, string.char(utf8byte))
end
return table.concat(bytearr)
end
Note that none of lua's standard functions or provided string facilities are utf-8 aware. If you try to print utf-8 encoded string returned from the above function you'll just see some funky symbols. If you need more extensive utf-8 support you'll want to check out some of the libraries mention from the lua wiki.
Here's a comprehensive solution that works for the UTF-8 character set restricted by RFC 3629:
do
local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
function utf8(decimal)
if decimal<128 then return string.char(decimal) end
local charbytes = {}
for bytes,vals in ipairs(bytemarkers) do
if decimal<=vals[1] then
for b=bytes+1,2,-1 do
local mod = decimal%64
decimal = (decimal-mod)/64
charbytes[b] = string.char(128+mod)
end
charbytes[1] = string.char(vals[2]+decimal)
break
end
end
return table.concat(charbytes)
end
end
function utf8frompoints(...)
local chars,arg={},{...}
for i,n in ipairs(arg) do chars[i]=utf8(arg[i]) end
return table.concat(chars)
end
print(utf8frompoints(72, 233, 108, 108, 246, 32, 8364, 8212))
--> Héllö €—

Resources