Encoding of bash terminal (representation of Cyrillic letters) - bash

I have been using the Git Bash terminal, but, as far as I can tell, after changing the font I ran into a problem: Cyrillic characters are no longer displayed in the terminal. Instead of them it shows this: \xd0\x9f\xd1\x80\xd0\xb8\xd0\xbd\xd1\x8f\xd1\x82\xd0\xbe!.
I got this string using API:
import requests
import os
api_key = os.getenv('yandex_praktikum_api_key')
HEADERS = {
    'Authorization': f'OAuth {api_key}'
}
response = requests.get(
    'https://praktikum.yandex.ru/api/user_api/homework_statuses/',
    params={'from_date': 0},
    headers=HEADERS
)
print(response.text)
I tried to fix this by changing the font back to Lucida Console (14px), but it didn't work. I checked my terminal encoding by typing echo $LC_CTYPE and got ru_RU.UTF-8. Then, after typing echo $LANG, I got an empty string. So, how can I fix it?
OS: Windows 10. File encoding is UTF-8. Terminal type is xterm.
Locale shows this:
User@DESKTOP-CVQ282P MINGW64 ~/Desktop
$ locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
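For reference, the escaped output is not corrupted data: it is the raw UTF-8 byte sequence of the Cyrillic word «Принято!». A quick Python check (a sketch, assuming the bytes above are copied verbatim) decodes it cleanly, which suggests the data is intact and only the terminal's rendering is at fault:
# The escapes shown in the terminal are valid UTF-8 bytes;
# decoding them recovers the original Cyrillic text.
data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xbd\xd1\x8f\xd1\x82\xd0\xbe!'
print(data.decode('utf-8'))  # -> Принято! ("Accepted!")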

Related

Encoding of requirement file using pip

On OSX (El Capitan, 10.11.6), using virtualenv (15.1.0), I am getting an error when installing requirements from a text file with pip (9.0.1):
virtualenv env
source env/bin/activate
pip install -r requirements.txt
but not when looping over each requirement manually:
for r in $(cat requirements.txt); do pip install "$r"; done
This makes me think there might be an issue with the default encoding assumed by pip when reading the requirements file. Is there a way (environment variable, I presume) to set the default encoding of requirement files?
The error I get is:
Exception:
Traceback (most recent call last):
  File "/path/to/env/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/path/to/env/lib/python2.7/site-packages/pip/commands/install.py", line 312, in run
    wheel_cache
  File "/path/to/env/lib/python2.7/site-packages/pip/basecommand.py", line 295, in populate_requirement_set
    wheel_cache=wheel_cache):
  File "/path/to/env/lib/python2.7/site-packages/pip/req/req_file.py", line 84, in parse_requirements
    filename, comes_from=comes_from, session=session
  File "/path/to/env/lib/python2.7/site-packages/pip/download.py", line 422, in get_file_content
    content = auto_decode(f.read())
  File "/path/to/env/lib/python2.7/site-packages/pip/utils/encoding.py", line 31, in auto_decode
    return data.decode(locale.getpreferredencoding(False))
LookupError: unknown encoding:
The following test code:
#!/usr/bin/env python
import sys
import locale
print sys.stdin.encoding
print locale.getpreferredencoding()
print locale.getpreferredencoding(False)
print sys.getdefaultencoding()
print sys.getfilesystemencoding()
returns:
None
US-ASCII
ascii
utf-8
From the command-line:
$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="utf-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=
According to your test, the output of locale.getpreferredencoding() is absent; it should look like this:
UTF-8
UTF-8
US-ASCII
ascii
utf-8
Running locale, mine looks almost like yours (other than US vs GB, and LC_CTYPE). It does seem peculiar that your LC_CTYPE is different, and it may be worth looking into why.
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
There are a couple of things you can try to correct the issue, the first being your shebang:
#!/usr/bin/python
Perhaps try changing it to this, as #!/usr/bin/env python might not pick up the locale properly. If that doesn't work, you can always try forcing the encoding within your script:
import locale
loc = locale.getlocale()
locale.setlocale(locale.LC_ALL, '') # use user's preferred locale
locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale
locale.setlocale(locale.LC_ALL, loc) # restore saved locale
# OR
locale.getpreferredencoding(do_setlocale=True) or "utf-8"
# OR
if locale.getpreferredencoding() == '':
    locale.setlocale(locale.LC_ALL, 'UTF-8')
Ultimately you'll want to figure out why locale.getpreferredencoding() is coming up empty for the first couple of tests, and why you have a mismatched LC_CTYPE set in your locale.
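If you'd rather not depend on the environment at all, you can guard the decode step yourself. Below is a minimal sketch; safe_decode is a hypothetical helper that loosely mirrors the auto_decode call seen in the traceback above, but falls back to UTF-8 when the environment yields an empty or unknown encoding name:
import locale

def safe_decode(data):
    # Use the locale's preferred encoding when it is usable,
    # otherwise assume the requirements file is UTF-8.
    encoding = locale.getpreferredencoding(False) or 'utf-8'
    try:
        return data.decode(encoding)
    except LookupError:
        # Unknown encoding name in the environment.
        return data.decode('utf-8')

with open('requirements.txt', 'rb') as f:
    print(safe_decode(f.read()))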

UTF-8 characters not displayed correctly in console

I'm implementing a card game in Clojure and I want to use Unicode characters to represent suits:
(def color-str "Maps card colors to strings"
{ :kreuz "♣", :grun "♠", :herz "♥", :schell "♦" })
However, instead of the desired characters, I get this result:
{:grun "ΓÖá", :herz "ΓÖÑ", :kreuz "ΓÖú", :schell "ΓÖª"}
Similarly, when I redefine color-str to:
(def color-str "Maps card colors to strings"
  { :kreuz \u2663, :grun \u2660, :herz \u2665, :schell \u2666 })
I get:
{:grun \ΓÖá, :herz \ΓÖÑ, :kreuz \ΓÖú, :schell \ΓÖª}
The file is saved as UTF-8 without a BOM. I already tried adding:
:javac-options ["-encoding utf8"]
:jvm-opts ["-Dfile.encoding=UTF-8"]
to the project.clj file, but it didn't help. I know that the console (Cygwin's bash) is able to show those characters: when I copy-pasted { :kreuz "♣", :grun "♠", :herz "♥", :schell "♦" } directly into the REPL, it displayed them correctly.
What did I miss?
My solution was to run:
cmd /c chcp 65001
It sets the console's default code page to UTF-8. By default, CMD uses the code page matching the current language and localization settings, e.g. 437 for the United States. Setting it to UTF-8 (65001) solves the issue.
I got the idea from the 2nd answer to this question, after @Jesper suggested that it's related to the console encoding.
The character encoding of the console is set to something different than what Clojure thinks it is set to.
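A quick byte-level check bears this out (sketched in Python, since the bytes involved are the same regardless of language): the garbled strings are exactly the UTF-8 bytes of the suit symbols decoded as code page 437:
# UTF-8 bytes of the suit symbols, mis-decoded as code page 437,
# reproduce the garbled output from the question exactly.
for suit in u'\u2660\u2665\u2663\u2666':          # ♠ ♥ ♣ ♦
    print(suit.encode('utf-8').decode('cp437'))   # ΓÖá ΓÖÑ ΓÖú ΓÖª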

What is the encoding of the subprocess module output in Python 2.7?

I'm trying to retrieve the contents of a zipped archive with Python 2.7 on 64-bit Windows Vista, by making a system call to 7-Zip (my favourite archive manager) using the subprocess module:
# -*- coding: utf-8 -*-
import sys, os, subprocess
Extractor = r'C:\Program Files\7-Zip\7z.exe'
ArchiveName = r'C:\temp\bla.zip'
output = subprocess.Popen([Extractor,'l','-slt',ArchiveName],stdout=subprocess.PIPE).stdout.read()
This works fine as long as the archive contains only ASCII filenames, but when I try it with non-ASCII filenames I get an encoded output string in which ä, ë, ö, ü have been replaced by \x84, \x89, \x94, \x81 (etcetera). I've tried all kinds of decode/encode calls, but I'm just too inexperienced with Python (and generally too stupid) to reproduce the original characters with umlauts (which is required if I want to follow up this step with, e.g., an extraction subprocess call to 7z).
Simply put, my question is: how do I get this to work for archives with non-ASCII content as well?
... or to put it in a more convoluted way: Is the output of subprocess always of a fixed encoding or not?
In the former case -> Which encoding is it?
In the latter case -> How can I control or uncover the encoding of the output of subprocess? Inspired by similar questions on this site, I've tried adding
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
and I've also tried
my_env = os.environ
my_env['PYTHONIOENCODING'] = 'utf-8'
output = subprocess.Popen([Extractor,'l','-slt',ArchiveName],stdout=subprocess.PIPE,env=my_env).stdout.read()
but neither seems to alter the encoding of the output variable (or to reproduce the umlaut).
You can try using 7-Zip's -sccUTF-8 switch to force the output to UTF-8.
Here is the reference page: http://en.helpdoc-online.com/7-zip_9.20/source/cmdline/switches/scc.htm
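Alternatively, you can decode the bytes yourself. The substitutions described in the question (ä -> \x84, ë -> \x89, ö -> \x94, ü -> \x81) match the OEM code pages 437/850 used by the Windows console, so decoding the captured output with that code page recovers the umlauts. A minimal sketch, assuming a Western-European code page (check yours with chcp first):
# -*- coding: utf-8 -*-
import subprocess

extractor = r'C:\Program Files\7-Zip\7z.exe'
archive = r'C:\temp\bla.zip'
raw = subprocess.Popen([extractor, 'l', '-slt', archive],
                       stdout=subprocess.PIPE).stdout.read()

# The console's OEM code page encodes ä ë ö ü as \x84 \x89 \x94 \x81;
# decoding with cp850 (or cp437) turns the raw bytes into proper unicode.
listing = raw.decode('cp850')
The resulting unicode string can then be passed on to a follow-up extraction call.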

File.exist? not working when directory name has special characters

File.exist? is not working with directory names that contain special characters, for something like the path given below:
path = "/home/cis/Desktop/'El%20POP%20que%20llevas%20dentro%20Vol.%202'/*.mp3"
It works fine, but if the path has letters like ñ it returns false.
Please help with this.
Try the following:
Make sure you're running 1.9.2 or greater and put # encoding: UTF-8 at the top of your file (which must be saved as UTF-8, and your editor must support it).
If you're running MRI (i.e. not JRuby or another implementation), you can set the environment variable RUBYOPT=-Ku instead of putting # encoding: UTF-8 at the top of each file.

Calling system from a post in sinatra

I made a very small app for the Raspberry Pi that uses Sinatra:
https://github.com/khebbie/SpeakPi
The app lets the user input some text in a textarea and asks Google to create an mp3 file for it.
In it I have a shell script called speech2.sh, which calls Google and plays the mp3 file:
#!/bin/bash
say() {
    wget -q -U Mozilla -O out.mp3 "http://translate.google.com/translate_tts?tl=da&q=$*";
    local IFS=+; omxplayer out.mp3;
}
say $*
When I call speech2.sh from the command line like so:
./speech2.sh %C3%A6sel
It pronounces %C3%A6 like the Danish letter 'æ', which is correct!
I call speech2.sh from a Sinatra route like so:
post '/say' do
  message = params[:body]
  system('/home/pi/speech2.sh ' + message)
  haml :index
end
And when I do so, Google pronounces some very weird characters, like 'a broken pipe...', which is wrong!
All characters a-z are pronounced correctly.
I have tried some URL encoding and decoding; nothing worked.
I tried outputting the message to the command line, and it was exactly "%C3%A6", which certainly did not make sense.
Do you have any idea what I am doing wrong?
EDIT
To sum it up and simplify: if I type the following in bash:
./speech2.sh %C3%A6sel
It works.
If I start an irb session and type:
system('/home/pi/speech2.sh', '%C3%A6sel')
It does not work!
Since it is handling UTF-8, make sure that the encoding remains correct all the way through the process, by adding the # encoding: UTF-8 magic comment at the top of the Ruby script and passing the ie=UTF-8 parameter in the query string when calling Google Translate.
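For reference, %C3%A6 is simply the percent-encoded form of the UTF-8 bytes for æ, which you can verify with a quick check (sketched in Python, unrelated to the Ruby code itself):
from urllib.parse import quote, unquote

print(unquote('%C3%A6sel'))   # -> æsel ("donkey" in Danish)
print(quote('æsel'))          # -> %C3%A6sel
So it is worth checking whether the text that reaches the script is raw UTF-8 or already percent-encoded: the query string sent to Google must be the percent-encoded form.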
