UTF-8 characters not displayed correctly in console - utf-8

I'm implementing a card game in Clojure and I want to use unicode characters to represent suits:
(def color-str "Maps card colors to strings"
{ :kreuz "♣", :grun "♠", :herz "♥", :schell "♦" })
However, instead of the desired characters, I get this result:
{:grun "ΓÖá", :herz "ΓÖÑ", :kreuz "ΓÖú", :schell "ΓÖª"}
Similarly, when I redefine color-str to:
(def color-str "Maps card colors to strings"
{ :kreuz \u2663, :grun \u2660, :herz \u2665, :schell \u2666 })
I get:
{:grun \ΓÖá, :herz \ΓÖÑ, :kreuz \ΓÖú, :schell \ΓÖª}
File is saved as UTF-8 without BOM. I already tried adding:
:javac-options ["-encoding utf8"]
:jvm-opts ["-Dfile.encoding=UTF-8"]
to the project.clj file, but it didn't help. I know that the console (Cygwin's Bash) is able to show those characters: when I copy-pasted { :kreuz "♣", :grun "♠", :herz "♥", :schell "♦" } directly into the REPL, it displayed them correctly.
What did I miss?

My solution was to run:
cmd /c chcp 65001
It sets the console's default code page to UTF-8. By default, CMD uses the code page that matches the current language and localization settings, e.g. 437 for the United States. Setting it to UTF-8 (65001) solves the issue.
I got the idea from the 2nd answer in this question after @Jesper suggested that it's related to the console encoding.

The character encoding of the console is set to something different than what Clojure thinks it is set to.
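The mismatch can be reproduced outside Clojure. The following is a minimal sketch in Python (not part of the original answers), assuming the program writes UTF-8 bytes and the console interprets them as code page 437; the helper name as_seen_in_cp437 is made up for illustration.

```python
# Suit symbols as they appear in the source file.
suits = {"kreuz": "♣", "grun": "♠", "herz": "♥", "schell": "♦"}


def as_seen_in_cp437(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as code page 437."""
    return text.encode("utf-8").decode("cp437")


# Each symbol is three UTF-8 bytes, so each one turns into three CP437 characters.
garbled = {name: as_seen_in_cp437(symbol) for name, symbol in suits.items()}
print(garbled)
```

The garbled strings this produces match the ones in the question, which is consistent with the console using code page 437 before chcp 65001 was run.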

Related

Cmder wrong colors using Windows Terminal

I am trying to use Cmder in Windows Terminal. I tried following this guide, and I did everything as it says.
However, there is a small issue. No matter what I do, the prompt background colour does not change, it stays black.
I couldn't figure out the issue. Any suggestions?
From the comments section of the same article:
I ran into this issue as well and was able to get it working. In your
"%cmder_root%\config" directory, create a file called "my_prompt.lua"
and add the following to it:
function my_prompt_filter()
cwd = clink.get_cwd()
prompt = "\x1b[1;32;49m{cwd} {git}{hg}{svn} \n\x1b[1;39;49m{lamb} \x1b[0m"
new_value = string.gsub(prompt, "{cwd}", cwd)
clink.prompt.value = string.gsub(new_value, "{lamb}", "λ")
end
clink.prompt.register_filter(my_prompt_filter, 1)
Kudos to Eric Grandt
@AMagyar's answer is great, except that if you use conda, pyenv or other virtual environments, that information is omitted (where you should see (base) λ, you get only λ). Instead, you can create my_prompt.lua as something like:
function my_prompt_filter()
local prompt = clink.prompt.value
prompt = string.gsub(prompt, '^\x1b%[1;32;40m', '\x1b[1;32;49m')
prompt = string.gsub(prompt, '\n\x1b%[1;39;40m', '\n\x1b[1;39;49m')
clink.prompt.value = prompt
end
clink.prompt.register_filter(my_prompt_filter, 1)
Everything is settled.
What controls the terminal text & background color?
In the function set_prompt_filter in %cmder_root%\vendor\clink.lua, you will find lines like:
local cmder_prompt = "\x1b[1;32;40m{cwd} {git}{hg}{svn} \n\x1b[1;39;40m{lamb} \x1b[0m"
This is the template for cmder prompts. The placeholders {cwd}, {git}, {lamb}, etc. are substituted with the actual content later on. \x1b[1;32;40m is the ANSI escape sequence that controls the color of the text that follows: 1 means bold, 32 means green text color, 40 means black background color, 39 means default text color, and 49 means default background color.
Why was pyenv/conda environment omitted?
Also in the function set_prompt_filter in %cmder_root%\vendor\clink.lua, you can see how cmder adds the virtual environment information to {lamb} (more specifically, prompts that start with () or []). So you either have to retrieve that information from the original prompt, or simply replace the color codes, as in this answer.
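To make the color-code replacement concrete, here is a minimal sketch in Python rather than Lua. It only illustrates the string substitution and is not something Clink can load. The function name use_default_background is hypothetical, and the pattern is slightly broader than the Lua one above: it rewrites every bold sequence with a black background, not just the two at fixed positions.

```python
import re

# Prompt template quoted from clink.lua in the answer above.
CMDER_PROMPT = "\x1b[1;32;40m{cwd} {git}{hg}{svn} \n\x1b[1;39;40m{lamb} \x1b[0m"


def use_default_background(prompt: str) -> str:
    """Swap background code 40 (black) for 49 (default) in every
    'ESC [ 1 ; <foreground> ; 40 m' sequence; everything else is kept."""
    return re.sub(r"\x1b\[1;(\d+);40m", "\x1b[1;\\g<1>;49m", prompt)


# repr() shows the escape characters instead of sending them to the terminal.
print(repr(use_default_background(CMDER_PROMPT)))
```

Because only the escape sequences are touched, any text that cmder already placed in the prompt, such as a (base) prefix, passes through unchanged.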

Encoding of bash terminal (representation of Cyrillic letters)

I have been using the git-bash terminal, but (I suspect after changing the font) I ran into a problem: Cyrillic characters are not displayed in the terminal. Instead, it shows this: \xd0\x9f\xd1\x80\xd0\xb8\xd0\xbd\xd1\x8f\xd1\x82\xd0\xbe!.
I got this string from an API:
import requests
import os
api_key = os.getenv('yandex_praktikum_api_key')
HEADERS = {
'Authorization': f'OAuth {api_key}'
}
response = requests.get(
'https://praktikum.yandex.ru/api/user_api/homework_statuses/',
params={'from_date': 0},
headers=HEADERS
)
print(response.text)
I tried to fix this by changing the font back to Lucida Console (14px), but it didn't work. I checked my terminal encoding by typing echo $LC_CTYPE and got ru_RU.UTF-8. Then, after typing $LANG, I got an empty string. So, how can I fix it?
OS: Windows 10. File encoding is UTF-8. Terminal type is xterm.
The locale command shows this:
User@DESKTOP-CVQ282P MINGW64 ~/Desktop
$ locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
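For reference, the escaped sequence in the question is the UTF-8 byte encoding of a Cyrillic string. The following minimal sketch decodes it independently of any terminal settings. It assumes the escapes stand for raw bytes. It does not show where in the pipeline they were escaped.

```python
# The escaped text from the question, written as a Python bytes literal.
raw = b"\xd0\x9f\xd1\x80\xd0\xb8\xd0\xbd\xd1\x8f\xd1\x82\xd0\xbe!"

# Each Cyrillic letter occupies two bytes in UTF-8; "!" is a single byte.
decoded = raw.decode("utf-8")
print(decoded)
```

If the decoded text displays correctly, the bytes themselves are intact and the problem lies in how they are being printed or rendered.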

Reading a JSON file in Ruby prepending unknown characters

I have a simple JSON file like this:
{
"env": "Development",
"app_host": "https://localhost:3455",
"server_host": "localhost",
"server_port": "3455"
}
When I read this file using the below code, the output contains some unknown characters in the beginning.
contents = IO.read('config.json')
puts contents
Output:
∩╗┐{
"env": "Development",
"app_host": "https://localhost:3455",
"server_host": "localhost",
"server_port": "3455"
}
Can someone let me know how to fix this?
These characters are the bytes of a UTF-8 byte order mark (BOM), being displayed as code page 437 characters.
From your comment, it seems Visual Studio is inserting a BOM into the files. When you then read the file in and try to display it in your console it is displaying as ∩╗┐, since your console’s encoding is set to CP437, and the three bytes that make up the BOM in UTF-8 (0xEF,0xBB,0xBF) correspond to those characters in that encoding.
You should probably look into changing the encoding your console is using, as well as seeing if you can configure VS not to add the BOM (I’m not on Windows so I don’t know how you would do either of those).
From the Ruby side, you could specify the encoding in your call to IO.read like this:
IO.read('config.json', :encoding => 'bom|utf-8')
This will strip the BOM when reading the file.
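The same effect can be checked in a few lines of Python. This is a minimal sketch (not from the original answer) that uses an in-memory byte string in place of config.json; the variable names are made up for illustration.

```python
import codecs

# Stand-in for a file that starts with a UTF-8 BOM (bytes 0xEF 0xBB 0xBF).
data = codecs.BOM_UTF8 + b'{"env": "Development"}'

seen_in_cp437 = data.decode("cp437")  # what a CP437 console would display
kept_bom = data.decode("utf-8")       # BOM survives as the character U+FEFF
stripped = data.decode("utf-8-sig")   # BOM is dropped while decoding

print(seen_in_cp437)
print(stripped)
```

Python's utf-8-sig codec plays the same role here as Ruby's bom|utf-8 encoding option: it decodes UTF-8 and discards a leading BOM if one is present.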

Copy yaml formatting (indent) from one file to another

A translator completely messed up a YAML file by copying everything into Word (don't ask).
I have already cleaned up the file using regexes, but the indent (spacing) is now missing; everything starts at the first character:
es:
default_blocks:
thank_you_html: "thank you text"
instead of
en:
  default_blocks:
    thank_you_html: "thank you text"
Do you have a good idea on how to automatically copy the format/structure/indent from the correct file (say en.yml) to the corrupt one (say es.yml)? (I'm using TextMate 2.0 as my editor.)
Thanks!
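Assuming the original and the translation contain exactly the same keys line by line (except for the indentation problem), a quick and dirty script that copies the leading whitespace may solve this:
#!/usr/bin/env ruby
# encoding: UTF-8
# Usage: ./script_name.rb reference.yml broken.yml
# Collect the leading spaces/tabs of every line in the reference file (ARGV[0]).
indents = File.readlines(ARGV[0]).map { |l| l[/\A[ \t]*/] }
# Prepend each indent to the same-numbered line of the broken file (ARGV[1]),
# dropping any stray indentation that line already has.
indented = indents.zip(File.readlines(ARGV[1])).map do |indent, line|
  indent + line.to_s.sub(/\A[ \t]+/, "")
end.join
# Overwrite the broken file in place; keep a backup if that matters.
File.open(ARGV[1], "w") { |io| io.write(indented) }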
Save it, make it executable and call
./script_name.rb en.yml es.yml
I wouldn't bother integrating this into TextMate unless it is a regular task, but you could easily turn it into a command and either prompt for the two files via a dialog, or select both in the file browser, open one of them in the current tab and tell them apart via environment variables ($TM_FILEPATH, $TM_SELECTED_FILES).

Run scala script save in utf-8 gets error

I am new to Scala and tried some small programs from the book "Programming in Scala". When the Scala script is saved as ANSI, it works well. But when I save it as UTF-8, an error is thrown: "error: illegal character ?import". I run this small example program on Windows. The example program looks like this:
import scala.io.Source
if(args.isEmpty){
}else{
Source.fromFile(args(0)).getLines.toList.zipWithIndex.foreach { case (line, i) => println(i + " "+line)}
}
What's going on here?
I guess you saved your file with a BOM.
If you save your source code without a BOM (how to do that depends on which text editor you are using), it will work fine.
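If the editor offers no such option, the BOM can also be removed after the fact. Below is a minimal sketch in Python (not part of the original answer); the function name strip_utf8_bom and the example file name are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file was
    left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Keep everything after the three BOM bytes, byte for byte.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

For example, strip_utf8_bom("script.scala") would rewrite a hypothetical script.scala in place, so it is worth keeping a copy of the file the first time.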

Resources