How to create a file name with UTF-8 characters in Cygwin [closed] - bash

Using a shell script run under Cygwin, I want to create a file whose name contains Danish characters (Ø, Æ, and Å). I have a bash script that basically does this: echo "some data" > "file name with Danish letters.txt". After running the script, every Danish letter in the file name shows up as a dot. I have tested this with 32-bit Cygwin on Windows 7 and 64-bit Cygwin on Windows 10. The locale command prints the following:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
Running:
echo "rødgrød" | od -ctx1
produces this:
0000000 r 303 270 d g r 303 270 d \n
72 c3 b8 64 67 72 c3 b8 64 0a
0000012
Here is an example of what the bash script looks like:
echo "some data" > "Peter Sørensen.txt"
The letter ø looks like a dot when I look at the created file name in Windows.
In cygwin, running the ls command results in this message:
ls: cannot compare file names ‘Peter S\370rensen.txt’ and ‘test.sh’: Invalid or incomplete multibyte or wide character
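One way to read the ls error above: \370 is octal for the byte 0xF8, which is how ø is stored in ISO-8859-1 or Windows-1252, whereas the od output shows the terminal producing the UTF-8 pair c3 b8. That would be consistent with the script file itself having been saved in a single-byte encoding, although the question does not confirm this. A small Python sketch of the comparison (the function names are illustrative, not from the question):

```python
def encodings_of(text):
    """Return the UTF-8 and Latin-1 byte forms of a string."""
    return text.encode("utf-8"), text.encode("latin-1")


def is_valid_utf8(raw):
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_form, latin1_form = encodings_of("ø")
# Hex of both encodings, plus the octal value that ls printed as \370.
print(utf8_form.hex(), latin1_form.hex(), oct(latin1_form[0]))
# A name containing the lone byte 0xF8 is not valid UTF-8.
print(is_valid_utf8(b"Peter S\xf8rensen.txt"))
# The same name encoded as UTF-8 is valid.
print(is_valid_utf8("Peter Sørensen.txt".encode("utf-8")))
```

If the script does turn out to be stored in a single-byte encoding, re-saving it as UTF-8 in the editor would be the first thing to try.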

Related

Adding Line Breaks in QR Code / 2D Barcode

In Windows 10, I am using qrencode to write QR codes to image files. Encoding works, but I am stuck on adding line breaks. The method below does not produce any line breaks.
Windows Command Prompt:
d:\ qrencode -o qrcode.png "INDO GERMAN ALKALOIDS \nUnique ID: ABC-123456789 \nAPI-Name: ABCDEFGH \nBrand: Indo-101 \nAddress: Inga House, Mahakali Road, Andheri-East, Mumbai-400093, \nTel-022-28202932/33, \nMobile: 9833942075, \nBatch No.: XYZ888999000, \nBatch Size: 1020, \nMfgd.Date: 29-12-2022, \nExpiry Date: 31-12-2023, \nContainer Code: RRR-101020, \nMfgr Lic.No.: ------------, \nStorage Instruction: Store in cool area 20deg"
After playing around for some time, I came across the below command in the Ubuntu Manuals:
cat bigfile.txt | qrencode -S -v 40 -l L -o output.png
I placed the required content as below in a text file named qr-data.txt
INDO GERMAN ALKALOIDS
Unique ID: ABC-123456789
API-Name: ABCDEFGH
Brand: Indo-101
Then at the DOS prompt I typed:
type qr-data.txt | qrencode -o qr-code.png
It now works perfectly for me.
Note that I used [type] instead of [cat] at the Windows command prompt.
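The same idea (real newline characters fed to qrencode on standard input, rather than a literal \n typed inside a quoted argument) can be sketched in Python. This is only a sketch: it assumes qrencode is on the PATH and reads its data from standard input when no string argument is given, as in the piped command above. The helper names are made up for illustration.

```python
import shutil
import subprocess


def build_payload(fields):
    """Join label lines with real newline characters."""
    return "\n".join(fields)


def encode_qr(payload, out_path="qr-code.png"):
    """Pipe the payload to qrencode on stdin; return False if qrencode is missing."""
    if shutil.which("qrencode") is None:
        return False
    subprocess.run(
        ["qrencode", "-o", out_path],
        input=payload.encode("utf-8"),
        check=True,
    )
    return True


payload = build_payload([
    "INDO GERMAN ALKALOIDS",
    "Unique ID: ABC-123456789",
    "API-Name: ABCDEFGH",
    "Brand: Indo-101",
])
```

Calling encode_qr(payload) would then write the image without needing an intermediate text file.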

How to read a file in utf8 encoding and output in Windows 10?

What is the proper procedure for reading and outputting UTF-8 encoded data in Windows 10?
My attempt to read a UTF-8 encoded file in Windows 10 and output its lines to the terminal does not reproduce the symbols of some languages.
OS: Windows 10
Native codepage: 437
Switched codepage: 65001
In the cmd window I issued the command chcp 65001. The following Ruby code reads the UTF-8 encoded file and outputs its lines with puts.
fname = 'hello_world.dat'
File.open(fname,'r:UTF-8') do |f|
  puts f.read
end
hello_world.dat content
Afrikaans: Hello Wêreld!
Albanian: Përshendetje Botë!
Amharic: ሰላም ልዑል!
Arabic: مرحبا بالعالم!
Armenian: Բարեւ աշխարհ!
Basque: Kaixo Mundua!
Belarussian: Прывітанне Сусвет!
Bengali: ওহে বিশ্ব!
Bulgarian: Здравей свят!
Catalan: Hola món!
Chichewa: Moni Dziko Lapansi!
Chinese: 你好世界!
Croatian: Pozdrav svijete!
Czech: Ahoj světe!
Danish: Hej Verden!
Dutch: Hallo Wereld!
English: Hello World!
Estonian: Tere maailm!
Finnish: Hei maailma!
French: Bonjour monde!
Frisian: Hallo wrâld!
Georgian: გამარჯობა მსოფლიო!
German: Hallo Welt!
Greek: Γειά σου Κόσμε!
Hausa: Sannu Duniya!
Hebrew: שלום עולם!
Hindi: नमस्ते दुनिया!
Hungarian: Helló Világ!
Icelandic: Halló heimur!
Igbo: Ndewo Ụwa!
Indonesian: Halo Dunia!
Italian: Ciao mondo!
Japanese: こんにちは世界!
Kazakh: Сәлем Әлем!
Khmer: សួស្តី​ពិភពលោក!
Kyrgyz: Салам дүйнө!
Lao: ສະ​ບາຍ​ດີ​ຊາວ​ໂລກ!
Latvian: Sveika pasaule!
Lithuanian: Labas pasauli!
Luxemburgish: Moien Welt!
Macedonian: Здраво свету!
Malay: Hai dunia!
Malayalam: ഹലോ വേൾഡ്!
Mongolian: Сайн уу дэлхий!
Myanmar: မင်္ဂလာပါကမ္ဘာလောက!
Nepali: नमस्कार संसार!
Norwegian: Hei Verden!
Pashto: سلام نړی!
Persian: سلام دنیا!
Polish: Witaj świecie!
Portuguese: Olá Mundo!
Punjabi: ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ ਦੁਨਿਆ!
Romanian: Salut Lume!
Russian: Привет мир!
Scots Gaelic: Hàlo a Shaoghail!
Serbian: Здраво Свете!
Sesotho: Lefatše Lumela!
Sinhala: හෙලෝ වර්ල්ඩ්!
Slovenian: Pozdravljen svet!
Spanish: ¡Hola Mundo!
Sundanese: Halo Dunya!
Swahili: Salamu Dunia!
Swedish: Hej världen!
Tajik: Салом Ҷаҳон!
Thai: สวัสดีชาวโลก!
Turkish: Selam Dünya!
Ukrainian: Привіт Світ!
Uzbek: Salom Dunyo!
Vietnamese: Chào thế giới!
Welsh: Helo Byd!
Xhosa: Molo Lizwe!
Yiddish: העלא וועלט!
Yoruba: Mo ki O Ile Aiye!
Zulu: Sawubona Mhlaba!
Steven Penny suggested using PowerShell without changing the code page. The issue persists there as well.
Installing Windows Terminal (which is not part of the Windows distribution) solves the UTF-8 output issue.
The problem is that you are using some methods and tools that are really old. First:
Native codepage: 437
Switched codepage: 65001
You don't need to mess with the code page any more; just leave it as the default. Also, from your picture I see you are using Console Host, which is also really old. Windows Terminal [1] has been available since 2019 and has built-in UTF-8 support. Using Windows Terminal, I can run your script, even without specifying UTF-8:
fname = 'hello_world.dat'
File.open(fname,'r') do |f|
  puts f.read
end
and I get a perfect result.
To use Windows Terminal, download the msixbundle file [2], then install it. Or, as it's essentially just a Zip file, you can rename it to file.zip and extract it with Windows, then run WindowsTerminal.exe. Or, since you are really having trouble with this process, you can use a portable version I just created [3] (at your own risk).
[1] https://github.com/microsoft/terminal
[2] https://github.com/microsoft/terminal/releases/tag/v1.8.1444.0
[3] https://github.com/microsoft/terminal/files/6563899/CascadiaPackage_1.8.1444.0_x64.zip
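For comparison, here is a minimal Python sketch of the same read-and-print step, with the encoding stated explicitly on both the input and the output side. The file name is the one from the question and the function names are illustrative. Whether every glyph then renders still depends on the terminal and its font, which is the point made above about Windows Terminal.

```python
import sys


def read_utf8(path):
    """Read a whole text file, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes, bypassing the console's default encoding."""
    out = stream if stream is not None else sys.stdout.buffer
    out.write(text.encode("utf-8"))
    out.flush()
```

A call such as write_utf8(read_utf8('hello_world.dat')) mirrors the Ruby script.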

opening a file with an accented character in its name, in Python 2 on Windows

In a directory on Windows I have 2 files, each with an accented character in its name: t1û.fn and t2ű.fn. The dir command in the Command Prompt shows both correctly:
S:\p>dir t*.fn
Volume in drive S is q
Volume Serial Number is 05A0-8823
Directory of S:\p
2017-09-03 14:54 4 t1û.fn
2017-09-03 14:54 4 t2ű.fn
2 File(s) 8 bytes
0 Dir(s) 19,110,621,184 bytes free
However, Python can't see both files:
S:\p>python -c "import os; print [(fn, os.path.isfile(fn)) for fn in os.listdir('.') if fn.endswith('.fn')]"
[('t1\xfb.fn', True), ('t2u.fn', False)]
It looks like Python 2 uses a single-byte API for filenames, thus the accented character in t1û.fn is mapped to the single byte \xfb, and the accented character in t2ű.fn is mapped to the unaccented ASCII single byte u.
How is it possible to use a multi-byte API for filenames on Windows in Python 2? I want to open both files in the console version of Python 2 on Windows.
Use a unicode string:
f1 = open(u"t1\u00fb.fn") # t1û.fn
f2 = open(u"t2\u0171.fn") # t2ű.fn
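The difference between the two names can be illustrated with a short Python 3 sketch. It assumes the system's ANSI code page is Windows-1252, which the question does not state. Under that assumption û has a single-byte form (0xFB) while ű has none, which is presumably why the byte-string API fell back to a plain u for the second file. In Python 2, passing a unicode path such as os.listdir(u'.') returns unicode names as well.

```python
def ansi_bytes(name, codepage="cp1252"):
    """Return the name encoded in the given code page, or None if it cannot be represented."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        return None


# û (U+00FB) exists in cp1252, so the byte-string API can round-trip it.
print(ansi_bytes("t1\u00fb.fn"))
# ű (U+0171) does not exist in cp1252, so no exact byte form is available.
print(ansi_bytes("t2\u0171.fn"))
```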

os x screen command,'.screenrc', termcap

I need help in the conceptual area surrounding:
/usr/bin/screen,
~/.screenrc,
termcap
My Goal: to create a 'correctly' formatted log file via 'screen'.
Symptom: The log file contains hundreds of carriage-return bytes [i.e. (\015) or (\r) ]. I would like to replace every carriage-return byte with a linefeed byte [i.e. (\012) or (\n)].
My Approach: I have created the file: ~/.screenrc and added a 'termcap' line to it with the hope of intercepting the inbound bytes and translating the carriage-return bytes into linefeed bytes BEFORE they are written to the log file. I cycled through nine different syntactical forms of my request. None had the desired effect (see below for all nine forms).
My Questions:
Can my goal be accomplished with my approach?
If yes, what changes do I need to make to achieve my goal?
If no, what alternative should I implement?
Do I need to mix in the 'stty' command?
If yes, how?
Note: I can create a 'correctly' formatted file using the log file as input to 'tr':
$ /usr/bin/tr '\015' '\012' <screenlog.0 | head
<5 BAUD ADDRESS: FF>
<WAITING FOR 5 BAUD INIT>
<5 BAUD ADDRESS: 33>
<5 BAUD INIT: OK>
Rx: C233F1 01 00 # 254742 ms
Tx: 86F110 41 00 BE 1B 30 13 # 254753 ms
Tx: 86F118 41 00 88 18 00 10 # 254792 ms
Tx: 86F128 41 00 80 08 00 10 # 254831 ms
Rx: C133F0 3E # 255897 ms
Tx: 81F010 7E # 255903 ms
$
The 'screen' log file ( ~/screenlog.0 ) is created using the following command:
$ screen -L /dev/tty.usbserial-000014FA 115200
where:
$ ls -dl /dev/*usb*
crw-rw-rw- 1 root wheel 17, 25 Jul 21 19:50 /dev/cu.usbserial-000014FA
crw-rw-rw- 1 root wheel 17, 24 Jul 21 19:50 /dev/tty.usbserial-000014FA
$
$
$ ls -dl ~/.screenrc
-rw-r--r-- 1 scottsmith staff 684 Jul 22 12:28 /Users/scottsmith/.screenrc
$ cat ~/.screenrc
#termcap xterm* 'XC=B%,\015\012' # 01 no effect
#termcap xterm* 'XC=B%\E(B,\015\012' # 02 no effect
#termcap xterm* 'XC=B\E(%\E(B,\015\012' # 03 no effect
#terminfo xterm* 'XC=B%,\015\012' # 04 no effect
#terminfo xterm* 'XC=B%\E(B,\015\012' # 05 no effect
#terminfo xterm* 'XC=B\E(%\E(B,\015\012' # 06 no effect
#termcapinfo xterm* 'XC=B%,\015\012' # 07 no effect
#termcapinfo xterm* 'XC=B%\E(B,\015\012' # 08 no effect
termcapinfo xterm* 'XC=B\E(%\E(B,\015\012' # 09 no effect
$
$ echo $TERM
xterm-256color
$ echo $SCREENRC
$ ls -dl /usr/lib/terminfo/?/*
ls: /usr/lib/terminfo/?/*: No such file or directory
$ ls -dl /usr/lib/terminfo/*
ls: /usr/lib/terminfo/*: No such file or directory
$ ls -dl /etc/termcap
ls: /etc/termcap: No such file or directory
$ ls -dl /usr/local/etc/screenrc
ls: /usr/local/etc/screenrc: No such file or directory
$
System:
MacBook Pro (17-inch, Mid 2010)
Processor 2.53 GHz Intel Core i5
Memory 8 GB 1067 MHz DDR3
Graphics NVIDIA GeForce GT 330M 512 MB
OS X Yosemite Version 10.10.4
Screen(1) Mac OS X Manual Page: ( possible relevant content ):
CHARACTER TRANSLATION
Screen has a powerful mechanism to translate characters to arbitrary strings depending on the current font and terminal type. Use this feature if you want to work with a common standard character set (say ISO8859-latin1) even on terminals that scatter the more unusual characters over several national language font pages.
Syntax: XC=<charset-mapping>{,,<charset-mapping>}
<charset-mapping> := <designator><template>{,<mapping>}
<mapping> := <char-to-be-mapped><template-arg>
The things in braces may be repeated any number of times.
A <charset-mapping> tells screen how to map characters in font <designator> ('B': Ascii, 'A': UK, 'K': german, etc.) to strings. Every <mapping> describes to what string a single character will be translated. A template mechanism is used, as most of the time the codes have a lot in common (for example strings to switch to and from another charset). Each occurrence of '%' in <template> gets substituted with the <template-arg> specified together with the character. If your strings are not similar at all, then use '%' as a template and place the full string in <template-arg>. A quoting mechanism was added to make it possible to use a real '%'. The '\' character quotes the special characters '\', '%', and ','.
Here is an example:
termcap hp700 'XC=B\E(K%\E(B,\304[,\326\\,\334]'
This tells screen how to translate ISOlatin1 (charset 'B') upper case umlaut characters on a hp700 terminal that has a german charset. '\304' gets translated to '\E(K[\E(B' and so on. Note that this line gets parsed three times before the internal lookup table is built, therefore a lot of quoting is needed to create a single '\'.
Another extension was added to allow more emulation: If a mapping translates the unquoted '%' char, it will be sent to the terminal whenever screen switches to the corresponding <designator>. In this special case the template is assumed to be just '%' because the charset switch sequence and the character mappings normally haven't much in common.
This example shows one use of the extension:
termcap xterm 'XC=K%,%\E(B,[\304,\\\326,]\334'
Here, a part of the german ('K') charset is emulated on an xterm. If screen has to change to the 'K' charset, '\E(B' will be sent to the terminal, i.e. the ASCII charset is used instead. The template is just '%', so the mapping is straightforward: '[' to '\304', '\' to '\326', and ']' to '\334'.
The section on character translation is describing a feature which is unrelated to logging. It is telling screen how to use ISO-2022 control sequences to print special characters on the terminal. In the manual page's example
termcap xterm 'XC=K%,%\E(B,[\304,\\\\\326,]\334'
this tells screen to send escape(B (to pretend it is switching the terminal to character-set "K") when it has to print any of [, \ or ]. Offhand (referring to XTerm Control Sequences) the reasoning in the example seems obscure:
xterm handles character set "K" (German)
character set "B" is US-ASCII
assuming that character set "B" is actually rendered as ISO-8859-1, those three characters are Ä, Ö and Ü (which is a plausible use of German, to print some common umlauts).
Rather than being handled by this feature, screen's logging is expected to record the original characters sent to the terminal — before translation.
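Since the log records the original bytes, post-processing the file (as the tr command in the question already does) is the straightforward route. A minimal Python sketch of that same conversion, using hypothetical function names:

```python
def cr_to_lf(data):
    """Replace every carriage-return byte (\\015) with a linefeed byte (\\012)."""
    return data.replace(b"\r", b"\n")


def convert_log(src_path, dst_path):
    """Read a screen log in binary mode and write the converted copy."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(cr_to_lf(src.read()))
```

For example, convert_log('screenlog.0', 'screenlog.txt') would write a converted copy next to the original log. Like tr, this turns a CR LF pair into two linefeeds.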

Problem with Ant's AnsiColorLogger in Snow Leopard

I have Ant configured to use the AnsiColorLogger. In Mac OS 10.5, everything was fine. Since upgrading to Snow Leopard, the AnsiColorLogger no longer works. I see the Ant output (uncolorized) for a second, then it just disappears. Has anyone else gotten this working in Snow Leopard? Other ANSI colors are working fine in Terminal.app (colored ls output, colors in my prompt).
Also, would this be a better question on SuperUser?
UPDATE: I have sorted out the issue. It has to do with ANT giving escape sequences that while appropriate for a linux xterm, are NOT correctly interpreted by Mac OS X. It is possible to filter the ANT output to convert these sequences and restore colorized output.
The moral of the story is that this wrapper script will achieve colorized output:
# cat /workspace/SDK/bin/ant-wrapper.sh
/usr/bin/ant -logger org.apache.tools.ant.listener.AnsiColorLogger "$@" | perl -pe 's/(?<=\e\[)2;//g'
# alias ant='/workspace/SDK/bin/ant-wrapper.sh'
# ant publish
(output has lots of pretty colors; well, maybe not so pretty, more like an easter egg)
Original Post (and debugging steps):
I'm having similar issues with regard to AnsiColorLogger not displaying colors at all. I'm not sure what the author means by "[output appears] for a second then it just disappears". That seems like a strange problem to occur on the Terminal.
My Box:
# uname -a
Darwin Dave-Dopsons-MacBook-Pro.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
This is the ANT Logger we are using:
http://ant.apache.org/manual/listeners.html#AnsiColorLogger
Here's a related forum post (tried the advice given, to no avail): http://ant.1045680.n5.nabble.com/Macosx-and-AnsiColorLogger-td1355310.html
I did "ant | less", and I DO see escape sequences, but still no colors:
Buildfile: /workspace/Words/words_blackberry/build.xml
ESC[2;32m
publish:ESC[m
Still blocked on this, and would love advice if anyone has gotten it to work on OSX
GOT IT!
So here's the output of colorized ls:
# CLICOLOR_FORCE=exfxcxdxbxegedabagacad ls -lGF | less
total 112
-rw-r--r-- 1 ddopson admin 6511 May 29 12:41 build.xml
drwxr-xr-x 6 ddopson admin 204 May 28 23:59 ESC[34meclipse-binESC[mESC[m/
lrwxr-xr-x 1 ddopson admin 35 May 23 21:24 ESC[35mfilesESC[mESC[m# -> ../artwork/output/blackberry/files/
lrwxr-xr-x 1 ddopson admin 36 May 23 21:20 ESC[35mimagesESC[mESC[m# -> ../artwork/output/blackberry/images/
Notice how the escape sequences are subtly different; they don't have the '2;' like ANT did...
So to test this theory:
ant -logger org.apache.tools.ant.listener.AnsiColorLogger publish | sed 's/2;//g'
... and the output is COLORIZED! Victory!
I've taken ddopson's knowledge and crammed it into a single line:
ant () { command ant -logger org.apache.tools.ant.listener.AnsiColorLogger "$@" | sed 's/2;//g' ; }
This works by using a Bash Function. Place this in your ~/.profile file and it will do the same thing as ddopson's ant-wrapper.sh, but without needing a second file to make it work. Slightly more elegant and less fragile.
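Note that sed 's/2;//g' removes every "2;" in the output, including any that appear in ordinary text, whereas the perl version only removes it directly after the ESC[ introducer. A Python sketch of the narrower filter, with illustrative names:

```python
import re
import sys

# Match "2;" only when it directly follows the ESC[ sequence introducer.
_DIM_PREFIX = re.compile(r"(?<=\x1b\[)2;")


def strip_dim(text):
    """Drop the leading '2;' parameter from ANSI color sequences."""
    return _DIM_PREFIX.sub("", text)


def main():
    """Filter standard input line by line, e.g. as the right-hand side of a pipe from ant."""
    for line in sys.stdin:
        sys.stdout.write(strip_dim(line))
```

Saved as a script that calls main(), this could take the place of the sed or perl stage in either wrapper.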
