How do I get accented letters to actually work on bash? - bash

My bash installation on cygwin doesn't handle accented letters properly. I tried adding
set input-meta on # to accept 8-bit characters
set output-meta on # to show 8-bit characters
set convert-meta on # to show it as character, not the octal representation
to my inputrc, but this doesn't quite work yet. Indeed, if I type
$ echo ù
then before I press Enter it is automatically changed to
$ echo \303
although the output is right, for I get
$ echo \303
ù
I get the same result for any other accented letter. Usually, though, I use a non-Italian keyboard, and I use AutoHotkey to replace a letter followed by an apostrophe with the accented letter. When this is the case, accented letters get replaced with a \302 and print garbage depending on the letter: a 3y for a ù, a ¢ for an ò, and nothing for everything else.
How do I get all this to make sense?
EDIT: my locale settings, cygwin version and terminal are the following
$ uname -a
CYGWIN_NT-6.1-WOW64 ferdi-Asus 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin
$ locale
LANG=it_IT.UTF-8
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_ALL=
$ tty
/dev/pty1
I'm invoking it simply clicking the Cygwin terminal link. It redirects to
C:\cygwin\bin\mintty.exe -i /Cygwin-Terminal.ico -
The relevant part of the autohotkey script is the following
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
...
::avra'::avrà
::avro'::avrò
...

To get accented letters on bash via Cygwin using Mintty 1.1.2 just do the following:
Go to the menu (if you don't see any menu, right click on your Terminal).
Click Options....
Click Text.
Change the Locale to C.
Change the Character set to ISO-8859-1 (Western European).
Then test it:
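One rough way to check which bytes actually reach bash (a sketch, not the answerer's original test) is to print a string's bytes in octal, the same notation readline shows as \303. The helper name show_bytes is made up for illustration. Two bytes for ù mean the terminal is sending UTF-8; a single byte 371 would indicate ISO-8859-1.

```shell
# show_bytes is a hypothetical helper: print each byte of its argument in octal.
show_bytes() {
    printf '%s' "$1" | od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# U+00F9 (ù) written out as its two UTF-8 bytes so the example does not
# depend on how this file itself is encoded.
show_bytes "$(printf '\303\271')"
echo
```

At an interactive prompt you would call show_bytes 'ù' with the character typed directly, which shows what the keyboard and terminal really delivered.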

Related

Why '$p' appears at the first line of vim in iterm?

I'm using Mac OS 10.13.4.
iTerm2 Version is Build 2.1.1
Vim Version is
VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Nov 29 2017 18:37:46)
Included patches: 1-503, 505-680, 682-1283
Compiled by root@apple.com
And I Installed the Vundle plugin
And when I open vim in iTerm2 with oh-my-zsh, the $p appears at the first line.
It doesn't appear in Apple's Terminal.app.
How to solve this problem?
vim is assuming that iTerm2 is xterm (not a good assumption), and attempting to determine the state of the cursorBlink resource by sending an escape sequence containing $p which iTerm2 does not handle properly (see source-code for vim for the escape sequence, and also where it uses the feature). While vim starts with the TERM setting (i.e., "xterm"), it does make a few checks to exclude things like gnome-terminal as noted in the source code comment. But iTerm2 happens to fool vim in this case. So the result goes to the screen.
In XTerm Control Sequences that is documented like this:
CSI ? Ps$ p
Request DEC private mode (DECRQM). For VT300 and up, reply is
CSI ? Ps; Pm$ y
where Ps is the mode number as in DECSET/DECSET, Pm is the
mode value as in the ANSI DECRQM.
Two private modes are read-only (i.e., 13 and 14), provided
only for reporting their values using this control
sequence. They correspond to the resources cursorBlink and
cursorBlinkXOR.
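To make the request described above concrete, here is a small sketch that builds the "CSI ? Ps $ p" sequence and shows it in visible form instead of sending it to the terminal. The helper name decrqm_request is hypothetical, and mode 12 (xterm's blinking-cursor mode) is used on the assumption that this is the mode vim asks about.

```shell
# decrqm_request is a hypothetical helper: build "CSI ? Ps $ p" for mode Ps.
decrqm_request() {
    printf '\033[?%s$p' "$1"
}

# Show the request for mode 12 with the escape made visible (ESC appears as ^[)
# rather than writing the raw sequence to the terminal.
decrqm_request 12 | cat -v
echo
```

A terminal that does not understand the request may print the tail of it, which would explain a stray $p on screen.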
Although vim is fooled here, the problem is due to iTerm2, for which you've probably set the TERM environment variable to xterm-256color, or something like that. That setting doesn't match iTerm2's behavior very well (the function keys don't match up, etc.). ncurses provides a better entry. But out of the box, macOS has a terminal database that's about ten years old and lacks that entry. To get a good terminal database (i.e., one where you could set TERM to iterm2), you could install one with MacPorts or Homebrew.
Running infocmp to get a measure of the differences between the (correct) iterm2 entry and linux, xterm-256color shows that
it's actually closer to nsterm-256color (a correct entry for Terminal.app which Apple doesn't provide) with 38 lines of difference,
next closest linux with 76 lines and
still further away from xterm-256color with 94 lines.
The feature that's missing or mis-implemented in iTerm2 isn't in the terminal description: it's a special feature that vim has to guess about.
As mentioned above, this $p is due to vim sending a control sequence for a cursor blink request. In your .vimrc, you can turn off the cursor blink request by clearing the t_RC terminal option. I wrapped this in a conditional which turns it off only for 256 color terminals.
if &term =~ '256color'
set t_RC=
endif
Setting the terminal explicitly to linux solved it for me:
export TERM=linux
The p means print, which is the command the range is being given to, and yes.. it displays to you what is in that range.
In vim you can look at :help :range and :help :print to find out more about how this works. These types of ranges are also used by sed and other editors.
They probably used the 1,$ terminology in the tutorial to be explicit, but note that you can also use % as its equivalent. Thus, %p will also print all the lines in the file.

How do I get out of ESC mode in zsh?

When using zsh, I sometimes accidentally press Escape out of habit, expecting it to clear the entire line as it does in Windows. Instead, it goes into a mode that I'm not sure how to get out of. The cursor goes back one character, and some keys perform some special commands, but all I really want to do is get out of this mode and be able to press Ctrl+U to clear the line.
Searching around has been tough - I get results for escaping characters.
Short answer: press a.
Medium answer: press a, then enter bindkey -e.
Long answer: Like a lot of UNIX shells, zsh has an emacs-like mode and a vi-like mode. You're in vi-like mode, and ESC takes you out of the vi-like insert mode. a puts you back into insert mode, with the cursor after the current character. (Sorry for the two different uses of "mode," but it is the accepted terminology in both cases.)
bindkey -e overrides the settings from the rc files and puts zsh into emacs mode, which only has one mode (i.e., no "ESC mode"), so this won't bother you any more. Unfortunately, it won't carry over to your next shell invocation. bindkey -v would switch from emacs mode back to vi mode.
In the absence of any other configuration, zsh defaults to emacs mode, so unless there's something in one of the rc files, the likely culprit is that the EDITOR variable is some form of vi, which causes zsh to default to vi mode. If you don't like vi mode, then you should probably hunt down what part of the system-wide or user-specific configuration is causing zsh to default to vi mode and turn it off by removing it or overriding it in one of those rc files.
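As a sketch of the rule just described: per the zsh documentation, zsh starts in vi mode when VISUAL or EDITOR contains the string "vi". The helper guess_zsh_keymap below is hypothetical and only mimics that check so you can see which variable is responsible; the guarded bindkey -e line is what you would keep in ~/.zshrc to make emacs mode stick across shell invocations.

```shell
# guess_zsh_keymap is a hypothetical helper that mimics the rule described
# above: vi mode if VISUAL or EDITOR contains "vi", emacs mode otherwise.
# The space between the two values avoids a false match across the boundary.
guess_zsh_keymap() {
    case "$1 $2" in
        *vi*) echo viins ;;
        *)    echo emacs ;;
    esac
}

# Report what the current environment would select.
guess_zsh_keymap "${VISUAL:-}" "${EDITOR:-}"

# In ~/.zshrc: force emacs mode regardless of those variables.
# The guard keeps this snippet harmless if another shell sources it.
if [ -n "${ZSH_VERSION:-}" ]; then
    bindkey -e
fi
```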
If everything else fails, when you're in ESC mode, type :w then Enter to save, and then :q to exit. You can also type :wq and Enter

How can I clear / refresh my terminal after printing special characters?

Sometimes, I print data with special characters. After that my terminal looks like this:
As you can see, it is useless to clear the terminal.
Is there any way to get back to normal after special characters were printed except for closing the terminal and opening it again?
When you see such a mess on the screen, the reset command is your friend. It resets all special characters to their default values and re-initializes your terminal. Most probably you have this command on your system in the /usr/bin directory, as a link to tset.
If for any reasons reset is not present then you can run echo -e \\033c where \\033c is a special code, which should be read as ESC c.
You can even clean your terminal from other terminal with a little help of cat command. For example if your problematic terminal resides on /dev/pts/3 then run the following sequence:
$ cat >/dev/pts/3
ESC c, ENTER, Ctrl-D
and /dev/pts/3 should be cleaned up.
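The same idea can be wrapped in two small functions, as a sketch. The names reset_sequence and reset_other_tty are made up for illustration; writing to another terminal's device assumes you have write permission on it.

```shell
# reset_sequence is a hypothetical helper: emit ESC c, the "full reset" code.
reset_sequence() {
    printf '\033c'
}

# reset_other_tty is a hypothetical helper: send the reset to another terminal,
# e.g. reset_other_tty /dev/pts/3 (needs write permission on that device).
reset_other_tty() {
    reset_sequence > "$1"
}

# Show the two bytes in visible form (ESC appears as ^[) rather than
# resetting the terminal this example runs in.
reset_sequence | cat -v
echo
```

Using printf instead of echo -e avoids differences between shells in how echo treats backslash escapes.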

Zsh tab completion duplicating command name

I'm on OS X Mountain Lion, running the included ZSH shell (4.3.11) with Oh-My-ZSH installed over the top.
When using tab completion with commands such as homebrew, when ZSH lists the available commands, it is also duplicating the command. For example:
$ brew {tab}
will result in:
$ brew brew
[list of homebrew commands]
I'm unsure what is causing this error, as when I resize the terminal window, the first instance of the command name disappears.
If I hit backspace when the duplicates are displayed, I can only delete the second instance of the command, zsh won't let me backspace any further. Also, if I do remove the duplicate with backspace, zsh then acts as if there is no command typed at all.
My .zshrc along with all my other .configuration files can be found at https://github.com/daviesjamie/dotfiles
UPDATE: I found this post about someone having the same problem on Ubuntu. However, I don't understand the given solution, and I'm not even sure if it applies to my set up?
This effect also could be reproduced if you use any of fancy UTF-8 characters like arrow, "git branch" character and so on.
Just remove these characters from the prompt and the duplication will not occur.
Also adding
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
to ~/.profile can help
The problem is likely to arise from misplaced %{ %} brackets that tell zsh that text inside has zero width. The only things that should be enclosed in them are escape sequences that change color or boldness of the text. If you are using new zsh (>=4.3.{unknown version}) I would even suggest to use %F{color}...%f, %K{color}...%k, %B...%b instead of %{${fg[green]}%} or what you have there.
The problem with them is that there is no way to query the terminal with a question like “Hey, I outputted some text. Where is the cursor now?”, so zsh has to compute the length of its prompt by itself. When you type some text and ask zsh to complete it, zsh tells the terminal to move the cursor to a specific location and prints the completed command line there. With misplaced %{ %} brackets this location is wrong.
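For illustration, a prompt written with the %F/%f and %B/%b escapes suggested above needs no %{ %} brackets at all, so there is nothing for zsh to miscount. This is only a sketch of a ~/.zshrc fragment; the particular layout (user, host, directory) is an arbitrary choice, not taken from the asker's configuration.

```shell
# ~/.zshrc fragment (zsh prompt syntax): colour and bold without %{ %}.
# %F{green}...%f sets and resets the foreground colour, %B...%b toggles bold,
# %n is the user name, %m the host, %~ the current directory.
PROMPT='%F{green}%n@%m%f %B%~%b %# '
```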
If you use iTerm on Mac, be sure to check "Set locale variables automatically" in your profile preferences. I had it unchecked for an SSH connection and it resulted in the same bug and I fixed it by leaving that option checked.
It's an old thread but I faced similar issue in my zsh setup with oh-my-zsh configuration.
Setting export LC_ALL=en_US.UTF-8 fixed the issue.
A lot of answers in a lot of places suggest the export LC_ALL=en_US.UTF-8 solution. This, however did not work for me. I continued to have this issue using oh-my-zsh on both Arch linux and PopOS.
The only solution that worked for me was this suggestion by romkatv on an issue on the oh-my-zsh github repository.
It turns out, at least in my case, that the autocomplete duplication issue would only show up if there was a non-ASCII character somewhere on the line (like an emoji). And ZSH would incorrectly assume that this non-ASCII character needs to take up 2 character spaces instead of 1.
So the solution that worked was to open up the .zsh-theme file of whatever theme you're using, find all non-ASCII characters and use %{%G%} to tell ZSH to only use one character width for that character
For example, the default oh-my-zsh theme robbyrussell contains 2 non-ASCII characters. The '➜' character in the prompt
PROMPT="%(?:%{$fg_bold[green]%}➜ :%{$fg_bold[red]%}➜ )"
and the '✗' character in the prompt for git directories
ZSH_THEME_GIT_PROMPT_DIRTY="%{$fg[blue]%}) %{$fg[yellow]%}✗"
Using %{%G<character>%} around the 2 non-ASCII characters like this
PROMPT="%(?:%{$fg_bold[green]%}%{%G➜%} :%{$fg_bold[red]%}%{%G➜%} )"
and this
ZSH_THEME_GIT_PROMPT_DIRTY="%{$fg[blue]%}) %{$fg[yellow]%}%{%G✗%}"
is what finally fixed the issue for me.
So all you need to do is make a copy of the theme file you want to use and edit all the non-ASCII characters as shown above and you should hopefully never see the duplication issue again.
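To locate the characters that need wrapping, something like the following sketch may help. The helper find_non_ascii is hypothetical; it lists every line of a file that contains bytes outside printable ASCII and whitespace. The demo runs on a throwaway file standing in for a theme, so no real theme path is assumed.

```shell
# find_non_ascii is a hypothetical helper: list lines of a file that contain
# bytes outside printable ASCII and whitespace, with their line numbers.
# LC_ALL=C makes grep work byte by byte; -a stops it treating the file as binary.
find_non_ascii() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}

# Demo on a temporary file; the arrow is written as its UTF-8 bytes
# (\342\236\234) so the example does not depend on this file's encoding.
theme=$(mktemp)
printf 'PROMPT="\342\236\234 "\nRPROMPT="plain"\n' > "$theme"
find_non_ascii "$theme" | cut -d: -f1    # prints the matching line numbers
rm -f "$theme"
```

Run it against your copy of the theme file, then wrap each reported character in %{%G...%} as shown above.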
My solution to make both local and ssh work is something like a combination of @Marc's and @neotohin's answers:
Set export LANG=en_US.UTF-8 (simply uncomment that part in the template .zshrc; exporting LC_ALL, as in @neotohin's answer, instead of LANG may also work, I didn't try)
Uncheck "Set locale environment variables on startup" in the Terminal profile's "Advanced" section (reason: that setting sets LC_CTYPE=UTF-8 instead of en_US.UTF-8, which breaks the locale for me in ssh)

Unicode (utf-8) with git-bash

I'm having some trouble getting Unicode to work in git-bash (on Windows 7). I have tried many things without success. I'm not quite sure what is responsible for this, though, so I might be working in the wrong direction.
It really seems this should be possible as the encoding for cmd.exe can be changed to unicode with 'chcp 65001'.
Here are some things I've tried (besides the obvious of looking through the configuration options in the GUI).
Setting environment variables in '.bashrc'. I guess it makes sense that this doesn't work, since I think it's a Linux thing. The 'locale' command does not exist.
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
Starting out in cmd.exe, changing the encoding to Unicode with 'chcp 65001' and then starting up git-bash. This causes me to get a permission denied error when trying to cat my Unicode test file. However, catting a file without Unicode works just fine. As demonstrated, after dropping back out to cmd.exe I can still "cat" the file. Using my default encoding (437) I can cat the file in bash (no permission denied, but the output is fudged).
S:\>chcp 65001
Active code page: 65001
S:\>"C:\Program Files (x86)\Git\bin\sh.exe" --login -i
zarac@TOWELIE /z
cat /s/unicode.txt
cat: write error: Permission denied
zarac@TOWELIE /z
cat /s/nounicode.txt
abc
zarac@TOWELIE /z
L /s/unicode.txt
-rw-r--r-- 1 zarac Administ 7 May 18 10:30 /s/unicode.txt
zarac@TOWELIE /z
whoami
towelie\zarac
zarac@TOWELIE /z
exit
Z:\>type S:\unicode.txt
abc£
Using the /U flag when starting the shell (makes sense that it doesn't work because it's not quite what it's for if-i-understand-correctly, but it has to do with unicode so i tried it).
C:\Windows\SysWOW64\cmd.exe /U /C "C:\Program Files (x86)\Git\bin\sh.exe" --login -i
As I prefer to use Console2, I've tried adding a dword value named CodePage with the value 65001 (decimal) to the Windows registry under [HKEY_CURRENT_USER\Console] as well as [HKEY_CURRENT_USER\Console\Git Bash]. This seems to have the same effect as setting 'chcp 65001' except that it's "automatic". (http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters)
JPSoft's TCC/LE
PowerCMD
stackoverflow
duckduckgo
ixquick / google
So, method 2 seems viable if that permission issue can be fixed. However, I'm open to pretty much any solution, although I'd prefer one that lets me use Console2 (mostly because of its nifty tab feature). Perhaps one solution would be to set up an SSH server and then use Putty/Kitty to connect to it, but that's just wrong! ; )
PS. Is there any official documentation for git-bash?
I faced the same issue in MSYS Git 2.8.0 and as it turned out it just needed changing the configuration.
$ git --version
git version 2.8.0.windows.1
The default configuration of Git Bash console in my system did not show Greek filenames.
$ cd ~
$ ls
AppData/
'Application Data'@
Contacts/
Cookies@
Desktop/
Documents/
Downloads/
Favorites/
Links/
'Local Settings'@
NTUSER.DAT
.
.
.
''$'\316\244\316\261'' '$'\316\255\316\263\316\263\317\201\316\261\317\206\316\254'' '$'\316\274\316\277\317\205'@
The last line should display "Τα έγγραφά μου", the Greek translation of "My Documents". In order to fix it I followed the steps below:
Check your existing locale configuration
$ locale
LANG=en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
As shown above, in my case it was not UTF-8
Change the locale to a UTF-8 encoding. Click the icon on the left side of the MINGW title bar, select "Options" and in the "Text" category choose the "UTF-8" character set. You should also choose a Unicode font, such as the default "Lucida Console". My configuration looks as follows:
Change the language for the current window (no need to do this on future windows, as they will be created with the settings of step 2)
$ LANG='C.UTF-8'
The ls command should now display properly
AppData/
'Application Data'@
Contacts/
Cookies@
Desktop/
Documents/
Downloads/
Favorites/
Links/
'Local Settings'@
NTUSER.DAT
.
.
.
'Τα έγγραφά μου'@
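The check in step 1 can also be scripted, for example in ~/.bashrc. The following is a minimal sketch; the helper name is_utf8_locale_name is hypothetical, and it only inspects the locale name rather than asking the C library, so treat it as a rough heuristic.

```shell
# is_utf8_locale_name is a hypothetical helper: succeed if a locale name
# such as "C.UTF-8" or "it_IT.utf8" names a UTF-8 character set.
is_utf8_locale_name() {
    case "$1" in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# The effective character-type locale: LC_ALL wins over LC_CTYPE,
# which wins over LANG; "C" is assumed when none of them is set.
current="${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"

if is_utf8_locale_name "$current"; then
    echo "locale $current already uses UTF-8"
else
    echo "locale $current is not UTF-8; consider: export LANG=C.UTF-8"
fi
```

The script only reports; replacing the second echo with the export itself would apply the fix from step 3 automatically.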
Found this answer elsewhere:
chcp.com 65001
Git bash chcp windows7 encoding issue
That's what actually solved it for me.
As CharlesB said in a comment, msysgit 1.7.10 handles unicode correctly. There are still a few issues but I can confirm that updating did solve the issue I was having.
See: https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support
Check if the issue persists with Git 2.1 (August 2014).
See commit 617ce96 or commit 1c950a5 by Karsten Blees (kblees)
Win32: support Unicode console output
WriteConsoleW seems to be the only way to reliably print unicode to the
console (without weird code page conversions).
Also redirects vfprintf to the winansi.c version.
Win32: add Unicode conversion functions
Add Unicode conversion functions to convert between Windows native UTF-16LE encoding to UTF-8 and back.
To support repositories with legacy-encoded file names, the UTF-8 to UTF-16 conversion function tries to create valid, unique file names even for invalid UTF-8 byte sequences, so that these repositories can be checked out without error.
It is likely to be a port of something already integrated in msysgit, but at least that means the Windows version of Git won't have to diverge/patch from the main Git repo source code in order to include those improvements.
I can see that there are some problems with character encoding in Git Bash for Windows, though less so when working with git itself and the tools it ships with (curl, cat, grep etc.). Over the years I haven't run into encoding-related problems with those.
Normally each new version resolves more of these problems. E.g. with the version from a year ago, I couldn't enter characters like "ä" into the shell, so it was not possible to write
echo "ä"
to quickly test whether UTF-8 is supported and at what level. A workaround is to write the byte sequence in octal:
$ echo -e "\0303\0244"
ä
I still have issues when I execute my Windows php.exe binary to output text:
$ php -r 'echo "\xC3\xA4";'
ä
This does not give the "ä" in the terminal; it outputs "├ñ" instead. My workaround is to wrap the php command in a bash script that pipes its output through cat:
#!/bin/bash
{ php.exe "$@" 2>&1 1>&3 | cat 1>&2; } 3>&1 | cat
ref. reg. stdout + stderr cat
This magically makes php work again:
$ php -r 'echo "\xC3\xA4";'
ä
Applies to
$ git --version
git version 1.9.4.msysgit.1
I must admit I lack a deeper understanding of why this all is the way it is. But I'm happy that I finally found a workaround to use php in git bash with UTF-8 support.
For me the solution was just to enable unicode support.
Docs: https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support
git config --global core.quotepath off
I found the following steps helpful:
Run Git Bash
Right-click and select Options...
Select Text group at the left
Change Font to Consolas
Select C as Locale and UTF-8 as Character set
Apply and Save.
In the terminal execute:
git config --global core.quotepath false
In rare cases, execute in the terminal as well:
export LANG='C.UTF-8'
The problem with chcp 65001 is that there are bugs in the C runtime (MSVCRT) that make stdio calls return inconsistent results when run under code page 65001.
That should be better with Git 2.23 (Q3 2019)
See commit 090d1e8 (03 Jul 2019) by Karsten Blees (kblees).
(Merged by Junio C Hamano -- gitster -- in commit 0328db0, 11 Jul 2019)
gettext: always use UTF-8 on native Windows
On native Windows, Git exclusively uses UTF-8 for console output (both with MinTTY and native Win32 Console).
Gettext uses setlocale() to determine the output encoding for translated text, however, MSVCRT's setlocale() does not support UTF-8.
As a result, translated text is encoded in system encoding (as per GetACP()), and non-ASCII chars are mangled in console output.
Side note: There is actually a code page for UTF-8: 65001.
In practice, it does not work as expected at least on Windows 7, though, so we cannot use it in Git. Besides, if we overrode the code page, any process spawned from Git would inherit that code page (as opposed to the code page configured for the current user), which would quite possibly break e.g. diff or merge helpers.
So we really cannot override the code page.
In init_gettext_charset(), Git calls gettext's bind_textdomain_codeset() with the character set obtained via locale_charset(); Let's override that latter function to force the encoding to UTF-8 on native Windows.
In Git for Windows' SDK, there is a libcharset.h and therefore we define HAVE_LIBCHARSET_H in the MINGW-specific section in config.mak.uname, therefore we need to add the override before that conditionally-compiled code block.
Rather than simply defining locale_charset() to return the string "UTF-8", though, we are careful not to break LC_ALL=C: the ab/no-kwset patch series, for example, needs to have a way to prevent Git from expecting UTF-8-encoded input.
And:
See commit 697bdd2 (04 Jul 2019), and commit 9423885, commit 39a98e9 (27 Jun 2019) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 0a2ff7c, 11 Jul 2019)
mingw: use Unicode functions explicitly
Many Win32 API functions actually exist in two variants: one with the A suffix that takes ANSI parameters (char * or const char *) and one with the W suffix that takes Unicode parameters (wchar_t * or const wchar_t *).
The ANSI variant assumes that the strings are encoded according to whatever is the current locale.
This is not what Git wants to use on Windows: we assume that char * variables point to strings encoded in UTF-8.
There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale.
Further, it is actually highly encouraged to use the Unicode versions
instead of the ANSI versions, so let's do precisely that.
Note: when calling the Win32 API functions without any suffix, it depends whether the UNICODE constant is defined before the relevant headers are #include'd.
Without that constant, the ANSI variants are used.
Let's be explicit and avoid that ambiguity.
