Unicode (utf-8) with git-bash - windows

I'm having some trouble getting Unicode to work in git-bash (on Windows 7). I have tried many things without success, although I'm not sure which component is responsible for this, so I might be working in the wrong direction.
It really seems this should be possible as the encoding for cmd.exe can be changed to unicode with 'chcp 65001'.
Here are some things I've tried (besides the obvious of looking through the configuration options in the GUI).
Setting environment variables in '.bashrc'. I guess it makes sense that this doesn't work, since I think it's a Linux thing. The 'locale' command does not exist.
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
Starting out in cmd.exe, changing the encoding to Unicode with 'chcp 65001', and then starting git-bash. This gives me a "Permission denied" error when I try to cat my Unicode test file, although catting a file without Unicode characters works just fine. As shown at the end of the transcript, after dropping back out to cmd.exe I can still display the file with 'type'. With my default code page (437) I can cat the file in bash (no permission error, but the output is garbled).
S:\>chcp 65001
Active code page: 65001
S:\>"C:\Program Files (x86)\Git\bin\sh.exe" --login -i
zarac@TOWELIE /z
$ cat /s/unicode.txt
cat: write error: Permission denied
zarac@TOWELIE /z
$ cat /s/nounicode.txt
abc
zarac@TOWELIE /z
$ ls -l /s/unicode.txt
-rw-r--r-- 1 zarac Administ 7 May 18 10:30 /s/unicode.txt
zarac@TOWELIE /z
$ whoami
towelie\zarac
zarac@TOWELIE /z
$ exit
Z:\>type S:\unicode.txt
abc£
Using the /U flag when starting the shell (it makes sense that this doesn't work, since as far as I understand that's not quite what the flag is for, but it has to do with Unicode so I tried it).
C:\Windows\SysWOW64\cmd.exe /U /C "C:\Program Files (x86)\Git\bin\sh.exe" --login -i
As I prefer to use Console2, I've tried adding a DWORD value named CodePage with the value 65001 (decimal) to the Windows registry under [HKEY_CURRENT_USER\Console] as well as [HKEY_CURRENT_USER\Console\Git Bash]. This seems to have the same effect as running 'chcp 65001', except that it's "automatic". (http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters)
JPSoft's TCC/LE
PowerCMD
stackoverflow
duckduckgo
ixquick / google
So, method 2 seems viable if that permission issue can be fixed. However, I'm open to pretty much any solution, although I'd prefer one that lets me use Console2 (mostly due to its nifty tab feature). Perhaps one solution would be to set up an SSH server and then use Putty/Kitty to connect to it, but that's just wrong! ; )
PS. Is there any official documentation for git-bash?

I faced the same issue in MSYS Git 2.8.0 and as it turned out it just needed changing the configuration.
$ git --version
git version 2.8.0.windows.1
The default configuration of Git Bash console in my system did not show Greek filenames.
$ cd ~
$ ls
AppData/
'Application Data'@
Contacts/
Cookies@
Desktop/
Documents/
Downloads/
Favorites/
Links/
'Local Settings'@
NTUSER.DAT
.
.
.
''$'\316\244\316\261'' '$'\316\255\316\263\316\263\317\201\316\261\317\206\316\254'' '$'\316\274\316\277\317\205'@
The last line should display "Τα έγγραφά μου", the greek translation of "My Documents". In order to fix it I followed the below steps:
Check your existing locale configuration
$ locale
LANG=en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
As shown above, in my case it was not UTF-8
Change the locale to a UTF-8 encoding. Click the icon on the left side of the MINGW title bar, select "Options", and in the "Text" category choose "UTF-8" as the character set. You should also choose a Unicode font, such as the default "Lucida Console".
Change the language for the current window (no need to do this on future windows, as they will be created with the settings of step 2)
$ LANG='C.UTF-8'
The ls command should now display properly
AppData/
'Application Data'@
Contacts/
Cookies@
Desktop/
Documents/
Downloads/
Favorites/
Links/
'Local Settings'@
NTUSER.DAT
.
.
.
'Τα έγγραφά μου'@
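The locale check from the first step can also be scripted. What follows is only a sketch: the function names locale_codeset and non_utf8_categories are made up for illustration and are not part of Git Bash. It reads locale-style NAME=value lines and lists the categories whose value is set but does not name a UTF-8 codeset.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not part of Git Bash).

# Print the codeset part of a locale name, e.g. "UTF-8" for "en_US.UTF-8".
# Prints an empty line when the name has no codeset (e.g. "C" or "en").
locale_codeset() {
  case "$1" in
    *.*)
      _cs=${1#*.}                  # drop everything up to the first dot
      printf '%s\n' "${_cs%%@*}"   # drop an optional @modifier
      ;;
    *)
      printf '\n'
      ;;
  esac
}

# Read `locale`-style lines (NAME=value or NAME="value") on stdin and
# print the name of every category that is set to a non-UTF-8 value.
non_utf8_categories() {
  while IFS='=' read -r _name _value; do
    _value=${_value#\"}            # strip surrounding double quotes
    _value=${_value%\"}
    [ -n "$_value" ] || continue   # empty values are skipped
    case "$(locale_codeset "$_value" | tr '[:lower:]' '[:upper:]')" in
      UTF-8|UTF8) ;;
      *) printf '%s\n' "$_name" ;;
    esac
  done
}
```

Running locale | non_utf8_categories in the window from the first step should list LANG and the LC_* categories that still use "C"; after switching to a UTF-8 locale it should print nothing.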

Found this answer elsewhere:
chcp.com 65001
Git bash chcp windows7 encoding issue
That's what actually solved it for me.

As CharlesB said in a comment, msysgit 1.7.10 handles unicode correctly. There are still a few issues but I can confirm that updating did solve the issue I was having.
See: https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support

Check if the issue persists with Git 2.1 (August 2014).
See commit 617ce96 or commit 1c950a5 by Karsten Blees (kblees)
Win32: support Unicode console output
WriteConsoleW seems to be the only way to reliably print unicode to the
console (without weird code page conversions).
Also redirects vfprintf to the winansi.c version.
Win32: add Unicode conversion functions
Add Unicode conversion functions to convert between Windows native UTF-16LE encoding to UTF-8 and back.
To support repositories with legacy-encoded file names, the UTF-8 to UTF-16 conversion function tries to create valid, unique file names even for invalid UTF-8 byte sequences, so that these repositories can be checked out without error.
It is likely to be a port of something already integrated in msysgit, but at least that means the Windows version of Git won't have to diverge/patch from the main Git repo source code in order to include those improvements.

I can see that there are some character encoding problems with git bash for Windows, though less so when working with git itself and the tools it ships with (curl, cat, grep etc.). Over the years I haven't run into encoding-related problems with those.
Normally with each new version problems get better resolved. E.g. with the version from a year ago, I couldn't enter characters like "ä" into the shell, so it was not possible to write
echo "ä"
to quickly test whether, and how well, UTF-8 is supported. A workaround is to write the byte sequence in octal:
$ echo -e "\0303\0244"
ä
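A variation on the same workaround, as a sketch: printf expands \NNN octal escapes in its format string and tends to behave more consistently across shells than echo -e. The variable name a_umlaut is just an illustration.

```shell
#!/bin/sh
# printf expands \NNN octal escapes in its format string, so this yields
# the two bytes 0xC3 0xA4, which are the UTF-8 encoding of "ä".
a_umlaut=$(printf '\303\244')

# Dump the raw bytes in octal to confirm what was produced.
printf '%s' "$a_umlaut" | od -An -to1

# Print the character itself; whether it renders correctly depends on
# the terminal's character set and font.
printf '%s\n' "$a_umlaut"
```

If the octal dump shows 303 244 but the last line renders as something other than "ä", the bytes are fine and the terminal is interpreting them with a different code page.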
I still have issues when I run my Windows php.exe binary to output text:
$ php -r 'echo "\xC3\xA4";'
├ñ
This does not give the "ä" in the terminal; it outputs "├ñ" instead. My workaround is to wrap the php command in a bash script that pipes the output through cat:
#!/bin/bash
{ php.exe "$@" 2>&1 1>&3 | cat 1>&2; } 3>&1 | cat
ref. reg. stdout + stderr cat
This magically makes php work again:
$ php -r 'echo "\xC3\xA4";'
ä
Applies to
$ git --version
git version 1.9.4.msysgit.1
I must admit I lack a deeper understanding of why this all behaves the way it does. But I'm happy that I finally found a workaround to use php in git bash with UTF-8 support.
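The redirections in that wrapper are easier to follow with a stand-in command. The sketch below uses two made-up names, pipe_both and emit; emit only imitates a program that writes to both streams and is not php. The point is that stdout and stderr each pass through their own cat while staying separate.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and pass its stdout and stderr
# through separate `cat` processes without merging the two streams.
pipe_both() {
  # fd 3 is a copy of the outer pipe (the final `cat`).
  # Inside the braces the command's stderr goes into the inner pipe,
  # its stdout goes to fd 3, and the inner `cat` writes back to stderr.
  { "$@" 2>&1 1>&3 | cat 1>&2; } 3>&1 | cat
}

# Stand-in for a program that writes to both streams.
emit() {
  printf 'out\n'
  printf 'err\n' >&2
}
```

Calling pipe_both emit 2>/dev/null should leave only "out", and pipe_both emit >/dev/null should leave only "err", which shows the streams are still independent after the round trip through cat.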

For me the solution was just to enable unicode support.
Docs: https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support
git config --global core.quotepath off
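For context, with core.quotepath left at its default, Git prints the non-ASCII bytes of a path as \NNN octal escapes inside double quotes. If you are stuck with such output, the escapes can be expanded again. This is a rough sketch with a made-up function name (unquote_octal); it only handles octal escapes and expects the surrounding double quotes to have been removed already.

```shell
#!/bin/sh
# Hypothetical helper: expand the \NNN octal escapes that Git prints
# for non-ASCII path bytes when core.quotepath is enabled.
unquote_octal() {
  # Double every % so the text is safe to use as a printf format,
  # then let printf expand the octal escapes.
  _fmt=$(printf '%s' "$1" | sed 's/%/%%/g')
  printf "$_fmt"'\n'
}

# Example: the first word of the Greek folder name shown earlier.
unquote_octal '\316\244\316\261'
```

Setting core.quotepath to off (or false) avoids the need for this, since Git then prints the bytes as they are.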

I found the following steps helpful:
Run Git Bash
Right-click and select Options...
Select Text group at the left
Change Font to Consolas
Select C as Locale and UTF-8 as Character set
Apply and Save.
In the terminal execute:
git config --global core.quotepath false
In rare cases, execute in the terminal as well:
export LANG='C.UTF-8'

The problem with chcp 65001 is that there are bugs in the C runtime (MSVCRT) that make stdio calls return inconsistent results when run under code page 65001.
That should be better with Git 2.23 (Q3 2019)
See commit 090d1e8 (03 Jul 2019) by Karsten Blees (kblees).
(Merged by Junio C Hamano -- gitster -- in commit 0328db0, 11 Jul 2019)
gettext: always use UTF-8 on native Windows
On native Windows, Git exclusively uses UTF-8 for console output (both with MinTTY and native Win32 Console).
Gettext uses setlocale() to determine the output encoding for translated text, however, MSVCRT's setlocale() does not support UTF-8.
As a result, translated text is encoded in system encoding (as per GetAPC()), and non-ASCII chars are mangled in console output.
Side note: There is actually a code page for UTF-8: 65001.
In practice, it does not work as expected at least on Windows 7, though, so we cannot use it in Git. Besides, if we overrode the code page, any process spawned from Git would inherit that code page (as opposed to the code page configured for the current user), which would quite possibly break e.g. diff or merge helpers.
So we really cannot override the code page.
In init_gettext_charset(), Git calls gettext's bind_textdomain_codeset() with the character set obtained via locale_charset(); Let's override that latter function to force the encoding to UTF-8 on native Windows.
In Git for Windows' SDK, there is a libcharset.h and therefore we define HAVE_LIBCHARSET_H in the MINGW-specific section in config.mak.uname, therefore we need to add the override before that conditionally-compiled code block.
Rather than simply defining locale_charset() to return the string "UTF-8", though, we are careful not to break LC_ALL=C: the ab/no-kwset patch series, for example, needs to have a way to prevent Git from expecting UTF-8-encoded input.
And:
See commit 697bdd2 (04 Jul 2019), and commit 9423885, commit 39a98e9 (27 Jun 2019) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 0a2ff7c, 11 Jul 2019)
mingw: use Unicode functions explicitly
Many Win32 API functions actually exist in two variants: one with the A suffix that takes ANSI parameters (char * or const char *) and one with the W suffix that takes Unicode parameters (wchar_t * or const wchar_t *).
The ANSI variant assumes that the strings are encoded according to whatever is the current locale.
This is not what Git wants to use on Windows: we assume that char * variables point to strings encoded in UTF-8.
There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale.
Further, it is actually highly encouraged to use the Unicode versions
instead of the ANSI versions, so let's do precisely that.
Note: when calling the Win32 API functions without any suffix, it depends whether the UNICODE constant is defined before the relevant headers are #include'd.
Without that constant, the ANSI variants are used.
Let's be explicit and avoid that ambiguity.

Related

Why '$p' appears at the first line of vim in iterm?

I'm using Mac OS 10.13.4.
iTerm2 Version is Build 2.1.1
Vim Version is
VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Nov 29 2017 18:37:46)
Included patches: 1-503, 505-680, 682-1283
Compiled by root@apple.com
I have also installed the Vundle plugin.
When I open vim in iTerm2 with oh-my-zsh, $p appears on the first line.
It doesn't appear in Apple's Terminal.app.
How can I solve this problem?
vim is assuming that iTerm2 is xterm (not a good assumption), and attempting to determine the state of the cursorBlink resource by sending an escape sequence containing $p which iTerm2 does not handle properly (see source-code for vim for the escape sequence, and also where it uses the feature). While vim starts with the TERM setting (i.e., "xterm"), it does make a few checks to exclude things like gnome-terminal as noted in the source code comment. But iTerm2 happens to fool vim in this case. So the result goes to the screen.
In XTerm Control Sequences that is documented like this:
CSI ? Ps$ p
Request DEC private mode (DECRQM). For VT300 and up, reply is
CSI ? Ps; Pm$ y
where Ps is the mode number as in DECSET/DECSET, Pm is the
mode value as in the ANSI DECRQM.
Two private modes are read-only (i.e., 1 3 and 1 4 ), provided
only for reporting their values using this control sequence.
They correspond to the resources cursorBlink and cursorBlinkXOR.
Although vim is fooled here, the problem is due to iTerm2, for which you've probably set the TERM environment variable to xterm-256color, or something like that. It doesn't match iTerm2's behavior very well (the function keys don't match up, etc). ncurses provides a better one. But out-of-the-box macOS has a terminal database that's about ten years old, and lacks that entry. To get a good terminal database (i.e., one where you could set TERM to iterm2), you could install one with MacPorts or Homebrew.
Running infocmp to get a measure of the differences between the (correct) iterm2 entry and linux, xterm-256color shows that
it's actually closer to nsterm-256color (a correct entry for Terminal.app which Apple doesn't provide) with 38 lines of difference,
next closest linux with 76 lines and
still further away from xterm-256color with 94 lines.
The feature that's missing or mis-implemented in iTerm2 isn't in the terminal description: it's a special feature that vim has to guess about.
As mentioned above, this $p is due to vim sending a control sequence for a cursor blink request. In your .vimrc, you can turn off the cursor blink request by clearing the t_RC terminal option. I wrapped this in a conditional which turns it off only for 256 color terminals.
if &term =~ '256color'
set t_RC=
endif
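To see why a literal $p ends up on screen, it can help to print the request in visible form instead of sending it to the terminal. The sketch below assumes the request has the DECRQM shape quoted earlier (CSI ? Ps $ p) and uses private mode 12 as the example; the mode number and the function name decrqm_request are assumptions for illustration, so check your vim's t_RC value if it matters.

```shell
#!/bin/sh
# Build a DECRQM request for a DEC private mode: ESC [ ? <mode> $ p
# (the mode number passed in is an assumption for illustration).
decrqm_request() {
  printf '\033[?%s$p' "$1"
}

# Show the request in visible form rather than sending it raw;
# cat -v renders the ESC byte as ^[.
decrqm_request 12 | cat -v
echo
```

A terminal that does not recognise the sequence may consume the leading part and print the remainder, which would match the stray $p described in the question.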
Setting the terminal explicitly to linux solved it for me:
export TERM=linux
The p means print, which is the command the range is being given to, and yes.. it displays to you what is in that range.
In vim you can look at :help :range and :help :print to find out more about how this works. These types of ranges are also used by sed and other editors.
They probably used the 1,$ terminology in the tutorial to be explicit, but note that you can also use % as its equivalent. Thus, %p will also print all the lines in the file.

How does one change the language of the command line interface of Git?

I’d like to change the language of git (to English) in my Linux installation without changing the language for other programs and couldn’t find the settings.
How to do it?
Add these lines to your ~/.bashrc, ~/.bash_profile or ~/.zprofile to force git to display all messages in English:
# Set Git language to English
#alias git='LANG=en_US git'
alias git='LANG=en_GB git'
The alias needs to override LC_ALL on some systems, when the environment variable LC_ALL is set, which has precedence over LANG. See the UNIX Specification - Environment Variables for further explanation.
# Set Git language to English
#alias git='LC_ALL=en_US git'
alias git='LC_ALL=en_GB git'
In case you added these lines to ~/.bashrc the alias will be defined when a new interactive shell gets started. In case you added it to ~/.bash_profile the alias will be applied when logging in.
If you just want a single command to produce English output, prefix it with LC_ALL=C, for example:
LC_ALL=C git status
will result in
# On branch master
nothing to commit, working directory clean
The C locale is English and is always available without installing additional language packs
(see https://askubuntu.com/a/142814/34298)
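If you do this often but don't want an alias that only covers git, a small shell function can apply the same per-command override to any program. This is a sketch; the name with_c_locale is made up.

```shell
#!/bin/sh
# Hypothetical helper: run one command with LC_ALL=C without changing
# the locale of the calling shell.
with_c_locale() {
  # Without a command, a bare "LC_ALL=C" would change the shell itself,
  # so do nothing in that case.
  [ "$#" -gt 0 ] || return 0
  LC_ALL=C "$@"
}

# Usage (not run here):
#   with_c_locale git status
```

The override only lasts for that one invocation when the command is an external program such as git; how the assignment behaves for shell functions can differ between shells.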
To change it for the whole current bash session just enter
LANG=C
To change it to German, for example, enter
LANG=de_DE.UTF-8
Adding this line solved the problem for me:
Update: it seems like more components require a Locale as well now.
$ more ~/.bash_profile
export LANG=en_US              # obsolete
export LANG="en_US.UTF-8"      # updated
Run git with LC_MESSAGES=C rather than LC_ALL=C or LANG=C; there is no need to delete or rename any files.
This switches Git's output messages to English.
Note: since Git 2.3.1+ (Q1/Q2 2015), Git will add Accept-Language header if possible.
See commit f18604b by Yi EungJun (eungjun-yi)
Add an Accept-Language header which indicates the user's preferred
languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG.
This gives git servers a chance to display remote error messages in
the user's preferred language.
You have locale for git gui or other GUIs, but not for the command-line, considering it was one of the questions of GitSurvey 2010
localization of command-line messages (i18n) 258 3.6%
Of course, since 2010, as po/README describes:
Before strings can be translated they first have to be marked for translation.
Git uses an internationalization interface that wraps the system's
gettext library, so most of the advice in your gettext documentation
(on GNU systems info gettext in a terminal) applies.
In place since git 1.7.9+ (January 2012):
Git uses gettext to translate its most common interface messages into the user's language if translations are available and the locale is appropriately set.
Distributors can drop new PO files in po/ to add new translations.
So, if your update has messed up the translation, check what gettext uses:
See, for instance, "Locale Environment Variables"
A locale is composed of several locale categories, see Aspects. When a program looks up locale dependent values, it does this according to the following environment variables, in priority order:
LANGUAGE
LC_ALL
LC_xxx, according to selected locale category: LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, ...
LANG
Variables whose value is set but is empty are ignored in this lookup.
LANG is the normal environment variable for specifying a locale. As a user, you normally set this variable (unless some of the other variables have already been set by the system, in /etc/profile or similar initialization files).
LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, and so on, are the environment variables meant to override LANG and affecting a single locale category only.
For example, assume you are a Swedish user in Spain, and you want your programs to handle numbers and dates according to Spanish conventions, and only the messages should be in Swedish. Then you could create a locale named ‘sv_ES’ or ‘sv_ES.UTF-8’ by use of the localedef program. But it is simpler, and achieves the same effect, to set the LANG variable to es_ES.UTF-8 and the LC_MESSAGES variable to sv_SE.UTF-8; these two locales come already preinstalled with the operating system.
LC_ALL is an environment variable that overrides all of these. It is typically used in scripts that run particular programs. For example, configure scripts generated by GNU autoconf use LC_ALL to make sure that the configuration tests don't operate in locale dependent ways.
Some systems, unfortunately, set LC_ALL in /etc/profile or in similar initialization files. As a user, you therefore have to unset this variable if you want to set LANG and optionally some of the other LC_xxx variables.
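The priority order quoted above can be turned into a quick diagnostic. This is a simplified sketch with made-up function names: it picks the first non-empty variable in the listed order and ignores finer points (as far as I know, LANGUAGE is a colon-separated list and gettext disregards it when the locale is C).

```shell
#!/bin/sh
# Print the first non-empty argument; fail if all are empty.
first_nonempty() {
  for _v in "$@"; do
    if [ -n "$_v" ]; then
      printf '%s\n' "$_v"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: which variable decides the message language?
# Order follows the quoted list: LANGUAGE, LC_ALL, LC_MESSAGES, LANG.
# Falls back to "C" when none of them is set.
effective_messages_locale() {
  first_nonempty "${LANGUAGE:-}" "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}" \
    || printf 'C\n'
}

effective_messages_locale
```

If this prints something unexpected, look for LC_ALL or LANGUAGE being set in /etc/profile or a similar startup file, as the quoted text warns.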
Earlier, HTTP transport clients learned to tell the server side what locale they are in by sending Accept-Language HTTP header, but this was done only for some requests but not others.
This is fixed with Git 2.38 (Q3 2022):
See commit b0c4adc (11 Jul 2022) by Li Linchao (Cactusinhand).
(Merged by Junio C Hamano -- gitster -- in commit 4b8cdff, 19 Jul 2022)
remote-curl: send Accept-Language header to server
Helped-by: Junio C Hamano
Signed-off-by: Li Linchao
Git server end's ability to accept Accept-Language header was introduced in f18604b ("http: add Accept-Language header if possible", 2015-01-28, Git v2.4.0-rc0 -- merge), but this is only used by very early phase of the transfer, which is HTTP GET request to discover references.
For other phases, like POST request in the smart HTTP, the server does not know what language the client speaks.
Teach git client to learn end-users preferred language and throw accept-language header to the server side.
Once the server gets this header, it has the ability to talk to end-user with language they understand.
This would be very helpful for many non-English speakers.
Git defaults to English if it cannot find the message file for the locale's language.
So if you want Git to be in English, just sabotage the language file that it is running with. In my case it was always running with German (i.e. de.msg).
If I deleted or renamed it, Git defaulted to English.
Here I renamed the file.
As Bengt suggested : Add these lines to your ~/.bashrc or ~/.bash_profile to force git to display all messages in English:
vim ~/.bashrc - this file is per user (if you are the user ubuntu and you edit it, the change applies only to that user);
add these lines:
# Set Git language to English
#alias git='LANG=en_US git'
alias git='LANG=en_GB git'
#you can add also
LANG=en_GB
After you close the file, run the following in the shell:
source ~/.bashrc
to reload the new settings, or exit the terminal and connect again :)
Here is my solution for changing the Git language, following this answer and this one:
1) nano ~/.bashrc
2) add alias git='LANG=en_GB git' to the file
3) save the file
4) source ~/.bashrc
Git now uses the new language. However, if it stops working after you restart the terminal, you also need to:
4.1) nano ~/.profile
4.2) add source ~/.bashrc
4.3) save the file
This makes source ~/.bashrc run whenever you open the terminal.
Hope it helps.

GIT: does not handle filenames which contain unicode char(e.g. chinese/korean)

Issues:
Using ls in GIT shows all unicode in filenames as '?' (i.e. ???.mp3).
When using git add -A the following error is returned: "fatal: unable to stat 'example/???.mp3': no such file or directory"
Is there a solution to this?
Thanks.
As of MSysGit 1.7.10 (the latest version at this time), Unicode is correctly supported on Windows, provided that you tweak some settings and use a TrueType font in the console.
See explanations here, including how to deal with previous repositories.
Msysgit doesn't have support for non-ASCII characters in filenames. See its issue 80 for details.
Consider using Cygwin's git package instead, which does have full UTF-8 support.
Git for Windows now uses Unicode for filenames.
[Edit: s/not/now/.. sic :( ]

Cygwin command not found bad characters found in .bashrc 357\273\277

I'm new to Cygwin, I just installed it and attempted to set some simple environment variables. However, when I open the command shell, I get the error "#357\273\277 command not found"
I found an article that discusses what the problem is and how to "discover" the hidden bad character:
http://mblog.lib.umich.edu/DataDiscussions/archives/2010/01/index.html
but I don't know how to resolve the issue by removing the character (which I validated was a problem in my .bashrc file using the od command). I attempted to change the preferences view in Notepad++ to UTF-8 and ANSI to no avail, but the file was not altered at all.
Any help would be appreciated...
As far as I know, a common problem with files saved in Notepad++ as UTF-8 and Cygwin is that Notepad++ saves UTF-8 encoded files with a byte order mark by default. This BOM character is not quite compatible with unix-like environments like Cygwin.
If you need unicode characters in these files, then you can try using the "UTF-8 without BOM" encoding in Notepad++, otherwise you can use ANSI or other encodings that don't use a BOM by default.
Besides the encoding, make sure the file's saved with unix (LF) line-breaks.
Before feeding your files to Cygwin bash, you can run them through dos2unix to take care of possible conflicts such as CR LF line endings. Open bash:
name@host ~
$ dos2unix your_file.sh
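If dos2unix is not installed, plain tr can do a rough equivalent. This is a sketch with made-up function names; note that it removes every carriage return byte, whereas dos2unix, as far as I know, only converts CR LF pairs.

```shell
#!/bin/sh
# Succeed if the file contains at least one carriage return (CR) byte.
has_cr() {
  [ "$(tr -cd '\r' < "$1" | wc -c | tr -d '[:space:]')" -gt 0 ]
}

# Write the file to stdout with every CR byte removed.
strip_cr() {
  tr -d '\r' < "$1"
}

# Usage (not run here):
#   has_cr your_file.sh && strip_cr your_file.sh > your_file.unix.sh
```

Writing to a new file and then renaming it avoids truncating the script while it is still being read.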
It looks like if I change the encoding from UTF-8 to ANSI (not the view preferences), the file will update and the special characters are gone, fixing the "\357\273\277 command not found" issue. Hooray!
One way to strip these in Linux is by using vi. If you say
vi filename
and then in vi use the ex command :se fileencoding=ASCII
this will strip the oddball characters out.
You can confirm this by saving the file and then running od -c on the file.
Before:
od -c changes.sql | head
0000000 357 273 277 I N S E R T I N T O `
After:
od -c changes.sql | head
0000000 I N S E R T I N T O ` c o n
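The same check and fix can be done without an editor. The sketch below uses od to look at the first three bytes (357 273 277 in octal is the UTF-8 byte order mark) and tail to skip them; the function names are made up for illustration.

```shell
#!/bin/sh
# Succeed if the file starts with the UTF-8 byte order mark (EF BB BF,
# which od shows as octal 357 273 277).
has_utf8_bom() {
  [ "$(od -An -to1 -N3 "$1" | tr -d ' \n')" = "357273277" ]
}

# Write the file to stdout without a leading BOM; files that have no
# BOM are passed through unchanged.
strip_utf8_bom() {
  if has_utf8_bom "$1"; then
    tail -c +4 "$1"    # start output at the fourth byte
  else
    cat "$1"
  fi
}

# Usage (not run here):
#   strip_utf8_bom ~/.bashrc > ~/.bashrc.nobom && mv ~/.bashrc.nobom ~/.bashrc
```

Running od -c on the result afterwards should show the file starting directly with its first real character, as in the "After" output above.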
Since you have edited your .bashrc outside of Cygwin and used a Windows editor, the editor might have messed up your newline character (ie. CR, R, etc.) You can tell Notepad++ to show hidden characters. I think you can find it in its settings. Changing charsets is one thing, but being able to see hidden characters is another.
This article mentions a few programs that you can use to convert text files from one standard to another. Try using dos2unix on the file (in the cygwin command line).

OS X Terminal UTF-8 issues

Okay, so I finally got myself a MacBook Air after 15 years of Linux. Before I got it, my big concern was UTF-8 support, because whenever I get files sent to me from Windows or Mac clients there are always encoding issues, while on Ubuntu I can be sure that all output, no matter which program produced it, will be perfectly UTF-8 encoded.
And now, on my second day (today) with OS X, I'm tearing my hair out in frustration. Why?
When I open nano and type some Swedish characters like ÅÄÖ, it puts blank characters at the end of the line (which I guess is the second byte of each character).
When I open python and try using Swedish characters, it does not output anything at all.
When I connect to an Ubuntu server through SSH I can't type åäö in bash, though it works in vim (still through SSH). And in nano backspace does not work, but if I check the box "Delete sends ctrl+H" in the Terminal preferences, backspace starts working in nano but stops working in vim.
I've tried unchecking all encodings other than UTF-8 in the Terminal preferences, but that does not seem to work either.
I'm sure that every non-US person must have the same issues, so how do I fix them? I just want full UTF-8 support... :'(
For me, this helped:
I checked locale on my local shell in terminal
$ locale
LANG="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
Then connected to any remote host I am using via ssh and edited file /etc/profile as root - at the end I added line:
export LANG=cs_CZ.UTF-8
After next connection it works fine in bash, ls and nano.
Go to Terminal -> Preferences -> Advanced (Tab) go down to International and select Unicode (UTF-8) as Character Encoding.
And tick Set locale environment variables on startup.
Unfortunately, the Preferences dialog is not always very helpful, but by tweaking around you should be able to get everything working.
To be able to type Swedish characters in Terminal, add the following lines to your ~/.inputrc (most likely you must create this file):
set input-meta on
set output-meta on
set convert-meta off
This should do the work both with utf8 and other codings in bash, nano and many other programs. Some programs, like tmux, also depends on the locale. Then, adding for instance export LC_ALL=en_US.UTF-8 to your ~/.profile file should help, but keep in mind that a few (mainly obscure) programs require a standard locale, so if you have trouble running or compiling a program, try going back to LC_ALL=C.
Some references that may be helpful:
http://homepage.mac.com/thgewecke/mlingos9.html#unicode
http://hints.macworld.com/article.php?story=20060825071728278
The following is a summary of what you need to do under OS X Mavericks (10.9). This is all summarized in
http://hints.macworld.com/article.php?story=20060825071728278
Go to Terminal->Preferences->Settings->Advanced.
Under International, make sure the character encoding is set to Unicode (UTF-8).
Also, and this is key: under Emulation, make sure that Escape non-ASCII input with Control-V is unchecked (i.e. is not set).
These two settings fix things for Terminal.
Make sure your locale is set to something that ends in .UTF-8. Type locale and look at the LC_CTYPE line. If it doesn't say something like en_US.UTF-8 (the stuff before the dot might change if you are using a non-US-English locale), then in your Bash .profile or .bashrc in your home directory, add a line like this:
export LC_CTYPE=en_US.UTF-8
This will fix things for command-line programs in general.
Add the following lines to .inputrc in your home directory (create it if necessary):
set meta-flag on
set input-meta on
set output-meta on
set convert-meta off
This makes Bash be eight-bit clean, so it will pass UTF-8 characters in and out without messing with them.
Keep in mind you will have to restart Bash (e.g. close and reopen the Terminal window) to get it to pay attention to all the settings you make in 2 and 3 above.
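Since these steps boil down to making sure a few exact lines exist in .inputrc and .profile, they can be applied repeatedly without creating duplicates. This is a sketch; ensure_line is a made-up helper, and it assumes the target file, if it exists, already ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless an identical
# line is already present. Creates the file if it does not exist.
ensure_line() {
  _file=$1
  _line=$2
  [ -e "$_file" ] || : > "$_file"
  # -x matches whole lines only, -F treats the text literally
  grep -qxF -- "$_line" "$_file" || printf '%s\n' "$_line" >> "$_file"
}

# Usage (not run here):
#   ensure_line "$HOME/.inputrc" 'set input-meta on'
#   ensure_line "$HOME/.inputrc" 'set output-meta on'
#   ensure_line "$HOME/.inputrc" 'set convert-meta off'
#   ensure_line "$HOME/.profile" 'export LC_CTYPE=en_US.UTF-8'
```

As with the manual edits, a new Terminal window is needed before Bash picks up the changes.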
Short versatile answer (fits to other national languages, even Lithuanian or Russian)
open Terminal
edit .profile in home directory - nano .profile or in Catalina or newer nano .zshenv
add line export LC_ALL=en_US.UTF-8
press Ctrl+x and Y (exit and save)
This fixed even the rare national characters of a small country for me. You may need to close and reopen Terminal for the changes to take effect.
Also if you like Linux behavior (use lot of Alt shortcuts like Alt+. or Alt+, in mc) then you should disable Mac style Option key function:
Terminal->Preferences->Profiles->Keyboard and check box:
Use Option as Meta key
To make nano work as you want it to, try:
export LANG="UTF-8"
Or get a newer version of nano via MacPorts:
# cf. http://www.macports.org/install.php
port info nano
port variants nano
sudo port install nano +utf8 +color +no_wrap
With respect to ssh & UTF-8 issues comment out SendEnv LANG LC_* in /etc/ssh_config.
See: Terminal in OS X Lion: can't write åäö on remote machine
My terminal was just acting silly, not printing out åäö. I found (and set) this setting:
Under Terminal -> Preferences... -> Profiles -> Advanced.
Seems to have fixed my problem.
Check whether nano was actually built with UTF-8 support, using nano --version. Here it is on Cygwin:
nano --version
GNU nano version 2.2.5 (compiled 21:04:20, Nov 3 2010)
(C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009 Free Software Foundation, Inc.
Email: nano@nano-editor.org Web: http://www.nano-editor.org/
Compiled options: --enable-color --enable-extra --enable-multibuffer
--enable-nanorc --enable-utf8
Note the last bit.
Since nano is a terminal application, I guess it's more of a terminal problem than a nano problem.
I ran into similar problems on OS X (I could not input or view Chinese characters in the terminal).
I tried tweaking the system settings through the OS X UI, whose real effect is to change the environment variable LANG.
So in the end I just added a few lines to ~/.bashrc to fix the problem.
# I'm Chinese and I prefer English manual
export LC_COLLATE="zh_CN.UTF-8"
export LC_CTYPE="zh_CN.UTF-8"
export LC_MESSAGES="en_US.UTF-8"
export LC_MONETARY="zh_CN.UTF-8"
export LC_NUMERIC="zh_CN.UTF-8"
export LC_TIME="zh_CN.UTF-8"
BTW, don't set LC_ALL which will override all the other LC_* settings.
Try
Having a Powerline compatible font installed https://github.com/powerline/fonts
Setting these ENV vars in .zshrc or .bashrc:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Just add a file on remote server
$ sudo nano /etc/environment
LANG=en_US.utf-8
LC_ALL=en_US.utf-8
PS: Top answer has a suggestion to change /etc/profile file on remote server, it works, but this file is often overwritten by system, and doesn't help for long.
/etc/profile file contains disclaimer:
It's NOT a good idea to change this file unless you know what you are doing. It's much better to create a custom.sh shell script in /etc/profile.d/ to make custom changes to your environment, as this will prevent the need for merging in future updates.
In my case, simply using the uxterm command instead of xterm solved the problem. It's available in /opt/X11/bin/uxterm by installing the XQuartz package provided by Apple.

Resources