Weird character set in white spaces - macOS

I am having an issue with the character set on my Mac: certain characters show up differently in XQuartz and Terminal. I am working on Mac OS X (10.8.5), and the compiler is clang-503.0.40. The locale output is:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="
I have tried setting stty, but funny characters still creep into the white space on screen and in the files that I am writing my output to.
Example: ÿÿ2,ÿÿÿÿÿÿÿÿÿÿÿ1,12,21,ÿ0,ÿ2,ÿ0.00,60.00,WinterDesignDay
Whereas the expected output is: 2, 1,12,21, 0.00,60.00,WinterDesignDay
I went through a lot of questions listed on https://stackoverflow.com/questions/tagged/encoding but I couldn't find an answer that solves my problem. Any help is greatly appreciated!
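A clue worth checking: ÿ is U+00FF, i.e. the single byte 0xFF when the file is viewed as Latin-1. That byte often comes from a program writing the value -1 (for example, EOF or an error return stored in a char) where a space was intended, which would make this a bug in the program rather than a locale problem. One way to check is to dump the raw bytes of the output file (the file name here is a placeholder):
$ hexdump -C output.csv | head -3
If ff shows up where the spaces should be, the fix belongs in the code that writes the file, not in the terminal settings.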

Related

Displaying Telugu on Terminal or iTerm2 applications on Mac OS

Terminal and iTerm2 applications on my Mac don't display Telugu characters properly. The characters get all jumbled up. I see the same issue with other languages like Kannada and Sanskrit. Some characters seem fine, but others are getting jumbled (as if one character is being superimposed on another).
I set Terminal's text encoding to UTF-8 and ran export LC_CTYPE=en_US.UTF-8 as suggested by other answers, but nothing seems to work. Here is my locale:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
Enabling the "double width" setting did not solve the problem. I also checked "Set locale environment variables on startup"; that did not work either.
Note that the characters are displayed properly in other applications like browsers and word processors, so the problem is local to terminal apps like Terminal and iTerm2.
This is how the word "Telugu" is being displayed (screenshot of the jumbled rendering omitted).
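One way to separate an encoding problem from a rendering problem is to print a Telugu string and dump its bytes (hexdump is available on OS X by default):
$ printf 'తెలుగు\n'
$ printf 'తెలుగు\n' | hexdump -C
If the dump shows well-formed UTF-8 (Telugu letters encode as three-byte sequences starting with e0 b0 or e0 b1) while the on-screen glyphs are jumbled, the locale and encoding are fine and the fault is in the terminal's glyph shaping, which matches the observation that browsers and word processors render the text correctly.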

Lynx UTF-8 support

I am using Lynx on OS X 10.11. However, it does not print UTF-8 for non-ASCII characters; instead it prints either an ASCII approximation of them or the EF BF BD "replacement" character (U+FFFD, displayed as ?).
I have been studying this guide for help.
The output from the locale command:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
When I run Lynx with
lynx http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
here is what the display looks like (screenshot of the garbled output omitted):
According to the posts in the guide, Lynx should print UTF-8 properly.
lynx -dump ... prints the same.
(running export LC_ALL="en_US.UTF-8" doesn't help either.)
What is strange is that if I run with the -mime_header argument, e.g.:
lynx -mime_header http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
it prints the characters properly, albeit as a raw dump rather than in a browser environment (screenshot omitted).
EDIT: Forgot to mention, -assume_charset=utf8 and -assume_unrec_charset=utf8 don't help either.
EDIT:
Well, I am able to get the output I want by hard-setting CHARACTER_SET in lynx.cfg. Though this seems like a bit of a workaround, since the documentation states:
# ... The 'o'ptions menu setting will be stored in the user's RC
# file whenever those settings are saved, and thereafter will be used as the
# default. ...
However, the setting only persists for the session in which it is set. That won't work for me, as I am primarily using lynx -dump in a script. But since I pretty much only use UTF-8, I guess I can live with the hard setting for now.
I do think you should use
lynx -dump --display_charset=utf-8
rather than hard-setting the config file, so:
lynx --display_charset=utf-8 http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
Alternatively, check https://www.brow.sh/
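If the setting needs to survive scripted lynx -dump runs without passing the flag every time, another option is a personal config file selected via the LYNX_CFG environment variable (a sketch; the file path is just an example, and the INCLUDE path to the system lynx.cfg varies by install):
$ printf 'INCLUDE:/usr/local/etc/lynx.cfg\nCHARACTER_SET:utf-8\n' > ~/.lynx.cfg
$ export LYNX_CFG=~/.lynx.cfg
$ lynx -dump http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt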

bash copyright changes to question mark

My Bash script runs a query placing the returned data into a file:
result=$($ORACLE_HOME/bin/sqlplus -s $DB_USER/$DB_PASS@$ORACLE_SID <<END >>$RETURN_FILE
set linesize 32767 pagesize 0 feedback off verify off heading off echo off;
$QUERY
exit;
END
)
In the output file, the copyright sign, the registered sign, and hyphens are all changed to ?
I have viewed this multiple ways, so it is not the editor; it is the file itself.
How can I correct this?
I have checked locale and from other posts I think it is correct:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
My version info is:
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.8 (Santiago)
BTW, this was originally a ColdFusion process running the same query against the same DB, and the output file showed everything correctly.
Turns out export NLS_LANG=AMERICAN_AMERICA.AL32UTF8 was the answer (an Oracle issue, not a Linux one).
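For context: the third field of NLS_LANG names the client character set, and when it is unset the Oracle client tends to fall back to a 7-bit default, transliterating anything it cannot represent (©, ®, typographic hyphens) to ?. A minimal sketch of the fix in place (the script name is a placeholder for the script quoted above):
$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8   # language_territory.charset
$ ./run_extract.sh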

Powerline Font not working in iTerm 2

I'm trying to use Prezto on OS X with iTerm 2. I have a patched Powerline font installed on my system and selected in iTerm, but my prompt is still not being displayed correctly.
My locale settings are all utf-8:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
I'm just not sure how to fix this. I use the same patched font setup on Linux with the Terminator terminal and tmux, and I don't have any problems. Is there something else I have to do for iTerm?
It seems that you want to use the agnoster theme; set 12pt Meslo LG S DZ Regular for Powerline as the Non-ASCII Font in iTerm 2's profile text settings.
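A quick sanity check for whether the selected fonts actually cover the Powerline glyphs (Prezto runs under zsh, whose print builtin understands \u escapes):
% print '\ue0b0 \ue0a0'
U+E0B0 is the solid arrow separator and U+E0A0 the branch symbol; if either renders as a box or a question mark, the font chosen for non-ASCII text is not a patched Powerline font.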

Why doesn't Encoding.default_external respect LANG?

It's my understanding that Ruby's Encoding.default_external is given a default value based on the environment variables LC_ALL and LANG, giving precedence to the former. I've run into several bugs where the default external encoding somehow ends up set to ASCII even though the environment variables are set to UTF-8.
For example:
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:US-ASCII>
irb(main):002:0> ENV['LC_ALL']
=> nil
irb(main):003:0> ENV['LANG']
=> "en_US.UTF-8"
In the environments where this has happened, I've also grepped through all the gems being loaded for any code manually setting the default external encoding, but haven't found anything. How is what I'm seeing above possible? I'm using Ruby 2.2 above, but I've seen this happen on all Ruby 2.x versions.
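One quick check is to compare what the C library reports for the locale with Ruby's default (Encoding.locale_charmap is the codeset Ruby derives from the environment):
$ ruby -e 'puts Encoding.locale_charmap'
$ LC_ALL=en_US.UTF-8 ruby -e 'puts Encoding.default_external'
If the first command prints ANSI_X3.4-1968 (glibc's name for plain ASCII) despite LANG being set, the locale named by LANG was never generated on the system, which is exactly what the answer below confirms.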
I figured it out. Not only does the LANG environment variable need to be set, but the locale it specifies must also have been generated on the OS. On a stock Linux image, the default locale may be something that is not UTF-8; in my particular case, I'm using Debian 7.7 and the default locale is "POSIX". I was able to set the default locale by installing the locales package and following the interactive prompts to generate the en_US.UTF-8 locale:
$ apt-get -y install locales
If the locales package is already installed, you can just reconfigure it instead:
$ dpkg-reconfigure locales
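For a non-interactive setup (a sketch for Debian-style systems; en_US.UTF-8 is just the locale used throughout this question):
$ sed -i 's/^# *en_US.UTF-8/en_US.UTF-8/' /etc/locale.gen   # uncomment the locale
$ locale-gen                                                # generate it
$ update-locale LANG=en_US.UTF-8                            # make it the system default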
Now setting LANG will change the current system locale, and Ruby's Encoding.default_external will be set properly:
$ export LANG=en_US.UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:UTF-8>
For an example of how to automate the generation and configuration of the default locale instead of doing it interactively, take a look at this Docker image.
