Character set encoding in mutt - UTF-8

This question is increasingly esoteric, as we generally migrate away from text-based email reading. But I often read and send email using mutt. When I copy something from a web page (usually encoded as UTF-8) and then paste it into mutt, I get garbled characters, such as \xe2\x80\x9cWhen push .. where a double quote should be.
I am using Mutt 1.4.2.3i (2007-05-26);
My shell is GNU bash, version 4.0.10(2)-release (i386-portbld-freebsd7.2);
My .bashrc file has export LANG=en_CA.UTF-8;
In my .muttrc, I specify set charset = "en_ca.UTF-8";
Indeed, in mutt itself, if I type :set &charset ?charset I get back charset="utf-8".
But I still don't get the result I'm looking for: when pasting text into an email, I get incorrect characters. Thanks!

Keep in mind that mutt uses vi as its editor...
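For reference, the sequence \xe2\x80\x9c in the question is exactly the three-byte UTF-8 encoding of U+201C LEFT DOUBLE QUOTATION MARK (“). So the pasted bytes appear to arrive intact, and the problem is more likely that something in the chain (such as the editor) is not treating them as UTF-8. The Python sketch below is only an illustration and does not involve mutt or vi: it encodes the quote and prints non-ASCII bytes as \xNN escapes, roughly what a non-UTF-8-aware viewer shows. The helper `escape_non_ascii` is hypothetical. As an aside, mutt's $charset appears to expect a character-set name such as utf-8 rather than a locale name like en_ca.UTF-8.

```python
def escape_non_ascii(data: bytes) -> str:
    """Render bytes the way a non-UTF-8-aware viewer might:
    printable ASCII as-is, every other byte as a \\xNN escape."""
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in data)


pasted = "\u201cWhen push"           # “When push, as copied from the web page
raw = pasted.encode("utf-8")          # the bytes that actually get pasted

print(raw[:3])                        # b'\xe2\x80\x9c'
print(escape_non_ascii(raw))          # \xe2\x80\x9cWhen push
print(raw.decode("utf-8") == pasted)  # True: decoding as UTF-8 restores the quote
```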

Related

How to archive an entire FTP server where many of the filenames seem to include illegal characters

I am trying to use wget -m <address> to download the contents of an FTP server. A lot of the content is Icelandic and so contains many non-ASCII characters that I think are causing issues, as I keep seeing:
Incomplete or invalid multibyte sequence encountered
I have tried adding flags such as --restrict-file-names=nocontrol but to no avail.
I have also tried using lftp, but it doesn't seem to make any difference.
According to the wget manual:
If you specify ‘nocontrol’, then the escaping of the control
characters is also switched off.
That is, it is actually more permissive than the default. The weird characters suggest you have trouble getting the encoding right, so ‘ascii’ seems to be the best fit for your use case:
The ‘ascii’ mode is used to specify that any bytes whose values are
outside the range of ASCII characters (that is, greater than 127)
shall be escaped. This can be useful when saving filenames whose
encoding does not match the one used locally.
As I cannot test this myself, please try it and report the result.
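To make both halves of this concrete, here is a minimal Python sketch. It assumes, without confirmation from the question, that the server's filenames are ISO-8859-1 bytes (a single-byte encoding that covers Icelandic letters such as Þ, ó and ð). Such bytes are not valid UTF-8, which is the kind of mismatch that produces "invalid multibyte sequence" errors. The helper `escape_ascii` is hypothetical and only approximates the ‘ascii’ escaping quoted above by replacing every byte above 127 with %XX; wget's real output may differ, for example because it also escapes control characters.

```python
def decodes_as_utf8(name: bytes) -> bool:
    """True if the raw filename bytes form valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def escape_ascii(name: bytes) -> str:
    """Rough imitation of --restrict-file-names=ascii:
    keep bytes 0-127 unchanged, escape anything above 127 as %XX."""
    return "".join(chr(b) if b < 128 else f"%{b:02X}" for b in name)


latin1_name = "Þjóð.txt".encode("latin-1")  # b'\xdej\xf3\xf0.txt'
print(decodes_as_utf8(latin1_name))         # False
print(escape_ascii(latin1_name))            # %DEj%F3%F0.txt
```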

RStudio: keeping special characters in a script

I wrote a script with German special characters, e.g. ü.
However, whenever I close R and reopen the script the characters are substituted:
Before "für"; "hinzufügen"; "Ø" - After "fÃ¼r"; "hinzufÃ¼gen"; "Ã".
I tried to remedy it using "Save with Encoding..." and choosing UTF-8, as stated here, but it did not work.
What am I missing?
You don't say what OS you're using, but this kind of thing really only happens on Windows nowadays, so I'll assume that.
The problem is that Windows has a local encoding that is not UTF-8. It is commonly something like Latin-1 in English-speaking countries. I'm not sure what encoding people use in German-speaking countries, if that's where you are. From the junk you saw, it looks as though you saved the file in UTF-8, then read it using your local encoding. The encodings for writing and reading have to match if you want things to work.
In RStudio you can try "Reopen with encoding..." and specify UTF-8, and you'll probably get your original back, as long as you haven't saved it after the bad read. If you did that, you've got a much harder cleanup to do.
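The write/read mismatch can be reproduced at the byte level. The Python sketch below is not RStudio code; it assumes the local encoding is Windows-1252 (cp1252). `misread` saves text as UTF-8 and reads it back with the local encoding, while `reopen_as_utf8` mirrors the idea behind "Reopen with encoding...": recover the original bytes and decode them as UTF-8. Both function names are hypothetical.

```python
def misread(text: str, local: str = "cp1252") -> str:
    """Write `text` as UTF-8, then read the bytes back with the local encoding."""
    return text.encode("utf-8").decode(local)


def reopen_as_utf8(garbled: str, local: str = "cp1252") -> str:
    """Undo a misread: turn the characters back into the original bytes,
    then decode those bytes as UTF-8."""
    return garbled.encode(local).decode("utf-8")


for word in ["für", "hinzufügen", "Ø"]:
    bad = misread(word)
    print(word, "->", bad, "->", reopen_as_utf8(bad))
# für -> fÃ¼r -> für
# hinzufügen -> hinzufÃ¼gen -> hinzufügen
# Ø -> Ã˜ -> Ø
```

With Latin-1 instead of cp1252, the second byte of Ø (0x98) decodes to an invisible control character, which would leave a lone Ã on screen. Note that cp1252 leaves a few byte values undefined (0x81, for example), so this round trip can fail for some characters.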

Always quote plain version with mutt

I read mails with mutt and a lot of people send mail with both plain and html version.
I prefer to see the HTML version with elinks in some cases, so I had to set text/html as my preferred alternative.
However, when replying, the quoted text is terrible (full of ugly characters), so I'd like to use the text/plain version in the quote.
Two linked questions emerge:
Is it possible to set the alternative_order option depending on the folder?
Is it possible to quote the text version of the email even if the mail was viewed as HTML?
This question has also been asked at http://www.debian-administration.org/articles/75#comment_28 without a reply.
Using the result from another question related to mutt configuration, I can think of a solution to set the alternative_order:
Suppose I prefer to see plain text in folder1 and HTML elsewhere:
# reset all configuration when changing folder
set my_reset_source=`~/.dotfiles/mutt/reset.sh ~/.dotfiles/mutt/*.config > /tmp/mutt-reset`
folder-hook . source /tmp/mutt-reset
folder-hook . source ~/.dotfiles/mutt/html_prefered
folder-hook folder1 source ~/.dotfiles/mutt/plain_prefered
where ~/.dotfiles/mutt/plain_prefered contains:
alternative_order text/plain
This solution, however, does not answer the other part of the question: how to always quote the plain text version.
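To make the two preference orders concrete, the Python sketch below uses the standard library's email package, not mutt, to pick a part from a multipart/alternative message by preference order. The same message yields the HTML part for viewing and the plain-text part for quoting. `build_sample` and `pick_part` are hypothetical helpers for illustration only.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_sample() -> bytes:
    """A multipart/alternative message with plain and HTML versions."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("Plain version\n")
    msg.add_alternative("<p>HTML version</p>\n", subtype="html")
    return msg.as_bytes()


def pick_part(raw: bytes, order: tuple) -> str:
    """Return the content of the best body part for `order`,
    similar in spirit to mutt's alternative_order."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=order)
    return part.get_content()


raw = build_sample()
print(pick_part(raw, ("html", "plain")))  # for viewing: the HTML part
print(pick_part(raw, ("plain",)))         # for quoting: the plain-text part
```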

Is there any way for a bash (or any other shell) script to detect whether the current terminal supports Unicode characters?

I'd like to use Unicode characters if they are supported by the terminal, and fall back to ASCII characters if the user's terminal can't display them correctly. Is there any relatively easy way to do this in a shell script?
First, you're probably confusing Unicode with a particular encoding. Even if you know that the terminal supports Unicode characters, you still don't know how to print them!
You're probably thinking about something like UTF-8, the most popular Unicode encoding out there.
To get the encoding of the current locale, use:
locale charmap
This is the encoding of the current locale. In theory it may differ from the encoding used by the terminal, but in that case something is broken on the user's side.
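Following that approach, a script can look up the locale's encoding and fall back to ASCII when a character cannot be represented in it. The sketch below is in Python rather than shell. On Unix-like systems, `locale.nl_langinfo(locale.CODESET)` after adopting the environment's locale should report the same value as `locale charmap`, though Python's own C-locale handling can make them differ. `glyph` and `locale_charmap` are hypothetical helpers, and this checks the locale, not the terminal's fonts.

```python
import locale


def glyph(fancy: str, fallback: str, encoding: str) -> str:
    """Return `fancy` if `encoding` can represent it, otherwise `fallback`.
    Unknown encoding names also fall back."""
    try:
        fancy.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return fallback
    return fancy


def locale_charmap() -> str:
    """Roughly what `locale charmap` prints (Unix-only)."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # adopt LANG / LC_ALL / LC_CTYPE
    except locale.Error:
        pass  # invalid locale in the environment: keep the current one
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(glyph("✓", "OK", locale_charmap()))
```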
In your script, print:
:set encoding=utf-8
If you want your terminal to support Unicode, start a new terminal with the -u8 option, e.g. type in a terminal:
xterm -u8

Why is this displaying weird characters on the console?

Take this text for example:
the three umlauts are ä, ö, and ü.
Let's assume they are in a text file, which I'm reading like this:
data = File.read("umlauts.txt")
Now, if I try to output them, I get this:
the three umlauts are Σ, ÷, and ⁿ.
If I write it to a file, they are written correctly. How can I make them show up properly in a Windows command prompt? I'm using Ruby 1.8.6, and I want to be able to do quick debugging from the command prompt.
What encoding is the file? I'm guessing probably UTF-8. The Windows cmd prompt does not use UTF-8.
Here's a good article that covers this: http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html
Maybe set a different code page for cmd?
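The specific substitutions point at a code-page mismatch. The default console (OEM) code page on US-English Windows is 437, where bytes 0xE4, 0xF6 and 0xFC display as Σ, ÷ and ⁿ, and those are exactly the Windows-1252/Latin-1 bytes for ä, ö and ü. That suggests the file may be in a single-byte encoding rather than UTF-8; UTF-8 bytes would instead show ä as ├ñ. The Python sketch below reproduces both cases; `shown_on_console` is a hypothetical helper and cp437 is an assumed console code page.

```python
def shown_on_console(text: str, file_encoding: str, console_cp: str = "cp437") -> str:
    """Encode `text` as stored in the file, then decode the bytes the way
    a console using `console_cp` would render them."""
    return text.encode(file_encoding).decode(console_cp)


line = "the three umlauts are ä, ö, and ü."
print(shown_on_console(line, "cp1252"))  # the three umlauts are Σ, ÷, and ⁿ.
print(shown_on_console("ä", "utf-8"))    # ├ñ
```

Running chcp in cmd shows the active code page; chcp 1252 switches it to match a Windows-1252 file, provided the console font can display those characters.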
For explanations on encodings, read this.
