MariaDB on an Amazon EC2 instance does not allow accented letters on the command line

I've just installed an Amazon EC2 Linux instance with MariaDB.
When I try to type accented characters (like â, é, etc.) at the database command line, they are not accepted.
I can't figure out what's wrong. The bash command line accepts them with no problems.
UPDATE:
Maybe I haven't explained the issue well. As I am Spanish, I am very used to writing accented characters in every application. I can write them in all kinds of editors, including vim or nano. Bash obviously accepts them. But when logged into MariaDB (at the db command line, MariaDB [adatabase]>), I can't input any accented letter. This happens only in the MariaDB that I've installed on the Amazon EC2 instance, not in another MariaDB database that I've installed (for testing purposes) on my own computer.
New information for this issue: with PHP (the mysqli_connect function), I can insert or update rows containing accented strings without problems. So it's only the MariaDB command line that doesn't let me enter data with accented letters. Again: it's not that the letters don't appear in the database; I can't even type them at the command line.
I've added:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid
collation-server = utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
to the mariadb-server.cnf file, and
[client]
default-character-set=utf8mb4
to the client.cnf file, both of them in the /etc/my.cnf.d folder, but that didn't solve it.
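For reference, a quick way to check whether those settings actually took effect is to query them from the shell (a minimal sketch; the user name is a placeholder, adjust credentials as needed):
# Show which character sets and collations the client, connection and server actually use
mysql -u someuser -p -e "SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';"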

Have you established that the connection is UTF-8? Or is Bash even using that?
Have you declared your column to be CHARACTER SET utf8mb4? Or what?
See "best practice" and diagnostics for various common failure cases here.
More
init-connect (in my.cnf) is ignored by user root. Be sure not to use root (or other SUPER user) for application work.
I don't know (and I suspect you don't know either) what encoding bash, vim, nano, and the other editors use. It could (and should) be UTF-8. But it could be latin1 or any of too many other encodings, most of which are quite happy to handle all the accented letters of Spanish and the rest of the Western European languages.
If you can somehow get the hex for, say, ñ, we can dig deeper. If you get the single character, hex F1, then it is one of cp1250, cp1257, dec8, latin1, latin2, latin5, latin7. If you get hex C3B1 then it is utf8 or utf8mb4 (as named in MySQL) or UTF-8 (as named by everyone else).
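One way to capture that hex, as a sketch from both sides (assuming a Linux shell):
# What bytes does the terminal/bash actually send for the character?
printf 'ñ' | od -An -tx1          # "c3 b1" means UTF-8; "f1" means the latin1 family
# What bytes does the MariaDB client pass on?
mysql -e "SELECT HEX('ñ');"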
Spanish Characters
chars:  áéíñóúü
        ÁÉÍÑÓÚÜ
        ¿¡€’“”«»–—
latin1: E1 E9 ED F1 F3 FA FC
        C1 C9 CD D1 D3 DA DC
        BF A1 80 92 93 94 AB BB 96 97
UTF-8:  C3A1 C3A9 C3AD C3B1 C3B3 C3BA C3BC
        C381 C389 C38D C391 C393 C39A C39C
        C2BF C2A1 E282AC E28099 E2809C E2809D C2AB C2BB E28093 E28094
CMD (in Windows)
The command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too. See some code pages .
To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console
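A sketch of such a CMD session (the code page numbers are examples; yours may differ):
C:\> chcp
Active code page: 850
C:\> chcp 65001
Active code page: 65001
C:\> mysql --default-character-set=utf8mb4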
$ mysql
https://stackoverflow.com/a/6788223/1766831 suggests (with an update to utf8mb4):
mysql --default-character-set=utf8mb4
and (or)
[mysql]
default-character-set = utf8mb4
Note that it is [mysql], not [mysqld].

Related

Convert mangled characters back to UTF-8

Here is what I did:
I dumped a SQLite database with UTF-8 data (sqlite3 example.db .dump > dump.sql), but since this was in PowerShell, I assume the piping converted it to windows-1252
I loaded that dumped data into a new database, again using PowerShell (Get-Content dump.sql | sqlite3 example2.db)
I dumped that new database and am left with a new .sql file (this time it was not through PowerShell, so I assume it was unmodified)
This new sql file's UTF-8 characters are seriously mangled, and I was wondering if there was a way to convert it back into correct UTF-8.
As a few examples, here are what some sequences are in the new file, and what they should be (all are viewed as UTF-8):
ÒüéÒü¬ÒüƒÒü½ should be あなたに
´╝ü should be a full width exclamation mark
Òé¡Òé╗Òé¡ should be キセキ
Does anyone have any idea as to how I might undo this mangling? Any method would be very helpful!
This is in PowerShell 7.0.1
Edit:
On further inspection, you can reproduce my predicament by redirecting any such data to a file in PowerShell (note that the data itself cannot be typed directly into PowerShell). Hence, setting up a script like this gives the same outcome:
test.sh
#!/bin/bash
echo "キ"
And then running wsl ./test.sh > test.txt will give an output of Òé¡, not キ
Edit 2:
It seems as if the codepage the UTF-8 text was converted to is almost 437: some characters are restored using this assumption (e.g. 木), but others are not. If it's close to 437, but isn't, what could it be?
It turns out, since I am in the UK, the codepage I wanted was 850. Saving the file as 850 and then reloading it as UTF-8 fixed my issue!
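That round trip can also be scripted; a minimal sketch with iconv, assuming the mangled file is named mangled.sql (iconv calls the code page CP850):
# Write the characters back out as their CP850 byte values,
# which restores the original UTF-8 bytes
iconv -f UTF-8 -t CP850 mangled.sql > fixed.sql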

Corruption when using certain batch variable names in custom build command

I have a VS2013 project with a custom build command. In the command script I set an environment variable, and read it out again in the same script. I can confirm by calling set that setting the variable works. However, depending on the variable name, I can't read it out again.
The following works as expected when run as a batch script:
set AVAR=xxx
set ABLAH=xxx
set BBLAH=xxx
set DEV=xxx
set #ABLAH=xxx
echo %AVAR%
echo %ABLAH%
echo %BBLAH%
echo %DEV%
echo %#ABLAH%
But produces the following output in the project:
1> xxx
1> «LAH
1> »LAH
1> ÞV
1> xxx
In this case, the name AVAR works, but many others don't. Also, variables starting with # seem to work. Any idea what is going on?
I've found the solution. Visual Studio (msbuild) converts %XX escape sequences like in URLs. I only expected it to do so in URLs, like browsers do. However, it seems to replace them everywhere.
So when it encounters %ABCDE%, it recognizes %AB and inserts the character « = 0xAB, giving «CDE% to the batch interpreter. But if the code is not a valid hexadecimal number, it silently ignores it, and the interpreter sees the right characters. That's why variable names with # at the beginning always worked.
So the solution is to escape at least all % signs in front of valid hex codes 00-FF, or better yet all of them, with %25.
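For example, using one of the variable names from the question, the line stored in the .vcxproj would need to read:
echo %25ABLAH%25
which msbuild un-escapes back to echo %ABLAH% before handing it to the batch interpreter.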
An easy solution would be to just edit the corresponding commands in the GUI (via project properties), and not directly in the .vcxproj or .props file. This way, VS inserts the correct escape codes. In my case this was not possible since the commands were defined as user macros (Property Pages: Common Properties/User Macros). My commands span multiple lines, but the user macro editor only supports single lines.
Another thing to watch out for is that it not only replaces percent signs. Other symbols have special meaning and have to be replaced, too. (This goes beyond XML entities, like &amp; -> &.) Here is a list of special characters from MSDN. The characters are: % $ # ' ; ? *. It doesn't seem to be necessary to replace all of them all the time, but if you notice funky behavior then this is a thing to look at. You can try to enter these characters through the GUI and see how and if VS escapes them in the project file.
One other character to note especially is the semicolon. If you define a property with unescaped semicolons, like <MyPaths>DirA;DirB</MyPaths>, msbuild/VS will internally convert them to newlines (or rather, it splits the property into a list). But it will still show the paths separated by semicolons in the property pages! Except when you click the dropdown button next to a property and select <Edit...>; then it shows the paths as a list, separated by newlines! This is completely invisible most of the time, except when you set a property not in XML or the GUI but by reading the output of a command into it. In that case the command must output newlines if you want the effect of a semicolon. Otherwise you don't get multiple paths, but one long path with semicolons in it.
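A small illustration of the difference (MSBuild escapes a semicolon as %3B):
<MyPaths>DirA;DirB</MyPaths>       internally becomes a list of two items
<MyPaths>DirA%3BDirB</MyPaths>     stays one literal string containing a semicolon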
In North American and Western European countries, batch files are usually "ASCII" files using an OEM code page like code page 850 (OEM Multilingual Latin I) or code page 437 (OEM US), and not code page Windows-1252 as is usually used for single-byte encoded text files. The code page used for a batch file depends on the local settings for non-Unicode files in the console. The code page does not matter if only characters with code values below 128 are used in the batch file, i.e. the batch file is a real ASCII file.
Therefore make sure that you edit and save the batch file as an ASCII file using the right code page, and not as a Unicode file using UTF-8, UTF-16 little endian or UTF-16 big endian. The Visual Studio editor uses UTF-8 encoding for files by default. This is the wrong encoding for batch files.
The character « has code value 174 decimal (0xAE) in the code page 850 table. In the code page 1252 table, code value 174 belongs to the character ®, which is an indication that you want to output characters in the batch file that are encoded in UTF-8 (where ® is also code point 174) or Windows-1252.
A simple batch file for demonstration, stored as an ANSI file with code page Windows-1252:
@echo off
cls
echo This batch file was saved as ANSI file using code page Windows-1252.
echo.
echo Registered trademark symbol ® has code value 174 in Windows-1252.
echo.
echo But active code page is not Windows 1252 in console window.
echo.
chcp
echo.
echo Therefore the left guillemet character is output instead of registered
echo trademark symbol as this character has in code page 850 code value 174.
echo.
echo Press any key to continue ...
pause>nul
And batch files are for DOS/Windows and should therefore use carriage return + line feed as the line terminator instead of just line feed (UNIX) or just carriage return (old Mac).
Some text editors display the line terminator type and the encoding or code page somewhere in the status bar at the bottom of the main application window for the active file.
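Where no such editor is at hand but Git Bash or WSL is available, the file utility reports the same information (build.cmd is a placeholder name):
file build.cmd
build.cmd: ASCII text, with CRLF line terminators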

Character replacement batch file

I'm trying to write a batch script using the Windows command line to convert some characters, for example:
É to Й
Ö to Ц
Ó to У
Ê to К
Å to Е
Í to Н
à to Г
Ø to Ш
Ù to Щ
Ç to З
with no success. That's because I am using a program that does not support a Cyrillic font.
And I already have the file with these words, like:
ОБОГРЕВ ЗОНЫ 1
ДАВЛЕНИЕ ЦВЕТА 1
...
and so on...
Is it possible?
I'm guessing that you'd like to convert the character set (alias code page) of a file so you can open and read it.
I'm assuming you are using a Windows computer.
Let's say that your file is russian.txt and when you open it with notepad, the characters don't make any sense. The russian.txt file's character encoding is most probably ANSI and its code page is Windows-1251.
Some words about character encoding:
In ANSI one character is one byte long.
Different languages have different code pages: Windows-1251 = Russian, Windows-1252 = Western languages (English, German, Swedish...), Windows-1253 = Greek...
In UTF-8, English (ASCII) characters are one byte long and non-English characters two to four bytes long.
In UTF-16 (what Windows tools often call simply "Unicode"), most characters are two bytes long.
UTF-8 and UTF-16 don't need code pages.
You can check the encoding by opening the file in notepad and clicking File, Save As. At the right bottom corner beside the Save-button you can see the encoding.
With some googling I found a site where you can do the character encoding conversion online. I haven't tested it, but here's the address:
http://i-tools.org/charset
I've made a script (= a small program) which changes the character encoding from any ANSI and code page combination to UTF-8 or Unicode or vice versa.
Let's say you have an English Windows computer and want to convert russian.txt (ANSI / Windows-1251) to UTF-8.
Here's how:
Open this web-page and copy the script in it to the clipboard:
VB6/VBScript change file encoding to ansi
Create a new file named ConvertCharset.vbs in the same folder where russian.txt is, say C:\Temp.
Open the ConvertCharset.vbs in notepad (right click+edit) and paste.
Open CMD (Windows-button+R, cmd, Enter).
In the CMD window, type (hit the Enter key at the end of each line):
cd C:\Temp\
cscript ConvertCharset.vbs /InputCharset:Windows-1251 /OutputCharset:utf-8 /InputFile:russian.txt /OutputFile:russian_utf-8.txt
Now you can open russian_utf-8.txt in notepad and you'll see the Russian characters OK.
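If iconv is available instead (for example through Git Bash, Cygwin or WSL), the same conversion is a one-liner; a sketch (iconv names the code page CP1251):
iconv -f CP1251 -t UTF-8 russian.txt > russian_utf-8.txt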
More info:
http://en.wikipedia.org/wiki/Character_encoding
http://en.wikipedia.org/wiki/Windows-1251
http://en.wikipedia.org/wiki/UTF-8
VB6/VBScript change file encoding to ansi

Issue with encoding of a character (not able to sed or .gsub)

I am dealing with some multilingual data (English and Arabic) in a JSON file with a weird character I am not able to parse. I am not sure what the character is. I tried getting the ASCII value via vim and this is what I got:
"38 0x26"
This is the status line in vim i used to get the value (http://vim.wikia.com/wiki/Showing_the_ASCII_value_of_the_current_character).
:set statusline=%<%f%h%m%r%=%b\ 0x%B\ \ %l,%c%V\ %P
This is how the character looks in vim: [screenshot omitted]
I tried sed and .gsub to replace this character, unsuccessfully.
Is there a way I can replace this character (preferably with Ruby's .gsub) with '&' or something else?
Thanks
try with something like
sed 's/[[:alnum:][:space:]\[\]{}()\.\*\\\/_(AllAsciiVariationYouWant)]/&/g;t
s/./?/g' YourFile
where (AllAsciiVariationYouWant) stands for all the characters that you want to keep as-is (without the surrounding "()")
JSON is encoded in UTF-8 (Unicode). If you're seeing funky-looking characters in your file, it's probably because your editor is not treating Unicode characters properly. That could be caused by the use of a terminal emulator that doesn't support Unicode; an incorrect $LANG setting; vim not being able to correctly determine the encoding of the file; and likely other reasons.
What terminal program are you using? What's your $LANG environment variable set to (echo $LANG)? If you're certain your terminal supports Unicode, try:
LANG=en_US.utf-8 vim your_file_here.json
(The above example assumes that U.S. English is appropriate for the file, which it may not be.)
As for replacing characters in the file, vim's substitution command can be used:
:%s/old text/new text/g
The above command will run the substitute command on all lines in the file (%), replacing every instance of "old text" with "new text". (The g at the end tells vim to replace every instance on a line, not just the first it finds.)
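And since the question mentions Ruby's .gsub, here is a minimal sketch that replaces a character by its code point (0x26 is the value vim reported; data.json is a placeholder file name):
ruby -e 'txt = File.read("data.json"); File.write("data.json", txt.gsub(/\u{26}/, "&"))'
In vim itself a character can likewise be matched by code point, e.g. :%s/\%x26/\&/g (the \& keeps the replacement & literal).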

No Norwegian characters in LaTeX

I have translated a document from English to Norwegian in the LaTeX format, and while using Norwegian special characters I get an error using
\usepackage[utf8x]{inputenc}
to try to display the Norwegian (Scandinavian) special characters in PostScript/PDF/DVI format, saying
Package utf8x Error: MalformedUTF-8sequence.
So while that didn't work, I tried out another possible solution:
\usepackage{ucs}
\usepackage[norsk]{babel}
And when I tried to save that in Emacs I get this message:
These default coding systems were tried to encode text
in the buffer `lol.tex':
(utf-8-unix (905 . 4194277) (916 . 4194245) (945 . 4194278) (950
. 4194277) (954 . 4194296) (990 . 4194277) (1010 . 4194277) (1013
. 4194278) (1051 . 4194277) (1078 . 4194296) (1105 . 4194296))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: \345 \305 \346 \345 \370 \345 \345 \346 \345 \370 ...
Thanks to Emacs I have the possibility to check out the properties of those characters and the first one tells me:
character: \345 (4194277, #o17777745, #x3fffe5)
preferred charset: eight-bit (Raw bytes 128-255)
code point: 0xE5
syntax: w which means: word
buffer code: #xE5
file code: not encodable by coding system utf-8-unix
display: not encodable for terminal
Which doesn't tell me much. When I try to build this with texi2dvi --dvipdf filename.text I get a perfectly fine PDF, except that all the special Norwegian characters are missing.
When I am about to save, Emacs also asks me:
"Select coding system (default raw-text):"
And I type in utf-8 to choose that coding system. I have also tried choosing the default raw-text to see if I get a different result, but nothing changes.
At last I tried
\lstset{inputencoding=utf8x, extendedchars=\true}
... a line I came across while googling for a solution to this problem, which gives me this error:
Undefined control sequence.
So basically, I have tried every encoding option I have been able to find, and nothing works. I am desperately trying to make this work since the Norwegian translation must be published before the deadline.
As additional information I may add that I found out later on that I only had en_US.UTF-8 in my locale, so I added nb_NO.UTF-8 and nb_NO.ISO-8859-15 and ran locale-gen plus a reboot, without any change.
I hope I have provided enough information to get some assistance; the characters in question are æ, ø and å.
Apparently your Emacs is having a hard time saving the file as UTF-8 (which doesn't make much sense, since it should be able to represent all characters using that encoding). You should try using another editor with multiple-encoding support to save the file as UTF-8.
As long as you're unable to save the file in UTF-8, LaTeX will not be able to read it correctly unless you specify your current file encoding as the inputenc package parameter. You may want to, for instance, save the file as-is in Emacs but specify \usepackage[latin1]{inputenc}, which should do the trick if Emacs is writing the file using something in the iso-8859-* family.
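Putting that together, a minimal preamble along those lines might look like this (a sketch, not the asker's exact setup; switch latin1 to utf8 once the file really is saved as UTF-8):
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage[norsk]{babel}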
I solved this error by setting the coding system for saving the file:
C-x C-m f utf-8-unix
