Changing from latin1 to utf8 on a MySQL server

We've recently started receiving 'utf8' codec can't decode bytes in position 30-31: unexpected end of data when collecting data from the database (using Python and SQLAlchemy). We've traced the error to some special Japanese characters.
All tables have CHARSET=utf8 and these are the settings we have on the server when I run show variables;
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | latin1_swedish_ci          |
| collation_database       | latin1_swedish_ci          |
| collation_server         | latin1_swedish_ci          |
If we want to move our environment to utf8, which settings are recommended, and how should we export and import the data we currently have to make it work with the new settings?
I've read some posts about exporting the data as latin1 by adding --default-character-set=latin1 to the mysqldump command and then importing it into the database that has the new settings, but since our original tables are already in utf8, this won't work.
Tried setting the connection encoding after reading this thread: SQLAlchemy and UnicodeDecodeError
This solves the issue of the application crashing, but all the old data is damaged.

You can convert the character set and collation of a table with a single query:
ALTER TABLE <yourtable> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
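If you have many tables, you can generate and run the statement per table. Here is a minimal sketch using Python and SQLAlchemy (as in the question); the connection URL, the credentials and the schema name "mydb" are placeholders for your environment:

# Sketch: run CONVERT TO CHARACTER SET over every base table in one schema.
# Assumes SQLAlchemy and a MySQL driver are installed; adjust URL and schema.
from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:password@localhost/mydb?charset=utf8")

with engine.connect() as conn:
    names = [row[0] for row in conn.execute(
        text("SELECT table_name FROM information_schema.tables "
             "WHERE table_schema = :schema AND table_type = 'BASE TABLE'"),
        {"schema": "mydb"})]
    for name in names:
        conn.execute(text(
            f"ALTER TABLE `{name}` "
            "CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci"))

Note that CONVERT TO CHARACTER SET rewrites the stored data, so if your bytes were already mangled through the latin1 connection, test this on a copy of the database first.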

When restoring a database on CockroachDB, why is there a difference in the number of index entries/rows restored?

When backing up a database in CockroachDB, the numbers of rows/index_entries/bytes written are included in the output:
       job_id        |  status   | fraction_completed |  rows   | index_entries |    bytes
---------------------+-----------+--------------------+---------+---------------+--------------
  903515943941406094 | succeeded |                  1 |  100000 |        200000 |      4194304
When restoring the same backup, the same metrics are reported:
       job_id        |  status   | fraction_completed |  rows   | index_entries |    bytes
---------------------+-----------+--------------------+---------+---------------+--------------
  803515943941406094 | succeeded |                  1 |   99999 |        199999 |      4194200
What causes the difference between the two and is all my data restored?
A couple of things impact the metrics you are observing. The backup phase may make a copy of specific system tables for metadata, and these are not directly restored from the backup image. The configuration from these system tables is applied to your restored database, but it doesn't count toward the reported rows / index_entries metrics.
So a relatively small delta between the two values is not out of the ordinary in full backup/restore scenarios.

Rendering this pandoc table

+---------------+---------------+--------------------+
| Right | Left | Centered |
+==============:+:==============+:==================:+
| Bananas | $1.34 | built-in wrapper |
+---------------+---------------+--------------------+
I used the standard command
pandoc test.txt -o test.pdf
But I get the above unrendered text. Am I doing something wrong here?
I am new to pandoc tables. Are grid tables fundamentally different from the other table types supported by pandoc markdown?
I managed to render this adapted version in stackedit.io
| Right | Left | Centered |
|-:|:-|:-:|
| Bananas | $1.34 | built-in wrapper |
And I rendered your version in pandoc/try, where it worked as expected. Try specifying the input and output formats explicitly, and also try a different output format to see if that works.
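For example (the file names are from your question; format names may vary slightly between pandoc versions):
pandoc -f markdown test.txt -o test.pdf
pandoc -f markdown -t html test.txt -o test.html
If the HTML output renders the table but the PDF does not, the problem is in the PDF toolchain rather than in the table syntax.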

What is the difference between Operational and Config in YANG?

What is the difference between Operational and Config in a YANG model? Is it correct to support the GET, PUT, POST and DELETE interfaces for both Operational and Config?
Config represents configuration data: usually what is writable via the northbound agents (CLI, NETCONF, Web, etc.), and also what is retrieved by a NETCONF get-config operation.
Operational data is status data, data that is not writable via the northbound agents; it comes from a data provider application.
A web client should only be able to do a GET operation on operational data, because it doesn't make sense to allow a client to change status information.
For config data it makes sense to have all the operations.
NETCONF separates configuration and state (or operational) data:
The information that can be retrieved from a running system is separated into two classes, configuration data and state data. Configuration data is the set of writable data that is required to transform a system from its initial default state into its current state. State data is the additional data on a system that is not configuration data such as read-only status information and collected statistics.
RESTCONF works like NETCONF, but over HTTP: it maps CRUD verbs onto NETCONF operations:
+----------+-------------------------------------------------------+
| RESTCONF | NETCONF                                               |
+----------+-------------------------------------------------------+
| OPTIONS  | none                                                  |
|          |                                                       |
| HEAD     | <get-config>, <get>                                   |
|          |                                                       |
| GET      | <get-config>, <get>                                   |
|          |                                                       |
| POST     | <edit-config> (nc:operation="create")                 |
|          |                                                       |
| POST     | invoke an RPC operation                               |
|          |                                                       |
| PUT      | <copy-config> (PUT on datastore)                      |
|          |                                                       |
| PUT      | <edit-config> (nc:operation="create/replace")         |
|          |                                                       |
| PATCH    | <edit-config> (nc:operation depends on PATCH content) |
|          |                                                       |
| DELETE   | <edit-config> (nc:operation="delete")                 |
+----------+-------------------------------------------------------+
On supporting GET, PUT, POST and DELETE: if you are referring to HTTP methods here, you should probably follow RESTCONF.
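To make the distinction concrete, here is a minimal illustrative YANG sketch (the module and leaf names are made up). Data nodes are configuration by default; a node under config false is state data:

module example-interface {
  namespace "urn:example:interface";
  prefix exif;

  container interface {
    // Configuration: writable via the northbound agents.
    leaf name {
      type string;
    }
    // Operational/state: read-only, supplied by a data provider.
    leaf oper-status {
      config false;
      type string;
    }
  }
}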

Flyway 4.0.3, Spring Boot 1.4 and baseline

I set up Spring Boot with Flyway:
flyway.baseline-version=2.7.0
flyway.baseline-on-migrate=true
I have 2 sql files in db/migrate: V__2.7.0_base.sql and V__2.8.0_jobs.sql.
When I start the app it fails, because it tries to execute V__2.7.0_base.sql again...
mvn flyway:info
+---------+-----------------------+---------------------+----------+
| Version | Description           | Installed on        | State    |
+---------+-----------------------+---------------------+----------+
| 2.7.0   | << Flyway Baseline >> | 2016-08-09 12:25:04 | Baseline |
|         | 2.7.0 base            |                     | Pending  |
|         | 2.8.0 jobs            |                     | Pending  |
+---------+-----------------------+---------------------+----------+
Why is 2.7.0 base pending? I set it as the baseline, so shouldn't it be ignored?
Thank you
OK, looking into the source code I found out that the file naming scheme must be like V2_7_0__base.sql.
No idea why there are so many examples out there with a naming scheme like V__2.7.0_base.sql...
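So, as a sketch, renaming the two migrations from the question accordingly gives:

db/migrate/V2_7_0__base.sql
db/migrate/V2_8_0__jobs.sql

The double underscore separates the version from the description; with the V__2.7.0_base.sql variant, Flyway finds no version before the double underscore, which is why the info output above shows an empty Version column for those files.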

Which encoding opens CSV files correctly with Excel on both Mac and Windows?

We have a web app that exports CSV files containing foreign characters, encoded as UTF-8 with no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, but Excel/Mac shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac.
Here are all the encodings I tried:
Encoding   BOM   Win                                          Mac
--------   ---   ------------------------------------------   -------------------
utf-8      --    scrambled                                    scrambled
utf-8      BOM   WORKS                                        scrambled
utf-16     --    file not recognized                          file not recognized
utf-16     BOM   file not recognized                          Chinese gibberish
utf-16LE   --    file not recognized                          file not recognized
utf-16LE   BOM   characters OK, row data all in first field   same as Win
The best one is UTF-16LE with BOM, but then the file is not recognized as CSV: all the row data ends up in the first field. The field separator is a comma; using a semicolon doesn't change things.
Is there any encoding that works in both worlds?
Excel Encodings
I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it's basically Microsoft's own proprietary character set, one can assume it works in both the Mac and the Windows version of MS Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.
Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.
This encoding is a superset of ISO-8859-1 (aka LATIN1, among other names), so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters from WINDOWS-1252, as shown here:
| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name | Unicode Range |
| € | 128 | 8364 | 0x80 | U+20AC | € | euro sign | Currency Symbols |
| ‚ | 130 | 8218 | 0x82 | U+201A | ‚ | single low-9 quotation mark | General Punctuation |
| ƒ | 131 | 402 | 0x83 | U+0192 | ƒ | Latin small letter f with hook | Latin Extended-B |
| „ | 132 | 8222 | 0x84 | U+201E | „ | double low-9 quotation mark | General Punctuation |
| … | 133 | 8230 | 0x85 | U+2026 | … | horizontal ellipsis | General Punctuation |
| † | 134 | 8224 | 0x86 | U+2020 | † | dagger | General Punctuation |
| ‡ | 135 | 8225 | 0x87 | U+2021 | ‡ | double dagger | General Punctuation |
| ˆ | 136 | 710 | 0x88 | U+02C6 | ˆ | modifier letter circumflex accent | Spacing Modifier Letters |
| ‰ | 137 | 8240 | 0x89 | U+2030 | ‰ | per mille sign | General Punctuation |
| Š | 138 | 352 | 0x8A | U+0160 | Š | Latin capital letter S with caron | Latin Extended-A |
| ‹ | 139 | 8249 | 0x8B | U+2039 | ‹ | single left-pointing angle quotation mark | General Punctuation |
| Œ | 140 | 338 | 0x8C | U+0152 | Œ | Latin capital ligature OE | Latin Extended-A |
| Ž | 142 | 381 | 0x8E | U+017D | Ž | Latin capital letter Z with caron | Latin Extended-A |
| ‘ | 145 | 8216 | 0x91 | U+2018 | ‘ | left single quotation mark | General Punctuation |
| ’ | 146 | 8217 | 0x92 | U+2019 | ’ | right single quotation mark | General Punctuation |
| “ | 147 | 8220 | 0x93 | U+201C | “ | left double quotation mark | General Punctuation |
| ” | 148 | 8221 | 0x94 | U+201D | ” | right double quotation mark | General Punctuation |
| • | 149 | 8226 | 0x95 | U+2022 | • | bullet | General Punctuation |
| – | 150 | 8211 | 0x96 | U+2013 | – | en dash | General Punctuation |
| — | 151 | 8212 | 0x97 | U+2014 | — | em dash | General Punctuation |
| ˜ | 152 | 732 | 0x98 | U+02DC | ˜ | small tilde | Spacing Modifier Letters |
| ™ | 153 | 8482 | 0x99 | U+2122 | ™ | trade mark sign | Letterlike Symbols |
| š | 154 | 353 | 0x9A | U+0161 | š | Latin small letter s with caron | Latin Extended-A |
| › | 155 | 8250 | 0x9B | U+203A | › | single right-pointing angle quotation mark | General Punctuation |
| œ | 156 | 339 | 0x9C | U+0153 | œ | Latin small ligature oe | Latin Extended-A |
| ž | 158 | 382 | 0x9E | U+017E | ž | Latin small letter z with caron | Latin Extended-A |
| Ÿ | 159 | 376 | 0x9F | U+0178 | Ÿ | Latin capital letter Y with diaeresis | Latin Extended-A |
Note that the euro sign, among others, is missing from ISO-8859-1.
This table can be found on Alan Wood's site.
Conversion
Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv:
iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
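The same conversion in Python might look like this (a sketch; errors="replace" substitutes characters that have no WINDOWS-1252 mapping instead of raising an error):

# Sketch: re-encode a UTF-8 CSV as WINDOWS-1252 (Python calls it "cp1252").
with open("query_result.csv", encoding="utf-8") as src:
    data = src.read()
with open("query_result-win.csv", "w", encoding="cp1252", errors="replace") as dst:
    dst.write(data)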
For UTF-16LE with BOM, if you use tab characters as your delimiters instead of commas, Excel will recognise the fields. The reason this works is that Excel actually ends up using its Unicode *.txt parser.
Caveat: If the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem now is that when you re-open the file Excel assumes it's real CSV (with commas), sees that it's not Unicode, so parses it as comma-delimited - and hence will make a hash of it!
Update: The above caveat doesn't appear to be happening for me today in Excel 2010 (Windows) at least, although there does appear to be a difference in saving behaviour if:
you edit and quit Excel (tries to save as 'Unicode *.txt')
compared to:
editing and closing just the file (works as expected).
The lowdown is: There is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks no matter what encoding or hoop jumping you do. I'd be glad to hear someone tell me different!
You have only tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV), you would have found the answer:
UTF-16LE with BOM (byte order mark), tab-separated
But: in a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question, though). That's a pity. It often means that you allow manual editing of TSV files, which is probably not a good idea. Visual checking of TSV files is not a problem, and editors can be set to display a special character to mark tabs.
And yes, I tried this out on Windows and Mac.
Here's the clincher on importing utf8-encoded CSV into Excel 2011 for Mac: Microsoft says: "Excel for Mac does not currently support UTF-8." Excel for Mac 2011 and UTF-8
Yay, way to go MS!
The best workaround for reading UTF-8 CSV files with Excel on Mac is to convert them into XLSX format. I have found a script made by Konrad Foerstner, which I have improved a little bit by adding support for different delimiter characters.
Download the script from GitHub: https://github.com/brablc/clit/blob/master/csv2xlsx.py. In order to run it you will need to install the Python module openpyxl for Excel file manipulation: sudo easy_install openpyxl.
In my case this worked (Mac, Excel 2011, both Cyrillic and Latin characters with Czech diacritics):
Charset UTF-16LE (simply UTF-16 was not enough)
BOM "\xFF\xFE"
\t (tab) as separator
Don't forget to encode the separator and the CRLFs as well :-)
Use iconv instead of mb_convert_encoding
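A minimal sketch of the above recipe in Python (the file name and row data are placeholders):

# Sketch: write a tab-separated file as UTF-16LE with an explicit BOM,
# since Python's "utf-16-le" codec does not emit one by itself.
rows = [["name", "price"], ["Bananas", "$1.34"]]
with open("report.csv", "wb") as f:
    f.write(b"\xff\xfe")  # UTF-16LE BOM
    for row in rows:
        f.write(("\t".join(row) + "\r\n").encode("utf-16-le"))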
In my case it turned out that Excel 2011 for Mac OS is not using Encoding.GetEncoding("10000") (the Mac Roman code page), as I had thought and wasted two days on, but the same ISO encoding as on the Microsoft OS.
The easiest proof of this is to create a file with special characters in Excel 2011 for Mac, save it as CSV, and then open it in a Mac text editor: the characters are scrambled.
For me this approach worked, meaning that the CSV exported for Excel 2011 on Mac OS keeps its special Western European characters intact:
Encoding isoMacOS = Encoding.GetEncoding("iso-8859-1");
Encoding defaultEncoding = Encoding.Default;
// Convert the string into a byte array.
byte[] defaultEncodingBytes = defaultEncoding.GetBytes(exportText);
// Perform the conversion from one encoding to the other.
byte[] ansiBytes = Encoding.Convert(defaultEncoding, isoMacOS, defaultEncodingBytes);
string decodedString = isoMacOS.GetString(ansiBytes);
UTF-8 with no BOM currently works for me in Excel Mac 2011 14.3.2.
UTF-8 + BOM kind of works, but the BOM is rendered as gibberish.
UTF-16 works if you Import the file and complete the wizard, but not if you just double-click it.
The following worked for me on Excel for Mac 2011 and Windows Excel 2002:
Using iconv on Mac, convert the file to UTF-16 Little-Endian + name it *.txt (the .txt extension forces Excel to run the Text Import Wizard):
iconv -f UTF-8 -t UTF-16LE filename.csv >filename_UTF-16LE.csv.txt
Open the file in Excel and in the Text Import Wizard choose:
Step 1: File origin: ignore it, it doesn't matter what you choose
Step 2: select proper values for Delimiters and Text qualifier
Step 3: if necessary, select column formats
PS The UTF-16LE created by iconv has BOM bytes FF FE in the beginning.
PPS My original CSV file was created on a Windows 7 computer, in UTF-8 format (with the BOM bytes EF BB BF in the beginning), and used CRLF line breaks. Comma was used as the field delimiter and single quote as the text qualifier. It contained ASCII letters plus various Latin letters with tildes, umlauts, etc., plus some Cyrillic. All displayed properly in both Excel for Win and Mac.
PPPS Exact software versions:
* Mac OS X 10.6.8
* Excel for Mac 2011 v.14.1.3
* Windows Server 2003 SP2
* Windows Excel 2002 v.10.2701.2625
On my Mac OS, TextWrangler identified a CSV file created with Excel as having "Western" encoding.
After some googling I have made this small script (I am not sure about Windows availability, maybe with Cygwin?):
$ cat /usr/local/bin/utf8.sh
#!/bin/bash
# Convert a MacRoman CSV (as saved by Excel on Mac) to UTF-8 with Unix line endings.
INPUTFILE="$1"
iconv -f macroman -c -t UTF-8 "$INPUTFILE" | tr '\r' '\n' > "/tmp/file.$$.csv"
mv "$INPUTFILE" ms_trash            # keep the original around as a backup
mv "/tmp/file.$$.csv" "$INPUTFILE"
In my case, adding the preamble (the UTF-8 BOM) to the file solved my problem:
var data = Encoding.UTF8.GetBytes(csv);
// Prepend the UTF-8 BOM (the "preamble") so Excel detects the encoding.
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(new MemoryStream(result), "application/octet-stream", "data.csv");
Instead of CSV, try outputting HTML with an XLS extension and the "application/excel" MIME type. I know this works in Windows, but I can't speak for Mac OS.
This works for me
Open the file in BBEdit or TextWrangler*.
Set the file as Unicode (UTF-16 Little-Endian) (Line Endings can be Unix or Windows). Save!
In Excel: Data > Get External Data > Import Text File...
Now the key point: choose Macintosh as File Origin (it should be the first choice).
This is using Excel 2011 (version 14.4.2)
*There's a little dropdown at the bottom of the window
Solved this using Java (UTF-16LE with BOM):
String csvReportStr = getCsvReport();
// Encode as UTF-16LE, then prepend the BOM bytes 0xFF 0xFE by hand,
// since the "UTF-16LE" charset does not emit a BOM on its own.
byte[] body = csvReportStr.getBytes(Charset.forName("UTF-16LE"));
byte[] data = new byte[body.length + 2];
data[0] = (byte) 0xFF;
data[1] = (byte) 0xFE;
System.arraycopy(body, 0, data, 2, body.length);
Note that the CSV file should use TAB as the separator. You can then read the file on both Windows and Mac OS X.
Refer to: How do I encode/decode UTF-16LE byte arrays with a BOM?
