I have the following source file (encoded in UTF-8 without BOM, displayed fine in the Source Code Editor):
#include <Windows.h>
int main()
{
    MessageBoxW(0, L"Umlaute ÄÖÜ, 🙂", nullptr, 0);
    return 0;
}
When running the program, the special characters (Umlaute and Emoji) are messed up in the Message Box.
However, if I save the source file manually as "UTF-8 with BOM", Visual Studio will properly convert the string to UTF-16 and when running the program, the special characters are displayed in the Message Box. But it would be annoying to convert every single file to UTF-8 with BOM. (Also, I think GCC for example does not like BOM?)
Why is Visual Studio messing up my string, if there is no BOM in the source file? The Auto-detect UTF-8 encoding without signature option is already enabled.
I tested the same source with MinGW-w64 and don't have the issue, regardless of whether there is a BOM or not.
Use the /utf-8 compiler switch. The MS compiler assumes a legacy ANSI encoding (Windows-1252 on US and Western European versions of Windows) if no BOM is found in the source file.
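For example, a minimal command-line build with that switch (assuming the source file is named main.cpp; in the IDE the same switch can be added under the project's C/C++ command-line options):
cl /utf-8 main.cpp user32.lib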
My code was working well with special chars. I could use Write-Host "é" without any issue.
And then I moved some of my functions to another PS1 file that I "dot sourced" (using Import-Module does the same), and I got encoding errors: prénom became prÃ©nom.
I don't understand anything about encoding. VS Code doesn't allow me to change the encoding of a file. It has a setting for the default encoding, but it defaults to UTF-8, and when I set it to Windows-1252 nothing changes. If I use Geany to change the encoding to Windows-1252 it works... until I save the file again with VS Code.
Everything was working well when all my code was in the same file. Why would creating this second .ps1 file (which I created from Windows Explorer) be a problem?
Working on Windows 10, in French, with VS Code 1.50.
Thank you in advance
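For reference, the prénom -> prÃ©nom garbling described above is exactly what you get when UTF-8 bytes are read back as Windows-1252. A minimal Python sketch that reproduces it (the word is just an example):

# "é" in UTF-8 is the two bytes 0xC3 0xA9; decoding those bytes as
# Windows-1252 turns them into "Ã©".
text = "prénom"
garbled = text.encode("utf-8").decode("windows-1252")
print(garbled)  # prÃ©nom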
One day I zipped a file with a Chinese name, 周國賢 - 密封罩.flac, using Bandizip with the encoding set to UTF-8.
Then I tried to unzip it on my MacBook Pro, which is (probably) using Macintosh as its encoding. The unzipped file is called ©P∞ÍΩ - ±K´ ∏n.flac, which does not match the Chinese name above.
So I tested the encodings and found that converting Macintosh -> Big5 turns the mysterious Macintosh symbols back into Chinese, but with some characters that don't match: 周衰�璀� - 密封罩.flac.
I tried another file, §˝µ· - ¨ı®ß.ape, and it actually produced the correct name: 王菲 - 紅豆.ape.
So here is my question: how do I unzip a file with Big5 Chinese characters in its name properly and without any information loss? Or how do I zip a file correctly to prevent information loss / incorrect characters? (Edit #2: you can use Bandizip to zip the file with UTF-8 encoding.)
By the way, the encoding converter I am using is https://r12a.github.io/apps/encodings/, which can be quite helpful for checking encodings. Don't forget to click "change encodings shown". I am not the owner of the encoding converter.
Edit #1: I have found that the setting in Bandizip was wrong... sorry for the inconvenience. Nonetheless, I figured out that The Unarchiver from the Mac App Store can unzip Big5 correctly. That can be a workaround, but I still don't know how to unzip Big5 filenames properly WITHOUT any loss.
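If running a small script is an option, Python 3.11+ lets you tell the zipfile module which encoding the stored filenames use, which avoids the guessing on the Mac side entirely (the archive name and output folder below are just placeholders):

import zipfile

# Member names in this archive were stored as Big5 bytes without the UTF-8 flag,
# so tell zipfile to decode them as Big5 instead of the default cp437.
with zipfile.ZipFile("songs.zip", metadata_encoding="big5") as zf:
    zf.extractall("extracted")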
I just bought ABBYY FineReader 11 Corporate to run it from another program, but I can't find any commands to use with FineReader.exe.
Without any commands it simply opens and scans, but I need to tell it where to save the document, how to name it, and then to close the app again. It would also be nice to run it as a background task.
While doing my OCR research project, I found one. It works with FR 12; I haven't tested it with earlier versions.
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
general command line: <open_keys/scanning> [<recognition_keys>] [<export_keys>]
<open_keys/scanning> ::= ImageFiles | /scan [SourceName] | /file [filename1 filename2], where
ImageFiles - list of files for recognition
SourceName - images source (scanner); if not specified, current is used
filename.. - list of files for recognition
<recognition_keys> ::= [/lang Language] [/optionsFile OptionsFileName], where
Language - name of language in English (russian, greek, Mixed)
OptionsFileName - path to options file
<export_keys> ::= /out ExportFile | /send Target, where
ExportFile - name of file with extension to save file to
(txt, rtf, doc, docx, xml, htm(l), xls, xlsx, ppt, pptx, pdf, dbf, csv, lit);
Target - name of target app where to open
(MSWord, MSExcel, WordPro, WordPerfect, StarWriter, Mail, Clipboard, WebBrowser, Acrobat, PowerPoint)
This command opens the FR UI, processes the file, and then closes it (if you pass the /quit argument). FineCmd.exe is located in the FR directory where you installed it.
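If you want to drive it from another program rather than a batch file, here is a small sketch of calling it via Python's subprocess (the FineReader install path and file locations are assumptions; adjust them to your setup):

import subprocess

# Assumed install path; change it to match your FineReader version/location.
finecmd = r"C:\Program Files (x86)\ABBYY FineReader 12\FineCmd.exe"

# Recognize a TIFF with mixed languages, save plain text, and close the UI.
subprocess.run(
    [finecmd, r"C:\scans\PRESS2.TIFF",
     "/lang", "Mixed",
     "/out", r"C:\temp\result.txt",
     "/quit"],
    check=True,
)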
Hello, I saw this message very late, but I have been using the ABBYY command line for 10 years.
I prefer ABBYY 8 because it does the same good job faster and does not open any GUI. It comes with FineOCR.exe:
"C:\...\ABBYY FineReader 8\FineOCR.exe" %1 /lang greek english /send MsWord
It does the OCR and opens MS Word. FineOCR.txt is a simple help file.
Regarding ABBYY 11 and 12 (all versions), there is a FineCmd.exe. Using something like:
"c:\...\FineReader\FineCMD.exe" %1 /lang greek english /send MsWord
does what FineOCR did before (but no .txt help file)
Unfortunately, such a professional OCR application doesn't support command-line utilities. For batch processing, it offers a Hot Folder utility inside the GUI: http://informationworker.ru/finereader10.en/hotfolder_and_scheduling/installandrun.htm
If you want to do OCR batch processing from your own program, they sell another product, called 'ABBYY Recognition Server'.
They also offer a comprehensive API for programmers: http://www.abbyy.com/ocr_sdk_windows/technical_specifications/developer_environment/
If your plan is to batch process the files and write their contents to a database, you can also use a programmatic trick to work around this limitation, as I did recently in one of my projects (it is a bit of an offline approach, but it is simple and works): while parsing the files and inserting them into your database table from your program, move (or copy) them all into a folder, changing each filename to include the row's ID from your database table. Then use the Hot Folder utility to OCR all the files, producing output with the same filename and a TXT extension (this is set in the Hot Folder settings). Finally, have your program parse the folder's text files, read their content as strings, parse the table IDs from the filenames, and update your table with that information.
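A minimal sketch of that last step in Python, assuming SQLite and made-up database, table, column, and folder names, with the row ID assumed to be the part of the filename after the last underscore (e.g. invoice_42.txt):

import sqlite3
from pathlib import Path

conn = sqlite3.connect("documents.db")                 # hypothetical database
for txt in Path(r"C:\hotfolder\out").glob("*.txt"):    # Hot Folder output folder
    row_id = int(txt.stem.rsplit("_", 1)[-1])          # ID encoded in the filename
    content = txt.read_text(encoding="utf-8", errors="replace")
    conn.execute("UPDATE documents SET ocr_text = ? WHERE id = ?",
                 (content, row_id))
conn.commit()
conn.close()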
A year later: ABBYY does support command-line usage: http://www.ocr4linux.com/en:documentation
Version 14 does not save the output file using:
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
or
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt
Versions 11 & 12 work well with the above commands (they do save the output), but they display the GUI, which can be closed using /quit.
Versions 9 & 10 don't come with FineCmd.exe or FineOCR.exe.
Version 8 can OCR and send the output to an application of choice but cannot save using /out. In my experience it does open the GUI.
Which encodings does Windows use when reading filenames from a zip archive through compressed (zipped) folders?
As far as I know,
Cyrillic is represented as cp866 and
Central European as cp437.
What about the others?
Portuguese
Español (Spanish)
Français (French)
Polski (Polish)
Türkçe (Turkish)
Deutsch (German)
Italiano (Italian)
العربية (Arabic)
Farsi
ไทย (Thai)
中文 (Chinese)
日本語 (Japanese)
한국어 (Korean)
Tiếng Việt (Vietnamese)
I think the first seven of these are in cp437.
Chinese may be in Big5.
But I know nothing about the others.
According to the Apache Commons Compress zip package documentation, it's the platform's default encoding:
Windows' "compressed folder" feature doesn't recognize any flag or
extra field and creates archives using the platforms default encoding
- and expects archives to be in that encoding when reading them.
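So a zip created by the Windows shell on, say, a Russian system stores the names as cp866 bytes and does not set the UTF-8 name flag. A small Python sketch of how the names could be recovered (the archive name and code page are assumptions):

import zipfile

with zipfile.ZipFile("from_windows.zip") as zf:
    for info in zf.infolist():
        if info.flag_bits & 0x800:
            # Bit 11 set: the name is already stored as UTF-8.
            name = info.filename
        else:
            # zipfile decodes non-UTF-8 names as cp437; undo that and decode
            # with the code page the archive was actually created with.
            name = info.filename.encode("cp437").decode("cp866")
        print(name)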