Qt4 - QDir::entryList() doesn't return files/dirs with invalid encoding


My Qt4-based application (http://qcomicbook.linux-projects.net) has a problem opening files located in directories whose names have an invalid encoding (most likely KOI8 or some other legacy, non-UTF-8 encoding). The problem occurs in the following piece of code:
QDir dir(path);
dir.setSorting(flags);
dir.setFilter(QDir::AllDirs|QDir::Files);
const QStringList files = dir.entryList();
foreach (QString f, files) {
...
}
If path includes dirs/files with invalid encoding, then dir.entryList() just filters them out. The problem also shows up in the QFileDialog::getExistingDirectory dialog, which displays an "invalid encoding" warning next to the affected file and directory names.
Is there any workaround for this, ideally one that is transparent to the end user?

Related

Disable encoding checking in java gradle project

I want to migrate one of our Java projects from Ant to Gradle. This project has a lot of source code written by several programmers. The problem is that some of the files are encoded in ANSI and some in UTF-8 (this generates compile errors). I know that I can set the encoding using compileJava.options.encoding = 'UTF-8', but this will not work (not all files are encoded in UTF-8). Is it possible to disable encoding checking (I don't want to change the encoding of all files)?

This is not an issue with Gradle but with javac. However, you can solve it by running a one-time Groovy script in your Gradle build, as described below.
Normally you'd only need to add the following line to your build.gradle file:
compileJava.options.encoding = 'UTF-8'
However, some text editors write a byte order mark (BOM) at the beginning of a text file when saving it as UTF-8.
javac does not understand the BOM, not even when you compile with the encoding="UTF-8" option, so you're probably getting an error such as this:
> javac -encoding UTF8 Test.java
Test.java:1: error: illegal character: \65279
?class Test {
You need to strip the BOM from your source files or convert them to another encoding. Notepad++, for example, can convert a file from one encoding to another.
For lots of source files, you can write a simple Groovy/Gradle task that opens each source file and rewrites it, removing the BOM prefix from the first line if one is found.
Add this to your build.gradle and run gradle convertSource:
task convertSource {
    // doLast replaces the << shorthand, which was removed in Gradle 5
    doLast {
        // convert source files in the main source set to normalized text format
        sourceSets.main.java.each { file ->
            // read first "raw" line via BufferedReader (a BOM, if any, is kept)
            def r = new BufferedReader(new FileReader(file))
            String s = r.readLine()
            r.close()
            // get entire file normalized (this relies on file.text dropping a leading BOM)
            String text = file.text
            // get first "normalized" line
            String normalizedLine = new StringReader(text).readLine()
            if (s != normalizedLine) {
                println "rename: $file"
                File target = new File(file.getParentFile(), file.getName() + '.bak')
                if (target.exists()) {
                    println "backup already exists, skipping: $target"
                } else if (file.renameTo(target)) {
                    // the original is now the .bak file; write the normalized text in its place
                    file.setText(text)
                } else {
                    println "failed to rename: $file"
                }
            }
        }
    }
} // end task
The convertSource task enumerates all of the source files, reads the first "raw" line from each one, then reads the normalized text and compares the first lines. If the first lines differ, it writes a new file with the normalized text and keeps a backup of the original source. You only need to run the convertSource task once, after which you can remove the backups of the original source files, and the compile should work without encoding errors.
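For comparison, the same BOM-stripping idea can be sketched outside the build, working on raw bytes so that no text decoding is involved. This is a minimal sketch, not the answer's method: the helper names strip_utf8_bom and strip_boms_under and the "*.java" pattern are assumptions made for illustration, and it only removes a UTF-8 BOM (it does nothing for files saved in ANSI).

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three BOM bytes (EF BB BF).
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_boms_under(root: Path, pattern: str = "*.java") -> list:
    """Strip BOMs from every matching file under root; return the changed paths."""
    return [p for p in sorted(root.rglob(pattern)) if strip_utf8_bom(p)]
```

Unlike the Gradle task, this sketch keeps no .bak copies, so it is best run on a tree that is under version control.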

Windows Internet Shortcuts and unicode characters in URL

I have a process that creates Windows internet shortcut files (.url). The files are encoded in UTF-8. The files contain an [InternetShortcut] section, where a URL= is specified. In this case, these are file:/// protocol URLs, which allow people to open paths on their LAN. The URLs are all UNC paths.
Normally the process works fine. But when a UNC path contains Unicode characters, such as the "í" in the sample below, Windows is unable to "find" the URL when an end user tries to open the internet shortcut from Windows Explorer.
A sample file follows:
[InternetShortcut]
URL=file:///\\lt-splourde\d$\POC\Montería Test\
IconIndex=1
When I open the sample .url file above with a text editor, I see the path with the proper Unicode characters. But when I try to open the file from Windows Explorer, in order to access the path, Windows reports that it is unable to access the path, and it seems to mangle the Unicode characters.
The source code that creates these shortcuts follows:
private void CreateShortcutAsUrl(string uncRootPath, string name, string path, int projectId)
{
    path = path + (path.EndsWith(@"\") ? "" : @"\");
    using (StreamWriter writer = new StreamWriter(
        String.Format(@"{0}\{1}\{2}.url", uncRootPath,
            ShortcutsDirectory, new FileServerController().SanitizeNameForDirectory(name)),
        false, Encoding.UTF8))
    {
        writer.WriteLine(@"[InternetShortcut]");
        writer.WriteLine(@"URL=file:///" + path);
        writer.Flush();
    }
}
Does anyone know of a solution for this issue?
Thanks!
(I had posted this on superuser originally, but I feel like the content is more programmer oriented)

Try the .NET equivalent of InternetCanonicalizeUrl, which is System.Uri.EscapeUriString, so something like this (assuming your URI is in szOriginalString):
String szEscapedString = System.Uri.EscapeUriString(szOriginalString);
Then write szEscapedString as the URI instead of the original.
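To make the effect of escaping concrete, here is a small sketch of percent-encoding the non-ASCII characters and spaces in the sample path as UTF-8 bytes. It uses Python's urllib.parse.quote rather than the .NET call, so it illustrates the idea and is not a reproduction of EscapeUriString's exact rules. The helper name escape_path_components is hypothetical, and leaving the backslashes and the "$" untouched is an assumption about what the shortcut should contain.

```python
from urllib.parse import quote


def escape_path_components(unc_path: str) -> str:
    """Percent-encode each backslash-separated component as UTF-8.

    Backslashes are kept as separators and '$' is left literal
    (an assumption made for this sketch).
    """
    parts = unc_path.split("\\")
    return "\\".join(quote(part, safe="$") for part in parts)


sample = "\\\\lt-splourde\\d$\\POC\\Montería Test\\"
# 'í' (U+00ED) becomes %C3%AD and the space becomes %20.
print("URL=file:///" + escape_path_components(sample))
```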

Command Prompt error since i am using a generic path to open an excel file

The command prompt is not working since I am using a generic path to open an Excel file. Here is the error message:
T:\PointOfSale\Projects\Automated Testing\TASWeb\TP\TP_Branch>ruby -rubygems TestTP_UK.rb
TestTP_UK.rb:19:in 'method_missing': (in OLE method `Open': )(WIN32OLERuntimeError)
OLE error code:800A03EC in Microsoft Excel
'./../../../MasterFile.xls' could not be found. Check the spelling of the file name, and verify that the file location is correct.
If you are trying to open the file from your list of most recently used files, make sure that the file has not been renamed, moved, or deleted.
HRESULT error code:0x80020009
Exception occurred.
from TestTP_UK.rb:19:in `'
Generic path code:
require 'win32ole'
excel = WIN32OLE::new("excel.Application")
path = "#{File.dirname(__FILE__)}/../../../MasterFile.xls"
workbook = excel.Workbooks.Open(path)
worksheet = workbook.WorkSheets(1) # Get first worksheet
site = worksheet.Range('A2').Value # Get the value at cell A2 in the worksheet.
workbook.Close
excel.Quit
Any ideas?

I believe you need to use an absolute path rather than a relative path when opening the file:
path = File.expand_path("../../../../MasterFile.xls", __FILE__)
Note that you need one additional '..' when using expand_path with __FILE__ as the base, since the first '..' only steps back from the file itself to its directory.
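The same relative-to-absolute resolution can be sketched in Python to show what the absolute path ends up being. This is an illustration under assumptions: the helper name resolve_from_script and the example paths are hypothetical. Because the sketch takes the script's directory first, it needs only three '..' segments, whereas Ruby's expand_path with __FILE__ as the base needs four.

```python
import os.path


def resolve_from_script(script_path: str, relative: str) -> str:
    """Resolve `relative` against the directory containing `script_path`."""
    base = os.path.dirname(os.path.abspath(script_path))
    # normpath collapses the '..' segments into a clean absolute path.
    return os.path.normpath(os.path.join(base, relative))


# Hypothetical layout: the script lives four directories below the root.
script = os.path.join(os.sep, "a", "b", "c", "d", "TestTP_UK.rb")
print(resolve_from_script(script, "../../../MasterFile.xls"))
```

An absolute path matters here because the application that opens the workbook has its own working directory, so a path relative to the script is not resolved the way the script expects.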

Unknown reason of an error while debugging an Opencv project using opencv2 functions

I have openCV 2.3 and I am using Visual Studio 2010.
...
VideoCapture cap;
cap.open("Video.avi");
if( !cap.isOpened() )
{
puts("***Could not initialize capturing...***\n");
system("Pause");
return 0;
} ...
This is a code snippet of the whole program.
I added a system command in order to hold the output window. I got no errors while building the project but when I began debugging, the output window had this output :
warning: Error opening file (../../modules/highgui/src/cap_ffmpeg_impl.hpp:477)
***Could not initialize capturing...***
Press any key to continue . . .
I checked the directory and the file is there, so why doesn't it open?
I even have opencv_ffmpeg.dll in the bin folder, with its path added to the system PATH.
I still get the same error.
I also checked the first three pages of Google results but could not find an answer.
So please help!
The error I mentioned occurs because opening the .avi file failed.
This is the relevant part of the code in cap_ffmpeg_impl.hpp:
int err = av_open_input_file(&ic, _filename, NULL, 0, NULL);
if (err < 0) {
CV_WARN("Error opening file"); //Error part
goto exit_func;
}
This file is available at D:\OpenCV2.3\opencv\modules\highgui\src. When I make any changes to this file, they are not reflected in the output window, and even when I removed the file I did not get any error!
I am not able to understand what is happening.

The error message is perhaps a bit confusing. You should read it like this:
<severity>: <message> (<where the message originated from>)
The message does not tell you that your program is unable to open the header file; it just informs you that the message was created in that header file.
As you observed correctly, the problem is that your program cannot find the .avi file. Try supplying the absolute path to that video file:
cap.open("C:/absolute/path/to/Video.avi");

Get encoding of a file in Windows

This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I could write a little C# app, but I wanted to know if there is something already built in.

Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".
Whatever the default-selected encoding is, that is what your current encoding is for the file.
If it is UTF-8, you can change it to ANSI and click save to change the encoding (or vice versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.
FYI: From my understanding I think "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicdoe

If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.

The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files; directory
Debug; directory
duration.h; ASCII C++ program text, with CRLF line terminators
ipch; directory
main.cpp; ASCII C program text, with CRLF line terminators
Precision.txt; ASCII text, with CRLF line terminators
Release; directory
Speed.txt; ASCII text, with CRLF line terminators
SquareRoot.sdf; data
SquareRoot.sln; UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj; XML document text
SquareRoot.vcxproj; XML document text
SquareRoot.vcxproj.filters; XML document text
SquareRoot.vcxproj.user; XML document text
squarerootmethods.h; ASCII C program text, with CRLF line terminators
UpgradeLog.XML; XML document text
C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files; binary
Debug; binary
duration.h; us-ascii
ipch; binary
main.cpp; us-ascii
Precision.txt; us-ascii
Release; binary
Speed.txt; us-ascii
SquareRoot.sdf; binary
SquareRoot.sln; utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj; us-ascii
SquareRoot.vcxproj; utf-8
SquareRoot.vcxproj.filters; utf-8
SquareRoot.vcxproj.user; utf-8
squarerootmethods.h; us-ascii
UpgradeLog.XML; us-ascii

Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
EXE can be found here

Install git (on Windows you have to use the Git Bash console). Type:
file --mime-encoding *
for all files in the current directory, or
file --mime-encoding */*
for the files one level down, in each immediate subdirectory

Here's my take on detecting the Unicode family of text encodings via the BOM. The accuracy of this method is low: it only works on text files (specifically Unicode files) and defaults to ascii when no BOM is present (most text editors do the same, although the default should be UTF-8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from Git or *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # -Encoding byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)
    if(!$bytes) { return 'utf8' }
    # Test the 4-byte UTF-32 signatures before the 2-byte UTF-16 ones,
    # because the UTF-32 LE BOM (ff fe 00 00) begins with the UTF-16 LE BOM (ff fe)
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe0000' { return 'utf32' }
        '^0000feff' { return 'bigendianutf32' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        default     { return 'ascii' }
    }
}
dir ~\Documents\WindowsPowershell -File |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem call only checks known text files, and when you're only looking for "bad encodings" from a known list of tools (e.g. SQL Management Studio defaults to UTF-16, which broke Git's auto-CRLF handling on Windows, which was the default for many years).
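The BOM lookup itself is small enough to sketch in a few lines of Python. One pitfall with prefix matching is that the UTF-32 little-endian BOM (FF FE 00 00) starts with the UTF-16 little-endian BOM (FF FE), so the longer signatures have to be tested first. The function name sniff_bom and the returned codec names are choices made for this sketch; it also inherits the limitation described above, in that a file without a BOM tells it nothing.

```python
import codecs

# Longest signatures first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default=None):
    """Return a codec name based on the leading BOM, or `default` if none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


def sniff_file(path: str, default=None):
    """Read only the first four bytes of the file and sniff them."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4), default)
```

A remaining ambiguity is UTF-16 LE text whose first character after the BOM is U+0000: its first four bytes are identical to the UTF-32 LE BOM, so the sketch reports UTF-32 in that rare case.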

A simple solution might be opening the file in Firefox.
Drag and drop the file into Firefox.
Press Ctrl+I to open the page info.
The text encoding will appear in the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.

I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
Then use it like this: file.exe --mime-encoding *. You must include .exe in the command for the PowerShell alias to work.
But if you don't customize your PowerShell profile.ps1 I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0
and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from powershell; and many other OS CLI commands that are "hidden by default" by powershell, *shrug*.

You can simply check this by opening Git Bash in the file's location and running the command file -i file_name
Example:
user filesData
$ file -i data.csv
data.csv: text/csv; charset=utf-8

There is some C code here for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM,
UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document.
For all other encodings, you have to trust heuristics based on statistics.
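The "reliable" part of that statement can be sketched briefly: ASCII and UTF-8 can be confirmed by attempting a strict decode, and anything that fails both needs heuristics. This is a minimal sketch, not the linked C code. The function name guess_encoding and the "unknown" fallback label are assumptions, and BOM handling is left out to keep the example short.

```python
def guess_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'unknown' using strict decoding."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict UTF-8 decoding rejects invalid sequences, which is what
        # makes this check reliable for typical non-UTF-8 text.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Legacy 8-bit encodings (ANSI code pages, KOI8-R, ...) end up here
        # and can only be told apart statistically.
        return "unknown"
```

For example, "Montería" encoded as UTF-8 is reported as utf-8, while the same word encoded as Latin-1 is reported as unknown, because the lone 0xED byte is not a valid UTF-8 sequence.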
EDIT:
A powershell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (boms).
# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
begin {
    # set the .NET current directory so relative paths resolve as they do in PowerShell
    [Environment]::CurrentDirectory = (pwd).path
}
process {
    $reader = [System.IO.StreamReader]::new($filename,
        [System.Text.Encoding]::default,$true)
    # Peek makes the reader inspect the BOM and update CurrentEncoding
    $peek = $reader.Peek()
    $encoding = $reader.currentencoding
    $reader.close()
    [pscustomobject]@{Name=split-path $filename -leaf
        BodyName=$encoding.BodyName
        EncodingName=$encoding.EncodingName}
}
.\get-encoding chinese8.txt
Name BodyName EncodingName
---- -------- ------------
chinese8.txt utf-8 Unicode (UTF-8)
get-childitem -file | .\get-encoding

Looking for a Node.js/npm solution? Try encoding-checker:
npm install -g encoding-checker
Usage
Usage: encoding-checker [-p pattern] [-i encoding] [-v]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--pattern, -p, -d [default: "*"]
--ignore-encoding, -i [default: ""]
--verbose, -v [default: false]
Examples
Get encoding of all files in current directory:
encoding-checker
Return encoding of all md files in current directory:
encoding-checker -p "*.md"
Get encoding of all files in current directory and its subfolders (will take quite some time for huge folders; seemingly unresponsive):
encoding-checker -p "**"
For more examples, refer to the npm documentation or the official repository.

Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).

The only way that I have found to do this is VIM or Notepad++.

EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.
