Windows cmd: piping the output of a Python 3.5 .py file works, but piping a PyInstaller exe's output leads to UnicodeEncodeError (UTF-8)

I am somewhat out of options here...
# -*- coding: utf-8 -*-
print(chr(246) + " " + chr(9786) + " " + chr(9787))
print("End.")
When I run the code above in my Windows 7 cmd window, the result depends on how I invoke it:
python.exe utf8.py
-> ö ☺ ☻
python.exe utf8.py >test.txt
-> ö ☺ ☻ (in file)
utf8.exe
-> ö ☺ ☻
utf8.exe >test.txt
RuntimeWarning: sys.stdin.encoding == 'utf-8', whereas sys.stdout.encoding == 'cp1252', readline hook consumer may assume they are the same
Traceback (most recent call last):
File "Development\utf8.py", line 15, in <module>
print(chr(246) + " " + chr(9786) + " " + chr(9787))
File "C:\python35\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position
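The traceback can be reproduced without PyInstaller. The following is a minimal Python 3 sketch (the helper name try_encode is made up for illustration). When stdout is a pipe, printed text is encoded with sys.stdout.encoding. cp1252 has a byte for ö but none for ☺ or ☻, so encoding the line fails, while UTF-8 can represent all three characters.

```python
def try_encode(text, encoding):
    """Return text encoded with `encoding`, or None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


line = chr(246) + " " + chr(9786) + " " + chr(9787)  # the line printed by utf8.py

print(try_encode(chr(246), "cp1252"))  # b'\xf6': ö exists in cp1252
print(try_encode(line, "cp1252"))      # None: '\u263a' has no cp1252 byte
print(try_encode(line, "utf-8"))       # b'\xc3\xb6 \xe2\x98\xba \xe2\x98\xbb'
```

Why the frozen exe ends up with cp1252 even though PYTHONIOENCODING is set is a separate question that this sketch does not answer.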
Experimenting with win_unicode_console doesn't help either; I end up with the same results.
PYTHONIOENCODING=utf-8 is set. However, it seems that when using PyInstaller, this variable is ignored for sys.stdout.encoding:
import locale
import os
import sys

print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
Output:
python.exe utf8.py > test.txt
utf-8
False
cp1252
mbcs
utf-8
utf8.exe >test.txt
cp1252
False
cp1252
mbcs
utf-8
My questions are: why does this happen, and how can I fix it?
Wrapping stdout with codecs.getwriter([something])(sys.stdout) seems to be discouraged, because it can lead to broken output from other modules. Alternatively, is it possible to force stdout to UTF-8 only after checking that it is not a tty? Better still: how can this be fixed in PyInstaller itself?
Thanks in advance...
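The tty-check idea from the question can be sketched as follows. This assumes Python 3 and uses io.TextIOWrapper instead of codecs.getwriter. That is a swapped-in technique, not something PyInstaller provides, and the function name is hypothetical. The text layer is replaced only when the stream is not a terminal and not already UTF-8. Because the new wrapper sits on the same binary buffer, modules that write to sys.stdout afterwards get an ordinary text stream rather than a codecs wrapper.

```python
import io


def utf8_stdout_if_piped(stream):
    """Return `stream` unchanged if it is a terminal or already UTF-8;
    otherwise return a new UTF-8 text wrapper around the same binary buffer."""
    if stream.isatty():
        return stream
    current = (stream.encoding or "").lower().replace("_", "-")
    if current in ("utf-8", "utf8"):
        return stream
    line_buffering = stream.line_buffering  # read before detaching
    stream.flush()  # push anything already written through the old encoder
    # detach() hands back the underlying binary buffer without closing it,
    # so the new wrapper can take over; the old wrapper is unusable afterwards.
    return io.TextIOWrapper(stream.detach(), encoding="utf-8",
                            line_buffering=line_buffering)


# Typical use, as early as possible (before other modules keep a reference
# to the old sys.stdout):
#     import sys
#     sys.stdout = utf8_stdout_if_piped(sys.stdout)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") does the same job in place, but that method does not exist on Python 3.5.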

Thanks to eryksun, the following workaround works:
import os
import sys

STDOUT_ENCODING = str(sys.stdout.encoding)
# None if the variable is not set at all.
PYTHONIOENCODING = os.environ.get("PYTHONIOENCODING")

# Remark: Replacing stdout only affects output written from then on; whatever
# has already been written into the pipe up to that moment stays as it was.
if not sys.stdout.isatty():
    print("Program is running in piping mode. (sys.stdout.isatty() is False.)")
    if PYTHONIOENCODING is not None:
        print("PYTHONIOENCODING is set to a value. ('" + PYTHONIOENCODING + "')")
        if STDOUT_ENCODING.lower() != PYTHONIOENCODING.lower():
            print("PYTHONIOENCODING differs from stdout encoding. ('" + PYTHONIOENCODING + "' != '" + STDOUT_ENCODING + "'). This should normally not happen unless the PyInstaller setup is still broken. Setting hard utf-8 workaround.")
            # Flush the old stream first so its buffered output is not lost or reordered.
            sys.stdout.flush()
            # Reopen the same file descriptor as a UTF-8 text stream;
            # closefd=False leaves the descriptor open when this object is closed.
            sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8', closefd=False)
            print("Hard utf-8 workaround active. New stdout encoding: '" + str(sys.stdout.encoding) + "'.")
        else:
            print("PYTHONIOENCODING is equal to stdout encoding. ('" + PYTHONIOENCODING + "' == '" + STDOUT_ENCODING + "'). - All good.")
    else:
        print("PYTHONIOENCODING is not set. - Nothing to do.")
else:
    print("Program is running in terminal mode. (sys.stdout.isatty() is True.) - All good.")
Next, I will try setting up a new PyInstaller environment to see whether that fixes the problem from the start.

Related

NCBIWWW.qblast parsing xml files

I am using Biopython to (attempt to) write a script that takes my downloaded Sanger sequencing results from Genewiz (multiple sequences downloaded into a single FASTA file), creates a new file with each sequence trimmed to my desired length, runs the trimmed sequences through BLAST, and lists the species of the top hit. As I am pretty new to bioinformatics and programming, I am working through each of these parts step by step, using the Biopython cookbook as a framework. I have managed to get my trimmed sequences into a new file and to run BLAST (is it always that slow?), but I am now stuck on parsing. Any help would be appreciated! I will edit/post more questions as I work through this program, but one step at a time.
Code so far:
import glob
import os

from Bio import SeqIO
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

os.chdir('C:\\Users\\asmit\\Desktop\\Sequences\\Cytb')
print("Current folder: " + os.getcwd())

# Step 1: trim every downloaded FASTA file.
for name in glob.iglob('*_download.fasta'):
    # str.strip() removes a *set of characters* from both ends, not a suffix,
    # so slice the suffix off instead.
    newname = name[:-len("_download.fasta")] + "_trim.fasta"
    print("File processing: " + name)
    with open(newname, "w") as f:
        for seq_record in SeqIO.parse(name, "fasta"):
            f.write(">" + seq_record.id + "\n")
            # Write the sequence from position 31 onward in 50-base lines.
            x = 31
            while x < 411:
                f.write(str(seq_record.seq[x:x + 50]) + "\n")
                x = x + 50
print("All files trimmed.")

# Step 2: BLAST each trimmed file and save the XML result.
for name in glob.iglob('*_trim.fasta'):
    newname = name[:-len(".fasta")] + ".xml"
    print("Running BLAST on " + name)
    with open(name) as fasta_in:
        record = fasta_in.read()
    result_handle = NCBIWWW.qblast("blastn", "nt", record, hitlist_size=1)
    print("BLAST on " + name + " is complete.")
    with open(newname, "w") as out_handle:
        out_handle.write(result_handle.read())
    result_handle.close()
    print(newname + " is ready for parsing.")
print("All files BLASTed")
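One option for the parsing step is the NCBIXML module the script already imports (the cookbook's NCBIXML.parse()). The sketch below uses only the standard library's xml.etree.ElementTree instead, which keeps the structure of the BLAST XML visible. Each query is an <Iteration>, its hits are <Hit> elements, and <Hit_def> holds the hit's description. The helper names are hypothetical. Taking the first two words of the description as the species is a heuristic that assumes GenBank-style titles such as "Genus species cytochrome b ...".

```python
import xml.etree.ElementTree as ET


def top_hits(xml_text):
    """Return (query description, top hit description or None) for each query."""
    root = ET.fromstring(xml_text)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        # With hitlist_size=1 there is at most one <Hit> per query.
        hit = iteration.find("Iteration_hits/Hit")
        hit_def = hit.findtext("Hit_def", default="") if hit is not None else None
        results.append((query, hit_def))
    return results


def species_from_hit_def(hit_def):
    """Heuristic: take the first two words ("Genus species") of the hit title."""
    return " ".join(hit_def.split()[:2])


# Example use with the .xml files written by the script above:
#     for xml_name in glob.iglob('*_trim.xml'):
#         with open(xml_name) as handle:
#             for query, hit_def in top_hits(handle.read()):
#                 species = species_from_hit_def(hit_def) if hit_def else "no hit"
#                 print(query, "->", species)
```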

ODI - Call SQLLDR via Jython on Windows Shell

I'm testing some interfaces with Oracle Data Integrator 11g on Windows 7.
All the interfaces use the LKM MSSQL to Oracle (BCP/SQLLDR). While running them, I got an error on the "Call SQLLDR via Jython" command. After some investigation, I found that the root of the problem was the following line of code:
exitCode = os.system(sqlldr + " control=" + tempSessionFilePrefix + ".ctl log=" + tempSessionFilePrefix + ".log " + "userid=" + "<% out.print(odiRef.getInfo("DEST_USER_NAME")); %>" + "/" + "<% out.print(odiRef.getInfo("DEST_PASS")); %>" + tnsnameOption + " > " + tempSessionFilePrefix +".out" );
It should run a command of the following form in the Windows shell:
sqlldr control=control_file.ctl log=log_file.log userid=ODI_STAGE/ODI_STAGE > shell_output.out
I ran the generated string directly at the command prompt, and it worked without any problem.
After experimenting a bit with the code, I couldn't get os.system to work, so I replaced it with subprocess.call. I also had to remove the last part of the string, which attempts to save the output of the command prompt (> shell_output.out), to make the whole thing work:
exitCode = subprocess.call([sqlldr, "control=" + tempSessionFilePrefix + ".ctl", "log=" + tempSessionFilePrefix + ".log", "userid=" + "<% out.print(odiRef.getInfo("DEST_USER_NAME")); %>" + "/" + "<% out.print(odiRef.getInfo("DEST_PASS")); %>" + tnsnameOption], shell=True);
This one works smoothly.
Regarding the shell output, I suspect the problem is that the part of the string starting with the '>' character is parsed as arguments to SQLLDR instead of as a redirection for the command prompt.
Now, while I can live without it, I would like to know whether anyone has a simple workaround to capture the shell output as well.
OK, I was finally able to capture the shell output as well.
I edited the "Call SQLLDR via Jython" command with the following:
from __future__ import with_statement
import subprocess
...
with open(tempSessionFilePrefix + ".out", "w") as fout:
    exitCode = subprocess.call([sqlldr, "control=" + tempSessionFilePrefix + ".ctl", "log=" + tempSessionFilePrefix + ".log", "userid=" + "ODI_STAGE" + "/" + "<#=snpRef.getInfo("DEST_PASS") #>" + tnsnameOption], stdout=fout, shell=True);
Now everything works as intended.
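The underlying issue is that > is a redirection understood by the shell, not by sqlldr. Below is a minimal CPython 3 sketch of the same fix. The helper run_to_file and the use of the Python interpreter as a stand-in for sqlldr are illustrative assumptions. When the command is passed as a list without a shell, a '>' element reaches the program as an ordinary argument. The portable way to capture output is to give subprocess an open file as stdout, as the answer above does.

```python
import subprocess
import sys


def run_to_file(args, out_path):
    """Run `args` without a shell and write its standard output to out_path.

    Returns the process exit code, like subprocess.call().
    """
    with open(out_path, "w") as fout:
        return subprocess.call(args, stdout=fout)


# A stand-in command that just prints the arguments it received.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# run_to_file(ECHO_ARGS + [">", "shell_output.out"], "captured.out") writes
# "['>', 'shell_output.out']" to captured.out: the ">" is not a redirection here.
```

In Jython 2.5, the with statement additionally needs the `from __future__ import with_statement` line that the answer includes.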

csv file encoding issue in ruby

I am parsing a CSV file using Ruby and getting an error:
invalid byte sequence in utf-8 csv
I tried the encoding option:
CSV.foreach(path, headers: true, encoding: 'windows-1251:utf-8') do |row|
  new_row = {}
  headers = []
  row.each do |k, v|
    headers << k
    # v is nil for empty fields; String.new gives a fresh, mutable empty string.
    v = (v || String.new).force_encoding('UTF-8')
    v.gsub! "\xE2\x80\x96", "-"
    v.gsub! "\xE2\x80\x93", "-"
    v.gsub! "\xE2\x80\x94", "-"
    v.gsub! "\xE2\x80\x95", "-"
    v.gsub! "\xE2\x80\x98", "'"
    v.gsub! "\xE2\x80\x99", "'"
    v.gsub! "\xE2\x80\x9C", "\""
    v.gsub! "\xE2\x80\x9D", "\""
    v.gsub! "\xE2\x80\xA6", "..."
    v.gsub! "\x0D\x0A", "\n"
    v.gsub! "\xC2\xA0", " "
    v.gsub! "\xC2\xB0", " "
    new_row[k] = v
  end
  output_csv.puts headers if output_csv.header_row?
  output_csv.puts new_row
end
Now I end up with:
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
The value in the CSV file that raises this error is "G�ran".
Here is a sample input row:
David Evans & Assocs www.deainc.com 13858534 jpv#deainc.com G�ran Volk 5034990383
Can anyone suggest how to solve this issue?
This issue was most likely caused by saving the file in the wrong encoding. Say you have the Unicode symbol “★” in your file. If you save it as ASCII, Latin-1, or another one-byte-per-character encoding, you lose some data.
The symbol “�” is known as the replacement character. It is used to indicate “here was Unicode that was apparently lost during encoding conversion.”
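The effect can be illustrated with a short Python sketch, using Python only because it is the main language elsewhere on this page. The sketch assumes the name was stored as the single byte 0xF6, which is "ö" in Windows-1252 and Latin-1. The actual bytes in the poster's file are not shown. Read as UTF-8, that byte is invalid and turns into U+FFFD, the "�". Decoded with the encoding the file was really saved in, the name comes back intact. In Windows-1251, the encoding named in the Ruby code, the same byte decodes to a Cyrillic letter instead.

```python
raw = b"G\xf6ran"  # assumed bytes: "Göran" saved as Windows-1252 / Latin-1

# Reading the bytes as UTF-8 fails at 0xF6; errors="replace" substitutes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")    # == "G\ufffdran"

# Decoding with the encoding the file was really saved in recovers the name.
as_cp1252 = raw.decode("cp1252")                   # == "G\u00f6ran"

# The reverse loss from the answer: a one-byte encoding has no code for "★"
# (U+2605), so with errors="replace" it is written as "?".
star_in_latin1 = "\u2605".encode("latin-1", errors="replace")  # == b"?"
```

In the Ruby code, the corresponding knob is the source side of the `encoding:` option ('windows-1251' in the question).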

QUrl containing parentheses

Our application is a 32-bit application. When it is installed on 64-bit Windows 7, it typically installs under “C:\Program Files (x86)” instead of “C:\Program Files”. We construct a URL based on the install location and pass it around as part of a web service, like this:
ppmPath = "http://" + ipAddress + ":13007/" + folder + ".ppm" + "?filePath="
+ applicationDirPath + "/" + FIRMWARE;
QUrl ppmURL( ppmPath, QUrl::TolerantMode );
ppmPath = QString( ppmURL.toEncoded() );
The variable types and meanings are the obvious ones.
Since applicationDirPath on 64-bit Windows 7 contains a closing parenthesis “)” (in the “(x86)” substring), the URL apparently breaks. If we install to any other location, it works perfectly, even when the path contains other special characters.
How can we handle the “)” character so that the URL is not broken?
From the documentation it doesn't look like parentheses are automatically encoded by QUrl, even in tolerant mode. If you first wrap your URL in a QString and then replace all ( characters with "%28" and all ) characters with "%29" then it should behave like you expect.
QString ppmPath = QString("http://" + ipAddress + ":13007/" + folder + ".ppm" + "?filePath="
+ applicationDirPath + "/" + FIRMWARE);
QUrl ppmURL( ppmPath, QUrl::TolerantMode );
ppmPath = QString( ppmURL.toEncoded() );
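ppmPath.replace(QChar('('), "%28");
ppmPath.replace(QChar(')'), "%29");
A single % is enough here: QString::replace() inserts the replacement text literally, so "%%28" would put two percent signs into the URL. (% only has a special meaning in placeholders such as those used by QString::arg().)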
Alternatively, you could try playing with QUrl::toPercentEncoding() and skip the constructor altogether. It appears to convert parentheses.
QUrl ppmURL(QString("http://" + ipAddress + ":13007/" + folder + ".ppm"), QUrl::TolerantMode );
QByteArray filePath = QUrl::toPercentEncoding(applicationDirPath + "/" + FIRMWARE);
ppmURL.addEncodedQueryItem("filePath", filePath);  // Qt 4 API; Qt 5 uses QUrlQuery instead
ppmPath = QString( ppmURL.toEncoded() );
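For comparison, here is the same idea with Python's standard library rather than Qt: percent-encode the query value yourself, so that reserved characters such as ( and ) cannot break the URL. build_ppm_url and its parameters are hypothetical names that mirror the variables in the question.

```python
from urllib.parse import quote


def build_ppm_url(ip_address, folder, application_dir_path, firmware):
    """Build the ppm URL with the file path percent-encoded as one query value."""
    file_path = application_dir_path + "/" + firmware
    # safe="" also encodes "/" and ":", so the whole path is a single opaque
    # value; "(" and ")" become %28 and %29, spaces become %20.
    return ("http://" + ip_address + ":13007/" + folder + ".ppm"
            + "?filePath=" + quote(file_path, safe=""))


print(build_ppm_url("192.168.0.10", "device", "C:/Program Files (x86)/App", "fw.bin"))
# http://192.168.0.10:13007/device.ppm?filePath=C%3A%2FProgram%20Files%20%28x86%29%2FApp%2Ffw.bin
```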

test if a PDF file is finished in Ruby (on Solaris/Unix)?

I have a server that generates or copies PDF files to a specific folder.
I wrote a Ruby script (my first ever) that regularly checks for PDF files I own and displays them with Acrobat. So simple, so nice.
But now I have a problem: how do I detect that the PDF is complete?
The generated PDFs end with %%EOF\n,
but the copied ones are produced by some Apple magic (Acrobat Writer, I think). They have a %%EOF near the beginning of the file, lots of binary zeros, and another %%EOF near the end, followed by a carriage return (or line feed) and a binary zero at the very end.
while true
  dir = readpfad
  Dir.foreach(dir) do |f|
    datei = File.join(dir, f)
    if File.file?(datei)
      if File.stat(datei).owned?
        if datei[-9..-1].upcase == "__PDF.PDF"
          if File.stat(datei).size > 5
            # Read in binary mode; the block form closes the file again.
            dummy = File.open(datei, "rb") { |test| test.readlines }
            if dummy[-1][0..4] == "%%EOF"
              # move the file, so it will not be shown again
              cmd = "mv " + datei + " " + movepfad
              system(cmd)
              acro = ACROREAD + " " + File.join(movepfad, f) + "&"
              system(acro)
            else
              puts ">>>" + dummy[-1] + "<<<"
            end
          end
        end
      end
    end
  end
  sleep 1
end
Any help or ideas?
Thanks
Peter
All the %%EOF token means is that one should appear within the last 1024 bytes of the physical end of the file. The structure of PDF is such that a document may contain one or more %%EOF tokens (for example, linearized files or files with incremental updates; the details are in the spec).
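The check described above can be sketched in Python; the Ruby script would need the same logic ported, and has_trailing_eof is a hypothetical helper. It reads only the last 1024 bytes and looks for %%EOF anywhere in them, instead of reading the whole file and inspecting only the last line. As the answer notes, passing this test is necessary but not sufficient for a complete copy.

```python
import os

EOF_MARKER = b"%%EOF"
TAIL_SIZE = 1024  # %%EOF must appear within the last 1024 bytes


def has_trailing_eof(path):
    """True if %%EOF occurs within the last 1024 bytes of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - TAIL_SIZE))
        tail = f.read()
    return EOF_MARKER in tail
```

This accepts the "Apple magic" files, whose final %%EOF is followed by a line ending and a zero byte. It rejects a file whose only %%EOF is near the beginning.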
As such, "contains %%EOF" is not equivalent to "completely copied". Really, the correct answer is that the server should signal when it is done, and your code should be a client of that signal. In general, polling, especially I/O-bound polling, is the wrong answer to this problem.
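If you control the program that writes or copies the files, one common way to signal "done" is to write under a temporary name and rename the finished file into place. Here is a sketch in Python, with a hypothetical helper and a ".part" suffix as assumptions. On POSIX systems a rename within the same filesystem is atomic, so a reader that only looks for names ending in __PDF.PDF never sees a half-written file. Ruby's File.rename can play the same role in a Ruby copier.

```python
import os
import tempfile


def publish_atomically(data, final_path):
    """Write data to a temporary file next to final_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temporary file lives in the same directory, hence the same filesystem,
    # which is what makes the final rename atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stale .part file behind
        raise
```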
