Why is the output file from Biopython not found? - bioinformatics

I work with a Mac. I have been trying to make a multiple sequence alignment in Python using Muscle. This is the code I have been running:
from Bio.Align.Applications import MuscleCommandline
cline = MuscleCommandline(input="testunaligned.fasta", out="testunaligned.aln", clwstrict=True)
print(cline)
from Bio import AlignIO
align = AlignIO.read(open("testunaligned.aln"), "clustal")
print(align)
I keep getting the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'testunaligned.aln'
Does anyone know how I could fix this? I am very new to Python and computer science in general, and I am totally at a loss. Thanks!

cline in your code is an instance of MuscleCommandline object that you initialized with all the parameters. After the initialization, this instance can run muscle, but it will only do that if you call it. That means you have to invoke cline()
When you simply print the cline object, it will return a string that corresponds to the command you can manually run on the command line to get the same result as when you invoke cline().
And here the working code:
from Bio.Align.Applications import MuscleCommandline
cline = MuscleCommandline(
input="testunaligned.fasta",
out="testunaligned.aln",
clwstrict=True
)
print(cline)
cline() # this is where mucle runs
from Bio import AlignIO
align = AlignIO.read(open("testunaligned.aln"), "clustal")
print(align)

Related

Viewing/Editing YAML content in a window using PySimpleGUI

I am learning python and PySimpleGUI seems like a good start for exercises. With this self-exercise I'm working on, I would like to view and edit a YAML file. So far, I am able to create a prompt to browse and select a yaml. I am able to print the data in the console. But my next step is to view the yaml view PySimpleGUI window. I will work on how to edit the yaml content once I can figure out how to display it.
Here is my code:
import PySimpleGUI as sg
import yaml
from yaml.loader import SafeLoader
import os
working_directory = os.getcwd()
layout = [
[sg.Text("Shoose your yaml file:")],
[sg.InputText(key="-FILE_PATH-"),
sg.FileBrowse(initial_folder=working_directory, file_types=[("YAML Files","*.yaml")])],
[sg.Button("Submit"), sg.Exit()],
[sg.Multiline(size=(30,5), key= data)]
]
window = sg.Window("File Loader", layout).Finalize()
while True:
event, values = window.read()
if event in (sg.WIN_CLOSED, 'Exit'):
break
elif event == "Submit":
file_path = values["-FILE_PATH-"];
with open(file_path) as f:
data = yaml.load(f, Loader=SafeLoader)
# print(data)
print(values[data])
window.close()
Running this code i get this error:
Traceback (most recent call last):
File "yaml_gui.py", line 13, in <module>
[sg.Multiline(size=(30,5), key= data)]
NameError: name 'data' is not defined
I'm stuck because I am not sure why it is returning this error. The code works if I decide to just print the results in my terminal by using print(data). But when I use print(values[data]), it doesn't work.

Fiona Driver Error when downloading files via URL

This is simple to test if you get the error on your side:
import geopandas as gpd
gdf = gpd.read_file('https://hepgis.fhwa.dot.gov/fhwagis/AltFuels_Rounds1-5_2021-05-25.zip')
File "fiona/ogrext.pyx", line 540, in fiona.ogrext.Session.start
File "fiona/_shim.pyx", line 90, in fiona._shim.gdal_open_vector
fiona.errors.DriverError: '/vsimem/6101ab5f23764c15b5fe47aa52a049d6' not recognized as a supported file format.
Interestingly, I have received this error for other URLs recently and thought there was something wrong with the URL. But, now I suspect that something else is going on since it is happening with more than one URL. On the other hand, some URLs don't have this issue. One other interesting thing, this error only occurs sometimes. For instance, if I rerun that command it will work maybe 1 out of 20 times.
My Fiona version:
fiona 1.8.20 py39hea8b339_1 conda-forge
Any help would be much appreciated.
Investigating, the URL does not return a zip file. See code below, it actually returns a HTML input page...
import geopandas as gpd
import requests, io
from pathlib import Path
from zipfile import ZipFile, BadZipFile
import urllib
import fiona
url = "https://hepgis.fhwa.dot.gov/fhwagis/AltFuels_Rounds1-5_2021-05-25.zip"
try:
gdf = gpd.read_file(url)
except Exception:
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
try:
zfile = ZipFile(f)
zfile.extractall(f.stem)
except BadZipFile:
with open(f) as fh:
print(fh.read())

Pydotplus, Graphviz error: Program terminated with status: 1. stderr follows: 'C:\Users\En' is not recognized as an internal or external command

from pydotplus import graph_from_dot_data
from sklearn.tree import export_graphviz
from IPython.display import Image
dot_data = export_graphviz(tree,filled=True,rounded=True,class_names=['Setosa','Versicolor','Virginica'],feature_names=['petal length','petal width'],out_file=None)
graph = graph_from_dot_data(dot_data)
Image(graph.create_png())
Program terminated with status:
1. stderr follows: 'C:\Users\En' is not recognized as an internal or external command,
operable program or batch file.
it seems that it split my username into half.How do i overcome this?
I have a very similar example that I'm trying out, it's based on a ML how-to book which is working with a Taiwan Credit Card dataset predicting default risk. My setup is as follows:
from six import StringIO
from sklearn.tree import export_graphviz
from IPython.display import Image
import pydotplus
Then creating the decision tree plot is done in this way:
dot_data = StringIO()
export_graphviz(decision_tree=class_tree,
out_file=dot_data,
filled=True,
rounded=True,
feature_names = X_train.columns,
class_names = ['pay','default'],
special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
I think it's all coming from the out_file=dot_data argument but cannot figure out where the file path is created and stored as print(dot_data.getvalue()) did not show any pathname.
In my research I came across sklearn.plot_tree() which seems to do everything that the graphviz does. So I took the above exporet_graphviz arguments and were matching arguments were in the .plot_tree method I added them.
I ended up with the following which created the same image as was found in the text:
from sklearn import tree
plt.figure(figsize=(20, 10))
tree.plot_tree(class_tree,
filled=True, rounded=True,
feature_names = X_train.columns,
class_names = ['pay','default'],
fontsize=12)
plt.show()

Python Time based rotating file handler for logging

While using time based rotating file handler.Getting error
os.rename('logthred.log', dfn)
WindowsError: [Error 32] The process cannot access the file because it
is being used by another process
config :
[loggers]
keys=root
[logger_root]
level=INFO
handlers=timedRotatingFileHandler
[formatters]
keys=timedRotatingFormatter
[formatter_timedRotatingFormatter]
format = %(asctime)s %(levelname)s %(name)s.%(functionname)s:%(lineno)d %
(output)s
datefmt=%y-%m-%d %H:%M:%S
[handlers]
keys=timedRotatingFileHandler
[handler_timedRotatingFileHandler]
class=handlers.TimedRotatingFileHandler
level=INFO
formatter=timedRotatingFormatter
args=('D:\\log.out', 'M', 2, 0, None, False, False)
Want to achieve time based rotating file handler and multiple process can write same log file.In python,I didn't find any thing which can help to resolve this issue.
I have read discussion on this issue (python issues).
Any suggestion which can resolve this issue.
Found the solution : Problem in Python 2.7 whenever we create child Process then file handles of parent process also inherited by child process so this is causing error. We can block this inheritance using this code.
import sys
from ctypes import windll
import msvcrt
import __builtin__
if sys.platform == 'win32':
__builtin__open = __builtin__.open
def __open_inheritance_hack(*args, **kwargs):
result = __builtin__open(*args, **kwargs)
handle = msvcrt.get_osfhandle(result.fileno())
if filename in args: # which filename handle you don't want to give to child process.
windll.kernel32.SetHandleInformation(handle, 1, 0)
return result
__builtin__.open = __open_inheritance_hack

Python error: one of the arguments is required

I'm trying to run a code from github that uses Python to classify images but I'm getting an error.
here is the code:
import argparse as ap
import cv2
import imutils
import numpy as np
import os
from sklearn.svm import LinearSVC
from sklearn.externals import joblib
from scipy.cluster.vq import *
# Get the path of the testing set
parser = ap.ArgumentParser()
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("-t", "--testingSet", help="Path to testing Set")
group.add_argument("-i", "--image", help="Path to image")
parser.add_argument('-v',"--visualize", action='store_true')
args = vars(parser.parse_args())
# Get the path of the testing image(s) and store them in a list
image_paths = []
if args["testingSet"]:
test_path = args["testingSet"]
try:
testing_names = os.listdir(test_path)
except OSError:
print "No such directory {}\nCheck if the file exists".format(test_path)
exit()
for testing_name in testing_names:
dir = os.path.join(test_path, testing_name)
class_path = imutils.imlist(dir)
image_paths+=class_path
else:
image_paths = [args["image"]]
and this is the error message I'm getting
usage: getClass.py [-h]
(- C:/Users/Lenovo/Downloads/iris/bag-of-words-master/dataset/test TESTINGSET | - C:/Users/Lenovo/Downloads/iris/bag-of-words-master/dataset/test/test_1.jpg IMAGE)
[- C:/Users/Lenovo/Downloads/iris/bag-of-words-master/dataset]
getClass.py: error: one of the arguments - C:/Users/Lenovo/Downloads/iris/bag-of-words-master/dataset/test/--testingSet - C:/Users/Lenovo/Downloads/iris/bag-of-words-master/dataset/test/test_1.jpg/--image is required
can you please help me with this? where and how should I write the file path?
This is an error your own program is issuing. The message is not about the file path but about the number of arguments. This line
group = parser.add_mutually_exclusive_group(required=True)
says that only one of your command-line arguments (-t, -i) is permitted. But it appears from the error message that you are supplying both --testingSet and --image on your command line.
Since you only have 3 arguments, I have to wonder if you really need argument groups at all.
To get your command line to work, drop the mutually-exclusive group and add the arguments to the parser directly.
parser.add_argument("-t", "--testingSet", help="Path to testing Set")
parser.add_argument("-i", "--image", help="Path to image")
parser.add_argument('-v',"--visualize", action='store_true')

Resources