AlignIO gives 'AssertionError' when reading emboss alignment files - bioinformatics

I have been stuck on a problem for three days... searched everywhere, posted on Biostar, still waiting for EMBL to respond to emails... would make a bounty if I had more rep.
After aligning sequences with EMBOSSwin needle() (pairwise global alignments) I get alignment files in pair format, with a .needle file extension. I want to use Biopython to read these alignments for later analysis.
I use AlignIO.read(open('alignment.needle'),'emboss') following the instructions in Biopython's AlignIO wiki but I keep getting an AssertionError.
My code:
>>> from Bio import AlignIO
>>> alignment = AlignIO.read(open("data/all/out/pair1_alignment.needle"), "emboss")
My error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 423, in read
first = next(iterator)
File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 370, in parse
for a in i:
File "C:\Python27\lib\Bio\AlignIO\EmbossIO.py", line 150, in __next__
assert seq.replace("-", "") != ""
AssertionError
Example Alignment File:
Download the alignment file here
Versions:
Windows 7
Python version 2.7.3
Biopython version 1.63
EMBOSS version 2.10.0-0.8
Clues:
I suspect this may be related to a warning message I kept getting when actually making the alignments, which was outputted by EMBOSS needle() function:
Warning: Sequence character string not found in ajSeqCvtKS

Duplicate post on BioStars, http://www.biostars.org/p/87226/#87399
This appears to be down to a subtle change in the EMBOSS output. You have an extremely old version, EMBOSS version 2.10.0 (February 2005), and your output file has lines like this:
gag 1288 -------------------------------------------------- 1287
Using a newer version of EMBOSS (e.g. 6.3.0), gives lines like this:
gag 1287 -------------------------------------------------- 1287
The Biopython parser is expecting the latter for alignment sections with no letters (e.g. when one sequence is much longer than the other), where the start and end coordinates agree. Please update your copy of EMBOSS, and then the parser should be happy. The current EMBOSS release is version 6.5.0.
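To check whether a given output file contains the problematic rows before re-running EMBOSS, a small scan like the following may help. It is only a sketch: it assumes each alignment row has the layout `name start sequence end` seen in the two example lines above, and the function name `find_suspect_gap_rows` is made up for illustration (it is not part of Biopython).

```python
import re

# Assumed row layout: name, start coordinate, sequence, end coordinate.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+([A-Za-z\-]+)\s+(\d+)\s*$")


def find_suspect_gap_rows(lines):
    """Return (line_number, name, start, end) for gap-only rows whose
    start and end coordinates differ, as in the old EMBOSS output."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        match = ROW.match(line)
        if match is None:
            continue  # not an alignment row (header, match line, blank)
        name, start, seq, end = match.groups()
        if seq.replace("-", "") == "" and start != end:
            suspects.append((number, name, int(start), int(end)))
    return suspects


if __name__ == "__main__":
    rows = [
        "gag 1288 -------------------------------------------------- 1287",
        "gag 1287 -------------------------------------------------- 1287",
    ]
    for hit in find_suspect_gap_rows(rows):
        print(hit)
```

Rows reported by this scan are the ones where a gap-only segment has mismatched coordinates, which is the pattern described above for the old EMBOSS release.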

The problem is that you're passing a file in the wrong format to Biopython. An explanation follows.
Formatting
The format of the file you've linked to is srspair (see the header of pair1_aligned.fasta). It's worth noting that this is not the FASTA format - that's an entirely different format.
Delving into the source of Biopython's EmbossIO, we can see that the EmbossIterator (which is called by AlignIO.read when the format is 'emboss') is only meant to handle the formats pair and simple (see Alignment formats for an explanation of the various formats).
Solution
If you export EMBOSS's output in the pair format (then call AlignIO.read as you have before), that should solve your problem.
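To see which format name a file declares in its header, the header can be read directly. This is a rough sketch that assumes the header contains a comment line of the form `# Align_format: <name>`; the helper name `emboss_align_format` is hypothetical and not a Biopython function.

```python
def emboss_align_format(lines):
    """Return the value of the 'Align_format' header line, or None if absent.

    Assumes header lines start with '#' and look like '# Key: value'.
    """
    for line in lines:
        if not line.startswith("#"):
            continue
        key, sep, value = line.lstrip("# ").partition(":")
        if sep and key.strip() == "Align_format":
            return value.strip()
    return None


if __name__ == "__main__":
    header = [
        "########################################",
        "# Program: needle",
        "# Align_format: srspair",
        "########################################",
    ]
    print(emboss_align_format(header))
```

With a real file, the same function can be called on an open file handle, since it only iterates over lines.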

How to reproduce the PDF output of AsciidocFX using a Gradle build script?

I have the following Asciidoc-document:
= Test
:doctype: article
:notitle:
:!toc:
AsciidocFX shows links in PDFs as footnotes http://stackoverflow.com[SO].
.Asciidoc in PDF does not work in Asciidoctor, but works in AsciidocFX.
[cols="2,5a"]
|===
|Line with Asciidoc code
|here comes a list:
* item 1
* item 2
* item 3
http://stackoverflow.com[Get Answers]!
|Line
|with a footnotefootnote:[footnotes do work in AsciidocFX's PDF output (but not in the preview).]
|===
When generating a PDF using asciidoctor, the output is as follows:
The problems are:
footnotes are shown inline (see: https://github.com/asciidoctor/asciidoctor-pdf/issues/73)
Asciidoc-content in tables cells is not interpreted: https://github.com/asciidoctor/asciidoctor-pdf/issues/6
Link targets are not shown as Footnotes (this would be nice to have)
Using https://github.com/asciidocfx/AsciidocFX shows everything correctly:
Now, I'd like to have the same output that AsciidocFX produces, but still like to use my Gradle build-script.
From https://github.com/asciidoctor/asciidoctor-pdf/issues/73#issuecomment-224327058 I learned that AsciidocFX uses https://github.com/asciidoctor/asciidoctor-fopub[asciidoctor-fopub] under the hood. But how can I use this pipeline in my build.gradle? Do I have to generate epub in a first task and use the output in another task? Or is there a direct way?
Sorry that I am a tad late (almost 7 years!!) to answer your question, but perhaps it will help others.
Perhaps you need to upgrade. When I run your .adoc verbatim, the footnotes come out perfectly. In fact, the output matches the correct version you posted. Here is the syntax that I use:
asciidoctor-pdf -a pdf-themesdir=/path/to/themes -a pdf-theme=your-pdf-theme-file.yml -a pdf-fontsdir=/path/to/your/fonts/directory/ your_test_file.adoc
I put this syntax in a bash script with the adoc file as an argument.
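For anyone who prefers a script that is easier to call from another build tool, the same invocation can be wrapped in Python instead of bash. This is only a sketch: the function names are invented for illustration, the paths are placeholders, and the only options used are the ones from the command above.

```python
import subprocess


def asciidoctor_pdf_command(adoc_file, themes_dir, theme_file, fonts_dir):
    """Build the argument list for the asciidoctor-pdf call shown above."""
    return [
        "asciidoctor-pdf",
        "-a", f"pdf-themesdir={themes_dir}",
        "-a", f"pdf-theme={theme_file}",
        "-a", f"pdf-fontsdir={fonts_dir}",
        adoc_file,
    ]


def build_pdf(adoc_file, themes_dir, theme_file, fonts_dir):
    """Run asciidoctor-pdf and raise CalledProcessError if it fails."""
    command = asciidoctor_pdf_command(adoc_file, themes_dir, theme_file, fonts_dir)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths; replace with real locations before running build_pdf.
    print(asciidoctor_pdf_command(
        "your_test_file.adoc",
        "/path/to/themes",
        "your-pdf-theme-file.yml",
        "/path/to/your/fonts/directory/",
    ))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.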
I am using:
linux Pop!_OS 22.04 LTS (close derivative of ubuntu)
ruby 3.1.2p20
asciidoctor-pdf-2.3.0b
Ironically, what amazes me is your AsciidocFX output. AsciidocFX PDF output looks horrible for me, and there is no simple way of changing the output style, such as editing the asciidoctor-pdf YAML theme.
Cheers, Joe

IDL READFITS() syntax error

I'm trying to use the READFITS() function on IDL 8.3 on Mac 10.9.3
My input at the IDL prompt:
readfits('image.fits',h, /EXTEN, /SILENT)
Result:
readfits('image.fits',h, /EXTEN, /SILENT)
^
% Syntax error.
*note: the '^' is below '/EXTEN'
Maybe it will help, so here is a link to the IDL help page on using READFITS() --> http://www.exelisvis.com/docs/readfits.html
I tried using the brackets like they show on that help page, but it still didn't work, so I'm stuck now. Didn't know if anyone here has experience reading .fits files in IDL.
ok, so it turns out the readfits procedure isn't included in IDL's original library, so I just had to download AstroLib (contains lots of useful astronomy procedures - including Readfits). The original syntax then worked.
I'm using IDL 8.2.2 on OS X 10.9.4.
Try keeping it simple first. Do these work?
readfits('image.fits')
readfits('image.fits', header)
Next try this:
readfits('image.fits', header, EXTEN_NO=0)
I suspect you really want extension number 0, not 1. See (e.g.) http://www.stsci.edu/documents/dhb/web/c02_datafiles.fm2.html.

OCaml bugs during why3 usage

I'm trying to compile why3ide (why3-0.81) with krakatoa & jessie (why-2.33) for Windows (Cygwin). Everything went fine, except that I can't get the bottom-right text box to show annotations (it is always empty). Moreover, I get the error highlighted in the picture every time I select an item to prove.
Image: https://dl.dropboxusercontent.com/u/39984835/why3ide/error_capture.jpg
Here is this error:
Apply transformation introduce_premises
Why3ide callback raised an exception:
anomaly: End_of_file
Backtrace:
Raised at file "format.ml", line 197, characters 41-52
Called from file "format.ml", line 425, characters 8-33
Called from file "format.ml", line 440, characters 6-24
How can I debug this error?
(I'm new to OCaml.)
format.ml file is here:
cygwin/lib/ocaml/format.ml
Files that refer to the introduce_premises transformation are here:
why3-0.81/drivers/gappa.drv
why3-0.81/src/ide/gmain.ml
why3-0.81/src/transform/introduction.ml
why3-0.81/drivers/mathematica.drv
P.S. I tried to add why3 & why3ide tags for this post, but my reputation is not enough for that yet.

Error in nchar() when reading in stata file in R on Mac

I'm learning R and am simply trying to read in a stata data file but am getting the error below:
X <- Stata.file(Stata_File)
Error in nchar(varlabs) : invalid multibyte string 253
Multiple Mac users here are encountering this error with the program but it works fine on a PC. A google search of this error seems to say it has something to do with the R package but I can't find a solution. Any ideas? Thanks for your help!!
The R code up to the error point is below:
Root <- "/Users/Desktop/R_Training"
PathIn <- paste(Root,"Data/Example_0",sep="/")
# The 2007 Dominican Republic household member file (96 MB)
Stata_File <- "drpr51fl.dta"
# Load the memisc package:
library(memisc)
# Set the working directory:
setwd(PathIn)
# (1) Determine which variables we want:
# The Stata.file function (from memisc) reads the "header"
# of our Stata file so you can see what it contains
# and choose the variables you want.
X <- Stata.file(Stata_File)
# Error in nchar(varlabs) : invalid multibyte string 253
Below is my session info:
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] memisc_0.95-33 MASS_7.3-13 lattice_0.19-30
This is what worked for me. You can force R to recognize every character by issuing the following command:
Sys.setlocale('LC_ALL','C')
Now run the previous command and all should be fine.
It seems like the encoding of strings in the file isn't what the program thinks it is...
I guess the file was generated on a PC? Does it contain non-ASCII column names or data strings?
Since you seem to have UTF-8 encoding, and (US/Western Europe) PCs typically use latin-1, that could be the problem. I'd expect the same problem on Linux then (also UTF-8).
Possible work-arounds:
Does the Stata.file method have an "encoding" option? Then you might try 'latin1' and hope for the best...
Another possibility is to start R with the --encoding=latin1 option.
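The suspected mismatch can be illustrated outside R. The sketch below uses Python only to show the effect: bytes written as latin-1 fail to decode as UTF-8 but decode cleanly as latin-1. The sample label and the helper name `decode_label` are made up for illustration and have nothing to do with memisc's internals.

```python
def decode_label(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError("none of the candidate encodings could decode the bytes")


if __name__ == "__main__":
    # A hypothetical variable label as it would be stored by a latin-1 system.
    raw_label = "Educación".encode("latin-1")
    text, used = decode_label(raw_label)
    print(text, used)
```

A label made only of ASCII characters decodes the same way under both encodings, which would explain why only some files or variables trigger the error.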

ocaml Unix.system call to pdflatex

I'm having a problem calling an outside application, pdflatex, from a compiled OCaml application. I'm using the proper string as an argument; when I run it from the toplevel I get the expected results,
Unix.system "pdflatex -interaction batchmode -output-directory res ALGO_GEN.tex";;
And it generates the proper output,
This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian)
restricted \write18 enabled.
entering extended mode
(/usr/share/texmf-texlive/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(/usr/share/texmf-texlive/tex/latex/base/size10.clo))
(/usr/share/texmf-texlive/tex/latex/amsmath/amsmath.sty
For additional information on amsmath, use the `?' option.
(/usr/share/texmf-texlive/tex/latex/amsmath/amstext.sty
(/usr/share/texmf-texlive/tex/latex/amsmath/amsgen.sty))
(/usr/share/texmf-texlive/tex/latex/amsmath/amsbsy.sty)
(/usr/share/texmf-texlive/tex/latex/amsmath/amsopn.sty))
(/usr/share/texmf-texlive/tex/latex/algorithms/algorithmic.sty
(/usr/share/texmf-texlive/tex/latex/base/ifthen.sty)
(/usr/share/texmf-texlive/tex/latex/graphics/keyval.sty))
No file ALGO_GEN.aux.
[1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}]
(maze.html.res/ALGO_GEN.aux) )</usr/share/texmf-texlive/fonts/type1/public/a
msfonts/cm/cmbx10.pfb></usr/share/texmf-texlive/fonts/type1/public/amsfonts/cm/
cmmi10.pfb></usr/share/texmf-texlive/fonts/type1/public/amsfonts/cm/cmr10.pfb><
/usr/share/texmf-texlive/fonts/type1/public/amsfonts/cm/cmsy10.pfb>
Output written on res/ALGO_GEN.pdf (1 page, 36816 bytes).
Transcript written on res/ALGO_GEN.log.
- : Unix.process_status = Unix.WEXITED 0
From the compiled application, the log indicates that,
*** (job aborted, no legal \end found)
It has been confusing me for some time. I've used other system calls from the Unix module, and other command line options. I'm wondering if anyone can give some advice on how to proceed. The application generates a few tex documents, and they need to be converted to pdf. From the toplevel, calling a map over a list of them generates the pdfs properly; it fails only when compiled (bytecode).
I wasn't closing the channel to the tex file previously written, so its buffered data had potentially not been written to disk when pdflatex ran. Thanks to Gilles for suggesting I inspect the files during runtime.
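The same buffering effect can be reproduced in other languages. The sketch below uses Python rather than OCaml, purely as an illustration: a short file is written through a buffered handle, and its size on disk is measured before and after the handle is closed. The file name and function name are hypothetical.

```python
import os
import tempfile


def sizes_before_and_after_close(content):
    """Write content to a temporary .tex file and return its on-disk size
    measured before and after the file handle is closed."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example.tex")  # hypothetical file name
    handle = open(path, "w", newline="")  # newline="" disables newline translation
    handle.write(content)
    # For short content the data is usually still in the write buffer here,
    # so an external program reading the file now would see it incomplete.
    size_before = os.path.getsize(path)
    handle.close()  # closing flushes the buffer to disk
    size_after = os.path.getsize(path)
    return size_before, size_after


if __name__ == "__main__":
    tex = "\\documentclass{article}\n\\begin{document}\nHi\n\\end{document}\n"
    print(sizes_before_and_after_close(tex))
```

An external program started between the write and the close would read a truncated document, which is consistent with the "no legal \end found" message in the log. Closing (or at least flushing) the channel before launching pdflatex avoids this.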
