I use SPARQLWrapper to create SPARQL queries, but I don't know how to debug the following error message:
Warning (from warnings module):
  File "D:\Python27\lib\site-packages\sparqlwrapper-1.5.2-py2.7.egg\SPARQLWrapper\Wrapper.py", line 550
RuntimeWarning: unknown response content type, returning raw response...
Traceback (most recent call last):
  File "D:\Python27\testwrapper.py", line 31, in <module>
    if (len(results["results"]["bindings"]) == 0):
AttributeError: addinfourl instance has no attribute '__getitem__'
This is my code:
from SPARQLWrapper import SPARQLWrapper,JSON
sparql = SPARQLWrapper('http://thedatahub.org/dataset/semanticquran');
queryString = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX olia-ar: <http://purl.org/olia/arabic_khoja.owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX lexvo: <http://lexvo.org/id/iso639-3/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gold: <http://purl.org/linguistics/gold/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qvoc: <http://www.nlp2rdf.org/quranvocab#>
SELECT DISTINCT ?wordText ?pos
WHERE
{ ?wordPart rdf:type qvoc:LexicalItem .
?wordPart gold:Root "smw" .
?wordPart dcterms:isPartOf ?word .
?wordPart gold:PartOfSpeechProperty ?pos .
?word rdf:type qvoc:Word .
?word skos:prefLabel ?wordText
}
"""
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
if (len(results["results"]["bindings"]) == 0):
    print "No results found."
else:
    for result in results["results"]["bindings"]:
        print result["wordText"]["value"]
Any Help?
The response from the server is not actually JSON, so setReturnFormat(JSON) cannot take effect. Note that the URL you pass to SPARQLWrapper looks like a dataset catalogue page rather than a SPARQL endpoint. convert() encodes the return value depending on the return format:
in the case of XML, a DOM top element is returned;
in the case of JSON, a simplejson conversion will return a dictionary;
in the case of RDF/XML, the value is converted via RDFLib into a Graph instance;
in all other cases the input is simply returned.
You are hitting the last case: the raw response object (addinfourl) is returned, which is why indexing it with ["results"] raises AttributeError.
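A defensive pattern (the helper name extract_bindings is my own, not part of SPARQLWrapper) is to check that convert() actually produced a dictionary before indexing into it:

```python
def extract_bindings(results):
    # convert() returns a dict for JSON results; when the content type is
    # unrecognized it falls back to the raw response object instead.
    if isinstance(results, dict):
        return results["results"]["bindings"]
    return None  # not JSON: check the endpoint URL and return format

# A JSON-style result yields the bindings list:
print(extract_bindings({"results": {"bindings": [{"wordText": {"value": "sama"}}]}}))
# A raw (non-dict) response object yields None:
print(extract_bindings(object()))
```

Pointing SPARQLWrapper at an actual SPARQL endpoint, rather than the dataset's catalogue page, should then make convert() return the dictionary your original loop expects.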
Related
I'm using a European Space Agency API to query for satellite image metadata (result can be viewed here) to parse into Python objects.
Using the requests library I can successfully get the result in XML format and then read the content with lxml. I am able to find the elements and explore the tree as expected:
# load the response into an element tree; etree.fromstring returns the
# root element directly, so no getroot() call is needed
root = etree.fromstring(response.content)
ns = root.nsmap
# get the first entry element and its summary
e = root.find('entry',ns)
summary = e.find('summary',ns).text
print summary
>> 'Date: 2018-11-28T09:10:56.879Z, Instrument: OLCI, Mode: , Satellite: Sentinel-3, Size: 713.99 MB'
The entry element has several date descendants with different values of the attribute name:
for d in e.findall('date',ns):
    print d.tag, d.attrib
>> {http://www.w3.org/2005/Atom}date {'name': 'creationdate'}
{http://www.w3.org/2005/Atom}date {'name': 'beginposition'}
{http://www.w3.org/2005/Atom}date {'name': 'endposition'}
{http://www.w3.org/2005/Atom}date {'name': 'ingestiondate'}
I want to grab the beginposition date element using XPath syntax ([@attrib='value']) but it just returns None. Even just searching for a date element with the name attribute ([@attrib]) returns None:
dt_begin = e.find('date[@name="beginposition"]',ns) # dt_begin is None
dt_begin = e.find('date[@name]',ns) # dt_begin is None
The entry element includes other children that exhibit the same behaviour, e.g. multiple str elements, also with differing name attributes.
Has anyone encountered anything similar, or is there something I'm missing? I'm using Python 2.7.14 with lxml 4.2.4.
It looks like an explicit prefix is needed when a predicate ([@name="beginposition"]) is used. Here is a test program:
from lxml import etree
print etree.LXML_VERSION
tree = etree.parse("data.xml")
ns1 = tree.getroot().nsmap
print ns1
print tree.find('entry', ns1)
print tree.find('entry/date', ns1)
print tree.find('entry/date[@name="beginposition"]', ns1)
ns2 = {"atom": 'http://www.w3.org/2005/Atom'}
print tree.find('atom:entry', ns2)
print tree.find('atom:entry/atom:date', ns2)
print tree.find('atom:entry/atom:date[@name="beginposition"]', ns2)
Output:
(4, 2, 5, 0)
{None: 'http://www.w3.org/2005/Atom', 'opensearch': 'http://a9.com/-/spec/opensearch/1.1/'}
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750b90>
<Element {http://www.w3.org/2005/Atom}date at 0x7f89877503f8>
None
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750098>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a950>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a7a0>
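The same behaviour can be reproduced with the standard library's ElementTree, which shares this find() path syntax (a self-contained sketch; the tiny Atom document below is made up):

```python
import xml.etree.ElementTree as ET

# Minimal Atom-like document with a default namespace, as in the question
xml = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <date name="beginposition">2018-11-28T09:10:56.879Z</date>
  </entry>
</feed>'''

root = ET.fromstring(xml)
# An explicit prefix in the namespace map makes the attribute predicate work
ns = {'atom': 'http://www.w3.org/2005/Atom'}
d = root.find('atom:entry/atom:date[@name="beginposition"]', ns)
print(d.text)  # -> 2018-11-28T09:10:56.879Z
```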
I'm fairly new to Biopython. Right now, I'm trying to compute protein parameters from several protein sequences (more than 100) in FASTA format. However, I've found it difficult to parse the sequences correctly.
This is the code I'm using:
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis
input_file = open("/Users/matias/Documents/Python/DOE.fasta", "r")
for record in SeqIO.parse(input_file, "fasta"):
    my_seq = str(record.seq)
    analyse = ProteinAnalysis(my_seq)
    print(analyse.molecular_weight())
But I'm getting this error message:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/SeqUtils/__init__.py", line 438, in molecular_weight
    weight = sum(weight_table[x] for x in seq) - (len(seq) - 1) * water
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/SeqUtils/__init__.py", line 438, in <genexpr>
    weight = sum(weight_table[x] for x in seq) - (len(seq) - 1) * water
KeyError: '\\'
Printing each sequence as a string shows that every sequence has a "\" at the end, but so far I haven't been able to remove it. Any ideas would be very appreciated.
That really shouldn't be there in your file, but if you can't get a clean input file, you can use my_seq = str(record.seq).rstrip('\\') to remove it at runtime.
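To illustrate the failure mode with a self-contained sketch (the tiny weight table and sequence below are made up, not Biopython's real data): molecular_weight() looks each character up in a residue weight table, so the stray backslash is exactly what triggers the KeyError, and rstrip('\\') removes it before the lookup:

```python
# Hypothetical three-residue weight table, standing in for Biopython's
weight_table = {'M': 149.21, 'K': 146.19, 'W': 204.23}
raw_seq = "MKW\\"  # sequence text with the stray trailing backslash

try:
    sum(weight_table[x] for x in raw_seq)
except KeyError as e:
    print('lookup failed on:', e)  # the backslash is not in the table

clean_seq = raw_seq.rstrip('\\')  # same fix as above, on a plain string
total = sum(weight_table[x] for x in clean_seq)
print(round(total, 2))  # -> 499.63
```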
I wrote the following code to build a Doc2Vec model iteratively. As I read on this page, if a document contains more than 10000 tokens, the tokens need to be split and the label(s) repeated for each segment.
Most of my documents are longer than 10000 tokens. I tried to split the tokens with the code below, but I get an error showing that the tokens after the first 10000 are not included in my model.
def iter_documents(top_directory):
    mapDocName_Id=[]
    label=1
    for root, dirs, files in os.walk(top_directory):
        for fname in files:
            print fname
            inputs=[]
            tokens=[]
            with open(os.path.join(root, fname)) as f:
                for i, line in enumerate(f):
                    if line.startswith('clueweb09-en00'):
                        if tokens:
                            i=0
                            if len(tokens)<10000:
                                yield LabeledSentence(tokens[:],[label])
                            else:
                                tLen=len(tokens)
                                times=int(math.floor(tLen/10000))
                                for i in range(0,times):
                                    s=i*10000
                                    e=(i*10000)+9999
                                    yield LabeledSentence(tokens[s:e],[label])
                                start=times*10000
                                yield LabeledSentence(tokens[start:tLen],[label])
                            label+=1
                            tokens=[]
                    else:
                        tokens=tokens+line.split()
            yield LabeledSentence(tokens[:],[label])

class docIterator(object):
    def __init__(self,top_directory):
        self.top_directory = top_directory

    def __iter__(self):
        return iter_documents(self.top_directory)

allDocs = docIterator(inputPath)
model = Doc2Vec(allDocs, size = 300, window = 5, min_count = 2, workers = 4)
model.save('my_model.doc2vec')
I test my model with the following code, and then I get this error:
model= Doc2Vec.load('my_model.doc2vec')
#print model['school']
print model['philadelphia']
I get a vector as the result for school, but this error for philadelphia; philadelphia occurs in the tokens after index 10000.
2017-02-27 13:59:36,751 : INFO : loading Doc2Vec object from /home/fl/Desktop/newInput/tokens/my_model.doc2vec
2017-02-27 13:59:36,765 : INFO : loading docvecs recursively from /home/fl/Desktop/newInput/tokens/my_model.doc2vec.docvecs.* with mmap=None
2017-02-27 13:59:36,765 : INFO : setting ignored attribute syn0norm to None
2017-02-27 13:59:36,765 : INFO : setting ignored attribute cum_table to None
Traceback (most recent call last):
  File "/home/fl/git/doc2vec_annoy/Doc2Vec_Annoy/KNN/CreateAnnoyIndex.py", line 31, in <module>
    print model['philadelphia']
  File "/home/flashkar/anaconda/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1504, in __getitem__
    return self.syn0[self.vocab[words].index]
KeyError: 'philadelphia'
I solved my problem by splitting each long document into sub-documents of at most 10,000 tokens that share the same document label, so I no longer need to check the token length elsewhere.
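A sketch of that approach (chunk_tokens is my own helper name, not a gensim API): split the token list into segments of at most 10,000 tokens that all carry the same label, and feed every segment to the model:

```python
def chunk_tokens(tokens, label, size=10000):
    # every chunk shares the same document label, so all segments train
    # against a single document identifier
    return [(tokens[i:i + size], [label]) for i in range(0, len(tokens), size)]

chunks = chunk_tokens(['tok'] * 25000, 1)
print([len(words) for words, labels in chunks])  # -> [10000, 10000, 5000]
```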
Is it possible to output the gene location for a CDS feature or do I need to parse the 'location' or 'complement' field myself?
For example,
seq = Sequence.read(genbank_fp, format='genbank')
for feature in seq.metadata['FEATURES']:
    if feature['type_'] == 'CDS':
        if 'location' in feature:
            print 'location = ', feature['location']
        elif 'complement' in feature:
            print 'location = ', feature['complement']
        else:
            raise ValueError('positions for gene %s not found' % feature['protein_id'])
would output:
location = <1..206
location = 687..3158
for this sample GenBank file.
This functionality is possible in BioPython (see this thread) where I can output the positions already parsed (ex. start = 687, end = 3158).
Thanks!
For the example, you can get the Sequence object for the feature only, using the following code:
# column index in positional metadata
col = feature['index_']
loc = seq.positional_metadata[col]
feature_seq = seq[loc]
# if the feature is on the reverse strand
if feature['rc_']:
    feature_seq = feature_seq.reverse_complement()
Note: the GenBank parser is newly added in the development branches.
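If you do end up parsing the location strings yourself in the meantime, here is a rough sketch (the regex and function name are my own; partial-position markers like '<' are simply skipped):

```python
import re

def parse_location(loc):
    # grab the two position numbers from strings such as '<1..206',
    # '687..3158' or 'complement(687..3158)'
    m = re.search(r'(\d+)\.\.[<>]?(\d+)', loc)
    if m is None:
        raise ValueError('unrecognized location: %r' % loc)
    return int(m.group(1)), int(m.group(2))

print(parse_location('<1..206'))                # -> (1, 206)
print(parse_location('complement(687..3158)')) # -> (687, 3158)
```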
I'm trying to map my data properties in Protege to owl:Class with SPARQL, but it doesn't work. If anyone has an example, please don't hesitate to give me an answer.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myOnto: <http://imis/SNI/9.4.3#>
INSERT DATA
{
?s rdf:type owl:Class
}
WHERE { ?s ?o owl:DatatypeProperty}
When I run it, I get the message below. Please help me resolve this problem.
Error 61 Logged at Wed Oct 15 23:11:26 CEST 2014
SparqlReasonerException: org.openrdf.query.MalformedQueryException: Encountered " "insert" "INSERT "" at line 7, column 1.
Was expecting one of:
"base" ...
"prefix" ...
"select" ...
"construct" ...
"describe" ...
"ask" ...
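One thing stands out in the query itself (a hedged note, since I can't test against your Protege setup): in SPARQL 1.1 Update, INSERT DATA accepts only ground triples and cannot take a WHERE clause; the template form uses plain INSERT. Note also that the parser's "expecting" list contains only query forms (SELECT, CONSTRUCT, ...), which suggests the endpoint you are sending this to may not accept updates at all. A corrected update would look like this (I've assumed rdf:type is the predicate you intended in the WHERE pattern):

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# INSERT (not INSERT DATA) may bind variables from the WHERE clause
INSERT { ?s rdf:type owl:Class }
WHERE  { ?s rdf:type owl:DatatypeProperty }
```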