blast given GI NCBI nucleotide database - bioinformatics

I have a local database-driven website containing a few sequences that are uniquely identified by their GI numbers. Is it possible to link directly to the NCBI BLAST site given only the GI? For example, the sequence for GI 903049 has this link:
http://www.ncbi.nlm.nih.gov/nuccore/903049
It links to this blast page:
http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&QUERY=U22848.1&DATABASE=nr&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&LINK_LOC=nuccore&PAGE_TYPE=BlastSearch
I would like to link to this BLAST page directly, without going through the NCBI record page first. Thanks.

You can use NCBI EFetch to download the sequences for all your GIs:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=903049&rettype=fasta&retmode=text
and then run a local copy of BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
You can also ask on Biostar: http://www.biostars.org/show/questions/
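The EFetch request above and the BLAST link from the question can both be generated from an identifier. The following is a minimal Python sketch, assuming the HTTPS versions of the same endpoints. The function names are made up for illustration, and whether the BLAST QUERY parameter accepts a bare GI (rather than an accession such as U22848.1) is not confirmed here.

```python
from urllib.parse import urlencode

# Same endpoints as in the thread, written with https (an assumption).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BLAST_BASE = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def efetch_fasta_url(gi):
    """Build the EFetch URL that returns the FASTA record for one nuccore ID."""
    params = {"db": "nuccore", "id": str(gi), "rettype": "fasta", "retmode": "text"}
    return EFETCH_BASE + "?" + urlencode(params)


def blast_url(query):
    """Build a BLAST page link with the same parameters as the link in the question.

    `query` is an accession such as "U22848.1"; passing a GI instead is untested.
    """
    params = {
        "PAGE": "Nucleotides",
        "PROGRAM": "blastn",
        "QUERY": query,
        "DATABASE": "nr",
        "MEGABLAST": "on",
        "BLAST_PROGRAMS": "megaBlast",
        "LINK_LOC": "nuccore",
        "PAGE_TYPE": "BlastSearch",
    }
    return BLAST_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(efetch_fasta_url(903049))
    print(blast_url("U22848.1"))
```

A website could call a helper like `blast_url` when rendering each sequence page, so the link is produced from the stored identifier instead of being copied by hand.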

Similarity measures for WordNet in Prolog

I would like to check whether two words are similar (using path similarity) in WordNet with Prolog.
I found an article online that does exactly what I want.
Here are the steps needed to get it working:
1. Download WordNet 3.0 Prolog version from
   http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
   and unzip it in a directory of your choice. For example:
   c:\wn_prologDB
2. Set the environment variable WNDB to this newly created directory.
   For example, use either the system dialog box in Control Panel
   (Environment Variables), or write in a Command Prompt (cmd.exe):
   set WNDB=c:\wn_prologDB
3. Download the modules of WN_CONNECT from
   https://dectau.uclm.es/bousi-prolog/applications and unzip
   it in a directory of your choice. For example:
   c:\wn
4. Add to the PATH environment variable the directory where this tool
   is located (similar to step 2 above):
   set PATH=c:\wn;%PATH%
5. Open a terminal and execute the shell script:
   wn.sh
I followed those steps and ran wn.bat (since I'm using Windows).
The wn_word_info predicate works, but I cannot understand why wn_path does not.
Here is the signature:
wn_path(+Word1, +Word2, -Degree)
Any tips on how I could get it to work? Or any solution to calculate path similarity?
The actual signature is a bit different:
wn_path(+Word1:SS_type1:W1_Sense_num, +Word2:SS_type2:W2_Sense_num, -Degree)
So this actually works:
wn_path(cat:A:B, dog:C:D, E).
Here I'm specifying only the word and leaving the other parameters as variables, but one can specify all three parameters.
I paste here some of the docs I found in the WN_CONNECT folder:
** wn_path(+Word1:SS_type1:W1_Sense_num, +Word2:SS_type2:W2_Sense_num, -Degree):
This predicate implements the PATH similarity measure.
Takes two concepts (terms -- Word:SS_type:Sense_num) and returns the degree of similarity between them. Note that we do not explicitly require information about the synset type and sense number of a word (that can be variables).
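For intuition about what a path similarity measure computes, here is a small Python sketch. It assumes the common definition 1 / (1 + length of the shortest path between the two concepts in the hypernym graph); the documentation excerpt above does not state WN_CONNECT's exact formula, so treat this as an illustration rather than a reimplementation. The toy taxonomy and all function names are hypothetical and are not taken from WordNet.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent), for illustration only.
HYPERNYMS = {
    "cat": "feline",
    "feline": "carnivore",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "animal",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, word1, word2):
    """Assumed definition: 1 / (1 + shortest path length)."""
    dist = shortest_path_length(graph, word1, word2)
    return None if dist is None else 1.0 / (1 + dist)


if __name__ == "__main__":
    graph = build_graph(HYPERNYMS)
    print(path_similarity(graph, "cat", "dog"))
```

In real WordNet a word has several senses and a sense can have several hypernyms, which is why the Prolog predicate takes Word:SS_type:Sense_num terms rather than bare words.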

How to download full article text from Pubmed?

I am working on a project that requires working with the GENIA corpus. According to the literature, the GENIA corpus is built from articles retrieved by searching Medline/PubMed for three MeSH terms: "transcription factor", "blood cell" and "human". I want to retrieve the full text of the (freely available) articles in the GENIA corpus from PubMed. I have tried many approaches, but I have not found a way to download the full text in text, XML or PDF format.
Using the Entrez utilities provided by NCBI:
I have tried using the approach mentioned here -
http://www.hpa-bioinformatics.org.uk/bioruby-api/classes/Bio/NCBI/REST/EFetch/Methods.html#M002197
which uses the Ruby gem Bio like this to get the information for a given PubMed ID -
Bio::NCBI::REST::EFetch.pubmed(15496913)
But, it doesn't return the full text for the PMID.
Internally, it makes a call like this -
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=1372388&retmode=text&rettype=medline
But, both the Ruby gem and the above call don't return the full text.
After further searching, I found that the allowed rettype and retmode values for PubMed do not include an option for the full text, as shown in the table here -
http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly
All the examples and scripts I have seen on the Internet only extract abstracts, authors, etc., and none of them discuss extracting the full text.
Here is another link I found that uses the Python package Bio, but it only accesses the author information -
https://www.biostars.org/p/172296/
How can I download the full text of an article in text, XML or PDF format using the Entrez utilities provided by NCBI? Or are there existing scripts or web crawlers that I can use?
You can use Biopython to find the articles that are on PubMed Central (PMC) and then get the PDF from there. For articles hosted elsewhere, it is difficult to find a generic way to get the PDF.
It seems that PubMed Central does not want you to download articles in bulk. Requests via urllib are blocked, but the same URL works from a browser.
from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
# id is a comma-separated string of PubMed IDs;
# not every PubMed ID has a public PMC article
handle = Entrez.efetch("pubmed", id="19304878,19088134", retmode="xml")
records = Entrez.parse(handle)
# check each record for a PMC identifier and
# print the URL for downloading the PDF
for record in records:
    if record.get('MedlineCitation'):
        if record['MedlineCitation'].get('OtherID'):
            for other_id in record['MedlineCitation']['OtherID']:
                if other_id.title().startswith('Pmc'):
                    print('http://www.ncbi.nlm.nih.gov/pmc/articles/%s/pdf/' % (other_id.title().upper()))
I'm working on the exact same problem using Ruby. So far I have had moderate success with the following approach:
Use Mechanize with esearch from eutils to get the XML for your PubMed search, then use Mechanize/Nokogiri to parse the PMIDs from the XML.
Use Mechanize with the ID converter to convert the PMIDs to PMCIDs (when available). If you are only interested in papers available on PMC, you can set up the esearch to return PMCIDs as well.
Once you have the PMCIDs, you can use Mechanize to open the article page, click the PDF link on the page, and save the file.
It's by no means straightforward but still not that bad. There is a gem that claims to do the same (https://github.com/billgreenwald/Pubmed-Batch-Download). I plan to test that out soon.
If you want XML or JSON by PubMed ID or PMC ID, you can use the "BioC API" to access PubMed Central (PMC) Open Access articles.
(see https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/ )
Here is an example request:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_xml/19088134/ascii
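A request like the one above can be built and fetched from Python. This is a minimal sketch: the function names are hypothetical, and the accepted values for the format ("xml" or "json") and encoding ("unicode" or "ascii") segments are my reading of the API page linked above, so check them there before relying on this. Only articles in the PMC Open Access subset are expected to be available.

```python
from urllib.request import urlopen

BIOC_BASE = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def bioc_url(article_id, fmt="xml", encoding="ascii"):
    """Build a BioC request URL for a PubMed ID or PMC ID."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    if encoding not in ("unicode", "ascii"):
        raise ValueError("encoding must be 'unicode' or 'ascii'")
    return f"{BIOC_BASE}/BioC_{fmt}/{article_id}/{encoding}"


def fetch_bioc(article_id, fmt="xml", encoding="ascii"):
    """Download the BioC document as text (performs a network request)."""
    with urlopen(bioc_url(article_id, fmt, encoding)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Same article as the example request above.
    print(bioc_url("19088134"))
```

Calling `fetch_bioc("19088134")` should return the same document as the example URL, provided the service is reachable from your network.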

Is there an easy way in Java to find out the author and/or the computer that produced a certain file (e.g. an image)?

Now that my tool can load and display images as well as their metadata (using metadata-extractor), I'd also like to see who made the file and the name of that person's computer.
The goal is to change or delete the original information.
I guess reading the information is not necessary if I could generate a new image that looks exactly like the source file but carries new, changed metadata.
Example:
Image A is owned by John Smith and was produced on John's computer.
Now I want to make an image B that looks like image A but says it was made by Cate Smith on Cate's computer.
Hope someone can help!
Thanks!

NCBI gene database question

I'm trying to find the gene_info file with gene names and chromosomal locations. However, I can't seem to locate it on the NCBI FTP site. Can anyone give me a pointer?
See: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/README for details of what is in what files at the NCBI ftp site.
If you want to get the data from NCBI itself, you will need to combine multiple files, probably gene2accession (which also includes position information) and gene_info, which maps IDs to symbols, names, etc.
It is probably more convenient to go to the UCSC site for this information; they also provide a public MySQL database if you want to explore what is available:
http://workshops.arl.arizona.edu/sql1/sql_workshop/mysql/mysqlclient.html
If you just want human, mouse or rat data then the Rat Genome Database has already compiled the data you want (fresh from the NCBI and Ensembl sources):
ftp://rgd.mcw.edu/pub/data_release
e.g. for human data look at: ftp://rgd.mcw.edu/pub/data_release/GENES_HUMAN.txt
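Once a gene_info file has been downloaded, pulling symbols and locations out of it could look like the Python sketch below. It assumes a tab-delimited file whose first line is a header starting with "#", and column names Symbol, chromosome and map_location; these are assumptions to verify against the README mentioned above. The sample text is a cut-down illustration with only a few columns, not a copy of the real file, and the function name is made up.

```python
import csv
import io


def read_gene_locations(lines):
    """Map gene symbol -> (chromosome, map_location) from gene_info-style text.

    `lines` is any iterable of text lines, e.g. an open file object.
    Column names are read from the header line, so extra columns are ignored.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line is assumed to start with '#'
    locations = {}
    for row in reader:
        record = dict(zip(header, row))
        locations[record["Symbol"]] = (record["chromosome"], record["map_location"])
    return locations


# Illustrative sample with a reduced set of columns (not the full file layout).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tchromosome\tmap_location\n"
    "9606\t7157\tTP53\t17\t17p13.1\n"
)

if __name__ == "__main__":
    print(read_gene_locations(io.StringIO(SAMPLE)))
```

For a real download you would pass an open file instead of the sample, for example one opened with `gzip.open(path, "rt")` if the file is compressed.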

Searching location in a sentence

I'm working on a location extraction algorithm but haven't achieved anything considerable yet. For example in this sentence
Riders on the B and Q lines will get some relief from construction as stations reopen, and a major project will soon begin at the Dyckman Street station.
"Dyckman Street" is the location information. How can we extract this information from a given sentence? (I tried extracting the words that start with a capital letter and looking them up in a database of city names, but that doesn't always work.)
Where can I find an algorithm to extract this information?
Thanks.
I remember having seen this library when I was playing with Named Entity Recognition.
This Google search might be a useful source of information as well.
There are also a number of web services designed to parse geo-locations from text. For example Yahoo's PlaceMaker service at http://developer.yahoo.com/geo/placemaker/
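For comparison, the capitalized-word lookup described in the question can be sketched in a few lines of Python. This is only a baseline under stated assumptions: the gazetteer below is a hypothetical stand-in for a real place-name database, the function name is made up, and the approach misses lowercase or unknown place names, which is why a Named Entity Recognition library or one of the web services above is usually the better choice.

```python
import re

# Hypothetical gazetteer; a real one would come from a place-name database.
GAZETTEER = {"dyckman street", "new york"}

# A run of one or more consecutive words that each start with a capital letter.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")


def extract_locations(sentence, gazetteer=GAZETTEER):
    """Return capitalized word runs that appear in the gazetteer."""
    candidates = CAPITALIZED_RUN.findall(sentence)
    return [c for c in candidates if c.lower() in gazetteer]


if __name__ == "__main__":
    sentence = (
        "Riders on the B and Q lines will get some relief from construction "
        "as stations reopen, and a major project will soon begin at the "
        "Dyckman Street station."
    )
    print(extract_locations(sentence))
```

Matching whole runs of capitalized words, rather than single words, is what lets a two-word name such as "Dyckman Street" be looked up as one unit.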
