Is it possible to transform PMC IDs (PubMed Central IDs) to PMIDs (PubMed IDs) via an NCBI API? You can do it via the web form, but I would like to use a program - of course I can always write a screen scraper ... thanks
You can convert PubMed Central IDs to PubMed IDs with EFetch, from the NCBI Entrez Programming Utilities (E-utilities). You can use EFetch from any programming language that can read data over HTTP and parse XML.
For example, if one of the articles in your list is:
Wang TT, et al. J Biol Chem. 2010 Jan 22;285(4):2227-31.
PubMed PMID: 19948723 PubMed Central PMCID: PMC2807280
You can get an XML document from the following EFetch url:
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=2807280&rettype=medline&retmode=xml"
The XML document contains the PubMed ID:
<pmc-articleset>
<article>
<front>
<article-meta>
<article-id pub-id-type="pmc">2807280</article-id>
<article-id pub-id-type="pmid">19948723</article-id>
One way to convert a PMCID to a PMID in Perl is:
#!/usr/bin/perl
# pmcid2pmid.pl -- convert a pubmed central id to a pubmed id with EFetch
# http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchlit_help.html
use strict;
use warnings;
use LWP::UserAgent; # send request to eutils.ncbi.nlm.nih.gov
use XML::Smart;     # parse response

# check parameter
my ($id) = @ARGV;
if ( not(defined($id)) ) {
    print STDERR "must provide a pmcid as 1st parameter...\n";
    exit(-1);
}
$id =~ s/PMC//;

sleep(3); # recommended delay between queries

# build & send efetch query
my $efetch       = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
my $efetch_query = "db=pmc&id=$id&rettype=medline&retmode=xml";
my $url          = $efetch . $efetch_query;
my $xml          = XML::Smart->new($url);
##print $xml->dump_tree(),"\n";

# parse the response
$xml = $xml->{'pmc-articleset'}->{'article'}->{'front'}{'article-meta'};
my $pmid = $xml->{'article-id'}('pub-id-type','eq','pmid')->content;
print STDOUT "PMID = $pmid\n";
>perl pmcid2pmid.pl PMC2807280
PMID = 19948723
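If you'd rather not depend on XML::Smart, a roughly equivalent sketch using LWP::Simple and XML::LibXML (my own substitution, not part of the answer above; it assumes both modules are installed and relies on the pmc-articleset structure shown earlier) would be:

#!/usr/bin/perl
# pmcid2pmid_libxml.pl -- same conversion, sketched with XML::LibXML
use strict;
use warnings;
use LWP::Simple qw(get);   # fetch the EFetch response
use XML::LibXML;           # parse it with XPath

my $id = shift @ARGV or die "must provide a pmcid as 1st parameter...\n";
$id =~ s/^PMC//;

my $url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
        . "db=pmc&id=$id&rettype=medline&retmode=xml";
my $xml = get($url) or die "could not fetch $url\n";

my $doc  = XML::LibXML->load_xml( string => $xml );
my $pmid = $doc->findvalue('//article-id[@pub-id-type="pmid"]');
print "PMID = $pmid\n";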
The following is on macOS High Sierra with Perl v5.28.2. The version of Net::MAC::Vendor is 1.265. This is sort of a repeat post; my prior post lacked a lot of detail.
I am trying to put together a script that lists IP addresses, MAC addresses, and vendors on a network. The Net::MAC::Vendor::lookup function is returning a timeout error, amongst other things. I have checked a few IEEE links that were supposed to have the OUI data, but they are all dead, returning no data. I have seen a number of mentions stating the file can be found in some installations of Linux. I have searched high and low and have not found an oui.txt file on my system. If I downloaded a copy, I wouldn't know where to put it or how to get the Net::MAC::Vendor functions to locate it. Also, if I did find a link, I still wouldn't know how to direct the vendor lookup function to use it.
The errors I am getting are as follows:
Use of uninitialized value in concatenation (.) or string at /Users/{username}/perl5/perlbrew/perls/perl-5.28.2/lib/site_perl/5.28.2/Net/MAC/Vendor.pm line 320.
Failed fetching [https://services13.ieee.org/RST/standards-ra-web/rest/assignments/download/?registry=MA-L&format=html&text=D8-D7-75] HTTP status []
message [Connect timeout] at simplemacvendor.pl line 23.
Could not fetch data from the IEEE! at simplemacvendor.pl line 23.
The sample code:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
use Data::Dumper qw(Dumper);
use Net::MAC::Vendor;

open(ARP, "arp -na|") || die "Failed $!\n";
my @arp_table;
while (<ARP>) {
    if ($_ =~ m/incomplet/) {next;}
    if ($_ =~ m/Address/)   {next;}
    my @line = split(' ', $_);
    my $computer = {};
    $line[1] =~ s/(\()([0-9\.]*)(\))/$2/;
    $computer->{ip}  = $line[1];
    $computer->{mac} = $line[3];
    $computer->{if}  = $line[5];
    say Dumper($computer);
    # Get vendor info
    my $vendor_info = Net::MAC::Vendor::lookup( $computer->{mac} ); # line 23
    $computer->{vendor} = $vendor_info->[0];
    push @arp_table, $computer;
}
print "ARP Table with vendors:\n";
for my $i (0 .. $#arp_table) {
    print "$arp_table[$i]{ip}\t";
    print "$arp_table[$i]{if}\t";
    print "$arp_table[$i]{mac}\t";
    print "$arp_table[$i]{vendor}";
    print "\n";
}
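If you do manage to download a copy of oui.txt, Net::MAC::Vendor documents a load_cache function for seeding its lookup table from a local source instead of the IEEE site. A minimal sketch; the file location and the exact source format it accepts are assumptions to check against the module's documentation for your version:

#!/usr/bin/perl
use strict;
use warnings;
use Net::MAC::Vendor;

# Assumption: a local copy of the IEEE OUI data saved as ./oui.txt
my $oui_file = 'oui.txt';

# load_cache() takes a source to preload the OUI table from; whether it
# wants a plain path or a file:// URL is an assumption worth verifying
# in the Net::MAC::Vendor docs for the installed version.
Net::MAC::Vendor::load_cache("file://$oui_file");

# Subsequent lookups should then be answered from the preloaded cache.
my $info = Net::MAC::Vendor::lookup('D8:D7:75:00:00:01');
print $info->[0], "\n" if $info;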
I have a mining application that shows the data I need, but the application doesn't have an API to grab it. How can I extract the string and parse the data with PowerShell or equivalent?
The application drops a line like the one below every second:
ID(grabbed from board) hash:(label) hashrate(variable) errors:(label) #(variable) temp(variable) volts(variable) solutions:(label) #(variable) shares:(label) #(variable)
Example:
ABC1000234 hash: 9.8Gh/s errors: 0.000% 26.3C 0.74V solutions: 539/539
shares: 33
I need the hashrate, temp and volts - or, even better, a way to send every string out to a port I can listen on with a query like "strings". If I could get the string to post to a port such as 4068, then I could use PowerShell and netcat to listen on http://127.0.0.1:4068.
Here is what I was going to do in PowerShell:
$serveraddress = '127.0.0.1'
$serverport = '4068'
$stringsinfo = echo 'strings' | nc $serveraddress $serverport
$mineridstring = $stringsinfo.Split(';')[1]
$minerid = $mineridstring.Split('=')[0]
$hashstring = $stringsinfo.Split(';')[2]
$hashrate = $hashstring.Split('=')[1]
$tempstring = $stringsinfo.Split(';')[4]
$tempc = $tempstring.Split('=')[0]
$voltstring = $stringsinfo.Split(';')[5]
$volts = $voltstring.Split('=')[0]
Invoke-RestMethod -Uri https://www.rigmanager.xyz/rig.php -Method Post `
    -Body @{minerid = $minerid; hashrate = $hashrate; tempc = $tempc; volts = $volts} -UseBasicParsing
Push them to a message queue, and then you can subscribe any number of users/applications to that stream.
Check out Apache Kafka or any of the cloud-based equivalents on AWS, IBM Cloud, GCP, etc.
Parsing your string is something regex can handle, although unless you need the data indexed for querying/searching, you can push that off to the end user/application and just serve them the whole message.
An easy way to do this is with named captures in a regex.
PS C:\src\t> type exttext.ps1
$s = 'ABC1000234 hash: 9.8Gh/s errors: 0.000% 26.3C 0.74V solutions: 539/539 shares: 33'
$doesit = $s -match '^(?<id>.*) hash: (?<hashrate>.*) errors: (?<errors>[0-9.]+%) (?<temp>.*) (?<volts>.*) solutions: .* shares: \d+$'
$Matches.id
$Matches.hashrate
$Matches.errors
$Matches.temp
$Matches.volts
PS C:\src\t> .\exttext.ps1
ABC1000234
9.8Gh/s
0.000%
26.3C
0.74V
I simply want to have a REST API server that I can call to update a file via a URL, that's it.
Here is the file:
mytextfile:
key1 = value1
key2 = value2
On the client, a script will be run which sends a string or strings to the API server.
The API server will receive them, for example /update.script?string1="blah"&string2="fun" (pretend it's URL-encoded).
The server should then parse these strings and call an exec function, or even another script on the system, which runs some sed command to update a file.
Language or implementation doesn't matter.
Looking for fresh ideas.
All suggestions are appreciated.
I don't get it: What exactly is your problem/question?
My approach to the problem "modifying a file from inside a cgi script using url-encoded arguments" would be:
Pick a language you like and start coding, in my case with Perl.
#!/usr/bin/perl
use strict; use warnings;
Fetch all your arguments. I will use the CGI module of Perl here:
use CGI::Carp;
use CGI;
my $cgi = CGI->new;
# assuming we don't have multivalued fields:
my %arguments = $cgi->Vars; # handles (almost) *all* decoding and splitting
# validate arguments
# send back CGI header to acknowledge the request
# the server will make a HTTP header from that
Now either call a special subroutine / function with them …
updateHandler(%arguments);
...;
my $filename = 'path to yer file name.txt';
sub updateHandler {
    my %arguments = @_;
    # open yer file, loop over yer arguments, whatever

    # read in file
    open my $fileIn, '<', $filename or die "Can't open file for reading";
    my @lines = <$fileIn>;
    close $fileIn;

    # open the file for writing, completely ignoring concurrency issues:
    open my $fileOut, '>', $filename or die "Can't open file for writing";

    # loop over all lines, make substitutions, and print it out
    foreach my $line (@lines) {
        # assuming a file format with key-value pairs
        # keys start at the first column
        # and are separated from values by an '=',
        # surrounded by any number of whitespace characters
        my ($key, $value) = split /\s*=\s*/, $line, 2;
        $value = $arguments{$key} // $value;
        # you might want to make sure $value ends with a newline
        print $fileOut $key, " = ", $value;
    }
}
Please don't use this rather insecure and suboptimal code! I just wrote this as a demonstration that this isn't really complicated.
… or contrive a way to send your arguments to another script (although Perl is more than well suited for file manipulation tasks). Choose one of the qx{} (backticks), system or exec commands, depending on what output you need from your script, or decide to pipe your arguments to the script using the open my $fh, '|-', $command mode of open, as sketched below.
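A minimal sketch of that pipe-open variant, reusing the %arguments hash from above; the helper script name and the key=value line format are made up for illustration:

# Assumption: a hypothetical helper script that reads key=value lines on stdin
my $command = '/usr/local/bin/update-file.sh';

open my $fh, '|-', $command
    or die "Cannot start $command: $!";
# feed the decoded arguments to the child script, one per line
print {$fh} "$_=$arguments{$_}\n" for keys %arguments;
close $fh or warn "Child exited with status $?";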
As for the server to run this script on: Apache looks fine to me, unless you have very special needs (your own protocol, single-threading, low security, low performance), in which case you might want to code your own server. Using the HTTP::Daemon module you might manage <50 lines for a simplistic server.
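A bare-bones HTTP::Daemon loop, just to show the shape of it (single-threaded, no security; the port and path are my own placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Status;
use HTTP::Response;

my $d = HTTP::Daemon->new( LocalPort => 8080 ) or die "Cannot listen: $!";
print "Listening at ", $d->url, "\n";

while ( my $c = $d->accept ) {
    while ( my $r = $c->get_request ) {
        if ( $r->method eq 'GET' and $r->uri->path eq '/update.script' ) {
            my %arguments = $r->uri->query_form;   # decodes the query string
            # ... update the file, e.g. with updateHandler(%arguments) from above ...
            $c->send_response( HTTP::Response->new( 200, 'OK', undef, "updated\n" ) );
        }
        else {
            $c->send_error(RC_NOT_FOUND);
        }
    }
    $c->close;
}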
When using Apache, I'd strongly suggest using mod_rewrite to put the /path into the PATH_INFO environment variable. When using one script to represent your whole REST API, you could use the PATH_INFO to choose one of many methods/subroutines/functions. This also eliminates the need to name the script in the URL.
For example, turn the URL
http://example.com/rest/modify/filename?key1=value1
into
/cgi-bin/rest-handler.pl/modify/filename?key1=value1
Inside the Perl script, we would then have $ENV{PATH_INFO} containing /modify/filename, which you can dispatch on as sketched below.
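A small dispatch sketch along those lines; the action names and the showHandler sub are invented for illustration:

# Assumption: PATH_INFO looks like /<action>/<filename>
my ( $action, $file ) = grep { length } split m{/}, ( $ENV{PATH_INFO} // '' );
$action //= '';

if    ( $action eq 'modify' ) { updateHandler(%arguments) }   # from the code above
elsif ( $action eq 'show'   ) { showHandler($file) }          # hypothetical handler
else                          { die "Unknown action '$action'\n" }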
This is a bit Perl-centric, but just pick any language you are comfortable with and start coding, leveraging whatever module you can use on the way.
I would use a newer Perl framework, like Mojolicious. If I make a file (test.pl):
#!/usr/bin/env perl
use Mojolicious::Lite;
use Data::Dumper;

my $file = 'file.txt';

any '/' => sub {
    my $self   = shift;
    my @params = $self->param;
    my $data   = do $file;
    $data->{$_} = $self->param($_) for @params;
    open my $fh, '>', $file or die "Cannot open $file";
    local $Data::Dumper::Terse = 1;
    print $fh Dumper $data;
    $self->render( text => "File Updated\n" );
};

app->start;
Then run morbo test.pl
and visit http://localhost:3000/?hello=world (or run ./test.pl get /?hello=world)
then I get in file.txt:
{
'hello' => 'world'
}
and so on.
I'm writing a program in Ruby that downloads a file from an RSS feed to my local hard drive. Previously, I'd written this application in Perl and figured a great way to learn Ruby would be to recreate this program using Ruby code.
In the Perl program (which works), I was able to download the original file directly from the server it was hosted on (keeping the original file name) and it worked great. In the Ruby program (which isn't working), I have to sort of "stream" the data from the file I want into a new file that I've created on my hard drive. Unfortunately, this isn't working and the "streamed" data is always coming back empty. My assumption is that there is some sort of redirect that Perl can handle to retrieve the file directly that Ruby cannot.
I'm going to post both programs (they're relatively small) and hope that this helps solve my issue. If you have questions, please let me know. As a side note, I pointed this program at a more static URL (a jpeg) and it downloaded the file just fine. This is why I'm theorizing that some sort of redirect is causing issues.
The Ruby Code (That Doesn't Work)
require 'net/http';
require 'open-uri';
require 'rexml/document';
require 'sqlite3';
# Create new SQLite3 database connection
db_connection = SQLite3::Database.new('fiend.db');
# Make sure I can reference records in the query result by column name instead of index number
db_connection.results_as_hash = true;
# Grab all TV shows from the shows table
query = '
SELECT
id,
name,
current_season,
last_episode
FROM
shows
ORDER BY
name
';
# Run through each record in the result set
db_connection.execute(query) { |show|
  # Pad the current season number with a zero for later use in a search query
  season = '%02d' % show['current_season'].to_s;
  # Calculate the next episode number and pad with a zero
  next_episode = '%02d' % (Integer(show['last_episode']) + 1).to_s;
  # Store the name of the show
  name = show['name'];
  # Generate the URL of the RSS feed that will hold the list of torrents
  feed_url = URI.encode("http://btjunkie.org/rss.xml?query=#{name} S#{season}E#{next_episode}&o=52");
  # Generate a simple string that denotes the show, season and episode number being retrieved
  episode_id = "#{name} S#{season}E#{next_episode}";
  puts "Loading feed for #{name}..";
  # Store the response from the download of the feed
  feed_download_response = Net::HTTP.get_response(URI.parse(feed_url));
  # Store the contents of the response (in this case, XML data)
  xml_data = feed_download_response.body;
  puts "Feed Loaded. Parsing items.."
  # Create a new REXML Document and pass in the XML from the Net::HTTP response
  doc = REXML::Document.new(xml_data);
  # Loop through each item in the feed
  doc.root.each_element('//item') { |item|
    # Find and store the URL of the torrent we wish to download
    torrent_url = item.elements['link'].text + '/download.torrent';
    puts "Downloading #{episode_id} from #{torrent_url}";
    ## This is where crap stops working
    # Open Connection to the host
    Net::HTTP.start(URI.parse(torrent_url).host, 80) { |http|
      # Create a torrent file to dump the data into
      File.open("#{episode_id}.torrent", 'wb') { |torrent_file|
        # Try to grab the torrent data
        data = http.get(torrent_url[19..torrent_url.size], "User-Agent" => "Mozilla/4.0").body;
        # Write the data to the torrent file (the data is always coming back blank)
        torrent_file.write(data);
        # Close the torrent file
        torrent_file.close();
      }
    }
    break;
  }
}
The Perl Code (That Does Work)
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 });
my $userAgent = new LWP::UserAgent; # Create new user agent
$userAgent->agent("Mozilla/4.0"); # Spoof our user agent as Mozilla
$userAgent->timeout(20); # Set timeout limit for request
my $currentTag = ""; # Stores what tag is currently being parsed
my $torrentUrl = ""; # Stores the data found in any node
my $isDownloaded = 0; # 1 or zero that states whether or not we've downloaded a particular episode
my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name");
my $id = 0;
my $name = "";
my $season = 0;
my $last_episode = 0;
foreach my $show (@$shows) {
    $isDownloaded = 0;
    ($id, $name, $season, $last_episode) = (@$show);
    $season = sprintf("%02d", $season); # Append a zero to the season (e.g. 6 becomes 06)
    $last_episode = sprintf("%02d", ($last_episode + 1)); # Append a zero to the last episode (e.g. 6 becomes 06) and increment it by one
    print("Checking $name S" . $season . "E" . "$last_episode \n");
    my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52"); # Retrieve the torrent feed
    my $rssFeed = $userAgent->request($request); # Store the feed in a variable for later access
    if($rssFeed->is_success) { # We retrieved the feed
        my $parser = new XML::Parser(); # Make a new instance of XML::Parser
        $parser->setHandlers # Set the functions that will be called when the parser encounters different kinds of data within the XML file.
        (
            Start => \&startHandler, # Handles start tags (e.g. <item>)
            End   => \&endHandler,   # Handles end tags (e.g. </item>)
            Char  => \&DataHandler   # Handles data inside of start and end tags
        );
        $parser->parsestring($rssFeed->content); # Parse the feed
    }
}
#
# Called every time XML::Parser encounters a start tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub startHandler {
    my($parseInstance, $element, %attributes) = @_;
    $currentTag = $element;
}
#
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e., all the data in between tags)
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub DataHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($currentTag eq "link" && $element ne "\n") {
        $torrentUrl = $element;
    }
}
#
# Called every time XML::Parser encounters an end tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub endHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($element eq "item" && $isDownloaded == 0) { # We just finished parsing an <item> element, so let's attempt to download a torrent
        print("DOWNLOADING: $torrentUrl" . "/download.torrent \n");
        system("echo.|lwp-download " . $torrentUrl . "/download.torrent"); # We echo the "return" key into the command to force it to skip any file-overwrite prompts
        if(unlink("download.torrent.html")) { # We tried to download a 'locked' torrent
            $isDownloaded = 0; # Forces program to download next torrent on list from current show
        }
        else {
            $isDownloaded = 1;
            $dbh->do("UPDATE shows SET last_episode = '$last_episode' WHERE id = '$id'"); # Update DB with new show information
        }
    }
}
Yes, the URLs you are retrieving appear to be returning a 302 (redirect). Net::HTTP requires/allows you to handle the redirect yourself. You typically use a recursive technique like AboutRuby mentioned (although this http://www.ruby-forum.com/topic/142745 suggests you should not only look at the 'Location' field but also for a META REFRESH in the response).
open-uri will handle redirects for you if you're not interested in the low-level interaction:
require 'open-uri'
File.open("#{episode_id}.torrent", 'wb') {|torrent_file| torrent_file.write open(torrent_url).read}
get_response will return a class from the HTTPResponse hierarchy. It's usually HTTPSuccess, but if there's a redirect, it will be HTTPRedirection. A simple recursive method that follows redirects can solve this. How to handle this correctly is in the docs under the heading "Following Redirection."
Preferably I'd like to do so with some bash shell scripting, maybe some PHP or Perl, and a MySQL DB. Thoughts?
Here is a solution using Perl, with the help of (of course!) a bunch of modules.
It uses SQLite so you can run it easily (the definition of the (simplistic) DB is at the end of the script). Also it uses Perl hashes and simple SQL statements, instead of proper objects and an ORM layer. I found it easier to parse the XML directly instead of using an RSS module (I tried XML::Feed), because you need access to specific tags (name, preview...).
You can use it as a basis to add more features, more fields in the DB, a table for genre... but at least this way you have a basis that you can expand on (and maybe you can then publish the result as open-source).
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig; # to parse the RSS
use DBIx::Simple; # DB interaction made easy
use Getopt::Std; # always need options for a script
use PerlIO::gzip; # itunes sends a gzip-ed file
use LWP::Simple 'getstore'; # to get the RSS
my %opt;
getopts( 'vc:', \%opt);
# could also be an option, but I guess it won't change that much
my @URLs= (
    'http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topsongs/limit=10/xml',
);
# during debug, it's nice to use a cache of the feed instead of hitting it every single run
if( $opt{c}) { @URLs= ($opt{c}); }
# I like using SQLite when developing,
# replace with MySQL connect parameters if needed (see DBD::mysql for the exact syntax)
my @connect= ("dbi:SQLite:dbname=itunes.db","","", { RaiseError => 1, AutoCommit => 0 }) ;
my $NS_PREFIX='im';
# a global, could be passed around, but would make the code a bit more verbose
my $db = DBIx::Simple->connect(@connect) or die "cannot connect to DB: $DBI::errstr";
foreach my $url (@URLs)
{ add_feed( $url); }
$db->disconnect;
warn "done\n" if( $opt{v});
sub add_feed
{ my( $url)= @_;

    # itunes sends gzipped RSS, so we need to unzip it
    my $tempfile= "$0.rss.gz"; # very crude, should use File::Temp instead
    getstore($url, $tempfile);
    open( my $in_feed, '<:gzip', $tempfile) or die " cannot open tempfile: $!";

    XML::Twig->new( twig_handlers => { 'feed/title' => sub { warn "adding feed ", $_->text if $opt{v}; },
                                       entry        => \&entry,
                                     },
                    map_xmlns => { 'http://phobos.apple.com/rss' => $NS_PREFIX },
                  )
             ->parse( $in_feed);
    close $in_feed;
}

sub entry
{ my( $t, $entry)= @_;

    # get the data
    my %song= map { $_ => $entry->field( "$NS_PREFIX:$_") } qw( name artist price);
    if( my $preview= $entry->first_child( 'link[@title="Preview"]') )
      { $song{preview}= $preview->att( 'href'); }

    # $db->begin_work;

    # store it
    if( ($db->query( 'SELECT count(*) FROM song WHERE name=?', $song{name})->flat)[0])
      { warn " skipping $song{name}, already stored\n" if $opt{v};
      }
    else
      {
        warn " adding $song{name}\n" if $opt{v};
        if( my $artist_id= ($db->query( 'SELECT id from ARTIST where name=?', $song{artist})->flat)[0])
          { warn " existing artist $song{artist} ($artist_id)\n" if $opt{v};
            $song{artist}= $artist_id;
          }
        else
          { warn " creating new artist $song{artist}\n" if $opt{v};
            $db->query( 'INSERT INTO artist (name) VALUES (??)', $song{artist});
            # should be $db->last_insert_id but that's not available in DBD::SQLite at the moment
            $song{artist}= $db->func('last_insert_rowid');
          }
        $db->query( 'INSERT INTO song ( name, artist, price, preview) VALUES (??)',
                    @song{qw( name artist price preview)});
        $db->commit;
      }
    $t->purge; # keeps memory usage lower, probably not needed for small RSS files
}
__END__
=head1 NAME
itunes2db - loads itunes RSS feeds to a DB
=head1 OPTIONS
-c <file> uses a cache instead of the list of URLs
-v verbose
=head1 DB schema
create table song ( id INT PRIMARY KEY, name TEXT, artist INT, price TEXT, preview TEXT);
create table artist (id INT PRIMARY KEY, name TEXT);
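If you point the script at MySQL instead of SQLite, only the @connect list needs to change. A sketch, where the database name, user, and password are placeholders (see DBD::mysql for the full DSN syntax):

my @connect= ( "DBI:mysql:database=itunes;host=localhost",
               "dbuser", "dbpassword",
               { RaiseError => 1, AutoCommit => 0 } );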
From what I can tell, it's not actively maintained, but Scriptella could be of some assistance. Very simple XML scripts, running on Java.
Example of how to suck RSS into a database:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<connection id="in" driver="xpath" url="http://snippets.dzone.com/rss"/>
<connection id="out" driver="text" url="rss.txt"/>
<connection id="db" driver="hsqldb" url="jdbc:hsqldb:db/rss" user="sa" classpath="hsqldb.jar"/>
<script connection-id="db">
CREATE TABLE Rss (
ID Integer,
Title VARCHAR(255),
Description VARCHAR(255),
Link VARCHAR(255)
)
</script>
<query connection-id="in">
/rss/channel/item
<script connection-id="out">
Title: $title
Description: [
${description.substring(0, 20)}...
]
Link: $link
----------------------------------
</script>
<script connection-id="db">
INSERT INTO Rss (ID, Title, Description, Link)
VALUES (?rownum, ?title, ?description, ?link);
</script>
</query>
</etl>
Well, I'm not really sure what sort of answer you're looking for, but I don't think you need to do any sort of shell scripting. Both PHP and Perl would be perfectly capable of downloading the RSS feed and inserting the data into MySQL. Set the PHP or Perl script up to run every X number of hours/days/whatever with a cronjob and you'd be done.
Not really much else to tell you, given how vague your question was.
I'm scraping Stack Overflow's feed to perform some additional filtering using PHP's DOMDocument and then DOM methods to access what I want. I'd suggest looking into that.