I am trying to embed a link to an external website in a PDF file generated with iText.
Here's the code:
Phrase modul = new Phrase ("some text");
Chunk www = new Chunk ("www.arzneimittelinfoservice.de");
www.setAction(new PdfAction(new URL("http://www.arzneimittelinfoservice.de")));
Phrase xref = new Phrase(www);
Phrase link = new Phrase("goto link: " + xref);
...
Problem: in the resulting PDF document the link refers to http://www.arzneimittelinfoservice.de] and I can't get rid of the closing square bracket.
Perhaps someone can help me with this.
Thanks, Frank
Trying your code, I got the following result:
goto link:
[www.arzneimittelinfoservice.de]
Adding the www chunk as such, I got the following result:
goto link
:www.arzneimittelinfoservice.de
You do this by adding the chunk to the phrase, not by converting it to a string, which is what the "+" does in the "goto link: " + xref part.
...
Phrase link = new Phrase("goto link: ");
link.add(www);
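(The stray bracket comes from the phrase's list-like string form: the "+" stringifies the whole list, brackets and all. The same pitfall written in Python, purely for intuition:)
# Concatenating a string with a list's printed form drags the brackets along:
link = "goto link: " + str(["www.arzneimittelinfoservice.de"])
print(link)  # goto link: [www.arzneimittelinfoservice.de]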
Regards
Guillaume
[Disclaimer: I published this question three weeks ago on Biostars, with no answers yet. I would really like to get some ideas/discussion towards a solution, so I am also posting it here. Biostars post: https://www.biostars.org/p/447413/]
For one of my PhD projects, I would like to access all variants found in the ClinVar database that are at the same genomic position as the variant in each row of an input GSvar file. The language constraint is Python.
Up to now I have used the entrezpy module, entrezpy.esearch.esearcher. Please see more on entrezpy at: https://entrezpy.readthedocs.io/en/master/
From the entrezpy docs I followed this guide to fetch UIDs using the genomic position of a variant: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html. In code:
# first get UIDs for ClinVar records at the same position
# credits: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html
chr = variants["chr"].split("chr")[1]
start, end = str(variants["start"]), str(variants["end"])
es = entrezpy.esearch.esearcher.Esearcher('esearcher', self.entrez_email)
genomic_pos = chr + "[chr]" + " AND " + start + ":" + end # + "[chrpos37]"
entrez_query = es.inquire(
    {'db': 'clinvar',
     'term': genomic_pos,
     'retmax': 100000,
     'retstart': 0,
     'rettype': 'uilist'})  # 'usehistory': False
entrez_uids = entrez_query.get_result().uids
Then I used Entrez from Biopython to fetch the available ClinVar records:
from Bio import Entrez
import xml.etree.ElementTree as ET

# process each VariationArchive of each UID
handle = Entrez.efetch(db='clinvar', id=current_entrez_uids, rettype='vcv')
clinvar_records = {}
tree = ET.parse(handle)
root = tree.getroot()
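(The record processing then walks the parsed tree; a minimal sketch, where the element and attribute names, VariationArchive and Accession, are my reading of the VCV XML and should be checked against a real response:)
for va in root.iter('VariationArchive'):
    # assumed: each VariationArchive element carries an Accession attribute
    clinvar_records[va.get('Accession')] = va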
This approach is working. However, I have two main drawbacks:
1) entrezpy fills up my log file, recording every interaction with Entrez and making the log too big to be read by the hospital collaborator, who is a variant curator.
2) The entrezpy call entrez_query.get_result().uids returns all UIDs retrieved so far across all requests (say, one request per variant in the GSvar file), so the retrieval is space-inefficient: the entrez_uids list grows quickly as I process all variants of a GSvar file. The simple solution I have implemented is to check which UIDs are new in the current request and keep only those for Entrez.efetch(). However, I still need to keep all UIDs seen for previous variants in order to know which UIDs are new. I do this in code by:
# first snippet's first lines go here
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file += current_entrez_uids
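(One direction I have considered is the following sketch; the entrezpy logger name is an assumption on my part and should be checked against the installed version, and note that the set still stores every UID seen, it only makes the bookkeeping cheaper:)
import logging

# Assumption: entrezpy logs through the standard logging module under an
# 'entrezpy' logger; if so, raising the level silences the per-request chatter.
logging.getLogger('entrezpy').setLevel(logging.WARNING)

# A set makes the "seen before?" test O(1) and deduplicates for free,
# though it still holds every UID seen so far.
self.all_entrez_uids_gsvar_file = set()  # initialised once per GSvar file
...
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids
                       if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file.update(current_entrez_uids)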
Does anyone have suggestion(s) on how to address these two presented drawbacks?
I'm trying to process this image provided by the Messenger Platform API (send-api-reference).
I used:
url = "https://scontent-lht6-1.xx.fbcdn.net/v/t34.0-12/20916840_10214193209010537_198030613_n.jpg?_nc_ad=z-m&oh=3eab9a3a400c7e05fb5b74c391852426&oe=5998B9A8"
#app.route('/photobot/<path:photo_url>')
def tensor_photobot(photo_url):
file = cStringIO.StringIO(urllib.urlopen(photo_url).read())
img = Image.open(file)
if img:
list_elements = process_image(img)
return json.dumps(list_elements)
But the image is not recognized. Any idea?
Message:
{u'mid': u'mid.$cAAbv-uhIfdVkIn9OVld8TqA6u2Hz', u'seq': 40125,
u'attachments': [{u'type': u'image', u'payload': {u'url':
u'https://scontent-lht6-1.xx.fbcdn.net/v/t34.0-12/20916840_10214193209010537_198030613_n.jpg?_nc_ad=z-m&oh=3eab9a3a400c7e05fb5b74c391852426&oe=5998B9A8'}}]}
Reference (Python 2.x): https://developers.facebook.com/docs/messenger-platform/send-api-reference/image-attachment
Edit: following the recommendations in the comments, I found that the problem comes from the URL string being truncated.
I added all the implementation for more context.
From my comment, in case the answer is needed by anyone in the future: the query string is being truncated from the URL. To load the image, the entire URL, including the query string, is required.
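A minimal sketch of one way to re-attach the query string inside the handler (this assumes the route from the question stays as-is; the rest of the body continues exactly as in the question):
from flask import Flask, request

app = Flask(__name__)

@app.route('/photobot/<path:photo_url>')
def tensor_photobot(photo_url):
    # The <path:...> converter only captures up to the '?'; everything
    # after it lands in request.query_string, so re-attach it here.
    if request.query_string:
        photo_url = photo_url + '?' + request.query_string
    # ... continue as in the question's handler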
I want to retrieve BibTeX data (for building a bibliography) by sending a DOI (Digital Object Identifier) to http://www.crossref.org from within MATLAB.
The crossref API suggests something like this:
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1038/nrd842
based on this source.
Another example from here suggests the following in ruby:
open("http://dx.doi.org/10.1038/nrd842","Accept" => "text/bibliography; style=bibtex"){|f| f.each {|line| print line}}
Although I've heard Ruby rocks, I want to do this in MATLAB, and I have no clue how to translate the Ruby snippet or how to interpret the crossref curl command.
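(For what it's worth, here is my understanding of what both commands do, written out in Python purely to illustrate the HTTP mechanism; this is not the MATLAB solution I'm after:)
import requests

# Content negotiation against the DOI resolver: the Accept header asks
# for a bibliography entry in bibtex style; allow_redirects mirrors
# curl's -L flag, since dx.doi.org redirects onward.
response = requests.get(
    "http://dx.doi.org/10.1038/nrd842",
    headers={"Accept": "text/bibliography; style=bibtex"},
    allow_redirects=True,
)
print(response.text)  # the BibTeX record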
The following is what I have so far to send a DOI to crossref and retrieve data in XML format (in the variable retdat), but not BibTeX:
clear
clc
doi = '10.1038/nrd842';
URL_PATTERN = 'http://dx.doi.org/%s';
fetchurl = sprintf(URL_PATTERN,doi);
numinputs = 1;
www = java.net.URL(fetchurl);
is = www.openStream;
%Read stream of data
isr = java.io.InputStreamReader(is);
br = java.io.BufferedReader(isr);
%Parse return data
retdat = [];
next_line = toCharArray(br.readLine)'; %First line contains headings, determine length
%Loop through data
while ischar(next_line)
    retdat = [retdat, 13, next_line];
    tmp = br.readLine;
    try
        next_line = toCharArray(tmp)';
        if strcmp(next_line,'M END')
            next_line = [];
            break
        end
    catch
        break;
    end
end
%Cleanup java objects
br.close;
isr.close;
is.close;
Help translating the Ruby statement into something MATLAB can send, using a script such as the one posted to establish the communication with crossref, would be greatly appreciated.
Edit:
Additional constraints include backward compatibility of the code (back at least to R14) :>(. Also, no use of Ruby, since that solves the problem but is not a "MATLAB" solution; see here for how to invoke Ruby from MATLAB via system('ruby script.rb').
You can easily edit urlread to do what you need. I won't post my modified urlread function code due to copyright.
In urlread (mine is at C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\urlread.m), as the least elegant solution:
Right before "% Read the data from the connection." I added:
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
The answer from user2034006 points the way to a solution.
The following script works when urlread is modified:
URL_PATTERN = 'http://dx.doi.org/%s';
doi = '10.1038/nrd842';
fetchurl = sprintf(URL_PATTERN,doi);
method = 'post';
params= {};
[string,status] = urlread(fetchurl,method,params);
The modification in urlread is not identical to the suggestion of user2034006. Things worked when the line
urlConnection.setRequestProperty('Content-Type','application/x-www-form-urlencoded');
in urlread was replaced with
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
I have a script, VBS or Ruby, that saves a Word document as 'Filtered HTML', but the encoding parameter is ignored. The HTML file is always encoded in Windows-1252. I'm using Word 2007 SP3 on Windows 7 SP1.
Ruby Example:
require 'win32ole'
word = WIN32OLE.new('Word.Application')
word.visible = false
word_document = word.documents.open('C:\whatever.doc')
word_document.saveas({'FileName' => 'C:\whatever.html', 'FileFormat' => 10, 'Encoding' => 65001})
word_document.close()
word.quit
VBS Example:
Option Explicit
Dim MyWord
Dim MyDoc
Set MyWord = CreateObject("Word.Application")
MyWord.Visible = False
Set MyDoc = MyWord.Documents.Open("C:\whatever.doc")
MyDoc.SaveAs "C:\whatever2.html", 10, , , , , , , , , , 65001
MyDoc.Close
MyWord.Quit
Set MyDoc = Nothing
Set MyWord = Nothing
Documentation:
Document.SaveAs: http://msdn.microsoft.com/en-us/library/bb221597.aspx
msoEncoding values: http://msdn.microsoft.com/en-us/library/office/aa432511(v=office.12).aspx
Any suggestions, how to make Word save the HTML file in UTF-8?
Hi Bo Frederiksen and kardeiz,
I also encountered the problem of "Word Document.SaveAs ignores encoding" today, in my "Word 2003 (11.8411.8202) SP3" version.
Luckily I managed to make msoEncodingUTF8 (namely, 65001) work in VBA code. However, I had to change the Word document's settings first. The steps are:
1) From Word's 'Tools' menu, choose 'Options'.
2) Then click 'General'.
3) Press the 'Web Options' button.
4) In the popping-up 'Web Options' dialogue, click 'Encoding'.
5) You can find a combobox, now you can change the encoding, for example, from 'GB2312' to 'Unicode (UTF-8)'.
6) Save the changes and try to rerun the VBA code.
I hope my answer can help you. Below is my code.
Public Sub convert2html()
    With ActiveDocument.WebOptions
        .Encoding = msoEncodingUTF8
    End With
    ActiveDocument.SaveAs FileName:=ActiveDocument.Path & "\" & "file_name.html", FileFormat:=wdFormatFilteredHTML, Encoding:=msoEncodingUTF8
End Sub
Word can't do this as far as I know.
However, you could add the following lines to the end of your Ruby script:
text_as_utf8 = File.read('C:\whatever.html').encode('UTF-8')
File.open('C:\whatever.html','wb') {|f| f.print text_as_utf8}
If you have an older version of Ruby, you may need to use Iconv. If you have special characters in 'C:\whatever.html', you'll want to look into your invalid/undefined replacement options.
You'll also probably want to update the charset in the HTML meta tag:
text_as_utf8.gsub!('charset=windows-1252', 'charset=UTF-8')
before you write to the file.
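(If Ruby isn't at hand, the same post-processing idea sketched in Python, with the file path taken from the question:)
import io

# Read the file with the encoding Word actually used, fix the declared
# charset, and write the text back out as UTF-8.
with io.open(r'C:\whatever.html', encoding='windows-1252') as f:
    text = f.read()
text = text.replace('charset=windows-1252', 'charset=UTF-8')
with io.open(r'C:\whatever.html', 'w', encoding='utf-8') as f:
    f.write(text)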
My solution was to open the HTML file using the same character set that Word used to save it.
I also added a whitelist filter (Sanitize) to clean up the HTML. Further cleaning is done using Nokogiri, which Sanitize also relies on.
require 'sanitize'
# ... add some code converting a Word file to HTML.
# Post export cleanup.
html_file = File.open(html_file_name, "r:windows-1252:utf-8")
html = '<!DOCTYPE html>' + html_file.read()
html_document = Nokogiri::HTML::Document.parse(html)
Sanitize.new(Sanitize::Config::RESTRICTED).clean_node!(html_document)
html_document.css('html').first['lang'] = 'en-US'
html_document.css('meta[name="Generator"]').first.remove()
# ... add more cleaning up of Word's HTML noise.
sanitized_html = html_document.to_html({:encoding => 'utf-8', :indent => 0})
# writing output to (new) file
sanitized_html_file_name = word_file_name.sub(/(.*)\..*$/, '\1.html')
File.open(sanitized_html_file_name, 'w:UTF-8') do |f|
  f.write sanitized_html
end
HTML Sanitizer: https://github.com/rgrove/sanitize/
HTML parser and modifier: http://nokogiri.org/
In Word 2010 there is a new method, SaveAs2: http://msdn.microsoft.com/en-us/library/ff836084(v=office.14).aspx
I haven't tested SaveAs2, since I don't have Word 2010.
This is my first time posting here; it's a great resource, and I keep finding solutions on it. I'm writing code to display an image gallery of YouTube videos on a website. I'm using Classic ASP to parse the RSS feed, and so far I've successfully got the thumbnail of each YouTube video. Now I'm trying to display only one of the 4 JPEGs; the YouTube RSS thumbnail URLs seem to be in the following format:
http://i.ytimg.com/vi/oh_OMkstzMQ/0.jpg
http://i.ytimg.com/vi/oh_OMkstzMQ/1.jpg
http://i.ytimg.com/vi/oh_OMkstzMQ/2.jpg
http://i.ytimg.com/vi/oh_OMkstzMQ/3.jpg
So, I was wondering if someone could suggest a way to only get 0.jpg from the feed? I'll post my code below:
<%
Dim xml, xhr, ns, YouTubeID, TrimmedID, GetJpeg, GetJpeg2, GetJpeg3, thumbnailUrl, xmlList, nodeList, TrimmedThumbnailUrl
Set xml = Server.CreateObject("MSXML2.FreeThreadedDOMDocument")
xml.async = False
xml.setProperty "ServerHTTPRequest", True
xml.Load("http://gdata.youtube.com/feeds/api/users/Shuggy23/favorites?orderby=updated")
If xml.parseError.errorCode <> 0 Then
    Response.Write xml.parseError.reason
End If

Set xmlList = xml.getElementsByTagName("entry")
Set nodeList = xml.SelectNodes("//media:thumbnail")

For Each xmlItem In xmlList
    YouTubeID = xmlItem.getElementsByTagName("id")(0).Text
    TrimmedID = Replace(YouTubeID, "http://gdata.youtube.com/feeds/api/videos/", "")
    For Each xmlItem2 In nodeList
        thumbnailUrl = xmlItem2.getAttribute("url")
        Response.Write thumbnailUrl & "<br />"
    Next
Next
%>
Hope someone can help.
Thanks very much.
Douglas
If you just want to get 0.jpg from the thumbnail URL, try:
Right(thumbnailUrl, Len(thumbnailUrl) - InStrRev(thumbnailUrl, "/"))
If you want to just get the first thumbnail, you could use Exit For to bail out of the loop.
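(For comparison, the same two ideas sketched in Python; thumbnail_urls here is a hypothetical list standing in for the URLs the loop above collects:)
# Take the filename after the last slash, then stop after the first hit.
for thumbnail_url in thumbnail_urls:
    print(thumbnail_url.rsplit('/', 1)[-1])  # e.g. "0.jpg"
    break  # the Exit For equivalent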