Good morning. I have downloaded the Yahoo Flickr Creative Commons 100M (YFCC100M) dataset from the official website. When I extracted the 14 GB download, I got a 48 GB file without an extension. I also have a .txt file that explains how the dataset is composed: it consists of many records, and for each image several fields are stored, such as the download link, photo/video identifier, photo/video hash, user nickname, date taken, and others.
Now, I only need the images and their associated hashes, so the question is: how do I get them? I literally have no idea. Thank you everyone for the help.
EDIT: I have managed to open the file with Word, but not all of it, because it is too big. I have over 10,000 records like this, for example:
0 6985418911 4e2f7a26a1dfbf165a7e30bdabf7e72a 39089491#N00 nino63004 2012-02-16 09:56:37.0 1331840483 Canon+PowerShot+ELPH+310+HS IMG_0520 canon,canon+powershot+hs+310,carnival+escatay,cruise,elph,hs+310,key+west+florida,powershot -81.804885 24.550558 12 (link to flickr that i can't post) (other link) Attribution-NonCommercial-NoDerivs License (other link) 7205 8 df7747990d 692d7e0a7f jpg 0
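Assuming the big extensionless file is one tab-separated record per line, as the accompanying .txt describes, a minimal Python sketch for pulling the hash and the download link out of each record could look like this. The column positions are assumptions based on the sample record above (hash as the 3rd field, the download link found by its http prefix); verify them against your copy of the .txt:

```python
def parse_record(line):
    """Extract (photo_hash, download_url) from one tab-separated record.

    Column positions are an assumption based on the sample record:
    the hash appears as the 3rd field, and the download link is taken
    to be the first field that looks like a URL. Check these against
    the dataset's own .txt description.
    """
    fields = line.rstrip('\n').split('\t')
    photo_hash = fields[2]
    urls = [f for f in fields if f.startswith('http')]
    download_url = urls[0] if urls else None
    return photo_hash, download_url

# Stream the 48 GB file line by line instead of opening it in an editor:
# with open('yfcc100m_dataset', encoding='utf-8') as f:
#     for line in f:
#         photo_hash, url = parse_record(line)
#         ...  # download url and save the image under its hash
```

Streaming line by line keeps memory use constant, which matters for a 48 GB file that no editor (Word included) will open whole.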
Whenever I try to upload my dataset to the AutoML Natural Language Web UI, I get the error
Something is wrong, please try again.
The documentation is not very insightful about how my CSV file is supposed to look, so I made a simple sample file just to make sure it works at all. It looks like this:
text,label
asdf,cat
asodlkao,dog
asdkasdsadksafask,cat
waewq23,cat
dads,cat
saiodjas,cat
skdoaskdoas,dog
hgfkgizk,dog
fzdrgbfd,cat
otiujrhzgf,cat
vchztzr,dog
aksodkasodks,dog
sderftz,dog
dsoakd,dog
qweqweqw,cat
asdqweqe,cat
dkawosdkaodk,dog
ewqeweq,cat
fdsffds,dog
bvcghh,cat
rthnghtd,dog
sdkosadkasodk,cat
sdjidghdfig,cat
kfodskdsof,dog
saodsadok,dog
ksaodksaod,dog
vncvb,cat
I chose this formatting according to the Google-suggested syntax.
But even with this formatting I still get the same error.
I've seen the question Format of the input dataset for Google AutoML Natural Language multi-label text classification, but according to the answers there my formatting should work, so I do not know why I get the error.
I've just copied your CSV file and uploaded it to my own project, and the dataset was created and worked. One problem is that an extra label, "label", was created; this is because a header row is not expected in the CSV file (probably this should get fixed on Google's side).
Based on that, it seems the problem isn't the CSV file format. I would recommend checking whether your project is set up correctly. You can open a bug to get someone's help: either file it in the public issue tracker or send feedback using the UI (there is a 'Feedback' option in the menu at the top right of the page).
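Since the importer treats every row as data, the quickest fix for the stray "label" class is to drop the header row before uploading. A minimal sketch (the file names in the usage comment are just examples):

```python
def drop_csv_header(lines):
    """Return the CSV rows without the leading 'text,label' header,
    which AutoML would otherwise import as a labeled example."""
    it = iter(lines)
    next(it, None)  # discard the header row, if any
    return list(it)

# Usage (file names are examples):
# with open('dataset.csv') as src, open('dataset_noheader.csv', 'w') as dst:
#     dst.writelines(drop_csv_header(src))
```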
I have found the problem! As Michal K said, there was nothing wrong with the formatting. The real problem was that I was not assigned the role of Storage Object Creator, which is necessary because the data is uploaded to Cloud Storage first.
I am working on a project that requires working with the Genia corpus. According to the literature, the Genia corpus was built from articles retrieved by searching Medline/PubMed for three MeSH terms: "transcription factor", "blood cell", and "human". I want to extract the full text (where freely available) of the articles in the Genia corpus from PubMed. I have tried many approaches, but I have not found a way to download the full text in text, XML, or PDF format.
Using Entrez utils provided by NCBI :
I have tried using the approach mentioned here -
http://www.hpa-bioinformatics.org.uk/bioruby-api/classes/Bio/NCBI/REST/EFetch/Methods.html#M002197
which uses the Ruby gem Bio like this to get the information for a given PubMed ID -
Bio::NCBI::REST::EFetch.pubmed(15496913)
But, it doesn't return the full text for the PMID.
Internally, it makes a call like this -
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=1372388&retmode=text&rettype=medline
But, both the Ruby gem and the above call don't return the full text.
On further Internet searching, I found that the valid values of rettype and retmode for PubMed don't include an option to get the full text, as listed in the table here -
http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly
All the examples and other scripts I have seen on the Internet are only about extracting abstracts, authors, etc., and none of them discusses extracting the full text.
Here is another link I found that uses the Python package Biopython, but it only accesses information about authors -
https://www.biostars.org/p/172296/
How can I download the full text of an article in text, XML, or PDF format using the Entrez utils provided by NCBI? Or are there already available scripts or web crawlers that I can use?
You can use Biopython to find the articles that are on PubMed Central (PMC) and then get the PDF from there. For all articles hosted somewhere else, it is difficult to come up with a generic solution for getting the PDF.
It seems that PubMed Central does not want you to download articles in bulk: requests via urllib are blocked, but the same URL works from a browser.
from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"

# id is a comma-separated string of PubMed IDs
handle = Entrez.efetch(db="pubmed", id="19304878,19088134", retmode="xml")
records = Entrez.parse(handle)

# For each record, check whether it has a PMC identifier;
# if so, print the URL for downloading the PDF.
for record in records:
    if record.get('MedlineCitation'):
        if record['MedlineCitation'].get('OtherID'):
            for other_id in record['MedlineCitation']['OtherID']:
                if other_id.title().startswith('Pmc'):
                    print('http://www.ncbi.nlm.nih.gov/pmc/articles/%s/pdf/'
                          % other_id.title().upper())
I'm working on the exact same problem using Ruby. So far I have achieved moderate success by doing the following:
Use Mechanize plus esearch from the E-utilities to get an XML of your PubMed search, then use Mechanize/Nokogiri to parse the PMIDs out of the XML.
Use Mechanize plus the ID converter to convert the PMIDs to PMCIDs (when available). If you really are only interested in the papers available on PMC, you can set up the esearch to return PMCIDs as well.
Once you have the PMCIDs, you can use Mechanize to access the article page, follow the PDF link on the page, and use Mechanize to save the file.
It's by no means straightforward, but still not that bad. There is a gem that claims to do the same (https://github.com/billgreenwald/Pubmed-Batch-Download); I plan to test it out soon.
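The PMID-to-PMCID step can also be done without Mechanize by calling NCBI's public ID Converter service directly. A Python sketch, split into a URL builder and a parser so the network call stays separate (the endpoint and the JSON layout assumed here are those of the public idconv service):

```python
import json
from urllib.request import urlopen

IDCONV = 'https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/'

def idconv_url(pmids):
    """Build an ID Converter request URL for a list of PMID strings."""
    return '%s?ids=%s&format=json' % (IDCONV, ','.join(pmids))

def parse_idconv(payload):
    """Map each PMID to its PMCID (None when the article is not in PMC)."""
    return {r.get('pmid'): r.get('pmcid') for r in payload.get('records', [])}

# Usage (requires network access):
# with urlopen(idconv_url(['19304878', '19088134'])) as resp:
#     mapping = parse_idconv(json.load(resp))
```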
If you want XML or JSON by PubMed ID or PMCID, then you want to use the "BioC API" to access PubMed Central (PMC) Open Access articles.
(see https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/ )
Here is an example request:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_xml/19088134/ascii
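The request URL follows a fixed pattern, so it can be built programmatically. A small Python helper (the pattern is taken from the example URL above; note the service only covers PMC Open Access articles):

```python
def bioc_url(article_id, fmt='xml', encoding='unicode'):
    """Build a BioC-PMC request URL for a PMID or PMCID.

    fmt is 'xml' or 'json'; encoding is 'unicode' or 'ascii'.
    Pattern taken from the example request above.
    """
    return ('https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/'
            'pmcoa.cgi/BioC_%s/%s/%s' % (fmt, article_id, encoding))

# Usage (requires network access):
# from urllib.request import urlopen
# with urlopen(bioc_url('19088134', encoding='ascii')) as resp:
#     bioc_xml = resp.read().decode('ascii')
```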
I found this plug-in called YouMax, which embeds your YouTube channel into your website. The only problem I'm having with this plug-in is changing the number of video results that are collected: the default is 25 videos, and I want to change this to another value like 12 or 24.
http://www.codehandling.com/2013/03/youmax-20-complete-youtube-channel-on.html?m=1
There seem to be three sections to this plug-in's results: Featured, Uploads, and Playlists.
I edited the youmax.min.js file for the Featured section because it is the first results page that loads. My edit was very small. Essentially, I added the following:
&start-index=1&max-results=2
to the end of the string var apiFeaturedPlaylistVideosURL.
This var is located inside the function getFeaturedVideos(playlistId).
You can change the result from 2 to 12 or whatever you want, and that will be the maximum number of results you get back from YouTube.
Also, you can add this same argument (&start-index=1&max-results=2) to the Uploads and Playlists functions in the youmax.min.js file if that's where you want to limit your results instead of (or in addition to) the Featured section.
I created a copy of my edited youmax.min.js file on jsFiddle; my edit is on line 152 there. Try downloading it and giving it a try. I hope it helps:
http://jsfiddle.net/wCKKU/
Youmax 2.0 (free version) has been upgraded and now has the maxResults option built in - http://demos.codehandling.com/youmax/home.html
You already get a "maxResults" option with the plugin and a "Load More" functionality.
Regarding the timestamps, you can try the PRO version, which has options to display relative timestamps ("2 hours ago") or fixed timestamps ("23 March 2016").
Cheers :)
I’m new to EE and trying to learn the basics. Some questions about the File Manager:
I upload a photo and put “cat, kitten” in the description. When I do a search for “kitten”, it finds the photo. But when I do a search for “cat”, I get nothing. Any ideas what’s going on?
The file metadata are: file title, file name, description, credit, and location. What if I wanted to add custom fields? How do I do that?
In the template files, how do I access a particular manipulation (I call this “rendition”) of an image? Say I define a rendition “thumbnail” to be 100x100. How do I access that particular rendition in a template?
Is there a way to randomize the file names of the files being uploaded?
After uploading an image and testing the page against PageSpeed, it turns out that the image can still be optimized via lossless compression. How can this be addressed?
Ah, the file manager. Not EE's brightest spot.
It would not surprise me if the search in the File Manager were not very robust. I'd try more variations to narrow it down (what kinds of characters affect the results - commas, dashes, spaces, etc. - and do partial terms match?).
You cannot currently add custom metadata to files in the file manager.
Use this syntax: {field_name:rendition}, e.g., {my_image:thumbnail} (docs).
Nope.
EE just uses the GD library available in your PHP install to resize images. If you want the highest possible optimization, you'll have to do your image manipulations yourself.
Given your questions, I would suggest you have a look at Assets by Pixel & Tonic. It offers a far superior file-management experience on most of these fronts.
Now that my tool can load and display images as well as their metadata (using metadata-extractor), I'd also like to see information about who made the file and what the name of that person's computer was.
The target is: changing or deleting the original information.
I guess that reading the information is not necessary if I could generate a new image that looks exactly like the source file but has new, changed metadata.
Example:
There is an image A owned by John Smith that was produced on John's computer.
Now I want to make an image B that looks like image A but says it was made by Cate Smith on Cate's computer.
Hope someone can help!
Thanks!
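One way to get an image B that looks exactly like A but without A's embedded metadata is to copy the JPEG while dropping its APP1 (EXIF/XMP) segments; new tags can then be written with a metadata library (e.g. piexif in Python, or Apache Commons Imaging on the JVM). A stdlib-only Python sketch of the stripping step, assuming a baseline JPEG (this removes metadata rather than rewriting it; it does not touch pixel data):

```python
import struct

def strip_app1(jpeg_bytes):
    """Return a copy of a JPEG with all APP1 (EXIF/XMP) segments removed.

    The compressed pixel data is copied verbatim, so the result looks
    identical to the original; only the metadata segments are dropped.
    """
    assert jpeg_bytes[:2] == b'\xff\xd8', 'not a JPEG (missing SOI marker)'
    out = bytearray(b'\xff\xd8')
    i = 2
    while i < len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:      # not at a marker: malformed, copy rest
            out += jpeg_bytes[i:]
            break
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:             # start-of-scan: copy the rest verbatim
            out += jpeg_bytes[i:]
            break
        (length,) = struct.unpack('>H', jpeg_bytes[i + 2:i + 4])
        if marker != 0xE1:             # keep every segment except APP1
            out += jpeg_bytes[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

For the second half of the task (writing "Cate Smith" / "Cate's Computer" into image B), a tag-writing library is the practical route; hand-building EXIF IFDs is error-prone.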