Get real path for google images - image

I am trying to fetch the Google Trends page, and the images it shows are referenced in the code like this:
http://t2.gstatic.com/images?q=tbn:ANd9GcReZ5l9k236B8fRJQo2XuoaB30s-4wsUPZEYOWurvMjArDatu0vN_z2pHt4VAn_7Za_6xozCU3W
Since I need the real path to the image with its extension, is there any way to retrieve it?
Thank you.

First off, this sounds like you're trying to screen-scrape. Your life might be easier if you can review the API for Google Trends and use that instead (I don't know that API well enough to help you on that front; I did add the google-trends tag to your post, in an edit pending peer review, to try to attract people who do).
Having said that, you can download the image and look at the Content-Type header (assuming that URL doesn't redirect to the underlying image, in which case your problem is already solved). You didn't specify what language you're using, so I'm going to assume you're using the right one :).
Example code (Python, using the requests library):
import requests

# Placeholder URL; substitute the gstatic image URL you want to inspect.
r = requests.get("http://gstatic.foo.com/blah/randomkeyboardtypingdetected/")
ctype = r.headers.get("content-type")
lookup = {"image/jpeg": "jpg", "image/png": "png"}  # add others as needed
if ctype and lookup.get(ctype):
    print(lookup[ctype])
else:
    print("Error, server didn't specify.")

It is a real path; the image is retrieved dynamically, so the path doesn't look like a regular one with a file name, extension, etc.
You can check the Content-Type in the response headers if you want to determine the type of the image.

What is the behavior of an image URL which contains 2 occurrences of protocol://?

I find on the odysee.com video site that the URLs for thumbnail images contain 2 occurrences of https://, e.g.:
https://image-processor.vanwanet.com/optimize/s:390:220/quality:85/plain/https://thumbnails.lbry.com/cW11rfzDIDA
Is the second https: treated as the name of a directory, with the following // simply collapsing into a single / (as they normally would)?
Or does the server interpret this as something different?
I am examining these URLs because I find that in Firefox the images do not cache, while in Chrome they do. This happens even when I create my own minimal test page with several images using the same odysee.com thumbnail URLs, which brings me to the conclusion that the issue is not related to the odysee.com page in any way.
(Another thing I observe is that these images load extremely slowly and sometimes fail to load entirely, but that may just be a shortcoming of the hosting website. This is true in either browser.)
Only the first protocol is used as a protocol and the other is just part of the URL. The image-processor.vanwanet.com server will receive a request that looks something like this:
GET /optimize/s:390:220/quality:85/plain/https://thumbnails.lbry.com/cW11rfzDIDA HTTP/1.1
And it can choose to do whatever it wants with that information. In this case it's probably taking everything after "plain/" as a URL and using that for something. A reason to do it this way, instead of putting the thumbnails.lbry.com URL in a query parameter, is that a query parameter needs percent-encoding but a URL embedded in a URL path doesn't. This makes the full thing shorter and maybe easier to process.
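For illustration, here is how a standard URL parser splits that address; only the leading https counts as the scheme, and the embedded URL survives intact inside the path (a Python sketch):
from urllib.parse import urlsplit

url = ("https://image-processor.vanwanet.com/optimize/s:390:220"
       "/quality:85/plain/https://thumbnails.lbry.com/cW11rfzDIDA")
parts = urlsplit(url)
print(parts.scheme)  # https
print(parts.netloc)  # image-processor.vanwanet.com
print(parts.path)    # /optimize/s:390:220/quality:85/plain/https://thumbnails.lbry.com/cW11rfzDIDA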

Get Github Issue based only on title

I need to modify the body of an existing GitHub issue in a Project. All I'll be passed is the title of the issue and a word (the word exists in the body, and I'll just need to fill in the checkbox next to it).
It looks like to do this I'll need to use the GET API to fetch the body of the issue, modify it, and then use the EDIT API to swap in the new body. However, the GET API can only be called with the issue number. I need to do all this as quickly as possible. Is there some way to search via an API call?
Thoughts much appreciated!
Edit: All my issues are in the same project (and issue titles will be unique there). I've also recently discovered GitHub's GraphQL API, which may be applicable here.
You can use the issue search endpoint with the in and repo¹ keywords:
GET /search/issues?q=text+to+search+in:title+repo:some/repo
Of course, issue titles aren't guaranteed to be unique. You'll have to check each of the issues that comes back and see whether its body contains the word you're looking for; even then you could get multiple positive results.
It would be much better if you could search by issue number.
¹I've assumed that you really mean "repository" when you say "project". But if you're actually talking about GitHub Project Boards you can use the project keyword as well or instead.
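A rough sketch of that search-then-edit flow in Python with the requests library; the token, the some/repo name, and the literal checkbox text are all placeholders:
import requests

TOKEN = "..."  # placeholder personal access token
HEADERS = {"Authorization": f"token {TOKEN}",
           "Accept": "application/vnd.github+json"}

# 1. Search for issues whose title contains the text.
q = "text to search in:title repo:some/repo"
resp = requests.get("https://api.github.com/search/issues",
                    params={"q": q}, headers=HEADERS)

# 2. Check each candidate's body, tick the checkbox, and PATCH it back.
for issue in resp.json()["items"]:
    body = issue.get("body") or ""
    if "- [ ] the word" in body:  # "the word" stands in for the real word
        new_body = body.replace("- [ ] the word", "- [x] the word", 1)
        requests.patch(issue["url"], json={"body": new_body},
                       headers=HEADERS)
        break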

Using Scrapy to download images from a google search

I am trying to download google images for a particular search.
Currently, if I have the URL, my code will download the first 10 images.
However, my question is: how would I get the URL for a particular search on Google?
When I look at the URL for any search on Google, it looks very complicated, and it is hard to understand how the URL was created:
http://www.google.com/m/search?q=hello&site=images
This URL pulls up the mobile website, which is static and easier to harvest images from. All parts of the query are self-explanatory.
The &q= part of the URL is the actual search string. Note that some characters are converted, such as a space becoming a plus sign.
It's easy enough to fake by requesting https://www.google.com/search?q=a+search
For an image search: https://www.google.com/search?q=a+search&tbm=isch
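For arbitrary search terms, a URL-encoding helper handles the space-to-plus conversion mentioned above (a Python sketch):
from urllib.parse import urlencode

query = "a search"
url = "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})
print(url)  # https://www.google.com/search?q=a+search&tbm=isch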

Import data from URL

The St. Louis Federal Reserve Bank has a great set of data available on a variety of their web pages, such as:
http://research.stlouisfed.org/fred2/series/OILPRICE/downloaddata?cid=32217
http://www.federalreserve.gov/releases/h10/summary/default.htm
http://research.stlouisfed.org/fred2/series/DGS20
The data sets get updated, some as often as daily. I tend to have an interest in the daily data (see the settings in the URLs above).
I'd like to import these kinds of price or rate data streams (accessible as CSV or Excel files at the above URLs) directly into Mathematica.
I've looked at the documentation on Import[], but I find scant documentation (actually none) on how to go about something like this.
It looks like I need to navigate to the pages, send some data to select specific files and formats, trigger the download, and then access the downloaded data from my own machine. Even better if I could access the data directly from the sites.
I had hoped Wolfram|Alpha might make this sort of thing easy, but I haven't had any success.
FinancialData[] would seem natural for this sort of thing, but despite its many features I don't see any way to get at this kind of data.
Does anyone have any experience with this or can someone point me in the right direction?
You can Import directly from a URL. For example, the data from federalreserve.gov can be obtained and visualized as follows.
url = "http://www.federalreserve.gov/datadownload/Output.aspx?";
url = url<>"rel=H10&series=a660e724c705cea4b7bd1d1b85789862&lastObs=&";
url = url<>"from=&to=&filetype=csv&label=include&layout=seriescolumn";
data = Import[url, "CSV"];
DateListPlot[data[[7 ;;]], Joined -> True]
I broke up url for convenience, since it's so long. I had to examine the contents of data before I knew exactly how to plot it - a step that is typically necessary. I'm sure that the data from stlouisfed.org can be obtained in a similar way, but it requires the use of an API with a key to access it.
As Mark said, you can get the data directly from a URL. Your oil data can be imported from a different URL than you had:
http://research.stlouisfed.org/fred2/data/OILPRICE.txt
With that URL, you can do this:
oil = Import["http://research.stlouisfed.org/fred2/data/OILPRICE.txt",
"Table", "HeaderLines" -> 12, "DateStringFormat" -> {"Year", "Month", "Day"}];
DateListPlot[oil, Joined -> True, PlotRange -> All]
Note that "HeaderLines"->12 option strips off the header text in the first 12 lines (you have to count the header lines to know how many to remove). I've also specified the date format.
To find that URL, do as you did before, but click on a data series and then choose View Data from the menu on the left when you see the chart.
The documentation has a short example on extracting data out of a webpage:
http://reference.wolfram.com/mathematica/howto/CleanUpDataImportedFromAWebsite.html
Of course, what actually needs to be done will vary significantly from page to page.
There is a discussion of how to do this with your API key here:
http://library.wolfram.com/infocenter/MathSource/7583/
The function is based on the API documentation. I haven't looked at the code for a couple of years, and from memory I put it together rather quickly, but I have used it regularly for over 2 years without problems. One example use is retrieving monthly, non-seasonally-adjusted retail sales from early 1992 to now.
Wolfram|Alpha also uses FRED data, so you could use that as an alternative to direct import, but it is trickier to get the query right; I prefer to use FRED directly. Also, from memory, the data is only available on Alpha the day after the release, which is not what you would typically want.

opengraph - not reading my og-tags

I have been looking for a solution for 3 days now, but cannot find it ...
I have this page with a built-in Facebook send button:
http://www.architectura.be/nieuwsdetail_new_fb_1.asp?id_tekst=2570
I have my og tags specified, but it is not working (a random image is showing, a random title, ...).
Any help would be enormously appreciated!
Thanks
Andy is right. Let me put it another way: Facebook does not read all of the page that you specify (2570); it reads the og:url and then looks for metadata at that address (2586), because that is meant to be the "canonical" or "reference" version of the page.
I don't know what you intend, so I can't tell you how to fix that.
But I will also point out that the HTML coding on your page is extremely bad, and even once you get the URL sorted out, it's possible that Facebook will be unable to read the page correctly.
David
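A quick way to verify which og: tags each of those URLs actually serves is to fetch the page and list the meta properties (a Python sketch; the regex assumes property appears before content inside each meta tag):
import re
import requests

url = "http://www.architectura.be/nieuwsdetail_new_fb_1.asp?id_tekst=2570"
html = requests.get(url).text
for prop, content in re.findall(
        r'<meta[^>]*property="(og:[^"]+)"[^>]*content="([^"]*)"', html):
    print(prop, "=", content)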
There is something wrong, so here is a list of possibilities:
locale: Do you really need to set a locale?
type: Try to use the right type. Maybe use 'article' or something like that instead of 'movie'.
url: The Facebook crawler is checking the page at the URL you set, 'http://...?id_tekst=2586'. When I look at the source code of that page, there are no Open Graph parameters; you're setting them on the page 'http://...?id_tekst=2570'.
image: Use an absolute path to your image, e.g. 'http://domain/image.jpg'
fb:admins: USER_ID has to be a valid user ID or username (optional)
Hope that helps a little bit!
