How can I scrape an image that doesn't have an extension?

How can I scrape an image that doesn't have an extension? - image

Sometimes I come across an image that I can't scrape so that it can be saved. An example of this is:
https://s3.amazonaws.com/plumdistrict.com-production/perks/12321/image/original.?1325898487
When I hit the url from Internet Explorer I see the image but when I try to get it from the code below I get the following error message "System.Net.WebException The remote server returned an error: (403) Forbidden" error with GetResponse:
string url = "https://s3.amazonaws.com/plumdistrict.com-production/perks/12321/image/original.?1325898487";
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Any ideas on how to get this image?
Edit:
I am able to get to save images that do have extensions. For example I can scrape the following image just fine:
https://s3.amazonaws.com/plumdistrict.com-production/perks/12659/image/original.jpg?1326828951

Although HTTP is originally supposed to be stateless, there are a lot of implementations that rely on it being stateless. I could configure my webserver to only accept requests for "http://mydomain.com/sexy_avatar.jpg" if you provide a cookie proving you were logged in. If not, I send you a redirect 303 to "http://mydomain.com/avatar_for_public_use.jpg".
Amazon could be doing the same. Try to load the web page using Chrome, and look at the Network view in developer mode (CTRL+SHIFT+J) to see all headers supplied to the website. Maybe you even need to do a full navigation in the same session before you are allowed to see the image. This is certainly the case in many web applications I have developed :-)

Well, it looks like it's being generated from a script (possibly being retrieved from a database). The server should be sending a file/content type to go along with that... but it doesn't seem to be, which I believe is a violation of standards.
My Linux box knows full well that that's a JPEG image once it's on my hard drive, because it examines file headers rather than relying on extensions. Perhaps there is a tool to do the same in Windows?
Edit: Actually, on further contemplation, it seems odd that you'd get a 403 for that. Perhaps the server is actually blocking you from retrieving the file in that manner.

Related

Google Drive API Console: Error saving Drive UI integration page

I have a webapp in production that interacts with Google Drive through Google Drive API.
I need to change some settings in Drive interaction but I can't save.
When I save the Drive UI integration page, I receive this error:
There's a problem at our end.
Please try again. If the problem persists, please let us know using
the "Send feedback" link below. Thanks!
(spying Network console: there is an Internal Server Error in a POST call)
I tried to send feedback for months: nobody answers and the bug is still there.
I tried also to create another project: I can save the first time but then the bug returns.
How can I do? Has someone the same problem?
Is there a way to receive a reply from Google? Is there some workaround?
Thank you.

i think that problem must be Client ID
before adding Client ID, go to the Credentials -> OAuth 2.0 Client IDs
then select edit your Client ID. after that your production site url add to Authorized JavaScript origins and Authorized redirect URIs.
then enter your Client ID in Drive UI integration page

For myself trying to get the Drive UI configured I noticed a couple of errors (that don't have any specific error messages)
When adding in an Open URL it has to be a valid domain, so for instance I tried to test it out with local host, to no avail. However something like https://devbox.app.com worked, but something like https://localhost:8888 does not. Even though https://localhost is a valid javascript origin in the client_id configuration (at least for the app I am working on, not sure about other apps), localhost doesn't work as an open URL.
When adding in the mimeTypes it needs to be in the format */* and can include custom mimeTypes like application/custom+xml and application/custom-name+json not sure for other custom types that are not in a particular format like xml or json. Also not sure about wildcards.
When adding in file extensions do not add in the '.' just the name of the file extension.
The app icon I found only failed to upload the image when the image wasn't the exact dimensions, I actually ended up editing some icons in photoshop to change the pixel x pixel values as a quick work around during dev.
That worked for me to get it to save and I tested it with a file that had a custom mimeType (application/custom-name+xml specifically) and custom file extension!

How can I validate http response headers?

It's the first time I am doing something with headers. I am mainly concerned with Cache-Control but there may be others I will need to check as well. For example, I try to send the following header to the browser (based on tutorials I just read):
Cache-Control:private, max-age=2011-12-30 11:40:56
Google Chrome displays it this way in Network -> Headers -> Response headers, but how do I know if it's correct, that there aren't any typos, syntax errors and such? Will it really work? Will the browser behave like I want it to, or will it treat it like a gibberish (something like "unknown header/value")? I've tried sending nonsensical headers on purpose but they got displayed with the rest. Is there any Chrome tool / addon for that, or any other way? Thank you in advance!

I'm afraid you won't be able to check if the resource has been cached by proxies en route, but you can check if your browser has cached it.
While in the Network panel of Chrome DevTools, hit F5 to reload your page. You should see something like "304 Not Modified" in the status field for the resource you are treating (which means the resource has not been modified and its contents were not received from the server but rather loaded from the browser's cache.)

Cross domain ajax POST in chrome

There are several topics about the problem with cross-domain AJAX. I've been looking at these and the conclusion seems to be this:
Apart from using somthing like JSONP, or a proxy sollution, you should not be able to do a basic jquery $.post() to another domain
My test code looks something like this (running on "http://myTestdomain.tld/path/file.html")
var myData = {datum1 : "datum", datum2: "datum"}
$.post("http://External-Ip:port", myData,function(return){alert(return);});
When I tried this (the reason I started looking), chrome-console told me:
XMLHttpRequest cannot load
http://External-IP:port/page.php. Origin
http://myTestdomain.tld is not allowed
by Access-Control-Allow-Origin.
Now this is, as far as I can tell, expected. I should not be able to do this. The problem is that the POST actually DOES come trough. I've got a simple script running that saves the $_POST to a file, and it is clear the post gets trough. Any real data I return is not delivered to my calling script, which again seems expected because of the Access-control issue. But the fact that the post actually arrived at the server got me confused.
Is it correct that I assume that above code running on "myTestdomain" should not be able to do a simple $.post() to the other domain (External-IP)?
Is it expected that the request would actually arrive at the external-ip's script, even though output is not received? or is this a bug. (I'm using Chrome 11.0.696.60 )

I posted a ticket about this on the WebKit bugtracker earlier, since I thought it was weird behaviour and possibly a security risk.
Since security-related tickets aren't publicly viewable, I'll quote the reply from Justin Schuh here:
This is implemented exactly as required by the spec. For simple cross-origin requests http://www.w3.org/TR/cors/#simple-method> there is no pre-flight check; the request is made and the response cannot be read if the appropriate headers do not authorize the requesting origin. Functionally, this is no different than creating a form and using script to make an off-origin POST (which has always been possible).
So: you're allowed to do the POST since you could have done that anyway by embedding a form and triggering the submit button with javascript, but you can't see the result. Because you wouldn't be able to do that in the form scenario.
A solution would be to add a header to the script running on the target server, e.g.
<?php
header("Access-Control-Allow-Origin: http://your_source_domain");
....
?>
Haven't tested that, but according to the spec, that should work.
Firefox 3.6 seems to handle it differently, by first doing an OPTIONS to see whether or not it can do the actual POST. Firefox 4 does the same thing Chrome does, or at least it did in my quick experiment. More about that is on https://developer.mozilla.org/en/http_access_control

The important thing to note about the JavaScript same-origin policy restriction is that it is something built into modern browsers for security - it is not a limitation of the technology or something enforced by servers.
To answer your question, neither of these are bugs.
Requests are not stopped from reaching the server - this gives the server the option to allow these cross-domain requests by setting the appropriate headers1.
The response is also received back by the browser. Before the use of the access control headers 1, responses to cross-domain requests would be stopped dead in their tracks by a security conscious browser - the browser would receive the response but it would not hand it off to the script. With the access control headers, the server has the option of setting the appropriate headers indicating to a compliant browser that it would like to allow certain origin URLs to make cross domain requests.
The exact behaviour on response might differ between browsers - I can't recall for sure now but I think Chrome calls the success callback function when using jQuery's ajax() but the response is empty. IIRC, Firefox will not invoke the success function.

I get the same thing happening for me. You are able to post across domains but are not able to receive a response. This is what I expected to be able to do and happens for me in Firefox, Chrome, and IE.
One way to kind of get around this caveat is having a local php file with will call the data via curl and respond the response to your javascript. (Kind of restated what you said you knew already.)

Yes, it's correct and you won't be able to do that unless you use any proxy.
No, request won't go to the external IP as soon as there is such limitation.

BITS error codes

I'm writing an application updater that pulls installation package from our distribution web site to the user's PC using the background intelligent download service facility.
More or less everything is working fine now but I'm having a bit of problem getting the application react well to all recoverable errors. Specifically, I'd like the application to handle properly the case of proxy authentication.
In HTTP, it's simple: make a request, get a "407" HTTP response code, prompt for user name/password and repeat until you ether go through or the user press "cancel".
With BITS, it's not that simple. I don't get the HTTP status code. I get a couple of codes: the context (which should be BG_ERROR_CONTEXT_REMOTE_FILE in my case) and an "ErrorCode" that is supposed to depend on the context.
If I request the textual description of the error through GetErrorDescription, I get the correct "407 proxy authentication require" text. But the error code I have is 0x80190197 which is nowhere near 407.
So, does anyone know where I can get a full list of the BITS error code ? Failing that, partial list with the most common errors would be nice.

0x80190197 is not strictly speaking a BITS error, it's an HTTP stack error. The list is available here: Errors (019) FACILITY_HTTP

Ajax call from local html page in Webkit Qt

I'm trying to perform an Ajax/XMLHTTPrequest from within a local HTML file in QT 4.7RC QWebview. It consistently fails with an empty responseText and status 0. I've set the follwing
page->settings()->setAttribute(QWebSettings::LocalContentCanAccessRemoteUrls,true);
but it has no effect (I can load remote images without problems though).
It seems to be a known issue and I'm not sure if there is a solution already.
https://bugs.webkit.org/show_bug.cgi?id=31875
Any ideas for a workaround would be very helpful. Basically what I'm trying to do is running a HTML/Javascript WebApp in QWebview that talks to a local server at 127.0.0.0 and this problem is kind of a show-stopper. Interestingly, the actual query is sent and my server responds with 200 and the requested data. But the response never arrives in my Javascript callbacks.

Not sure about your question but are you sure that you are inside an AJAX security sandbox that works with webkit? In Firefox, IE and others using AJAXin different domains does not work. In fact, http://demo1.demo.com is different than demo2.demo.com

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How can I scrape an image that doesn't have an extension? - image

Related

Google Drive API Console: Error saving Drive UI integration page

How can I validate http response headers?

Cross domain ajax POST in chrome

BITS error codes

Ajax call from local html page in Webkit Qt

Categories

Resources