Unable to init from given url while reading S3 image through Cloudfront - laravel

I have Laravel backend with Intervention package to handle images.
All images are stored in S3 bucket and are accessed through CDN - Cloudfront distribution.
When a user requests an image from my app, I use something like this line to fetch the image and return it to the user:
return Image::make("https://*********.cloudfront.net/S3_image_path_and_filename.jpg")->response()->header('Cache-Control', 'max-age=3600');
This code worked without any trouble for more than a year without any changes, but recently I started to see that some images on the client side are empty and there are errors saying
Intervention\Image\Exception\NotReadableException Unable to init from given url
I checked for missing images, but they exist in the S3 bucket and are accessible through CloudFront. I even see them on the client side after clearing the cache or waiting for some time, so this issue is flaky and I can't reproduce it. I also don't see any errors associated with these images in the CloudFront distribution.
My questions are:
What could be the reasons behind this issue?
How can I debug it?
Maybe I need to handle the exception above, but I am not sure what should be returned to the user in that case.

Google Drive API Console: Error saving Drive UI integration page

I have a webapp in production that interacts with Google Drive through Google Drive API.
I need to change some settings in the Drive integration, but I can't save them.
When I save the Drive UI integration page, I receive this error:
There's a problem at our end.
Please try again. If the problem persists, please let us know using
the "Send feedback" link below. Thanks!
(Looking at the Network console: there is an Internal Server Error on a POST call.)
I have been sending feedback for months: nobody answers and the bug is still there.
I also tried creating another project: I can save the first time, but then the bug returns.
What can I do? Does anyone have the same problem?
Is there a way to receive a reply from Google? Is there some workaround?
Thank you.
I think the problem must be the Client ID.
Before adding the Client ID, go to Credentials -> OAuth 2.0 Client IDs,
then select and edit your Client ID. After that, add your production site URL to Authorized JavaScript origins and Authorized redirect URIs.
Then enter your Client ID on the Drive UI integration page.
While trying to get the Drive UI configured myself, I noticed a couple of errors (that don't have any specific error messages).
When adding an Open URL, it has to be a valid domain. For instance, I tried to test it out with localhost, to no avail; something like https://devbox.app.com worked, but something like https://localhost:8888 does not. Even though https://localhost is a valid JavaScript origin in the client_id configuration (at least for the app I am working on, not sure about other apps), localhost doesn't work as an Open URL.
When adding the MIME types, they need to be in the format */* and can include custom MIME types like application/custom+xml and application/custom-name+json. I'm not sure about other custom types that are not in a particular format like xml or json, and not sure about wildcards.
When adding file extensions, do not add the '.', just the name of the file extension.
The app icon only failed to upload when the image wasn't the exact dimensions; I actually ended up editing some icons in Photoshop to change the pixel dimensions as a quick workaround during development.
That worked for me to get it to save, and I tested it with a file that had a custom MIME type (application/custom-name+xml specifically) and a custom file extension!

Why does Scrapy give 404 for images that are available?

This is an example of an image that I add to the image_urls field:
http://static.zara.net/photos//2014/I/0/2/p/5875/309/800/2/w/1920/5875309800_1_1_1.jpg
Yet I get this warning and the image is not uploaded.
[zara_com] WARNING: File (code: 404): Error downloading image from http://static.zara.net/photos//2014/I/0/2/p/5875/309/800/2/w/1920/5875309800_1_1_1.jpg> referred in
Though an image like this one:
http://static.zara.net/photos//2014/V/1/3/p/1280/303/105/2/w/1920/1280303105_2_1_1.jpg
is uploaded normally.
What might be the problem? What should I check?
As far as I can see, they seem to be filtering requests made with the default scrapy user agent:
'User-Agent': 'Scrapy/0.24.2 (+http://scrapy.org)'
When I changed the USER_AGENT setting in settings.py of my project, it started returning 200 on all requests. The strange thing is that before that it returned 404 even on the image, which you said is returned normally.
P.S. It's not very good to scrape content from a site if they are not allowing it, but then again it's not like they are disallowing it in their robots.txt. Still, you should probably enable the RobotsTxtMiddleware and the AutoThrottle extension to ensure you are playing fair.

Upload huge numbers of files to Azure blob storage

Do you know the best way to upload a very large number of files to an Azure Blob container?
I am currently doing something to upload multiple files to Azure blob storage. The number of files may be huge, like 30,000 or more (each file could be 10 KB~1 MB in size). First, I have a list of file locations, then I use Parallel.ForEach to upload the files. The code snippet looks like this:
List<string> locations = ...;
Parallel.ForEach(locations, location =>
{
    ...
    UploadFromStream(...);
    ...
});
The code produces inconsistent results.
Sometimes it runs well and I can see all the files uploaded to the Azure blob container.
Sometimes I get exceptions like this:
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature., Inner Exception: The remote server returned an error: (403) Forbidden
Sometimes I get a timeout exception.
I have worked on this issue for several days; unfortunately, I haven't found a perfect solution yet. So I want to know how you handle a similar scenario: what do you do when uploading this many files to Azure blob storage?
In the end, I did not find out what was wrong with my code.
However, I have found a solution for this issue. I dropped Parallel.ForEach and just use a common foreach. Then I use the BeginUploadFromStream method instead of UploadFromStream, which uploads the files asynchronously (see the sketch below).
So far, it runs perfectly and is more stable, without any exceptions.
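For reference, here is a minimal sketch of that approach, assuming the classic Microsoft.WindowsAzure.Storage client library; the container reference and the way the blob name is derived from the file path are assumptions for illustration:

// Minimal sketch; requires Microsoft.WindowsAzure.Storage.Blob, System.Collections.Generic and System.IO.
static void UploadAll(CloudBlobContainer container, List<string> locations)
{
    foreach (var location in locations)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(location));
        FileStream stream = File.OpenRead(location);

        // Begin/End asynchronous pattern: start the upload and dispose of the
        // stream once the transfer has completed.
        blob.BeginUploadFromStream(stream, result =>
        {
            blob.EndUploadFromStream(result);
            stream.Dispose();
        }, null);
    }
}

Note that this still starts every upload without waiting for the previous one to finish, so with 30,000 files you may still want to limit how many transfers are in flight at once.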

How can I scrape an image that doesn't have an extension?

Sometimes I come across an image that I can't scrape so that it can be saved. An example of this is:
https://s3.amazonaws.com/plumdistrict.com-production/perks/12321/image/original.?1325898487
When I hit the URL from Internet Explorer I see the image, but when I try to get it with the code below, GetResponse throws "System.Net.WebException: The remote server returned an error: (403) Forbidden":
string url = "https://s3.amazonaws.com/plumdistrict.com-production/perks/12321/image/original.?1325898487";
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Any ideas on how to get this image?
Edit:
I am able to save images that do have extensions. For example, I can scrape the following image just fine:
https://s3.amazonaws.com/plumdistrict.com-production/perks/12659/image/original.jpg?1326828951
Although HTTP was originally supposed to be stateless, there are a lot of implementations that rely on it not being stateless. I could configure my webserver to only accept requests for "http://mydomain.com/sexy_avatar.jpg" if you provide a cookie proving you are logged in. If not, I send you a 303 redirect to "http://mydomain.com/avatar_for_public_use.jpg".
Amazon could be doing the same. Try to load the web page using Chrome, and look at the Network view in developer mode (CTRL+SHIFT+J) to see all headers supplied to the website. Maybe you even need to do a full navigation in the same session before you are allowed to see the image. This is certainly the case in many web applications I have developed :-)
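If the 403 turns out to be header-related, a minimal sketch of sending browser-like headers with HttpWebRequest could look like this; the User-Agent and Accept values are assumptions, so compare them against what Chrome's Network view shows for the working request:

// Requires System.Net and System.IO. Header values are placeholders.
string url = "https://s3.amazonaws.com/plumdistrict.com-production/perks/12321/image/original.?1325898487";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64)";
request.Accept = "image/jpeg,image/*;q=0.8,*/*;q=0.5";

using (WebResponse response = request.GetResponse())
using (Stream body = response.GetResponseStream())
using (FileStream file = File.Create("original.jpg"))
{
    // Save the bytes; the content is an image even though the URL has no extension.
    body.CopyTo(file);
}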
Well, it looks like it's being generated from a script (possibly being retrieved from a database). The server should be sending a file/content type to go along with that... but it doesn't seem to be, which I believe is a violation of standards.
My Linux box knows full well that that's a JPEG image once it's on my hard drive, because it examines file headers rather than relying on extensions. Perhaps there is a tool to do the same in Windows?
Edit: Actually, on further contemplation, it seems odd that you'd get a 403 for that. Perhaps the server is actually blocking you from retrieving the file in that manner.
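On the file-header point, here is a minimal sketch of sniffing the JPEG signature yourself; the path argument is whatever you saved the download as, and JPEG files begin with the bytes FF D8 FF:

// Requires System.IO. Checks the leading bytes instead of relying on an extension.
static bool LooksLikeJpeg(string path)
{
    using (FileStream stream = File.OpenRead(path))
    {
        byte[] header = new byte[3];
        int read = stream.Read(header, 0, header.Length);
        return read == 3 && header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF;
    }
}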

GET request to mp3 in S3 bucket failing to download file with 206 partial content?

I have an mp3 file in an S3 bucket. I am fetching this file via ajax GET request for html5 audio playback. Intermittently, the get request will fail to download the file and thus the track will not play. The request returns "206 partial content." Oddly, it will work several times before failing and then continuing to fail.
If I disable caching in my browser (chrome), the file will download and play appropriately.
Have I configured s3 incorrectly? How can I get this mp3 file to download and play consistently?
The specific file is located here: https://s3.amazonaws.com/1m40s_dev/assets/music/walden.mp3
thanks!
I've found this often relates to the MIME type set on the S3 hosted file.
Setting the correct MIME type seems to fix things.
On a side note, I struggled with a single binary file always breaking in IE. Its MIME type was application/octet-stream. I changed the MIME to binary/octet-stream and that seemed to fix downloads from IE. Not sure why.
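If you upload the files programmatically, here is a minimal sketch of setting the Content-Type at upload time, assuming the AWS SDK for .NET; the local file path is a placeholder, while the bucket and key match the URL above:

// Requires Amazon.S3 and Amazon.S3.Model from the AWSSDK.S3 package.
var client = new AmazonS3Client();
var request = new PutObjectRequest
{
    BucketName = "1m40s_dev",
    Key = "assets/music/walden.mp3",
    FilePath = @"C:\audio\walden.mp3",   // placeholder local path
    ContentType = "audio/mpeg"           // served back as the Content-Type header
};
client.PutObject(request);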
Using Amazon CloudFront solved the problem.
I solved this by appending a timestamp to the end of the mp3 url on page load. This forced a new download of the content each time and eliminated the caching error.
This feels more like a workaround than a fix. I still don't know the root cause of the issue, but if you find yourself having a similar problem and just need to move on, add a timestamp or random number as a param at the end of the URL:
.../assets/music/walden.mp3?[timestamp]
One other workaround I've found: if you're using Rails, turning off Turbolinks makes this go away in Chrome. I'll add more to my answer as I discover more.
