I plan to store images on Amazon S3. How can I retrieve the following from Amazon S3:
file size
image height
image width?
You can store image dimensions in user-defined metadata when uploading your images and later read this data using the REST API.
Refer to this page for more information about user-defined metadata: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
You can get the file size by reading the Content-Length response header returned by a simple HEAD request for your file. Your client library may expose this for you. More info in the S3 API docs.
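For instance, with the Ruby aws-sdk gem (the bucket name, key, and dimensions below are placeholders), you could write the dimensions as metadata at upload time and read them back together with the size via a HEAD request:
require 'aws-sdk'

s3 = Aws::S3::Resource.new(region: 'us-east-1')
object = s3.bucket('my-bucket').object('images/photo.jpg')

# Store the dimensions as user-defined metadata at upload time.
object.upload_file('photo.jpg', metadata: { 'width' => '1024', 'height' => '768' })

# Reading these attributes issues a HEAD request; the body is never downloaded.
puts object.content_length    # file size in bytes (Content-Length)
puts object.metadata['width'] # => "1024"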
Amazon S3 just provides you with storage and (almost) nothing more. Image dimensions are not accessible through the API: you have to get the whole file and calculate the dimensions yourself. I'd advise you to store this information in your database when uploading the files to S3, if applicable.
On Node, it can be really easy using image-size coupled with node-fetch:
const fetch = require('node-fetch');
const imageSize = require('image-size');

async function getSize(imageUrl) {
  const response = await fetch(imageUrl);  // download the image
  const buffer = await response.buffer();  // node-fetch v2 Buffer API
  return imageSize(buffer);                // => { width, height, type }
}
// Usage: getSize('https://example.com/photo.jpg').then(size => console.log(size.width, size.height))
Related
Hi, I'm building a program in Ruby to generate alt attributes for images on a webpage. I'm scraping the page for the images, then sending their src (in other words, a URL) to google-cloud-vision for label detection and other Cloud Vision methods. It takes about 2-6 seconds per image, and I'm wondering if there's any way to reduce the response time. I first used TinyPNG to compress the images; Cloud Vision was a tad faster, but the time it took to compress more than outweighed the improvement. How can I improve response time? I'll list some ideas.
1) Since we're sending a URL to Google Cloud, it takes time for Google Cloud to fetch the image from the img src before it can even analyze it. Is it faster to send a base64-encoded image? What's the fastest form in which to send (or really, for Google to receive) an image?
2) My current code for label detection is below. First question: is it faster to send a JSON request rather than call the Ruby (label or web) methods on a Google Cloud project? If so, should I limit the responses? Labels with less than a 0.6 confidence score don't seem of much help. Would that speed up image recognition/processing time?
cloud_vision = Google::Cloud::Vision.new project: PROJECT_ID
@vision = cloud_vision.image(@file_name)
@vision.labels # or @vision.web, etc.
Open to any suggestions on how to speed up response time from Cloud Vision.
TL;DR - You can take advantage of the batching support in the annotation API for Cloud Vision.
Longer version
Google Cloud Vision API supports batching multiple requests in a single call to the images:annotate API. The following limits are also enforced for Cloud Vision:
Maximum of 16 images per request
Maximum of 4 MB per image
Maximum of 8 MB total request size
You could reduce the number of requests by batching 16 at a time (assuming you do not exceed any of the image size restrictions within the request):
#!/usr/bin/env ruby
require "google/cloud/vision"

image_paths = [
  ...
  "./wakeupcat.jpg",
  "./cat_meme_1.jpg",
  "./cat_meme_2.jpg",
  ...
]

vision = Google::Cloud::Vision.new

length = image_paths.length
start = 0
request_count = 0
while start < length do
  last = [start + 15, length - 1].min
  current_image_paths = image_paths[start..last]
  printf "Sending %d images in the request. start: %d last: %d\n", current_image_paths.length, start, last
  result = vision.annotate(*current_image_paths, labels: 1)
  printf "Result: %s\n", result
  start += 16
  request_count += 1
end
printf "Made %d requests\n", request_count
So you're using Ruby to scrape some images off a page and then send the image to Google, yeah?
Why you might not want to base64 encode the image:
Headless scraping becomes more network intensive. You have to download the image to then process it.
You also have to worry about adding in the base64 encoding step
Potential storage concerns if you aren't just holding the image in memory (and if you do, debugging becomes somewhat more challenging)
Why you might want to base64 encode the image:
The image is not publicly accessible
You have to store the image anyway
Once you have weighed the choices, if you still want to get the image into base64, here is how you do it:
require 'base64'
# strict_encode64 omits the newlines that encode64 inserts every 60 characters;
# the Vision API expects the "content" field without line breaks.
Base64.strict_encode64(image_binary)
It really is that easy.
But how do I get that image in binary?
require 'curb'
# This line is an example and is not intended to be valid
img_binary = Curl::Easy.perform("http://www.imgur.com/sample_image.png").body_str
How do I send that to Google?
Google has a pretty solid write-up of this process here: Make a Vision API Request in JSON
If you can't click it (or are too lazy to) I have provided a zero-context copy-and-paste of what a request body should look like to their API here:
request_body_json = {
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9...image contents...eYxxxzj/Coa6Bax//Z"
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 1
        }
      ]
    }
  ]
}
So now we know what a request should look like in the body. If you're already sending the img_src in a POST request, then it's as easy as this:
require 'base64'
require 'curb'

requests = []
array_of_image_urls.each do |image_url|
  img_binary = Curl::Easy.perform(image_url).body_str
  image_in_base64 = Base64.strict_encode64(img_binary)
  requests << {
    "image" => { "content" => image_in_base64 },
    "imageContext" => "<OPTIONAL: SEE REFERENCE LINK>",
    "features" => [{ "type" => "LABEL_DETECTION", "maxResults" => 1 }]
  }
end
# Now just POST requests.to_json with your Authorization and such (You did read the reference right?)
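For completeness, a minimal sketch of that POST using curb; the API-key query parameter shown here is one auth option (see the reference for OAuth):
require 'json' # curb is already required above

post = Curl::Easy.new("https://vision.googleapis.com/v1/images:annotate?key=#{ENV['GOOGLE_API_KEY']}")
post.headers['Content-Type'] = 'application/json'
post.http_post({ "requests" => requests }.to_json)
puts post.body_str # JSON annotation results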
Play around with the hash formatting and values as required. This is the general idea which is the best I can give you when your question is SUPER vague.
I am working on an Android app, for which I use Parse as a backend [Parse-Heroku-mLab (Sandbox plan)]. The app lists different services in the city, with the service owners' complete information, and it also has user images and icons.
Issue: When I try to upload images, they don't get uploaded (it was working fine until yesterday), but the text fields in the dashboard work and get uploaded.
The Parse logs say:
2017-05-28T04:30:17.793Z - Could not store file.
2017-05-28T04:30:17.790Z - quota exceeded
Screenshots of the MongoDB stats are attached.
Could it be that the issue is with mLab? The Sandbox plan gives storage up to 512 MB. I tried freeing up space, but no go.
Try using a CDN for storage, like the GCSAdapter or the S3Adapter. It is plain simple to set up, and old images will continue to work as normal.
var GCSAdapter = require('parse-server-gcs-adapter');
var ParseServer = require('parse-server').ParseServer;

var gcsAdapter = new GCSAdapter('project',
  'keyFilePath',
  'bucket', {
    bucketPrefix: '',
    directAccess: false
  });

var api = new ParseServer({
  appId: 'my_app',
  masterKey: 'master_key',
  filesAdapter: gcsAdapter
});
mLab calculates your quota from the fileSize value output by MongoDB. The size of an image from a modern phone can be anywhere from 3 to 12 MB.
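If you want to check that number yourself, MongoDB's dbStats command reports it. A quick sketch with the Ruby mongo gem (the connection string below is a placeholder for your mLab URI):
require 'mongo'

# Placeholder URI: substitute your mLab connection string.
client = Mongo::Client.new('mongodb://user:pass@ds012345.mlab.com:12345/mydb')
stats = client.database.command(dbStats: 1, scale: 1024 * 1024).documents.first
puts "fileSize: #{stats['fileSize']} MB" # the value mLab measures the quota against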
I'm uploading fairly large objects (~500 MB) using the v2 aws-sdk gem as follows:
object = bucket.object("#{prefix}/#{object_name}")
raise RuntimeError, "failed to upload: #{object_name}" unless object.upload_file("#{object_name}", storage_class: "STANDARD_IA")
The uploads succeed and I can see the new objects in the console, but they all have a storage class of "Standard".
When I run this same code with smaller objects they're correctly created with storage class = "STANDARD_IA".
Is this a factor of the file size? Or the fact that it's a multipart upload? Or something else? I didn't see anything in the documentation, but it's pretty "expansive" so I may just have missed it.
This was caused by a bug in aws-sdk-ruby, fixed in this pull request:
https://github.com/aws/aws-sdk-ruby/pull/1108
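Until you can upgrade to a release that includes the fix, one possible workaround (a hypothetical sketch, not taken from the PR) is to rewrite the storage class with an in-place copy after the upload; CopyObject honors storage_class and accepts sources up to 5 GB, which covers these ~500 MB objects:
# Same bucket/prefix/object_name as in the question.
object = bucket.object("#{prefix}/#{object_name}")
object.copy_from(object, storage_class: "STANDARD_IA") # in-place copy rewrites the storage class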
I'm using JMeter (2.13 r1665067) to test a site with Google Kaptcha on log in and registration, until they can be disabled in a test environment. I've recorded a session and set up a Save Responses to a file sampler to extract the kaptcha image. I then have a Beanshell sampler display it so I can enter the code as needed (thanks to this post).
The problem I'm running into now is that the first image retrieved from the server is displayed repeatedly. I've tried setting all objects created in the Beanshell to null after use, and checking the "Reset bsh.Interpreter before each call" option.
I was able to get around the issue by using the ${__Random} function to append a unique ID to each image when it is created by the Save Responses to a file sampler, but that results in a lot of files being created. I can verify the saved image file is changing on the filesystem. I can also restart JMeter, or clear the file from the filesystem, for it to appear properly, but only the first time. Adding a timestamp via the Save Responses to a file sampler isn't unique enough, but creates additional files anyway.
I'd like to find out why JMeter seems to be caching the images and if there's a way to have a single file written and read each time, avoiding the slew of them I'd get by appending a unique ID. I imagine it has to do with my config.
Beanshell sampler code:
filenameOrURL = new URL("file://${FILE2}");
image = Toolkit.getDefaultToolkit().getImage(filenameOrURL);
icon = new javax.swing.ImageIcon(image);
pane = new JOptionPane("Enter Captcha", 0, 0, null);
String captcha = (String)pane.showInputDialog(null, "Captcha", "Captcha", 0, icon, null,null);
filenameOrURL = image = pane = icon = null;
log.info(captcha);
vars.putObject("captcha",captcha);
Save Responses to a file sampler parameters:
Filename prefix: /response/response_
Variable name: FILE
Thread Group:
I'd post an image if my reputation preceded me. :blush:
Recording Controller
login.html (GET)
Save Responses to a file
BeanShell Sampler
login.html (POST)
logout.html (GET)
Your issue is not really with JMeter, but with the Toolkit.getImage() function. From its documentation:
Returns an image which gets pixel data from the specified file, whose format can be either GIF, JPEG or PNG. The underlying toolkit attempts to resolve multiple requests with the same filename to the same returned Image.
Since the mechanism required to facilitate this sharing of Image objects may continue to hold onto images that are no longer in use for an indefinite period of time, developers are encouraged to implement their own caching of images by using the createImage variant wherever available. If the image data contained in the specified file changes, the Image object returned from this method may still contain stale information which was loaded from the file after a prior call. Previously loaded image data can be manually discarded by calling the flush method on the returned Image.
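Given that, the least invasive fixes in the Beanshell sampler above keep the same variables and either bypass the shared cache or discard it after each use (a sketch, in the sampler's own BeanShell):
// Option 1: createImage() does not resolve to a shared, cached instance.
image = Toolkit.getDefaultToolkit().createImage(filenameOrURL);

// Option 2: keep getImage(), but flush the cached pixel data once the
// dialog closes, so the next call re-reads the file from disk.
image.flush();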
agent = Mechanize.new
url = "---------------------------"
page = agent.get(url)
Now, I want to know how many KB (kilobytes) of data were used from my internet service provider to scrape that data.
More specifically, what's the size in KB of the variable "page"?
page.content.bytesize / 1024.0 # size of the unzipped body in KB
It's really two separate things: the size of the unzipped response body, and the number of bytes that were actually transferred. You can get the first by inspecting page.body; for the second you would need to measure the response and request headers as well, and account for things like gzip and redirects, not to mention DNS lookups, etc.
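If a rough number for the transferred side is still useful, here is a sketch that approximates it from what Mechanize exposes; it only counts the response headers and the unzipped body, so it ignores the request itself, gzip savings, redirects, and DNS traffic:
require 'mechanize'

agent = Mechanize.new
page = agent.get('https://example.com/') # placeholder URL

body_kb = page.content.bytesize / 1024.0 # unzipped body size in KB
# Approximate the response-header overhead from the parsed headers.
header_kb = page.response.sum { |name, value| "#{name}: #{value}\r\n".bytesize } / 1024.0
puts body_kb + header_kb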