Data Labelling in Azure - Images not sorted - azure-blob-storage

I am labelling image data using the Azure Data Labelling service.
My images are sorted in a blob container.
I would like to tag the images in the order they are sorted (which is the chronological order). However, Azure seems to shuffle them before displaying them.
Does anyone know how to configure that?
Thanks in advance.

Whatever the names of the images we upload, when they go into the datastore the service renames them all and displays them in its own order, ascending by creation date. The order is also influenced by the size of each image at upload time, and the final ordering reflects the duration taken to label the data.
So it is not a sorting issue: the display order is driven by the image size at upload time and then by the time taken to label the dataset images.

This is by design, to reduce bias on the part of the labeler/analyst.

Related

How to manage different image sizes on a website

I have a website that is going to host thousands of images. Not only that, but each image is displayed at a different size depending on where you are on the page: on the list page the image is shown as a 350x200 rectangle, in the sidebar pictures are 100x100, etc.
So when a user uploads an image to the website, I keep the original and make 4 resized copies for each size. So if 100 users upload an image, the result will be 500 images. I can't even think of what will happen with the different sizes of the different mobile devices...
I started using CloudFront to optimize speed, and when a user uploads an image I upload the original and the resized copies to an Amazon S3 bucket.
But if tomorrow I decide to add another size, or change an existing size, I have to run a script that deletes the old resized images and uploads the newly sized ones, which means "get the original from S3, resize it, upload the new size". That is not practical at all. Imagine when I do a complete redesign of the website and the image sizes change completely: I would have to run a script that resizes each image to the new requirements and deletes the old images.
Is there a more practical way to achieve that?
I wanted to achieve the following scenario:
When the user uploads an image, I resize it and upload it to S3
CloudFront will serve the already-resized images directly
When I add another size, I want CloudFront to see that this size is missing on S3 and pull it from an origin from my website.
I cannot think of a way to implement this. Any help or shared best practices will be appreciated.
Rather than doing all this yourself, you could use an image resizing service such as:
Cloudinary
Imgix
They can resize images on-the-fly so you do not have to create and store the resized versions yourself. They can also manipulate images on-the-fly (rotate, watermark, colorize). Videos too!
If you choose not to use such a service, you could create your own virtual resizing service. The major choice is whether to:
Resize on-the-fly (using CloudFront for caching but not requiring any storage of the resized images), or
Resize upon request and store the result for future access (less processing cost, but involves storage cost)
It is not possible to have CloudFront "see that this size is missing on S3 and pull it from an origin from my website". (You might be able to do fancy stuff with 404 pages, but it wouldn't be worth the effort.)
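If you go the resize-upon-request-and-store route, here is a minimal sketch in Python with Flask and Pillow (my choice of tools, not anything from the question; local folders stand in for the S3 buckets you would really use):

import os
from flask import Flask, send_file, abort
from PIL import Image

app = Flask(__name__)
ORIGINALS = "originals"   # hypothetical stand-in for the originals bucket
RESIZED = "resized"       # hypothetical stand-in for the resized-copies bucket
os.makedirs(RESIZED, exist_ok=True)

@app.route("/img/<int:w>x<int:h>/<name>")
def resized_image(w, h, name):
    # Serve a cached copy if this size was generated before.
    cached = os.path.join(RESIZED, f"{w}x{h}-{name}")
    if not os.path.exists(cached):
        original = os.path.join(ORIGINALS, name)
        if not os.path.exists(original):
            abort(404)
        # The first request for a size pays the resize cost and stores
        # the result, so adding a new size later needs no batch rerun.
        img = Image.open(original)
        img.thumbnail((w, h))  # shrinks in place, preserving aspect ratio
        img.save(cached)
    return send_file(cached)

With CloudFront in front of such an endpoint, each size is computed at most once and then cached at the edge.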
For anyone doing this now: it's not for everyone, but due to cost-saving efforts we just switched from Imgix to AWS's Serverless Image Handler. It works great if your images are on S3, and it may be a good alternative even if it lacks some of Imgix's features.

Get dimensions of objects within an image

I'm planning to build a web app where the following feature is used:
Imagine uploading an image,
and the dimensions of objects within the image need to be retrieved,
e.g. I would like to know the height and width of the input field,
given that I can provide the base image sizes and aspect ratios.
How would one go about getting the marked dimensions out of the given image?
Is there an open API that could do this?
Is this even a possibility?

Image similarity detection

I've been playing around writing a scraper that scrapes Deviantart.com. It saves a copy of new images locally, and also creates a record in a Postgresql DB for the image. My problem: as new images come in, how do I know if this new image corresponds to an image I've seen before? Dupes are fairly rare on DA, but at the same time, this is an interesting problem in a more general sense.
Thoughts on ways to proceed?
Right now the Postgresql DB is populated as I scrape images, and it has a table which looks like:
CREATE TABLE Image
(
    id SERIAL PRIMARY KEY NOT NULL,
    url varchar(5000) UNIQUE NOT NULL,
    dateadded timestamp without time zone default (now() at time zone 'utc'),
    width int,
    height int
);
Where url is the link to the image as I scraped it from DA (ex: http://th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png), dateadded is the datetime the scraper found the image, and width & height are the image dimensions.
I currently don't store the image itself in the database, but I do keep a local mirror -- I take the url for the image and wget -r -nc the file. So for a url: http://th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png I keep a local copy at <somedir>/th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png
Now, image recognition in the general case is quite hard. I want to be able to handle things like slight resizes, which I could account for by normalizing all images kept to a specific resolution, and normalize the query image to that same resolution at query time. I want to be able to handle things like change of format (PNG vs JPG vs etc) which I could do by reading an image file into a normalized format (ex: uncompressed RGB values for each pixel, though ideally some "slack" would be tolerated here).
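For concreteness, a minimal sketch of that normalization step (Pillow is just one option, and the canonical resolution here is arbitrary):

from PIL import Image

CANONICAL_SIZE = (256, 256)  # arbitrary example resolution

def normalize(path):
    # Decode any format (PNG, JPG, ...) into plain RGB pixels, then
    # scale to a fixed resolution so slight resizes and format
    # changes no longer matter for comparison.
    img = Image.open(path).convert("RGB")
    img = img.resize(CANONICAL_SIZE, Image.LANCZOS)
    return img.tobytes()  # uncompressed RGB values for each pixel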
Nice to haves (would be willing to give up for simplification/better accuracy):
I'd like to be able to handle cropping of an image (ex: I've previously seen imageA, and somebody takes imageA, crops it, and uploads it as imageB; I'd like to notice that as a duplicate).
I'd like to be able to handle an image being watermarked with a logo.
I'd like to be able to handle cropping in the case where the new image to classify is a subimage of a previously seen image (i.e. I have imageA stored, somebody takes imageA and crops it, and I'd like to be able to map that cropped image back to imageA).
Constraints/extra info:
I'm not at all interested in finding images that are different yet similar (ex: two distinct photos of the same Red Bus should be reported as two distinct images)
While I'm not entirely opposed to using metadata (ex: artist, image category, etc.), I'd like to keep this as constrained to just the image data (EXIF data, resolution, RGB colour values) as possible.
An image that is sized down and appears within a new, larger image I wish to consider as different. Ex: I have imageA, I resize it to 50x50, and that 50x50 grid appears in a new image; I would not consider the new image "the same" as imageA (though I suppose by the criteria outlined previously I would consider imageA a duplicate of the new image).
It would be nice, but not required, if one could detect "minor" revisions to an image (ex: a blanket change to the gamma value, etc.).
Thoughts? Suggestions?
For my use case I'm far more concerned about false positives than false negatives, and as such a "fuzzy match" approach should err on the side of caution.
In case it matters I'm writing all of this in Python, though TBH I'm happy to use an alternate tech if it solves my problem elegantly/efficiently.
I would grab a small subimage somewhere not near the edges, and cross-correlate this within the vicinity of its source location in your database images. You can resample it prior to cross-correlation to account for small resizes, and you can choose the size of the vicinity that you match against to account for asymmetrical crops of a certain percentage.
To avoid perfect fits on featureless regions (e.g. the sky) you could use local image variation as a selection criterion for the subimage location.
This would still be quite slow, so it will be necessary to use a global image metric to first select candidate duplicates from the database (e.g. the color histograms mentioned by danf).
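Roughly, in Python with OpenCV and NumPy (my library choice; the patch size and threshold are illustrative, and for brevity this scans the whole candidate image rather than just the vicinity of the patch's source location):

import cv2
import numpy as np

PATCH = 64          # illustrative subimage size
THRESHOLD = 0.95    # illustrative similarity cutoff

def pick_patch(img):
    # Choose the highest-variance patch away from the edges, so we
    # don't correlate a featureless region like the sky.
    h, w = img.shape[:2]
    best, best_var = None, -1.0
    for y in range(h // 4, 3 * h // 4 - PATCH, PATCH):
        for x in range(w // 4, 3 * w // 4 - PATCH, PATCH):
            patch = img[y:y + PATCH, x:x + PATCH]
            v = float(np.var(patch))
            if v > best_var:
                best, best_var = patch, v
    return best

def looks_like_duplicate(candidate, stored):
    # Cross-correlate a patch from the stored image against the
    # candidate; a near-perfect peak suggests a duplicate. Candidates
    # should already be pre-filtered by a cheap global metric such as
    # a colour histogram.
    patch = pick_patch(stored)
    if patch is None or candidate.shape[0] < PATCH or candidate.shape[1] < PATCH:
        return False
    result = cv2.matchTemplate(candidate, patch, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val >= THRESHOLD

Usage would be something like looks_like_duplicate(cv2.imread("new.png"), cv2.imread("stored.png")).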

Magento Images are rotating when uploaded

This only happens on the product pages, with images whose height is larger than approximately 500px. Caching is disabled. Products display correctly at smaller sizes, but I need a solution that doesn't require resizing the images before uploading.
I believe it's something to do with the use of multiple image-resizing programs and some of the meta information in the image.
Thanks
It sounds like there is EXIF data in the JPEG which records which way 'up' is. Either this info is being ignored when you upload but not ignored on your PC (explaining why the image looks the right way up when you view it on your desktop, but the wrong way up in Magento), or vice versa.
Can you use an art program or a bulk converter like XnView to either apply or remove the EXIF data before uploading? You might then need to manually rotate some images.
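If you'd rather script it than use an art program, here is one sketch in Python with Pillow (my suggestion, not something from the question) that bakes the EXIF orientation into the pixels and drops the tag before upload:

from PIL import Image, ImageOps

def bake_orientation(src, dst):
    # Rotate the pixels according to the EXIF orientation tag, then
    # save without the tag, so Magento and desktop viewers agree.
    img = Image.open(src)
    img = ImageOps.exif_transpose(img)  # applies and removes the tag
    img.save(dst, quality=95)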

Uploading image to S3 and then resizing in node.js

I am building a website where a user will upload an image and the image has to ultimately be stored in Amazon S3. Now, given the possibility of uploading directly from the browser, I considered the following options:
Resize to all three sizes in the browser and then upload to S3 directly - this works, but the problem is that I am uploading multiple images, which I doubt is a good way to do this.
Upload the original image to S3, resize in node.js - I was thinking of uploading only the original image and then using node.js to resize it.
What is the best way to do 2) so that I can minimize the footprint of the service I need to deploy?
Appreciate any pointers!
Cheers
Vishal
I wouldn't do option 1, simply because it's bad for the user experience.
I suggest the following: only upload the original image to S3; then, when a resized variant gets requested by a client, retrieve it from S3, resize it, and deliver it to the client through a CDN.
This beats resizing them all at once because:
Maybe you want to change the sizes later on. Resizing on request lets you change sizes at will; otherwise, you would have to go through every image and redo all the resizes.
The user doesn't have to wait until all the images are resized. Instead, the user only waits for the image to be uploaded to S3 from the node.js server, which is fast assuming you're on AWS.
Resizing all at once creates a memory spike. If you're on a low-memory platform like Heroku, you can hit the 512 MB memory limit pretty quickly, and your app will swap and become ridiculously slow. It's better to spread out the resizing operations.
You only store the originals on S3; the resized versions are just derivatives, so there's no need to store them.
I've written a library, https://github.com/mgmtio/simgr, to help me do this in my app.
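For the shape of that request handler, here is a sketch in Python with boto3 and Pillow (the question is about node.js, so read this as pseudocode for the flow; the bucket and variant names are made up):

from io import BytesIO
import boto3
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "my-originals"                             # hypothetical bucket of originals
SIZES = {"thumb": (100, 100), "list": (350, 200)}   # hypothetical variants

def resized_variant(key, size_name):
    # Fetch the original from S3, resize it, and return JPEG bytes.
    # Nothing is written back to S3: the CDN in front of this handler
    # caches the response, so each variant is computed at most once
    # per edge location.
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    img = Image.open(BytesIO(body)).convert("RGB")
    img.thumbnail(SIZES[size_name])  # preserves aspect ratio
    out = BytesIO()
    img.save(out, format="JPEG", quality=85)
    return out.getvalue()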
G'day Vishal,
I imagine your largest images are going to be much larger than your smaller sizes, which are probably thumbnails or whatever. Because image file size more-or-less scales as the area of the image, this effect will be even more pronounced.
So any service you create which pulls the largest image back out of S3 to rescale it is likely to use more S3 bandwidth than the version which just uploads all three from the client! So it might be more sensible to just do option 1.
EDIT: If you really want to avoid uploading multiple files, I still wouldn't use the upload-directly-to-S3 option, since a) you'll have to detect new files on S3 and b) the total amount of data getting shipped around will still be greater than if you accept the image into your node app, resize it there and then save the various sizes to S3.
This question discusses image manipulation in Node.
-----Nick
I suggest using an image CDN to speed up image delivery.
Amazon S3 is Slow and Images are Heavy
Amazon S3 is known to be slow when serving web content directly. One reason why it is slow is that a bucket is only located in one geographical location. The location is selected when you create the bucket. For example, if your bucket is created on Amazon servers in California, but your users are in India, then images will still be served from California. This geographical distance causes slow image loading on your website.
Further, it is not uncommon to see very heavy images on S3, with large dimensions and high byte size. One can only speculate on the reasons, but it is probably related to the publication workflow and the convenience of S3 as a storage space.
You can use an image delivery service like ImageEngine while keeping S3's workflow and convenience.
Trust me when I say it's very simple to implement.
Once you've done the necessary settings after creating an account, all you need to do is:
Reference images using the ImageEngine domain
http://wq77sh2y.cdn.imgeng.in/path/image.jpg
OR
Reference images using the prefixing feature
http://wq77sh2y.cdn.imgeng.in/http://example.com/path/image.jpeg
