What's the best-practice method of storing a user's uploaded pictures and their corresponding thumbnails? I noticed Flickr uses filename distinctions like http://farm5.static.flickr.com/1234/789456123a_s.jpg, where the _s.jpg suffix describes the size of the image (_s.jpg = small, _m.jpg = medium...). However, does storing images like the following make sense?
/images/123456.jpg
/images/small/123456.jpg
/images/medium/123456.jpg
...that way, it's easy to access a different size by simply prepending the size folder name to the filename.
Whatever works for you - pick a scheme and stick to it. As long as you're consistent, and document what you do, you should be fine.
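The folder-per-size scheme from the question can be sketched in a few lines. This is only an illustration in Python, not how Flickr or any particular site does it; the function name `sized_path` and the set of size names are assumptions made up for the example.

```python
import posixpath

# Hypothetical size names; the question only mentions "small" and "medium".
SIZES = {"small", "medium"}

def sized_path(original, size=None):
    """Return the path of a resized variant by inserting the size folder.

    sized_path("/images/123456.jpg", "small") -> "/images/small/123456.jpg"
    """
    if size is None:
        return original  # the full-size image lives directly under /images
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    folder, filename = posixpath.split(original)
    return posixpath.join(folder, size, filename)
```

Because only the original path needs to be stored, every other size can be derived from it, which is the consistency the answer above asks for.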
Users will be able to upload images, and each file will be renamed so that it doesn't clash with another file. Using a simple convention like calling them 1.jpg, 2.jpg, 3.jpg and so on would mean other users can simply type in 4.jpg and see someone else's image.
Is there a way or a convention for naming images differently while still making the image names hard to guess?
You could just write a PHP script to generate some pseudo random image name each time, like 21412adfs.jpg.
Better yet, take the name of the file being uploaded and attach a random number of five or six digits to it, say 19353--toy--car.png. You could even replace the random number with a number representing the date the image was uploaded.
Naming conventions can be in any form you want really, whatever works best for your setup and archive purposes. Including the date in the image name can be good, as you could easily sort images into different upload folders depending on their dates, etc.
Your best bet is to use a hashing function to turn some hard-to-predict input into a string that you can use as the filename. For example, in PHP you could use the following:
$filename = md5('SOME RANDOM STRING'.rand(0,200000).time());
Two uploads can occasionally produce the same name (for instance, when rand() and time() return the same values for both), which would cause a filename collision. The likelihood of this happening is quite small, and if it does happen, all you have to do is run the name generation again; a collision twice in a row is extremely unlikely.
Make sure you change 'SOME RANDOM STRING' to something that only you know and use on your site. This is what's known as a "site salt": outsiders will have a much harder job guessing your generated filenames, because they won't be able to predict and reverse engineer everything that you've put into the mix to generate them.
Hope that helps?
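As a variation on the idea above, here is a minimal sketch in Python rather than PHP. It swaps the md5-of-rand()-and-time() approach for the standard `secrets` module, which draws from the operating system's cryptographically secure random source, so no site salt is needed. The function names and the `exists` callback are hypothetical; plug in whatever check your storage layer provides.

```python
import os
import secrets

def random_image_name(original_name, nbytes=16):
    """Return an unguessable filename that keeps the original extension."""
    _, ext = os.path.splitext(original_name)
    # token_hex(16) yields 32 hex characters taken from a secure random source.
    return secrets.token_hex(nbytes) + ext.lower()

def unique_image_name(original_name, exists):
    """Generate names until `exists(name)` reports that the name is free."""
    while True:
        name = random_image_name(original_name)
        if not exists(name):
            return name
```

The retry loop mirrors the advice above: a collision is very unlikely, and if one happens the name is simply generated again.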
I am new to this. My objective is to build a web application that lets users store images in a database, and I want to reduce the number of images that are stored twice or more.
So what I need is a way to find duplicate or similar images that are already stored in the database. Even better would be to check when the user first tries to import an image: if it is similar to an image already stored in the database, the system could warn the user not to store it.
I just want to work out how to find similar or duplicate images in a specific directory or in a database. Can you explain from the beginning how to build this, and what I should learn to accomplish it from the basics, such as a tutorial? I'd like to learn as much as possible.
Thanks in advance, I really need this help.
Finding similar images is much more complex, so I will stick to finding duplicate images first. The easiest thing to do is to take a SHA1 hash of the image bytes. Here is some code in C# to accomplish this (see below). As for storing the hash in a database, I would recommend that you use a binary(20) datatype to store the result of the hash. This allows your SQL server to index and query it much faster than if the hash were stored as a string or in some other format.
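// Requires: using System; using System.IO; using System.Security.Cryptography;
private static byte[] GetHashCodeForFile(string file)
{
    // Hash at most this many leading bytes to bound the cost for large files.
    // Files that differ only after this prefix will produce the same hash.
    const int maxNumberOfBytesToUse = 3840000;
    using (Stream sr = File.OpenRead(file))
    using (SHA1 hasher = SHA1.Create())
    {
        int bytesToReadIn = (int)Math.Min(sr.Length, maxNumberOfBytesToUse);
        byte[] buffer = new byte[bytesToReadIn];
        // Stream.Read may return fewer bytes than requested, so loop until done.
        int offset = 0;
        while (offset < bytesToReadIn)
        {
            int read = sr.Read(buffer, offset, bytesToReadIn - offset);
            if (read == 0) break; // end of stream reached early
            offset += read;
        }
        return hasher.ComputeHash(buffer, 0, offset);
    }
}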
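For comparison, the same duplicate check can be sketched in Python. Unlike the C# version, this sketch hashes the whole file in chunks instead of only a leading prefix; that is a deliberate change, not part of the original answer. The `known_hashes` dict is a stand-in for an indexed binary(20) column, and both function names are made up for the example.

```python
import hashlib

def file_sha1(path, chunk_size=1 << 20):
    """Return the 20-byte SHA-1 digest of the whole file, read in chunks."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large images are never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()

def find_duplicate(path, known_hashes):
    """Return the previously stored path with identical bytes, or None.

    When no duplicate exists, the new file is recorded in `known_hashes`.
    """
    digest = file_sha1(path)
    if digest in known_hashes:
        return known_hashes[digest]
    known_hashes[digest] = path
    return None
```

This only catches byte-for-byte duplicates: the same picture saved again with different compression or metadata hashes differently.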
Searching for similar images is a difficult problem that is still the subject of much research, and it depends on how you define "similar". Some prominent methods for finding similar images are:
Check the metadata (EXIF or similar) tags in the image file for the creation date; similar images are often taken at similar times. This may not be the best approach for what you want.
Calculate the relative histogram of both images and compare the deltas in each color channel. This has the benefit of allowing an SQL query to be written and is invariant to image size: an image that has been converted to a thumbnail will usually still be found with this method.
Perform an image subtraction between the two images and see how close the result gets to pure black (all zeros). I don't know of a way to do this with a T-SQL query, and the code can get tricky with images that need to be resized first.
Calculate the contours of each image (through Sobel, Canny or other edge detectors), then subtract the two results to see how many of their contours overlap. Again, I don't think this can be handled in SQL.
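The histogram method from the list above can be illustrated with a small Python sketch. It assumes the image has already been decoded into a list of (r, g, b) tuples (for real files you would need an imaging library to do that); the bin count, the function names, and the choice of a summed absolute difference as the distance are all assumptions made for illustration.

```python
def relative_histogram(pixels, bins=4):
    """Return one normalised histogram per color channel.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    Dividing by the pixel count makes the result independent of image size.
    """
    hist = [[0.0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0..255 onto bin indices 0..bins-1.
            hist[channel][value * bins // 256] += 1
    total = len(pixels)
    return [[count / total for count in channel] for channel in hist]

def histogram_distance(a, b):
    """Sum of absolute per-bin deltas; 0.0 means identical histograms."""
    return sum(abs(x - y) for ca, cb in zip(a, b) for x, y in zip(ca, cb))
```

Two images would then be treated as similar when the distance falls below a threshold you choose by experiment. The per-bin values are plain numbers, so they could be stored in table columns and compared in an SQL query, as the answer suggests.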
This is a question about organising lots of images in a web project. Say you had the following two icons in a web project that represented, for example, a product selected or a product not selected:
What would you name them?
Seems a simple question, but I suspect naming images is something of an art.
For example:
star_active.png and star_inactive.png: Seems fair enough, but what if you want to replace the star with, say, a circle at a later date? Then the name is misleading, so you would have to rename it and then update all your CSS etc.
product_selected.png and product_unselected.png: Great when used for the specific action of selecting a product, but what if I wanted to use the same image for a different purpose? Then the name is confusing and too specific.
Should the image size be part of the image name? e.g. someImage_16.png
What is the best naming convention you have found for naming images?
You're asking for a naming convention that predicts future attributes and applications of the file so that you never have to update the file name. That is impossible. You have to rely on your own intuition when you initially name the files.
There is no way around it. If you end up changing either a file or its application so drastically that the filename no longer accurately reflects its use, then you will either need to keep the misleading name or replace it throughout your files.
Most decent text editors should be able to easily do the latter across multiple files.
The only alternative is to assign names which are not descriptive from the start, which is obviously not a good idea.
Listen to Kobi and look into sprites, or if you're averse to sprites, do it the way Arvin said for the reasons given.
I am designing a website which will involve a large number of photos.
There are two modules, Restaurants and Dishes. Which is the best way to create the directory structure?
images/Restaurants/ID
images/Dishes/ID
I am using the following to create the filename:
function imgName($imgExtension)
{
return time() . substr(md5(microtime()), 0, 12) . ".".$imgExtension;
}
I want two different-sized thumbnails. Which is the best way to name the thumbnails,
since the DB will hold only the main picture's filename with extension?
I wouldn't worry too much about the directory structure; what you have seems good. Even better might be to use S3 buckets.
As for the filename part, I've found the simplest way is to prepend thumb_ to the thumb filename. So: somefilename.jpg -> thumb_somefilename.jpg
This way you can store somefilename.jpg in the database and simply add the extra part to the front when you want the thumb.
Is there any reason for randomising the filenames like this? It's my personal opinion that you shouldn't be giving them random names in the first place, they should relate to what's actually in the picture - simply because it's nicer for users and it's more meaningful to search engines.
In a perfect world you'd have a logical structure like
/images/dishes/moules-de-mariniere.jpg
where the dish / restaurant name is a unique slug. That's pretty much impossible to implement in the real world, though, so
/images/dishes/id/moules-de-mariniere.jpg
is a fair compromise to avoid collisions.
Thumbs I generally put under their own thumbs/ directory so I can use the same filenames in all locations (laziness more than anything):
/images/dishes/id/thumbs/moules-de-mariniere.jpg
but thenduks' prepending suggestion works too, it's really just personal preference.
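Both thumbnail conventions, plus the id/slug layout, can be sketched together. This is a rough Python illustration only: the helper names are invented, and the slug rule (lower-case ASCII letters and digits joined by hyphens) is one common choice, not something the answers prescribe.

```python
import posixpath
import re
import unicodedata

def slugify(name):
    """Lower-case ASCII slug: "Moules Marinière" -> "moules-mariniere"."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

def image_path(kind, record_id, name, ext="jpg"):
    """Build /images/<kind>/<id>/<slug>.<ext>; the id keeps paths unique."""
    return posixpath.join("/images", kind, str(record_id), f"{slugify(name)}.{ext}")

def thumb_in_subdir(path):
    """Same filename inside a thumbs/ directory next to the main image."""
    folder, filename = posixpath.split(path)
    return posixpath.join(folder, "thumbs", filename)

def thumb_with_prefix(path, prefix="thumb_"):
    """Same directory, with a prefix added to the filename."""
    folder, filename = posixpath.split(path)
    return posixpath.join(folder, prefix + filename)
```

Either way the database only needs the main picture's path. For two thumbnail sizes, two prefixes or two subdirectories (for example hypothetical `thumb_small_` and `thumb_large_`) would follow the same pattern.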
I would like to create an image uploading service (yes, I am aware of imageshack, photobucket, flickr, etc.) :)
I have only seen imageshack show the directory names ("img294", "1646") where the image is located, and I would like to do the same.
http://img294.imageshack.us/img294/1646/jquerykd5.jpg
1) Are there any security issues I should be aware of if I use this implementation?
2) How do these sites come up with short unique identifiers ("kd5")?
Thanks all for any advice and help.
Well, for starters, unless you would like the directory listings to be public, put dummy index.html files in those directories or simply restrict public access to them.
As for the unique identifiers there are many ways of going about this... some of my favourite chunks of information to use:
UNIX time (if running a unix based server)
chunks of the md5 of the file
pseudo random numbers
piece of the original filename
With these and many other pieces of information at your fingertips, it should be easy to prevent image names from conflicting on your server: gather as many as you like and concatenate them into a string for the filename. The md5 of the file can also be stored in a database to help detect duplicate images, which could save you disk space.
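One way to get a short suffix such as "kd5" is to encode a number (for example a database row id) in base 36 and append it to the original filename. Whether imageshack does anything like this is not known; the sketch below is just one plausible approach in Python, with hypothetical function names. It also shows the md5-of-the-file idea for spotting duplicates.

```python
import hashlib
import os
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9 then a-z

def short_id(number):
    """Encode a non-negative integer in base 36, e.g. 26393 -> "kd5"."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def stored_name(original_name, row_id):
    """Append the short id to the original name, keeping the extension."""
    stem, ext = os.path.splitext(original_name)
    return stem + short_id(row_id) + ext

def content_md5(data):
    """Hex md5 of the file bytes, usable as a duplicate-detection key."""
    return hashlib.md5(data).hexdigest()
```

Keep in mind that ids derived from a sequential counter are short but easy to guess; if the names must be hard to guess, encode a random number instead.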
I can promise you they all use URL rewriting. This will help with security issues, too.