My question is not about removing duplicated/similar images. I need a tool to handle a more complex process:
Find pictures I have manually removed in a folder
Apply this removal in another folder
Replace low-resolution pictures in a folder with high-resolution ones from another folder
I use Linux, but please propose solutions compatible with several OS if possible. I would also appreciate Free/Libre/OpenSource tools.
The three examples below explain the requirements.
-1- Basic example: I have an old copy of my SD card on my computer (where I have already removed failed pictures) and I want these failed pictures (the worst ones) to be automatically removed from my camera's SD card.
Folder "My-Computer"   Folder "SD-Card"   ACTION
I23001.JPG             I23001.JPG         keep duplicate
I23002.JPG             I23002.JPG         keep duplicate
-                      I23003.JPG         remove missing
-                      I23004.JPG         remove missing
I23005.JPG             I23005.JPG         keep duplicate
-                      I23006.JPG         remove missing
-                      I23007.JPG         remove missing
I23008.JPG             I23008.JPG         keep duplicate
-                      I23009.JPG         copy new picture
-                      I23010.JPG         copy new picture
-                      I23011.JPG         copy new picture
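The basic example can be sketched in a few lines of Python. This is only a minimal sketch under one loud assumption that is not stated in the question: file names sort in shooting order, so any SD-card file named after the newest file on the computer is treated as new, and any other SD-card file missing from the computer is treated as manually removed. The function name `plan_sync` is made up for illustration.

```python
def plan_sync(computer: set[str], sd_card: set[str]) -> dict[str, str]:
    """Return an action for every file currently on the SD card."""
    # Assumption: names sort in shooting order, so anything named after the
    # newest file on the computer was taken after the last copy was made.
    newest_copied = max(computer, default="")
    actions = {}
    for name in sorted(sd_card):
        if name in computer:
            actions[name] = "keep duplicate"
        elif name > newest_copied:
            actions[name] = "copy new picture"
        else:
            # Older than the newest copied file but absent on the computer:
            # it was removed there by hand.
            actions[name] = "remove missing"
    return actions


# File lists taken from the basic example.
computer = {f"I230{n:02d}.JPG" for n in (1, 2, 5, 8)}
sd_card = {f"I230{n:02d}.JPG" for n in range(1, 12)}
plan = plan_sync(computer, sd_card)
for name, action in plan.items():
    print(name, action)
```

The heuristic breaks as soon as the newest pictures on the computer were themselves removed, which is one reason the more complex examples need more than file names.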
In real life, pictures are also copied on mobile phones, web gallery, cloud, backup... and failed pictures may also be removed on different devices...
-2- More complex example: I take pictures using a camera/smartphone/tablet. I also manually remove failed pictures (the worst ones) on the computer/camera/smartphone/tablet. I want the best pictures to be copied to all devices to view/show them.
"My-Computer" "SD-Card" "Smartphone" ACTION
I23001.JPG I23001.JPG I23001.JPG keep duplicate
I23002.JPG I23002.JPG ask user
I23003.JPG I23003.JPG ask user
I23004.JPG remove missing
I23005.JPG I23005.JPG I23005.JPG keep duplicate
P89001.JPG P89001.JPG keep duplicate
P89002.JPG P89002.JPG keep duplicate
P89003.JPG remove missing
P89004.JPG P89004.JPG keep duplicate
I23006.JPG I23006.JPG remove missing
I23007.JPG I23007.JPG remove missing
I23008.JPG I23008.JPG I23008.JPG keep duplicate
I23009.JPG copy new picture
I23010.JPG copy new picture
I23011.JPG copy new picture
P89005.JPG P89005.JPG keep duplicate
P89006.JPG copy new picture
P89007.JPG copy new picture
P89008.JPG copy new picture
-3- Very complex: I copy pictures from my camera to my smartphone using the camera's Wi-Fi access point, but the pictures are downsized (similar image content, but not an exact duplicate file). I also copy pictures to my friend's smartphone. We also take photos with our smartphones and copy some of them (the best ones) to the other smartphone. And we do the same with tablets. Manual removal is done on any device.
The example is too messy to be displayed here!
Analysis
Lists of duplicate finders:
Duplicate file finders on Wikipedia
Search word "duplicate" in image viewer comparison on Wikipedia
Interesting tools:
findimagedupes from Jonathan H N Chin, a Perl script (and C lib) that stores image fingerprints in a Berkeley DB file and prints together the filenames of images matching more than xx% similarity (pictures taken in burst mode may be flagged as similar)
findimagedupes version in Go
gThumb can also find/remove duplicates
Geeqie
imgSeek
digiKam and its Find Duplicate Images Tool
Visipics
dupeGuru Picture Edition
Tools lacking similar-image recognition:
fslint
duff
fdupes
rmlint
Coding a new tool
As I have not (yet) found any solution, I was thinking of developing new software:
Modify a command-line tool like findimagedupes so that it outputs the matching distance between images (similarity percentage)
This output may be a graph
each file is a node
edges (relations between files):
content matching (similarity percentage, crop, similar region)
in same folder, in a neighborhood folder
similar filename, successive filename numbering
similar date/time
similar metadata
similar resolution
For each group of content-similar nodes
one folder = one column
one file per row, if there are duplicates in the same folder (e.g. burst mode)
missing file in one folder = blank
similarity of files is shown between horizontal/vertical neighbors only
Automatic selection of:
files to be replaced (low resolution replaced by high resolution, except in folder called "small")
files to be removed
User can display pictures and check connection properties
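The "one folder = one column, missing file = blank" layout could look roughly like the sketch below. It simplifies heavily: identical file names stand in for the content-similarity groups the real tool would compute, and the folder contents are made-up sample data, not the tables from the examples. The name `build_grid` is hypothetical.

```python
def build_grid(folders: dict[str, set[str]]) -> list[list[str]]:
    """One column per folder, one row per file name, "" where a file is missing."""
    columns = list(folders)
    # Assumption: an identical file name means "same picture". A real tool
    # would group by image similarity instead.
    all_names = sorted(set().union(*folders.values()))
    rows = [
        [name if name in folders[column] else "" for column in columns]
        for name in all_names
    ]
    return [columns] + rows


# Made-up sample data for illustration only.
folders = {
    "My-Computer": {"I23001.JPG", "I23005.JPG"},
    "SD-Card": {"I23001.JPG", "I23004.JPG", "I23005.JPG"},
    "Smartphone": {"I23001.JPG", "P89001.JPG"},
}
grid = build_grid(folders)
for row in grid:
    print("  ".join(cell.ljust(12) for cell in row))
```

Blank cells are exactly the places where the tool has to decide between "remove missing", "copy new picture" and "ask user".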
Related
I noticed that images are sometimes sliced up in PDFs.
Steps:
insert an image with a high resolution (3000x1800) into a .docx
use the "Microsoft Print to PDF" option of Word to convert it to PDF
extract all images with pdfimages or pymupdf
Result:
Image is sliced horizontally into three images
Questions:
What exactly happens in the transition from .docx to PDF (or in general in the conversion to PDF) that makes the converter slice the image into three images instead of one?
Do the individual XObjects of the sliced images contain information that says these three images originally belonged to one image?
How do I know how the images are sliced (horizontally / vertically)? And what if two images were originally inserted into the .docx file and both of them are sliced: can you tell whether slice x belongs to original image y or z?
So, as you have found out: because the code which generates the PDF chose to do so.
The technical reasons may be various. It could be that historically there were printers which had only so much memory and needed to receive limited-size images when printing, and someone, at some point when writing the PDF export code present in Microsoft Office, chose to apply this limit.
Anyway, technically, as put in the comments, an image in a PDF file could be composed of unlimited smaller images collated together.
Now, the second part, and your actual question: to know whether images in a PDF file belong together in a single original image, one would need a custom extractor tool that checks the geometry of all images in the document and finds out which images sit next to each other with no margin or gap between them. That would not be hard to do for well-behaved files (and we can't know whether files generated by MS Office are well behaved: there are ways to obfuscate image positioning by making it indirect). The metadata in the image parts may or may not contain information that would allow one to recompose the original image; it is up to the code generating the PDF to include that metadata or not. But the geometry can't lie in this case: if the final document presents a single image visually, it is possible to detect that when fetching the images.
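As a rough illustration of the geometry check described above, here is a minimal Python sketch. It assumes you already have the placement rectangle of every image on a page as `(x0, y0, x1, y1)` (obtaining them from a PDF library is left out), and it groups rectangles that share an edge. The function names, the tolerance value and the sample coordinates are all made up for illustration.

```python
Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page


def touches(a: Rect, b: Rect, tol: float = 0.5) -> bool:
    """True if a and b share an edge (within tol) and overlap along that edge."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    x_overlap = min(ax1, bx1) - max(ax0, bx0) > tol
    y_overlap = min(ay1, by1) - max(ay0, by0) > tol
    stacked = x_overlap and (abs(ay1 - by0) <= tol or abs(by1 - ay0) <= tol)
    side_by_side = y_overlap and (abs(ax1 - bx0) <= tol or abs(bx1 - ax0) <= tol)
    return stacked or side_by_side


def group_slices(rects: list[Rect], tol: float = 0.5) -> list[list[int]]:
    """Group rectangle indices that are connected through shared edges."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j], tol):
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical page: one picture cut into three stacked strips, plus a
# second, separate picture further down.
rects = [
    (72, 100, 372, 160),
    (72, 160, 372, 220),
    (72, 220, 372, 280),
    (72, 400, 372, 500),
]
print(group_slices(rects))
```

Each group is a candidate for "one original image". Two different pictures placed flush against each other would be merged too, so this only answers the question for well-behaved layouts.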
I'm creating a multipage publication with many ads that haven't been built yet. I know their size, and filename, but the image/pdf doesn't exist yet.
Is there an existing script or a possible way to link an image that doesn't exist? Another way to look at this would be kind of the reverse of how the missing links (relink) button works. Where I know what the file path will be, but the file is missing.
Publishing industry standard practice:
Rather than a script, just create a blank image at the exact size. Make it fluorescent magenta with the letters "FPO" huge and dead-centre so no one can mistake it for the real thing.
Importantly: give this FPO image the exact file name of the file which will eventually be used/placed.
When your production image is finalized and approved, cut-and-paste the exact FPO file name into the new file. Drop the production file into your working directory overwriting the FPO file, and refresh it in InDesign. Bob's your uncle.
If this is being done to hundreds of images, you can develop your own batch process to handle this with some time-saving automation. However, this is a good example of an issue that can be solved at the production-management level, rather than at the coding level.
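If you do go the batch route, something like the following Python sketch could generate the placeholder files. It is only a sketch under stated assumptions: it uses nothing but the standard library, so it writes plain solid-magenta PNGs and does not draw the "FPO" lettering (that would need an imaging library), it only makes sense when the final ads are PNG files, and sizes are given in pixels. The function names and the sample file name are hypothetical.

```python
import struct
import tempfile
import zlib
from pathlib import Path


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC of tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def solid_png(width: int, height: int, rgb=(255, 0, 255)) -> bytes:
    """Return the bytes of a solid-colour 8-bit RGB PNG (magenta by default)."""
    row = b"\x00" + bytes(rgb) * width  # filter byte 0, then the pixels
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(row * height))
        + _chunk(b"IEND", b"")
    )


def write_placeholders(specs: dict[str, tuple[int, int]], folder: Path) -> list[Path]:
    """Create a placeholder for every listed file that does not exist yet."""
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for filename, (width, height) in specs.items():
        target = folder / filename
        if target.exists():
            continue  # never overwrite a production file that has arrived
        target.write_bytes(solid_png(width, height))
        created.append(target)
    return created


# Demo in a throwaway directory; the ad name and size are invented.
with tempfile.TemporaryDirectory() as tmp:
    made = write_placeholders({"ad_page03_half.png": (600, 400)}, Path(tmp) / "links")
    print([p.name for p in made])
```

Because existing files are skipped, the script can be rerun safely as production files replace the placeholders.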
Hoping this helps!
I am creating a VB6 application, and most of my command buttons use the graphical style. Do the background images still show up even if I remove them from the app folder?
This is part of what goes into .FRX, .CTX, etc. files. Those are resource files created in a private "property bag" type format and are used to hold things like binary data, images, long strings, and so on.
But don't discard your source files, because you may need them down the road. Treat such things as valuable parts of the program source. They are not needed at run time though.
As far as I know, deleting the picture from the app folder doesn't remove it from the command button. I suggest making a copy of your image, then deleting the original to see if it still works; if it doesn't, you have the backup image. Good luck.
So I've been working on this simple program to look up specs for drawings. One of the specs is a thumbnail image of the drawing. I didn't want to try managing what could be 1000's of images (that could be changed at anytime) in the database so I created a true/false field instead. If true for that record, then the corresponding thumbnail is displayed in a picture box.
It works beautifully.... if I type the entire path in the code. But what if I give the program to a friend? So I've been trying to find a way that the program would find the images to no avail. I've been searching for answers all day and am finally at this point.... asking for help.
If CBool(db.getField("PDF")) = True Then
'pbxPDFThumb.ImageLocation = ("C:\Users\Reed Havoc\Desktop\RCM Database\RCM DataBase 2015 8g\RCM Plans Tool\RCM Plans Tool\Thumbnails\PDFs\" & db.getField("Plan_Num") & "pdf.jpg")
pbxPDFThumb.ImageLocation = ("\Thumbnails\PDFs\" & db.getField("Plan_Num") & "pdf.jpg")
End If
The first line is the full path, commented out so that I could try the second, shortened path.
I've been adding my image thumbnail directory to the project in all the different directories that VS establishes on start-up but none seems to be a default so that my added image directory will be recognized.
The question is not entirely clear, so I'll give two answers.
If drawings are not going to be added and you only use the ones you already have, simply store all of the pictures in the project's resource file, which is accessible through code and is built into the EXE.
If the users add drawings themselves, there are two options. There's the more user-friendly method: save the files in a local AppData folder for the application.
Or, less user-friendly and ultimately more work for you: when the application starts, ask the user to identify a directory that does / will contain the images.
For software development one often needs images. But when I start working on an image, I very quickly end up with dozens of versions, like so:
Start with a nice large-scale image, let's say a photo from my camera (x.nef)
I do some adjustments to exposure and white balance, and convert it to x.jpg
I start to add some little stuff by copying in various pieces from two other images (a.jpg, b.jpg), resulting in a layered image x.pdn
Now I scale it to the required size and save it as x_small.jpg
By now I have 6 different image files floating around, and nobody but me knows the process behind them.
So the question is: How do you handle images in the development process?
Edit:
Thanks for all the great input. I combined various answers into my own personal best answer. But I accepted jiinx0r's answer because it really contained the idea I was missing: applying a naming convention for the kind of changes done.
You could just put your images under source control.
That would handle the revision history and notes. If you really need to keep all the transitional versions of the image around and don't want that in your project folder, most source control trees have a 'tools' area for that type of thing.
EDIT:
If what you're after is keeping track of the various sizes (thumbnails, etc), I would go with convention over configuration and implement a uniform file (or directory) naming system.
For instance, I would probably have separate folders for the 100px and 500px versions of the same image. Or maybe I would put them in the same folder with a special naming convention: logo-100.jpg and logo-500.jpg. Either way is probably fine; just make a decision and be sure to stay consistent throughout the project.
One last thought: some folks like to include a ton of metadata in the file name. To me it depends on the scope of your operation and your individual needs. I would personally default to a less-is-more approach. If you're thinking about investing in maintaining something like that (or creating a tool to do it for you), make sure it's actually a net gain of time and not just something for your OCD to fiddle with!
As developers, we do tend to make glaring mistakes in this area. I know I've been guilty a bunch of times.
file naming should be handled via a naming convention.
{name}-{mod type}-{size}-{version}-{create date}.png
{name}-final.png
e.g.
file-white_balance-800x600-v01-20090831.png
file-white_balance-800x600-v02-20090831.png
file-final.jpg
The real point is to create an agreed-on convention that people see the value in following (however simple/complex is necessary for your group). In my organization we do this for input/output data files, images, scripts, etc. (not necessarily the same convention for all, but they all follow something that was agreed upon).
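A convention like this is only useful if it is applied consistently, so it can help to build and check names in code. Below is a minimal Python sketch for the `{name}-{mod type}-{size}-{version}-{create date}` pattern above. It assumes that names and mod types contain no hyphens and that versions are written with two digits; the helper names `build_name` and `parse_name` are made up.

```python
import re
from datetime import date
from typing import Optional

# Assumption: name and mod type never contain "-", the field separator.
PATTERN = re.compile(
    r"(?P<name>[^-]+)-(?P<mod>[^-]+)-(?P<width>\d+)x(?P<height>\d+)"
    r"-v(?P<version>\d+)-(?P<created>\d{8})\.(?P<ext>\w+)$"
)


def build_name(name: str, mod: str, width: int, height: int,
               version: int, created: date, ext: str = "png") -> str:
    """Compose a file name that follows the convention."""
    return f"{name}-{mod}-{width}x{height}-v{version:02d}-{created:%Y%m%d}.{ext}"


def parse_name(filename: str) -> Optional[dict]:
    """Split a conforming file name into its fields, or return None."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


print(build_name("file", "white_balance", 800, 600, 1, date(2009, 8, 31)))
print(parse_name("file-white_balance-800x600-v02-20090831.png"))
print(parse_name("file-final.jpg"))  # the "final" form has no fields to parse
```

A check like `parse_name(...) is None` could run in a commit hook or build step to flag files that drift from the convention.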
Hope that helps.
I try hard to have only a single "source" image and then pour all the changes into a short Python script or some other piece of code so that I can recreate the effects and/or adjust them any time later.
The original image is saved either as PNG or TIFF (to avoid quality loss caused by lossy saving) and converted into the final type as the very last step. That's also the time when I do the scaling and other lossy operations.
We developed a downloadable game and a web game with a few hundred graphic assets, most of which were stored as psd files during development. We needed jpg and png versions for the release version of the game and lower-quality jpg and png versions for the web version.
We checked the originals into source control to handle versioning.
In order to remain flexible and able to alter the original without having to re-pack the image twice after each update, we had a Perl / ImageMagick script that would update the packed images automatically.
The file name remained the same, but the compressed images would go to different directories, depending which version of the game each image was packed for.
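The "update the packed images automatically" step mostly comes down to finding outputs that are missing or older than their source. Here is a minimal Python sketch of that check (the original script was Perl / ImageMagick; this is not that script). The directory names, the extensions and the function names are assumptions, and the actual conversion call is deliberately left out.

```python
import os
import tempfile
from pathlib import Path


def stale_targets(source_dir: Path, output_dirs: list[Path],
                  source_ext: str = ".psd", target_ext: str = ".jpg"):
    """List (source, target) pairs whose target is missing or out of date."""
    pairs = []
    for source in sorted(source_dir.glob(f"*{source_ext}")):
        for out_dir in output_dirs:
            # Same file name in every output directory, as described above.
            target = out_dir / (source.stem + target_ext)
            if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
                pairs.append((source, target))
    return pairs


def demo() -> list[str]:
    """Build a tiny throwaway tree and report which outputs need repacking."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, release, web = root / "src", root / "release", root / "web"
        for folder in (src, release, web):
            folder.mkdir()
        (src / "hero.psd").write_bytes(b"source")
        (release / "hero.jpg").write_bytes(b"packed")
        os.utime(src / "hero.psd", (1_000, 1_000))      # older source
        os.utime(release / "hero.jpg", (2_000, 2_000))  # newer packed copy
        todo = stale_targets(src, [release, web])
        return [f"{target.parent.name}/{target.name}" for _, target in todo]


print(demo())
```

Each returned pair would then be handed to whatever converter you use, with per-directory quality settings for the release and web versions.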
We typically have the image title and resolution appended together in the name.
myimage_800_600.png
This way all of the like images are grouped together in the folder view and you can easily select the size you want without having to wonder what "medium" means.
I agree that source control might be your best bet for this. However, conventional source control doesn't really fit images.
Have you looked at http://www.alienbrain.com ?
It's commercial but may be something that could help. I was also looking and saw something about Photoshop or Imageready having version control in it too. You could look into that.
I put all the bits and pieces together from the various answers, for a system that fits my needs:
Images go into source control. This includes images from intermediate steps.
If multiple images are needed based on one source image, but with different transformations, this can be integrated into automatic builds (scaling, compressing, tinting)
Based on a naming convention or folder structure, files can be categorized into: source (e.g. the original photo), intermediate (for the various processing steps), and base (an image that is actually used in the software, possibly after automatic processing as in step 2).
For the processing steps, a naming convention should ensure that the kind of processing can be recognised, and also the order of steps. So one would be able to move from the source image through the various processing steps to the final image.