I've been playing around with writing a scraper for DeviantArt.com. It saves a local copy of each new image and also creates a record for the image in a PostgreSQL DB. My problem: as new images come in, how do I know whether a new image corresponds to an image I've seen before? Dupes are fairly rare on DA, but this is an interesting problem in a more general sense.
Thoughts on ways to proceed?
Right now the PostgreSQL DB is populated as I scrape images, and it has a table that looks like this:
CREATE TABLE Image
(
id SERIAL PRIMARY KEY NOT NULL,
url varchar(5000) UNIQUE NOT NULL,
dateadded timestamp without time zone default (now() at time zone 'utc'),
width int,
height int
);
Where url is the link to the image as I scraped it from DA (ex: http://th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png), dateadded is the datetime the scraper found the image, and width & height are the image dimensions.
I currently don't store the image itself in the database, but I do keep a local mirror -- I take the url for the image and wget -r -nc the file. So for a url: http://th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png I keep a local copy at <somedir>/th05.deviantart.net/fs70/PRE/f/2014/222/2/3/sketch_dump_56_by_lilaira-d7uj8pe.png
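Going from a DB row back to the local file means reproducing the URL-to-path mapping that `wget -r -nc` uses. Here is a minimal sketch, assuming wget's default recursive layout `<somedir>/<host>/<path>` and URLs without query strings or escaped characters. `mirror_path` is a hypothetical helper name:

```python
import os
from urllib.parse import urlsplit


def mirror_path(base_dir, url):
    """Map an image URL to where `wget -r -nc` would have saved it.

    Assumes wget's default <base_dir>/<host>/<path> layout and a URL
    without a query string or percent-escaped characters.
    """
    parts = urlsplit(url)
    # Strip the leading "/" so os.path.join doesn't treat the path as absolute.
    return os.path.join(base_dir, parts.netloc, parts.path.lstrip("/"))
```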
Now, image recognition in the general case is quite hard. I want to be able to handle things like slight resizes, which I could account for by normalizing all stored images to a specific resolution and normalizing the query image to that same resolution at query time. I want to be able to handle changes of format (PNG vs. JPG, etc.), which I could do by reading each image file into a normalized format (ex: uncompressed RGB values for each pixel, though ideally some "slack" would be tolerated here).
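The normalization idea could look roughly like this. Shrink both images to the same small grid by area-averaging, then accept a match only if the average per-cell difference stays under a small tolerance (the "slack"). This is a pure-Python sketch. It assumes both images have already been decoded to grayscale rows of 0-255 values, for example with Pillow's `Image.convert("L")`. The function names, the 16x16 grid and the tolerance of 4 grey levels are illustrative assumptions, not tuned values.

```python
def downsample(gray, size=16):
    """Area-average a 2D grayscale image (list of rows of 0-255 values) to size x size."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0 = i * h // size
        y1 = max((i + 1) * h // size, y0 + 1)  # at least one source row per cell
        row = []
        for j in range(size):
            x0 = j * w // size
            x1 = max((j + 1) * w // size, x0 + 1)  # at least one source column per cell
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mean_abs_diff(a, b):
    """Average absolute difference between two equal-sized grids."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def looks_like_duplicate(gray_a, gray_b, size=16, tolerance=4.0):
    """True if the two images, normalized to size x size, differ by at most
    `tolerance` grey levels on average. A small tolerance limits false positives."""
    return mean_abs_diff(downsample(gray_a, size), downsample(gray_b, size)) <= tolerance
```

This squashes every image into a square, so you should compare the stored width/height ratio first. A cropped image will generally fail this test. That fits the stated preference for false negatives over false positives.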
Nice to haves (would be willing to give up for simplification/better accuracy):
I'd like to be able to handle cropping an image (ex: I've previously seen imageA, somebody takes imageA, crops it and uploads it as imageB, and I'd like to notice that as a duplicate).
I'd like to be able to handle watermarking an image with a logo.
I'd like to be able to handle cropping in the case where the new image to classify is a subimage of a previously seen image (i.e. I have imageA stored, somebody takes imageA and crops it, and I'd like to be able to map that cropped image to imageA).
Constraints/extra info:
I'm not at all interested in finding images that are different yet similar (ex: two distinct photos of the same Red Bus should be reported as two distinct images)
While I'm not entirely opposed to using metadata (ex: artist, image category, etc.), I'd like to keep this constrained to just the image data (EXIF data, resolution, RGB colour values) as much as possible.
An image that is sized down and appears in a new, larger image I wish to consider as different. Ex: I have imageA, I resize it to 50x50, and that 50x50 grid appears in a new image; I would not consider the new image "the same" as imageA (though I suppose by the criteria outlined previously I would consider imageA a duplicate of the new image).
It would be nice, but not required, to be able to detect "minor" revisions in the image (ex: a blanket change to the gamma value of an image, etc.).
Thoughts? Suggestions?
For my use case I'm far more concerned about false positives than false negatives, and as such a "fuzzy match" approach should err on the side of caution.
In case it matters I'm writing all of this in Python, though TBH I'm happy to use an alternate tech if it solves my problem elegantly/efficiently.
I would grab a small subimage somewhere away from the edges and cross-correlate it within the vicinity of its source location in your database images. You can resample it prior to cross-correlation to account for small resizes, and you can choose the size of the vicinity you match against to account for asymmetrical crops up to a certain percentage.
To avoid perfect fits on featureless regions (e.g. the sky), you could use local image variation as a selection criterion for the subimage location.
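Here is a rough pure-Python sketch of these two ideas. It picks the highest-variance patch away from the edges of the new image, so flat regions like sky are avoided. It then slides that patch over a candidate image within a small radius of the same location and scores each position with normalized cross-correlation (NCC). Images are assumed to be 2D lists of grayscale values. The helper names, the half-patch stride and the radius are assumptions for illustration, and resampling for small resizes is left out.

```python
import math


def patch(img, top, left, size):
    """size x size sub-image of a 2D grayscale list, starting at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def _flat(p):
    return [v for row in p for v in row]


def variance(p):
    vals = _flat(p)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized patches, in [-1, 1]."""
    va, vb = _flat(a), _flat(b)
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    num = sum((x - ma) * (y - mb) for x, y in zip(va, vb))
    den = math.sqrt(sum((x - ma) ** 2 for x in va) * sum((y - mb) ** 2 for y in vb))
    return num / den if den else 0.0  # a flat patch carries no signal


def pick_textured_patch(img, size, margin):
    """(top, left) of the highest-variance patch at least `margin` px from the edges."""
    h, w, step = len(img), len(img[0]), max(size // 2, 1)
    best = None
    for top in range(margin, h - margin - size + 1, step):
        for left in range(margin, w - margin - size + 1, step):
            v = variance(patch(img, top, left, size))
            if best is None or v > best[0]:
                best = (v, top, left)
    if best is None:
        raise ValueError("image too small for this patch size and margin")
    return best[1], best[2]


def best_match_near(query, candidate, top, left, radius):
    """Slide `query` over `candidate` within +/- radius px of (top, left).
    Returns (score, (y, x)) for the best NCC position, or (-1.0, None)."""
    size = len(query)
    h, w = len(candidate), len(candidate[0])
    best = (-1.0, None)
    for y in range(top - radius, top + radius + 1):
        for x in range(left - radius, left + radius + 1):
            if 0 <= y <= h - size and 0 <= x <= w - size:
                score = ncc(query, patch(candidate, y, x, size))
                if score > best[0]:
                    best = (score, (y, x))
    return best
```

A score close to 1.0 at some offset suggests the new image is a shifted crop of the candidate. The acceptance threshold needs tuning on real data. Erring on the high side keeps false positives down.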
This would still be quite slow, so it will be necessary to use a global image metric to first select candidate duplicates from the database (e.g. the color histograms mentioned by danf).
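The global pre-filter could be a coarse colour histogram stored per image and compared with histogram intersection. Intersection is one common similarity measure; the answer doesn't name a specific one. Here is a minimal sketch, assuming pixels are available as `(r, g, b)` tuples. `color_histogram`, `intersection`, `candidate_ids` and the 0.8 threshold are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of a non-empty iterable of (r, g, b) tuples,
    with `bins` levels per channel (bins must divide 256)."""
    if 256 % bins:
        raise ValueError("bins must divide 256")
    step = 256 // bins
    hist = [0.0] * bins ** 3
    n = 0
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + b // step] += 1
        n += 1
    return [c / n for c in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def candidate_ids(query_hist, stored, threshold=0.8):
    """IDs from `stored` (image id -> histogram) worth the expensive cross-correlation check."""
    return [img_id for img_id, h in stored.items() if intersection(query_hist, h) >= threshold]
```

The histogram is normalized by pixel count, so resizing barely affects it. That means it could be computed once at scrape time and stored alongside the `Image` row.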
I noticed that images are sometimes sliced up in PDFs.
Steps:
insert a high-resolution image (3000x1800) into a .docx
use Word's "Microsoft Print to PDF" option to convert it to PDF
extract all images with pdfimages or PyMuPDF
Result:
Image is sliced horizontally into three images
Questions:
What exactly happens in the transition from .docx to PDF (or, in general, in the process of generating a PDF) that makes the converter slice the image into three images instead of one?
Do the individual XObjects of the sliced images contain information saying that these three images originally belonged to one?
How do I know how the images are sliced (horizontally / vertically) and what if originally there were two images inserted into the .docx file and both of them are sliced. Can you tell if slice x belongs to original image y or z?
So, as you have found out: because the code that generates the PDF chose to do so.
The technical reasons may vary. It could be that historically there were printers with only so much memory, which needed to receive limited-size images when printing, and at some point someone writing the PDF export code in Microsoft Office chose to apply this limit.
Anyway, technically, as put in the comments, an image in a PDF file could be composed of unlimited smaller images collated together.
Now, the second part, and your actual question: to know whether images in a PDF file belong together in a single original image, one would need a custom extractor tool that checks the geometry of all images in the document and finds the ones that touch each other with no margin or gap between them. That would not be hard to do for well-behaved files (and we can't know whether MS Office generated files are well-behaved: there are ways to obfuscate image positioning by placing images indirectly). The metadata in the image parts may or may not contain information that would allow one to recompose the original image; it is up to the code generating the PDF whether to include this metadata. But the geometry can't lie in this case: if the final document visually presents a single image, it is possible to detect that when fetching the images.
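The geometry check can be sketched as grouping image rectangles whose edges touch exactly. Slices stacked vertically share the same left/right edges, and slices placed side by side share the same top/bottom edges. The sketch assumes you have already collected each placed image's bounding box on the page as `(x0, y0, x1, y1)`. PyMuPDF, for instance, reports a `bbox` per placed image via `Page.get_image_info()`, though that is worth verifying for your version. `abuts`, `group_slices` and the 0.5-point tolerance are hypothetical.

```python
def _close(a, b, eps):
    return abs(a - b) <= eps


def abuts(r, s, eps=0.5):
    """True if rectangles r and s share a full edge (stacked or side by side)."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = r, s
    same_cols = _close(ax0, bx0, eps) and _close(ax1, bx1, eps)
    same_rows = _close(ay0, by0, eps) and _close(ay1, by1, eps)
    stacked = same_cols and (_close(ay1, by0, eps) or _close(by1, ay0, eps))
    side_by_side = same_rows and (_close(ax1, bx0, eps) or _close(bx1, ax0, eps))
    return stacked or side_by_side


def group_slices(rects, eps=0.5):
    """Group indices of rectangles that are transitively connected by shared edges."""
    parent = list(range(len(rects)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if abuts(rects[i], rects[j], eps):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

This can't tell apart two separately inserted images that happen to be placed flush against each other. Geometry only tells you what the page looks like, not what was in the .docx.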
I am trying to modify the default I-beam cursor image. I'm using [[[NSCursor IBeamCursor] image] representations], passing each one through a CIFilter and adding it to a new image. However, the resulting cursor looks as though it is rendering the low-resolution images.
The High Resolution Guidelines say:
For custom cursors, you can pass a multirepresentation TIFF to the NSCursor class method initWithImage:hotSpot:.
So I would expect this to work. Additionally, if I get the -TIFFRepresentation of the original image and my modified image, and write them to disk, they both look like multi-page TIFF files with the same size images. What could I be doing wrong?
I have a somewhat-temporary solution: manually call -setSize: on each image representation, dividing the pixel height and width by the screen's scale factor. However, this technique doesn't seem like it will work ideally with multiple screens.
You're right on. I've been debugging this all day and I'm pretty sure I've got it nailed. I'm not doing exactly the same thing you are (my images are loaded from a file) but the end result is exactly the same.
The trick is to set the first representation of the multi-representation image to the non-retina size. If you are loading your cursors from an image file, you must take this extra step to adjust the size of the representations to match. It doesn't work 'out-of-the-box' as you would expect.
I've tested this on a machine with two monitors and dragging the window from the retina display to the non-retina display acts as it should, displaying the high/low resolution images for the cursor.
I had a similar problem a while ago: I had my cursor in a PDF, and it always drew as if it was a pixel image at 1:1 size, blown up. There's a solution to that in NSCursor: Using high-resolution cursors with cursor zoom (or retina).
Maybe someone can use that technique to solve this problem? My guess is that creating an image with the same size but a different CTM marks it as the same size but Retina, whereas what @jtbrandes is doing probably marks it as a different size and non-Retina, so you effectively lose the scale factor information. If you create an image with a CTM in the hints, maybe you can draw the filtered images into it and it will be detected correctly.
In our web application, the users need to review a large number of images. This is my current layout. 20 images will be displayed at a time, with a pagination bar above the thumbnails. Clicking a thumbnail will show the enlarged image to the left. The enlarged image will follow the scrollbar so it's always visible. Quite simple actually.
I was wondering what the best interface would be in this scenario:
One option is to implement an infinite scroll script which will lazy load thumbnails as the user scrolls. The thumbnails not visible will be removed from the DOM. But my concern with this approach is the number of changes in the DOM slowing down the page.
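However the lazy loading is triggered, "only keep nearby thumbnails in the DOM" comes down to a small calculation of which indices fall in or near the viewport. Here is a sketch of that arithmetic, assuming a grid of fixed-height rows. It is written in Python to match the other examples here and ports directly to JavaScript. `visible_range` and `overscan` are illustrative names.

```python
def visible_range(scroll_top, viewport_h, row_h, per_row, total, overscan=1):
    """Half-open index range [start, end) of thumbnails that should exist in the
    DOM for a fixed-row-height grid scrolled to `scroll_top` (all in pixels).
    `overscan` keeps that many extra rows above and below the viewport."""
    first_row = max(scroll_top // row_h - overscan, 0)
    last_row = (scroll_top + viewport_h) // row_h + overscan  # inclusive
    end = min((last_row + 1) * per_row, total)
    start = min(first_row * per_row, end)
    return start, end
```

Only nodes outside the returned range need to be removed. The number of DOM changes per scroll therefore tracks the rows entering or leaving the window, not the total number of images.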
Another option could be something like Google's Fastflip.
What do you think is the best approach for this application? Radical ideas welcomed.
I think the question you have to ask is: what action is the user supposed to take? What's the purpose of the site?
If "review images" entails rating every image, I'd rather go with a Fastflip approach where the focus is on the single image. A thumbnail gallery will distract from the desired action and might result in a smaller amount of pics rated/reviewed.
If the focus should rather be on comparing a given image against others, I'd say try the gallery approach, although I wouldn't implement an infinite scroll with thumbnails, because users can quickly get lost in the abundance of choices. I think standard pagination (whether static or ajaxified) would be better if you choose to go this route.
Just my 2c.
If you paginate thumbnails, you can pre-generate a single image containing all thumbnails for each page, then use an image map to handle mouseover text and clicking. This will reduce the number of HTTP requests and possibly lead to fewer bytes sent. The separation distance between images should be minimized for this to be most efficient. This would have some disadvantages.
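If the thumbnails sit on a regular grid, generating the image map for such a pre-built sheet is straightforward. Here is a sketch, assuming uniform thumbnail sizes, row-major order and a fixed gap in pixels. `image_map` and the `#img{i}` link targets are hypothetical, and building the sheet image itself (e.g. with Pillow) is not shown.

```python
from html import escape


def image_map(name, titles, thumb_w, thumb_h, per_row, gap=0):
    """HTML <map> whose rect <area>s (coords x1,y1,x2,y2) cover thumbnails laid out
    left-to-right, top-to-bottom in one pre-generated sprite image."""
    areas = []
    for i, title in enumerate(titles):
        row, col = divmod(i, per_row)
        x0 = col * (thumb_w + gap)
        y0 = row * (thumb_h + gap)
        coords = f"{x0},{y0},{x0 + thumb_w},{y0 + thumb_h}"
        areas.append(
            f'  <area shape="rect" coords="{coords}" href="#img{i}" title="{escape(title)}">'
        )
    return f'<map name="{escape(name)}">\n' + "\n".join(areas) + "\n</map>"
```

Keeping `gap` small keeps the sheet compact, which matches the point about minimizing the separation between images.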
To reduce image download size at the expense of preprocessing, you can try to save each image in the format (PNG or JPG) most efficient for its contents using an algorithm like the one in ImageGuide. Similarly, if the images are poorly compressed (like JPEGs from a cell phone camera), they can be recompressed at the cost of some quality.
Once the site has some testers, you can analyze patterns in which images tend to be clicked (if a pattern exists) and preload the full-size images, or even pre-load all of them once the thumbs are loaded.
You might play with JPEG2000 images (you did say "radical ideas welcomed"), which thumb very easily, because the thumbnail and main image needn't be sent as if they are separate files. This is an advantage of the compression format -- it isn't the same as the hack of telling the browser to resize the full size image to represent its own thumbnail.
You can take a look at Google's WebP image format.
On the server side, consider a separate image server optimized for static content delivery, perhaps using nginx or the Tux web server.
I would show the thumbnails, since the user might want to skip some of the pictures. I would also stay away from pagination of the form
<<first <previous n of x next> last>>
and go for something easier to implement and more efficient: a
load x more pictures
link. No infinite scroll whatsoever, and why not even no scroll at all: just "load x more" and "previous x".
Although this answer might be a bit unradical and boring, I'd go with exactly your suggestion of asynchronously loading the thumbnails (and of course the main picture) when they come into view. A similar technique is used by Google+ in the pane for adding people to circles. This way, you spend server resources and bandwidth only on the pictures the client actually needs. As Google+ shows, operations on the DOM tree are fast enough and don't slow down any reasonably recent computer.
You might also prebuild a few rows of the thumbnail table ahead of time with a placeholder image (e.g. an animated "loading" GIF) and swap in the real image later. That way, the table in view is already built and doesn't need to be re-rendered, as the flow elements following the table would otherwise be if no images were present during scrolling.
Instead of paginating the thumbnails (as suggested by your layout scheme), you could also think about letting users filter the images by tag, theme, category, size or any other way to find their results faster.
Is there any way in Crystal Reports (v11, if this matters) to prevent images from stretching to fill the whole OLE object? I'm loading the images dynamically from a database and don't know their aspect ratio.
Thanks for the help.
I found the answer here:
To make the image resize properly, you must perform the following steps in order:
Set the image's EnableCanGrow to true
Calculate and set Width and Height to the needed size
Set the image's EnableCanGrow to false
Fill the DataSet's image object with data
Continue with normal report processing.
If you get these items in the wrong order, or skip an item, you will find that Crystal Reports scales the image in unexpected and unrecoverable ways.
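Step 2 ("Calculate and set Width and Height") is where the aspect ratio gets preserved. Here is a sketch of that calculation, assuming you know the image's pixel size and the box it should fit in. It only computes numbers and doesn't touch the Crystal Reports API. `fit_within` is a hypothetical helper, and the output is in whatever units the box is given in.

```python
def fit_within(img_w, img_h, box_w, box_h):
    """Largest (w, h) with the image's aspect ratio that fits inside box_w x box_h,
    plus the (x, y) offset that centres it in the box."""
    scale = min(box_w / img_w, box_h / img_h)
    w, h = round(img_w * scale), round(img_h * scale)
    return w, h, (box_w - w) // 2, (box_h - h) // 2
```

For example, a 1600x1200 landscape photo in a 400x400 box becomes 400x300, offset 50 units down. The offsets amount to the same thing as padding the picture with black space.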
The Crystal OLE object, which shows pictures from files, can only be set programmatically so for a 'pull' type report, where you are supplying a dynamic image name, of either portrait or landscape orientation, at least one of those orientations will get squashed to fit. It is better, IMHO, to show thumbnails and then have a calculated hyperlink to show the real picture in some decent viewer. You will spend an unreasonable amount of time trying to get the OLE object that shows pictures to understand that your image has a different aspect ratio. As long as it is at least reasonably legible that may have to suffice.
I don't think Crystal can help you. Try looking for some kind of command-line app on the internet that can automatically resize pictures (add black borders, scale down while keeping the aspect ratio, etc.). I'm sure they're out there.
We would like to display very large (50 MB plus) images in Internet Explorer. We would like to avoid compression, as compression algorithms are not what CSI would have us believe they are, and the resulting files are too lossy.
As a result, we have come up with two options: Silverlight Deep Zoom or a Flash based solution (such as Zoomify). The issue is that both of these require conversion to a tiled output and/or conversion to a specific file type (Zoomify supports a single proprietary file type, PFF).
What we are wondering is if a solution exists which will allow us to view the image without a conversion before hand.
PS: I know that you can write an application to tile the images (as needed or after the load process) and output them; however, we would like to do this without chopping up the file.
The tiled approach really is the right way to do it.
Your users don't want to download a 50 MB file before they can start viewing the image. You don't want to spend the bandwidth to serve 50 MB to every user who might only view a fraction of your image.
If you serve the whole file, users will eventually be able to load and view it, but it won't run smoothly for most of them.
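Some numbers help show why tiling works. A zoomable viewer pre-builds a pyramid in which each level halves the one above, cuts every level into fixed-size tiles, and only requests the tiles under the current viewport. The sketch below does that arithmetic. It is similar in spirit to the pyramids Deep Zoom and Zoomify generate, but their exact conventions (tile overlap, file naming, level numbering) differ and aren't modelled here. The function names and the 256 px tile size are assumptions.

```python
def pyramid_levels(width, height, tile_size=256):
    """(level, width, height, tile_cols, tile_rows) for each level of a halving
    image pyramid: level 0 is 1x1 pixel, the last level is full resolution."""
    max_level = (max(width, height) - 1).bit_length()  # ceil(log2(longest side))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w, h = -(-width // scale), -(-height // scale)  # ceiling division
        levels.append((level, w, h, -(-w // tile_size), -(-h // tile_size)))
    return levels


def tiles_for_view(x, y, view_w, view_h, tile_size=256):
    """Tile column and row ranges covering a view_w x view_h viewport whose top-left
    corner is at pixel (x, y) of one level (not clamped to the level's size)."""
    cols = range(x // tile_size, (x + view_w - 1) // tile_size + 1)
    rows = range(y // tile_size, (y + view_h - 1) // tile_size + 1)
    return cols, rows
```

For example, a 1280x720 view at pixel (1000, 500) of the full-resolution level touches 24 tiles of 256 px, however large the whole image is.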
There is no simple non-tiled way to serve just a portion of an image unless you use a server-side library like ImageMagick or PIL to extract a specific subset of the image for each user. You probably don't want to do that, because it will place a significant load on your server.
Alternatively, you might use something like Google's map tool to provide zooming and scaling. Some comments on doing that are available here:
http://webtide.wordpress.com/2008/08/27/custom-google-maps/
Take a look at OpenSeadragon. To make an image work with OpenSeadragon, you first need to generate one of the zoomable image formats mentioned here, then follow the getting-started guide here.
The browser isn't going to smoothly load a 50 meg file; if you don't chop it up, there's no reasonable way to make it not lag.
If you don't want to tile, you could have the server open the file and render a screen-sized view of the image for display in the browser at the particular zoom level requested. This way you aren't sending 50 MB files across the line when someone only wants an overview of the image. That is, the browser requests a set of coordinates and an output size in pixels, the server opens the large image and creates a smaller image that fits the desired view, and sends that back to the web browser.
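The request-handling part can be sketched as a pure function. It clamps the requested region to the image and works out an output size with the same aspect ratio. The crop box comes back in the `(left, upper, right, lower)` order that Pillow's `Image.crop()` expects, so the server could then call `.crop(box).resize(size)` on the opened image. `view_request` and its parameters are hypothetical names. For many formats the server typically still has to decode the whole image before cropping, which is the server-load concern raised above.

```python
def view_request(img_w, img_h, x, y, region_w, region_h, out_w):
    """Clamp a requested region to the image and pick an output height that keeps
    the region's aspect ratio. Returns (crop_box, (out_w, out_h)), where crop_box
    is (left, upper, right, lower) in source-image pixels."""
    left = max(0, min(x, img_w))
    upper = max(0, min(y, img_h))
    right = max(left, min(x + region_w, img_w))
    lower = max(upper, min(y + region_h, img_h))
    if right == left or lower == upper:
        raise ValueError("requested region lies outside the image")
    out_h = max(1, round(out_w * (lower - upper) / (right - left)))
    return (left, upper, right, lower), (out_w, out_h)
```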
As for compression, you say it's too lossy, but if that's what you are seeing, you are probably using the wrong compression algorithm or setting for your type of image. The JPEG format has quality settings to control lossiness, and PNG compression is lossless (the pixels you get after decompressing are exactly the values you had before compression). So consider changing the compression you use, and don't just rely on the default settings of an image editor.
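The "lossless" claim for PNG is easy to check. PNG compresses its (filtered) pixel data with DEFLATE, the algorithm Python's standard `zlib` module implements, and decompression returns exactly the original bytes. Here is a minimal sketch of that round trip. It skips PNG's reversible per-row filtering and chunk structure, and `deflate_roundtrip` is a hypothetical name.

```python
import zlib


def deflate_roundtrip(raw: bytes, level: int = 9):
    """Compress `raw` with zlib/DEFLATE (the compression PNG uses for pixel data)
    and return (compressed_size, identical_after_decompress)."""
    packed = zlib.compress(raw, level)
    return len(packed), zlib.decompress(packed) == raw
```

For JPEG, the encoder usually exposes the quality setting directly, for example Pillow's `quality` keyword when saving a JPEG. Higher values buy fidelity at the cost of file size.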