I am starting a project just for fun that deals with simple vector images (e.g. placing small colored or textured triangles in specific locations and orientations). I want to TDD the project if I can after I get out of prototyping, but I don't really know how to TDD a vector graphic, or any graphic output for that matter. Right now I'm playing with RMagick / RVG, but I'm not married to it.
A sample of features that I would like to test:
a triangle has the correct background (this can come from a simple color, another vector graphic, or possibly (hopefully) a raster image)
a triangle has been rotated correctly (0-360 degrees)
a triangle has been placed correctly (either in the context of the image or perhaps just in a correct "slot").
I started to look at ways of testing "equality" of a vector image. The first way that I came up with was to rasterize the vector image and compare the resulting rasters. While this seldom produces an absolute positive/negative match, it does create a good metric for comparing the two images. See this gist for what I'm talking about.
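The gist isn't shown here, but the idea is roughly the following (sketched in Python with Pillow purely for illustration; the 5.0 tolerance is arbitrary):

from PIL import Image, ImageChops
import math

def rms_difference(path_a, path_b):
    # Root-mean-square pixel difference between two rasterized images:
    # 0.0 means identical, larger values mean more divergence.
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    diff = ImageChops.difference(a, b)
    histogram = diff.histogram()  # 256 bins per channel, concatenated
    squares = (count * (value % 256) ** 2 for value, count in enumerate(histogram))
    return math.sqrt(sum(squares) / (a.size[0] * a.size[1] * 3))

# Compare against a tolerance rather than demanding exact equality:
assert rms_difference("expected.png", "actual.png") < 5.0

The point is to assert "close enough" against a threshold rather than pixel-perfect equality.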
Do you think this is a sane way of going about this? I was also thinking of decoupling my Triangle class from the imaging library, so that I could just unit test attributes within the problem domain that translate into transformations. Has anyone else here TDDed image generation and could share some wisdom?
In an ideal world, you would have an interface (thinking C#) that you can then use to create a mock object. The mock would then be supplied to your triangle/object construction code. That code would not know whether it has a mock object or the real RVG; it should be designed to receive an object of the interface type, so that when you go into production, the code is the same for either the mock or the RVG.
The tests with the mock should set expectations about what you send it to create a triangle. The decoupling is the approach you want to take. Hope that makes some sense.
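A minimal sketch of such an expectation-based test in Python with unittest.mock (the Triangle class and the renderer's draw_polygon method are hypothetical stand-ins for the real RVG-backed implementation):

from unittest.mock import Mock

# Triangle delegates drawing to an injected renderer interface,
# so tests can substitute a mock for the real imaging library.
class Triangle:
    def __init__(self, renderer, points, fill):
        self.renderer = renderer
        self.points = points
        self.fill = fill

    def draw(self):
        self.renderer.draw_polygon(self.points, fill=self.fill)

def test_triangle_draws_with_expected_attributes():
    renderer = Mock()
    Triangle(renderer, [(0, 0), (50, 0), (25, 30)], fill="red").draw()
    renderer.draw_polygon.assert_called_once_with(
        [(0, 0), (50, 0), (25, 30)], fill="red"
    )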
If you can output SVG, you could use XPath to test specific properties while ignoring others. If you had something like
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="30">
<polygon class="triangle" points="0,0 50,0 25,30" fill="red"/>
</svg>
then you could test whether the triangle is red with an XPath expression like:
//svg:polygon[@class='triangle'][@fill='red']
How to evaluate the XPath depends on what XPath libraries are available for the language/platform/... you are using.
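For example, in Python with lxml (just one option; any XPath library that supports namespace bindings works the same way):

from lxml import etree

svg = etree.fromstring(b"""<svg xmlns="http://www.w3.org/2000/svg" width="50" height="30">
<polygon class="triangle" points="0,0 50,0 25,30" fill="red"/>
</svg>""")

# The svg: prefix must be bound to the SVG namespace explicitly.
matches = svg.xpath(
    "//svg:polygon[@class='triangle'][@fill='red']",
    namespaces={"svg": "http://www.w3.org/2000/svg"},
)
assert len(matches) == 1  # exactly one red triangle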
Testing more complex properties, e.g. asserting that two shapes don't intersect, is harder. This specific task could be solved using something like Paper.js, which has a getIntersections() feature.
I have some images where I wish to distinguish between the hood of the car and the rest of the objects by any means necessary.
Specifically, is there any way I can accurately segment the 'hood' of the car in all these images below? It contains some light reflections, so using basic filters becomes tricky.
To be clear: I do not have any labelled data, but I think it is possible to achieve this with simple filters only.
A few samples:
Is there any way I can somehow separate the hood of the car from the rest of the image?
A filter that converts the hood to, say, black is fine even if parts of the environment are oddly identified as black or white; the only requirement is that a demarcation between the hood and the surrounding road be present.
Any other way to accurately generalize and filter/segment/extract the hood based on other features is also welcome!
==> The real difficulty here is the reflective surface: I am well aware simple color-based filters might have worked, but the reflections wreak havoc with the simple thresholding-based methods I have tried! :)
There is no way to achieve this because the reflections that you see are legit image content, and the edge of the hood is not visible enough. Even the best semantic segmentation would fail.
I assume that the hood area is fixed with respect to the camera, so the simplest approach is to draw the outline by hand.
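A minimal sketch of that with OpenCV (the polygon coordinates are hypothetical; you would trace them once from a sample frame):

import cv2
import numpy as np

# Hood outline traced by hand once from a sample frame; since the camera
# is fixed, the same polygon applies to every image.
HOOD_POLYGON = np.array([[0, 720], [0, 560], [400, 470], [880, 470],
                         [1280, 560], [1280, 720]], dtype=np.int32)

frame = cv2.imread("dashcam_frame.jpg")
mask = np.zeros(frame.shape[:2], dtype=np.uint8)
cv2.fillPoly(mask, [HOOD_POLYGON], 255)

hood_only = cv2.bitwise_and(frame, frame, mask=mask)        # just the hood
rest_only = cv2.bitwise_and(frame, frame, mask=255 - mask)  # everything else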
I'm reading up on Direct2D before I migrate my GDI code to it, and I'm trying to figure out how paths work. I understand most of the work involved with geometries and geometry sinks, but there's one thing I don't understand: the D2D1_FIGURE_BEGIN type and its parameter to BeginFigure().
First, why is this value even needed? Why does a geometry need to know ahead of time whether it's filled or hollow? I don't know any other drawing API which cares about whether path objects are filled or not ahead of time; you just define the endpoints of the shapes and then call fill() or stroke() to draw your path, so how are geometries any different?
And if this parameter is necessary, how does choosing one value over the other affect the shapes I draw?
Finally, if I understand the usage of this enumeration correctly, you're supposed to only use filled paths with FillGeometry() and hollow paths with DrawGeometry(). However, the hourglass example here, cited by several method documentation pages (like the BeginFigure() one), creates a filled figure and draws it with both DrawGeometry() and FillGeometry()! Is this undefined behavior? Does it have anything to do with the blue border around the gradient in the example picture, which I don't see anywhere in the code?
Thanks.
EDIT: Okay, I think I understand what's going on with the gradient's weird outline: the gradient is also transitioning alpha values, and the fill overlaps the stroke because the stroke is centered on the line and the fill is drawn after the stroke. That still doesn't explain why I can fill and stroke with a filled geometry, or what the difference between hollow and filled geometries is...
Also I just realized that hollow geometries are documented as not having bounds. Does this mean that hollow geometries are purely an optimization for stroke-only geometries and otherwise behave identically to a filled geometry?
If you want to better understand Direct2D's geometry system, I recommend studying the WPF geometry system. WPF, XPS, Direct2D, Silverlight, and the newer "XAML" frameworks all use the same building blocks (the same "language", if you will). I found it easier to understand the declarative object-oriented API in WPF, and after that it was a breeze to work with the imperative API in Direct2D. You can think of WPF's mutable geometry system as an implementation of the "builder" pattern from Java, where the build() method is behind the scenes (hidden from you) and spits out an immutable Direct2D geometry when it comes time to render things on-screen. (WPF uses something called "MIL", which, IIRC/AFAICT, Direct2D was forked from; they really are the same thing!) It is also straightforward to write code that converts between the two representations, e.g. walking a WPF PathGeometry and streaming it into a Direct2D geometry sink, and you can also use ID2D1PathGeometry::Stream and a custom ID2D1GeometrySink implementation to reconstitute a WPF PathGeometry.
(BTW this is not theoretical :) It's exactly what I do in Paint.NET 4.0+: I use a WPF-esque declarative, mutable object model that spits out immutable Direct2D geometries at render time. It works really well!)
Okay, anyway, to get directly to your specific question: BeginFigure() and D2D1_FIGURE_BEGIN map directly to the PathFigure.IsFilled property in WPF. In order to get an intuitive understanding of what effect this has, you can use something like KaXAML to play around with some geometries from WPF or Silverlight samples and see what the results look like. And the documentation is definitely better for WPF and Silverlight than for Direct2D.
Another key concept is that DrawGeometry is basically a helper method. You can accomplish the same thing by first widening your geometry with ID2D1Geometry::Widen and then using FillGeometry ("widening" seems like a misnomer to me, btw: in Photoshop or Illustrator you'd probably use a verb like "stroke"). That's not to say that either one always performs better/worse ... be sure to benchmark. I've seen it go both ways. The reason you can think of this as a helper method is dependent on the fact that the lowest level of the rasterization engine can only do one thing: fill a triangle. All other drawing "primitives" must be converted to triangle lists or strips (this is also why ID2D1Mesh is so fast: it bypasses all sorts of processing code!). Filling a geometry requires tessellation of its interior to a list of triangle strips which can then be filled by Direct3D. "Drawing" a geometry requires applying a stroke (width and/or style): even a simple 1-pixel wide straight line must be first converted to 2 filled triangles.
Oh, also, if you want to compute the "real" bounds of a geometry with hollow figures, use ID2D1Geometry::GetWidenedBounds with a strokeWidth of zero. This is a discrepancy between Direct2D and WPF that puzzles me. Geometry.Bounds (in WPF) is equivalent to ID2D1Geometry::GetWidenedBounds(0.0f).
I'm writing a raytracer, and I would like to code it using TDD, top-down approach.
I don't want to bore you with the details, so in a nutshell: the program will take a specified scene (for example, the coordinates of a sphere and its radius) and output an image with the 3D sphere rendered.
How in the world can I test-first such behaviour?
I know I can test-first some of the inner algorithms inside that .render() function, but I would like to go top-down, and I just can't anticipate the generated image in advance. I know I can test it for being all black or a specified size, but if I would like to be strict in using TDD, those tests wouldn't get me anywhere: "you shall not implement more than is required to pass the test".
So, any thoughts?
Do not try to automate testing the image. Your eyeballs are infinitely better at verifying images than any code you could possibly write. As a regression test you can capture the image output and do a pixel compare, but that will be a fragile regression test, not a TDD unit test.
Extract the rendering logic into separate testable classes. I assume there will be lots of calculations and algorithms you can TDD unless you can find an existing library (which is probably worth looking for). TDD is about design and right now it is telling you to separate out the rendering algorithm from the display.
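For example, ray-sphere intersection math is a natural first unit to test-drive (a Python sketch; the names and API are illustrative, not a prescribed design):

import math
from dataclasses import dataclass

# A minimal slice of the math layer: ray-sphere intersection is far
# easier to test than rendered pixels.
@dataclass
class Sphere:
    center: tuple
    radius: float

def ray_sphere_hit(origin, direction, sphere):
    # Solve |origin + t*direction - center|^2 = radius^2 for t,
    # returning the nearest hit in front of the ray, or None.
    oc = tuple(o - c for o, c in zip(origin, sphere.center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - sphere.radius ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    roots = ((-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a))
    for t in sorted(roots):
        if t > 1e-9:
            return t
    return None

def test_ray_through_center_hits_front_surface():
    sphere = Sphere(center=(0, 0, 5), radius=1.0)
    assert math.isclose(ray_sphere_hit((0, 0, 0), (0, 0, 1), sphere), 4.0)

def test_ray_pointing_away_misses():
    sphere = Sphere(center=(0, 0, 5), radius=1.0)
    assert ray_sphere_hit((0, 0, 0), (0, 0, -1), sphere) is None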
Obviously you can't anticipate the exact image for a given use case, or you wouldn't need the project. Not even the most rabid TDD advocate would demand that you write the complete test first. However, you may well write the load-image-and-compare-to-program-output part first and insert the actual image later, and I would actually recommend doing that. Starting out with a use of the proposed API is always a good idea, as it can expose issues that you would otherwise never think about until it's too late.
It is also possible that you already know, not the exact image, but certain properties of the image that you want to test. For instance, if you render the obligatory reflecting metal sphere, you may well know how bright (approximately) certain areas of the image have to be, or that certain parts have to be symmetrical, etc. It is a good idea to write those tests first. (I maintain a test suite for automatically generated diagrams, and they contain tests such as "all ten colours must be visible in this image", i.e. concentric circles should be drawn largest-first. Raytracing output is, of course, much more complicated, but the principle applies.)
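For instance (a Python/Pillow sketch; the thresholds and the lighting assumptions are invented for illustration):

from PIL import Image, ImageChops, ImageStat

def mean_brightness(img, box):
    return ImageStat.Stat(img.crop(box)).mean[0]

def test_sphere_render_properties():
    img = Image.open("render_output.png").convert("L")  # grayscale
    w, h = img.size
    # With the light above the sphere, the top half should render brighter.
    assert mean_brightness(img, (0, 0, w, h // 2)) > mean_brightness(img, (0, h // 2, w, h))
    # A scene lit straight-on should be roughly left/right symmetric.
    diff = ImageChops.difference(img, img.transpose(Image.Transpose.FLIP_LEFT_RIGHT))
    assert ImageStat.Stat(diff).mean[0] < 10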
I want to identify a ball in a picture. I am thinking of using the Sobel edge detection algorithm; with this I can detect the round objects in the image.
But how do I differentiate between different objects? For example, one picture contains a football, and another contains the moon. How do I tell which object has been detected?
When I use my algorithm, I get a ball in both cases. Any ideas?
Well, if all the objects you would like to detect are round, you could even use a Hough transform for circles. This is a very good way of finding round objects.
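With OpenCV, for instance, that looks roughly like this (the parameter values are starting points to tune, not recommendations; the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)  # smooth noise so spurious circles don't appear

circles = cv2.HoughCircles(
    img, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
    param1=100,   # Canny high threshold used internally
    param2=30,    # accumulator threshold: lower finds more (weaker) circles
    minRadius=10, maxRadius=200,
)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"circle at ({x}, {y}) with radius {r}")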
But your basic problem seems to be classification: sorting the objects in your image into different classes.
For this you don't really need a neural network; you could simply try a nearest-neighbor match. It behaves a bit like a neural network, in that you can give it several reference pictures, tell the system what can be seen in each, and it will settle on the best average values for each attribute you detected. This gives you a dictionary of clusters for the different types of objects.
But for this you'll of course first need something that distinguishes a ball from a moon.
Since they are all truly round objects (which appear as circles), it is useless to compare circularity, circumference, diameter, or area (those only help if your camera is steady and you know the moon will always appear at a different size than a ball in your images).
So basically you need to look inside the objects themselves, and you can try to compare their mean color value, grayscale value, or the contrast inside the object (the moon will mostly have mid-gray values, whereas a soccer ball consists of black and white parts).
You could also run edge filters on the segmented objects just to determine which is more "edgy" in its texture. But for this there are better methods I guess...
So basically what you need to do first:
Find several attributes that help you distinguish the different round objects (assuming they are already separated)
Implement something to get these values out of a picture of a round object (which is already segmented of course, so it has a background of 0)
Build a supervised learning system that you feed several images of each type along with their class (there are many implementations of that online)
Now you have your system running and can give other objects to it to classify.
For this you need to segment the objects in the image, e.g. by edge filters or a Hough transform
For each of the segmented objects in an image, let it run through your classification system and it should tell you which class (type of object) it belongs to; see the sketch below...
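A rough sketch of the attribute extraction plus a trivial nearest-neighbor match in Python/OpenCV (the reference numbers are invented; in practice they come from your labelled examples):

import cv2
import numpy as np

def features(segmented_bgr, mask):
    # Mean grayscale value and contrast (std dev) inside the object mask.
    gray = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2GRAY)
    pixels = gray[mask > 0]
    return np.array([pixels.mean(), pixels.std()])

# Reference clusters that would be learned from labelled examples:
REFERENCES = {
    "moon": np.array([128.0, 20.0]),  # mid-gray, low contrast
    "ball": np.array([127.0, 90.0]),  # black/white patches: high contrast
}

def classify(segmented_bgr, mask):
    f = features(segmented_bgr, mask)
    return min(REFERENCES, key=lambda label: np.linalg.norm(f - REFERENCES[label]))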
Hope that helps... if not, please keep asking...
When you apply an edge detection algorithm, you lose information.
Thus the moon and the ball look the same.
The moon has a different color, a different texture, ... you can use this information to differentiate which object has been detected.
That's a question in AI.
If you think about it, the reason you know it's a ball and not a moon, is because you've seen a lot of balls and moons in your life.
So, you need to teach the program what a ball is, and what a moon is. Give it some kind of dictionary or something.
The problem with a dictionary, of course, is that matching the object against all the objects in the dictionary would take time.
So the best solution would probably be using neural networks. I don't know what programming language you're using, but there are neural network implementations for most languages I've encountered.
You'll have to read a bit about it, decide what kind of neural network, and its architecture.
After you have it implemented it gets easy. You just give it a lot of pictures to learn (neural networks get a vector as input, so you can give it the whole picture).
For each picture you give it, you tell it what it is. So you give it, say, 20 different moon pictures and 20 different ball pictures. After that you tell it to learn (usually a built-in function).
The neural network will go over the data you gave it, and learn how to differentiate the 2 objects.
Later you can use the network you trained: give it a picture, and it will give you a score for what it thinks it is, like 30% ball, 85% moon.
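A toy sketch of that train-then-score loop in Python with scikit-learn (the random arrays stand in for real, flattened pictures):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-ins for 20 ball and 20 moon pictures, flattened to vectors
# (real code would load and resize actual images).
balls = rng.random((20, 64 * 64))
moons = rng.random((20, 64 * 64)) * 0.5  # pretend moons are darker
X = np.vstack([balls, moons])
y = ["ball"] * 20 + ["moon"] * 20

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
net.fit(X, y)

new_image = rng.random(64 * 64)
probabilities = net.predict_proba([new_image])[0]
print(dict(zip(net.classes_, probabilities)))  # e.g. {'ball': 0.3, 'moon': 0.7}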
This has been discussed before. Have a look at this question. More info here and here.
I am building a web application using .NET 3.5 (ASP.NET, SQL Server, C#, WCF, WF, etc) and I have run into a major design dilemma. This is a uni project btw, but it is 100% up to me what I develop.
I need to design a system whereby I can take an image and automatically crop a certain object within it, without user input. So for example, cut out the car in a picture of a road. I've given this a lot of thought, and I can't see any feasible method. I guess this thread is to discuss the issues and feasibility of achieving this goal. Eventually, I would get the dimensions of a car (or whatever it may be) and then pass these into a (custom) 3D modelling app as parameters, to render a 3D model. This last step is a lot more feasible. It's the cropping that's the real issue. I have thought of all sorts of ideas, like getting the colour of the car and then tracing the outline around that colour. So if the car (for example) is yellow, when there is a yellow pixel in the image, trace around it. But this would fail if there are two yellow cars in a photo.
Ideally, I would like the system to be completely automated. But I guess I can't have everything my way. Also, my skills are in what I mentioned above (.NET 3.5, SQL Server, AJAX, web design) as opposed to C++ but I would be open to any solution just to see the feasibility.
I also found this patent: US Patent 7034848 - System and method for automatically cropping graphical images
Thanks
This is one of the problems that needed to be solved to finish the DARPA Grand Challenge. Google video has a great presentation by the project lead from the winning team, where he talks about how they went about their solution, and how some of the other teams approached it. The relevant portion starts around 19:30 of the video, but it's a great talk, and the whole thing is worth a watch. Hopefully it gives you a good starting point for solving your problem.
What you are talking about is an open research problem, or even several research problems. One way to tackle this is by image segmentation. If you can safely assume that there is one object of interest in the image, you can try a figure-ground segmentation algorithm. There are many such algorithms, and none of them are perfect. They usually output a segmentation mask: a binary image where the figure is white and the background is black. You would then find the bounding box of the figure and use it to crop. The thing to remember is that none of the existing segmentation algorithms will give you what you want 100% of the time.
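As one concrete instance of that pipeline, here is a sketch using OpenCV's GrabCut figure-ground algorithm (initializing with a generous central rectangle is an assumption, and choosing that rectangle automatically is exactly the hard part):

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
rect = (w // 10, h // 10, w * 8 // 10, h * 8 // 10)  # assume a central subject

mask = np.zeros((h, w), np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked (probably) foreground form the segmentation mask.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
x, y, bw, bh = cv2.boundingRect(fg)  # bounding box of the figure
crop = img[y:y + bh, x:x + bw]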
Alternatively, if you know ahead of time what specific type of object you need to crop (car, person, motorcycle), then you can try an object detection algorithm. Once again, there are many, and none of them are perfect either. On the other hand, some of them may work better than segmentation if your object of interest is on very cluttered background.
To summarize, if you wish to pursue this, you would have to read a fair number of computer vision papers and try a fair number of different algorithms. You will also increase your chances of success if you constrain your problem domain as much as possible: for example, restrict yourself to a small number of object categories, assume there is only one object of interest in an image, or restrict yourself to a certain type of scene (nature, sea, etc.). Also keep in mind that even the accuracy of state-of-the-art approaches to this type of problem has a lot of room for improvement.
And by the way, the choice of language or platform for this project is by far the least difficult part.
A method often used for face detection in images is through the use of a Haar classifier cascade. A classifier cascade can be trained to detect any objects, not just faces, but the ability of the classifier is highly dependent on the quality of the training data.
This paper by Viola and Jones explains how it works and how it can be optimised.
Although it is C++, you might want to take a look at the image processing libraries provided by the OpenCV project, which include code to both train and use Haar cascades. You will need a set of car and non-car images to train a system!
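Once a cascade is trained, using it from OpenCV's Python bindings looks like this (cars_cascade.xml is hypothetical; you would produce it with OpenCV's training tools):

import cv2

cascade = cv2.CascadeClassifier("cars_cascade.xml")  # hypothetical trained cascade
gray = cv2.cvtColor(cv2.imread("road.jpg"), cv2.COLOR_BGR2GRAY)

detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in detections:
    print(f"candidate car at ({x}, {y}), size {w}x{h}")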
Some of the best attempts at this that I've seen use a large database of images to help understand the image you have. These days you have Flickr, which is not only a giant corpus of images but is also tagged with meta-information about what each image is.
Some projects that do this are documented here:
http://blogs.zdnet.com/emergingtech/?p=629
Start with analyzing the images yourself. That way you can formulate the criteria on which to match the car. And you get to define what you cannot match.
If all cars have the same background, for example, it need not be that complex. But your example states a car on a street. There may be parked cars. Should they be recognized?
If you have access to MATLAB, you could test your pattern recognition filters with specialized software like PRTools.
When I was studying (a long time ago :) I used Khoros Cantata and found that an edge filter can simplify the image greatly.
But again, first define the conditions on the input. If you don't do that, you will not succeed, because pattern recognition is really hard (think about how long it took to crack captchas).
I did say photo, so this could be a black car with a black background. I did think of specifying the colour of the object, and then when that colour is found, tracing around it (high-level explanation). But with a black object on a black background (no contrast, in other words), it would be a very difficult task.
Better still, I've come across several sites with 3d models of cars. I could always use this, stick it into a 3d model, and render it.
A 3D model would be easier to work with, a real world photo much harder. It does suck :(
If I'm reading this right... This is where AI shines.
I think the "simplest" solution would be to use a neural-network based image recognition algorithm. Unless you know that the car will look the exact same in each picture, then that's pretty much the only way.
If it IS the exact same, then you can just search for the pixel pattern, get the bounding rectangle, and set the image border to the inner boundary of the rectangle.
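A sketch of that pixel-pattern search using OpenCV template matching (the file names and the 0.8 threshold are placeholders):

import cv2

scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("car_template.jpg", cv2.IMREAD_GRAYSCALE)
th, tw = template.shape

result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, score, _, (x, y) = cv2.minMaxLoc(result)
if score > 0.8:                       # confidence threshold
    crop = scene[y:y + th, x:x + tw]  # bounding rectangle of the best match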
I think that you will never get good results without a real user telling the program what to do. Think of it this way: how should your program decide when there is more than one interesting object present (for example, two cars)? What if the object you want is actually the mountain in the background? What if nothing of interest is in the picture, so there is nothing to select as the object to crop out? Etc., etc...
With that said, if you can make assumptions like: only 1 object will be present, then you can have a go with using image recognition algorithms.
Now that I think of it, I recently got a lecture about artificial intelligence in robots and robotic research techniques. Their research covered language interaction, evolution, and language recognition. But in order to do that, they also needed some simple image recognition algorithms to process the perceived environment. One of the tricks they used was to make a 3D plot of the image where x and y were the normal x and y axes and the z axis was the brightness of that particular point; then they used the same technique for red-green and blue-yellow values. And lo and behold, they had something (relatively) easy they could use to pick out the objects from the perceived environment.
(I'm terribly sorry, but I can't find a link to the nice charts they had that showed how it all worked).
Anyway, the point is that they were not interested (that much) in image recognition, so they created something that worked well enough: something less advanced and thus less time-consuming. So it is possible to create something simple for this complex task.
Also, any good image editing program has some kind of magic wand that, with the right amount of tweaking, will select the object of interest you point it at; maybe it's worth your time to look into that as well.
So, it basically will mean that you:
have to make some assumptions, otherwise it will fail terribly
will probably best be served with techniques from AI, and more specifically image recognition
can take a look at paint.NET and the algorithm for their magic wand (see the flood-fill sketch below)
try to use the fact that a good photo will have the object of interest somewhere in the middle of the image
...but I'm not saying that this is the solution to your problem; maybe something simpler can be used.
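As a rough illustration of the magic-wand idea mentioned above, a flood fill from a seed pixel with OpenCV (the seed exploits the object-near-the-middle heuristic; the tolerance is arbitrary):

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
seed = (w // 2, h // 2)  # assume the object sits near the middle

mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill needs a 1px border all round
cv2.floodFill(img, mask, seed, newVal=(0, 0, 255),
              loDiff=(20, 20, 20), upDiff=(20, 20, 20),
              flags=4 | cv2.FLOODFILL_MASK_ONLY)
selection = mask[1:-1, 1:-1]  # 1 wherever pixels were similar to the seed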
Oh, and I will continue to look for those links; they hold some really valuable information about this topic, but I can't promise anything.