We are given a rather complex image map, like the one linked below. Except that the layout, shapes of each booth are more irregular, and we have lots of image maps to process.
The requirement is that the software is able to detect which booth (the boxes) is being clicked on. Once having identified the booth, we have to fetch its ID and do some processing. So we need a way to map the physical data on the map to its logical counterpart.
Usually, there are two ways I would approach the problem.
Pragmatically determine where the hotspot are - however in this case, there is no consistency in the layout of booths - some are a small rectangle, some are a squares
Manually figure out the coordinates of each booth and program it into a giant lookup. This is very time consuming, considering the number of booths (the image below is not from the project - it's just a demo). There's an estimate of at least 5000 booths spread across different maps.
Besides the two usual methods of creating hotspots for an image map, what other ways could I use to determine which booth is being clicked?
Platform used is LimeJS, but this problem should be generic enough...
You could separate the map into booths using flood-fills, a new color for each region. You want to flood a known "corridor" spot first to eliminate that. 0,0 should work for that on most maps, I'd imagine.
This would create the hotspots you need. To cope with the print inside the boxes messing with the fill, you can just use the far corners of each region to create a rectangle. This assumes the booths are actually rectangular on the map, of course. For L-shaped booths, a little extra work might be necessary.
To get the ID from each booth, you can feed each region(from above) into an OCR, but you'll have to be able to distinguish between the ID numbers and the dimensions, etc.
Related
I’m building a Starflight-inspired 2D space exploration game with a procedural world. The gameplay is divided into different « scenes » (to use Godot terminology) to manage the different « depths » of the game. For example, interstellar flight is a scene where the star systems are simply represented by star sprites. When the player gets in range, the view is moved to the solar system scene, where the player moves his ship inside the actual solar system.
So far so good, I generate the universe (the solar systems) from a hard coded array of coordinates and seeds. Now I also want to make the universe generation procedural, but I’m guessing that loading a whole universe (there is no real limit to the number of solar systems once it becomes procedural) in memory won’t be efficient.
I’m thinking of generating the universe on the first run and saving the data to a file, but I’m wondering how to load the relevant data in an efficient way that would let me load only a certain « radius » of data around the player’s ship. I feel like it would be the way to go if I use my generation algorithms that generate « realistic » galaxy shapes, since it implies many steps of data processing (different cluster shapes are generated, arms, blobs, etc. and then stars are spinned around the center to simulate the galaxy rotation, etc.) that would be probably too long to calculate in realtime.
I’m wondering which approach I should take on this problem. It’s not really language or engine dependant, so references to generic articles and algorithms on the subject would suffice.
I also read a bit about QuadTrees and I think I’m getting to something there, but I’m not exactly sure how to use that with a file on disk.
Thanks in advance for your help!
I have some suggestions:
Do not generate the whole universe on the first run, generate only the areas that are somehow visible. Then, instead of loading the whole universe from disk, you just generate it whenever your spaceship (or whatever) come within view distance of that area. This makes game initialization much faster and allows an (almost) infinite universe.
If you want the universe to be modifiable, store only the `edits' that a player makes. So if you want to show a part of the universe, generate the part from your seed and then overlay the stored edits. This makes storage much smaller.
For storage on disk, have a look at R-Tree, especially R*Tree and R+Tree, they are designed for storing data in disk pages.
as TilmannZ suggested, you should not be generating the whole dataset for the galaxy when you start the game, because there is likely no need (unless the player needs to see/interact with all the data at once - e.g. all stars). If this is the case, for example for a starmap, then you may be better loading all the data once and saving the result in an image file.
Instead, you should only genereated the data as needed around the player. The most obvious way to do this would be to construct a grid around the player, and keep this grid centered on the player as they move around. As the player moves around, you only need to update the conceptual galaxy coordinates of each cell (not the rendered coordinates). Then for each cell you can then use the coordinates as the input into a value or gradient generator like Perlin to determine what features should spawn in that location.
As for 'shaping' the galaxy or universe, one effective way is to sample the pixel data of a greyscale image of a galaxy which has the shape you want. You could load the image's RGB data at run time, and use the coordinates of your grid as you generate the stars to get the RGB value, which you can use as a density factor for the star generation; the whiter the pixel, the higher the star density at this location and visa-versa for black pixels. This method lets you effectively draw the shape of the galaxy in paint.
Maybe think about different layers of abstractions. Each layer uses the parent layer, designer input, events & procedural generation algorithms to generate the needed data.
The Universe layer contains user or randomly placed galaxy polygons & types.
The Galaxy layers can add more details (number & density of spiral arms) or a density map.
A cluster of solar systems.
The solar system adds the stars & planets.
And only create the details for currently needed elements.
in one of my projects, I would like to create heatmap of user clicks. I was searching a while and found this library - http://www.patrick-wied.at/static/heatmapjs/examples.html . That is basically exactly what I would like to make. I would like to create heatmap in SVG, if possible, that is only difference.
I would like to create my own heatmap and I'm just wondering how to do that. I have XY clicks position. Each click has mostly different XY position, but there can be exceptions time to time, a few clicks can have the came XY position.
I found a few solutions based on grid on website, where you have to check which clicks belong into the same column in this grid and according to these informations you are able to fill the most clicked columns with red or orange and so on. But it seems a little bit complicated to me and maybe slower for bigger grids.
So I'm wondering if there is another solution how to "calculate" heatmap colors or I would like to know the main idea used in library above.
Many thanks
To make this kind of heat map, you need some kind of writable array (or, as you put it, a "grid"). User clicks are added onto this array in a cumulative fashion, by adding a small "filter" sub-array (aligned around each click) to the writable array.
Unfortunately, this "grid" method seems to be the easiest, simplest way to get that kind of smooth, blobby appearance. Fortunately, this kind of operation is well-supported by software and hardware, under the name "computer graphics".
When considered as a computer graphics operation, the writable array is called an "accumulation buffer". The filter is what gives you the nice blobby appearance, even with a relatively small number of clicks -- you can tweak the size of the filter according to the needs of your application.
After accumulating the user clicks, you will need to convert from the raw accumulated values to some kind of visible color scale. This may involve looking through the entire accumulation buffer to find the largest value, and mapping your chosen color scale accordingly. Alternately, you could adjust your scale according to the number of mouse clicks, or (as in the demo you linked to) just choose a fixed scale regardless of the content of the buffer.
Finally, I should mention that SVG is not well-adapted to representing this kind of graphic. It should probably be saved as some kind of image file (.jpg or .png) instead.
I have a web cam that takes a picture every N seconds. This gives me a collection of images of the same scene over time. I want to process that collection of images as they are created to identify events like someone entering into the frame, or something else large happening. I will be comparing images that are adjacent in time and fixed in space - the same scene at different moments of time.
I want a reasonably sophisticated approach. For example, naive approaches fail for outdoor applications. If you count the number of pixels that change, for example, or the percentage of the picture that has a different color or grayscale value, that will give false positive reports every time the sun goes behind a cloud or the wind shakes a tree.
I want to be able to positively detect a truck parking in the scene, for example, while ignoring lighting changes from sun/cloud transitions, etc.
I've done a number of searches, and found a few survey papers (Radke et al, for example) but nothing that actually gives algorithms that I can put into a program I can write.
Use color spectroanalisys, without luminance: when the Sun goes down for a while, you will get similar result, colors does not change (too much).
Don't go for big changes, but quick changes. If the luminance of the image changes -10% during 10 min, it means the usual evening effect. But when the change is -5%, 0, +5% within seconds, its a quick change.
Don't forget to adjust the reference values.
Split the image to smaller regions. Then, when all the regions change same way, you know, it's a global change, like an eclypse or what, but if only one region's parameters are changing, then something happens there.
Use masks to create smart regions. If you're watching a street, filter out the sky, the trees (blown by wind), etc. You may set up different trigger values for different regions. The regions should overlap.
A special case of the region is the line. A line (a narrow region) contains less and more homogeneous pixels than a flat area. Mark, say, a green fence, it's easy to detect wheter someone crosses it, it makes bigger change in the line than in a flat area.
If you can, change the IRL world. Repaint the fence to a strange color to create a color spectrum, which can be identified easier. Paint tags to the floor and wall, which can be OCRed by the program, so you can detect wheter something hides it.
I believe you are looking for Template Matching
Also i would suggest you to look on to Open CV
We had to contend with many of these issues in our interactive installations. It's tough to not get false positives without being able to control some of your environment (sounds like you will have some degree of control). In the end we looked at combining some techniques and we created an open piece of software named OpenTSPS (Open Toolkit for Sensing People in Spaces - http://www.opentsps.com). You can look at the C++ source in github (https://github.com/labatrockwell/openTSPS/).
We use ‘progressive background relearn’ to adjust to the changing background over time. Progressive relearning is particularly useful in variable lighting conditions – e.g. if lighting in a space changes from day to night. This in combination with blob detection works pretty well and the only way we have found to improve is to use 3D cameras like the kinect which cast out IR and measure it.
There are other algorithms that might be relevant, like SURF (http://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/ and http://en.wikipedia.org/wiki/SURF) but I don't think it will help in your situation unless you know exactly the type of thing you are looking for in the image.
Sounds like a fun project. Best of luck.
The problem you are trying to solve is very interesting indeed!
I think that you would need to attack it in parts:
As you already pointed out, a sudden change in illumination can be problematic. This is an indicator that you probably need to achieve some sort of illumination-invariant representation of the images you are trying to analyze.
There are plenty of techniques lying around, one I have found very useful for illumination invariance (applied to face recognition) is DoG filtering (Difference of Gaussians)
The idea is that you first convert the image to gray-scale. Then you generate two blurred versions of this image by applying a gaussian filter, one a little bit more blurry than the first one. (you could use a 1.0 sigma and a 2.0 sigma in a gaussian filter respectively) Then you subtract from the less-blury image, the pixel intensities of the more-blurry image. This operation enhances edges and produces a similar image regardless of strong illumination intensity variations. These steps can be very easily performed using OpenCV (as others have stated). This technique has been applied and documented here.
This paper adds an extra step involving contrast equalization, In my experience this is only needed if you want to obtain "visible" images from the DoG operation (pixel values tend to be very low after the DoG filter and are veiwed as black rectangles onscreen), and performing a histogram equalization is an acceptable substitution if you want to be able to see the effect of the DoG filter.
Once you have illumination-invariant images you could focus on the detection part. If your problem can afford having a static camera that can be trained for a certain amount of time, then you could use a strategy similar to alarm motion detectors. Most of them work with an average thermal image - basically they record the average temperature of the "pixels" of a room view, and trigger an alarm when the heat signature varies greatly from one "frame" to the next. Here you wouldn't be working with temperatures, but with average, light-normalized pixel values. This would allow you to build up with time which areas of the image tend to have movement (e.g. the leaves of a tree in a windy environment), and which areas are fairly stable in the image. Then you could trigger an alarm when a large number of pixles already flagged as stable have a strong variation from one frame to the next one.
If you can't afford training your camera view, then I would suggest you take a look at the TLD tracker of Zdenek Kalal. His research is focused on object tracking with a single frame as training. You could probably use the semistatic view of the camera (with no foreign objects present) as a starting point for the tracker and flag a detection when the TLD tracker (a grid of points where local motion flow is estimated using the Lucas-Kanade algorithm) fails to track a large amount of gridpoints from one frame to the next. This scenario would probably allow even a panning camera to work as the algorithm is very resilient to motion disturbances.
Hope this pointers are of some help. Good Luck and enjoy the journey! =D
Use one of the standard measures like Mean Squared Error, for eg. to find out the difference between two consecutive images. If the MSE is beyond a certain threshold, you know that there is some motion.
Also read about Motion Estimation.
if you know that the image will remain reletivly static I would reccomend:
1) look into neural networks. you can use them to learn what defines someone within the image or what is a non-something in the image.
2) look into motion detection algorithms, they are used all over the place.
3) is you camera capable of thermal imaging? if so it may be worthwile to look for hotspots in the images. There may be existing algorithms to turn your webcam into a thermal imager.
i want to identify a ball in the picture. I am thiking of using sobel edge detection algorithm,with this i can detect the round objects in the image.
But how do i differentiate between different objects. For example, a foot ball is there in one picture and in another picture i have a picture of moon.. how to differentiate what object has been detected.
When i use my algorithm i get ball in both the cases. Any ideas?
Well if all the objects you would like to differentiate are round, you could even use a hough transformation for round objects. This is a very good way of distinguishing round objects.
But your basic problem seems to be classification - sorting the objects on your image into different classes.
For this you don't really need a Neural Network, you could simply try with a Nearest Neighbor match. It's functionalities are a bit like neural networks since you can give it several reference pictures where you tell the system what can be seen there and it will optimize itself to the best average values for each attribute you detected. By this you get a dictionary of clusters for the different types of objects.
But for this you'll of course first need something that distinguishes a ball from a moon.
Since they are all real round objects (which appear as circles) it will be useless to compare for circularity, circumference, diameter or area (only if your camera is steady and if you know a moon will always have the same size on your images, other than a ball).
So basically you need to look inside the objects itself and you can try to compare their mean color value or grayscale value or the contrast inside the object (the moon will mostly have mid-gray values whereas a soccer ball consists of black and white parts)
You could also run edge filters on the segmented objects just to determine which is more "edgy" in its texture. But for this there are better methods I guess...
So basically what you need to do first:
Find several attributes that help you distinguish the different round objects (assuming they are already separated)
Implement something to get these values out of a picture of a round object (which is already segmented of course, so it has a background of 0)
Build a system that you feed several images and their class to have a supervised learning system and feed it several images of each type (there are many implementations of that online)
Now you have your system running and can give other objects to it to classify.
For this you need to segment the objects in the image, by i.e Edge filters or a Hough Transformation
For each of the segmented objects in an image, let it run through your classification system and it should tell you which class (type of object) it belongs to...
Hope that helps... if not, please keep asking...
When you apply an edge detection algorithm you lose information.
Thus the moon and the ball are the same.
The moon has a diiferent color, a different texture, ... you can use these informations to differnentiate what object has been detected.
That's a question in AI.
If you think about it, the reason you know it's a ball and not a moon, is because you've seen a lot of balls and moons in your life.
So, you need to teach the program what a ball is, and what a moon is. Give it some kind of dictionary or something.
The problem with a dictionary of course would be that to match the object with all the objects in the dictionary would take time.
So the best solution would probably using Neural networks. I don't know what programming language you're using, but there are Neural network implementations to most languages i've encountered.
You'll have to read a bit about it, decide what kind of neural network, and its architecture.
After you have it implemented it gets easy. You just give it a lot of pictures to learn (neural networks get a vector as input, so you can give it the whole picture).
For each picture you give it, you tell it what it is. So you give it like 20 different moon pictures, 20 different ball pictures. After that you tell it to learn (built in function usually).
The neural network will go over the data you gave it, and learn how to differentiate the 2 objects.
Later you can use that network you taught, give it a picture, and it a mark of what it thinks it is, like 30% ball, 85% moon.
This has been discussed before. Have a look at this question. More info here and here.
I have nothing useful to do and was playing with jigsaw puzzle like this:
alt text http://manual.gimp.org/nl/images/filters/examples/render-taj-jigsaw.jpg
and I was wondering if it'd be possible to make a program that assists me in putting it together.
Imagine that I have a small puzzle, like 4x3 pieces, but the little tabs and blanks are non-uniform - different pieces have these tabs in different height, of different shape, of different size. What I'd do is to take pictures of all of these pieces, let a program analyze them and store their attributes somewhere. Then, when I pick up a piece, I could ask the program to tell me which pieces should be its 'neighbours' - or if I have to fill in a blank, it'd tell me how does the wanted puzzle piece(s) look.
Unfortunately I've never did anything with image processing and pattern recognition, so I'd like to ask you for some pointers - how do I recognize a jigsaw piece (basically a square with tabs and holes) in a picture?
Then I'd probably need to rotate it so it's in the right position, scale to some proportion and then measure tab/blank on each side, and also each side's slope, if present.
I know that it would be too time consuming to scan/photograph 1000 pieces of puzzle and use it, this would be just a pet project where I'd learn something new.
Data acquisition
(This is known as Chroma Key, Blue Screen or Background Color method)
Find a well-lit room, with the least lighting variation across the room.
Find a color (hue) that is rarely used in the entire puzzle / picture.
Get a color paper that has that exactly same color.
Place as many puzzle pieces on the color paper as it'll fit.
You can categorize the puzzles into batches and use it as a computer hint later on.
Make sure the pieces do not overlap or touch each other.
Do not worry about orientation yet.
Take picture and download to computer.
Color calibration may be needed because the Chroma Key background may have upset the built-in color balance of the digital camera.
Acquisition data processing
Get some computer vision software
OpenCV, MATLAB, C++, Java, Python Imaging Library, etc.
Perform connected-component on the chroma key color on the image.
Ask for the contours of the holes of the connected component, which are the puzzle pieces.
Fix errors in the detected list.
Choose the indexing vocabulary (cf. Ira Baxter's post) and measure the pieces.
If the pieces are rectangular, find the corners first.
If the pieces are silghtly-off quadrilateral, the side lengths (measured corner to corner) is also a valuable signature.
Search for "Shape Context" on SO or Google or here.
Finally, get the color histogram of the piece, so that you can query pieces by color later.
To make them searchable, put them in a database, so that you can query pieces with any combinations of indexing vocabulary.
A step back to the problem itself. The problem of building a puzzle can be easy (P) or hard (NP), depending of whether the pieces fit only one neighbour, or many. If there is only one fit for each edge, then you just find, for each piece/side its neighbour and you're done (O(#pieces*#sides)). If some pieces allow multiple fits into different neighbours, then, in order to complete the whole puzzle, you may need backtracking (because you made a wrong choice and you get stuck).
However, the first problem to solve is how to represent pieces. If you want to represent arbitrary shapes, then you can probably use transparency or masks to represent which areas of a tile are actually part of the piece. If you use square shapes then the problem may be easier. In the latter case, you can consider the last row of pixels on each side of the square and match it with the most similar row of pixels that you find across all other pieces.
You can use the second approach to actually help you solve a real puzzle, despite the fact that you use square tiles. Real puzzles are normally built upon a NxM grid of pieces. When scanning the image from the box, you split it into the same NxM grid of square tiles, and get the system to solve that. The problem is then to visually map the actual squiggly piece that you hold in your hand with a tile inside the system (when they are small and uniformly coloured). But you get the same problem if you represent arbitrary shapes internally.