Plotting a star chart efficiently - algorithm

I'd like to visualize astronomical star catalogues that can contain hundreds of thousands of entries. The catalogues usually consist of simply a list of stars, with spherical coordinates and other data for each star. By spherical coordinates I mean right ascension (0-360 degrees or 0-24 hours) and declination (-90 degrees to +90 degrees). This corresponds to longitude and latitude, just on the celestial sphere instead of Earth's surface.
I'd like to plot all the stars in the catalogue that are located inside a certain field of view, defined by the center (in spherical coordinates) and the size of the field of view (in degrees) and the projection (e.g. stereographic projection).
Plotting the stars by going through the whole catalogue and just checking whether each star is inside the field of view or not is very inefficient.
How could I make this more efficient? Is there a good algorithm or data structure for this kind of problem?

For modern graphics cards, numbers like 300K stars (and more) are still manageable...
So you can try to load them all onto the GPU as a VBO/VAO and leave the rendering/clipping to the GPU alone. I use Hipparcos (118,322 stars) this way without problems, with each star drawn as a transparent quad. You just need to pre-compute the quads' view positions prior to rendering (just once). Here is a screenshot from one of my apps where Hipparcos is used in this manner as background stars (in real time).
You can also use geometry shaders to ease things up a lot (you can send just points, or even Ra, Dec, distance instead of quads), but this will limit your target hardware to GPUs that support geometry shaders.
If you have more stars than your hardware can handle, use a sorted dataset.
Most catalogs are sorted by Ra or Dec. You can exploit this as follows:
Select the view area: min(Ra, Dec) and max(Ra, Dec).
Let's assume your data is sorted by Ra, ascending.
Find the first i0 where star[i0].Ra >= min.Ra (use binary search!).
Find the first i1 where star[i1].Ra >= max.Ra (use binary search!).
Process the stars i0 <= i < i1: test whether min.Dec <= star[i].Dec <= max.Dec, and if so, render the star.
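A minimal C++ sketch of that range query, assuming a Star struct with ra/dec stored in degrees and a catalogue already sorted by ra ascending (the struct and function names are just illustrative):

#include <algorithm>
#include <vector>

struct Star { double ra, dec, mag; };   // ra 0-360, dec -90..+90, in degrees

// Return the stars inside [minRa,maxRa] x [minDec,maxDec].
// 'catalogue' must be sorted by ra ascending.
std::vector<Star> starsInView(const std::vector<Star>& catalogue,
                              double minRa, double maxRa,
                              double minDec, double maxDec)
{
    auto byRa = [](const Star& s, double ra) { return s.ra < ra; };
    auto i0 = std::lower_bound(catalogue.begin(), catalogue.end(), minRa, byRa);
    auto i1 = std::lower_bound(catalogue.begin(), catalogue.end(), maxRa, byRa);

    std::vector<Star> visible;
    for (auto it = i0; it != i1; ++it)               // binary-searched Ra range
        if (it->dec >= minDec && it->dec <= maxDec)  // per-star Dec test
            visible.push_back(*it);
    return visible;
}

A view that straddles Ra = 0/360 degrees would need two such ranges.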
If even this is not fast enough, you need to use spatial subdivision.
So divide your dataset into smaller ones, and prior to rendering, use only the datasets near the selected view area. This will lower the amount of data processed significantly.
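One simple form of such a subdivision is a fixed 5x5 degree Ra/Dec grid of buckets; this sketch is illustrative only and reuses the Star struct and headers from the previous sketch:

struct StarGrid {
    enum { RA_CELLS = 72, DEC_CELLS = 36 };            // 5-degree cells
    std::vector<Star> cells[RA_CELLS][DEC_CELLS];

    void insert(const Star& s) {
        int i = int(s.ra / 5.0) % RA_CELLS;
        int j = int((s.dec + 90.0) / 5.0);
        if (j >= DEC_CELLS) j = DEC_CELLS - 1;          // dec == +90
        cells[i][j].push_back(s);
    }

    // Gather the cells overlapping a view rectangle (no Ra wrap handling here);
    // the exact Ra/Dec test from above is still applied to the result.
    void query(double minRa, double maxRa, double minDec, double maxDec,
               std::vector<Star>& out) const {
        for (int i = int(minRa / 5.0); i <= int(maxRa / 5.0) && i < RA_CELLS; ++i)
            for (int j = int((minDec + 90.0) / 5.0);
                 j <= int((maxDec + 90.0) / 5.0) && j < DEC_CELLS; ++j)
                out.insert(out.end(), cells[i][j].begin(), cells[i][j].end());
    }
};

The grid is built once when the catalogue is loaded; per frame only a handful of cells near the view are touched.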

Related

Pre-projected geometry v getting the browser to do it (aka efficiency v flexibility)

To improve the performance of my online maps, especially on smartphones, I'm following Mike Bostock's advice to prepare the geodata as much as possible before uploading it to the server (as per his command-line cartography). For example, I'm projecting the TopoJSON data, usually via d3.geoConicEqualArea(), at the command line rather than making the viewer's browser do this grunt work when loading the map.
However, I also want to use methods like .scale, .fitSize, .fitExtent and .translate dynamically, which means I can't "bake" the scale or translate values into the TopoJSON file beforehand.
Bostock recommends using d3.geoTransform() as a proxy for projections like d3.geoConicEqualArea() if you're working with already-projected data but still want to scale or translate it. For example, to flip a projection on the y-axis, he suggests:
var reflectY = d3.geoTransform({
      point: function(x, y) {
        this.stream.point(x, -y);
      }
    }),
    path = d3.geoPath()
      .projection(reflectY);
My question: If I use this D3 function, aren't I still forcing the viewer's browser to do a lot of data processing, which will worsen the performance? The point of pre-processing the data is to avoid this. Or am I overestimating the processing work involved in the d3.geoTransform() function above?
Short Answer: You are overestimating the amount of work required to transform projected data.
Spherical Nature of D3 geoProjections
A d3 geoProjection is relatively unusual. Many platforms, tools, and libraries take points consisting of latitude/longitude pairs and treat them as though they are on a Cartesian plane. This simplifies the math to a huge extent, but comes at a cost: paths follow Cartesian routing.
D3 treats longitude/latitude points as what they are: points on a three-dimensional ellipsoid. This costs more computationally but provides other benefits, such as routing path segments along great-circle routes.
The extra computational costs d3 incurs in treating coordinates as points on a 3d globe are:
Spherical Math
Take a look at a simple geographic projection before scaling, centering, etc:
function mercator(x, y) {
  return [x, Math.log(Math.tan(Math.PI / 4 + y / 2))];
}
This is likely to take longer than the transform you propose above.
Pathing
On a Cartesian plane, lines between two points are easy; on a sphere, this is difficult. Take a line stretching from 179 degrees East to 179 degrees West: treated as though it were on a Cartesian plane, that is easy, just draw a line across the earth. On a spherical earth, the line crosses the antimeridian.
Consequently, when flattening the paths, sampling is required along the route: a great-circle segment between two points bends when projected, and therefore needs additional points. I'm not certain of the exact process in d3, but it certainly occurs.
Points on a Cartesian plane don't require additional sampling; they are already flat, and lines between points are straight. There is no need to detect whether lines wrap around the earth the other way.
Operations post Projection
Once projected, something like .fitSize will force additional work that is essentially what you are proposing with the d3.geoTransform(): the features need to be transformed and scaled based on their projected location and size.
This is very visible in d3v3 (before there was fitSize()) when autocentering features: calculations involve the svg extent of the projected features.
Basic Quasi Scientific Performance Comparison
Using a US census bureau shapefile of the United States, I created three geojson files:
One using WGS84 (long/lat) (file size: 389 kb)
One using geoproject in node with a plain d3.geoAlbers transform (file size: 386 kb)
One using geoproject in node with d3.geoAlbers().fitSize([500,500],d) (file size 385 kb)
The gold standard for speed should be option 3: the data is scaled and centered based on an anticipated display extent, so no transform is required, and I will use a null projection to test it.
I proceeded to project these to a 500x500 svg using:
// For the unprojected data
var projection = d3.geoAlbers()
  .fitSize([500, 500], wgs84);
var geoPath = d3.geoPath().projection(projection);

// For the projected but unscaled and uncentered data
var transform = d3.geoIdentity()
  .fitSize([500, 500], albers);
var projectedPath = d3.geoPath()
  .projection(transform);

// For the projected, centered, and scaled data
var nullProjection = d3.geoPath();
Running this a few hundred times, I got average rendering times (data was preloaded) of:
71 ms: WGS84
33 ms: Projected but unscaled and uncentered
21 ms: Projected, scaled, and centered
I feel safe in saying there is a significant performance bump from pre-projecting the data, regardless of whether it is actually centered and scaled.
Note that I used d3.geoIdentity() rather than d3.geoTransform() because it allows the use of fitSize(), and you can still reflect on the y axis if needed with .reflectY(true).

Future prospects for improvement of depth data on Project Tango tablet

I am interested in using the Project Tango tablet for 3D reconstruction using arbitrary point features. In the current SDK version, we seem to have access to the following data.
A 1280 x 720 RGB image.
A point cloud with 0-~10,000 points, depending on the environment. This seems to average between 3,000 and 6,000 in most environments.
What I really want is to be able to identify a 3D point for key points within an image. Therefore, it makes sense to project depth into the image plane. I have done this, and I get something like this:
The problem with this process is that the depth points are sparse compared to the RGB pixels. So I took it a step further and performed interpolation between the depth points. First, I did Delaunay triangulation, and once I got a good triangulation, I interpolated between the 3 points on each facet and got a decent, fairly uniform depth image. Here are the zones where the interpolated depth is valid, imposed upon the RGB image.
Now, given the camera model, it's possible to project depth back into Cartesian coordinates at any point on the depth image (since the depth image was made such that each pixel corresponds to a point on the original RGB image, and we have the camera parameters of the RGB camera). However, if you look at the triangulation image and compare it to the original RGB image, you can see that depth is valid for all of the uninteresting points in the image: blank, featureless planes mostly. This isn't just true for this single set of images; it's a trend I'm seeing for the sensor. If a person stands in front of the sensor, for example, there are very few depth points within their silhouette.
As a result of this characteristic of the sensor, if I perform visual feature extraction on the image, most of the areas with corners or interesting textures fall in areas without associated depth information. Just an example: I detected 1000 SIFT keypoints from an RGB image from an Xtion sensor, and 960 of those had valid depth values. If I do the same thing to this system, I get around 80 keypoints with valid depth. At the moment, this level of performance is unacceptable for my purposes.
I can guess at the underlying reasons for this: it seems like some sort of plane extraction algorithm is being used to get depth points, whereas Primesense/DepthSense sensors are using something more sophisticated.
So anyway, my main question here is: can we expect any improvement in the depth data at a later point in time, through improved RGB-IR image processing algorithms? Or is this an inherent limit of the current sensor?
I am from the Project Tango team at Google. I am sorry you are experiencing trouble with depth on the device. Just so that we are sure your device is in good working condition, can you please test the depth performance against a flat wall. Instructions are as below:
https://developers.google.com/project-tango/hardware/depth-test
Even with a device in good working condition, the depth library is known to return sparse depth points on scenes with low IR reflectance objects, small sized objects, high dynamic range scenes, surfaces at certain angles and objects at distances larger than ~4m. While some of these are inherent limitations in the depth solution, we are working with the depth solution provider to bring improvements wherever possible.
Attached is an image of a typical conference room scene and the corresponding point cloud. As you can see, no depth points are returned from the laptop screen (low reflectance), the table-top objects such as post-its, pencil holder etc. (small object sizes), large portions of the table (surfaces at an angle), or the room corner at the far right (distance >4m).
But as you move around the device, you will start getting depth point returns. Accumulating depth points is a must to get denser point clouds.
Please also keep us posted on your findings at project-tango-hardware-support@google.com.
In my very basic initial experiments, you are correct with respect to the depth information returned from the visual field; however, the return of surface points is anything but constant. I find that as I move the device I get major shifts in where depth information is returned, i.e. there's a lot of transitory opacity in the image with respect to depth data, probably due to the characteristics of the surfaces.
So while no single return frame is enough, the real question seems to be the construction of a larger model (point clouds to start with, possibly voxel spaces as one scales up) to bring successive scans into a common model. It's reminiscent of synthetic aperture algorithms in spirit, but the letters in the equations are from a whole different set of laws.
In short, I think a more interesting approach is to synthesize a more complete model by successive accumulation of point cloud data - now, for this to work, the device team has to have their dead reckoning on the money for whatever scale this is done. Also this addresses an issue that no sensor improvements can address - if your visual sensor is perfect, it still does nothing to help you relate the sides of an object at least be in the close neighborhood of the front of the object.
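A minimal sketch of that accumulation idea (not Tango API code; the pose and point types here are stand-ins), assuming each frame delivers a point cloud together with a device-to-world pose from the motion tracking:

#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;   // row-major device-to-world pose

// Transform one frame's points into the world frame and append them to a
// growing model; denser coverage emerges as the device moves around.
void accumulate(const std::vector<Vec3>& framePoints,
                const Mat4& pose,
                std::vector<Vec3>& worldModel)
{
    for (const Vec3& p : framePoints) {
        Vec3 w;
        for (int r = 0; r < 3; ++r)
            w[r] = pose[r][0] * p[0] + pose[r][1] * p[1] +
                   pose[r][2] * p[2] + pose[r][3];        // rotate + translate
        worldModel.push_back(w);
    }
    // In practice the model would be voxelized or downsampled here to bound memory,
    // and the quality depends entirely on how good the pose (dead reckoning) is.
}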

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However, scanned invoices can have several 'flaws':
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and using the left and right bounds of the first appearance of a black pixel. This fails because people can write on the invoices.
2) Dividing the invoice into vertical sections and using the sections that have the highest number of black pixels. This fails because the distribution is not always uniform among similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) what I should focus on as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using the least-squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to it. One of these numbers can be chosen from the interval [0°, 90°); the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) range and find a position where the maximal number of lines fall within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
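A rough OpenCV sketch of the overall rotation step; note that it substitutes a Hough transform for the hand-rolled line fitting described above, and the thresholds are arbitrary:

#include <algorithm>
#include <vector>
#include <opencv2/opencv.hpp>

// Estimate the dominant line direction of a scanned form and rotate it upright.
cv::Mat deskew(const cv::Mat& gray)
{
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);

    std::vector<cv::Vec2f> lines;
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 200);    // (rho, theta) per line
    if (lines.empty()) return gray.clone();

    // Fold every line angle into (-45, 45] degrees and take the median as the skew.
    std::vector<double> angles;
    for (const cv::Vec2f& l : lines) {
        double deg = l[1] * 180.0 / CV_PI;
        while (deg >= 90.0) deg -= 90.0;
        angles.push_back(deg > 45.0 ? deg - 90.0 : deg);
    }
    std::nth_element(angles.begin(), angles.begin() + angles.size() / 2, angles.end());
    double skew = angles[angles.size() / 2];

    cv::Mat rot = cv::getRotationMatrix2D(
        cv::Point2f(gray.cols / 2.0f, gray.rows / 2.0f), skew, 1.0);
    cv::Mat aligned;
    cv::warpAffine(gray, aligned, rot, gray.size(),
                   cv::INTER_LINEAR, cv::BORDER_REPLICATE);
    return aligned;
}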
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use an FFT-based correlation of the image to match it to the template. Convert both images to grayscale, pad them with zeros until the originals take up at most half the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply them element-wise, and inverse-FFT back. The resulting image encodes how well the two images agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assuming no scaling or shearing, you know exactly which portion of the template corresponds to which portion of the scan. Proceed from there.
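A short sketch of the shift estimation using OpenCV's built-in phase correlation (a close cousin of the plain FFT correlation described above; it assumes both images have already been deskewed and are the same size):

#include <opencv2/opencv.hpp>

// Estimate the (dx, dy) translation between the template and a deskewed scan.
cv::Point2d estimateShift(const cv::Mat& templGray, const cv::Mat& scanGray)
{
    cv::Mat a, b;
    templGray.convertTo(a, CV_32F);
    scanGray.convertTo(b, CV_32F);

    // Optional blur so thin lines still produce a usable correlation peak.
    cv::GaussianBlur(a, a, cv::Size(5, 5), 0);
    cv::GaussianBlur(b, b, cv::Size(5, 5), 0);

    // Internally: FFT both images, multiply, inverse FFT, locate the peak.
    return cv::phaseCorrelate(a, b);
}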
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. I generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the URL is http://i.stack.imgur.com/Zy8zO.png).
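A minimal sketch of that row/column summing with cv::reduce (assuming a deskewed grayscale image):

#include <opencv2/opencv.hpp>

// Sum pixel values along each row and each column of the form. Dark horizontal
// and vertical rules show up as pronounced dips in these two profiles.
void projectionProfiles(const cv::Mat& gray, cv::Mat& rowSums, cv::Mat& colSums)
{
    cv::reduce(gray, rowSums, 1, cv::REDUCE_SUM, CV_32S);   // one value per row
    cv::reduce(gray, colSums, 0, cv::REDUCE_SUM, CV_32S);   // one value per column
}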

distinguishing objects with opencv

I want to identify lego bricks for building a lego sorting machine (I use c++ with opencv).
That means I have to distinguish between objects which look very similar.
The bricks come to my camera individually on a flat conveyor. They might, however, lie in any possible way: upside down, on their side, or "normal".
My approach is to teach the sorting machine the bricks by filming them with the camera in lots of different positions and rotations. The features of each and every view are calculated by the SURF algorithm.
void calculateFeatures(const cv::Mat& image,
                       std::vector<cv::KeyPoint>& keypoints,
                       cv::Mat& descriptors)
{
    // detector == cv::SurfFeatureDetector(10)
    detector->detect(image, keypoints);
    // extractor == cv::SurfDescriptorExtractor()
    extractor->compute(image, keypoints, descriptors);
}
If there is an unknown brick (the brick that I want to sort), its features are also calculated and matched against the known ones.
To find wrongly matched features I proceed as described in the book OpenCV 2 Cookbook:
With the matcher (cv::BFMatcher(cv::NORM_L2)), the two nearest neighbours are searched in both directions:
matcher.knnMatch(descriptorsImage1, descriptorsImage2,
                 matches1,   // best two matches per descriptor
                 2);
matcher.knnMatch(descriptorsImage2, descriptorsImage1,
                 matches2,
                 2);
I check the ratio between the distances of the two nearest neighbours found. If the two distances are very similar, it's likely that the match is a false one.
// Ratio test over matches1 (and likewise over matches2):
for (auto matchIterator = matches1.begin(); matchIterator != matches1.end(); )
    if ((*matchIterator)[0].distance / (*matchIterator)[1].distance > 0.65)
        matchIterator = matches1.erase(matchIterator);  // ambiguous -> throw away
    else
        ++matchIterator;
Finally, only symmetrical match pairs are accepted. These are matches in which not only is n1 the nearest neighbour to feature f1, but f1 is also the nearest neighbour to n1.
// Symmetry test: accept a match only if it is mutual.
std::vector<cv::DMatch> symMatches;
for (const auto& match1 : matches1)
    for (const auto& match2 : matches2)
        if (match1[0].queryIdx == match2[0].trainIdx &&
            match2[0].queryIdx == match1[0].trainIdx)
            symMatches.push_back(match1[0]);  // good match
Now only pretty good matches remain. To filter out some more bad matches I check which matches fit the projection of img1 on img2 using the fundamental matrix.
std::vector<uchar> inliers(points1.size(), 0);
cv::findFundamentalMat(
    cv::Mat(points1), cv::Mat(points2), // matching points
    inliers,                            // match status (inlier or outlier)
    CV_FM_RANSAC,                       // RANSAC method
    3,                                  // distance to epipolar line
    0.99);                              // confidence probability

// extract the surviving (inlier) matches
std::vector<cv::DMatch> goodMatches;
std::vector<uchar>::const_iterator itIn = inliers.begin();
std::vector<cv::DMatch>::const_iterator itM = allMatches.begin();
// for all matches
for (; itIn != inliers.end(); ++itIn, ++itM)
    if (*itIn)                          // it is a valid match
        goodMatches.push_back(*itM);
The result is pretty good. But in cases of extreme similarity, faults still occur.
In the picture above you can see that a similar brick is recognized well.
However in the second picture a wrong brick is recognized just as well.
Now the question is how I could improve the matching.
I had two different ideas:
The matches in the second picture trace back to features that really do fit, but only because the visual field changed a lot. To recognize a brick I have to compare it in many different positions anyway (at least as shown in figure three), so I know that I am only allowed to change the visual field minimally. The information about how strongly the visual field changed should be hidden in the fundamental matrix. How can I read out of this matrix how far the position in space has changed? The rotation and strong scaling are of particular interest; if the brick happens to be filmed farther to the left, that shouldn't matter.
Second idea:
I calculated the fundamental matrix from two pictures and filtered out features that don't fit the projection; shouldn't there be a way to do the same using three or more pictures (keyword: trifocal tensor)? This way the matching should become more stable. But I neither know how to do this using OpenCV nor could I find any information on it on Google.
I don't have a complete answer, but I have a few suggestions.
On the image analysis side:
It looks like your camera setup is pretty constant, so it should be easy to separate the brick from the background. I also see your system finding features in the background; this is unnecessary. Set all non-brick pixels to black to remove them from the analysis.
When you have located just the brick, your first step should be to filter likely candidates based on size (i.e. the number of pixels) in the brick. That way the faulty match in your example is already less likely.
You can take other features into account, such as the aspect ratio of the brick's bounding box and its major and minor axes (the eigenvectors of the covariance matrix of the central moments).
These simpler features will give you a reasonable first filter to limit your search space.
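A hedged OpenCV sketch of those pre-filters, assuming a uniform conveyor background that a simple Otsu threshold can separate from the brick (it uses a rotated bounding box in place of the moment-based axes mentioned above):

#include <algorithm>
#include <vector>
#include <opencv2/opencv.hpp>

// Crude shape features for pre-filtering brick candidates.
struct BrickStats { double area, aspect, majorAxis, minorAxis; };

bool measureBrick(const cv::Mat& gray, BrickStats& out)
{
    cv::Mat mask;
    cv::threshold(gray, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return false;

    // Take the largest blob as the brick, everything else is background clutter.
    auto brick = std::max_element(contours.begin(), contours.end(),
        [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
            return cv::contourArea(a) < cv::contourArea(b); });

    out.area = cv::contourArea(*brick);
    cv::RotatedRect box = cv::minAreaRect(*brick);            // oriented bounding box
    out.majorAxis = std::max<double>(box.size.width, box.size.height);
    out.minorAxis = std::min<double>(box.size.width, box.size.height);
    out.aspect    = out.minorAxis > 0 ? out.majorAxis / out.minorAxis : 0.0;
    return true;
}

Candidates whose area or aspect ratio is far from the expected brick can be rejected before any SURF matching is attempted.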
On the mechanical side:
If bricks are actually coming down a conveyor you should be able to "straighten" the bricks along a straight edge using something like a rod that lies at an angle to the direction of the conveyor across the belt so that the bricks arrive more uniformly at your camera like so.
Similar to the previous point, you could use something like a very loose brush suspended across the belt to topple bricks standing up as they pass.
Again both these points will limit your search space.

Moving object Opengl Es 2.0

I am a bit confused about how I should move my basic square. Should I use my translation matrix or just change the object's vertices? Which one is more accurate?
I use this vertex shader:
gl_Position = myPMVMatrix * a_vertex;
and I also use a VBO.
From an accuracy point of view both methods are about equally good.
From a performance point of view, it's about minimizing bottlenecks:
For a single square you are probably not able to measure any differences, but when you think about 1 million squares (or triangles), things get a little more complicated:
If all of your triangles change position relative to each other, you are probably better off with changing the vbo, because you can push the data directly to the graphics card's memory, instead of having a million OpenGl calls (which are very slow).
If all your triangles stay at the same position relative to each other (like it is the case in a normal 3d-model) you should just change the transformation matrix. In this case you don't have to push the data again onto the gfx-memory, and you only have one function-call, and you are transfering only a few bytes of data to the gfx-memory.
Depending on your application, it may be a good choice to divide your triangles into different categories and update them appropriately.
Don't move objects by changing all of the vertices! What about a complex model with thousands of vertices? Even if it's a simple square, don't develop such bad practice. That's exactly what transformation matrices are for. You are already using a transformation matrix in your shader code. From the naming I assume it's a pre-multiplied model-view-projection matrix. So it consists of the model matrix, positioning the object in world space (this is where your translation usually goes), the view matrix, positioning the world in eye/camera space (sometimes the model and view matrices are combined into a single modelview matrix, as in fixed-function GL), and the projection matrix, doing any kind of perspective projection and/or transformation into the clipping volume, all three multiplied together as P * V * M. If there are still questions about these transformation matrices and their use, consult some literature on 3D transformations or just your favourite OpenGL tutorial.
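A small sketch of building such a matrix on the CPU, here with GLM (GLM and the uniform name are my assumptions, not something from the question), uploading it once per frame while the VBO stays untouched:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

// Build P * V * M for a square that is moved by changing only the model matrix.
glm::mat4 buildPMV(const glm::vec3& squarePosition, float aspect)
{
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), aspect, 0.1f, 100.0f);
    glm::mat4 view       = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),   // eye
                                       glm::vec3(0.0f),               // target
                                       glm::vec3(0.0f, 1.0f, 0.0f));  // up
    glm::mat4 model      = glm::translate(glm::mat4(1.0f), squarePosition);
    return projection * view * model;
}

// Per frame: the vertex data in the VBO never changes, only the uniform does.
// glm::mat4 pmv = buildPMV(position, width / float(height));
// glUniformMatrix4fv(pmvLocation, 1, GL_FALSE, glm::value_ptr(pmv));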
