What do pixel coordinates translated from longitude/latitude represent in d3.js?

If I do nothing else but this:
let projection = d3.geoAzimuthalEqualArea()
console.log(projection([-3.0026, 16.7666])) // passing long/lat
I get [473.67353385539417, 213.6120079887163], which are pixel coordinates.
But pixel coordinates relative to WHAT? I haven't even defined the size of the SVG container, the center of the map, etc. How can it already know the exact pixel position on the screen if I haven't even specified how big the map will be on the page? If it were a bigger map, the point would be further to the right and down, which means a different pixel value.
Also why are those returned values floats with many decimal places if screen pixels can only ever be whole numbers?
And are there maximum values for these pixel coordinates? (I.e. is this projected on some kind of a default sized map?)
Where is this in the API documentation?

The documentation is not very clear about the default dimensions of a projection's output. Somewhat hidden, you will find the value in the section on projection.translate():
# projection.translate([translate])
[…]
The default translation offset places ⟨0°,0°⟩ at the center of a 960×500 area.
Those 960×500 are the dimensions of the 2D plane the geographic data is projected onto.
The output of a projection is the result of a sequence of mathematical operations on the input values, i.e. the geographic coordinates. Floating-point math is used to get as close to the exact values as possible, and no assumption is made about how the result will actually be used. The coordinates may be interpreted as pixel coordinates by simply rounding them, but their use is not restricted to that interpretation. Especially with vector formats like SVG, it is quite normal to have floating-point coordinate values, with the user client, e.g. the browser, doing the calculations to further project those vectors onto the screen, taking into account possible translations, rotations, skews, view boxes etc.
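As a minimal sketch of how those defaults interact with your own page (the 600×400 size, the scale value and the geojsonFeatureCollection variable are placeholders I am assuming, not anything from the question), you can either set translate and scale explicitly or let fitSize derive them from your data:

const width = 600, height = 400;              // your own drawing area instead of the 960×500 default
const svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

// Option 1: define the reference frame yourself.
const projection = d3.geoAzimuthalEqualArea()
    .translate([width / 2, height / 2])       // put ⟨0°,0°⟩ at the center of *your* area
    .scale(200);                              // zoom factor, chosen by eye here

// Option 2: let d3 derive translate and scale from the data's bounding box.
// const projection = d3.geoAzimuthalEqualArea()
//     .fitSize([width, height], geojsonFeatureCollection);

console.log(projection([-3.0026, 16.7666]));  // now relative to your 600×400 frame

Either way, the numbers you get back are coordinates in the plane you defined; it is the SVG renderer that eventually maps them to physical screen pixels.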

Related

Read the width and length of the pixel in .dcm format with dicominfo

I have pictures in .dcm format.
I'm looking for the width and length of the pixel.
As far as I know, dicominfo gives the information about the picture.
Do you know which parameters are used to obtain the width and length of the pixel in dicominfo?
My idea was to first get the FOVx ("Field of View") and then divide it by the number of pixels; that is how I would get the width and length of the pixel.
I am very grateful for every answer.
Not sure what exactly you mean by "length". Furthermore, the geometrical information (pixel size in mm) may be stored under different tag numbers, depending on the type of object. The attribute tags provided here should work for the majority of DICOM images that have geometrical information at all.
image size in pixels (x, y) -> Columns (0028,0011), Rows (0028,0010)
size of the pixels (y, x) -> Pixel Spacing (0028,0030)
Pixel Spacing is a multi-valued attribute from which you obtain two values separated by a backslash "\". Not sure how the API of dicominfo allows access to multiple values in the same attribute.
Note the reversed order "(y, x)" in Pixel Spacing. This is very unintuitive, but that is how it is defined.
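If you can read the file programmatically rather than through dicominfo, the same tags can be read directly. A rough sketch using the dicom-parser JavaScript package (the file name is a placeholder, and whether this library fits your toolchain is an assumption on my part):

// Node.js sketch using the dicom-parser package (npm install dicom-parser).
const fs = require("fs");
const dicomParser = require("dicom-parser");

const byteArray = new Uint8Array(fs.readFileSync("image.dcm"));   // placeholder file name
const dataSet = dicomParser.parseDicom(byteArray);

const rows = dataSet.uint16("x00280010");       // Rows    (0028,0010) -> image height in pixels
const columns = dataSet.uint16("x00280011");    // Columns (0028,0011) -> image width in pixels

// Pixel Spacing (0028,0030) is multi-valued: "rowSpacing\columnSpacing" in millimetres.
const pixelSpacing = dataSet.string("x00280030");
const [rowSpacingMm, colSpacingMm] = pixelSpacing.split("\\").map(Number);

console.log({ rows, columns, rowSpacingMm, colSpacingMm });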

How to get an oriented contour from vtkMarchingSquares

I have a 2D array of double values, and I want to extract oriented contours at a particular threshold. By "oriented" I mean that the values to the left of the direction of travel should be larger than the threshold, and the values to the right should be smaller.
I am trying to use vtkMarchingSquares, but it just gives me a bunch of tiny line segments across individual grid cells. So I'm trying to stitch them together with vtkStripper.
But the contours come out fragmented, and not in a consistent orientation with respect to the underlying values.
I have also tried setting a vtkMergePoints locator in the vtkMarchingSquares object, but as far as I can tell, the locator's SetTolerance method has no effect on the output.

Matlab - Registration and Cropping of aligned images from two different sources

Good day,
In MATLAB, I have multiple image pairs of various samples. The images in a pair are taken by different cameras. The images are in differing orientations, though I have created transforms (one per image pair) that can be applied to correct that. Their bounds contain the same physical area, but one image has smaller dimensions (i.e. 50x50 versus 250x250). Additionally, the smaller image is not in a consistent location within the larger image; it is, however, always within the borders of the larger image.
What I'd like to do is as follows: after applying my pre-determined transform to the larger image, I want to crop the part of the larger image that covers the same area as the smaller image.
I know I can specify XData and YData when applying my transforms to output a subset of the transformed image, but I don't know how to relate that to the location of the smaller image. (Note: the transforms were created from control-point structures.)
Please let me know if anything is unclear.
Any help is much appreciated.
Seeing how you are specifying control points to get the transformation from one image to another, I'm assuming this is a registration problem. As such, I'm also assuming you are using imtransform to warp one image to another.
imtransform allows you to specify two additional output parameters:
[out, xdata, ydata] = imtransform(in, tform);
Here, in would be the smaller image and tform would be the transformation you created to warp the smaller image into the larger image's coordinate system. You don't need to specify the XData and YData inputs here. Those inputs bound where the transformation is carried out; usually people specify the dimensions of the image to ensure that the output image is always contained within its borders, but in your case I don't believe this is necessary.
The output variable out is the warped and transformed image that is dictated by your tform object. The other two output variables xdata and ydata are the minimum and maximum x and y values within your co-ordinate system that will encompass the transformed image fully. As such, you can use these variables to help you locate where exactly in the larger image the transformed smaller image appears. If you want to do a comparison, you can use these to crop out the larger image and see how well the transformation worked.
NB: Sometimes the limits of xdata and ydata will go beyond the dimensions of your image. However, because you said that the smaller image will always be contained within the larger image (I'm assuming fully contained), this shouldn't be a problem. Also, the limits may be floating point, so you'll need to be careful if you want to use these co-ordinates to crop a minimum spanning bounding box.
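The bookkeeping for that final crop is just interval arithmetic on xdata and ydata. A minimal sketch of that arithmetic only (assuming the larger image uses the default spatial convention where pixel centres sit at integer coordinates starting at 1, and clamping to the image borders; the rounding choice is an assumption as well):

// Given the xdata/ydata limits of the warped small image and the larger image as a
// 2D array of rows, compute and slice out the matching window.
function cropToBounds(largeImage, xdata, ydata) {
  const rows = largeImage.length, cols = largeImage[0].length;
  const c1 = Math.max(1, Math.round(xdata[0]));      // leftmost matching column
  const c2 = Math.min(cols, Math.round(xdata[1]));   // rightmost matching column
  const r1 = Math.max(1, Math.round(ydata[0]));      // top matching row
  const r2 = Math.min(rows, Math.round(ydata[1]));   // bottom matching row
  // Convert the 1-based window to 0-based array indices and slice it out.
  return largeImage.slice(r1 - 1, r2).map(row => row.slice(c1 - 1, c2));
}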

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of an invoice: you'll have to google one, sadly; I cannot add a more concrete version, as client data is obviously confidential.
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and using the left and right bounds of the first appearance of a black pixel. This fails because people can write on invoices.
2) Dividing the invoice into vertical sections and using the sections with the highest number of black pixels. This fails because the distribution is not always uniform among similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) what I should focus on as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but with a fair number of very dark pixels as well. Then try to fit a line to these dark pixels, e.g. using the least-squares method. Drop the outliers and fit another line to the remaining points; iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to it. One of these numbers can be chosen from the interval [0°, 90°); the other will be 90° plus that value, so storing one is enough. Take all these directions and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) range, find a position where the maximal number of lines fall within the window, then compute the average or median of the angles within that window, as sketched below. All of this can be done taking the weights of the lines into account.
Once you have found the direction of the lines, you can rotate your image so that the lines are perfectly aligned with the coordinate axes.
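A small sketch of that voting step (the 5° window and the 1° step are illustrative choices on my part; each entry in lines is assumed to carry the fitted direction folded into [0°, 90°) and a weight describing how good the fit was):

// Find the dominant line direction by sliding a window over the cyclic [0, 90) range.
function dominantAngle(lines, windowDeg = 5, stepDeg = 1) {
  let best = { weight: -1, angle: 0 };
  for (let start = 0; start < 90; start += stepDeg) {
    let total = 0, weightedSum = 0;
    for (const { angleDeg, weight } of lines) {
      // Cyclic distance from the window start, in the 90°-periodic space of line directions.
      let d = (angleDeg - start) % 90;
      if (d < 0) d += 90;
      if (d < windowDeg) {
        total += weight;
        weightedSum += weight * (start + d);      // unwrapped angle inside the window
      }
    }
    if (total > 0 && total > best.weight) {
      best = { weight: total, angle: (weightedSum / total) % 90 };
    }
  }
  return best.angle;   // rotate the image by this amount (or 90° minus it) to axis-align the lines
}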
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use an FFT-based correlation to match the image to the template. Convert both images to gray, pad them with zeros until the originals take up at most 1/2 the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply one element-wise with the complex conjugate of the other, and inverse-FFT the result. The resulting image will encode how much the two images agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match (a brute-force version of this correlation is sketched below).
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation and, having assumed no scaling or shearing, you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
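To make explicit what that correlation computes, here is the brute-force (non-FFT) formulation; the FFT route described above produces the same scores far more cheaply, and the mean-subtraction comment is an assumption about typical scans rather than part of the original suggestion:

// Direct cross-correlation of two grayscale 2D arrays: for every shift, multiply
// overlapping values and sum them; the shift with the largest sum is the best match.
// (In practice, subtract each image's mean first so white background does not dominate,
// and use an FFT implementation, since this brute-force version is very slow.)
function bestShift(template, scan) {
  const th = template.length, tw = template[0].length;
  const sh = scan.length, sw = scan[0].length;
  let best = { score: -Infinity, dx: 0, dy: 0 };
  for (let dy = -(th - 1); dy < sh; dy++) {
    for (let dx = -(tw - 1); dx < sw; dx++) {
      let score = 0;
      for (let y = 0; y < th; y++) {
        for (let x = 0; x < tw; x++) {
          const sy = y + dy, sx = x + dx;
          if (sy >= 0 && sy < sh && sx >= 0 && sx < sw) {
            score += template[y][x] * scan[sy][sx];   // agreement for this shift
          }
        }
      }
      if (score > best.score) best = { score, dx, dy };
    }
  }
  return best;   // { dx, dy } is the offset that aligns the template onto the scan
}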
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically into a single horizontal / vertical "line" (see the sketch below). This should provide clear spikes where you have horizontal and vertical lines in the form.
P.S. I generated a corresponding horizontal image with GIMP's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the URL is http://i.stack.imgur.com/Zy8zO.png).
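A minimal sketch of those row/column sums (a plain grayscale 2D array is assumed as input):

// Row and column projection profiles of a grayscale image (0 = black, 255 = white).
// Horizontal rules show up as sharp dips in rowSums and vertical rules as dips in
// colSums -- or as spikes if you invert the image first.
function profiles(img) {
  const rowSums = img.map(row => row.reduce((acc, v) => acc + v, 0));
  const colSums = img[0].map((_, x) => img.reduce((acc, row) => acc + row[x], 0));
  return { rowSums, colSums };
}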

Processing brightness with DWT. What is the general idea?

Folks,
I have read a number of articles on the Discrete Wavelet Transform (DWT) and looked at some sample code as well. However, I am not clear on what exactly DWT achieves.
Here is what I understand. For a two-dimensional image in YUV format, I can pass the Y plane (brightness) to the DWT function as a parameter. The function returns a matrix of the original width and height containing coefficient values.
What are these coefficient values telling me? Is it how fast or slow the brightness of a pixel has changed compared to its neighbors?
Further, the returned matrix is rearranged into four quadrants. As the coefficients have been rearranged, I no longer know which coefficient belongs to which pixel. This is confusing. If I cannot associate a coefficient with its corresponding pixel location, how can I really use the coefficients?
A little bit of background. I am looking at hiding some information in an image as an invisible watermark. From what I understand, DWT can help me identify the best region to hide the information. However, I have not been able to put the whole picture together.
OK, I figured out how DWT works. I was under the assumption that the generated coefficients have a direct relationship with the original image. However, the transform converts the input luma into a completely different set of values. It is possible to run the inverse transform on those values to obtain the original values again.
Regards,
Peter
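To make the four quadrants and the exact invertibility concrete, here is a minimal one-level 2D transform using the simplest possible (Haar) wavelet; this is only an illustration I am adding, and real watermarking code would normally use a wavelet library and a higher-order wavelet. The image must have even width and height.

// One level of a 2D Haar DWT (plain averaging/differencing) and its inverse.
function haar1d(v) {                 // [a0, b0, a1, b1, ...] -> [averages..., differences...]
  const n = v.length / 2, avg = [], diff = [];
  for (let i = 0; i < n; i++) {
    avg.push((v[2 * i] + v[2 * i + 1]) / 2);
    diff.push((v[2 * i] - v[2 * i + 1]) / 2);
  }
  return avg.concat(diff);
}

function ihaar1d(v) {                // exact inverse of haar1d
  const n = v.length / 2, out = [];
  for (let i = 0; i < n; i++) out.push(v[i] + v[i + n], v[i] - v[i + n]);
  return out;
}

const transpose = m => m[0].map((_, x) => m.map(row => row[x]));

function haar2d(img) {               // transform the rows, then the columns
  return transpose(transpose(img.map(haar1d)).map(haar1d));
}

function ihaar2d(coeffs) {           // undo the columns, then the rows
  return transpose(transpose(coeffs).map(ihaar1d)).map(ihaar1d);
}

// Example 4x4 luma block. In the transformed matrix, the top-left 2x2 quadrant is the
// coarse (LL) approximation; the other three quadrants hold horizontal, vertical and
// diagonal detail. Each detail coefficient still corresponds to a 2x2 region of the
// original block, which is where a watermark bit could be embedded.
const y = [
  [52, 55, 61, 66],
  [70, 61, 64, 73],
  [63, 59, 55, 90],
  [67, 61, 68, 104],
];
const coeffs = haar2d(y);
console.log(coeffs);
console.log(ihaar2d(coeffs));        // reproduces the original block exactly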
