What I need is actually just a hint about where I can start.
I'm somewhat familiar with Mahout, at least theoretically. I know how it works, how to set it up, etc., and I could build a simple recommendation system based on collaborative filtering.
However, now I'm trying to do something more complex, and even after reading quite a bit about different algorithms, I'm not sure which direction to go.
Quickly what I want to do is:
The final goal is to compute one scalar (a "score") for each entity in a set, based on some "known" entities. The entities interact with each other, and the known scores influence and define the unknown ones. You can picture it with the following example.
I have a lot of white clothes and a few colorful pieces: red, blue, green... I put them into the washing machine. I want to know what colors the white ones will have after the wash.
Things to take into account:
we do a series of washes with different "actors"... some clothes are washed in the 1st and 3rd wash, some of them only in the 2nd, some of them are washed in all of them
in consecutive washes, the clothes that were white before but are now colored also influence the rest, just not as strongly (as they are not as colored)
some colors don't "color" as much as others. For example, red has a strong effect on most clothes, but green not so much
the coloring effect also depends on how many clothes are in one wash. If you wash a red shirt with a single white t-shirt, it gets much more colored than if there were 100 other white t-shirts
clothes don't "lose" their color when influencing others
You can see that during the calculation, each entity actually has 2 assigned scalars:
the color hue (this also defines the "coloring power" mentioned above). The hue can be represented as a number from 0 to 1, let's say. The relationship between coloring power and the hue value is not linear: the ends of the scale (0 and 1) have more coloring power, while the middle (0.5) has less
the color "lightness" (how much an entity is colored; for originally colored clothes it's 1, for white ones it's 0), which at the same time also defines coloring power regardless of the hue
So, again, what I know:
which clothes were washed in which consecutive wash
I know the original color of some of them; the rest are white in the beginning
What I want to know:
- the hue of all clothes at the end of the washes
The problem is that I don't know what type of algorithm I should start with. If you were kind enough to read this far, please suggest something (or further reading).
Obviously I'm not asking for anything detailed; again, only hints.
Thank you!
The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simulation. Each page has some influence (color) which flows via its links (the socks it's washed with), and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms, but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix.
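To illustrate that "iterate until steady state" idea, here is a minimal Python sketch with a made-up linear mixing model; the weights and the coloring-power formula are assumptions for illustration, not a real dye model or the PageRank algorithm itself:

```python
import numpy as np

# Toy sketch of the "iterate until steady state" idea with a made-up mixing model.
# state[i] = (hue, lightness) of garment i; washes = list of index lists.

def simulate(state, washes, transfer=0.3):
    state = state.astype(float).copy()
    for load in washes:
        hue, light = state[load, 0], state[load, 1]
        # coloring power: colored items (lightness near 1) at the hue extremes dominate
        power = light * (np.abs(hue - 0.5) * 2)
        if power.sum() == 0:
            continue
        load_hue = np.average(hue, weights=power + 1e-9)
        load_strength = power.mean()          # diluted by the size of the load
        # each garment drifts toward the load's dominant hue and picks up lightness
        state[load, 0] += transfer * load_strength * (load_hue - hue)
        state[load, 1] = np.clip(light + transfer * load_strength, 0, 1)
    return state

# garments: 0 = red shirt (hue 0.0, fully colored), 1..3 = white t-shirts
state = np.array([[0.0, 1.0], [0.5, 0.0], [0.5, 0.0], [0.5, 0.0]])
print(simulate(state, washes=[[0, 1, 2], [1, 2, 3]]))
```

Running it shows the second-hand coloring effect from the question: shirts colored in the first wash influence the third shirt in the second wash, but much more weakly.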
I don't know much about image processing, so please bear with me if this is not possible to implement.
I have several sets of aerial images of the same area originating from different sources. The pictures have been taken during different seasons, under different lighting conditions, etc. Unfortunately some images look patchy and suffer from discoloration, or are partially obstructed by clouds or pixelated, as for example picture1 and picture2.
I would like to take several images of the same area as input and (by somehow averaging them) produce one picture of improved quality. I know some C/C++, so I could use an image processing library.
Can anybody propose any image processing algorithm to achieve it or knows any research done in this field?
I would try a "color twist" transform, i.e. a 3x3 matrix applied to the RGB components. To implement it, you need to pick color samples in areas that are split by a border, on both sides. You should find three significantly different reference colors (hence six samples). This will allow you to write the nine linear equations needed to determine the matrix coefficients.
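A minimal sketch of that step, assuming three sample pairs (the numbers are invented): set up the linear system and solve for the matrix, e.g. with least squares:

```python
import numpy as np

# Solve for the 3x3 "color twist" matrix M such that M @ src_color ≈ dst_color,
# from three (or more) reference sample pairs. The sample values are made up.
src = np.array([[120,  80,  60],    # samples from the altered side
                [ 40, 150,  90],
                [200, 180, 160]], dtype=float)
dst = np.array([[140,  95,  70],    # matching samples from the good side
                [ 55, 170, 100],
                [220, 200, 175]], dtype=float)

# Solve src @ X = dst in the least-squares sense (exact with 3 pairs).
X, *_ = np.linalg.lstsq(src, dst, rcond=None)
M = X.T

def correct(pixels):
    """Apply the color twist to an (N, 3) array of RGB values."""
    return np.clip(pixels @ M.T, 0, 255)

print(np.round(correct(src)))   # should reproduce dst
```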
Then you will correct the altered areas by means of this color twist. As the geometry of these areas is intertwined with the field patches, I don't see a better way than contouring the regions by hand.
In the case of the second picture, the limits of the regions are blurred so that you will need to blur the region mask as well and perform blending.
In any case, don't expect a perfect repair of those problems as the transform might be nonlinear, and completely erasing the edges will be difficult. I also think that colors are so washed out at places that restoring them might create ugly artifacts.
For the sake of illustration, here is a quick attempt with Photoshop using a manual HLS adjustment (less powerful than a color twist).
The first thing I thought of was a kernel matrix of sorts.
Do a first pass over the photo and use an edge detection algorithm to determine the borders between the photos. This should be fairly trivial; however, you will need to eliminate any overlap/fading (it looks like there's a bit in picture 2) - you'll see why in a minute.
Do a second pass right along each border you've detected, and assume that the pixels on either side of the border should be the same color. Determine the difference between the red, green and blue values, average it along the entire length of the line, then divide it by two. The image with the lower red, green or blue value gets this new value added; the one with the higher value gets it subtracted.
On either side of this line, every pixel should now be exactly the same. You can remove one of these rows if you'd like, but if the lines don't run the length of the image this could cause size issues, and the line will likely not be very noticeable.
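A minimal numpy sketch of that second pass, assuming for simplicity that the border is a straight vertical seam (a real detected border would be an arbitrary path):

```python
import numpy as np

# Rough sketch of the per-channel equalisation along one detected border.
# Assumes `img` is an (H, W, 3) array and the border is the vertical seam
# between columns x-1 and x.
def equalize_seam(img, x):
    img = img.astype(float)
    left = img[:, x - 1, :]                  # pixels just left of the seam
    right = img[:, x, :]                     # pixels just right of the seam
    # average per-channel difference along the whole seam, split between the sides
    half_diff = (left - right).mean(axis=0) / 2.0
    img[:, :x, :] -= half_diff
    img[:, x:, :] += half_diff
    return np.clip(img, 0, 255)

img = np.random.randint(0, 255, (4, 6, 3)).astype(float)
img[:, 3:, :] += 20                          # simulate a brighter right half
print(equalize_seam(img, 3)[:, 2:4, 0])      # the seam columns now roughly match
```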
This could be made far more complicated by generating a filter by passing along this line - I'll leave that to you.
The issue with this could be areas where there was development, fall colors, etc.; this might mess with your algorithm, but there's only one way to find out!
I am trying to count the number of hairs transplanted in the following image. So practically, I have to count the number of spots I can find in the center of the image.
(I've uploaded the inverted image of a bald scalp on which new hairs have been transplanted because the original image is bloody and absolutely disgusting! To see the original non-inverted image click here. To see the larger version of the inverted image just click on it). Is there any known image processing algorithm to detect these spots? I've found out that the Circle Hough Transform algorithm can be used to find circles in an image, I'm not sure if it's the best algorithm that can be applied to find the small spots in the following image though.
P.S. According to one of the answers, I tried to extract the spots using ImageJ, but the outcome was not satisfactory enough:
I opened the original non-inverted image (Warning! it's bloody and disgusting to see!).
Split the channels (Image > Color > Split Channels) and selected the blue channel to continue with.
Applied Closing filter (Plugins > Fast Morphology > Morphological Filters) with these values: Operation: Closing, Element: Square, Radius: 2px
Applied White Top Hat filter (Plugins > Fast Morphology > Morphological Filters) with these values: Operation: White Top Hat, Element: Square, Radius: 17px
However, I don't know what exactly to do after this step to count the transplanted spots as accurately as possible. I tried to use (Process > Find Maxima), but the result does not seem accurate enough to me (with these settings: Noise tolerance: 10, Output: Single Points, Excluding Edge Maxima, Light Background):
As you can see, some white spots have been ignored, and some white areas which are not actually hair transplant spots have been marked.
What set of filters do you advise to accurately find the spots? Using ImageJ seems a good option since it provides most of the filters we need. Feel free, however, to advise what to do using other tools or libraries (like OpenCV), etc. Any help would be highly appreciated!
I do think you are trying to solve the problem in a somewhat wrong way. That might sound groundless, so I'd better show my results first.
Below I have a crop of your image on the left and the discovered transplants on the right. Green is used to highlight areas with more than one transplant.
The overall approach is very basic (I will describe it later), but it still provides close-to-accurate results. Please note, it was a first try, so there is a lot of room for enhancement.
Anyway, let's get back to the initial statement that your approach is wrong. There are several major issues:
the quality of your image is awful
you say you want to find spots, but actually you are looking for hair transplant objects
you completely ignore the fact that the average head is far from flat
it does look like you think filters will add some important details to your initial image
you expect algorithms to do magic for you
Let's review all these items one by one.
1. Image quality
It might be a very obvious statement, but before the actual processing you need to make sure you have the best possible initial data. You might spend weeks trying to find a way to process the photos you have without any significant achievements. Here are some problematic areas:
I bet it is hard for you to "read" those crops, despite the fact that you have the most advanced object recognition algorithms in your brain.
Also, your time is expensive, and you still need the best possible accuracy and stability. So, for any reasonable price, try to get: proper contrast, sharp edges, better colors and color separation.
2. Better understanding of the objects to be identified
Generally speaking, you have 3D objects to be identified, so you can analyze shadows in order to improve accuracy. BTW, it is almost like Mars surface analysis :)
3. The form of the head should not be ignored
Because of the shape of the head you have distortions. Again, in order to get proper accuracy, those distortions should be corrected before the actual analysis. Basically, you need to flatten the analyzed area.
3D model source
4. Filters might not help
Filters do not add information, but they can easily remove some important details. You've mentioned the Hough transform, so here is an interesting question: Find lines in shape
I will use this question as an example. Basically, you need to extract geometry from a given picture. The lines in the shape look a bit complex, so you might decide to use skeletonization
All of a sudden, you have more complex geometry to deal with and virtually no chance of understanding what was actually in the original picture.
5. Sorry, no magic here
Please be aware of the following:
You must try to get better data in order to achieve better accuracy and stability. The model itself is also very important.
Results explained
As I said, my approach is very simple: the image was posterized and then I used a very basic algorithm to identify areas of a specific color.
Posterization can be done in a cleverer way, area detection can be improved, etc. For this PoC I just used a simple rule to highlight areas with more than one implant. Once areas are identified, a bit more advanced analysis can be performed.
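This is not the author's code, just a rough illustration of the posterize-then-count idea in Python, using connected-component labelling rather than the scan-line approach described in the update below; the file name, number of levels and the "significantly bigger" rule are assumptions:

```python
import numpy as np
from skimage import io, measure

# Rough sketch: posterize, keep the darkest level, count connected areas,
# and flag areas that are much bigger than the average as multiple transplants.
img = io.imread("scalp.png", as_gray=True)            # hypothetical file name
levels = 4
posterized = np.floor(img * levels) / levels          # crude posterization
mask = posterized <= posterized.min()                 # darkest level only

labels = measure.label(mask)                          # connected areas
areas = [r.area for r in measure.regionprops(labels)]
mean_area = np.mean(areas)
multi = sum(a > 1.5 * mean_area for a in areas)       # "significantly bigger" areas
print(f"{len(areas)} areas, of which {multi} look like multiple transplants")
```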
Anyway, better image quality will let you use even a simple method and get proper results.
Finally
How did the clinic manage to get Yondu as a client? :)
Update (tools and techniques)
Posterization - GIMP (default settings,min colors)
Transplant identification and visualization - Java program, no libraries or other dependencies
Having identified the areas, it is easy to find the average size, then compare it to the other areas and mark significantly bigger ones as multiple transplants.
Basically, everything is done "by hand": a horizontal and a vertical scan, whose intersections give the areas. Vertical lines are sorted and used to restore the actual shape. The solution is homegrown and the code is a bit ugly, so I do not want to share it, sorry.
The idea is pretty obvious and well explained (at least I think so). Here is an additional example with a different scan step used:
Yet another update
A small piece of code, developed to verify a very basic idea, has evolved a bit, so now it can handle 4K video segmentation in real time. The idea is the same: horizontal and vertical scans, areas defined by intersecting lines, etc. Still no external libraries, just a lot of fun and slightly more optimized code.
Additional examples can be found on YouTube: RobotsCanSee
or follow the progress in Telegram: RobotsCanSee
I've just tested this solution using ImageJ, and it gave good preliminary result:
On the original image, for each channel:
Small closing (radius 1 or 2) in order to get rid of the hairs (the black part in the middle of the white one)
White top-hat of radius 5 in order to detect the white part around each black hair.
Small closing/opening in order to clean up the image a little (you can also use a median filter)
Ultimate erode in order to count the number of white blobs remaining. You can also certainly use a LoG (Laplacian of Gaussian) or a distance map. (See the sketch after this list.)
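Not exactly the ImageJ plugins above, but a rough scikit-image sketch of the same kind of pipeline, ending with a local-maxima count instead of an ultimate erode; the file name, element sizes and thresholds are assumptions:

```python
import numpy as np
from skimage import io, morphology, feature

# Closing, white top-hat, small clean-up, then one peak per remaining blob.
img = io.imread("scalp.png")                          # hypothetical file name
channel = img[..., 2].astype(float)                   # e.g. the blue channel

closed = morphology.closing(channel, morphology.square(3))       # remove thin hairs
tophat = morphology.white_tophat(closed, morphology.square(11))  # keep bright blobs
cleaned = morphology.opening(tophat, morphology.square(3))       # small clean-up

# One local maximum per blob, with a minimum separation between peaks
peaks = feature.peak_local_max(cleaned, min_distance=5,
                               threshold_abs=0.1 * cleaned.max())
print("estimated number of spots:", len(peaks))
```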
[EDIT]
You don't detect all the white spots using the maxima function because, after the closing, some zones are flat, so the maximum is not a point but a zone. At this point, I think that an ultimate opening or an ultimate erode would give you the center of each white spot. But I am not sure there is a function/plugin doing it in ImageJ. You can take a look at Mamba or SMIL.
An H-maxima (after the white top-hat) may also clean up your results a little more and improve the contrast between the white spots.
As Renat mentioned, you should not expect algorithms to do magic for you; however, I'm hopeful you can come up with a reasonable estimate of the number of spots. Here, I'm going to give you some hints and resources; check them out and get back to me if you need more information.
First, I'm somewhat hopeful about morphological operations, but I think a good pre-processing step may push the accuracy they yield dramatically. I want to put my finger on the pre-processing step. Thus, I'm going to work with this image:
That's the idea:
Collect and concentrate the mass around the spot locations. What do I mean by concentrating the masses? Let's open the book from the other side: as you see, the provided image contains some salient spots surrounded by noisy gray-level dots.
By dots, I mean the pixels that are not part of a spot but whose gray values are larger than zero (pure black) - these are scattered around the spots. It is clear that if you clear these noisy dots, you will surely come up with a good estimate of the spots using other processing tools such as morphological operations.
Now, how to make the image sharper? What if we could make the dots move toward their nearest spots? This is what I mean by concentrating the masses over the spots. Doing so, only the prominent spots will remain in the image, and hence we will have made a significant step toward counting them.
How to do the concentrating thing? Well, the idea I just explained is presented in this paper, whose code is luckily available. See section 2.2. The main idea is to use a random walker that walks on the image forever. The formulation is such that the walker will visit the prominent spots far more often, which leads to identifying them. The algorithm is modeled as a Markov chain, and the equilibrium hitting times of the ergodic Markov chain hold the key to identifying the most salient spots.
What I described above is just a hint; you should read that short paper to get the detailed version of the idea. Let me know if you need more info or resources.
It is a pleasure to think about such interesting problems. Hope it helps.
You could do the following:
Threshold the image using cv::threshold
Find connected components using cv::findContours
Reject connected components larger than a certain size, as you seem to be concerned with small circular regions only.
Count all the valid connected components.
Hopefully, this gives you a decent approximation of the actual number of spots.
To be statistically more accurate, you could repeat 1-4 for a range of thresholds and take the average.
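A minimal Python/OpenCV sketch of those steps, sweeping a few thresholds and averaging; the file name, threshold range and size limit are assumptions:

```python
import cv2
import numpy as np

# Steps 1-4 swept over several thresholds (Python bindings rather than the
# C++ API named above; OpenCV 4 return signature for findContours).
img = cv2.imread("spots.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
MAX_AREA = 200                                         # reject anything bigger

counts = []
for t in range(80, 181, 20):
    _, binary = cv2.threshold(img, t, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    valid = [c for c in contours if cv2.contourArea(c) <= MAX_AREA]
    counts.append(len(valid))

print("estimated number of spots:", int(np.mean(counts)))
```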
This is what you get after applying an unsharp mask with radius 22, amount 5, threshold 2 to your image.
This increases the contrast between the dots and the surrounding areas. I used the ballpark assumption that the dots are somewhere between 18 and 25 pixels in diameter.
Now you can take a local maximum of white as a "dot" and fill it in with a black circle until the circular neighborhood of the dot (a circle of radius 10-12) erases the dot. This should let you "pick off" dots joined to each other in clusters of more than 2. Then look for local maxima again. Rinse and repeat.
The actual "dot" areas are in stark contrast to the surrounding areas, so this should let you pick them off as well as you would by eyeballing it.
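A rough Python/OpenCV sketch of that pick-off loop; the radius, stopping rule and file name are assumptions:

```python
import cv2
import numpy as np

# "Find the brightest pixel, stamp a black disc over it, repeat" loop.
img = cv2.imread("unsharpened.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
RADIUS = 11
STOP = 0.5 * img.max()   # stop when the brightest remaining pixel looks like background

dots = []
work = img.copy()
while True:
    _, max_val, _, max_loc = cv2.minMaxLoc(work)
    if max_val < STOP:
        break
    dots.append(max_loc)
    cv2.circle(work, max_loc, RADIUS, 0, thickness=-1)   # erase the picked dot

print("picked", len(dots), "dots")
```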
I need some advice about how to improve the visualization of cartographic information.
Users can select different species and the web-mapping app shows their geographical distribution (polygonal degree cells), each species with a range of one color (e.g. a darker orange where we have more records, a lighter orange where we have fewer).
The problem is when more than one species overlaps. What I am currently doing is just calculating the additive color mix of the two colors using http://www.xarg.org/project/jquery-color-plugin-xcolor/
As you can see in the image below, the resulting color where two species overlap (mixed blue and yellow) is not intuitive at all.
Does anyone have an idea, or know of similar tools to get inspiration from? For creating the polygons I use d3.js, so if more complex SVG features need to be created, I can give it a try.
Some ideas I had are...
1) The more data in a polygon, the thicker the border (or each part of the border with its corresponding color)
2) Add a label at the center of the polygon saying how many species overlap.
3) Divide the polygon into different parts, each one with the corresponding species color.
thanks in advance,
Pere
My suggestion is something along the lines of option #3 that you listed, with a twist. Rather than painting the entire cell with species colors, place a dot in each cell, one for each species. You can vary the color of each dot in the same way you currently do: darker for more, lighter for less. This doesn't require you to blend colors, and it will expose more of your map to provide more context for the data. I'd try this approach with and without the cell border, and see which one works best.
Your visualization might also benefit from some interactivity. A tooltip providing more detailed information, and perhaps a further breakdown of the information, could be displayed when the user hovers the mouse over each cell.
All of this is very subjective. However, one thing's for sure: when you're dealing with multi-dimensional data as you are, the less you project dimensions down onto the same visual/perceptual axis, the better. I've seen some examples of "4-dimensional heatmaps" succeed in doing this (here's an example of visualizing latency on a heatmap, identifying different sources with different colors), but I don't think any attempt is made to combine colors.
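A toy sketch of the dots-per-cell idea; the cell layout, species values and colors are invented, and the real app would render this with d3.js rather than matplotlib:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# One dot per species per cell; darker dot = more records, lighter = fewer.
cells = {(0, 0): {"A": 0.9, "B": 0.3}, (1, 0): {"A": 0.5}, (0, 1): {"B": 0.8, "C": 0.4}}
base = {"A": (1.0, 0.5, 0.0), "B": (0.0, 0.4, 1.0), "C": (0.0, 0.7, 0.2)}  # species colors

fig, ax = plt.subplots()
for (cx, cy), species in cells.items():
    ax.add_patch(patches.Rectangle((cx, cy), 1, 1, fill=False, edgecolor="gray"))
    for i, (name, value) in enumerate(sorted(species.items())):
        r, g, b = base[name]
        shade = tuple(1 - value * (1 - c) for c in (r, g, b))  # lighter for low values
        ax.plot(cx + 0.25 + 0.25 * i, cy + 0.5, "o", markersize=12, color=shade)
ax.set_xlim(-0.5, 2.5); ax.set_ylim(-0.5, 2.5); ax.set_aspect("equal")
plt.show()
```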
My initial thoughts about what you are trying to create (a customized variant of a heat map for a slightly crowded data set, I believe):
One strategy is to employ a formula suggested for n + 1 with regard to breaks in bin spacing. This causes me some concern regarding how many outliers your set has.
Equally-spaced breaks are ideal for compact data sets without outliers. In many real data sets, especially proteomics data sets, outliers can make this representation less effective.
One suggestion I have would be to consider adding some filters to your categories if you have not yet. This would allow slimming down the rendered data for faster reading by the user.
Another solution would be to use something like (Comprehensive) R, or maybe even DanteR.
Tutorial in displaying mass spectrometry-based proteomic data using heat maps
(Particularly worth noting, I felt, was the 'Color mapping' section.)
There are many tools online that take images and simulate what that image might look like to someone with color blindness. However, I can't find any descriptions of these algorithms.
Is there a standard algorithm used to simulate color blindness? I'm aware that there are many types of color blindness (see the Wikipedia page on the subject for more details), but I'm primarily interested in algorithms for simulating dichromacy.
I had the same frustration and wrote an article comparing open-source color blindness simulations. In short, there are four main algorithms:
Coblis and the "HCIRN Color Blind Simulation function". You'll find this one in many places, including a JavaScript implementation by MaPePeR. The full HCIRN simulation function was not properly evaluated but is reasonable in practice. However, the "ColorMatrix" approximation by colorjack is very inaccurate and should be avoided entirely (the author himself said so). Unfortunately it's still widespread, as it was easy to copy/paste.
"Computerized simulation of color appearance for dichromats" by Brettel, Viénot, and Mollon (1997). A very solid reference. Works for all kinds of dichromacies. I wrote a public domain C implementation in libDaltonLens.
"Digital video colourmaps for checking the legibility of displays by dichromats" by Viénot, Brettel and Mollon (1999). A solid reference too, simplifies the 1997 paper for protanopia and deuteranopia (2 of the 3 kinds of color blindness). Also in libDaltonLens.
"A Physiologically-based Model for Simulation of Color Vision Deficiency" by Machado et al. (2009). Precomputed matrices are available on their website, which makes it easy to implement yourself. You just need to add the conversion from sRGB to linearRGB.
Looks like your answer is in the Wikipedia entry you linked.
For example:
Protanopia (1% of males): Lacking the long-wavelength sensitive retinal cones, those with this condition are unable to distinguish between colors in the green–yellow–red section of the spectrum. They have a neutral point at a greenish wavelength around 492 nm – that is, they cannot discriminate light of this wavelength from white.
So you need to de-saturate any colors in the green-yellow-red spectrum to white.
Image color saturation
The other 2 types of dichromacy can be handled similarly.
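A very crude sketch of that de-saturation idea using Python's standard-library colorsys module; the hue band and the full de-saturation are guesses, and this is nowhere near a physiologically accurate simulation like the ones in the other answer:

```python
import colorsys

# Naive approximation: pull colors whose hue falls in the red-yellow-green band
# toward their neutral grey. Not a faithful dichromacy simulation.
def naive_protanopia(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # hue 0.0-0.33 roughly covers red through yellow to green
    if h <= 0.33 or h >= 0.95:
        s = 0.0                      # collapse the color toward grey
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return round(r2 * 255), round(g2 * 255), round(b2 * 255)

print(naive_protanopia(220, 40, 40))   # a red becomes a grey of similar brightness
```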
First we have to understand how the eye works:
A regular/healthy eye has 3 types of cones and 1 type of rods that have activation functions over the visible spectrum of light.
Their activations then pass through some function to produce the signal that goes to your brain. Roughly speaking, the function takes 4 channels as input and produces 3 channels as output (namely lightness, yellow-blue and red-green).
A colorblind person would have one of those two things be different (AFAIK usually/always the first), so for example the person would be missing one type of cone, or the cone's activation function would be different.
The best thing to do would be:
Convert all pixels from RGB space to a combination of frequencies (with intensities). To do this, first calculate the activations of each of the three cones (of a healthy person), then find a "natural" solution for a set of frequencies (plus intensities) that would result in the same activations. Of course, one solution is just the original three RGB frequencies with their intensities, but it is unlikely that the original image actually had that. A natural solution would be, for example, a normal distribution around some frequency (or even just one frequency).
Then, (again for each pixel) calculate the activations of a colorblind person's cones to your combination of frequencies.
Finally, find an RGB value such that a healthy person would have the same activations as the ones the colorblind person has.
Note that, if the way these activations are combined is also different for the relevant type of colorblindness, you might want to carry that out as well in the above steps. (So instead of matching activations, you are matching the result of the function over the activations).
I'm working on a site where users can describe a physical object using (amongst many other things) any color in the RGB 0-255 range. We offer some simplified palettes for easy clicking, but a full color wheel is a requirement.
Behind the scenes, one of the processes compares two user descriptions of the object and scores them for similarity.
What I'm trying to do is get a score for how similar the two colors are in terms of human perception. Basically, the algorithm needs to determine whether two humans picking two different colors could be describing the same object. Thus light red -> red should be 100%, most shades of grey will be 100% to each other, etc., but red -> green is definitely not a match.
To get a decent look at how the algorithms were working, I plotted grayscale and 3 intensities of each hue against every other color in the set and indicated no match (0%) with black, visually identical (100%) with white and grayscale to indicate the intermediate values.
My first (very simplistic) approach was to simply treat the RGB values as co-ordinates in the colour cube and work out the distance (magnitude of the vector) between them.
This threw up a number of problems, for example Black -> 50% Grey being a larger distance than (say) Black -> 50% Blue. Having run hundreds of comparisons and asked for feedback, this doesn't seem to match human perception (shown below).
Method 2 converted the RGB values to HSV. I then generated a score based 80% on hue and the other 20% on Sat/Lum. This seems to be the best method so far, but it still throws some odd matches.
Method 3 was an attempt at a hybrid: HSL values were calculated, but the final score was based on the distance between the 2 colors in the HSL color cylinder space (as in 3D polar co-ordinates).
I feel like I must be re-inventing the wheel - surely this has been done before? I can't find any decent examples on Google, and as you can see, my approach leaves something to be desired.
So, my question is:
Is there a standard way to do this? If so, how? If not, can anyone suggest a way to improve my approach? I can provide code snippets if required, but be warned it's currently messy as hell due to 3 days of tweaking.
Solution (Delta E 2000):
Using the suggestions provided below, I've implemented a Delta E 2000 comparer. I've had to tweak the weighting values to be quite large - I'm not only looking for colors which are imperceptibly different, but for any which are not hugely different. In case anyone's interested, the resulting plot is below...
There are a half dozen or so possibilities. EasyRGB has a page devoted to them. Of those listed, Delta E 2000 probably has the best correlation with human perception - and is also extremely complex to compute. Delta CMC is almost as good for something like half the code (though the computation still isn't entirely trivial).
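As an illustration, a minimal sketch using scikit-image's CIEDE2000 implementation, one possible off-the-shelf route rather than hand-rolling the formula; the sample colors are arbitrary:

```python
import numpy as np
from skimage import color

def delta_e_2000(rgb1, rgb2):
    """rgb1, rgb2: 8-bit sRGB triples. Returns the CIEDE2000 difference."""
    lab1 = color.rgb2lab(np.array([[rgb1]], dtype=float) / 255.0)
    lab2 = color.rgb2lab(np.array([[rgb2]], dtype=float) / 255.0)
    return float(color.deltaE_ciede2000(lab1, lab2)[0, 0])

print(delta_e_2000((255, 0, 0), (255, 80, 80)))   # light red vs red: small difference
print(delta_e_2000((255, 0, 0), (0, 128, 0)))     # red vs green: large difference
```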
I'm not 100% clear on how your problem is set up, but you may want to read up on: Normalized Cross Correlation, and Lab and CIEXYZ color spaces.
This sounds like a prime example for a neural-net-based approach (if you are in an experimenting mood :) because it's about creating a decision rule that mimics human perception. A neural net that has six inputs (r, r', g, g', b, b') and one output (is_similar) can easily be trained by using, e.g., your own perception of similarity as the training source!
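A toy sketch of that setup with scikit-learn; the training pairs are invented, and a real training set would come from your own human judgments:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Six inputs (r, r', g, g', b, b'), one output (is_similar).
X = np.array([
    [255, 230,   0,  40,   0,  40],   # red vs light red   -> similar
    [128, 140, 128, 120, 128, 135],   # grey vs grey       -> similar
    [255,   0,   0, 128,   0,   0],   # red vs green       -> not similar
    [  0, 255,   0, 255, 255,   0],   # blue vs yellow     -> not similar
]) / 255.0
y = np.array([1, 1, 0, 0])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(np.array([[250, 240, 10, 30, 10, 30]]) / 255.0))  # likely "similar"
```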