This is no actual single coding problem, rather a problem of the right approach to a complex issue.
So, I have built a rather complex svg visualisation of my XML data using xslt. It looks like this:
(source: erksst.de)
This is just a small sample of the whole data. There are two or three rows. Each row could contain up to 160 yellow boxes.
(The yellow boxes are letter collections, the blue/grey boxes single letters, the lines represent their way of dissemination.)
It works well so far but I want to optimize it:
(1) minimize the number of line crossing
(2) minimize the number of lines crossing a blue/grey box
(3) minimize the lines being too near to another line.
To achieve this there are things to vary:
(a) The broadest row (in the sample it is the third) is fix. It can't be moved. But the other (two) can be moved in the range of the width of the broadest row. I.e. in my example the yellow box of the second row could be moved some 160 pixels to the right.
(b) Furthermore, in the two smaller rows the margin between the yellow boxes could be varied. In my example there is just one per line. But of course there could be more than one yellow box in the two smaller rows.
(c) The order of the yellow boxes within a row could be altered.
So, many possibilites to realize this visualisation.
The problem is the performance time.
I have started with the line crossing problem by using a function which kind of pre-builds the visualisation and calculates the number of crossings.
The variation with the smallest number of crossings is actually built in the output.
The problem is the time it needs. The transformation with just 100 possibilites and my hole XML data took 90 seconds. Doesn't sound like much, but taking into account that 100 variations are just a very small part of all theoritically possible options and that the visualition should at some point in the future build on the fly on server for a user's selection of the data 90 seconds simply is way too much.
I have allready reduced the visualition template for the calculate line crossings functions to all what is necessary leaving asside all captions and so on. That did help, but not as much as expected.
The lines are drawn as follows: First, all boxes are drawn keeping their id from the original data. Then I go back to my data, look where connections are and build the lines.
You could transform your XML into the DOT language (plain TXT format) by XSLT and process it by GraphViz. I solved some similar issue (although not so huge as yours seems to be) this way.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Detecting thin lines in blurry image
So as the title says, I am trying to detect boundaries of patterns. In the images attached, you can basically see three different patterns.
Close stripe lines
One thick L shaped line
The area between 1 & 2
I am trying to separate these three, in say 3 separate images. Depend on where the answers go, I will upload more images if needed. Both idea or code will be helpful.
You can solve (for some values of "solve") this problem using morphology. First, to make the image more uniform, remove irrelevant minima. One way to do this is using the h-dome transform for regional minima, which suppresses minima of height < h. Now, we want to join the thin lines. That is accomplished by a morphological opening with a horizontal line of length l. If the lines were merged, then the regional minima of the current image is the background. So we can fill holes to obtain the relevant components. The following code summarizes these tasks:
f = rgb2gray(imread('http://i.stack.imgur.com/02X9Z.jpg'));
hm = imhmin(f, h);
o = imopen(hm, strel('line', l, 0));
result = imfill(~imregionalmin(o), 'holes');
Now, you need to determine h and l. The parameter h is expected to be easier since it is not related to the scale of the input, and in your example, values in the range [10, 30] work fine. To determine l maybe a granulometry analysis could help. Another way is to check if the result contains two significant connected components, corresponding to the bigger L shape and the region of the thin lines. There is no need to increase l one by one, you could perform something that resembles a binary search.
Here are the hm, o and result images with h = 30 and l = 15 (l in [13, 19] works equally good here). This approach gives flexibility on parameter choosing, making it easier to pick/find good values.
To calculate the area in the space between the two largest components, we could merge them and simply count the black pixels inside the new connected component.
You can pass a window (10x10 pixels?) and collect features for that window. The features could be something as simple as the cumulative gradients (edges) within that window. This would distinguish the various areas as long as the window is big enough.
Then using each window as a data point, you can do some clustering, or if the patterns don't vary that much you can do some simple thresholds to determine which data points belong to which patterns (the larger gradient sums belong to the small lines: more edges, while the smallest gradient sums belong to the thickest lines: only one edge, and those in between belong to the other "in-between" pattern .
Once you have this classification, you can create separate images if need be.
Just throwing out ideas. You can binarize the image and do connected component labelling. Then perform some analysis on the connected components such as width to discriminate between the regions.
I need some advice about how to improve the visualization of cartographic information.
User can select different species and the webmapping app shows its geographical distribution (polygonal degree cells), each specie with a range of color (e.g darker orange color where we find more info, lighter orange where less info).
The problem is when more than one specie overlaps. What I am currently doing is just to calculate the additive color mix of two colors using http://www.xarg.org/project/jquery-color-plugin-xcolor/
As you can see in the image below, the resulting color where two species overlap (mixed blue and yellow) is not intuitive at all.
Someone has any idea or knows similar tools where to get inspiration? for creating the polygons I use d3.js, so if more complex SVG features have to be created I can give a try.
Some ideas I had are...
1) The more data on a polygon, the thicker the border (or each part of the border with its corresponding color)
2) add a label at the center of polygon saying how many species overlap.
3) Divide polygon in different parts, each one with corresponding species color.
thanks in advance,
Pere
My suggestion is something along the lines of option #3 that you listed, with a twist. Rather painting the entire cell with species colors, place a dot in each cell, one for each species. You can vary the color of each dot in the same way that you currently are: darker for more, ligher for less. This doesn't require you to blend colors, and it will expose more of your map to provide more context to the data. I'd try this approach with the border of the cell and without, and see which one works best.
Your visualization might also benefit from some interactivity. A tooltip providing more detailed information and perhaps a further breakdown of information could be displayed when the user hovers his mouse over each cell.
All of this is very subjective. However one thing's for sure: when you're dealing with multi-dimensional data as you are, the less you project dimensions down onto the same visual/perceptual axis, the better. I've seen some examples of "4-dimensional heatmaps" succeed in doing this (here's an example of visualizing latency on a heatmap, identifying different sources with different colors), but I don't think any attempt's made to combine colors.
My initial thoughts about what you are trying to create (a customized variant of a heat map for a slightly crowded data set, I believe:
One strategy is to employ a formula suggested for
n + 1
with regards to breaks in bin spacing. This causes me some concern regarding how many outliers your set has.
Equally-spaced breaks are ideal for compact data sets without
outliers. In many real data sets, especially proteomics data sets,
outliers can make this representation less effective.
One suggestion I have would be to consider the idea of adding some filters to your categories if you have not yet. This would allow slimming down the rendered data for faster reading by the user.
another solution would be to use something like (Comprehensive) R
or maybe even DanteR
Tutorial in displaying mass spectrometry-based proteomic data using heat maps
(Particularly worth noting I felt, was 'Color mapping'.)
Very casual javscript user here. Hopefully this makes sense.
I am tracking 20 different series on a stacked area chart with nvd3.js. These series fluctuate between positive and negative values, some on a huge base and some on a small base. The result is that - when one of the really big series is below the x axis - it pushes everything else underneath too, and the positive series won't appear above the x axis until you filter out the bigger players using the key.
The technically inelegant but good looking solution I have come up with is to split all of my negative values into one array, and all of my positives into another. The top half of the page is a positive values graph, the bottom half is negative values and they line up pretty nicely.
The weakness with this approach is when you go to interact with it as an end user. If I filter out a series (by unchecking it in the key) or change the graph mode (with the type selector) or zoom in on a series (by clicking it so the graph refocuses to that series only) then it will only affect whichever graph you clicked on. I would like to adjust these three click events (and any others I've missed?) so that your action is synchronised across both graphs.
Is this achievable? Any reading material I can dig through where somebody has done something similar? I imagine linking two representations of one data set (e.g a pie and column graph) is vaguely analogous.
Let's say I have an encrypted file on an iPhone and every time I want to decrypt it, I want to "draw" a decryption symbol instead of having to use a keyboard to type it in.
If you request from the user to draw a symbol to decrypt a file every time it is needed (e.g. every time they launch your application) they would probably prefer it to having to type a 20 character or so password on the tiny keyboard, and they would still get the security a 20 character password would give them (depending on how complicated the shape/symbol they draw is).
The symbol they would draw would most likely be one stroke (e.g. it's over once you lift up your finger) but can be very complex, such that it's hard for someone else to repeat it, even if they do see you draw it in. Kind of like how every person's signature is unique and hard to duplicate. Actually, this may just overly complicate it if it had to prevent from being duplicated, so for now this can be ignored and we can assume that the symbol will not be seen by someone else and thus it doesn't matter if it could be repeated by them or not.
I guess the real question is how would you convert the same (reasonably) stroke consistently to the same key (e.g. hash value). There should obviously be some threshold of forgiveness in the algorithm, because the user can't be expected to repeat the stroke exactly 100%.
Using the symbol as a decryption method adds a whole other dimension to this problem. You never want to store the generated hash value anywhere in unencrypted form, cause then someone might be able to access that part of the hard drive and get the decryption key without needing to go through the whole drawing process and decrypt the file manually. You also most likely don't want to store anything about how the shape is drawn.
A good example of a stroke that a user might use as their decryption symbol is the "&" symbol. Imagine a user drawing this symbol on their iPhone every time they need to decrypt a file. The size of the symbol might not be the same every time it is drawn. Also, the rotation of the symbol may be different depending on how the user holds their device. Ideally, in both cases, because the symbol was drawn, relative to the user strokes, the same, it should be able to generate the same hash value and thus decrypt the file.
I thought something like shape or character recognition is a similar algorithm. Where the user draws something (reasonably representing a shape) and it then fixes it to the correct shape which would have the same hash value every time it is drawn. However, for something like this you would most likely need a database of shapes that can be drawn, and if you choose something like all the letters in the alphabet, you only get 26 letters. And assuming the user should only need to draw one symbol to decrypt the file, you have an extremely insecure password with only 26 possibilities.
Another thing I thought of is you could break up the symbol that is drawn into tiny segments and then run symbol recognition on those. So imagine that you have 4 symbols in a database: vertical line, horizontal line, and diagonal in both directions. Now as the user draws, each segment is recognized as one of these, and then they are all combined to form some hash value. So imagine the user chose as their decryption symbol the lowercase letter "r". So they would begin by drawing a vertical line down, followed by a vertical line up, followed by a diagonal line up and to the right. One problem with this method is how would you know when to split up the stroke into individual segments? You would probably also want to take into account how long each individual segment is roughly (e.g. in increments of 40 pixels). That way if someone drew a deformed "r" where the hump comes out near the bottom it isn't recognized as the same symbol and thus wouldn't decrypt the file.
A third method might be dividing the screen up into a grid (not sure which size yet) and simply seeing in which cells the stroke is drawn and using this data somehow to generate a string.
Any other ideas of how this could be implemented? Have you ever heard of something like this? Are there any fundamental flaws that would prevent a system like this from working?
Thanks
I would try a variation of the segmentation variant: Recognize simple patterns - I'll stick to straight and diagonal lines for this, but in theory you could also add circles, arcs and perhaps other things.
You can be quite sure when one line ends and another one starts as there are 8 directions and you can detect a direction change (or for a simpler approach, just detect pen up and pen down and use them as line delimiters). The first line gives a scale factor, so the length of every other line can be represented as a factor (for example, in an usual L shape, the first vertical line would give the "base length" b and the other line would then have the length of roughly 0.5 * b). After the user is finished, you can use the smallest factor s to "round" the lengths, so that you'll have an array of integer lengths like [1 * s, 2 * s, 4 * s, 5 * s]. This will prevent the system from being too exact, and using the base length makes the system robust against scaling.
Now somehow convert these informations (lengths and directions) to a string (or a hash value, whatever you like) and it will be the same for the same strokes, even if the symbol is translated or scaled.
Additionally, you can store an 2D offset value (of course "rounded", too) for every line after the second line so that the lines will also have to be at the same position, if you don't do this, L and T will most likely get the same string (1 line up-down, 1 line left-right length 0.5). So storing positions strengthens the whole thing a bit but is optional.
EDIT:
If you take the angle of the first line as a base angle, you can even make this robust to rotation.
Please note that this algorithm only gives 3 bits per stroke if all lines are of the same length and a maximum of perhaps up to 6-8 bits per stroke, some more if you store positions, too. This means you'd need a quite complex symbol of about 20-40 strokes to get 128 bits of security.
An easy way to add more variation/security would be to let the user use different colors from a given palette.
To reduce the risk of someone watching you, you could make each line disappear after it has been drawn or change the color to a color with a very low contrast to the background.
The problem of encrypting data with keymaterial that may have small errors has been studied quite extensively.
In particular there are a number of proposals for protecting data using biometric data (e.g. fingerprints or a retina scan) as a key. A typical approach is to use an appropriate error correction code, take your original key material K, compute the syndrome of it and only store the syndrome. Once you get a second reading of your key material K', the syndrome can be use to restore K from K' if K and K' are close enough (where 'close enough" of course depends on the error correction scheme.)
To get you started, here is a paper proposing a fuzzy vault scheme. This is a general proposal for an encryption scheme using a "fuzzy" key. Of course, you still need to examine how to extract characteristics from drawings that are stable enough for using such an error correction scheme. You will also have to examine how much entropy you can extract from such drawings. As bad as passwords are with respect to entropy, they may still be hard to beat.
Handwriting recognition often takes the duration of the stroke into account more than the actual length and such.
While it relates to pressure sensitivity, I think you may be able to see some similar conceptual bits to what you are thinking here.... jdadesign.net/safelock/
That's not exactly the same topic, but it's the closest thing that comes to mind at the moment.
I don't think that you could get enough "bits" from a hand-drawn symbol to perform secure encryption. As you note, you have to allow enough slop in the recognition that the natural variations in drawing will be tolerated. In other words, you have to discard noise in the strokes, smoothing them into reproducible signal. But noise (high entropy) makes better cryptographic keys.
Think of it this way. If you did decompose the gesture into segments of up, down, left, and right, each segment would represent 2 bits of information. For an AES key, the symbol would need 64 such segments. That's a pretty complicated gesture to remember. And if it's simplified by repeating many segments in a row ("right, right, right, ...") it makes a lousy (predictable, non-random) key.
I've had another think about this. I'm not a comp-sci person, but would something like this work.
Let's say that with whatever symbol or "pattern" someone draws. The only viable thing you're left with to analyze are all the points in the pattern generated in touchBegan, touchMoved and touchEnded events.
So... let's take all the points generated, be it 100 or 1,000,000, it doesn't really matter.
Divide them into groups, as many groups as you want. The more the merrier I assume, but for this example, let's put them in 4 groups. With 100 points, group 1 would contain points 1 > 25, group 2 contains 26 > 50 and so on.
For each group, use all the points to calculate an average position.
It may work better if the canvas spaces is divided into a grid, and the 'average positions' get plotted onto their nearest coordinate.
Then check the relative distance between all the groups. So the between 1,2 1,3 1,4 2,3 2,4 3,4.
You now have as many distinct points, and information about those points to generate a key. The averages and the grid should help smooth out some, if not all of the entropy.
You may have to ask the user to draw their pattern a few times, and compare each group against groups from previous attempts. That way, you can identify which groups the users can plot consistently. It has the added benefit of training the users hand at drawing their pattern.
I suspect that the more points and groups you have, the more accurate this will be.
In fact, I'm going to give it a try myself.
Gestures.
http://depts.washington.edu/aimgroup/proj/dollar/
You can define you own algorithms for particular gestures. EG a circle,
1.Find the start point
2. find the most left, most right and furthest for points and get an approximate radius.
3. check all the points against the radius with a margin of error (25%?)
4. If the radius checks out, you have a circle.
Vertical Straight line:
1. Check the start point and end point X and Y positions.
2. Compare the inbetween points against the x and y of the start and end.
3. If they're roughly on the same X coord, but ascending or descending Y coords, you have a vertical line.
And so on, getting more complex for more complicated gestures.
You can even combine gestures. So let's say you have an algorithm for 6 gestures. You can combine them to form different symbols. The order in which the gestures are created could be important, adding an extra layer of security.
what if you took all of the x,y coordinates of the stroke and preformed some sort of linear 2 way operation on them? You could then compute an 'approximate' hash, and if the number calculated when the stroke is within... say 10% of your approximation, then you grant access..
It all depends on what kind of attack you are trying to prevent against. If you want full on encryption, where you assume that the attacker has full access to the encrypted file, then you'll need quite a lot of bits of entropy to achieve a decent level of protection. Assuming you get the algorithms right, then you can take the two to the power of the entropy of input in bits (the upper limit for this is the number of different possible inputs), multiply by the amount of time the key setup procedure takes, divide by how much more computing power the attacker has and get the time the attacker has to take to break your encryption by brute force.
For example, something like the 9 cell figure unlock method of android might get you around 16 bits of entropy. Lets assume you use 5 seconds of CPU time to calculate the encryption key. Then with an average PC, it takes 5*2**16/20 seconds, or about 4.5 hours to crack. Any loss of entropy in the input or inefficiency in the key-setup and encryption will quickly take that down to minutes, not to mention if clusters of computers are used.
To be frank, that won't be much better than just storing the file in an obscure file format and hoping no one figures it out