Preventing generation of swastika-like images when generating identicons - algorithm

I am using this PHP script to generate identicons. It uses Don Park's original identicon algorithm.
The script works great and I have adapted it to my own application to generate identicons. The problem is that sometimes swastikas are generated. While swastikas have peaceful origins, people do take offence when seeing those symbols.
What I would like to do is to alter the algorithm so that swastikas are never generated. I have done a bit of digging and found this thread on Microsoft's website where an employee states that they have added a tweak to prevent generation of swastikas, but nothing more.
Has anyone identified what the tweak would be and how to prevent swastikas from being generated?

Identicons appear to me (on a quick glance) always to have four-fold rotational symmetry. Swastikas certainly do. How about just repeating the quarter-block in a different way? If you take a quarter-block that would produce a swastika in the current pattern, and reflect two diagonally-opposite quarters, then you get a sort of space invader.
Basically, nothing with reflectional symmetry can look very much like a swastika. I suppose if there's a small swastika entirely contained within the quarter, then you still have a problem.

On Jeff Atwood's introducing thread, Don Park suggested:
Re Swastika comments, that can be addressed by applying a specialized OCR-like visual analysis to identify all offending codes then crunch them into an effective bloom filter using genetic algorithm. When the filter returns true, a second type of identicon (i.e. 4-block quilt) can be used.
Alternatively, you could avoid the issue entirely by replacing identicons with unicorns.

My original suggestion involving visual analysis was in context of the particular algorithm in use, namely 9-block quilt.
If you want to try another algorithm without Swastika problem, try introducing symmetry like one seen in inkblots to popular 16-block quilt Identicons.

Related

My Algorithm only fails for large values - How do I debug this?

I'm working on transcribing as3delaunay to Objective-C. For the most part, the entire algorithm works and creates graphs exactly as they should be. However, for large values (thousands of points), the algorithm mostly works, but creates some incorrect graphs.
I've been going back through and checking the most obvious places for error, and I haven't been able to actually find anything. For smaller values I ran the output of the original algorithm and placed it into JSON files. I then read that output in to my own tests (tests with 3 or 4 points only), and debugged until the output matched; I checked the output of the two algorithms line for line, and found the discrepancies. But I can't feasibly do that for 1000 points.
Answers don't need to be specific to my situation (although suggesting tools I can use would be excellent).
How can I debug algorithms that only fail for large values?
If you are transcribing an existing algorithm to Objective-C, do you have a working original in some other language? In that case, I would be inclined to put in print statements in both versions and debug the first discrepancy (the first, because later discrepancies could be knock-on errors).
I think it is very likely that the program also makes mistakes for smaller graphs, but more rarely. My first step would in fact be to use the working original (or some other means) to run a large number of automatically checked test runs on small graphs, hoping to find the bug on some more manageable input size.
Find the threshold
If it works for 3 or 4 items, but not for 1000, then there's probably some threshold in between. Use a binary search to find that threshold.
The threshold itself may be a clue. For example, maybe it corresponds to a magic value in the algorithm or to some other value you wouldn't expect to be correlated. For example, perhaps it's a problem when the number of items exceeds the number of pixels in the x direction of the chart you're trying to draw. The clue might be enough to help you solve the problem. If not, it may give you a clue as to how to force the problem to happen with a smaller value (e.g., debug it with a very narrow chart area).
The threshold may be smaller than you think, and may be directly debuggable.
If the threshold is a big value, like 1000. Perhaps you can set a conditional breakpoint to skip right to iteration 999, and then single-step from there.
There may not be a definite threshold, which suggests that it's not the magnitude of the input size, but some other property you should be looking at (e.g., powers of 10 don't work, but everything else does).
Decompose the problem and write unit tests
This can be tedious but is often extremely valuable--not just for the current issue, but for the future. Convince yourself that each individual piece works in isolation.
Re-visit recent changes
If it used to work and now it doesn't, look at the most recent changes first. Source control tools are very useful in helping you remember what has changed recently.
Remove code and add it back piece by piece
Comment out as much code as you can and still get some kind of reasonable output (even if that output doesn't meet all the requirements). For example, instead of using a complicated rounding function, just truncate values. Comment out code that adds decorative touches. Put assert(false) in any special case handlers you don't think should be activated for the test data.
Now verify that output, and slowly add back the functionality you removed, one baby step at a time. Test thoroughly at each step.
Profile the code
Profiling is usually for optimization, but it can sometimes give you insight into code, especially when the data size is too large for single-stepping through the debugger. I like to use line or statement counts. Is the loop body executing the number of times you expect? Or twice as often? Or not at all? How about the then and else clauses of those if statements? Logic bugs often become very obvious with this type of profiling.

Search space data

I was wondering if anyone knew of a source which provides 2D model search spaces to test a GA against. I believe i read a while ago that there are a bunch of standard search spaces which are typically used when evaluating these type of algorithms.
If not, is it just a case of randomly generating this data yourself each time?
Edit: View from above and from the side.
The search space is completely dependent on your problem. The idea of a genetic algorithm being that modify the "genome" of a population of individuals to create the next generation, measure the fitness of the new generation and modify the genomes again with some randomness thrown is to try to prevent getting stuck in local minima. The search space however is completely determined by what you have in your genome, which in turn in completely determined by what the problem is.
There might be standard search spaces (i.e. genomes) that have been found to work well for particular problems (I haven't heard of any) but usually the hardest part in using GAs is defining what you have in your genome and how it is allowed to mutate. The usefulness comes from the fact that you don't have to explicitly declare all the values for the different variables for the model, but you can find good values (not necessarily the best ones though) using a more or less blind search.
EXAMPLE
One example used quite heavily is the evolved radio antenna (Wikipedia). The aim is to find a configuration for a radio antenna such that the antenna itself is as small and lightweight as possible, with the restriction that is has to respond to certain frequencies and have low noise and so on.
So you would build your genome specifying
the number of wires to use
the number of bends in each wire
the angle of each bend
maybe the distance of each bend from the base
(something else, I don't know what)
run your GA, see what comes out the other end, analyse why it didn't work. GAs have a habit of producing results you didn't expect because of bugs in the simulation. Anyhow, you discover that maybe the genome has to encode the number of bends individually for each of the wires in the antenna, meaning that the antenna isn't going to be symmetric. So you put that in your genome and run the thing again. Simulating stuff that needs to work in the physical world is usually the most expensive because at some point you have to test the indivudal(s) in the real world.
There's a reasonable tutorial of genetic algorithms here with some useful examples about different encoding schemes for the genome.
One final point, when people say that GAs are simple and easy to implement, they mean that the framework around the GA (generating a new population, evaluating fitness etc.) is simple. What usually is not said is that setting up a GA for a real problem is very difficult and usually requires a lot of trial and error because coming up with an encoding scheme that works well is not simple for complex problems. The best way to start is to start simple and make things more complex as you go along. You can of course make another GA to come with the encoding for first GA :).
There are several standard benchmark problems out there.
BBOB (Black Box Optimization Benchmarks) -- have been used in recent years as part of a continuous optimization competition
DeJong functions -- pretty old, and really too easy for most practical purposes these days. Useful for debugging perhaps.
ZDT/DTLZ multiobjective functions -- multi-objective optimization problems, but you could scalarize them yourself I suppose.
Many others

Flow Free Like Random Level Generation with only one possible solution?

I've implemented the algorithms marked as the correct answer in this question: What to use for flow free-like game random level creation?
However, using that method will create boards that may have multiple solutions. I was wondering if there is any simple restrictions or modification that can be made to the algorithm to make sure that there is only one possible solution?
Creating unique Numberlink/Flow Free is very difficult. If you look at my algorithm proposal in the mentioned thread, you'll find an algorithm that lets you create puzzles with the necessary condition that solutions must not have a 2x2 square of the same color. The discussion at http://forum.ukpuzzles.org/viewtopic.php?f=3&t=41, however, shows that this is insufficient, since there are also many non-trivial non-unique puzzles.
From my looking into this problem, it seems the only way to solve this problem is to have a separate algorithm for testing uniqueness, and discarding bad instances. One solver that's made precisely for uniqueness testing algorithm is Imo's solver.
Another option is to use multiple different solvers and check that they come up with the same solution.
I think you should implement the solver, which finds all the solutions for some level. The simplest way is backtracking.
When you have many levels, take one by one and look for solutions. As soon as you find the second solution for some level, throw that level away.

Is there an algorithm for positioning nodes on a link chart?

I'm a member of a small but fairly sociable online forum, and just for fun we've been plotting a chart of who's met who in real life. Here's what it looked like fairly recently.
(The colour is the "distance" from the currently-selected user, e.g., yellow is someone who's met someone who's met them. And no, I'm not Zak.) Apologies for the faded lines, they don't seem to have weathered the SO upload process very well.
It's generated as SVG, with a big block of JSON defining who's met who. The position (x,y) of each member on the chart is hard-coded into that JSON. Until now, it's been fairly easy to cope when someone meets someone else - at worst, maybe two or three people need to be shuffled around - but it does involve editing the co-ordinates manually. And now that the European and North American contingents are meeting up, and a few on the periphery are showing up at meets, all hell is breaking loose...
We can put some effort into making all the nodes draggable, which would make the job of re-arranging a bit less tiresome. But it seems more sensible to let the computer take care of positioning them, especially as the problem will only get harder with more members.
So, does anyone know of an algorithm for positioning these nodes on the chart, based on which other nodes they're linked with?
Ideally, it would
minimise or avoid long links
avoid having lines run underneath unrelated nodes
take account of the fact that well-connected nodes are bigger
do its best to show the wider "all these guys met each other" relationships (the big circle at the bottom is largely the result of one meet, for example, though the chart has no idea of when any two people met)
but if it gets us close enough to tweak it, that's progress.
And, what's the real name for these charts? I believe they're called "link charts", but I'm not getting good results from Google using that name or anything else I can think of.
We'll likely be implementing this in PHP or Javascript, but right now it's how to begin approaching the problem that's the bigger question.
Edit: Some great answers coming already. I would be very interested in the actual algorithm(s) used, though, as well as tools that do the job.
What you are looking for are f.e. force-based algorithms. There are quite a few libraries, and some have been named already, like prefuse, yWorks. Here a few more: jung, gvf, jGraph.
The real name for it is "graph". To generate graph, and have a good layout algorithm, the best is to use a software which will do the job.
I advise you to use Gephi.
This soft is able to do all the things you want to.
Have a look at the yWorks tools.
You can google for graph visualization. There are more libraries for this, including GraphViz, but probably not all your requirements will be met.
If you can deal w/ Java, take a look at prefuse.
Have a look at NodeXL
Also, this book may be relevant.

How to design an approximate solution algorithm

I want to write an algorithm that can take parts of a picture and match them to another picture of the same object.
For example, If I gave the computer a picture of a vase and a picture of a scene with the vase in it, I'd expect it to determine where in the image the vase is.
How would I begin to develop an algorithm like this?
The final usage for this algorithm will be an application that for example with a picture of somebody's face could tell if they were in a crowd of people. This algorithm would eventually be applied to video streams.
edit: I'm not expecting an actual solution to this problem as I don't hope to solve it anytime soon. The real question was how do you define something like this to a computer so that you could make an algorithm to do it.
Thanks
A former teacher of mine wrote his doctorate thesis on a similar sort of problem, except his input was a detailed 3D model of something, which he would use to find that object in 2D images. This is a VERY non-trivial problem, there is no single 'answer', certainly nothing that would fit the Stack Overflow format.
My best answer: gather a ton of money and hire a very experienced programmer.
Best of luck to you.
The first problem you describe and the second are both quite different.
A major part of each is solved by the numerous machine vision libraries available. You may need a combination of techniques to achieve any success at either task.
In the first one, you would need something that generically recognizes objects. Probably i'd use a number of algorithms in concert to identify the foreground object in the model image and then do some kind of weighted comparison of the partitioned target image.
In the second case, examining faces, is a much more difficult problem relative to the general recognizer above. Faces all look the same, or nearly so. The things that a general recognizer would notice aren't likely to be good for differentiating faces. You need an algorithm already tuned to facial recognition. Fortunately this is a rapidly maturing field and you can probably do this as well as the first case, but with a different set of functions.
The simple answer is, find a mathematical way to describe faces, that can account for angles and partial missing data, then refine and teach it.
Apparently apple has done something like this, however, it still makes mistakes and has to be taught as it moves forward.
I expect it will be more about the math, than about the programming.
I think you will find this to be quite a challenge. This is an extremely difficult problem and is one of the many areas of computing that fall under the domain of artificial intelligence (AI). Facial recognition would certainly be the most popular variant of this problem and in spite of what you may read in the media, any claimed success are not what they are made out to be. I think the closest solutions involve neural nets and they require very clear and carefully selected images usually.
You could try reading here though. Good luck!

Resources