Insects following the leader - can I implement the Boids algorithm for that?

I would like to illustrate how insects follow their leader in two dimensions.
How can I accomplish that?
Is it possible to do this with the Boids algorithm?
Or maybe someone knows another algorithm designed especially for that purpose?

Boids-style algorithms should be fine for this, though you will probably need to tweak the algorithm and experiment a bit before you get something that looks really good. You'll get leader/follower behaviour provided you do the following (a rough sketch follows the list):
Get the "followers" to adjust their heading towards the "leader". Depending on how strong you want the follower effect to be you can make this effect weaker or stronger, or only apply it some of the time etc.
You may choose to either have every bot follow the same leader, or each follow a different leader. If the former, you will get a big flock following a single individual. If the latter, you will tend to get "chains" forming.
You'll probably want the ultimate leader(s) to move relatively independently. Maybe make the leader change heading randomly or even try to head "away" from the centre of the group.
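Putting those points together, here is a minimal 2-D sketch in Python; the Bot class, the steering weight, and the leader's random wander are assumptions to experiment with rather than a full Boids implementation:

    import math
    import random

    class Bot:
        def __init__(self, x, y):
            self.x, self.y = x, y
            self.heading = random.uniform(0, 2 * math.pi)
            self.speed = 1.0

        def step(self):
            self.x += math.cos(self.heading) * self.speed
            self.y += math.sin(self.heading) * self.speed

    def steer_towards(follower, leader, weight=0.1):
        """Turn the follower's heading part of the way towards the leader."""
        desired = math.atan2(leader.y - follower.y, leader.x - follower.x)
        # Smallest signed angle from the current heading to the desired one.
        diff = (desired - follower.heading + math.pi) % (2 * math.pi) - math.pi
        follower.heading += weight * diff

    leader = Bot(0, 0)
    followers = [Bot(random.uniform(-20, 20), random.uniform(-20, 20)) for _ in range(10)]

    for _ in range(100):
        leader.heading += random.uniform(-0.3, 0.3)  # leader wanders independently
        leader.step()
        for f in followers:
            steer_towards(f, leader)  # raise the weight for tighter following
            f.step()

Layered on top of the usual separation/alignment/cohesion rules, this gives the flock-following-a-leader look.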

Related

Identify changes in the slope using machine learning

I want to get my hands dirty with some machine learning, and I finally have a problem which seems like a good beginner project. However, despite reading a lot about the subject I am unsure how to get started, and what my basic approach should be.
I have a dataset which should look like this.
A real dataset looks more like this:
I want to identify the points in the red circles (on the first image), and be robust against occasional artifacts like the one in the blue circle.
It sounds like a really easy task. However, there is quite a lot of noise in the raw data. My current implementation is pretty traditional: it blurs the data and compares the first and second derivatives to some estimated threshold values (something like the sketch below). This approach works, but can "only" identify the points with ~99.7% accuracy, and since I do around 100,000 measurements a day I would love to increase this number.
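For reference, that blur-and-threshold approach might look roughly like this; the kernel width and the threshold values here are made-up numbers, not the actual tuned ones:

    import numpy as np

    def find_slope_changes(y, kernel=5, d1_thresh=0.5, d2_thresh=0.2):
        """Blur the signal, then flag points where both derivatives are large."""
        smoothed = np.convolve(y, np.ones(kernel) / kernel, mode="same")
        d1 = np.gradient(smoothed)  # first derivative
        d2 = np.gradient(d1)        # second derivative
        # Candidate points: steep slope and high curvature at the same index.
        return np.where((np.abs(d1) > d1_thresh) & (np.abs(d2) > d2_thresh))[0]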
So, this is what I have:
All the datasets I want/need
A pretty good model of how the data should look.
A pretty good training set, using my existing algorithm (the outliers can be fixed manually)
However, I do not have a basic idea of what approach I should use. It feels like none of the material I've read on machine learning fits this problem.
Can someone help me with the super high level approach to solve this problem?

What Machine Learning algorithm would be appropriate?

I am working on a predictor for learning the most likely period for grape harvesting, depending on the weather and on the characteristics of the grapes, namely sugar level, pH, and acidity. I've got two datasets and I am thinking of how to merge them: one is the pre-harvest analysis data of some Italian vineyards over the 2003-2013 period, the other is the weather over that decade. What I want to do is learn from my samples when to harvest, given a range for the optimal sugar level, pH, and acidity, and given a weather forecast.
I thought that some reinforcement learning approach could work. Since the pre-harvest analyses are done about five times during the grape maturation period, I thought that those could be states I step through, while the weather conditions could be the "probabilities" of going from one state to another.
Yet I am not sure which algorithm would be best, as every state and every "probability" depends on several variables. I was told that a Hidden Markov Model would work, but it seems to me that my problem doesn't fit the model perfectly.
Do you have any suggestions? Thanks in advance.
This has nothing to do with the actual algorithm, but the problem you are going to run into here is that weather is extremely local. One vineyard can have completely different weather than another only a mile away from it, believe it or not. If you put rain gauges at each vineyard, you will find this out. To get really good results you need a mini weather station at each vineyard. Absent this, your best option is to use only vineyards in the immediate vicinity of the weather measurements. For example, if your data is from an airport, only use vineyards right next to the airport.
Reinforcement learning is appropriate when you can control the action. It is like a monkey pushing buttons. You push a button and get shocked, so you don't push that button again. Here you have a passive data set and cannot conduct experimental actions, so reinforcement learning does not apply.
Here you have a complex set of uncontrolled inputs, the weather data, a controlled input (harvest time), and several output parameters, sugar etc. Given that data, you want to predict what harvest time to use for some future, unknown weather pattern.
In general, what you are doing is sensitivity analysis: trying to figure out how your factors affected the outcome that occurred. The tricky part is that the outcomes may be driven by some non-obvious pattern. For example, maybe 3 weeks of drought, followed by 2 weeks of heavy rain implies the best harvest will be 65 days hence, or something like that.
So, what you have to do is featurize the data to characterize it in possible likely ways, then do a sensitivity analysis. If the analysis has a strong correlation, then you have found a solution. If it does not, then you have to find a different way to featurize the data. For example, your featurization might be number of days with rain over 2 inches, or it might be most number of days without rain, or it might be total number of days with bright sunshine. Possibly multiple features might combine to make a solution. The options are limited only by your imagination.
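As an illustration only, a featurize-then-correlate pass might look like the following; the feature definitions and the synthetic weather data are invented for the example, not taken from the question's actual datasets:

    import numpy as np
    import pandas as pd

    def featurize(weather):
        """Turn one season's daily weather into a few candidate features."""
        rain = weather["rain_mm"]
        dry = (rain == 0).astype(int)
        return {
            "days_heavy_rain": int((rain > 50).sum()),
            # Longest run of consecutive dry days.
            "longest_dry_spell": int(dry.groupby((dry == 0).cumsum()).sum().max()),
        }

    # Synthetic stand-in: ten seasons of daily rainfall plus the observed outcome.
    rng = np.random.default_rng(0)
    seasons = [pd.DataFrame({"rain_mm": rng.gamma(2.0, 10.0, 180) * (rng.random(180) < 0.4)})
               for _ in range(10)]
    best_day = pd.Series(rng.integers(50, 80, 10))  # observed best harvest day per season

    features = pd.DataFrame([featurize(s) for s in seasons])
    print(features.corrwith(best_day))  # a strong correlation flags a useful feature

If none of the hand-built features correlate, that is the signal to go back and featurize the data differently, exactly as described above.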
Of course, as I was saying above, the fly in the ointment is that your weather data will only roughly approximate the real and actual weather at the particular vineyard, so there will be noise in the data, possibly so much noise as to make getting a good result impossible.
Why you actually don't care too much about the weather
Getting back to the data, having unreliable weather information is actually not a problem, because you actually don't care too much about the weather. The reason is two-fold. First of all, the question you are trying to answer is not when to harvest the grapes, it is whether to wait to harvest or not. The vintner can always measure the current sugar of the grapes. So, he just has to decide: "Should I harvest the grapes now with sugar X%, or should I wait and possibly get a better sugar Z% later?" To answer this question the real data you need is not the weather, it is a series of sugar/acidity readings taken over time. What you want to predict is whether, given a situation, the grapes will get better or whether they will get worse.
Secondly, grapevines have an optimal amount of moisture they like. If the vine gets too dry, that is bad, if it gets too wet that is bad. You cannot predict how moist a vine is from the weather. Some soils hold moisture well, others are sandy. A sandy vineyard will require more rain than a clay vineyard to have the same moisture levels. Also, the vintner can water his vineyards, completely invalidating the rainfall pattern. Therefore, weather is pretty much a non-factor.
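To make the "better or worse" framing concrete, a minimal sketch might just fit a trend to recent sugar readings and extrapolate; the Brix values below are invented example numbers:

    import numpy as np

    days = np.array([0, 7, 14, 21, 28])               # days since first measurement
    sugar = np.array([16.5, 18.0, 19.8, 21.0, 21.4])  # hypothetical Brix readings

    coeffs = np.polyfit(days, sugar, 2)  # quadratic fit captures the plateau
    forecast = np.polyval(coeffs, 35)    # predicted sugar a week from now
    print("wait" if forecast > sugar[-1] else "harvest now")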
I agree with Tyler that from a feasibility standpoint the weather data might harm your analysis. However, I think this is for you to test and find out! There could be some interesting data that comes out of it.
I'm not sure exactly what your test is, but a simple way to start perhaps is to make this into a classification problem using an SVM (or even logistic regression, since you want probabilities) and use all the data as the input for the algorithm - assuming you know which years were good harvest years and which were not. You could even test each variable individually and see how it affects your performance. I suggest you go this way if you can, just because there are massive amounts of sources on the net and people here on SO who can help you tune your algorithm. A sketch of the setup follows.
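Something like this, where the feature matrix and the good/bad labels are synthetic placeholders for your eleven seasons of merged data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(11, 4))  # one row per season: e.g. rainfall, temperature, sugar, pH
    y = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0])  # 1 = good harvest year, 0 = bad

    model = LogisticRegression().fit(X, y)
    print(model.predict_proba(X[:1]))  # probability of a bad/good year for a season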
When you have a handle on this, I would, as was suggested to you before, try the HMM, as it will tell you which day was probably the best for the harvest. This is where the weather might hurt, but you'll come to understand more about your data from the simpler experiments.
The thing I've learned about machine learning is that while there are guidelines for when to choose which algorithm, it's not always set in stone, and you can change your question slightly and try a new approach to the problem, depending on how much freedom you have to play with the data. Good luck and have fun!

Algorithm choice for gaining intelligence from messages

What I'm trying to do is find an algorithm that I can implement to generate 'intelligent' suggestions to people, by comparing messages they send to messages sent by their peers.
For example, Person A sends a message to Person B talking about Obj1. If Person C sends a message to Person D about Obj1, it will notice they are talking about the same things, and may suggest that Person A talk to Person C.
I have implemented collecting the statistics to capture the mentions people have in common, but do not know which algorithm to use to analyse this.
Any suggestions?
(I hope this makes enough sense)
Take a look at clustering algorithms - k-means or k-nearest neighbours for a quick start.
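For instance, a quick sketch of clustering messages by their word content with k-means; the sample messages and the choice of two clusters are placeholders:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    messages = [
        "let's talk about Obj1 tomorrow",
        "Obj1 is broken again",
        "lunch on friday?",
        "who is coming to lunch?",
    ]
    vectors = TfidfVectorizer().fit_transform(messages)  # bag-of-words vectors
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
    print(labels)  # messages sharing a label talk about similar things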
How much data have you got? The more the better.
There are lots of approaches to this problem. You may, for example, assume that all users are, to some degree, similar to each other, and that what you want to do is find the most similar users for each one. A vector-space model with cosine similarity will give you quick results; see the sketch below.
Give some more information on what you want to achieve.
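A minimal version of that vector-space idea, with hypothetical mention counts per user:

    import numpy as np

    def cosine_similarity(a, b):
        """Cosine of the angle between two mention-count vectors."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Hypothetical counts of how often each user mentions [Obj1, Obj2, Obj3, Obj4].
    user_a = np.array([5, 0, 2, 1])
    user_c = np.array([4, 1, 0, 2])

    print(cosine_similarity(user_a, user_c))  # closer to 1.0 means more similar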
This is exactly the same problem Twitter is battling with. You might end up with a job there if you crack this ;)
On a serious note, coming back to the question: one could use some crude (i.e. heuristic-based) measures to do something like this, but they have a big error percentage, as delnan said in the comment.
NLP is a sure bet. Note that using NLP also has some error percentage, but it's far more accurate than any heuristic you would use. If you are using Python I would suggest this toolkit, which I use now and then - NLP.
For other languages I am sure there are packages which will help you in this regard.
UPDATE1: If you have a way for the users to tag their messages (like Stack Overflow does), you could approach this problem without NLP. Then you could simply take the intersection of the tags of both messages to see if there is any commonality, and suggest some top items from the common ones.
But there are other issues you'll have to deal with - making tags mandatory, plus you need to be sure that the users are actually entering correct tags, etc. Nevertheless, this greatly simplifies your problem.
UPDATE2: As the question has been updated - since you are only interested in some specific keywords/phrases, this kind of simplifies it. You would need to take each message, split it into words, then stem each word. After stemming, intersect this set with the set of keywords you have; you'll get a set (S1). Do the same with the second message to get a set (S2). Intersect S1 and S2. If you find something in common, bingo! Some theme is shared between message 1 and message 2; otherwise nothing.
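That recipe in a few lines, assuming the Python toolkit mentioned above is NLTK (its Porter stemmer is used here) and with made-up keywords:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    keywords = {stemmer.stem(w) for w in ["server", "deployment", "database"]}

    def themes(message):
        """Stemmed words of the message that match the keyword set."""
        return {stemmer.stem(w) for w in message.lower().split()} & keywords

    s1 = themes("We deployed the new servers last night")
    s2 = themes("Server deployment went smoothly")
    print(s1 & s2)  # non-empty means the two messages share a theme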

Is there an algorithm for positioning nodes on a link chart?

I'm a member of a small but fairly sociable online forum, and just for fun we've been plotting a chart of who's met who in real life. Here's what it looked like fairly recently.
(The colour is the "distance" from the currently-selected user, e.g., yellow is someone who's met someone who's met them. And no, I'm not Zak.) Apologies for the faded lines; they don't seem to have weathered the SO upload process very well.
It's generated as SVG, with a big block of JSON defining who's met who. The position (x,y) of each member on the chart is hard-coded into that JSON. Until now, it's been fairly easy to cope when someone meets someone else - at worst, maybe two or three people need to be shuffled around - but it does involve editing the co-ordinates manually. And now that the European and North American contingents are meeting up, and a few on the periphery are showing up at meets, all hell is breaking loose...
We can put some effort into making all the nodes draggable, which would make the job of re-arranging a bit less tiresome. But it seems more sensible to let the computer take care of positioning them, especially as the problem will only get harder with more members.
So, does anyone know of an algorithm for positioning these nodes on the chart, based on which other nodes they're linked with?
Ideally, it would
minimise or avoid long links
avoid having lines run underneath unrelated nodes
take account of the fact that well-connected nodes are bigger
do its best to show the wider "all these guys met each other" relationships (the big circle at the bottom is largely the result of one meet, for example, though the chart has no idea of when any two people met)
but if it gets us close enough to tweak it, that's progress.
And, what's the real name for these charts? I believe they're called "link charts", but I'm not getting good results from Google using that name or anything else I can think of.
We'll likely be implementing this in PHP or Javascript, but right now it's how to begin approaching the problem that's the bigger question.
Edit: Some great answers coming already. I would be very interested in the actual algorithm(s) used, though, as well as tools that do the job.
What you are looking for are, for example, force-based (force-directed) layout algorithms. There are quite a few libraries, and some have been named already, like prefuse and yWorks. Here are a few more: jung, gvf, jGraph.
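The core of such an algorithm is small enough to sketch. This is a bare-bones spring embedder in the spirit of Fruchterman-Reingold, with arbitrary constants to tune; it ignores the node-size and line-crossing wishes above:

    import random

    def layout(nodes, edges, iterations=200, k=50.0):
        pos = {n: [random.uniform(0, 500), random.uniform(0, 500)] for n in nodes}
        for _ in range(iterations):
            force = {n: [0.0, 0.0] for n in nodes}
            # Every pair of nodes repels: spreads the chart out.
            for a in nodes:
                for b in nodes:
                    if a == b:
                        continue
                    dx = pos[a][0] - pos[b][0]
                    dy = pos[a][1] - pos[b][1]
                    d2 = dx * dx + dy * dy or 0.01
                    force[a][0] += k * k * dx / d2
                    force[a][1] += k * k * dy / d2
            # Linked nodes attract: keeps who-met-who links short.
            for a, b in edges:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                d = (dx * dx + dy * dy) ** 0.5
                for n, s in ((a, 1.0), (b, -1.0)):
                    force[n][0] += s * d * dx / k
                    force[n][1] += s * d * dy / k
            for n in nodes:  # take a small step along the net force
                pos[n][0] += 0.05 * force[n][0]
                pos[n][1] += 0.05 * force[n][1]
        return pos

    print(layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "A")]))

A real implementation adds a cooling schedule so the layout settles instead of oscillating; the libraries below handle all of that (plus node sizes) for you.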
The real name for it is a "graph". To generate the graph and get a good layout, the best approach is to use software that will do the job for you.
I advise you to use Gephi.
This software can do all the things you want.
Have a look at the yWorks tools.
You can google for graph visualization. There are more libraries for this, including GraphViz, but probably not all your requirements will be met.
If you can deal with Java, take a look at prefuse.
Have a look at NodeXL
Also, this book may be relevant.

How would you implement a perfect line-of-sight algorithm?

Disclaimer: I'm not actually trying to make one I'm just curious as to how it could be done.
When I say "most accurate", I include the basics:
wall
distance
light levels
and the more complicated:
Dust in Atmosphere
rain, sleet, snow
clouds
vegetation
smoke
fire
If I were to want to program this, what resources should I look into and what things should I watch out for?
Also, are there any relevant books on the theory behind line of sight including all these variables?
I personally don't know too much about this topic but a quick couple of Google searches turns up some formal papers that contain some very relevant information:
http://www.tecgraf.puc-rio.br/publications/artigo_1999_efficient_lineofsight_algorithms.pdf - Provides a detailed description of two different methods of efficiently performing an LOS calculation, along with issues involved
http://www.agc.army.mil/operations/programs/LOS/LOS%20Compendium.doc - This one aims to maintain "a current list of unique LOS algorithms"; it has a section listing quite a few and describing them in detail with a focus on military applications.
Hope this helps!
Typically, one represents the world as a set of volumes of space held in some kind of space partitioning data structure, then intersects the ray representing your "line of sight" with that structure to find the set of objects it hits; these are then walked in order from ray origin to determine the overall result. Reflective objects cause further rays to be fired, opaque objects stop the walk and semitransparent objects partially contribute to the result.
You might like to read up on ray tracing; there is a great body of literature on the subject and well-understood ways of solving what are basically the same problems you list exist.
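As a toy version of that ray walk, here is a 2-D occupancy-grid sketch; the grid stands in for the space-partitioning structure, and everything is either fully opaque or fully clear (no reflection or partial transparency):

    def line_of_sight(grid, start, end, step=0.1):
        """True if no opaque cell lies between start and end."""
        x0, y0 = start
        x1, y1 = end
        dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        steps = max(1, int(dist / step))
        for i in range(1, steps):
            t = i / steps
            x = x0 + (x1 - x0) * t
            y = y0 + (y1 - y0) * t
            if grid[int(y)][int(x)]:  # 1 marks an opaque cell
                return False
        return True

    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],  # a wall across the middle
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
    print(line_of_sight(grid, (0.5, 0.5), (3.5, 3.5)))  # False: the wall blocks it

A real implementation would walk cell boundaries exactly (e.g. a DDA/Bresenham traversal) rather than sampling at fixed steps, which can hop over thin walls.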
The obvious question is do you really want the most accurate, and why?
I've worked on games that depended on line of sight and you really need to think clearly about what kind of line of sight you want.
First, can the AI see any part of your body? Or are you talking about "eye to eye" LOS?
Second, if the player's camera view is not his avatar's eye view, the player will not perceive your highly accurate LOS as highly accurate. At which point inaccuracies are fine.
I'm not trying to dissuade you, but remember that player experience is #1, and that might mean not having the best LOS.
A good friend of mine has done the AI for a long-running series of popular console games. He often tells a story about how the AIs were most interesting (and fun) in the first game, because they stumbled into you rather than seeing you from afar. Now he has great LOS and spends his time trying to dumb the AIs down to make them as fun as they were in the first game.
So why are you doing this? Does the game need it? Or do you just want the challenge?
There is no "one algorithm" for these since the inputs are not well defined.
If you treat Dust-In-Atmosphere as a constant value then there is an algorithm that can take it into account, but the fact is that dust levels will vary from point to point, and thus the algorithm you want needs to be aware of how your dust-data is structured.
The most used algorithm in today's ray tracers is just incremental ray marching, which is by definition not correct, but it approximates the Ultimate Answer to a fair degree.
Even if you managed to incorporate all these properties into a single master algorithm, you'd still have to somehow deal with how different people perceive the same scene. Some people are near-sighted, some far-sighted. Then there's the colour-blind. Not to mention that dust-in-atmosphere levels also affect tear glands, which in turn affect visibility. And then there's the whole dichotomy between what people are actually seeing and what they think they are seeing...
There are far too many variables here to aim for a unified solution. Treat your environment as a voxelated space and shoot your rays through it. I suspect that's the only solution you'll be able to complete within a single lifetime...
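To make the voxel suggestion concrete, a hedged 2-D sketch of that kind of ray march: step through a density field (dust, smoke, fog) and attenuate visibility as you go. The density function here is an invented example:

    import math

    def visibility(density, start, direction, max_dist, step=0.25):
        """Fraction of light surviving along the ray (Beer-Lambert attenuation)."""
        x, y = start
        dx, dy = direction
        transmittance = 1.0
        travelled = 0.0
        while travelled < max_dist and transmittance > 0.01:
            # density(x, y) is the extinction per unit length at this point.
            transmittance *= math.exp(-density(x, y) * step)
            x += dx * step
            y += dy * step
            travelled += step
        return transmittance

    # Hypothetical dust field, thicker near the origin.
    dusty = lambda x, y: 0.3 / (1.0 + x * x + y * y)
    print(visibility(dusty, (0.0, 0.0), (1.0, 0.0), max_dist=10.0))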
