What's the best way to implement real-time operant conditioning (supervised reward/punishment-based learning) for an agent? Should I use a neural network (and if so, what type)? Or something else?
I want the agent to be able to be trained to follow commands like a dog. The commands would be in the form of gestures on a touchscreen. I want the agent to be able to be trained to follow a path (in continuous 2D space), make behavioral changes on command (modeled by FSM state transitions), and perform sequences of actions.
The agent would be in a simulated physical environment.
Reinforcement learning is a good fit for your problem: it is the machine learning framework built around exactly this kind of reward/punishment training.
The basic reinforcement learning model consists of:
a set of environment states S (you have a 2D space discretized in some way, representing the dog's current position; if you want truly continuous 2D space, you may need a neural network to serve as the value-function approximator)
a set of actions A (you mentioned the dog performs sequences of actions, e.g., move, rotate)
rules for transitioning between states (your dog's position transitions can be modeled by the FSM you mentioned)
rules that determine the scalar immediate reward r of a transition (when the dog reaches the target position you might give it a big reward, while smaller rewards at intermediate milestones also help)
rules that describe what the agent observes (the dog might have a limited view; for example, only the 4 or 8 cells neighboring its current position P are viewable to it)
To find the optimal policy, you can start with the model-free technique Q-learning.
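To make that concrete, here is a minimal tabular Q-learning sketch in Java. The grid discretization, the action set and the constants are illustrative assumptions, not part of your problem description; the reward passed to update() would come from your touchscreen reward/punishment gestures.

    import java.util.Random;

    // Minimal tabular Q-learning sketch. States assume a 10x10 grid laid over
    // the 2D space; actions and all constants are placeholders to tune.
    public class QLearningSketch {
        static final int N_STATES = 100;   // e.g. a 10x10 grid over the 2D space
        static final int N_ACTIONS = 4;    // e.g. move up/down/left/right
        static final double ALPHA = 0.1;   // learning rate
        static final double GAMMA = 0.9;   // discount factor
        static final double EPSILON = 0.1; // exploration rate

        double[][] q = new double[N_STATES][N_ACTIONS];
        Random rng = new Random();

        int chooseAction(int state) {
            if (rng.nextDouble() < EPSILON) return rng.nextInt(N_ACTIONS); // explore
            int best = 0;
            for (int a = 1; a < N_ACTIONS; a++)
                if (q[state][a] > q[state][best]) best = a;
            return best;                                                   // exploit
        }

        // Call once per step with the observed transition and the trainer's reward.
        void update(int s, int a, double reward, int sNext) {
            double maxNext = q[sNext][0];
            for (int aNext = 1; aNext < N_ACTIONS; aNext++)
                maxNext = Math.max(maxNext, q[sNext][aNext]);
            q[s][a] += ALPHA * (reward + GAMMA * maxNext - q[s][a]);
        }
    }

Each time the agent acts and you reward or punish it, call update() with the observed transition; over time the greedy choice in chooseAction() converges toward the trained behaviour.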
Let me start by saying that I'm new to neural networks, machine learning, etc., and so far I have made only a few very simple experiments to learn, so please be patient with me if I ask very naive or long questions.
My favorite coding language is Java, and I'm looking at playing with Weka, an API that looks very clear and complete to me. This time, to have something more than a set of ideal data to train on and a success rate to check, I started not from the software but from a real-world problem that I created for myself.
The problem I created, and that I'd like to solve with neural networks, is a four-legged, spider-shaped robot controlled by a Raspberry Pi with some ADC and servo HATs. This strange robot has 4 legs, each leg made of 3 parts, and each part is moved by a servo motor. In total I have 4 legs * 3 leg parts = 12 servo motors. On each servo motor I have a 3-axis analog accelerometer attached (12 in total) that I can read from the Raspberry Pi. From each of these accelerometers I read 2 axes to determine the position of each servo, that is, the position of each leg segment of each leg. In addition, the "sole of the foot" of each of the 4 legs has a button to determine whether the leg has reached the floor and is sustaining the spider. The spider construction blog is here, for those interested: https://thestrangespider.blogspot.com/
The purpose of this experiment is to make the spider able to balance itself and set itself horizontal starting from any condition. In a few words, the spider should be able to level its body horizontally regardless of whether I have put it on a horizontal or an oblique surface. The hardware platform is ready; I am missing a few details, but let's assume I can read all the signals I need from a Java Spring Boot application using the PI4J APIs to interface with the hardware (24 values from the ADCs for the 12 servos, and 4 digital inputs (true/false) from the buttons on the soles of the spider's feet). The intent is to solve the problem of moving the leg servo motors using neural networks built with Weka, reading the various input signals until the system reaches the success condition (static balance and a horizontal body). The main problem is how to use all the data I have to build a data set in the best way, and what neural networks to use to put in place the adaptive corrective feedback, until the spider reaches the success condition of having its body finally horizontal and in static balance.
Going deeper into this topic, let me provide my analysis.
Each leg should perform these steps:
Move one servo at a time, starting from a random position (within the angle range allowed by that leg portion)
After each move, re-read all the leg segment positions, waiting for the next static condition
Determine whether the last move brought an advantage or a disadvantage to the system as a whole
If no change happened because of the last move, keep the latest change/move and continue
If any maximum angle has been reached, reverse the direction of that angle's next move
Check whether the system is closer to success now than at the previous step, and determine the next action type
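In code, I imagine the per-leg loop roughly like this (just a sketch; the servo and accelerometer calls are placeholders for the real PI4J reads and writes):

    // Sketch of the corrective loop above; hardware calls are placeholders.
    public class LegAdjuster {
        double direction = +1; // flips when a servo reaches its max angle

        void adjustStep(int servo) {
            double before = systemError();   // distance from horizontal balance
            double dirUsed = direction;
            moveServo(servo, dirUsed);       // move one servo at a time
            waitForStaticCondition();        // re-read positions until stable
            if (servoAtLimit(servo)) direction = -direction;         // reverse next move
            if (systemError() > before) moveServo(servo, -dirUsed);  // undo a bad move
            // otherwise keep the move and continue with the next servo
        }

        // placeholders standing in for the real PI4J hardware interface
        double systemError() { return 0; }
        void moveServo(int servo, double dir) { }
        void waitForStaticCondition() { }
        boolean servoAtLimit(int servo) { return false; }
    }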
The problem, to me, is a mix of open questions, one of which is how to represent this system from a data perspective:
Is the data coming from the accelerometers on the servos of the same leg a candidate for a cluster of data, i.e., a robot subsystem?
Or can each servo and its position measurement be a subsystem of the main system instead?
Can these really be multiple problems instead of one?
How would you approach this problem from a neural network perspective?
After studying Weka classification and regression, I finally concluded that the processing should be a regression problem. That, however, turned out to be too complex a problem: it would have needed regression with multi-label output, which beyond Weka would have required another framework on top of it, called Meka.
I finally reached the conclusion that I will now try: first use Weka to classify the data and detect which leg the inclination is on, then solve the single, specific problem of acting on that leg, possibly with a different Weka model working on only one leg at a time.
Problem resolution will therefore have these steps:
First, use a Weka classification model over all legs to detect which of the four legs has the highest inclination.
Then, use a leg-specific Weka classification model to determine the actions the three segments of the leg detected in the first step should take in order to correct the problem.
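In Weka terms, the two stages might look roughly like this (a sketch only; the ARFF file names, the attribute layout and the choice of MultilayerPerceptron are my assumptions for illustration):

    import weka.classifiers.Classifier;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    // Two-stage sketch: detect the most inclined leg, then run a per-leg model.
    public class SpiderLeveler {
        public static void main(String[] args) throws Exception {
            // Stage 1: which leg has the highest inclination?
            // (24 accelerometer values + 4 foot buttons as attributes, leg id as class)
            Instances legData = new DataSource("legs.arff").getDataSet();
            legData.setClassIndex(legData.numAttributes() - 1);
            Classifier legDetector = new MultilayerPerceptron();
            legDetector.buildClassifier(legData);

            // Stage 2: a per-leg model mapping the 3 segment positions to an action
            Instances segData = new DataSource("leg1_actions.arff").getDataSet();
            segData.setClassIndex(segData.numAttributes() - 1);
            Classifier legCorrector = new MultilayerPerceptron();
            legCorrector.buildClassifier(segData);

            // At run time: classify a new reading with legDetector, then dispatch
            // the matching per-leg corrector, e.g.
            // double legIdx = legDetector.classifyInstance(newReading);
        }
    }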
Stefano
My problem is the following: I need to classify a data stream coming from a sensor. I have managed to get a baseline using the median of a window, and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline:
The problem is that I don't know which method to use.
There are several approaches I have thought of:
sum up the values in a window; if the sum is above a threshold, the class should be EVENT ('integrate and dump')
sum up the differences of consecutive values in a window and take the mean (which gives something like the first derivative); if the value is positive and above a threshold, set class EVENT, otherwise set class NO-EVENT
a combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size; see the sketch after this list)
using an SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at? the median/mean of a window? the integral? the first derivative? ...)
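In code, the first two approaches would look roughly like this (the window size and thresholds are exactly the values I would have to guess):

    // Sketch of approaches 1 and 2 on a window of baseline-subtracted samples.
    public class WindowedDetectors {
        // 1. 'Integrate and dump': sum over a window, compare to a threshold
        static boolean integrateAndDump(double[] window, double threshold) {
            double sum = 0;
            for (double v : window) sum += v;
            return sum > threshold; // EVENT if above
        }

        // 2. Mean of consecutive differences (~ first derivative) vs. a threshold
        static boolean derivative(double[] window, double threshold) {
            double sum = 0;
            for (int i = 1; i < window.length; i++) sum += window[i] - window[i - 1];
            double mean = sum / (window.length - 1);
            return mean > 0 && mean > threshold; // EVENT if positive and above
        }
    }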
What would you suggest? Are there better/simpler methods to get this task done?
I know there exist a lot of sophisticated algorithms, but I'm confused about what could be the best way; please have a little patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a hidden Markov model (HMM) approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method to train the parameters (e.g., the mean and variance of a Gaussian) associated with the output of each state.
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
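To make the HMM idea concrete, here is a minimal Viterbi sketch for the two-state (off/on) model with Gaussian outputs. The parameters shown would normally come from forward-backward training; the class and variable names are mine:

    // Viterbi decoding for a two-state HMM (0 = off, 1 = on) with Gaussian emissions.
    public class TwoStateViterbi {
        static double logGauss(double x, double mean, double var) {
            return -0.5 * Math.log(2 * Math.PI * var) - (x - mean) * (x - mean) / (2 * var);
        }

        // obs: baseline-subtracted samples; mean/var: per-state Gaussian parameters;
        // logTrans: log transition probabilities; logInit: log initial probabilities.
        static int[] viterbi(double[] obs, double[] mean, double[] var,
                             double[][] logTrans, double[] logInit) {
            int n = obs.length;
            double[][] v = new double[n][2];
            int[][] back = new int[n][2];
            for (int s = 0; s < 2; s++)
                v[0][s] = logInit[s] + logGauss(obs[0], mean[s], var[s]);
            for (int t = 1; t < n; t++)
                for (int s = 0; s < 2; s++) {
                    int best = (v[t - 1][0] + logTrans[0][s] >= v[t - 1][1] + logTrans[1][s]) ? 0 : 1;
                    back[t][s] = best;
                    v[t][s] = v[t - 1][best] + logTrans[best][s] + logGauss(obs[t], mean[s], var[s]);
                }
            int[] path = new int[n];
            path[n - 1] = (v[n - 1][0] >= v[n - 1][1]) ? 0 : 1;
            for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
            return path; // most likely underlying state at each sample
        }
    }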
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
From your description, a correctly parameterized moving average might be sufficient.
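A sketch of what that might look like (window size and threshold are placeholders you would have to tune):

    // Sliding moving-average detector sketch; window size and threshold must be tuned.
    public class MovingAverageDetector {
        private final double[] window;
        private final double threshold;
        private int idx = 0, count = 0;
        private double sum = 0;

        MovingAverageDetector(int windowSize, double threshold) {
            this.window = new double[windowSize];
            this.threshold = threshold;
        }

        // Feed one baseline-subtracted sample; returns true while an event is seen.
        boolean step(double sample) {
            sum -= window[idx];
            window[idx] = sample;
            sum += sample;
            idx = (idx + 1) % window.length;
            if (count < window.length) count++;
            return (sum / count) > threshold;
        }
    }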
Try to understand the sensor and its output. Make a model of it, and build a simulator that provides mock data covering the expected data, with noise and all that stuff
Record lots of real sensor data
Visualize the data and verify your assumptions and model
Annotate your sensor data, i.e., generate ground truth (your simulator should do that for the mock data)
From what you have learned so far, propose one or more algorithms
Make a test system that can verify your algorithms against the ground truth and run regressions against previous runs
Implement your proposed algorithms and run them against the ground truth
Try to understand the false positives and false negatives in the recorded data (and try to adapt your simulator to reproduce them)
Adapt your algorithm(s)
Some other tips:
You may implement hysteresis on thresholds to avoid bouncing (see the sketch after this list)
You may implement delays to avoid bouncing
Beware of the delays introduced if you implement debouncers or low-pass filters
You may implement multiple algorithms and voting
For testing relative improvements, you may run regression tests on large amounts of unannotated data; then you check only the detections that flipped between runs to find the performance increase/decrease
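As an illustration of the hysteresis tip above (the two levels are placeholders to be tuned against recorded data):

    // Hysteresis detector: enters EVENT above the high threshold and leaves it
    // only below the low threshold, suppressing bouncing around a single cutoff.
    public class HysteresisDetector {
        private final double high, low;
        private boolean inEvent = false;

        HysteresisDetector(double high, double low) {
            this.high = high;
            this.low = low;
        }

        boolean step(double sample) {
            if (!inEvent && sample > high) inEvent = true;
            else if (inEvent && sample < low) inEvent = false;
            return inEvent;
        }
    }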
I have to implement, as part of an assignment, a learning method that enables minesweepers to avoid colliding with mines. I have been given the choice between a supervised, unsupervised, or reinforcement learning algorithm.
I remember that in one of my lectures the lecturer mentioned ALVINN; he was teaching artificial neural networks.
Since the behaviour I'm looking for is more or less similar to ALVINN's, I want to implement an ANN. I've implemented an ANN before, for solving the 3-parity XOR problem (here's my solution), but I've never really understood the intuition behind ANNs.
I was wondering: what could the inputs to my ANN be? In the case of the 3-parity XOR problem it was obvious.
When it comes to frameworks for ANNs, each person has their own preferences. I recently used the Encog framework for an image processing project and found it very easy to work with.
Now, coming to your problem statement: "a learning method that enables minesweepers to avoid colliding with mines" has a very wide scope. What exactly is going to be your input to the ANN? You will have to decide your input based on whether it is going to be implemented on a real robot or in a simulation environment.
It can be clearly inferred that unsupervised learning can be ruled out if you are trying to implement something like ALVINN.
In a simulation environment, the best option is to somehow form a grid map of the environment based on the simulated sensor data. The occupancy grid surrounding the robot can then form a good input to the robot's ANN.
If you can't form a grid map (if the data is insufficient), then you should try to feed all the available and relevant sensor data to the ANN. However, it might have to be pre-processed, depending on the sensor noise modelled by your simulation environment. If you have a camera feed (like the ALVINN model), then you may follow directly in their footsteps and train your ANN likewise.
If this is a real robot, then the choices vary considerably, depending on the robustness and accuracy requirements. I really hope you do not want to build a robust, field-ready minesweeper single-handedly. :) For a small, controlled environment, your options will be very similar to those of a simulated environment, but the sensor noise will be nastier and you will have to build various special cases into your mission planner. Still, it would be advisable to fuse a few other sensors (LRF, ultrasound, etc.) with the vision sensors and use the result as input to your planner. If nothing else is available, copy the ALVINN system with only a front camera input.
The ANN training methodology will be similar (if using only vision). The output will be right/left/straight, etc. Try 5-7 hidden-layer nodes first, since that is what ALVINN uses; increase up to 8-10 at most. It should work. Use the activation functions properly.
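As a rough illustration with Encog (the 3x3 occupancy-grid input, the hidden-node count and the tiny training set are my assumptions, not part of your assignment):

    import org.encog.engine.network.activation.ActivationSigmoid;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.layers.BasicLayer;
    import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

    // Sketch: occupancy cells around the robot in, steering class out.
    public class MinesweeperNet {
        public static void main(String[] args) {
            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null, true, 9));                     // 3x3 occupancy grid
            network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 6));  // hidden layer
            network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 3)); // left/straight/right
            network.getStructure().finalizeStructure();
            network.reset();

            // Toy data: mine ahead-left -> steer right; mine ahead-right -> steer left.
            double[][] input = { {1, 0, 0, 0, 0, 0, 0, 0, 0},
                                 {0, 0, 1, 0, 0, 0, 0, 0, 0} };
            double[][] ideal = { {0, 0, 1},
                                 {1, 0, 0} };
            MLDataSet trainingSet = new BasicMLDataSet(input, ideal);
            ResilientPropagation train = new ResilientPropagation(network, trainingSet);
            do {
                train.iteration();
            } while (train.getError() > 0.01);
        }
    }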
Given its success in the real world, ALVINN seems like a good system to base yours on! As the page you linked to discusses, ALVINN essentially receives an image of the road ahead as its input. At a low level, this is achieved through 960 input nodes representing a 30x32 pixel image. The input value for each node is the saturation of the pixel that the node represents, with 0 being a completely white pixel and 1 being a completely black pixel, or something along those lines. (I'm pretty sure the picture is greyscale, although maybe they use color now; that could be achieved, for instance, by using three input nodes per pixel: one for red saturation, one for green, and one for blue.) Is there a reason you don't think this would be a good input for your system too?
For more low level details, see the original paper.
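To make the input encoding concrete, here is a sketch of flattening a 30x32 greyscale frame into the 960 input values described above (the image source and the exact scaling are assumptions):

    import java.awt.image.BufferedImage;

    // Flattens a 32-wide, 30-high frame into 960 inputs in [0,1], 1 = black pixel.
    public class AlvinnInput {
        static double[] encode(BufferedImage img) {
            double[] input = new double[30 * 32];
            for (int y = 0; y < 30; y++)
                for (int x = 0; x < 32; x++) {
                    int rgb = img.getRGB(x, y);
                    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                    double brightness = (r + g + b) / (3.0 * 255.0); // 1 = white
                    input[y * 32 + x] = 1.0 - brightness;            // 1 = black
                }
            return input;
        }
    }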
I'm a computer science student, and for this year's project I need to create and apply a genetic algorithm to something. I think neural networks would be a good thing to apply it to, but I'm having trouble understanding them. I fully understand the concepts, but none of the websites out there really explains the following, which is blocking my understanding:
How the decision is made for how many nodes there are.
What the nodes actually represent and do.
What part the weights and bias actually play in classification.
Could someone please shed some light on this for me?
Also, I'd really appreciate it if you have any similar ideas for what I could apply a GA to.
Thanks very much! :)
Your question is quite complex and I don't think a short answer will fully satisfy you. Let me try, nonetheless.
First of all, there must be at least three layers in your neural network (assuming a simple feedforward one). The first is the input layer, and there will be one neuron per input. The third layer is the output one, and there will be one neuron per output value (if you are classifying, there might be more than one if you want to assign a "belongs to" meaning to each neuron). The remaining layer is the hidden one, which stands between the input and the output. Determining its size is a complex task, as you can see in the following references:
comp.ai faq
a post on stack exchange
Nevertheless, the best way to proceed would be for you to state your problem more clearly (as far as industrial secrecy allows) and let us think a little more about your context.
The number of input and output nodes is determined by the number of inputs and outputs you have. The number of intermediate nodes is up to you. There is no "right" number.
Imagine a simple network: inputs (age, sex, country, married), output (chance of death this year). Your network might have 2 "hidden values", one depending on age and sex, the other depending on country and married. You put weights on each connection, for example Hidden1 = age * weight1 + sex * weight2 and Hidden2 = country * weight3 + married * weight4. You then make another set of weights, weight5 and weight6, connecting Hidden1 and Hidden2 to the output variable.
Then you get data from, say, the census, and run it through your neural network to find out which weights best match the data. You can use genetic algorithms to test different sets of weights. This is useful when you have so many edges that you could not try every possible weighting: you need to find good weights without exhaustively trying every possible set, and a GA lets you "evolve" a good set of weights.
Then you test your weights on data from a different census to see how well they work.
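Putting that toy network and the "evolve the weights" idea into code (the census rows, the targets and the simple mutate-and-select loop are all illustrative):

    import java.util.Random;

    // Toy network from the answer above, with a GA-style random search over
    // the six weights (a bare-bones mutate-and-select loop stands in for a
    // full genetic algorithm with crossover and a population).
    public class TinyNetGA {
        static double predict(double[] in, double[] w) {
            double hidden1 = in[0] * w[0] + in[1] * w[1]; // age, sex
            double hidden2 = in[2] * w[2] + in[3] * w[3]; // country, married
            return hidden1 * w[4] + hidden2 * w[5];       // weights to the output
        }

        static double error(double[][] data, double[] targets, double[] w) {
            double err = 0;
            for (int i = 0; i < data.length; i++) {
                double d = predict(data[i], w) - targets[i];
                err += d * d;
            }
            return err;
        }

        public static void main(String[] args) {
            Random rng = new Random();
            double[][] census = { {30, 0, 1, 1}, {70, 1, 2, 0} }; // made-up rows
            double[] death = { 0.01, 0.05 };                      // made-up targets
            double[] best = new double[6];
            double bestErr = error(census, death, best);
            for (int gen = 0; gen < 10000; gen++) {
                double[] trial = best.clone();
                trial[rng.nextInt(6)] += rng.nextGaussian() * 0.1; // mutate one weight
                double e = error(census, death, trial);
                if (e < bestErr) { bestErr = e; best = trial; }    // keep the fitter
            }
            System.out.println("best error: " + bestErr);
        }
    }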
... my major barrier to understanding this though is understanding how the hidden layer actually works; I don't really understand how a neuron functions and what the weights are for...
Every node in the middle layer is a "feature detector": it will (hopefully) "light up" (i.e., be strongly activated) in response to some important feature in the input. The weights are what emphasize an aspect of the previous layer; that is, the set of input weights to a neuron corresponds to which nodes in the previous layer are important for that feature.
If a weight connecting myInputNode to myMiddleLayerNode is 0, then you can tell that myInputNode is not important to whatever feature myMiddleLayerNode is detecting. If, though, the weight connecting myInputNode to myMiddleLayerNode is very large (either positive or negative), you know that myInputNode is quite important (if it's very negative it means "No, this feature is almost certainly not there", while if it's very positive it means "Yes, this feature is almost certainly there").
So a corollary of this is that you want the number of middle-layer nodes to correspond to how many features are needed to classify the input: with too few middle-layer nodes it will be hard to converge during training (since every middle-layer node will have to "double up" on its feature detection), while too many middle-layer nodes may over-fit your data.
So... a possible use of a genetic algorithm would be to design the architecture of your network! That is, use a GA to set the number of middle-layer nodes and initial weights. Some instances of the population will converge faster and be more robust -- these could be selected for future generations. (Personally, I've never felt this was a great use of GAs since I think it's often faster just to trial-and-error your way into a decent NN architecture, but using GAs this way is not uncommon.)
You might find this wikipedia page on NeuroEvolution of Augmenting Topologies (NEAT) interesting. NEAT is one example of applying genetic algorithms to create the neural network topology.
The best way to explain an Artificial Neural Network (ANN) is to describe the biological process that it attempts to simulate - a neural network. The best example of one is the human brain. So how does the brain work (highly simplified for CS)?
The functional unit (for our purposes) of the brain is the neuron. It is a potential accumulator and "disperser". What that means is that after a certain amount of electric potential (think filling a balloon with air) has been reached, it "fires" (balloon pops). It fires electric signals down any connections it has.
How are neurons connected? Synapses. These synapses can have various weights (in real life due to stronger/weaker synapses from thicker/thinner connections). These weights allow a certain amount of a fired signal to pass through.
You thus have a large collection of neurons connected by synapses - the base representation of your ANN. Note that the input/output structures described by the others are an artifact of the type of problem to which ANNs are applied. Theoretically, any neuron can accept input as well; it just serves little purpose in computational tasks.
So now on to ANNs.
NEURONS: Neurons in an ANN are very similar to their biological counterpart. They are modeled either as step functions (that output "1" once a certain combined input signal is reached, and "0" at all other times) or as slightly more sophisticated firing functions (arctan, sigmoid, etc.) that produce a continuous output, though scaled similarly to a step; the latter is closer to the biological reality.
SYNAPSES: These are extremely simple in ANNs - just weights describing the connections between neurons. They are used simply to weight the neurons connected to the current one, but they still play a crucial role: the synapses are the cause of the network's output. To clarify, training an ANN with a fixed structure and neuron activation function is simply the modification of the synapse weights. That is it. No other change is made in going from a "dumb" net to one that produces accurate results.
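A sketch of the two neuron models just described, each applied to the weighted sum of its inputs (the names are illustrative):

    // The two neuron models described above, applied to a weighted input sum.
    public class Neuron {
        static double weightedSum(double[] inputs, double[] weights, double bias) {
            double sum = bias;
            for (int i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i];
            return sum;
        }

        // Step unit: fires "1" once the combined input passes the threshold.
        static double stepFire(double[] in, double[] w, double bias) {
            return weightedSum(in, w, bias) > 0 ? 1.0 : 0.0;
        }

        // Sigmoid unit: continuous output, scaled similarly to a step.
        static double sigmoidFire(double[] in, double[] w, double bias) {
            return 1.0 / (1.0 + Math.exp(-weightedSum(in, w, bias)));
        }
    }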
STRUCTURE:
There is no "correct" structure for a neural network. The structures are either
a) chosen by hand, or
b) allowed to grow as a result of learning algorithms (a la Cascade-Correlation Networks).
Assuming a hand-picked structure, it is chosen through careful analysis of the problem and the expected solution. With too few "hidden" neurons/layers, your structure is not complex enough to approximate a complex function. With too many, your training time rapidly grows unwieldy. For this reason, the selection of inputs ("features") and the structure of a neural net are, IMO, 99% of the problem. The training and usage of ANNs is trivial in comparison.
To now address your GA concern: it is one of many, many techniques used to train a network by modifying the synapse weights. Why? Because in the end, a neural network's output is simply an extremely high-order surface in N dimensions, and ANY surface optimization technique can be used to solve for the weights; GAs are one such technique. The simple backpropagation method is akin to a dimension-reduced, gradient-based optimization technique.
I'm designing a realtime strategy wargame where the AI will be responsible for controlling a large number of units (possibly 1000+) on a large hexagonal map.
A unit has a number of action points which can be expended on movement, attacking enemy units or various special actions (e.g. building new units). For example, a tank with 5 action points could spend 3 on movement then 2 in firing on an enemy within range. Different units have different costs for different actions etc.
Some additional notes:
The output of the AI is a "command" to any given unit
Action points are allocated at the beginning of a time period, but may be spent at any point within the time period (this is to allow for realtime multiplayer games). Hence "do nothing and save action points for later" is a potentially valid tactic (e.g. a gun turret that cannot move waiting for an enemy to come within firing range)
The game is updating in realtime, but the AI can get a consistent snapshot of the game state at any time (thanks to the game state being one of Clojure's persistent data structures)
I'm not expecting "optimal" behaviour, just something that is not obviously stupid and provides reasonable fun/challenge to play against
What can you recommend in terms of specific algorithms/approaches that would allow for the right balance between efficiency and reasonably intelligent behaviour?
If you read Russell and Norvig, you'll find a wealth of algorithms for every purpose, updated to pretty much today's state of the art. That said, I was amazed at how many different problem classes can be successfully approached with Bayesian algorithms.
However, in your case I think it would be a bad idea for each unit to have its own Petri net or inference engine... there's only so much CPU and memory and time available. Hence, a different approach:
While in some ways perhaps a crackpot, Stephen Wolfram has shown that it's possible to program remarkably complex behavior on a basis of very simple rules. He bravely extrapolates from the Game of Life to quantum physics and the entire universe.
Similarly, a lot of research on small robots is focusing on emergent behavior or swarm intelligence. While classic military strategy and practice are strongly based on hierarchies, I think that an army of completely selfless, fearless fighters (as can be found marching in your computer) could be remarkably effective if operating as self-organizing clusters.
This approach would probably fit a little better with Erlang's or Scala's actor-based concurrency model than with Clojure's STM: I think self-organization and actors would go together extremely well. Still, I could envision running through a list of units at each turn, and having each unit evaluating just a small handful of very simple rules to determine its next action. I'd be very interested to hear if you've tried this approach, and how it went!
EDIT
Something else that was on the back of my mind but that slipped out again while I was writing: I think you can get remarkable results from this approach if you combine it with genetic or evolutionary programming; i.e. let your virtual toy soldiers wage war on each other as you sleep, let them encode their strategies and mix, match and mutate their code for those strategies; and let a refereeing program select the more successful warriors.
I've read about some startling successes achieved with these techniques, with units operating in ways we'd never think of. I have heard of AIs working on these principles having had to be intentionally dumbed down in order not to frustrate human opponents.
First, you should aim to make your game turn-based at some level for the AI (i.e., you can model it as turn-based even if it is not entirely so; in an RTS you may be able to break discrete intervals of time into turns). Second, you should determine how much information the AI should work with: that is, whether the AI is allowed to cheat and know every move of its opponent (thereby making it stronger), or whether it should know less. Third, you should define a cost function over states, the idea being that a higher cost means a worse state for the computer to be in. Fourth, you need a move generator that generates all valid states the AI can transition to from a given state (this may be homogeneous [state-independent] or heterogeneous [state-dependent]).
The thing is, the cost function will be greatly influenced by what exactly you define the state to be. The more information you encode in the state, the better balanced your AI will be, but the harder it will be for it to perform, as it will have to search exponentially more for every additional state variable you include (in an exhaustive search).
If you provide a definition of a state and a cost function your problem transforms to a general problem in AI that can be tackled with any algorithm of your choice.
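As a minimal illustration of that decomposition (the interface and its method names are mine, for illustration):

    import java.util.List;

    // State + cost function + move generator, as described above.
    interface GameState {
        double cost();                // higher cost = worse state for the computer
        List<GameState> successors(); // move generator: all valid next states
    }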
Here is a summary of what I think would work well:
1. Evolutionary algorithms may work well if you put enough effort into them, but they will add a layer of complexity that creates room for bugs, among other things that can go wrong. They will also require extreme amounts of tweaking of the fitness function, etc. I don't have much experience working with them, but if they are anything like neural networks (which I believe they are, since both are heuristics inspired by biological models), you will quickly find they are fickle and far from consistent. Most importantly, I doubt they add any benefit over the option I describe in 3.
2. With the cost function and state defined, it would technically be possible for you to apply gradient descent (assuming the state function is differentiable and the domains of the state variables are continuous). However, this would probably yield inferior results, since the biggest weakness of gradient descent is getting stuck in local minima. To give an example, this method would be prone to something like always attacking the enemy as soon as possible, because there is a non-zero chance of annihilating them. Clearly, this may not be desirable behaviour for a game; however, gradient descent is a greedy method and doesn't know better.
3. This option is my highest recommendation: simulated annealing. SA would (IMHO) have all the benefits of 1 without the added complexity, while being much more robust than 2. In essence, SA is just a random walk amongst the states. So in addition to the cost and the states, you will have to define a way to randomly transition between states (see the sketch at the end of this answer). SA is also not prone to getting stuck in local minima, while producing very good results quite consistently. The only tweaking required with SA is the cooling schedule, which decides how fast SA converges. The greatest advantage of SA, I find, is that it is conceptually simple and empirically produces results superior to most other methods I have tried. Information on SA can be found here, with a long list of generic implementations at the bottom.
3b. (Edit: added much later.) SA and the techniques I listed above are general AI techniques, not specialized for game AI. In general, the more specialized the algorithm, the better its chance of performing well; see the No Free Lunch Theorem [2]. Another extension of 3 is parallel tempering, which dramatically improves the performance of SA by helping it avoid local optima. Some of the original papers on parallel tempering are quite dated [3], but others have been updated [4].
Regardless of which method you choose in the end, it's going to be very important to break your problem down into states and a cost function, as I said earlier. As a rule of thumb, I would start with 20-50 state variables, as your state search space is exponential in the number of these variables.
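Here is a simulated annealing sketch over the GameState interface above, as recommended in option 3 (the temperature, cooling rate and step count are placeholders to be tuned):

    import java.util.List;
    import java.util.Random;

    // Simulated annealing: a random walk over states with a cooling schedule.
    public class AnnealingPlanner {
        static GameState anneal(GameState start, Random rng) {
            GameState current = start;
            double temp = 1.0;                // initial temperature
            for (int i = 0; i < 10000; i++) {
                List<GameState> moves = current.successors(); // assumes non-empty
                GameState next = moves.get(rng.nextInt(moves.size()));
                double delta = next.cost() - current.cost();
                // Always accept improvements; accept worse states with probability
                // exp(-delta/temp), which lets SA escape local minima.
                if (delta < 0 || rng.nextDouble() < Math.exp(-delta / temp)) {
                    current = next;
                }
                temp *= 0.999;                // cooling schedule: geometric decay
            }
            return current;
        }
    }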
This question is huge in scope. You are basically asking how to write a strategy game.
There are tons of books and online articles for this stuff. I strongly recommend the Game Programming Wisdom series and AI Game Programming Wisdom series. In particular, Section 6 of the first volume of AI Game Programming Wisdom covers general architecture, Section 7 covers decision-making architectures, and Section 8 covers architectures for specific genres (8.2 does the RTS genre).
It's a huge question, and the other answers have pointed out amazing resources to look into.
I've dealt with this problem in the past and found the simple-behavior-manifests-complexly / emergent-behavior approach a bit too unwieldy for human design, unless approached genetically/evolutionarily.
I ended up instead using abstracted layers of AI, similar to the way armies work in real life. Units would be grouped with nearby units of the same type into squads, which are grouped with nearby squads to create a mini battalion of sorts. More layers could be used here (grouping battalions in a region, etc.), but ultimately at the top there is the high-level strategic AI.
Each layer can only issue commands to the layer directly below it. The layer below will then attempt to execute the command with the resources at hand (i.e., the layers below that layer).
Examples of commands issued to a single unit are "go here" and "shoot at this target". Commands issued to higher levels would be things like "secure this location", which that level would process and turn into the appropriate commands for the lower levels.
The highest-level master AI is responsible for very broad strategic decisions, such as "we need more ____ units" or "we should aim to move towards this location".
The army analogy works here: commanders, lieutenants, and the chain of command.
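A minimal sketch of that layered structure (class and method names are mine):

    import java.util.ArrayList;
    import java.util.List;

    // Each layer only issues commands to the layer directly below it.
    public class CommandHierarchy {
        interface Commandable { void receive(String command); }

        static class Unit implements Commandable {
            public void receive(String command) {
                // lowest level: "go here", "shoot at this target"
                System.out.println("unit executing: " + command);
            }
        }

        static class Squad implements Commandable {
            List<Commandable> units = new ArrayList<>();
            public void receive(String command) {
                // translate a squad-level order into unit-level orders
                if (command.startsWith("secure")) {
                    for (Commandable u : units) u.receive("go here");
                }
            }
        }

        public static void main(String[] args) {
            Squad squad = new Squad();
            squad.units.add(new Unit());
            squad.receive("secure this location"); // issued by the layer above
        }
    }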