What's the difference of Petri Nets and Finite State Machines? - controls

Both of them represent the different states a system can take. So what is the difference of Petri Nets and Finite State Machines? When do I use Petri Nets, and when do I use Finite State Machines?

Standard finite state machine contain only a single current state. Whereas in Petri nets multiple locations, more or less comparable with states in a finite state machine, can contain one or more tokens. A finite state machine is single threaded while a Petri net is concurrent.
In a finite state machine the active state changes in response to an event. In a Petri net transitions are executed as soon as all input locations contain at least one token.
A finite state machine can be considered as a special case of a Petri net.
In general I would recommend using a finite state machine if your process, or the part you wish to represent, is single threaded: fellow software engineers are probably more familiar with finite state machines; and there are more tools to convert a finite state machine to an implementation.
Use a Petri net only when you need the concurrency or extra expressivity. Or when you are modeling a factory plant where half fabricates are transformed into products or when your audience is more familiar with this image.
Perhaps Petri nets can also be used to model, visualize running, massive concurrent systems such as micro service architectures, azure service fabric reliable services and reliable actors, services running on kubernetus, azure function, and AWS Lambda.
In addition, there is more theoretical research about, and using, Petri nets than there is about finite state machines (note that, as I said earlier, finite state machines are reducible to Petri nets).

In State Machines, the state is global. Given two states, all you can say is "these states are different". In Petri Nets, the state is structured by places. The state is a marking, which says how many tokens are in each place. Given two markings, you can compare them and say "they are the same in places X,Y,Z but differ in places U,V,W".
When defining an FSM, you have to look at each state individually and determine the possible transitions to other states. Each transition in a Petri Net represents a whole group of transitions in the underlying reachability graph. For example, a Petri Net transition might say: From every marking that has a token in P1 and a token in P2, this model can reach a marking that has one token less in P1, and one token less in P2, but one token more in P3. If the reachability graph has 8, or 800, markings with that property, the single Petri Net transition represents 8, or 800, transitions in the reachability graph.
In Petri Net models, you can create transition invariants. Those are cycles in the reachability graph. Then you can put more tokens into the initial marking of the model, and the number of states in the reachability graph explodes. Its structure is still given by the same cycles as in a model with less tokens though, and the Petri Net model remains understandable.
For example, think of a Client/Server system. You have places for the Clients, places for the Servers, places for the messages flowing back and forth. Then you just put in tokens for the numbers of Clients and Servers you want to model. They are easily changed.
As for when to use what, I agree with Kasper van den Berg.
If you have a problem that's small enough to be handled with an FSM, then use an FSM. Maybe up to two dozen states?
If you have a problem that naturally maps to an FSM, then use an FSM. You'll probably use an algorithm to construct the FSM in such cases. For example, parsing input with regular expressions. (Btw, many regular expression libraries have extensions that require at least a stack machine for processing.)
If you need to create a model for distinguishable subsystems that interact with eachother, use Petri Nets. Then you can have a set of places for each subsystem, whereas an FSM would require you to create a new state for every possible combination of every substate in each subsystem.

Related

Petri nets modelisation

I am face a problem that took me a lot of time and I do not resolve it yet, the problem is how does modelisation look like with Petri nets for an application writen in python? and if there are any exemple of couple of code and Petri nets representation(modelisation) please show me, so thank's for all of you.
I know that Petri nets compose of arrows, states(places) and transitions(events)
You cannot just translate any random program - no matter what language - into a Petri net model. Models are abstractions, so you have to decide which states of your program are important enough to become part of the model. Then you have to figure out how to represent these states with tokens in places. Then you have to figure out how to describe the state changes with transitions.
Does this sound all too vague to you? That's because your question is too broad. Your Python application could be a text editor, an HTTP server, a particle simulator, a chess game. We cannot help you to create a model without knowing what you're trying to model.
It is possible to model the logic of an application as a Petri Net even if you already encoded the application logic in a programming language. It is also possible to create a computer program (e.g. JavaScript) based on a Petri Net model of the application logic.
One way to model an application program as a Petri Net would be to consider the variables of the program as marks of places, weights of inputs or weights of outputs; and computations as logic annotations of inputs, outputs and transitions.
Roland Weber asked a very good question "which states of your program are important enough to become part of the model?" If you consider every variable and every computation as important then you might end up with a model that is too large for this exercise. Thus consider a part of the application program that is small enough for this exercise.
"A Petri Net Model for the Euclidean Algorithm" explains the relations between the variables of a function for the greatest commond divisor with the elements of a Petri Net using a dynamic and interactive diagram and discusses the various annotations associated with the Petri Net model.

Representing parallel operations in flowcharts

I need to do a flowchart of a hydraulic system featuring a temperature regulation module. However, temperature regulation is only activated during one part of the cycle. During this part, the system continues to perform other operations.
I want to represent that in my flowchart diagram. I need a way to tell when the parallel algorithm begins and when it ends within the main flowchart. How do I do that ?
Add two new flowchart nodes/operators fork and join.
Fork takes one control flow coming in, and produces two going out, similar to the diamond decision node in regular flowcharts. Unlike the decision node, the fork node passes control to both of its children, meaning that "parallel execution starts here".
Join takes two control flows in, and produces one control flow out. It means,
"wait for execution on both input branches to complete, then pass control to the output branch". This means "parallel execution stops here".
Now your flowchart can have serial and parallel sections; you can even have the parallel sections generate additional, nested parallelism.
Now you need a graphical representation of fork and join.
We represent the control flow graph of real computer programs with essentially a flowchart ("actions" and "decisions"). Because of the similarity of "fork" to "decision" (one input, two outputs) we chose to draw "fork" as an upward facing triangle (one input at the top, two outputs at the triangle base endpoints), and "join" as a downward facing triangle, with two inputs at the triangle base endpoints and one output at the peak of the downward facing triangle.
You can see automatically generated examples of this for C and COBOL programs. You might not think these contain parallelism... but, in fact, the langauge semantics of many languages is often unspecified, allowing parallel execution in effect (C function arguments according to the standard can be evaluated in arbitrary order, for example). We model this nondeterminism as "parallelism" because the net effect is the same.
People building industrial control software also need to express this. They
simply split the control flow line going from one flow graph item to another. See Sequential function charts in this document.
The most general "control flow" notation I know are Colored Petri Nets. These can model not just control flow, but data flow, synchronization, and arithmetic, which means you can model systems with very complex flows. You can model a regular flowchart directly with a CPN. CPNs also generalize finite state machines. What that means for programmers, is that if you don't know about CPNs, you should learn about these now. And you discover that "flowcharts" (as CPNs) are now useful again in discussing systems whose parts run asynchronously.

How to preserve "building blocks" of neural net?

I am making a kind of neural network, with neurons and "synapses". It kind of resembles turings type b nets, connections can go anywhere. It starts with a randomly generated net that has random connections between the neurons. There are both electrical and chemical variant with different effects on the neurons. To the point:
A net is basically a series of neurons with connections to other neurons. I cant figure out how to do "crossover" to form new generations of nets based on the best performing parents. More specifically, if I combine them based on single connections, I will break any potential "structure" or function that may have formed from a certain set of neurons and connections.
I considered splitting the network map, say, taking half from one parent and half from the other, but that may still break any potential functions that may have been created.
It is higly likely that I am missing something, I am learning this as I go.
Is there some way of doing this?
If you are evolving the network structure and weights, there is an excellent algorithm called NEAT.
If you are evolving the weights only, you have several possibilites, but the most basic one is use the weight matrix of the network graph as a genotype. Then, crossover can be done using any continuous GA crossover method, like SBX or BLX-alpha.
The problem of breaking functionality (most often by mutation) is common and can be solved by e.g. fitness sharing (NEAT uses it) or some other mechanism which protects modified individuals for certain amount of time.

Neural Network Basics

I'm a computer science student and for this years project, I need to create and apply a Genetic Algorithm to something. I think Neural Networks would be a good thing to apply it to, but I'm having trouble understanding them. I fully understand the concepts but none of the websites out there really explain the following which is blocking my understanding:
How the decision is made for how many nodes there are.
What the nodes actually represent and do.
What part the weights and bias actually play in classification.
Could someone please shed some light on this for me?
Also, I'd really appreciate it if you have any similar ideas for what I could apply a GA to.
Thanks very much! :)
Your question is quite complex and I don't think a small answer will fully satisfy you. Let me try, nonetheless.
First of all, there must be at least three layers in your neural network (assuming a simple feedforward one). The first is the input layer and there will be one neuron per input. The third layer is the output one and there will be one neuron per output value (if you are classifying, there might be more than one f you want to assign a "belong to" meaning to each neuron).. The remaining layer is the hidden one, which will stand between the input and output. Determining its size is a complex task as you can see in the following references:
comp.ai faq
a post on stack exchange
Nevertheless, the best way to proceed would be for you to state your problem more clearly (as weel as industrial secrecy might allow) and let us think a little more on your context.
The number of input and output nodes is determined by the number of inputs and outputs you have. The number of intermediate nodes is up to you. There is no "right" number.
Imagine a simple network: inputs( age, sex, country, married ) outputs( chance of death this year ). Your network might have a 2 "hidden values", one depending on age and sex, the other depending on country and married. You put weights on each. For example, Hidden1 = age * weight1 + sex * weight2. Hidden2 = country * weight3 + married * weight4. You then make another set of weights, Hidden3 and Hidden4 connecting to the output variable.
Then you get a data from, say the census, and run through your neural network to find out what weights best match the data. You can use genetic algorithms to test different sets of weights. This is useful if you have so many edges you could not try every possible weighting. You need to find good weights without exhaustively trying every possible set of weights, so GA lets you "evolve" a good set of weights.
Then you test your weights on data from a different census to see how well it worked.
... my major barrier to understanding this though is understanding how the hidden layer actually works; I don't really understand how a neuron functions and what the weights are for...
Every node in the middle layer is a "feature detector" -- it will (hopefully) "light up" (i.e., be strongly activated) in response to some important feature in the input. The weights are what emphasize an aspect of the previous layer; that is, the set of input weights to a neuron correspond to what nodes in the previous layer are important for that feature.
If a weight connecting myInputNode to myMiddleLayerNode is 0, then you can tell that myInputNode is not important to whatever feature myMiddleLayerNode is detecting. If, though, the weight connecting myInputNode to myMiddleLayerNode is very large (either positive or negative), you know that myInputNode is quite important (if it's very negative it means "No, this feature is almost certainly not there", while if it's very positive it means "Yes, this feature is almost certainly there").
So a corollary of this is that you want the number of your middle-layer nodes to have a correspondence to how many features are needed to classify the input: too few middle-layer nodes and it will be hard to converge during training (since every middle-layer node will have to "double up" on its feature-detection) while too many middle-layer nodes may over-fit your data.
So... a possible use of a genetic algorithm would be to design the architecture of your network! That is, use a GA to set the number of middle-layer nodes and initial weights. Some instances of the population will converge faster and be more robust -- these could be selected for future generations. (Personally, I've never felt this was a great use of GAs since I think it's often faster just to trial-and-error your way into a decent NN architecture, but using GAs this way is not uncommon.)
You might find this wikipedia page on NeuroEvolution of Augmenting Topologies (NEAT) interesting. NEAT is one example of applying genetic algorithms to create the neural network topology.
The best way to explain an Artificial Neural Network (ANN) is to provide the biological process that it attempts to simulate - a neural network. The best example of one is the human brain. So how does the brain work (highly simplified for CS)?
The functional unit (for our purposes) of the brain is the neuron. It is a potential accumulator and "disperser". What that means is that after a certain amount of electric potential (think filling a balloon with air) has been reached, it "fires" (balloon pops). It fires electric signals down any connections it has.
How are neurons connected? Synapses. These synapses can have various weights (in real life due to stronger/weaker synapses from thicker/thinner connections). These weights allow a certain amount of a fired signal to pass through.
You thus have a large collection of neurons connected by synapses - the base representation for your ANN. Note that the input/output structures described by the others are an artifact of the type of problem to which ANNs are applied. Theoretically, any neuron can accept input as well. It serves little purpose in computational tasks however.
So now on to ANNs.
NEURONS: Neurons in an ANN are very similar to their biological counterpart. They are modeled either as step functions (that signal out "1" after a certain combined input signal, or "0" at all other times), or slightly more sophisticated firing sequences (arctan, sigmoid, etc) that produce a continuous output, though scaled similarly to a step. This is closer to the biological reality.
SYNAPSES: These are extremely simple in ANNs - just weights describing the connections between Neurons. Used simply to weight the neurons that are connected to the current one, but still play a crucial role: synapses are the cause of the network's output. To clarify, the training of an ANN with a set structure and neuron activation function is simply the modification of the synapse weights. That is it. No other change is made in going from a a "dumb" net to one that produces accurate results.
STRUCTURE:
There is no "correct" structure for a neural network. The structures are either
a) chosen by hand, or
b) allowed to grow as a result of learning algorithms (a la Cascade-Correlation Networks).
Assuming the hand-picked structure, these are actually chosen through careful analysis of the problem and expected solution. Too few "hidden" neurons/layers, and you structure is not complex enough to approximate a complex function. Too many, and your training time rapidly grows unwieldy. For this reason, the selection of inputs ("features") and the structure of a neural net are, IMO, 99% of the problem. The training and usage of ANNs is trivial in comparison.
To now address your GA concern, it is one of many, many efforts used to train the network by modifying the synapse weights. Why? because in the end, a neural network's output is simply an extremely high-order surface in N dimensions. ANY surface optimization technique can be use to solve the weights, and GA are one such technique. The simple backpropagation method is alikened to a dimension-reduced gradient-based optimization technique.

Machine Learning Algorithm for Peer-to-Peer Nodes

I want to apply machine learning to a classification problem in a parallel environment. Several independent nodes, each with multiple on/off sensors, can communicate their sensor data with the goal of classifying an event as defined by a heuristic, training data or both.
Each peer will be measuring the same data from their unique perspective and will attempt to classify the result while taking into account that any neighbouring node (or its sensors or just the connection to the node) could be faulty. Nodes should function as equal peers and determine the most likely classification by communicating their results.
Ultimately each node should make a decision based on their own sensor data and their peers' data. If it matters, false positives are OK for certain classifications (albeit undesirable) but false negatives would be totally unacceptable.
Given that each final classification will receive good or bad feedback, what would be an appropriate machine learning algorithm to approach this problem with if the nodes could communicate with each other to determine the most likely classification?
If the sensor data in each individual node is generally sufficient to make a reasonable decision, they could just communicate the result and take a majority vote. If majority vote is not appropriate, you could train an additional classifier that uses the outputs of the nodes as its feature vector.
Since you want to have on-line supervised learning with feedback, you could use a neural network with backpropagation or an incremental support vector machine that adds the errors to the training set. Look into classifier biasing to deal with false-positive/false-negative trade-off.
In this instance, a neural network could be very appropriate. The inputs to the network would be each of the sensors onboard the node, along with that of its neighbors. You would calculate weights based on your feedback.
Another option (that is simpler, but can achieve good results as well) is a Gossip Algorithm. You would have to look into incorporating feedback though.

Resources