Algorithm for detecting combinations

I am creating a simple intrusion detection system for an Information Security course using jpcap.
One of the features will be remote OS detection, in which I must implement an algorithm that detects when a host sends 5 packets within 20 seconds that have different ACK, SYN, and FIN combinations.
What would be a good method of detecting these different "combinations"? A brute-force algorithm would be time-consuming to implement, but I can't think of a better method.
Notes: jpcap's API tells you whether a packet has the ACK, SYN, and/or FIN flag set. Also note that one doesn't need to know what ACK, SYN, and FIN are in order to understand the problem.
Thanks!

I built my own data structure based on vectors that hold "records" about the type of packet.
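This can be done without brute force. ACK, SYN and FIN are three yes/no flags, so every combination fits in a 3-bit mask with only 8 possible values. "5 different combinations" then becomes "5 distinct small integers in the window". Below is a minimal C sketch for a single host. It assumes timestamps in whole seconds that never decrease, and a fixed-capacity ring buffer. `struct window`, `window_add`, `flag_mask` and the constants are hypothetical names, not jpcap API; the flag booleans would come from whatever jpcap reports for each packet.

```c
#include <stdbool.h>
#include <string.h>

#define WINDOW_SECONDS 20   /* detection window from the question */
#define DISTINCT_LIMIT 5    /* different combinations that trigger an alert */
#define MAX_PACKETS 1024    /* hypothetical ring-buffer capacity per host */

/* Encode the three flags as a 3-bit mask: 8 possible combinations. */
static unsigned flag_mask(bool ack, bool syn, bool fin)
{
    return (ack ? 1u : 0u) | (syn ? 2u : 0u) | (fin ? 4u : 0u);
}

struct window {
    long times[MAX_PACKETS];      /* arrival time of each buffered packet */
    unsigned masks[MAX_PACKETS];  /* flag mask of each buffered packet */
    int head, size;               /* ring buffer: oldest entry at head */
    int count[8];                 /* buffered packets per combination */
    int distinct;                 /* combinations with count > 0 */
};

static void window_init(struct window *w)
{
    memset(w, 0, sizeof *w);
}

/* Record one packet (timestamps must not decrease). Returns true if the host
   has sent at least DISTINCT_LIMIT different combinations in the last
   WINDOW_SECONDS seconds. */
static bool window_add(struct window *w, long now, bool ack, bool syn, bool fin)
{
    /* Drop packets that are WINDOW_SECONDS or more old, or the oldest one if full. */
    while (w->size > 0 &&
           (now - w->times[w->head] >= WINDOW_SECONDS || w->size == MAX_PACKETS)) {
        unsigned old = w->masks[w->head];
        if (--w->count[old] == 0)
            w->distinct--;
        w->head = (w->head + 1) % MAX_PACKETS;
        w->size--;
    }

    unsigned m = flag_mask(ack, syn, fin);
    int tail = (w->head + w->size) % MAX_PACKETS;
    w->times[tail] = now;
    w->masks[tail] = m;
    w->size++;
    if (w->count[m]++ == 0)
        w->distinct++;
    return w->distinct >= DISTINCT_LIMIT;
}
```

Each packet costs amortized O(1) work. Because packets are counted per mask, no pairwise comparison of packets is ever needed.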

You need to keep state for each session, using hash tables: record each SYN, ACK and FIN/FIN-ACK. I wrote an open-source IDS sniffer a few years ago that does this; feel free to look at the code. It should be very easy to write an algorithm for passive OS detection (google it). My open-source code is here: dnasystem

Related

Networking - Data To Be Sent To Server

I'm attempting to make my first multiplayer game (I'm doing this in Ruby with Gosu) and I'm wondering what information to send to the server and how much, if any, of the calculation should be done on the server.
Should the client be used simply for input gathering and drawing while leaving the server to compute everything else? Or should it be more evenly distributed than that?
I'm going to answer my own question with some more experience under my belt for the sake of anyone who might be interested or in need of an answer.
It will depend on what you're doing, but for most games it's best practice to have the client gather inputs and send them to the server, so that the server can do all the required calculations. This makes it much harder for players to cheat with software such as Cheat Engine, because the only values they'd be able to change would be local variables, which have no bearing on the game.
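Here is a sketch of the server-authoritative model in C. The question uses Ruby and Gosu, so this only illustrates the idea. The client sends nothing but which keys are held. The server owns the position and applies its own speed and bounds, so a tampered client cannot report an arbitrary position. `SPEED`, `WORLD_MAX` and the key bits are hypothetical.

```c
#include <stdint.h>

#define SPEED 4        /* hypothetical units moved per server tick */
#define WORLD_MAX 1000 /* hypothetical world bounds: 0..WORLD_MAX */

/* All the client sends: which keys are held, as bit flags. */
enum { KEY_LEFT = 1, KEY_RIGHT = 2, KEY_UP = 4, KEY_DOWN = 8 };

struct player { int x, y; };   /* lives on the server only */

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Server-side tick: movement is derived from the input, so a modified
   client can ask to move but cannot choose where it ends up. */
static void server_apply_input(struct player *p, uint8_t keys)
{
    int dx = 0, dy = 0;
    if (keys & KEY_LEFT)  dx -= SPEED;
    if (keys & KEY_RIGHT) dx += SPEED;
    if (keys & KEY_UP)    dy -= SPEED;
    if (keys & KEY_DOWN)  dy += SPEED;
    p->x = clamp(p->x + dx, 0, WORLD_MAX);
    p->y = clamp(p->y + dy, 0, WORLD_MAX);
}
```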
However, when sending game data from the server to the clients, be careful not to send too much, as it can create a lot of network overhead. Keep the data transferred to the bare minimum needed. On that same note, don't be afraid of adding data to your packets; just make sure you're being efficient.
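One way to keep updates small is a fixed binary layout instead of text. This is a hypothetical 5-byte player update: a 1-byte id, then x and y as big-endian 16-bit values. It assumes coordinates fit in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define UPDATE_SIZE 5   /* hypothetical layout: id, x (2 bytes), y (2 bytes) */

/* Serialize one update; returns the number of bytes written. */
static size_t pack_update(uint8_t out[UPDATE_SIZE], uint8_t id, uint16_t x, uint16_t y)
{
    out[0] = id;
    out[1] = (uint8_t)(x >> 8);     /* big-endian: high byte first */
    out[2] = (uint8_t)(x & 0xFF);
    out[3] = (uint8_t)(y >> 8);
    out[4] = (uint8_t)(y & 0xFF);
    return UPDATE_SIZE;
}

static void unpack_update(const uint8_t in[UPDATE_SIZE], uint8_t *id, uint16_t *x, uint16_t *y)
{
    *id = in[0];
    *x = (uint16_t)((in[1] << 8) | in[2]);
    *y = (uint16_t)((in[3] << 8) | in[4]);
}
```

Big-endian is an arbitrary but explicit choice. What matters is that client and server agree on it.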
Good luck with your projects everyone, and feel free to add to or debate my answer if something isn't up to scratch.

How do you mitigate proposal-number overflow attacks in Byzantine Paxos?

I've been doing a lot of research into Paxos recently, and there is one thing I've always wondered about but haven't seen answered anywhere, so I have to ask.
Paxos includes an increasing proposal number (and possibly also a separate round number, depending on who wrote the paper you're reading). And of course, two would-be leaders can get into duels where each tries to out-increment the other in a vicious cycle. But as I'm working in a Byzantine, P2P environment, it makes me wonder what to do about proposers that would attempt to set the proposal number extremely high, for example to the maximum 32-bit or 64-bit word.
How should a language-agnostic, platform-agnostic Paxos-based protocol deal with integer maximums for proposal number and/or round number? Especially intentional/malicious cases, which make the modular-arithmetic approach of overflowing back to 0 a bit unattractive?
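For reference, the modular-arithmetic approach the question mentions usually means serial-number comparison in the style of RFC 1982: a counter is "newer" if it is ahead by less than half the number space. The sketch below (the function name is hypothetical) also shows the weakness. A Byzantine proposer that jumps ahead by just under 2^31 is "newer" than every honest value, and nothing stops it from doing so again.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is "newer" than b under wrap-around: a is ahead of b by
   less than half of the 32-bit number space. */
static bool serial_newer(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}
```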
From what I've read, I think this is still an open question that isn't addressed in literature.
Byzantine Proposer Fast Paxos addresses denial of service, but only of the sort that would delay message sending through attacks not related to flooding with incrementing (proposal) counters.
Having said that, integer overflow is probably the least of your problems. Instead of thinking about integer overflow, you might want to consider membership attacks (via DoS) first. Learning about membership after consensus from several nodes may be a viable strategy, but it is probably still vulnerable to Sybil attacks at some level.
Another strategy may be to incorporate some proof-of-work system for proposals to limit the flood of requests. However, it's difficult to know what metric to balance this against (for example, the currency you earn for free when you mine the block chain in Bitcoin). It really depends on what type of system you're trying to build. You should consider the value of information in your system, then create a proof-of-work system that costs slightly more to circumvent than that information is worth.
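Here is a minimal hashcash-style sketch of such a proof of work, assuming the work is bound to the proposal number. The proposer must find a nonce so that hash(proposal, nonce) starts with `bits` zero bits, which takes about 2^bits tries on average. An acceptor verifies with a single hash. FNV-1a is used only to keep the example self-contained. It is not cryptographic, and a real scheme would need something like SHA-256. This throttles the rate of proposals; it does not by itself stop a single proposal with a huge number.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a over the 8 bytes of v. NOT cryptographic; illustration only. */
static uint64_t fnv1a_u64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xFF;
        h *= 1099511628211ULL;               /* FNV-1a 64-bit prime */
    }
    return h;
}

static uint64_t pow_hash(uint64_t proposal, uint64_t nonce)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV-1a 64-bit offset basis */
    h = fnv1a_u64(h, proposal);
    return fnv1a_u64(h, nonce);
}

/* Cheap check any acceptor can run (bits must be at most 64). */
static bool pow_verify(uint64_t proposal, uint64_t nonce, unsigned bits)
{
    if (bits == 0)
        return true;
    return (pow_hash(proposal, nonce) >> (64 - bits)) == 0;
}

/* Expensive search the proposer must do: about 2^bits hashes on average. */
static uint64_t pow_solve(uint64_t proposal, unsigned bits)
{
    uint64_t nonce = 0;
    while (!pow_verify(proposal, nonce, bits))
        nonce++;
    return nonce;
}
```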
However, once you have the ability to slow down a proposal counter, you still need to worry about integer maximums in any system with a high number of (valid) operations. You should have a strategy for number wrapping, or a multiple-precision scheme, in place so that you can clearly determine how many years or decades your network can run without overflowing a fixed-precision counter. If you can determine that your system will run for 100 years (or whatever) without overflowing its fixed-precision counter, even with malicious entities, then you can choose to simplify things.
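A quick worked estimate of how long a counter lasts, assuming a constant rate of r increments per second on a b-bit unsigned counter (years of 365 days):

```latex
\begin{aligned}
T_{\text{overflow}} &= \frac{2^{b}}{r} \\
b = 64,\ r = 10^{9}\,\mathrm{s}^{-1}: \quad T &\approx \frac{1.845 \times 10^{19}}{10^{9}}\,\mathrm{s} = 1.845 \times 10^{10}\,\mathrm{s} \approx 585\ \text{years} \\
b = 32,\ r = 10^{3}\,\mathrm{s}^{-1}: \quad T &\approx \frac{4.295 \times 10^{9}}{10^{3}}\,\mathrm{s} = 4.295 \times 10^{6}\,\mathrm{s} \approx 49.7\ \text{days}
\end{aligned}
```

The estimate only holds if increments per step are bounded. A malicious proposer that sets the counter to the maximum in one step consumes the whole range at once, which is why the rate limit alone is not enough.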
On another (important) note, the system model used in most papers doesn't reflect everything that makes a real-life implementation practical (Raft is a nice exception to this). If anything, some authors are guilty of creating a system model that is designed to avoid a hard problem they haven't found an answer to. So, if someone says that X will solve everything, please be aware that they only mean it solves everything in the very specific system model they defined. On the other side of this, you should consider that the system model is closely tied to any statement that says "Y is impossible". A nice example of this concept is the Ben-Or consensus algorithm. It works with completely asynchronous message passing by using nondeterminism (randomization) in the system model's state machine. This avoids the limits specified by the FLP impossibility result, which states that with a deterministic state machine, consensus cannot be guaranteed under completely asynchronous message passing if even one process may crash.
So, you should continue to consider the "impossible" after you read a proof that says it can't be done. Nancy Lynch did a nice writeup on this concept.
I guess what I'm really saying is that a good solution to your question doesn't really exist yet. If you figure it out, please publish it (or let me know if you find an existing paper).

MPI Alltoallv or better individual Send and Recv? (Performance)

I have a number of processes (of the order of 100 to 1000) and each of them has to send some data to some (say about 10) of the other processes. (Typically, but not necessarily always, if A sends to B, B also sends to A.) Every process knows how much data it has to receive from which process.
So I could just use MPI_Alltoallv, with many or most of the message lengths zero.
However, I heard that for performance reasons it would be better to use several MPI_Send and MPI_Recv calls rather than the global MPI_Alltoallv.
What I do not understand: if a series of send and receive calls are more efficient than one Alltoallv call, why is Alltoallv not just implemented as a series of sends and receives?
It would be much more convenient for me (and others?) to use just one global call. Also, with several Send and Recv calls I might have to be concerned about running into a deadlock (fixable by some odd-even strategy, something more complex, or buffered send/recv?).
Would you agree that MPI_Alltoallv is necessarily slower than, say, 10 MPI_Send and MPI_Recv calls? And if so, why, and by how much?
Usually the default advice with collectives is the opposite: use a collective operation when possible instead of coding your own. The more information the MPI library has about the communication pattern, the more opportunities it has to optimize internally.
Unless special hardware support is available, collective calls are in fact implemented internally in terms of sends and receives. But the actual communication pattern will probably not be just a series of sends and receives. For example, using a tree to broadcast a piece of data can be faster than having the same rank send it to a bunch of receivers. A lot of work goes into optimizing collective communications, and it is difficult to do better.
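To illustrate the tree idea, here is one common scheme, a binomial-tree broadcast from rank 0, written as plain C without calling MPI. In round k, every rank that already has the data sends it to the rank 2^k higher. It needs ceil(log2 p) rounds instead of the p - 1 consecutive sends of a single root. Real MPI libraries choose among several algorithms depending on message size and process count, so treat this as a sketch, not as what any particular implementation does. The function names are hypothetical.

```c
/* Rounds needed to reach p ranks: ceil(log2(p)). */
static int tree_rounds(int p)
{
    int rounds = 0;
    while ((1 << rounds) < p)
        rounds++;
    return rounds;
}

/* Parent of rank r: r minus its highest power of two (-1 for the root). */
static int tree_parent(int rank)
{
    if (rank == 0)
        return -1;
    int high = 1;
    while (high * 2 <= rank)
        high *= 2;
    return rank - high;
}

/* Ranks that `rank` forwards the data to, in round order. A non-root rank
   receives in round floor(log2(rank)) and then, in every later round k,
   sends to rank + 2^k if that rank exists. Returns the number of children. */
static int tree_children(int rank, int p, int children[], int max_children)
{
    int first_round = 0;                 /* the root sends from round 0 */
    if (rank > 0) {
        int k = 0;
        while ((2 << k) <= rank)         /* k = floor(log2(rank)) */
            k++;
        first_round = k + 1;
    }
    int n = 0;
    for (int k = first_round; (1 << k) < p && n < max_children; k++) {
        int child = rank + (1 << k);
        if (child < p)
            children[n++] = child;
    }
    return n;
}
```

For p = 1000 that is 10 rounds instead of 999 sends from one rank.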
Having said that, MPI_Alltoallv is somewhat different. It can be difficult to optimize for all irregular communication scenarios at the MPI level, so it is conceivable that some custom communication code can do better. For example, an implementation of MPI_Alltoallv might be synchronizing: it could require that all processes "check in", even if they have to send a 0-length message. I thought that such an implementation was unlikely, but here is one in the wild.
So the real answer is "it depends". If the library implementation of MPI_Alltoallv is a bad match for the task, custom communication code will win. But before going down that path, check if the MPI-3 neighbor collectives are a good fit for your problem.

Best Practice in designing a client/server communication protocol

I am currently integrating server functionality into software that runs a complicated measuring system.
The client will be software from another company that will periodically ask my software for the current state of the system.
Now my question is: what is the best way to design the protocol to provide this state information? There are many different states that have to be transmitted.
I have seen solutions where they define different state flags and then only transfer, for example, a 32-bit number where each bit stands for a different state.
Example:
Bit 0 - System Is online
Bit 1 - Measurement in Progress
Bit 2 - Temperature stabilized
... and so on.
This solution will produce very little traffic. However, it seems very inflexible to me and also very hard to debug.
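The debugging concern can be softened by keeping a named constant for each bit, plus a helper that expands a received word into names for logs. This sketch uses the three bits from the example; the constant and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit assignments from the example list; names are illustrative. */
enum {
    STATE_ONLINE          = 1 << 0,   /* Bit 0 - System is online */
    STATE_MEASURING       = 1 << 1,   /* Bit 1 - Measurement in progress */
    STATE_TEMP_STABILIZED = 1 << 2,   /* Bit 2 - Temperature stabilized */
};

/* Index i names bit i. */
static const char *const state_names[] = {
    "SystemOnline", "MeasurementInProgress", "TemperatureStabilized",
};

/* Write the names of all set bits into buf (len >= 1), comma-separated and
   truncated if buf is too small, so a packed status word can be logged readably. */
static void describe_state(uint32_t word, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof state_names / sizeof state_names[0]; i++) {
        if (word & (UINT32_C(1) << i)) {
            if (buf[0] != '\0')
                strncat(buf, ",", len - strlen(buf) - 1);
            strncat(buf, state_names[i], len - strlen(buf) - 1);
        }
    }
}
```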
The other way I think it could be done is to transfer each state preceded by the name of the state:
Example:
#SystemOnline#1#MeasurementInProgress#0#TemperatureInProgress#0#.....
This solution will produce a lot more traffic, but it appears a lot more flexible because the order in which the states are transferred is irrelevant. It should also be a lot easier to debug.
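Here is a sketch of reading the name/value format from the example, showing that field order does not matter to the reader. It assumes integer values and no '#' characters inside names (there is no escaping). `find_state` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Look up `name` in a message of the form "#Name1#Value1#Name2#Value2#...".
   Unknown names are skipped. Returns true and stores the integer value
   if the name is found. */
static bool find_state(const char *msg, const char *name, long *value)
{
    size_t name_len = strlen(name);
    const char *p = msg;
    while (*p == '#') {
        const char *key = p + 1;
        const char *sep = strchr(key, '#');   /* end of the name */
        if (sep == NULL)
            return false;                     /* truncated message */
        const char *val = sep + 1;
        const char *end = strchr(val, '#');   /* end of the value */
        if (end == NULL)
            return false;
        if ((size_t)(sep - key) == name_len && strncmp(key, name, name_len) == 0) {
            *value = strtol(val, NULL, 10);
            return true;
        }
        p = end;                              /* next field starts at this '#' */
    }
    return false;
}
```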
Does anybody know from experience a good way to solve this problem, or a good source where I can find best practices? I just want to avoid reinventing the wheel.
Once you've made a network request to a remote system, waited for the response, and received and decoded the response, it hardly matters whether the response is 32 bits or 32K. And how many times a second will you be generating this traffic? If less than 1, it matters even less. So use whatever is easiest to implement and most natural for the client, be it a string, or XML.

What kind of problems are state machines good for? [closed]

Closed 10 years ago.
What kind of programming problems are state machines most suited for?
I have read about parsers being implemented using state machines, but would like to find out about problems that scream out to be implemented as a state machine.
The easiest answer is probably that they are suited for practically any problem. Don't forget that a computer itself is also a state machine.
Regardless of that, state machines are typically used for problems where there is some stream of input and the activity that needs to be done at a given moment depends on the last elements seen in that stream at that point.
Examples of this stream of input: some text file in the case of parsing, a string for regular expressions, events such as "player entered room" for game AI, etc.
Examples of activities: be ready to read a number (after a number followed by a + has appeared in the input, in a parser for a calculator), turn around (after the player approached and then sneezed), perform a jumping kick (after the player pressed left, left, right, up, up).
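The combo example maps directly onto a table-driven finite automaton. This sketch uses a hypothetical three-key input set. State n means that the last n inputs match the first n moves of left, left, right, up, up. The table also handles overlaps: a third "left" still leaves "left, left" as the tail.

```c
#include <stdbool.h>

enum input { LEFT, RIGHT, UP, NUM_INPUTS };

/* next_state[state][input]; state 5 means the combo was just completed. */
static const int next_state[6][NUM_INPUTS] = {
    /*              LEFT RIGHT UP */
    /* 0: -    */ { 1,   0,    0 },
    /* 1: L    */ { 2,   0,    0 },
    /* 2: LL   */ { 2,   3,    0 },  /* another LEFT keeps "LL" as the tail */
    /* 3: LLR  */ { 1,   0,    4 },
    /* 4: LLRU */ { 1,   0,    5 },
    /* 5: done */ { 1,   0,    0 },  /* combo fired; start over */
};

/* Feed one input; returns true when the combo has just been completed. */
static bool combo_step(int *state, enum input in)
{
    *state = next_state[*state][in];
    return *state == 5;
}
```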
A good resource is this free State Machine EBook. My own quick answer is below.
When your logic must contain information about what happened the last time it was run, it must contain state.
So a state machine is simply any code that remembers (or acts on) information that can only be gained by understanding what happened before.
For instance, I have a cellular modem that my program must use. It has to perform the following steps in order:
reset the modem
initiate communications with the modem
wait for the signal strength to indicate a good connection with a tower
...
Now I could block the main program and simply go through all these steps in order, waiting for each to run, but I want to give my user feedback and perform other operations at the same time. So I implement this as a state machine inside a function, and run this function 100 times a second.
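#include <stdbool.h>

enum states { reset, initsend, initresponse, waitonsignal, dial, ppp /* , ... */ };

/* Hypothetical driver calls, implemented elsewhere. */
bool modem_reset(void);
void modem_send(const char *cmd);

/* Called ~100 times a second: does one step and returns without blocking. */
void modemfunction(void)
{
    static enum states currentstate = reset;  /* persists between calls */
    enum states nextstate = currentstate;

    switch (currentstate) {
    case reset:
        /* stay in reset until the reset succeeds */
        nextstate = modem_reset() ? initsend : reset;
        break;
    case initsend:
        modem_send("AT");                     /* attention command; "ATD" would dial */
        nextstate = initresponse;
        break;
    /* ... initresponse, waitonsignal, dial, ppp ... */
    default:
        break;
    }
    currentstate = nextstate;
}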
More complex state machines implement protocols. For instance, an ECU diagnostics protocol I used can only send 8-byte packets, but sometimes I need to send bigger packets. The ECU is slow, so I need to wait for a response. Ideally, when I send a message I call one function and then don't care what happens. But somewhere my program must monitor the line and send and respond to these messages, breaking them up into smaller pieces and reassembling the pieces of received messages into the final message.
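Here is a sketch of the split/reassemble part as a small state machine. The frame layout is made up for illustration and is not the author's ECU protocol. Byte 0 holds a "more frames follow" bit and the payload length, and bytes 1-7 carry up to 7 payload bytes. A real protocol would also need sequence numbers, timeouts and flow control.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 8
#define PAYLOAD_PER_FRAME 7   /* byte 0 is the header */
#define MORE_FLAG 0x80        /* header bit: more frames follow */
#define MAX_MESSAGE 256       /* hypothetical maximum message size */

/* Split msg into 8-byte frames; returns the frame count, or 0 if they don't fit. */
static size_t segment(const uint8_t *msg, size_t len, uint8_t frames[][FRAME_SIZE], size_t max_frames)
{
    size_t n = 0, off = 0;
    do {
        size_t chunk = len - off < PAYLOAD_PER_FRAME ? len - off : PAYLOAD_PER_FRAME;
        if (n == max_frames)
            return 0;
        memset(frames[n], 0, FRAME_SIZE);
        frames[n][0] = (uint8_t)(chunk | (off + chunk < len ? MORE_FLAG : 0));
        memcpy(&frames[n][1], msg + off, chunk);
        off += chunk;
        n++;
    } while (off < len);
    return n;
}

/* Receiver: RX_IDLE until a frame arrives, RX_RECEIVING while frames carry
   MORE_FLAG, back to RX_IDLE when the message is complete. */
enum rx_state { RX_IDLE, RX_RECEIVING };

struct reassembler {
    enum rx_state state;
    uint8_t buf[MAX_MESSAGE];
    size_t len;
};

/* Feed one frame; returns true when a whole message is in r->buf[0..r->len). */
static bool reassemble(struct reassembler *r, const uint8_t frame[FRAME_SIZE])
{
    size_t chunk = frame[0] & 0x7F;
    if (r->state == RX_IDLE)
        r->len = 0;                          /* first frame of a new message */
    if (chunk > PAYLOAD_PER_FRAME || r->len + chunk > MAX_MESSAGE) {
        r->state = RX_IDLE;                  /* malformed: drop and resync */
        return false;
    }
    memcpy(r->buf + r->len, &frame[1], chunk);
    r->len += chunk;
    r->state = (frame[0] & MORE_FLAG) ? RX_RECEIVING : RX_IDLE;
    return r->state == RX_IDLE;
}
```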
Stateful protocols such as TCP are often represented as state machines. However, it's rare that you will want to implement anything as a state machine proper. Usually you will use a loose version of one, e.g. carrying out a repeated action while sitting in one state, logging data while it transitions, or exchanging data while remaining in one state.
Objects in games are often represented as state machines. An AI character might be:
Guarding
Aggressive
Patrolling
Asleep
So you can see these might model some simple but effective states. Of course you could probably make a more complex continuous system.
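Here is a sketch of those four states with a transition function. The events and rules are hypothetical, chosen only to show how each (state, event) pair maps to a next state.

```c
enum ai_state { GUARDING, AGGRESSIVE, PATROLLING, ASLEEP };
enum ai_event { SAW_PLAYER, LOST_PLAYER, HEARD_NOISE, SHIFT_OVER };

/* Hypothetical transition rules for the four states above. */
static enum ai_state ai_next(enum ai_state s, enum ai_event e)
{
    switch (e) {
    case SAW_PLAYER:
        return s == ASLEEP ? ASLEEP : AGGRESSIVE;  /* a sleeping guard sees nothing */
    case HEARD_NOISE:
        return s == ASLEEP ? GUARDING : s;         /* noise wakes a sleeper */
    case LOST_PLAYER:
        return s == AGGRESSIVE ? PATROLLING : s;   /* go looking for the player */
    case SHIFT_OVER:
        return s == AGGRESSIVE ? s : ASLEEP;       /* never sleep mid-fight */
    }
    return s;
}
```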
Another example would be a process such as making a purchase on Google Checkout. Google defines a number of financial and order states, informs you of transitions such as the credit card clearing or getting rejected, and allows you to inform it that the order has been shipped.
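Here is a sketch of an order lifecycle that rejects transitions which make no sense in the current state, such as shipping before the card has cleared. The state and event names are hypothetical and are not Google Checkout's actual API.

```c
#include <stdbool.h>

enum order_state { ORDER_NEW, ORDER_CHARGED, ORDER_DECLINED, ORDER_SHIPPED };
enum order_event { CARD_CLEARED, CARD_REJECTED, MARK_SHIPPED };

/* Apply an event; returns false (state unchanged) if the transition is not allowed. */
static bool order_apply(enum order_state *s, enum order_event e)
{
    switch (*s) {
    case ORDER_NEW:
        if (e == CARD_CLEARED)  { *s = ORDER_CHARGED;  return true; }
        if (e == CARD_REJECTED) { *s = ORDER_DECLINED; return true; }
        return false;                 /* cannot ship before payment */
    case ORDER_CHARGED:
        if (e == MARK_SHIPPED)  { *s = ORDER_SHIPPED;  return true; }
        return false;
    case ORDER_DECLINED:
    case ORDER_SHIPPED:
        return false;                 /* terminal states */
    }
    return false;
}
```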
Regular expression matching, parsing, flow control in a complex system.
Regular expressions are a simple form of state machine, specifically finite automata. They have a natural representation as such, although it is possible to implement them using mutually recursive functions.
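As an example of that natural representation, here is the regular expression [0-9]+(\.[0-9]+)? written out by hand as a deterministic finite automaton with four states. The function name is hypothetical.

```c
#include <stdbool.h>

/* Hand-built DFA for the regular expression [0-9]+(\.[0-9]+)? */
static bool is_decimal(const char *s)
{
    enum { START, INT_PART, DOT, FRAC_PART } state = START;
    for (; *s != '\0'; s++) {
        bool digit = *s >= '0' && *s <= '9';
        switch (state) {
        case START:      /* need at least one digit */
            if (!digit) return false;
            state = INT_PART;
            break;
        case INT_PART:   /* more digits, or a single '.' */
            if (*s == '.') state = DOT;
            else if (!digit) return false;
            break;
        case DOT:        /* a digit must follow the '.' */
            if (!digit) return false;
            state = FRAC_PART;
            break;
        case FRAC_PART:  /* only digits from here on */
            if (!digit) return false;
            break;
        }
    }
    return state == INT_PART || state == FRAC_PART;   /* accepting states */
}
```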
State machines, when implemented well, are very efficient.
There is an excellent state machine compiler for a number of target languages, if you want to make a readable state machine.
http://research.cs.queensu.ca/~thurston/ragel/
It also allows you to avoid the dreaded 'goto'.
AI in games is very often implemented using State Machines.
Helps create discrete logic that is much easier to build and test.
Workflow (see WF in .NET 3.0)
They have many uses, parsers being a notable one. I have personally used simplified state machines to implement complex multi-step task dialogs in applications.
A parser example. I recently wrote a parser that takes a binary stream from another program. The meaning of the current element parsed indicates the size/meaning of the next elements. There are a (small) finite number of elements possible. Hence a state machine.
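Here is a sketch of such a parser for a hypothetical tag-length-value stream: a 1-byte tag, a 1-byte length, then that many value bytes. It consumes one byte at a time and remembers where it is, so it works no matter how the stream is chunked when it arrives.

```c
#include <stddef.h>
#include <stdint.h>

/* The byte just parsed decides what the next bytes mean. */
enum tlv_state { WANT_TAG, WANT_LEN, IN_VALUE };

struct tlv_parser {
    enum tlv_state state;
    uint8_t tag, len, got;
    uint8_t value[255];
    size_t records;      /* completed records, for demonstration */
    uint8_t last_tag;    /* tag of the most recently completed record */
};

static void record_done(struct tlv_parser *p)
{
    p->records++;
    p->last_tag = p->tag;
    p->state = WANT_TAG;
}

/* Feed one byte of the stream. */
static void tlv_feed(struct tlv_parser *p, uint8_t byte)
{
    switch (p->state) {
    case WANT_TAG:
        p->tag = byte;
        p->state = WANT_LEN;
        break;
    case WANT_LEN:
        p->len = byte;
        p->got = 0;
        if (p->len == 0)
            record_done(p);          /* empty value: record already complete */
        else
            p->state = IN_VALUE;
        break;
    case IN_VALUE:
        p->value[p->got++] = byte;
        if (p->got == p->len)
            record_done(p);
        break;
    }
}
```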
They're great for modelling things that change status, and have logic that triggers on each transition.
I'd use finite state machines for tracking packages by mail, or to keep track of the different states of a user during the registration process, for example.
As the number of possible status values goes up, the number of transitions explodes. State machines help a lot in that case.
Just as a side note, you can implement state machines with proper tail calls like I explained in the tail recursion question.
In that example, each room in the game is considered one state.
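C does not guarantee tail-call elimination, so the usual C equivalent of the tail-call style is a trampoline. Each state (here, each room) is a function that returns the next state, and a loop keeps calling the current one. This sketch uses three hypothetical rooms and single-character commands.

```c
/* C cannot name a function type that returns itself, so the function
   pointer is wrapped in a struct. */
struct room;
typedef struct room (*room_fn)(char cmd);
struct room { room_fn fn; const char *name; };

static struct room hall(char cmd);
static struct room kitchen(char cmd);
static struct room cellar(char cmd);

static struct room hall(char cmd)
{
    if (cmd == 'n') return (struct room){kitchen, "kitchen"};
    return (struct room){hall, "hall"};
}

static struct room kitchen(char cmd)
{
    if (cmd == 's') return (struct room){hall, "hall"};
    if (cmd == 'd') return (struct room){cellar, "cellar"};
    return (struct room){kitchen, "kitchen"};
}

static struct room cellar(char cmd)
{
    if (cmd == 'u') return (struct room){kitchen, "kitchen"};
    return (struct room){cellar, "cellar"};
}

/* Trampoline: the loop calls whichever state function is current, which is
   what a language with proper tail calls would do implicitly. */
static struct room walk(const char *cmds)
{
    struct room r = {hall, "hall"};
    for (; *cmds != '\0'; cmds++)
        r = r.fn(*cmds);
    return r;
}
```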
Also, hardware design with VHDL (and other logic synthesis languages) uses state machines everywhere to describe hardware.
Any workflow application, especially with asynchronous activities. You have an item in the workflow in a certain state, and the state machine knows how to react to external events by placing the item in a different state, at which point some other activity occurs.
The concept of state is very useful for applications to "remember" the current context of your system and react properly when a new piece of information arrives. Any non-trivial application has that notion embedded in the code through variables and conditionals.
So if your application has to react differently every time it receives a new piece of information, because of the context you are in, you could model your system with a state machine. An example would be how to interpret the keys on a calculator, which depends on what you are processing at that point in time.
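Here is a sketch of the calculator case, supporting only digits, '+' and '='. Depending on the state, the same digit key extends the first operand, extends the second operand, or starts a new calculation. Names are hypothetical.

```c
enum calc_state { FIRST_OPERAND, SECOND_OPERAND, SHOWING_RESULT };

struct calc { enum calc_state state; long a, b; };

/* Process one key press; returns the value the display would show. */
static long calc_key(struct calc *c, char key)
{
    if (key >= '0' && key <= '9') {
        int d = key - '0';
        switch (c->state) {
        case SHOWING_RESULT:                 /* a digit after '=' starts over */
            c->a = d;
            c->state = FIRST_OPERAND;
            return c->a;
        case FIRST_OPERAND:
            c->a = c->a * 10 + d;
            return c->a;
        case SECOND_OPERAND:
            c->b = c->b * 10 + d;
            return c->b;
        }
    } else if (key == '+' || key == '=') {
        if (c->state == SECOND_OPERAND)      /* finish the pending addition */
            c->a += c->b;
        c->b = 0;
        c->state = (key == '+') ? SECOND_OPERAND : SHOWING_RESULT;
        return c->a;                          /* display the running total */
    }
    /* any other key is ignored: redisplay the operand being entered */
    return c->state == SECOND_OPERAND ? c->b : c->a;
}
```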
On the contrary, if your computation does not depend on the context but solely on the input (like a function adding two numbers), you will not need a state machine (or, better said, you will have a trivial state machine with a single state).
Some people design the whole application in terms of state machines, since they capture the essential things to keep in mind in your project, and then use some procedure or autocoders to make them executable. It takes a paradigm change to program this way, but I found it very effective.
Things that come to mind are:
Robot/Machine manipulation... those robot arms in factories
Simulation Games, (SimCity, Racing Game etc..)
Generalizing: when you have a string of inputs where processing any single input requires knowledge of the previous inputs (that is, it needs to have "states").
Not much that I know of that isn't reducible to a parsing problem though.
If you need a simple stochastic process, you might use a Markov chain, which can be represented as a state machine (given the current state, at the next step the chain will be in state X with a certain probability).
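Here is a sketch of the Markov chain view. Instead of sampling one random path, it propagates the probability distribution over states by one step using the transition matrix P, where P[i][j] is the probability of moving from state i to state j. The two-state matrix in the usage example is made up.

```c
#define N_STATES 2

/* One step of a Markov chain: out[j] = sum_i dist[i] * P[i][j].
   Each row of P must sum to 1. */
static void markov_step(const double P[N_STATES][N_STATES],
                        const double dist[N_STATES], double out[N_STATES])
{
    for (int j = 0; j < N_STATES; j++) {
        out[j] = 0.0;
        for (int i = 0; i < N_STATES; i++)
            out[j] += dist[i] * P[i][j];
    }
}
```

Starting from state 0 with P = {{0.9, 0.1}, {0.5, 0.5}}, repeated steps converge to the stationary distribution (5/6, 1/6).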

Resources