socket library poll vs custom poll - algorithm

So I was having some (arguably) fun with sockets (in C) when I came across the problem of receiving asynchronously.
As stated here, select and poll do a linear search across the sockets, which does not scale very well. Then I thought: can I do better, knowing application-specific behaviour of the sockets?
For instance, if
X_n: the time of arrival of the n-th datagram for socket X (for simplicity, let's assume time is discrete)
Pr(X_n = x_n | X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, ...): the probability that X_n = x_n, given the previous arrival times
is known (by statistics, assumption, or whatever), I could then implement an algorithm that polls sockets in order of decreasing probability.
The question is, is this an insane attempt? Does the library poll/select have some advantage that I can't beat from user space?
EDIT: to clarify, I don't mean to duplicate the semantics of poll and select; I just want a working way of finding at least one socket that is ready to receive.
Also, stuff like epoll exists and all that, which I think is most likely superior, but I want to seek out any possible alternatives first.

Does the library poll/select have some advantage that I can't beat from user space?
The C library runs in userspace, too, but its select() and poll() functions almost certainly are wrappers for system calls (but details vary from system to system). That they wrap single system calls (where in fact they do so) does give them a distinct advantage over any scheme involving multiple system calls, such as I imagine would be required for the kind of approach you have in mind. System calls have high overhead.
All of that is probably moot, however, if you have in mind to duplicate the semantics of select() and poll(): specifically, that when they return, they provide information on all the files that are ready. In order to do that, they must test or somehow watch every specified file, and so, therefore, must your hypothetical replacement. Since you need to scan every file anyway, it doesn't much matter what order you scan them in; a linear scan is probably an ideal choice because it has very low overhead.
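To make the system-call point concrete, here is a minimal sketch using Python's select.poll (a thin wrapper over the same poll(2) call the C library exposes): one blocking call reports readiness for every registered socket, and you then touch only the sockets the kernel says are ready. The ports and socket count are made up for illustration.

```python
import select
import socket

# Two illustrative UDP sockets (ports are arbitrary for this sketch).
socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
for i, s in enumerate(socks):
    s.bind(("127.0.0.1", 9000 + i))

# Register every socket once; each subsequent wait is a single poll(2) syscall,
# no matter how many sockets are registered.
poller = select.poll()
fd_to_sock = {s.fileno(): s for s in socks}
for s in socks:
    poller.register(s, select.POLLIN)

# Block until at least one socket is readable (timeout in milliseconds),
# then service only the ready ones.
for fd, _event in poller.poll(1000):
    data, addr = fd_to_sock[fd].recvfrom(4096)
    print(f"datagram from {addr}: {data!r}")
```

A user-space "probe the most likely socket first" scheme would instead need one non-blocking receive, i.e. one system call, per socket it probes, which is exactly the overhead described above.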

Related

How is Monte Carlo Tree Search implemented in practice

I understand, to a certain degree, how the algorithm works. What I don't fully understand is how the algorithm is actually implemented in practice.
I'm interested in understanding what optimal approaches would be for a fairly complex game (maybe chess). i.e. recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)?
-- What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)
-- If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?
recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)
In MCTS, there's not much of a point in a recursive implementation (which is common in other tree search algorithms, like the minimax-based ones), because you always move through a game sequentially, from the current game state (the root node) to the game states you choose to evaluate (terminal game states, unless you go with a non-standard implementation that uses a depth limit on the play-out phase and a heuristic evaluation function). The much more obvious implementation using while loops is just fine.
If it's your first time implementing the algorithm, I'd recommend just going for a single-threaded implementation first. It is a relatively easy algorithm to parallelize though, there are multiple papers on that. You can simply run multiple simulations (where simulation = selection + expansion + playout + backpropagation) in parallel. You can try to make sure everything gets updated cleanly during backpropagation, but you can also simply decide to not use any locks / blocking etc. at all, there's already enough randomness in all the simulations anyway so if you lose information from a couple of simulations here and there due to naively-implemented parallelization it really doesn't hurt too much.
As for data structures, unlike algorithms like minimax, you actually do need to explicitly build a tree and store it in memory (it is built up gradually as the algorithm is running). So, you'll want a general tree data structure with Nodes that have a list of successor / child Nodes, and also a pointer back to the parent Node (required for backpropagation of simulation outcomes).
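As a rough sketch of that structure and of the loop-based (non-recursive) flow described above, in Python: the Node keeps child and parent links plus visit statistics, and each simulation runs selection, expansion (one new node), play-out and backpropagation. The game interface (legal_moves, apply, is_terminal, result) is assumed, the tree policy shown is plain UCB1, and details such as alternating players and flipping the reward sign between levels are glossed over.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state                              # game state at this node
        self.parent = parent                            # needed for backpropagation
        self.children = []                              # grown lazily during expansion
        self.untried_moves = list(state.legal_moves())  # assumed game API
        self.visits = 0
        self.total_reward = 0.0

    def best_child(self, c=1.4):
        # UCB1: exploit high average reward, explore rarely-visited children.
        return max(
            self.children,
            key=lambda ch: ch.total_reward / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )

def mcts(root_state, iterations):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and has children.
        while not node.untried_moves and node.children:
            node = node.best_child()
        # Expansion: add exactly one new node per simulation.
        if node.untried_moves:
            move = node.untried_moves.pop()
            child = Node(node.state.apply(move), parent=node)
            node.children.append(child)
            node = child
        # Play-out: random moves to the end of the game; nothing is stored.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_moves()))
        reward = state.result()
        # Backpropagation: walk the parent pointers back up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state  # state reached by the most-visited move
```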
What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)
Running across many cores can be done yes (see point about parallelization above). I don't see any part of the algorithm being particularly well-suited for GPU implementations (there are no large matrix multiplications or anything like that), so GPU is unlikely to be interesting.
If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?
In the most commonly-described implementation, the algorithm creates only one new node to store in memory per iteration/simulation, in the expansion phase (the first node encountered after the selection phase). All other game states generated in the play-out phase of the same simulation do not get nodes in memory at all. This keeps memory usage in check; your tree only grows relatively slowly (at a rate of one node per simulation). It does mean you get slightly less re-use of previously-simulated branches, because you don't store everything you see in memory. You can choose to implement a different strategy for the expansion phase (for example, create new nodes for all game states generated in the play-out phase), but you'll have to carefully monitor memory usage if you do.

Algorithm that costs time to run but easy to verify?

I am designing a website for an experiment. There will be a button which the user must click and hold for a while, then release; the client then submits an AJAX event to the server.
However, to prevent autoclick bots and fast spam, I want the hold time to be very real and not skippable, e.g. enforced by some calculation. The point is to waste actual CPU time, so that you can't simply guess the AJAX callback value or run a faster system clock to bypass it.
Is there any algorithm that
is fast & easy to generate as a challenge on the server,
costs some time to execute on the client side, with no way to spoof or shortcut that time, and
is easy & fast to verify on the server?
You're looking for a Proof-of-work system.
The most popular algorithm seems to be Hashcash (also on Wikipedia), which is used by Bitcoin, among other things. The basic idea is to ask the client program to find a hash with a certain number of leading zeroes, which is a problem it has to solve by brute force.
Basically, it works like this: the client has some sort of token. For email, this is usually the recipient's email address and today's date. So it could look like this:
bob@example.com:04102011
The client now has to find a random string to put in front of this:
asdlkfjasdlbob@example.com:04102011
such that the hash of this has a bunch of leading 0s. (My example won't work because I just made up a number.)
Then, on your side, you just have to take this random input and run a single hash on it, to check if it starts with a bunch of 0s. This is a very fast operation.
The reason that the client has to spend a fair amount of CPU time on finding the right hash is that it is a brute-force problem. The only known way to do it is to choose a random string, test it, and if it doesn't work, choose another one.
Of course, since you're not doing emails, you will probably want to use a different token of some sort rather than an email address and date. However, in your case, this is easy: you can just make up a random string server-side and pass it to the client.
One advantage of this particular algorithm is that it's very easy to adjust the difficulty: just change how many leading zeroes you want. The more zeroes you require, the longer it will take the client; however, verification still takes the same amount of time on your end.
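As a hedged illustration of the scheme described above: a small Python sketch using SHA-256, with difficulty counted in leading zero bits and the token being whatever random string the server hands out. The names and the difficulty value are illustrative.

```python
import hashlib
import os
import secrets

def leading_zero_bits(digest: bytes) -> int:
    # Count how many leading bits of the digest are zero.
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(token: str, difficulty: int) -> str:
    # Client side: brute-force a random prefix until the hash is "small" enough.
    while True:
        nonce = os.urandom(8).hex()
        digest = hashlib.sha256((nonce + token).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(token: str, nonce: str, difficulty: int) -> bool:
    # Server side: a single hash, so checking the response is cheap.
    digest = hashlib.sha256((nonce + token).encode()).digest()
    return leading_zero_bits(digest) >= difficulty

token = secrets.token_hex(16)         # made-up server-issued challenge
nonce = solve(token, difficulty=18)   # roughly 2**18 hashes of client work on average
assert verify(token, nonce, 18)
```

Raising the difficulty by one bit roughly doubles the client's expected work, while verification stays a single hash.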
Came back to answer my own question. This is called a Verifiable Delay Function (VDF).
The concept was first proposed in 2018 by Boneh et al., who proposed several candidate constructions for verifiable delay functions; it is an important tool for adding a time delay in decentralized applications. To be exact, a verifiable delay function is a function f: X → Y that takes a prescribed wall-clock time to compute, even on a parallel processor, and outputs a unique result that can be verified efficiently. In short, even if it is evaluated on a large number of parallel processors, it still requires evaluating f in a specified number of sequential steps.
https://www.mdpi.com/1424-8220/22/19/7524
The idea of a VDF is a step beyond @TikhonJelvis's PoW answer because apparently it "takes a prescribed wall-clock time to compute, even on a parallel processor".
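For intuition only: the best-known VDF constructions (Pietrzak, Wesolowski) evaluate y = x^(2^T) mod N for an RSA-style modulus N, a chain of T squarings that is believed to be inherently sequential, so extra cores don't help. The sketch below shows just that evaluation step with a toy, insecure modulus; a real VDF also attaches a succinct proof so the verifier does not have to redo the squarings, which is omitted here.

```python
def vdf_eval(x: int, T: int, N: int) -> int:
    # y = x^(2^T) mod N via T successive squarings; each step depends on the
    # previous one, so the work cannot be spread across parallel processors.
    y = x % N
    for _ in range(T):
        y = (y * y) % N
    return y

# Toy parameters, purely illustrative: a real deployment uses a large modulus of
# unknown factorization and ships a Wesolowski/Pietrzak proof with the output.
N = 1_000_003 * 1_000_033
challenge = 123_456_789
delay_steps = 1_000_000
print(vdf_eval(challenge, delay_steps, N))
```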

How to Compare the Running Times of Two Data Structures' Operations

I want to compare the performance of two search trees of integers (an AVL tree vs a red-black tree). How should I design/engineer the tests to accomplish this? For instance, let's consider the insert operation: what steps should I follow in order to state that, on average, this operation is faster in the RB case? Should I time inserting just one element (assuming the trees are pre-populated), or should I time a sequence of insertions? Also, what considerations should I take into account to measure CPU time accurately?
Thanks in advance.
This is a really broad question, and as such, I don't think you should be hoping for anybody to get on here and give you the one final correct answer regarding how to measure performance. That being said...
First, you should develop a suite of tests. Two popular techniques exist for doing this: monitor a real-world sequence of operations done by an application (so, find some open source application that uses either an AVL or RB tree, and add some code to print out the sequence of operations it performs) or create such a stream of operations analytically (or synthetically) to target any number of cases (the average usage, particular kinds of abnormal or otherwise unusual usage, random usage, etc.). The more of these kinds of traces you get to test, the better.
Once you have your set of traces to test, you need to develop a driver to do the evaluation. The driver should be simple, the same for both AVL and RB trees (I think that in this case, this shouldn't be a problem; both present the same interface to users, differing only in terms of internal implementation details). The driver should be able to reproduce the usage recorded in your trace sets efficiently and cause the traced operations to be carried out on your data structures. One thing I like to do is to include a third "dummy" candidate that does nothing; this way, I can see how much of an influence the processing of traces is exerting on overall performance.
Each trace should be executed many, many times. You can formalize this somewhat (to reduce statistical uncertainty to within known bounds), but a rule of thumb is that the order of your error will shrink according to 1/sqrt(n), where n is the number of trials. In other words, by running each trace 10,000 times instead of 100 times, you will get errors in the average that are 10x smaller. Record all values; things to look for are the mean, median, mode(s), etc. For each run, try to keep the system conditions the same; no other programs running, etc. To help eliminate spurious results due to external factors changing, you can cull the bottom and top 10% of outliers...
Now, simply compare the data sets. Perhaps what you care most about is the average time the trace takes? Perhaps the worst? Maybe what you really care about is consistency; is the standard deviation big or small? You should have enough data to compare the results for a given trace executed on both test structures; and for different traces, it might make more sense to look at different figures (for instance, if you created a synthetic benchmark that should be the worst case for RB trees, you might ask how badly RB and AVL trees did, whereas you might not care about this for another trace representing the best case for AVL trees, etc.)
Timing on the CPU can be a challenge in its own right. You'll need to ensure that the resolution of your timer is sufficient for measuring your events. clock() and gettimeofday() - and others - are popular choices for recording the time of events. If your traces finish too quickly, you can record the aggregate time for several trials (so that if your timer supports microsecond resolution and your traces finish in 10 microseconds, you can measure 1,000 executions of the trace instead of 1 and get time values on the order of 10 milliseconds, which should be accurate).
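As a sketch of the repeat-and-aggregate idea (shown in Python with time.perf_counter for brevity; a C driver would do the same with clock_gettime() or gettimeofday()): each sample times a whole batch of trace replays so it sits well above the timer's resolution, and per-replay statistics are computed afterwards. run_trace, the trace format and the tree classes are placeholders for your real driver.

```python
import statistics
import time

def run_trace(tree, trace):
    # Placeholder driver: replay a recorded sequence of (operation, key) pairs.
    for op, key in trace:
        if op == "insert":
            tree.insert(key)
        elif op == "delete":
            tree.delete(key)
        else:
            tree.search(key)

def time_trace(make_tree, trace, batch=1000, trials=30):
    # Each sample times `batch` replays, so even 10-microsecond traces yield
    # measurements in the millisecond range; several samples give a spread.
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(batch):
            run_trace(make_tree(), trace)   # fresh tree per replay, same cost for both candidates
        samples.append((time.perf_counter() - start) / batch)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }

# Usage, with whatever AVL / red-black implementations you are comparing:
# print(time_trace(AVLTree, trace))
# print(time_trace(RedBlackTree, trace))
```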
Another potential pitfall is providing the same execution environment each time. In between trace runs, at the very least, you might consider techniques for ensuring that you start with a clean cache. Either that, or don't time the first execution, or understand that this result might be culled when you eliminate outliers. It might be safer to just reset the cache (by manipulating every element of some large array, for instance in between executions of traces), since code A might benefit from having some of the values in cache while code B might suffer.
These are a few of the things you might consider when doing your own performance evaluation. Other tools - like PAPI and other profilers, for instance - can measure certain events - cache hits/misses, instructions, etc. - and this information can allow for much richer comparisons than simple comparisons of wall-clock run time.
Measuring CPU time accurately can be very tricky depending on your particular programming language, implementation, etc. For example, with Java's JIT compilation, the results can be extremely different depending on how much you've run the code before now!
Can you give more detail about your situation?

An explanation of the packet-pair probing algorithm in plain language

Networked applications often benefit from the ability to estimate the bandwidth between two end-points on the Internet. This may be advantageous not only for rate control purposes, but also in isolating preferred connections where a number of alternatives exist.
Although there are a couple of rigorous treatments of packet-pair probing, a summary of the high-level principles and salient points, covering both the how and the why of the method would be very beneficial; even if only to serve as a bootstrap to more in-depth study.
Any pointers to implementations or usage of packet-pair probing that serve as good examples would also be much appreciated.
Update:
I found some good soft introductory material in a USENIX paper derived from work on the nettimer tool; in particular, the discussion concerning the use of cross-talk filters and sampling windows for increased agility makes a lot of sense.
About high-level principles: traditional means of estimating bandwidth send one packet to the target and wait for it to return, then send another packet and wait for it, and so on, in a sequential way. One then computes some kind of average/median of the round-trip time per kilobyte (or any other unit). This information is then set against the theoretical maximum bandwidth (when available) to estimate the available unused bandwidth.
Packet-pair probing sends a group of packets to the target at once (i.e., in parallel) and waits for them to return. A similar average/median is then computed and evaluated against the maximum theoretical bandwidth.
If you send more packets at once, you are disturbing the system you are trying to measure and have to take this into account in your estimates, but it is faster than the one-by-one method and feels more like a snapshot. The bottom-line question is: what is the trade-off between measurement accuracy and speed of measurement in the two cases, and is there any value in that trade?
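To make one piece of this concrete: in the classic packet-pair formulation, two equal-sized packets sent back to back get spread out by the narrowest link, and the spacing between their arrivals at the receiver gives a capacity estimate of roughly packet size divided by that dispersion. A minimal sketch of the calculation, with made-up numbers:

```python
def packet_pair_capacity(size_bytes, t_first_arrival, t_second_arrival):
    # Bottleneck capacity estimate from the dispersion of two back-to-back
    # packets of equal size: spacing at the receiver ~ size / capacity.
    dispersion = t_second_arrival - t_first_arrival   # seconds
    if dispersion <= 0:
        raise ValueError("second packet must arrive after the first")
    return size_bytes * 8 / dispersion                # bits per second

# Hypothetical measurement: 1500-byte packets arriving 1.2 ms apart
# suggest a bottleneck of about 10 Mbit/s.
print(packet_pair_capacity(1500, 0.0000, 0.0012))
```

In practice many pairs are sent and a median or mode of the resulting samples is taken to filter out cross-traffic, which is where the filtering discussed in the question's update comes in.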
I have written a program for bandwidth estimation using the packet-pair method. If anyone wants to have a look at it, I will be happy to share it.
EDIT:
Here is how I implemented it in a class assignment:
https://github.com/npbendre/Bandwidth-Estimation-using-Packet-Pair-Probing-Algorithm
Hope this helps!

How to detect anomalous resource consumption reliably?

This question is about a whole class of similar problems, but I'll ask it as a concrete example.
I have a server with a file system whose contents fluctuate. I need to monitor the available space on this file system to ensure that it doesn't fill up. For the sake of argument, let's suppose that if it fills up, the server goes down.
It doesn't really matter what it is -- it might, for example, be a queue of "work".
During "normal" operation, the available space varies within "normal" limits, but there may be pathologies:
Some other (possibly external) component that adds work may run out of control
Some component that removes work seizes up, but remains undetected
The statistical characteristics of the process are basically unknown.
What I'm looking for is an algorithm that takes, as input, timed periodic measurements of the available space (alternative suggestions for input are welcome), and produces as output, an alarm when things are "abnormal" and the file system is "likely to fill up". It is obviously important to avoid false negatives, but almost as important to avoid false positives, to avoid numbing the brain of the sysadmin who gets the alarm.
I appreciate that there are alternative solutions like throwing more storage space at the underlying problem, but I have actually experienced instances where 1000 times wasn't enough.
Algorithms which consider stored historical measurements are fine, although on-the-fly algorithms which minimise the amount of historic data are preferred.
I have accepted Frank's answer, and am now going back to the drawing-board to study his references in depth.
There are three cases, I think, of interest, not in order:
The "Harrods' Sale has just started" scenario: a peak of activity that at one-second resolution is "off the dial", but doesn't represent a real danger of resource depletion;
The "Global Warming" scenario: needing to plan for (relatively) stable growth; and
The "Google is sending me an unsolicited copy of The Index" scenario: this will deplete all my resources in relatively short order unless I do something to stop it.
It's the last one that's (I think) most interesting, and challenging, from a sysadmin's point of view.
If it is actually related to a queue of work, then queueing theory may be the best route to an answer.
For the general case, you could perhaps attempt a (multiple?) linear regression on the historical data, to detect whether there is a statistically significant rising trend in resource usage that is likely to lead to problems if it continues. With the same technique you may also be able to predict how long it can continue before it leads to problems: just set a threshold for 'problem' and use the slope of the trend to determine how long it will take to reach it. You will have to play around with this, and with the variables you collect, to see whether there is any statistically significant relationship you can discover in the first place.
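A minimal sketch of that idea in Python: ordinary least squares on timestamped free-space samples, then extrapolating the fitted line to a "problem" threshold. The data, threshold and alarm horizon are illustrative, and a real implementation would also check that the slope is statistically significant.

```python
def fit_trend(times, values):
    # Ordinary least-squares slope and intercept for values ~ times.
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    return slope, mean_v - slope * mean_t

def hours_until(threshold_gb, times, free_gb):
    # Extrapolate the fitted line to where free space crosses the threshold.
    slope, intercept = fit_trend(times, free_gb)
    if slope >= 0:
        return None                     # not shrinking: no predicted exhaustion
    return (threshold_gb - intercept) / slope - times[-1]

# Hypothetical hourly samples of free space (GB), trending downward:
free_space = [120, 118, 115, 116, 112, 109, 107, 104]
hours = list(range(len(free_space)))
print(hours_until(5, hours, free_space))   # e.g. raise an alarm if this drops below 48 hours
```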
Although it covers a completely different topic (global warming), I've found tamino's blog (tamino.wordpress.com) to be a very good resource on statistical analysis of data that is full of knowns and unknowns. For example, see this post.
edit: as per my comment, I think the problem is somewhat analogous to the GW problem. You have short-term bursts of activity which average out to zero, and superimposed long-term trends that you are interested in. There is also probably more than one long-term trend, and it changes from time to time. Tamino describes a technique which may be suitable for this, but unfortunately I cannot find the post I'm thinking of. It involves sliding regressions along the data (imagine multiple lines fitted to noisy data) and letting the data pick the inflection points. If you could do this, you could perhaps identify a significant change in the trend. Unfortunately it may only be identifiable after the fact, as you may need to accumulate a lot of data before the change becomes significant. But it might still be in time to head off resource depletion, and at the very least it may give you a robust way to determine what kind of safety margin and reserve resources you need in future.
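One crude way to approximate the sliding-regression idea (very much a sketch, and not Tamino's actual method): fit a least-squares slope over a trailing window and flag the points where the recent slope is much steeper than the slope of everything seen so far. The window size, factor and example series below are made up.

```python
def slope(values):
    # Least-squares slope of values against their indices 0..n-1.
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var

def trend_breaks(values, window=24, factor=3.0):
    # Flag indices where the trailing-window slope is falling and is at least
    # `factor` times steeper than the slope over all data seen so far.
    breaks = []
    for i in range(2 * window, len(values) + 1):
        long_term = slope(values[:i])
        recent = slope(values[i - window:i])
        if recent < 0 and recent < factor * long_term:
            breaks.append(i - 1)
    return breaks

# Hypothetical series: a slow decline, then a runaway consumer appears at index 100.
series = [1000 - 0.5 * t for t in range(100)] + [950 - 10 * t for t in range(1, 41)]
print(trend_breaks(series))
```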
