How to automatically tune parameters of an algorithm?

How to automatically tune parameters of an algorithm? - algorithm

Here's the setup:
I have an algorithm that can succeed or fail.
I want it to succeed with highest probability possible.
Probability of success depends on some parameters (and some external circumstances):
struct Parameters {
float param1;
float param2;
float param3;
float param4;
// ...
};
bool RunAlgorithm (const Parameters& parameters) {
// ...
// P(return true) is a function of parameters.
}
How to (automatically) find best parameters with a smallest number of calls to RunAlgorithm ?
I would be especially happy with a readl library.
If you need more info on my particular case:
Probability of success is smooth function of parameters and have single global optimum.
There are around 10 parameters, most of them independently tunable (but some are interdependent)
I will run the tunning overnight, I can handle around 1000 calls to Run algorithm.
Clarification:
Best parameters have to found automatically overnight, and used during the day.
The external circumstances change each day, so computing them once and for all is impossible.
More clarification:
RunAlgorithm is actually game-playing algorithm. It plays a whole game (Go or Chess) against fixed opponent. I can play 1000 games overnight. Every night is other opponent.
I want to see whether different opponents need different parameters.
RunAlgorithm is smooth in the sense that changing parameter a little does change algorithm only a little.
Probability of success could be estimated by large number of samples with the same parameters.
But it is too costly to run so many games without changing parameters.
I could try optimize each parameter independently (which would result in 100 runs per parameter) but I guess there are some dependencies.
The whole problem is about using the scarce data wisely.
Games played are very highly randomized, no problem with that.

Maybe you are looking for genetic algorithms.

Why not allow the program fight with itself? Take some vector v (parameters) and let it fight with v + (0.1,0,0,0,..,0), say 15 times. Then, take the winner and modify another parameter and so on. With enough luck, you'll get a strong player, able to defeat most others.
Previous answer (much of it is irrevelant after the question was edited):
With these assumptions and that level of generality, you will achieve nothing (except maybe an impossiblity result).
Basic question: can you change the algorithm so that it will return probability of success, not the result of a single experiment? Then, use appropriate optimization technique (nobody will tell you which under such general assumptions). In Haskell, you can even change code so that it will find the probability in simple cases (probability monad, instead of giving a single result. As others mentioned, you can use a genetic algorithm using probability as fitness function. If you have a formula, use a computer algebra system to find the maximum value.
Probability of success is smooth function of parameters and have single global optimum.
Smooth or continuous? If smooth, you can use differential calculus (Lagrange multipliers?). You can even, with little changes in code (assuming your programming language is general enough), compute derivatives automatically using automatic differentiation.
I will run the tunning overnight, I can handle around 1000 calls to Run algorithm.
That complex? This will allow you to check two possible values (210=1024), out of many floats. You won't even determine order of magnitude, or even order of order of magnitude.
There are around 10 parameters, most of them independently tunable (but some are interdependent)
If you know what is independent, fix some parameters and change those that are independent of them, like in divide-and-conquer. Obviously it's much better to tune two algorithms with 5 parameters.
I'm downvoting the question unless you give more details. This has too much noise for an academic question and not enough data for a real-world question.

The main problem you have is that, with ten parameters, 1000 runs is next to nothing, given that, for each run, all you have is a true/false result rather than a P(success) associated with the parameters.
Here's an idea that, on the one hand, may make best use of your 1000 runs and, on the other hand, also illustrates the the intractability of your problem. Let's assume the ten parameters really are independent. Pick two values for each parameter (e.g. a "high" value and a "low" value). There are 1024 ways to select unique combinations of those values; run your method for each combination and store the result. When you're done, you'll have 512 test runs for each value of each parameter; with the independence assumption, that might give you a decent estimate on the conditional probability of success for each value. An analysis of that data should give you a little information about how to set your parameters, and may suggest refinements of your "high" and "low" values for future nights. The back of my mind is dredging up ANOVA as a possibly useful statistical tool here.
Very vague advice... but, as has been noted, it's a rather vague problem.

Specifically for tuning parameters for game-playing agents, you may be interested in CLOP
http://remi.coulom.free.fr/CLOP/

Not sure if I understood correctly...
If you can choose the parameters for your algorithm, does it mean that you can choose it once for all?
Then, you could simply:
have the developper run all/many cases only once, find the best case, and replace the parameters with the best value
at runtime for your real user, the algorithm is already parameterized with the best parameters
Or, if the best values change for each run ...
Are you looking for Genetic Algorithms type of approach?

The answer to this question depends on:
Parameter range. Can your parameters have a small or large range of values?
Game grading. Does it have to be a boolean, or can it be a smooth function?
One approach that seems natural to this problem is Hill Climbing.
A possible way to implement would be to start with several points, and calculate their "grade". Then figure out a favorable direction for the next point, and try to "ascend".
The main problems that I see in this question, as you presented it, is the huge range of parameter values, and the fact that the result of the run is boolean (and not a numeric grade). This will require many runs to figure out whether a set of chosen parameters are indeed good, and on the other hand, there is a huge set of parameters values yet to check. Just checking all directions will result in a (too?) large number of runs.

Related

manufacturing ambiguous data set

I hope questions like this belong here.
So here is the problem I am dealing with right now:
I have some data collected from a manufacturing process (sensor data, process parameters etc.) and for every part that leaves the production line i know if it is scrap or not.
So for each part I have the its process data and the quality (0: good 1:bad)
My goal is to optimize the manufacturing process, i.e. find the optimal process parameters to produce the least amount of scrap.
What i did so far: I tried different classification algorithms (random forest, SVM, neural network) but none are able to achieve a good accuracy.
I think the reason is that the data is very ambiguous, i.e if i have parts with the same process parameters some of them might be scrap while some might be good. But there is definitely a connection between quality and process parameters.
What i want to now is to predict the "probability" for a part to be good or bad. Imo i want to estimate the probability density? Can i do this with K-nearest neighbours?

A step you could try is to, for each parameter, estimate , where x is the parameter value and is the good/bad indicator variable.
There's a chance that does not adhere to any particular distribution, and not knowing the type of values they take on it would be hard for me to make a suggestion.
A "model free" approach would be to, given a set of n observations , "discretize" the parameter x so that
Then you can estimate the pmf via
and similarly for the "bad" case.
After you have for each parameter, you can compute the relative entropy/KL divergence between that parameter's "good" and "bad" cases. Those that have larger divergence between the two classes are the parameters that matter most, and their pmfs will hopefully show you which values are indicative of bad performance.
This is of course assuming the paramters iid, which they may in fact not be, but a similar process can be performed by considering co-parameters that are not independent and discretizing accordingly.

Estimate good parameters for Algorithms with lots of arguments (Like for MSER in OpenCV)

I was wondering if there is a better way to estimate a good set of parameters for algorithms with lots of arguments than just randomly picking them. In detail I am trying to find some good parameters for the MSER Feature Detector which consumes 9 number parameters so there is a huge space to search in. I was thinking about alternatingly picking smaller and larger numbers around the default parameter value with exponentially growing distance. Are there any good thoughts that could help me?
Thanks!

First, you must define an objective function you want to minimize - what defines "better" parameters? In your case, I'd suggest using the number of correct matches found or similar.
Second, you must have an efficient way of looping over the virtually uncountable possibilities. Here, it probably helps that there is a minimal step size beyond which the results don't meaningfully change. Since the objective function is not necessarily derivable, I'd use a method similar to the Golden search in each dimension separately, and then repeat, until hopefully a global "good enough" maximum is reached.

Multiple parameter optimization with a stochastic element

I am looking for a method to find the best parameters for a simulation. It's about break-shots in billiards / pool. A shot is defined by 7 parameters, I can simulate the shot and then rate the outcome and I would like to compute the best parameters.
I have found the following link here:
Multiple parameter optimization with lots of local minima
suggesting 4 kinds of algorithms. In the pool simulator I am using, the shots are altered by a little random value each time it is simulated. If I simulate the same shot twice, the outcome will be different. So I am looking for an algorithm like the ones in the link above, only with the addition of a stochastical element, optimizing for the 7 parameters that will on average yield the best parameters, i.e. a break shot that most likely will be a success. My initial idea was simulating the shot 100 or 1000 times and just take the average as rating for the algorithms above, but I still feel like there is a better way. Does anyone have an idea?
The 7 parameters are continuous but within different ranges (one from 0 to 10, another from 0.0 to 0.028575 and so on).
Thank you

At least for some of the algorithms, simulating the same shot repeatedly might not be neccessary. As long as your alternatives have some form of momentum, like in the swarm simulation approach, you can let that be affected by the outcome of each individual simulation. In that case, a single unlucky simulation would slow the movement in parameter space only slightly, whereas a serious loss of quality should be enough to stop and reverse the movement. Thos algorithms which don't use momentum might be tweaked to have momentum. If not, then repeated simulation seems the best approach. Unless you can get your hands on the internals of the simulator, and rate the shot as a whole without having to simulate it over and over again.

You can use the algorithms you mentioned in your non-deterministic scenario with independent stochastic runs. Your idea with repeated simulations is good, you can read more about how many repeats you might have to consider for your simulations (unfortunately, there is no trivial answer). If you are not so much into maths, and the runs go fast, do 1.000 repeats, then 10.000 repeats, and see if the results differ largely. If yes, you have to collect more samples, if not, you are probably on the safe side (the central limit theorem states that the results converge).
Further, do not just consider the average! Make sure to look into the standard deviation for each algorithm's results; you might want to use box plots to compare their quartiles. If you rely on the average only, you could pick an algorithm that produces very varying results, sometimes excellent, sometimes terrible in performance.
I don't know what language you are using, but if you use Java, I am maintaining a tool that could simplify your "monte carlo" style experiments.

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.

How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?
Approaches that I've seen used for these kind of problems have a similar overall structure - take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated
Annealing: The classic approach. Take a bunch of points, probabalistically move some to a neighbouring point chosen at at random depending on how much better it is.
Particle
Swarm Optimization: Take a "swarm" of particles with velocities in the search space, probabalistically randomly move a particle; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.
The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
Note that all of these -- and really any approach to this large-dimensional search algorithm - are heuristics, which mean they have parameters which have to be tuned to your particular problem. Which can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit; since all the above methods involve lots of independant function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.

Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using a Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes this isn't going to be fast enough. If you can leave it running over night, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distributed. You final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.

I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.

I can't really help you with finding an algorithm for your specific problem.
However in regards to the random choosing of parameters I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting those, which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation for which again the best are selected.
If the function is more or less continous (that is small mutations of good inputs generally won't generate bad inputs (small being a somewhat generic)), this would work reasonably well for your problem.

There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.
Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify it slightly instead of just randomizing it. For example, in TSP some basic operator is lambda, that swaps two nodes and thus creates new route. Your can be shifting some number up/down for some value.
then, find yourself some nice algorithm, your starting point can be somewhere here. The book is invaluable resource for anyone who starts with problem-solving.

A Good and SIMPLE Measure of Randomness

What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?
The function should return a single result, say 0 if the sequence is not all all random, up to, say 1 if perfectly random. It can give something in-between if the sequence is somewhat random, e.g. 0.95 might be a reasonably random sequence, whereas 0.50 might have some non-random parts and some random parts.
If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1. If I passed the sequence 1, 2, ... 100,000 to it, it should return 0.
This way I can easily take 30 sequences of numbers, identify how random each one is, and return information about their relative randomness.
Is there such an animal?
…..
Update 24-Sep-2019: Google may have just ushered in an era of quantum supremacy says:
"Google’s quantum computer was reportedly able to solve a calculation — proving the randomness of numbers produced by a random number generator — in 3 minutes and 20 seconds that would take the world’s fastest traditional supercomputer, Summit, around 10,000 years. This effectively means that the calculation cannot be performed by a traditional computer, making Google the first to demonstrate quantum supremacy."
So obviously there is an algorithm to "prove" randomness. Does anyone know what it is? Could this algorithm also provide a measure of randomness?

Your question answers itself. "If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1", except the digits of Pi are not random numbers so if your algorithm does not recognise a very specific sequence as being non-random then it's not very good.
The problem here is there are many types of non random-ness:-
eg. "121,351,991,7898651,12398469018461" or "33,27,99,3000,63,231" or even "14297141600464,14344872783104,819534228736,3490442496" are definitely not random.
I think what you need to do is identify the aspects of randomness that are important to you-
distribution, distribution of digits, lack of common factors, the expected number of primes, Fibonacci and other "special" numbers etc. etc.
PS. The Quick and Dirty (and very effective) test of randomness does the file end up roughly the same size after you gzip it.

It can be done this way:
CAcert Research Lab does a Random Number Generator Analysis.
Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6x8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of "No Problems", "Potentially deterministic" and "Not Random".
So a function can be written that accepts a random sequence and does the 7 tests.
If any of the 7 tests are "Not Random" then the function returns a 0. If all of the 7 tests are "No Problems", then it returns a 1. Otherwise, it can return some number in-between based on how many tests come in as "Potentially Deterministic".
The only thing missing from this solution is the code for the 7 tests.

You could try to zip-compress the sequence. The better you succeed the less random the sequence is.
Thus, heuristic randomness = length of zip-code/length of original sequence

As others have pointed out, you can't directly calculate how random a sequence is but there are several statistical tests that you could use to increase your confidence that a sequence is or isn't random.
The DIEHARD suite is the de facto standard for this kind of testing but it neither returns a single value nor is it simple.
ENT - A Pseudorandom Number Sequence Test Program, is a simpler alternative that combines 5 different tests. The website explains how each of these tests works.
If you really need just a single value, you could pick one of the 5 ENT tests and use that. The Chi-Squared test would probably be the best to use, but that might not meet the definition of simple.
Bear in mind that a single test is not as good as running several different tests on the same sequence. Depending on which test you choose, it should be good enough to flag up obviously suspicious sequences as being non-random, but might not fail for sequences that superficially appear random but actually exhibit some pattern.

You can treat you 100.000 outputs as possible outcomes of a random variable and calculate associated entropy of it. It will give you a measure of uncertainty. (Following image is from wikipedia and you can find more information on Entropy there.) Simply:
You just need to calculate the frequencies of each number in the sequence. That will give you p(xi) (e.g. If 10 appears 27 times p(10) = 27/L where L is 100.000 for your case.) This should give you the measure of entropy.
Although it will not give you a number between 0 to 1. Still 0 will be minimal uncertainty. However the upper bound will not be 1. You need to normalize the output to achieve that.

What you seek doesn't exist, at least not how you're describing it now.
The basic issue is this:
If it's random then it will pass tests for randomness; but the converse doesn't hold -- there's no test that can verify randomness.
For example, one could have very strong correlations between elements far apart and one would generally have to test explicitly for this. Or one could have a flat distribution but generated in a very non-random way. Etc, etc.
In the end, you need to decide on what aspects of randomness are important to you, and test for these (as James Anderson describes in his answer). I'm sure if you think of any that aren't obvious how to test for, people here will help.
Btw, I usually approach this problem from the other side: I'm given some set of data that looks for all I can see to be completely random, but I need to determine whether there's a pattern somewhere. Very non-obvious, in general.

"How random is this sequence?" is a tough question because fundamentally you're interested in how the sequence was generated. As others have said it's entirely possible to generate sequences that appear random, but don't come from sources that we'd consider random (e.g. digits of pi).
Most randomness tests seek to answer a slightly different questions, which is: "Is this sequence anomalous with respect to a given model?". If you're model is rolling ten sided dice, then it's pretty easy to quantify how likely a sequence is generated from that model, and the digits of pi would not look anomalous. But if your model is "Can this sequence be easily generated from an algorithm?" it becomes much more difficult.

I want to emphasize here that the word "random" means not only identically distributed, but also independent of everything else (including independent of any other choice).
There are numerous "randomness tests" available, including tests that estimate p-values from running various statistical probes, as well as tests that estimate min-entropy, which is roughly a minimum "compressibility" level of a bit sequence and the most relevant entropy measure for "secure random number generators". There are also various "randomness extractors", such as the von Neumann and Peres extractors, that could give you an idea on how much "randomness" you can extract from a bit sequence. However, all these tests and methods can only be more reliable on the first part of this definition of randomness ("identically distributed") than on the second part ("independent").
In general, there is no algorithm that can tell, from a sequence of numbers alone, whether the process generated them in an independent and identically distributed way, without knowledge on what that process is. Thus, for example, although you can tell that a given sequence of bits has more zeros than ones, you can't tell whether those bits—
Were truly generated independently of any other choice, or
form part of an extremely long periodic sequence that is only "locally random", or
were simply reused from another process, or
were produced in some other way,
...without more information on the process. As one important example, the process of a person choosing a password is rarely "random" in this sense since passwords tend to contain familiar words or names, among other reasons.
Also I should discuss the article added to your question in 2019. That article dealt with the task of sampling from the distribution of bit strings generated by pseudorandom quantum circuits, and doing so with a low rate of error (a task specifically designed to be exponentially easier for quantum computers than for classical computers), rather than the task of "verifying" whether a particular sequence of bits (taken out of its context) was generated "at random" in the sense given in this answer. There is an explanation on what exactly this "task" is in a July 2020 paper.

In Computer Vision when analysing textures, the problem of trying to gauge the randomness of a texture comes up, in order to segment it. This is exactly the same as your question, because you are trying to determine the randomness of a sequence of bytes/integers/floats. The best discussion I could find of image entropy is http://www.physicsforums.com/showthread.php?t=274518 .
Basically, its the statistical measure of randomness for a sequence of values.
I would also try autocorrelation of the sequence with itself. In the autocorrelation result, if there is no peaks other than the first value that means there is no periodicity to your input.

I would use Claude Shannon’s Information Entropy algorithm. You can find the calculation on Youtube easily. I guess it really depends upon why you want this to be measured, and what type of reporting you want to do with the data points you collect.

#JohnFx "... mathematically impossible."
poster states: take a long sequence of integers ...
Thus, just as limits are used in The Calculus, we can take the value as being the value - the study of Chaotics shows us finite limits may 'turn on themselves' producing tensor fields that provide the illusion of absolute(s), and which can be run as long as there is time and energy. Due to the curvature of space-time, there is no perfection - hence the op's "... say 1 if perfectly random." is a misnomer.
{ noted: ample observations on that have been provided - spare me }
According to your position, given two byte[] of a few k, each randomized independently - op could not obtain "a measurement of how random the sequence is" The article at Wiki is informative, and makes definite strides dis-entagling the matter, but
In comparison to classical physics, quantum physics predicts that the properties of a quantum mechanical system depend on the measurement context, i.e. whether or not other system measurements are carried out.
A team of physicists from Innsbruck,
Austria, led by Christian Roos and
Rainer Blatt, have for the first time
proven in a comprehensive experiment
that it is not possible to explain
quantum phenomena in non-contextual
terms.
Source: Science Daily
Let us consider non-random lizard movements. The source of the stimulus that initiates complex movements in the shed tails of leopard geckos, under your original, corrected hyper-thesis, can never be known. We, the experienced computer scientists, suffer the innocent challenge posed by newbies knowing too well that there - in the context of an un-tainted and pristine mind - are them gems and germinators of feed-forward thinking.
If the thought-field of the original lizard produces a tensor-field ( deal with it folks, this is front-line research in sub-linear physics ) then we could have "the best algorithm to take a long sequence" of civilizations spanning from the Toba Event to present through a Chaotic Inversion". Consider the question whether such a thought-field produced by the lizard, taken independently, is a spooky or knowable.
"Direct observation of Hardy's paradox
by joint weak measurement with an
entangled photon pair," authored by
Kazuhiro Yokota, Takashi Yamamoto,
Masato Koashi and Nobuyuki Imoto from
the Graduate School of Engineering
Science at Osaka University and the
CREST Photonic Quantum Information
Project in Kawaguchi City
Source: Science Daily
( considering the spooky / knowable dichotomy )
I know from my own experiments that direct observation weakens the absoluteness of perceptible tensors, distinguishing between thought and perceptible tensors is impossible using only single focus techniques because the perceptible tensor is not the original thought. A fundamental consequence of quantaeus is that only weak states of perceptible tensors can be reliably distinguished from one another without causing a collapse into a unified perceptible tensor. Try it sometime - work on the mainifestation of some desired eventuality, using pure thought. Because an idea has no time or space, it is therefore in-finite. ( not-finite ) and therefore can attain "perfection" - i.e. absoluteness. Just for a hint, start with the weather as that is the easiest thing to influence ( as least as far as is currently known ) then move as soon as can be done to doing a join from the sleep-state to the waking-state with virtually no interruption of sequential chaining.
There is an almost unavoidable blip there when the body wakes up but it is just like when the doorbell rings, speaking of which brings an interesting area of statistical research to funding availability: How many thoughts can one maintain synchronously? I find that duality is the practical working limit, at triune it either breaks on the next thought or doesn't last very long.
Perhaps the work of Yokota et al could reveal the source of spurious net traffic...maybe it's ghosts.

As per Knuth, make sure you test the low-order bits for randomness, since many algorithms exhibit terrible randomness in the lowest bits.

Although this question is old, it does not seem "solved", so here is my 2 cents, showing that it is still an important problem that can be discussed in simple terms.
Consider password security.
The question was about "long" number sequences, "say 100.000", but does not state what is the criterium for "long". For passwords, 8 characters might be considered long. If those 8 chars were "random", it might be considered a good password, but if it can be easily guessed, a useless password.
Common password rules are to mix upper case, numbers and special characters. But the commonly used "Password1" is still a bad password. (okay, 9-char example, sorry) So how many of the methods of the other answers you apply, you should also check if the password occurs in several dictionaries, including sets of leaked passwords.
But even then, just imagine the rise of a new Hollywood star. This may lead to a new famous name that will be given to newborns, and may become popular as a password, that is not yet in the dictionaries.
If I am correctly informed, it is pretty much impossible to automatically verify that a password selected by a human is random and not derived with an easy to guess algorithm. And also that a good password system should work with computer-generated random passwords.
The conclusion is that there is no method to verify if an 8-char password is random, let alone a good and simple method. And if you cannot verify 8 characters, why would it be easier to verify 100.000 numbers?
The password example is just one example of how important this question of randomness is; think also about encryption. Randomness is the holy grail of security.

Measuring randomness? In order to do so, you should fully understand its meaning. The problem is, if you search the internet you will reach the conclusion that there is a nonconformity concept of randomness. For some people it's one thing, for others it's something else. You'll even find some definitions given through a philosophical perspective. One of the most frequent misleading concepts is to test if "it's random or not random". Randomness is not a "yes" or a "no", it could be anything in between. Although it is possible to measure and quantify "randomness", its concept should remain relative regarding its classification and categorization. So, to say that something is random or not random in an absolute way would be wrong because it's relative and even subjective for that matter. Accordingly, it is also subjective and relative to say that something follows a pattern or doesn't because, what's a pattern? In order to measure randomness, you have to start off by understanding it's mathematical theoretical premise. The premise behind randomness is easy to understand and accept. If all possible outcomes/elements in your sample space have the EXACT same probability of happening than randomness is achieved to it's fullest extent. It's that simple. What is more difficult to understand is linking this concept/premise to a certain sequence/set or a distribution of outcomes of events in order to determine a degree of randomness. You could divide your sample into sets or subsets and they could prove to be relatively random. The problem is that even if they prove to be random by themselves, it could be proven that the sample is not that random if analyzed as a whole. So, in order to analyze the degree of randomness, you should consider the sample as a whole and not subdivided. Conducting several tests to prove randomness will necessarily lead to subjectiveness and redundancy. There are no 7 tests or 5 tests, there is only one. And that test follows the already mentioned premise and thus determines the degree of randomness based on the outcome distribution type or in other words, the outcome frequency distribution type of a given sample. The specific sequence of a sample is not relevant. A specific sequence would only be relevant if you decide to divide your sample into subsets, which you shouldn't, as I already explained. If you consider the variable p(possible outcomes/elements in sample space) and n(number of trials/events/experiments) you will have a number of total possible sequences of (p^n) or (p to the power of n). If we consider the already mentioned premise to be true, any of these possible sequences have the exact same probability of occurring. Because of this, any specific sequence would be inconclusive in order to calculate the "randomness" of a sample. What is essential is to calculate the probability of the outcome distribution type of a sample of happening. In order to do so, we would have to calculate all the sequences that are associated with the outcome distribution type of a sample. So if you consider s=(number of all possible sequences that lead to a outcome distribution type), then s/(p^n) would give you a value between 0 and 1 which should be interpreted as being a measurement of randomness for a specific sample. Being that 1 is 100% random and 0 is 0% random. It should be said that you will never get a 1 or a 0 because even if a sample represents the MOST likely random outcome distribution type it could never be proven as being 100%. And if a sample represents the LEAST likely random outcome distribution type it could never be proven as being 0%. This happens because since there are several possible outcome distribution types, no single one of them can represent being 100% or 0% random. In order to determine the value of variable (s), you should use the same logic used in multinominal distribution probabilities. This method applies to any number of possible outcomes/elements in sample space and to any number of experiments/trials/events. Notice that, the bigger your sample is, the more are the possible outcome frequency distribution types, and the less is the degree of randomness that can be proven by each one of them.
Calculating [s/(n^t)]*100 will give you the probability of the outcome frequency dirtibution type of a set occuring if the source is truly random. The higher the probability the more random your set is. To actually obtain a value of randomness you would have to divide [s/(n^t)] by the highest value [s/(n^t)] of all possible outcome frequency distibution types and multiply by 100.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio