Creating auto-correlated random values - random

We are trying to create auto-correlated random values to be used as a time series.
We have no existing data to refer to and just want to create the vector from scratch.
On the one hand we of course need a random process with a distribution and its SD.
On the other hand, the autocorrelation influencing the random process has to be described. The values of the vector are autocorrelated with decreasing strength over several time lags,
e.g. lag 1 has 0.5, lag 2 has 0.3, lag 3 has 0.1, etc.
So in the end the vector should look something like this:
2, 4, 7, 11, 10, 8, 5, 4, 2, -1, 2, 5, 9, 12, 13, 10, 8, 4, 3, 1, -2, -5
and so on.
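One standard way to get such a series from scratch is an autoregressive process. Here is a minimal numpy sketch, assuming an AR(1) with phi = 0.5 is an acceptable approximation of the decaying lag structure (its theoretical autocorrelations are 0.5, 0.25, 0.125 at lags 1-3, close to the 0.5/0.3/0.1 target):

```python
import numpy as np

def ar1_series(n, phi=0.5, sd=3.0, seed=42):
    """Generate n values of a stationary AR(1) process x[t] = phi*x[t-1] + eps,
    scaled so the marginal standard deviation equals sd."""
    rng = np.random.default_rng(seed)
    # innovation sd chosen so that the stationary process has the requested sd
    eps_sd = sd * np.sqrt(1 - phi**2)
    x = np.empty(n)
    x[0] = rng.normal(0, sd)  # start from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0, eps_sd)
    return x

x = ar1_series(100_000)
# sample lag-1 autocorrelation should be close to phi
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```

To match several target lags exactly, you would instead fit higher-order AR coefficients, e.g. via the Yule-Walker equations.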

Related

Build deep learning model for non-image data

Can you please tell me how to build an autoencoder using CNN and pooling layers for a single 4×4 matrix of integers?
e.g.,
input data = array([[ 4, 3, 8, 6], [ 1, 1, 2, 2], [24, 18, 32, 24], [ 6, 6, 8, 8]])
autoencoder(data)
output data= array([[ 4, 3, 8, 6], [ 1, 1, 2, 2], [24, 18, 32, 24], [ 6, 6, 8, 8]])
Explanation:
https://medium.com/machine-learning-researcher/auto-encoder-d942a29c9807
The article you quoted is already well detailed; all you need to do is replace the linear units with convolution and max-pooling in the encoding part, and with deconvolution and upsampling in the decoding part.
Here is an example using Keras of a convolutional autoencoder.
But if you're new to deep learning, I would not advise you to start here. Autoencoders are tricky: the expected output you described above could easily be obtained by setting:
def autoencoder(x):
    return x
Which is not something you want.
People commonly design "undercomplete" autoencoders, with an internal layer of smaller dimension, to induce some compression. But "overcomplete" autoencoders, with a larger internal dimension, are also a valid architecture when combined with regularization.
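The "undercomplete" idea can be seen without any neural network at all: a truncated SVD is the optimal *linear* undercomplete autoencoder (this is just PCA, shown here as a numpy stand-in for the conv/pool encoder, not actual Keras code):

```python
import numpy as np

data = np.array([[ 4,  3,  8,  6],
                 [ 1,  1,  2,  2],
                 [24, 18, 32, 24],
                 [ 6,  6,  8,  8]], dtype=float)

def linear_autoencode(x, k):
    """Encode x into k components per row and decode back. By the
    Eckart-Young theorem this is the best rank-k linear reconstruction,
    i.e. an optimal undercomplete linear autoencoder."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    code = u[:, :k] * s[:k]   # encoder output: k numbers per row
    return code @ vt[:k]      # decoder: back to the original shape

# reconstruction error shrinks as the bottleneck k grows
errors = [np.linalg.norm(data - linear_autoencode(data, k)) for k in range(1, 5)]
```

With k equal to the full rank the reconstruction is exact, which is precisely the trivial identity solution the answer warns about; the interesting regime is k smaller than the input dimension.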

R: Convert binary file to matrix

R: I uploaded a big binary file (numeric, one array) into R with readBin.
My data is composed of values bigger than 10000 and values smaller than 10000, like
25685, 5, 6, 2, 1, 5, 46789, 6, 2, 9, 44220, 5, 1, 3, 7, 9, 12, 88952, 6, 8,...
How can I separate the values, so that I can create a matrix?
1: 25685, 5, 6, 2, 1, 5
2: 46789, 6, 2, 9
3: 44220, 5, 1, 3, 7, 9, 12
The separation should be between the last small value and the big value like above.
con <- file("Path/Name.dat", "rb")               # open the binary file
size <- file.info("Path/Name.dat")$size / 8      # number of 8-byte doubles
rB <- readBin(con, numeric(), size, endian = "little")
close(con)
size <- length(rB)
sep <- which(rB > 10000)                         # positions of the big values (time steps)
len <- diff(sep)                                 # group lengths (lags)
len <- c(len, size - tail(sep, 1) + 1)           # length of the last group
mat <- matrix(NA, size, 4)                       # over-allocate; unused NA rows are dropped below
for (i in seq_along(sep)) {                      # fill the matrix group by group
  rows <- sep[i]:(sep[i] + len[i] - 2)
  mat[rows, 1] <- i                              # group number
  mat[rows, 2] <- 1:(len[i] - 1)                 # position within the group
  mat[rows, 3] <- rB[sep[i]]                     # the big value of the group
  mat[rows, 4] <- rB[(sep[i] + 1):(sep[i] + len[i] - 1)]  # the small values
}
mat <- mat[!is.na(mat[, 1]), ]                   # drop unused rows
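The same grouping logic can be sketched in Python with numpy (the `values` array below is a hypothetical stand-in for the vector returned by readBin): split the flat vector immediately before every value above 10000.

```python
import numpy as np

values = np.array([25685, 5, 6, 2, 1, 5,
                   46789, 6, 2, 9,
                   44220, 5, 1, 3, 7, 9, 12,
                   88952, 6, 8])

# indices of the "big" marker values; split immediately before each one
starts = np.where(values > 10000)[0]
groups = np.split(values, starts)[1:]  # drop the empty chunk before the first marker

# each group begins with its marker followed by its small values
first = groups[0].tolist()
```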

How to make one-dimensional k-means clustering using Ruby?

My question:
I have searched through available Ruby gems to find one that performs k-means clustering. I've found quite a few: kmeans, kmeans-clustering, reddavis-k_means and k_means_pp. My problem is that none of the gems deals with one-dimensional k-means clustering. They all expect input like this:
[[1, 2], [3, 4], [5, 6]]
My input looks like this:
[1, 2, 3, 4, 5, 6]
Hence my question: How do I perform a one-dimensional k-means clustering using Ruby?
The context (my task):
I have 100 input values:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 8, 8, 10, 16, 18, 22, 22, 35, 50, 50
Each value represents a response time, i.e. the number of minutes it took for some customer service agent to respond to an email from a customer. So the first value 0 indicates that the customer only waited 0 minutes for a response.
I need to find out how many fast, medium-fast and slow response-time instances there are. In other words, I want to cut my input values up into 3 pools and then count how many there are in each pool.
The complicating factor is that I have to figure out where to make the cuts based on the overall steepness of the slope. There is no fixed definition of fast, medium-fast and slow. The first cut (between fast and medium-fast) should occur where the steepness of the slope starts to increase more drastically than before. The second cut (between medium-fast and slow) should occur when an even more dramatic increase in steepness occurs.
Here is a graphical representation of the input values.
In the above example, common sense would probably define fast as 0-3, because there are many instances of 0, 1, 2, and 3. 4-8 or 4-10 looks like common sense choices for medium-fast. But how to determine something like this mathematically? If the response times were generally faster, then the customers would be expecting this, so an even smaller increase towards the end should trigger the cut.
Finishing notes:
I did find the gem davidrichards-kmeans that deals with one-dimensional k-means clustering, but it doesn't seem to work properly (the example code raises a syntax error).
k-means is the wrong tool for this job anyway.
It's not designed for fitting an exponential curve.
Here is a much more sound proposal for you:
Look at the plot, mark the three points, and then you have your three groups.
Or look at quantiles... Report the median response time, the 90% quantile, and the 99% quantile...
Clustering is about structure discovery in multivariate data. It's probably not what you want it to be, sorry.
If you insist on trying k-means, try encoding the data as
[[1], [2], [3], [4], [5]]
and check if the results are at least a little bit what you want them to be (also remember that k-means is randomized; running it multiple times may yield very different results).
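The quantile suggestion is easy to try. A short numpy sketch over the 100 response times from the question (written with run lengths for brevity):

```python
import numpy as np

# the 100 response times from the question, as value * count
times = np.array(
    [0] * 28 + [1] * 32 + [2] * 9 + [3] * 14 + [4] * 2 + [5] * 5
    + [8] * 2 + [10, 16, 18, 22, 22, 35, 50, 50])

# report the median, the 90% quantile, and the 99% quantile
median, q90, q99 = np.quantile(times, [0.5, 0.90, 0.99])
```

This answers "how slow is a typical / a slow / a very slow response" directly, with no clustering and no randomness.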

Combinatorial algorithm for assigning people to groups

A coworker came to me with an interesting problem, a practical one having to do with a "new people in town" group she's a part of.
18 friends want to have dinner in groups for each of the next 4 days. The rules are as follows:
Each day the group will split into 4 groups of 4, and a group of 2.
Any given pair of people will only see each other at most once over the course of the 4 days.
Any given person will only be part of the size 2 group at most once.
A brute-force recursive search for a valid set of group assignments is obviously impractical. I've thrown in some simple logic for pruning parts of the tree as early as possible, but not enough to make it practical.
Actually, I'm starting to suspect that it might be impossible to follow all the rules, but I can't come up with a combinatorial argument for why that would be.
Any thoughts?
16 friends can be scheduled 4x4 for 4 nights using two mutually orthogonal latin squares of order 4. Assign each friend to a distinct position in the 4x4 grid. On the first night, group by row. On the second, group by column. On the third, group by similar entry in latin square #1 (card rank in the 4x4 example). On the fourth, group by similar entry in latin square #2 (card suit in the 4x4 example). Actually, the affine plane construction gives rise to three mutually orthogonal latin squares, so a fifth night could be scheduled, ensuring that each pair of friends meets exactly once.
Perhaps the schedule for 16 could be extended, using the freedom of the unused fifth night.
EDIT: here's the schedule for 16 people over 5 nights. Each row is a night. Each column is a person. The entry is the group to which they're assigned.
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
[0, 1, 2, 3, 1, 0, 3, 2, 2, 3, 0, 1, 3, 2, 1, 0]
[0, 2, 3, 1, 1, 3, 2, 0, 2, 0, 1, 3, 3, 1, 0, 2]
[0, 3, 1, 2, 1, 2, 0, 3, 2, 1, 3, 0, 3, 0, 2, 1]
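The schedule above can be checked mechanically: every pair of the 16 people should share a group exactly once across the 5 nights. A short Python verification:

```python
from itertools import combinations

# each row is a night; entry p is the group person p is assigned to
schedule = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
    [0, 1, 2, 3, 1, 0, 3, 2, 2, 3, 0, 1, 3, 2, 1, 0],
    [0, 2, 3, 1, 1, 3, 2, 0, 2, 0, 1, 3, 3, 1, 0, 2],
    [0, 3, 1, 2, 1, 2, 0, 3, 2, 1, 3, 0, 3, 0, 2, 1],
]

# for each pair of people, count the nights on which they share a group
meetings = {
    (p, q): sum(night[p] == night[q] for night in schedule)
    for p, q in combinations(range(16), 2)
}
all_meet_once = all(v == 1 for v in meetings.values())
```

This confirms the affine-plane property: all 120 pairs meet exactly once.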

Riffling Cards in Mathematica

My friend posed this question to me; felt like sharing it here.
Given a deck of cards, we split it into 2 groups, and "interleave them"; let us call this operation a 'split-join'. And repeat the same operation on the resulting deck.
E.g., { 1, 2, 3, 4 } becomes { 1, 2 } & { 3, 4 } (split) and we get { 1, 3, 2, 4 } (join)
Also, if we have an odd number of cards i.e., { 1, 2, 3 } we can split it like { 1, 2 } & { 3 } (bigger-half first) leading to { 1, 3, 2 }
(i.e., n is split up as Ceil[n/2] & n-Ceil[n/2])
The question my friend asked me was:
HOW many such split-joins are needed to get the original deck back?
And that got me wondering:
If the deck has n cards, what is the number of split-joins needed if:
n is even ?
n is odd ?
n is a power of 2? [I found that we then need log2(n) split-joins...]
(Feel free to explore different scenarios like that.)
Is there a simple pattern/formula/concept correlating n and the number of split-joins required?
I believe, this is a good thing to explore in Mathematica, especially, since it provides the Riffle[] method.
To quote MathWorld:
The numbers of out-shuffles needed to return a deck of n=2, 4, ... to its original order are 1, 2, 4, 3, 6, 10, 12, 4, 8, 18, 6, 11, ... (Sloane's A002326), which is simply the multiplicative order of 2 (mod n-1). For example, a deck of 52 cards therefore is returned to its original state after eight out-shuffles, since 2**8=1 (mod 51) (Golomb 1961). The smallest numbers of cards 2n that require 1, 2, 3, ... out-shuffles to return to the deck's original state are 1, 2, 4, 3, 16, 5, 64, 9, 37, 6, ... (Sloane's A114894).
The case when n is odd isn't addressed.
Note that the article also includes a Mathematica notebook with functions to explore out-shuffles.
If we have an odd number of cards n==2m-1, and if we split the cards such that during each shuffle the first group contains m cards, the second group m-1 cards, and the groups are joined such that no two cards of the same group end up next to each other, then the number of shuffles needed is equal to MultiplicativeOrder[2, n].
To show this, we note that after one shuffle the card which was at position k has moved to position 2k for 0<=k<m and to 2k-2m+1 for m<=k<2m-1, where k is such that 0<=k<2m-1. Written modulo n==2m-1, this means that the new position is Mod[2k, n] for all 0<=k<n. Therefore, for each card to return to its original position we need N shuffles, where N is such that Mod[2^N k, n]==Mod[k, n] for all 0<=k<n, from which it follows that N is any multiple of MultiplicativeOrder[2, n].
Note that due to symmetry the result would have been exactly the same if we had split the deck the other way around, i.e. the first group always contains m-1 cards and the second group m cards. I don't know what would happen if you alternate, i.e. for odd shuffles the first group contains m cards, and for even shuffles m-1 cards.
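The relationship for odd n is easy to verify numerically. A Python sketch of the bigger-half-first split-join described above, compared against the multiplicative order of 2 mod n:

```python
def split_join(deck):
    """One split-join: split into the bigger half first, then interleave."""
    m = (len(deck) + 1) // 2  # Ceil[n/2]
    top, bottom = deck[:m], deck[m:]
    out = []
    for i in range(m):
        out.append(top[i])
        if i < len(bottom):
            out.append(bottom[i])
    return out

def shuffles_to_restore(n):
    """Count split-joins until a deck of n cards returns to its original order."""
    deck = list(range(n))
    d, count = split_join(deck), 1
    while d != deck:
        d, count = split_join(d), count + 1
    return count

def mult_order_2(n):
    """Smallest N with 2^N == 1 (mod n), for odd n > 1."""
    k, x = 1, 2 % n
    while x != 1:
        x, k = (2 * x) % n, k + 1
    return k

# for odd n, the shuffle count equals MultiplicativeOrder[2, n]
orders = [(n, shuffles_to_restore(n), mult_order_2(n)) for n in (3, 5, 7, 9, 11)]
```

The counts 2, 4, 3, 6, 10 for n = 3, 5, 7, 9, 11 match the odd-case sequence reported below.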
There's old work by the magician/mathematician Persi Diaconis about restoring the order with perfect riffle shuffles. Ian Stewart wrote about that work in one of his 1998 Scientific American Mathematical Recreations columns -- see, e.g.: http://www.whydomath.org/Reading_Room_Material/ian_stewart/shuffle/shuffle.html
Old question, I know, but it's strange that no one put up an actual Mathematica solution...
countrifflecards[deck_] := Module[{n = Length@deck, ct, rifdeck},
  ct = 0;
  rifdeck =
   Riffle @@
     Partition[#, Ceiling[n/2], Ceiling[n/2], {1, 1}, {}] &;
  NestWhile[(++ct; rifdeck[#]) &, deck, #2 != deck &, 2]; ct]
This handles even and odd cases:
countrifflecards[RandomSample[Range[#], #]] & /@ Range[2, 52, 2]
{1, 2, 4, 3, 6, 10, 12, 4, 8, 18, 6, 11, 20, 18, 28, 5, 10, 12, 36,
12, 20, 14, 12, 23, 21, 8}
countrifflecards[RandomSample[Range[#], #]] & /@ Range[3, 53, 2]
{2, 4, 3, 6, 10, 12, 4, 8, 18, 6, 11, 20, 18, 28, 5, 10, 12, 36, 12,
20, 14, 12, 23, 21, 8, 52}
You can readily show that if you add a card to the odd case, the extra card will stay on the bottom and not change the sequence; hence the odd-case result is just the even result for n+1.
ListPlot[{#, countrifflecards[RandomSample[Range[#], #]]} & /@
  Range[2, 1000]]
