Convert a String Containing an Array to an Array in Ruby

I have a string that contains an array, and I would like to convert it into an actual array. How would you do this?
I want to convert this:
myvar=
"[[Date.UTC(2010, 0, 23),0],[Date.UTC(2010, 0, 24),0],[Date.UTC(2010, 0, 25),3],[Date.UTC(2010, 0, 26),0],[Date.UTC(2010, 0, 27),0],[Date.UTC(2010, 0, 28),0],[Date.UTC(2010, 0, 29),0],[Date.UTC(2010, 0, 30),0],[Date.UTC(2010, 0, 31),0],[Date.UTC(2010, 1, 01),0],[Date.UTC(2010, 1, 02),0],[Date.UTC(2010, 1, 03),1],[Date.UTC(2010, 1, 04),2],[Date.UTC(2010, 1, 05),0],[Date.UTC(2010, 1, 06),0],[Date.UTC(2010, 1, 07),0],[Date.UTC(2010, 1, 08),0],[Date.UTC(2010, 1, 09),0],[Date.UTC(2010, 1, 10),0],[Date.UTC(2010, 1, 11),0],[Date.UTC(2010, 1, 12),0],[Date.UTC(2010, 1, 13),0],[Date.UTC(2010, 1, 14),0],[Date.UTC(2010, 1, 15),0],[Date.UTC(2010, 1, 16),0],[Date.UTC(2010, 1, 17),0],[Date.UTC(2010, 1, 18),0],[Date.UTC(2010, 1, 19),0],[Date.UTC(2010, 1, 20),0],[Date.UTC(2010, 1, 21),0]]"
myvar.class
>>String
Into This:
myvar =
[[Date.UTC(2010, 0, 23),0],[Date.UTC(2010, 0, 24),0],[Date.UTC(2010, 0, 25),3],[Date.UTC(2010, 0, 26),0],[Date.UTC(2010, 0, 27),0],[Date.UTC(2010, 0, 28),0],[Date.UTC(2010, 0, 29),0],[Date.UTC(2010, 0, 30),0],[Date.UTC(2010, 0, 31),0],[Date.UTC(2010, 1, 01),0],[Date.UTC(2010, 1, 02),0],[Date.UTC(2010, 1, 03),1],[Date.UTC(2010, 1, 04),2],[Date.UTC(2010, 1, 05),0],[Date.UTC(2010, 1, 06),0],[Date.UTC(2010, 1, 07),0],[Date.UTC(2010, 1, 08),0],[Date.UTC(2010, 1, 09),0],[Date.UTC(2010, 1, 10),0],[Date.UTC(2010, 1, 11),0],[Date.UTC(2010, 1, 12),0],[Date.UTC(2010, 1, 13),0],[Date.UTC(2010, 1, 14),0],[Date.UTC(2010, 1, 15),0],[Date.UTC(2010, 1, 16),0],[Date.UTC(2010, 1, 17),0],[Date.UTC(2010, 1, 18),0],[Date.UTC(2010, 1, 19),0],[Date.UTC(2010, 1, 20),0],[Date.UTC(2010, 1, 21),0]]
myvar.class
>>Array

While the obvious answer involves eval, eval is dangerous because it executes whatever code the string contains. I would instead recommend parsing the string. Since this seems to be a well-defined data format, you can use this:
myvar.scan(/\d+/).map(&:to_i).each_slice(4).map{|*x,y| [Date.UTC(*x), y]}
This will:
pull out all the digits
convert them to integers
separate them into groups of four
apply the first three of each group to Date.UTC as the first through third arguments
pair each date with its corresponding y
create an array containing all of these pairs.
I don't have a Date.UTC method, but I assume you have some custom method called that.

Try the eval command:
x = eval("[\"foo\",\"bar\",\"land\"]")
=> ["foo", "bar", "land"]
x
=> ["foo", "bar", "land"]
But eval is dangerous, so be careful when you use it.


Why is positional encoding needed while input ids already represent the order of words in Bert?

For example, in Huggingface's example:
encoded_input = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger.")
print(encoded_input)
{'input_ids': [101, 2079, 2025, 19960, 10362, 1999, 1996, 3821, 1997, 16657, 1010, 2005, 2027, 2024, 11259, 1998, 4248, 2000, 4963, 1012, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The input_ids vector already encodes the order of each token in the original sentence. Why does BERT need positional encoding as well, with an extra vector to represent it?
The reason is the design of the neural architecture. BERT consists of self-attention and feedforward sub-layers, and neither of them is sequential.
The feedforward layers process each token independently of others.
The self-attention views the input states as an unordered set of states. Attention can be interpreted as soft probabilistic retrieval from a set of values according to some keys. The position embeddings are there so the keys can contain information about their relative order.
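The point about self-attention treating its input as an unordered set can be checked with a small sketch. This is a simplified illustration, not BERT's implementation: it uses plain Python, a single head, identity projections, and made-up two-dimensional embeddings and position vectors (all of these are assumptions). Without position vectors each word gets the same output whichever order the words arrive in; once position vectors are added, the outputs change with the order.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(states):
    """Dot-product self-attention with identity projections (a simplification)."""
    scale = math.sqrt(len(states[0]))
    outputs = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        weights = softmax(scores)
        # Each output is a weighted average of all states; order never enters.
        outputs.append([sum(w * v for w, v in zip(weights, column))
                        for column in zip(*states)])
    return outputs

# Hypothetical toy embeddings and position vectors, chosen only for illustration.
embeddings = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}
positions = [[0.0, 0.1], [0.2, 0.0], [0.3, 0.3]]

def outputs_by_word(words, positions=None):
    states = []
    for i, word in enumerate(words):
        vec = list(embeddings[word])
        if positions is not None:
            vec = [e + p for e, p in zip(vec, positions[i])]
        states.append(vec)
    return dict(zip(words, self_attention(states)))

def same_outputs(a, b):
    return all(math.isclose(x, y, abs_tol=1e-9)
               for word in a for x, y in zip(a[word], b[word]))

order_a = ["dog", "bites", "man"]
order_b = ["man", "bites", "dog"]

same = same_outputs(outputs_by_word(order_a), outputs_by_word(order_b))
differs = not same_outputs(outputs_by_word(order_a, positions),
                           outputs_by_word(order_b, positions))
print(same)     # True: without positions, word order does not matter
print(differs)  # True: with positions, the two orders are distinguishable
```

The list index of a token in input_ids is never seen by the attention computation itself, which is why the order has to be injected into the vectors.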

How can I find the next value? [closed]

Given an array of 0s and 1s, e.g. array[] = {0, 1, 0, 0, 0, 1, ...}, how can I predict what the next value will be with the best possible accuracy?
What kind of methods are best suited for this kind of task?
The prediction method would depend on the interpretation of data.
However, it looks like in this particular case we can make some general assumptions that might justify use of certain machine learning techniques.
Values are generated one after another in chronological order
Values depend on some (possibly non-observable) external state. If the state repeats itself, so do the values.
This is a pretty common scenario in many machine learning contexts. One example is the prediction of stock prices based on history.
Now, to build the predictive model you'll need to define the training data set. Assume our model looks at the last k values. If k=1, we might end up with something similar to a Markov chain model.
Our training data set will consist of k-dimensional data points together with their respective dependent values. For example, suppose k=3 and we have the following input data
0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...
We'll have the following training data:
(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1
Now, let's say you want to predict the next value in the sequence. The last 3 values are 0,0,1, so the model must predict the value of the function at (0,0,1), based on the training data.
A popular and relatively simple approach would be to use a multivariate linear regression on a k-dimensional data space. Alternatively, consider using a neural network if linear regression underfits the training data set.
You might need to try out different values of k and test against your validation set.
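The construction of the training set can be sketched in a few lines of Python. The function names are hypothetical, and the predictor here is a simple majority vote over previously seen windows rather than the linear regression or neural network suggested above; it is only meant to show how the (window, next value) pairs are formed and used.

```python
from collections import Counter, defaultdict

def make_training_pairs(values, k):
    """Return (window, next_value) pairs for every run of k consecutive values."""
    return [(tuple(values[i:i + k]), values[i + k])
            for i in range(len(values) - k)]

def predict_next(values, k):
    """Majority vote among the values that followed the last k values before."""
    counts = defaultdict(Counter)
    for window, target in make_training_pairs(values, k):
        counts[window][target] += 1
    last = tuple(values[-k:])
    if last not in counts:
        return None  # this context never occurred in the training data
    return counts[last].most_common(1)[0][0]

data = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1]
pairs = make_training_pairs(data, 3)
print(len(pairs))             # 13
print(pairs[:3])              # [((0, 0, 1), 1), ((0, 1, 1), 0), ((1, 1, 0), 1)]
print(predict_next(data, 3))  # 1, because (0, 0, 1) was followed by 1 when it was seen
```

A lookup like this cannot generalize to windows it has never seen, which is one reason to move to a fitted model once k grows.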
You could use a maximum likelihood estimator for the Bernoulli distribution. In essence you would:
look at all observed values and estimate parameter p
then use p to determine the next value
In Python this could look like this:
#!/usr/bin/env python
from __future__ import division

signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]


def maximum_likelihood(s, last=None):
    """
    The maximum likelihood estimator selects the parameter value which gives
    the observed data the largest possible probability.
    http://mathworld.wolfram.com/MaximumLikelihood.html
    If `last` is given, only use the last `last` values.
    """
    if not last:
        return sum(s) / len(s)
    # Average over the most recent window (shorter if fewer values exist).
    window = s[-last:]
    return sum(window) / len(window)


if __name__ == '__main__':
    hits = []
    print('p\tpredicted\tcorrect\tsignal')
    print('-\t---------\t-------\t------')
    for i in range(1, len(signal) - 1):
        # Estimate p from everything seen so far.
        p = maximum_likelihood(signal[:i])  # p = maximum_likelihood(signal[:i], last=2)
        prediction = int(p >= 0.5)
        hits.append(prediction == signal[i])
        print('%0.3f\t%s\t\t%s\t%s' % (
            p, prediction, prediction == signal[i], signal[:i]))
    print('accuracy: %0.3f' % (sum(hits) / len(hits)))
The output would look like this:
# p predicted correct signal
# - --------- ------- ------
# 1.000 1 False [1]
# 0.500 1 True [1, 0]
# 0.667 1 True [1, 0, 1]
# 0.750 1 False [1, 0, 1, 1]
# 0.600 1 False [1, 0, 1, 1, 0]
# 0.500 1 True [1, 0, 1, 1, 0, 0]
# 0.571 1 False [1, 0, 1, 1, 0, 0, 1]
# 0.500 1 True [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650
You could vary the window size for performance reasons or to favor recent events.
In the above example, you could estimate the next value from only the last 3 observed values by passing last=3. Whether that improves accuracy depends on the data, so compare several window sizes.
Update: Inspired by Narek's answer I added a logistic regression classifier example to the gist.
You can estimate the probabilities of 0 and 1 from the observed values, split the interval from 0 to 1 into two ranges proportional to those probabilities, and then draw a random number between 0 and 1. The range the number falls into is the prediction.
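A minimal sketch of this random-draw idea, assuming the probability of a 1 is estimated as the fraction of 1s seen so far. The function name and the example history are hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random

def sample_next(history, rng):
    """Draw the next value at random: 1 with the estimated probability, else 0."""
    p_one = sum(history) / len(history)   # estimated probability of a 1
    # rng.random() is uniform on [0, 1); values below p_one fall in the "1" range.
    return 1 if rng.random() < p_one else 0

history = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # six 1s out of ten, so p_one = 0.6
rng = random.Random(0)                     # seeded for repeatability
draws = [sample_next(history, rng) for _ in range(10000)]
print(sum(draws) / len(draws))             # close to 0.6
```

Note that if the values really are independent draws with a fixed probability, always predicting the more likely value is right more often than sampling: with p = 0.6, the fixed guess is correct 60% of the time, while sampling is correct 0.6 * 0.6 + 0.4 * 0.4 = 52% of the time.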
If these are series of numbers that are generated each time after some reset event, and next numbers are somehow related to previous ones, you could create a tree (binary tree with two branches at each node in your case) and feed in such historical series from the root, adjusting weights (say a count) on each branch you follow.
You could divide these counts by the number of series entered before using them, or keep a count on each node as well, incremented before choosing a branch. That way the root node contains the number of series entered.
Then, as you feed it a new sequence you can see which branch is "hotter" (would make nice visualization as heatmap/tree btw) to follow, especially if sequence is long enough. That is, assuming order of items in sequence plays a role in what comes next.
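The counting tree described above might look like the following sketch. The class and function names and the example series are hypothetical; each node stores how many series passed through it, so the root's count is the number of series entered.

```python
class Node:
    def __init__(self):
        self.count = 0       # number of series that passed through this node
        self.children = {}   # maps a value (0 or 1) to the child Node

def add_series(root, series):
    """Feed one historical series in from the root, incrementing counts."""
    node = root
    node.count += 1
    for bit in series:
        node = node.children.setdefault(bit, Node())
        node.count += 1

def hotter_branch(root, prefix):
    """Return the most frequent next value after prefix, or None if unseen."""
    node = root
    for bit in prefix:
        node = node.children.get(bit)
        if node is None:
            return None
    if not node.children:
        return None
    return max(node.children, key=lambda b: node.children[b].count)

root = Node()
for series in ([0, 1, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]):  # made-up history
    add_series(root, series)

print(root.count)                    # 4
print(hotter_branch(root, [0, 1]))   # 1
print(hotter_branch(root, [1, 1]))   # None
```

Dividing a child's count by its parent's count gives the relative frequency of that branch, which is the quantity a heatmap of the tree would show.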

Special Sorting Algorithm

I am developing a technique for sorting a table that contains either 0 or 1, such as:
{{1, 1, 0, 1, 1, 1, 1, 1},
{1, 1, 0, 0, 0, 0, 1, 0},
{1, 1, 1, 1, 1, 1, 1, 0},
{1, 1, 1, 1, 1, 1, 1, 0},
{1, 1, 1, 0, 0, 0, 1, 0},
{1, 1, 1, 1, 1, 1, 1, 0},
{0, 0, 0, 0, 0, 1, 0, 1},
{1, 1, 1, 1, 1, 0, 0, 0},
{1, 1, 1, 1, 1, 1, 0, 1},
{0, 0, 0, 1, 0, 1, 0, 1},
{1, 1, 1, 1, 1, 0, 0, 0},
{1, 1, 1, 1, 1, 0, 0, 0}}
The objective is to count the total per column and sort the table:
I. Descending based on the total per column.
II. Coverage. For instance, in the 1st row the 3rd value is 0. We'll have to find the 1st column that has 1 in the 3rd column and re-sort the columns. In other words, 1 stands for coverage and we have to make sure that we cover everything within the first few columns.
I managed to get the total per column, as follows:
For (i=0; i<m; i++)
    For (j=0; j< TS.Size(); j++)
        if (tc.detected()==1)
            TS_Detect[j][i]= 1
        else
            TS_Detect[j][i]= 0
TC_Sum=(2, TS.Size())
For (k=0; k<TS.Size(); k++)
    TC_Sum(0, k)=k
    For (l=0; l< m; l++)
        Flag=TS_Detect[l][k]
        If (flag == 1)
            TC_Sum(1, k)= TC_Sum(1, k)+1
int temp
For (g=0; g<TC_Sum.length-1; g++)
    For (b=1; b< TC_Sum.length-1; b++)
        If (TC_Sum[b-1]< TC_Sum[b])
            temp= TC_Sum[b-1]
            TC_Sum[b-1]= TC_Sum[b]
            TC_Sum[b]= temp
return TC_Sum
The problem now is that I couldn't sort the original array (TS_Detect) based on the column number from TC_Sum.
Consequently, I would like to re-sort the table so if a column has 0, the next one will be 1.
The expected output for the above example will look like:
{{1, 1, 0, 1, 1, 1, 1, 1},
{1, 1, 1, 1, 1, 1, 1, 0},
{1, 1, 0, 0, 0, 0, 1, 0},
{1, 1, 1, 1, 1, 1, 1, 0},
{0, 0, 0, 0, 0, 1, 0, 1},
{1, 1, 1, 0, 0, 0, 1, 0},
{1, 1, 1, 1, 1, 1, 1, 0},
{0, 0, 0, 1, 0, 1, 0, 1},
{1, 1, 1, 1, 1, 0, 0, 0},
{1, 1, 1, 1, 1, 1, 0, 1},
{1, 1, 1, 1, 1, 0, 0, 0},
{1, 1, 1, 1, 1, 0, 0, 0}}
Any suggestions, please?
I'm not sure what language you are using, but I think my answer is general enough.
I assume that you have a list of lists, let's call it A.
A = [ [0,1,0,0] , [1,0,1,1] , [0,0,0,0] ]
You've used your counting algorithm above to make another list, call it S for sum.
S = [ 1 , 3 , 0 ]
You now want to sort A based on the values of S.
To make things easy, let's define a third list that we'll call I for index.
I = [ 0 , 1 , 2 ]
I would continue up to 3,4,5,6,... depending on the number of elements in your list
What you need now is a sort function that allows you to sort based on a key. Such a sort function usually takes the thing you want to sort along with a function for comparing two items.
In this case, sort I. The sort function is then passed indices. Compare these indices based on the values in S. The result is a list I* containing indices sorted according to S. You can now reorder A based on I*.
I am not sure what language you are using, but the following Python code accomplishes this:
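from functools import cmp_to_key

def MyComparison(i, j):
    # Negative when S[i] > S[j], so larger sums sort first (descending).
    return S[j] - S[i]

A = [ [0,1,0,0] , [1,0,1,1], [0,0,0,0] ]
S = [ 1 , 3 , 0 ]
I = [ 0 , 1 , 2 ]
# Python 3 removed sorted's cmp argument; cmp_to_key adapts a comparison function.
Istar = sorted(I, key=cmp_to_key(MyComparison))
#The above returns: [1, 0, 2]. If this is the wrong order, reverse the result.
[A[x] for x in Istar]
#The above returns: [[1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]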
Note that the comparison function returns a negative number, zero, or a positive number depending on the relative ranking of the items compared.
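The same index-sorting idea applies to columns instead of rows. The sketch below uses a small made-up table rather than the one from the question: it computes the total per column, sorts the column indices by total in descending order, and rebuilds each row in that order. It only addresses criterion I (descending totals); the coverage criterion would need additional logic.

```python
table = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
]

# zip(*table) iterates over columns, so this is the total per column.
totals = [sum(column) for column in zip(*table)]

# Sort column indices by total, largest first; ties keep their original order.
order = sorted(range(len(totals)), key=lambda c: totals[c], reverse=True)

# Rebuild every row with its values in the new column order.
sorted_table = [[row[c] for c in order] for row in table]

print(totals)        # [1, 3, 1]
print(order)         # [1, 0, 2]
print(sorted_table)  # [[1, 0, 1], [1, 0, 0], [1, 1, 0]]
```

Sorting the indices rather than the totals themselves is what keeps the link between each total and the column it came from.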

Understand disaster model in PyMC

I have started learning PyMC and am struggling to understand the very first example in the tutorial.
disasters_array = \
np.array([ 4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
switchpoint = DiscreteUniform('switchpoint', lower=0, upper=110, doc='Switchpoint[year]')
early_mean = Exponential('early_mean', beta=1.)
late_mean = Exponential('late_mean', beta=1.)
I don't understand why early_mean and late_mean are modeled as stochastic variables following an exponential distribution with rate = 1. My intuition is that they should be deterministic, calculated from disasters_array and the switchpoint variable, e.g.
@deterministic(plot=False)
def early_mean(s=switchpoint):
    return sum(disasters_array[:(s-1)])/(s-1)
@deterministic(plot=False)
def late_mean(s=switchpoint):
    return sum(disasters_array[s:])/s
Under the assumptions of this model, disasters_array is data generated by a Poisson process. late_mean and early_mean are the parameters of this process, and which one applies depends on when in the time series an observation occurred. The true values of the parameters are unknown, so they are specified as stochastic variables. Deterministic objects are only for nodes that are completely determined by the values of their parents.
Think of early_mean and late_mean stochastics as model parameters, and the Exponential as the prior distribution for these parameters. In the version of the model here, the deterministic r and likelihood D lead to posteriors on early_mean and late_mean through MCMC sampling.
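To make the prior/likelihood split concrete, here is a plain-Python sketch of the unnormalized log posterior that the sampler explores. It does not use PyMC; the function names are hypothetical, the Exponential prior is taken to have rate 1 as stated in the question, and the constant DiscreteUniform prior on the switchpoint is left out because it does not change comparisons.

```python
import math

disasters = [4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
             3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
             2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0, 0,
             1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
             0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
             3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
             0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

def log_exponential(x, rate=1.0):
    """Log density of an Exponential(rate) prior at x > 0."""
    return math.log(rate) - rate * x

def log_poisson(k, mean):
    """Log probability of observing count k under Poisson(mean)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def log_posterior(switchpoint, early_mean, late_mean):
    """Unnormalized log posterior: log priors plus Poisson log likelihood."""
    if early_mean <= 0 or late_mean <= 0:
        return float("-inf")  # the exponential prior has no mass here
    log_prior = log_exponential(early_mean) + log_exponential(late_mean)
    log_likelihood = sum(
        # Years before the switchpoint use early_mean, the rest use late_mean.
        log_poisson(k, early_mean if year < switchpoint else late_mean)
        for year, k in enumerate(disasters)
    )
    return log_prior + log_likelihood

# A high early rate and low late rate fits the data far better than the reverse.
print(log_posterior(40, 3.0, 1.0) > log_posterior(40, 1.0, 3.0))  # True
```

The means are inputs to this function rather than values computed from the data, which is the sense in which they are stochastic: the data only make some values of them more plausible than others.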

Ruby - Graph adjacency matrix into variable

I am trying to edit an algorithm found here.
I want the adjacency matrix to be loaded from file (formatting of the file doesn't matter to me, it can be either like this [0,1,1,0] or just 0110) with G = file.read().split("\n")
However, I get an error no implicit conversion of Fixnum into String (TypeError)
And I already know I need to convert this string to ints, but how to do it properly to not lose the formatting required by this DFS method?
I guess it's pretty easy, but I'm a beginner in Ruby (and graphs :v) and can't get it to work...
Edit:
So the code I'm using to read from file to an array of arrays is:
def read_array(file_path)
  File.foreach(file_path).with_object([]) do |line, result|
    result << line.split.map(&:to_i)
  end
end
And the result I get from a file (for example)
01101010
01010101
01010110
10101011
01011111
is this:
=> [[[1101010], [1010101], [1010110], [10101011], [1011111]]]
What I need, however, is:
=> [[[1,1,0,1,0,1,0], [1,0,1,0,1,0,1], [1,0,1,0,1,1,0], [1,0,1,0,1,0,1,1], [1,0,1,1,1,1,1]]]
So that it would work with the algorithm mentioned in the first line of my post (I'll copy it here, if it takes too much place I can delete it and leave link only):
G = [0,1,1,0,0,1,1], # A
[1,0,0,0,0,0,0],
[1,0,0,0,0,0,0],
[0,0,0,0,1,1,0],
[0,0,0,1,0,1,1],
[1,0,0,1,1,0,0],
[1,0,0,0,1,0,0] # G
LABLES = %w(A B C D E F G)
def dfs(vertex)
  print "#{LABLES[vertex]} " # visited
  edge = 0
  while edge < G.size
    G[vertex][edge] = 0
    edge += 1
  end
  edge = 0
  while edge < G.size
    if ( G[edge][vertex] != 0 && edge != vertex)
      dfs(edge)
    end
    edge += 1
  end
end
dfs(0)
split's default separator is whitespace. To make it split on every character, you need to say so explicitly:
'01101101'.split.map(&:to_i)
# => [ 1101101 ]
'01101101'.split('').map(&:to_i)
# => [ 0, 1, 1, 0, 1, 1, 0, 1 ]
You can also use chars to do the same job:
'01101101'.chars.map(&:to_i)
# => [ 0, 1, 1, 0, 1, 1, 0, 1 ]
I don't know how your read_array is used, but it can be simplified to:
def read_array(file_path)
  File.foreach(file_path).map do |line|
    line.chomp.chars.map(&:to_i)
  end
end
read_array('my_file.txt') # for the file shown in the question
# => [[0, 1, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0, 1, 1], [0, 1, 0, 1, 1, 1, 1, 1]]
If you still get the extra [, you can either take only the first item:
my_array[0]
Or, if there is more than one item in the uber-array, use flat_map:
uber_array = [[[1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0, 1, 1]],
[[1, 0, 1, 0, 1, 0, 1, 1], [1, 0, 1, 1, 1, 1, 1]]]
uber_array.flat_map { |a| a }
# => [[1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0, 1, 1], [1, 0, 1, 0, 1, 0, 1, 1], [1, 0, 1, 1, 1, 1, 1]]
