Setting With Enlargement in daru - ruby

Is there a way to do setting with enlargement in daru, similar to loc in pandas?

Yes you can.
For Daru::Vector objects use the #push method like so:
require 'daru'
v = Daru::Vector.new([1,2,3], index: [:a,:b,:c])
v.push(23, :r)
v
#=>
#<Daru::Vector:74005360 @name = nil @size = 4 >
# nil
# a 1
# b 2
# c 3
# r 23
For setting a new vector in Daru::DataFrame, call the #[]= method with your new name inside the []. You can either assign a Daru::Vector or an Array.
If you assign Daru::Vector, the data will be aligned so that the indexes of the DataFrame and Vector match.
For example,
require 'daru'
df = Daru::DataFrame.new({a: [1,2,3], b: [5,6,7]})
df[:r] = [11,22,33]
df
# =>
#<Daru::DataFrame:73956870 @name = c8a65ffe-217d-43bb-b6f8-50d2530ec053 @size = 3>
# a b r
# 0 1 5 11
# 1 2 6 22
# 2 3 7 33
You assign a row with the DataFrame#row[]= method. For example, using the previous dataframe df:
df.row[:a] = [23,35,2]
df
#=>
#<Daru::DataFrame:73956870 @name = c8a65ffe-217d-43bb-b6f8-50d2530ec053 @size = 4>
# a b r
# 0 1 5 11
# 1 2 6 22
# 2 3 7 33
# a 23 35 2
Assigning a Daru::Vector will align according to the names of the vectors of the Daru::DataFrame.
You can see further details in these notebooks.
Hope this answers your question.

Related

corpus extraction with changing data type R

I have a corpus of plain-text files. I want to extract the n-grams from each text and save them, under the original file name, as matrices with 3 columns.
library(tokenizer)
myTokenizer <- function(x, n, n_min) {
corp<-"this is a full text "
tok <- unlist(tokenize_ngrams(as.character(x), n = n, n_min = n_min))
M <- matrix(nrow=length(tok), ncol=3,
dimnames=list(NULL, c( "gram" , "num.words", "words")))
}
corp <- tm_map(corp,content_transformer(function (x) myTokenizer(x, n=3, n_min=1)))
writecorpus(corp)
Since I don't have your corpus, I created one of my own using the crude dataset from tm. There is no need to use tm_map, as that keeps the data in a corpus format. The tokenizers package can handle this.
What I do is store all your desired matrices in a list object via lapply and then use sapply to store the data in the crude directory as separate files.
Do realize that the matrices as specified in your function will be character matrices. This means that columns 1 and 2 will be characters, not numbers.
library(tm)
data("crude")
crude <- as.VCorpus(crude)
myTokenizer <- function(x, n, n_min) {
tok <- unlist(tokenizers::tokenize_ngrams(as.character(x), n = n, n_min = n_min))
M <- matrix(nrow=length(tok), ncol=3,
dimnames=list(NULL, c( "gram" , "num.words", "words")))
M[, 3] <- tok
M[, 2] <- lengths(strsplit(M[, 3], "\\W+")) # counts the words
M[, 1] <- 1:length(tok)
return(M)
}
my_matrices <- lapply(crude, myTokenizer, n = 3, n_min = 1)
# make sure directory crude exists as a subfolder in working directory
sapply(names(my_matrices),
function (x) write.table(my_matrices[[x]], file=paste("crude/", x, ".txt", sep=""), row.names = FALSE))
Outcome of the first file:
"gram" "num.words" "words"
"1" "1" "diamond"
"2" "2" "diamond shamrock"
"3" "3" "diamond shamrock corp"
"4" "1" "shamrock"
"5" "2" "shamrock corp"
"6" "3" "shamrock corp said"
I would recommend creating a document-term matrix (DTM). You will probably need this in your downstream tasks anyway. From that you could also extract the information you want, although it is probably not reasonable to assume that a term (incl. ngrams) comes from only a single document (at least this is what I understood from your question; please correct me if I am wrong). Therefore, I guess that in practice one term will have several documents associated with it, and this kind of information is usually stored in a DTM.
An example with text2vec below. If you could elaborate further how you want to use your terms, etc. I could adapt the code according to your needs.
library(text2vec)
# I have set up two texts that do not overlap in any term, just as an example
# in practice, this probably never happens
docs = c(d1 = c("here a text"), d2 = c("and another one"))
it = itoken(docs, tokenizer = word_tokenizer, progressbar = F)
v = create_vocabulary(it, ngram = c(1,3))
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
as.matrix(dtm)
# a a_text and and_another and_another_one another another_one here here_a here_a_text one text
# d1 1 1 0 0 0 0 0 1 1 1 0 1
# d2 0 0 1 1 1 1 1 0 0 0 1 0
library(stringi)
docs = c(d1 = c("here a text"), d2 = c("and another one"))
it = itoken(docs, tokenizer = word_tokenizer, progressbar = F)
v = create_vocabulary(it, ngram = c(1,3))
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
for (d in rownames(dtm)) {
v = dtm[d, ]
v = v[v!=0]
v = data.frame(number = 1:length(v)
,term = names(v))
v$n = stri_count_fixed(v$term, "_")+1
write.csv(v, file = paste0("v_", d, ".csv"), row.names = F)
}
read.csv("v_d1.csv")
# number term n
# 1 1 a 1
# 2 2 a_text 2
# 3 3 here 1
# 4 4 here_a 2
# 5 5 here_a_text 3
# 6 6 text 1
read.csv("v_d2.csv")
# number term n
# 1 1 and 1
# 2 2 and_another 2
# 3 3 and_another_one 3
# 4 4 another 1
# 5 5 another_one 2
# 6 6 one 1

Elixir: Return value from for loop

I have a requirement for a for loop in Elixir that returns a calculated value.
Here is my simple example:
a = 0
for i <- 1..10
do
a = a + 1
IO.inspect a
end
IO.inspect a
Here is the output:
warning: variable i is unused
Untitled 15:2
2
2
2
2
2
2
2
2
2
2
1
I know that i is unused and can be used in place of a in this example, but that's not the question. The question is how do you get the for loop to return the variable a = 10?
You cannot do it this way as variables in Elixir are immutable. What your code really does is create a new a inside the for on every iteration, and does not modify the outer a at all, so the outer a remains 1, while the inner one is always 2. For this pattern of initial value + updating the value for each iteration of an enumerable, you can use Enum.reduce/3:
# This code does exactly what your code would have done in a language with mutable variables.
# a is 0 initially
a = Enum.reduce 1..10, 0, fn i, a ->
new_a = a + 1
IO.inspect new_a
# we set a to new_a, which is a + 1 on every iteration
new_a
end
# a here is the final value of a
IO.inspect a
Output:
1
2
3
4
5
6
7
8
9
10
10

Print numbers in a range

I am trying to print all numbers between 1 and 50, using the following code:
[1..50].each{|n| puts n}
but the console prints
[1..50]
I want to print something like this
1
2
3
4
...
50
Try the following code:
(1..50).each { |n| puts n }
The problem is that you're using the [] delimiters instead of (): [1..50] is an array containing one range, while (1..50) is the range itself.
You can use [1..10] with a minor tweak:
[*1..10].each{ |i| p i }
outputs:
1
2
3
4
5
6
7
8
9
10
The * (AKA "splat") "explodes" the range into its components, which are then used to populate the array. It's similar to writing (1..10).to_a.
You can also do:
puts [*1..10]
to print the same thing.
So, try:
[*1..10].join(' ') # => "1 2 3 4 5 6 7 8 9 10"
or:
[*1..10] * ' ' # => "1 2 3 4 5 6 7 8 9 10"
To get the output you want.
The error here is that you are creating an Array object with a range as its only element.
> [1..10].size
=> 1
If you want to call methods like each on a range, you have to wrap the range in parentheses to avoid the method being called on the range's last element rather than on the range itself.
> (1..10).each { |i| print i }
12345678910
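A minimal sketch of the difference described above, using only core Ruby. The variable names are illustrative and not taken from the answers:

```ruby
# [1..10] is an Array literal whose single element is a Range.
wrapped = [1..10]
puts wrapped.size          # 1
puts wrapped.first.class   # Range

# The splat expands the Range, so the Array holds the ten integers.
expanded = [*1..10]
puts expanded.size         # 10

# Parentheses make `each` apply to the whole Range.
collected = []
(1..10).each { |n| collected << n }
puts collected == expanded # true
```

Calling each on wrapped would therefore yield the Range once, which is why the original code printed 1..50 instead of the individual numbers.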
Other ways to achieve the same:
(1..50).each { |n| print n }
1.upto(50) { |n| print n }
50.times { |n| print n + 1 } # times counts from 0, so add 1
You can cast your range (in parentheses) to an array ([1 2 3 4 5 6... 48 49 50]) and join each item (e.g. with ' ' if you want all items in one line).
puts (1..50).to_a.join(' ')
# => 1 2 3 4 5 6 7 ... 48 49 50

trying to create a matrix in ruby

I have a file called terain.dat, which contains this matrix:
10
1 1 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
I want to read in the file and use the first number on the first line as the size of the matrix (which is 10 X 10 in this case), and then fill the 10 X 10 matrix with the numbers below.
This is what I have so far:
class Terrain
def initialize file_name
@input = IO.readlines(file_name) # read in file: reads in the file with the terrain details
@matrix_size = @input[0].to_i # changes the first index to an int (so i can make a 10X10 matrix)
@land = Matrix.[@matrix_size, @matrix_size] # will this make a 10 X 10 matrix??
end
end
I was wondering whether this will make a 10 X 10 matrix, and how do I fill it?
I'd write:
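require 'matrix'

terrain = File.open("terrain.data") do |file|
  # the first line holds the matrix size
  size = file.each_line.first.to_i
  # the next `size` lines are the rows
  rows = file.each_line.first(size).map { |line| line.split.map(&:to_i) }
  Matrix.rows(rows)
end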
Actually, no. Matrix.[] takes the rows themselves as its arguments, not the dimensions.
So Matrix.[10,10] would treat each 10 as a row rather than build a 10 X 10 matrix.
What you are looking for is Matrix.build(row_size, column_size), where column_size defaults to row_size. Without a block it gives you an enumerator which you can use to set the values, or you can just pass a block to Matrix.build.
I'd suggest a different approach:
arr = []
# skip the first line, which only holds the size
@input.drop(1).each_with_index do |line, index|
  arr[index] = line.split ' '
end
@land = Matrix.build(10,10) do |row, column|
  arr[row][column].to_i
end
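To illustrate the difference between Matrix.[] and Matrix.build described in this answer, here is a small sketch that uses only the standard matrix library. The dimensions and block bodies are made up for the example:

```ruby
require 'matrix'

# Matrix.[] takes the rows themselves as arguments.
literal = Matrix[[1, 2], [3, 4]]
puts literal.row_count     # 2

# Matrix.build takes the dimensions and computes each entry from its block.
built = Matrix.build(2, 3) { |row, column| row * 10 + column }
puts built.row_count       # 2
puts built.column_count    # 3

# With a single argument, the column count defaults to the row count.
square = Matrix.build(3) { |row, column| row == column ? 1 : 0 }
puts square == Matrix.identity(3) # true
```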
You could skip over the first line, read the other lines, chomp them to remove the new lines and then split on white space. This will give you an array of arrays, which you can feed to Matrix.rows.
No need to declare the size. Try the following:
require 'matrix'

class Terrain
  attr_accessor :m

  def initialize file_name
    lines = IO.readlines(file_name)
    # skip the first line (the size) and convert every remaining line to integers
    @data = lines.drop(1).map { |l| l.split.map { |e| e.to_i } }
    @m = Matrix[*@data]
  end
end
Or, even better:
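require 'matrix'

class Terrain
  attr_accessor :m

  def initialize file_name
    @data = []
    # File.foreach closes the file when done; drop(1) skips the size line
    File.foreach(file_name).drop(1).each do |l|
      @data << l.split.map {|e| e.to_i}
    end
    @m = Matrix[*@data]
  end
end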
No need for the size:
require 'matrix'

class Terrain
  def initialize(file_name)
    File.open(file_name) do |f|
      # drop(1) skips the size line; the remaining lines are the rows
      @m = Matrix[*f.each_line.drop(1).map { |l| l.split.map(&:to_i) }]
    end
  end
end
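The approaches above can be tried without a file on disk. The following is only a sketch: parse_terrain is a hypothetical helper, and the 3 X 3 sample stands in for the 10 X 10 data from the question. It assumes the layout described there, a size line followed by that many rows of whitespace-separated integers:

```ruby
require 'matrix'

# Hypothetical helper: parses the "size line + rows" layout from the question.
def parse_terrain(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  size = Integer(lines.first)
  # Take exactly `size` rows after the size line and convert each cell.
  rows = lines.drop(1).first(size).map { |line| line.split.map { |cell| Integer(cell) } }
  Matrix.rows(rows)
end

sample = <<~DATA
  3
  1 2 3
  4 5 6
  7 8 9
DATA

terrain = parse_terrain(sample)
puts terrain.row_count   # 3
puts terrain[2, 0]       # 7
```

Using Integer() instead of to_i is a design choice for the sketch: it raises on malformed cells instead of silently turning them into 0.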

Ruby Array - highest integer

Brand new to Ruby, and love it. Just playing around with the code below:
public
def highest
highest_number = 0
each do |number|
number = number.to_i
highest_number = number if number > highest_number
puts highest_number
end
end
array = %w{1 2 4 5 3 8 22 929 1000 2}
array.highest
So at the moment the response I get is:
1
2
4
5
5
8
22
929
1000
1000
So it prints the array first, and then the highest number from the array as well. However, all I want it to do is print the highest number only...
I have played around with this and can't figure it out! Sorry for such a newbie question
The problem is that you have the puts statement inside the each loop, so during every iteration it prints out what the highest number currently is. Try moving it outside the each loop so that you have this:
public
def highest
highest_number = 0
each do |number|
number = number.to_i
highest_number = number if number > highest_number
end
puts highest_number
end
array = %w{1 2 4 5 3 8 22 929 1000 2}
array.highest
Which produces the desired output:
1000
You could also save yourself some trouble by using max_by:
>> a = %w{1 2 4 5 3 8 22 929 1000 2}
=> ["1", "2", "4", "5", "3", "8", "22", "929", "1000", "2"]
>> m = a.max_by { |e| e.to_i }
=> "1000"
You could also use another version of max_by:
m = a.max_by(&:to_i)
to avoid the extra noise of the "block that just calls a method".
But this is probably a Ruby blocks learning exercise for you so using existing parts of the standard libraries doesn't count. OTOH, it is good to know what's in the standard libraries so punting to max_by or max would also count as a learning exercise.
You can do this instead and avoid the highest_number variable.
array = %w{1 2 4 5 3 8 22 929 1000 2}
class Array
  def highest
    collect { |x| x.to_i }
      .sort
      .last.to_i
  end
end
array.highest # 1000
The collect { |x| x.to_i } can also be written as collect(&:to_i) in this case.
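One detail worth seeing side by side is what each of the suggestions above returns. This is a small sketch using only core Ruby, with illustrative variable names:

```ruby
numbers = %w{1 2 4 5 3 8 22 929 1000 2}

# max_by compares by the block's result but returns the original element.
as_string = numbers.max_by(&:to_i)
p as_string    # "1000"

# Converting first yields an Integer.
as_integer = numbers.map(&:to_i).max
p as_integer   # 1000

# Without any conversion the strings are compared character by character.
lexical = numbers.max
p lexical      # "929"
```

The last case is why the conversion to integers matters when the array holds strings.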
