I want to randomize treatments with three levels and a sample size of n = 15. I'm stuck here:
volunteers <- 1:15
set.seed(1); sample(volunteers, size=5, replace=F)
I want three different groups, five each, but I'm new to R.
This is a data setup for ANOVA, not a specific question that comes with a particular data set. Also, I don't know what set.seed does.
I think you are looking for something like this:
set.seed(1337)
# replace with your real participant IDs
volunteers <- 1:15
# set the group labels (three groups)
number.of.groups <- 1:3
# set group size
group.size <- 5
# generate data frame with participant > group order
df <- data.frame(group=sort(rep(number.of.groups,group.size)),
participant=sample(volunteers,length(volunteers)))
# show your groups
df[which(df$group==1),]
# group participant
# 1 1 9
# 2 1 8
# 3 1 1
# 4 1 6
# 5 1 5
df[which(df$group==2),]
# group participant
# 6 2 4
# 7 2 15
# 8 2 3
# 9 2 2
# 10 2 13
df[which(df$group==3),]
# group participant
# 11 3 11
# 12 3 10
# 13 3 14
# 14 3 12
# 15 3 7
You only need set.seed() if you want to be able to replicate your samples: once the same seed is set, you always draw the same "random" samples. Consequently, set.seed() is more for testing than for real analysis code. The particular seed value is irrelevant, by the way. If you want to replicate a result, just make sure to always set the same seed.
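To illustrate what a seed does outside of R, here is a minimal Python sketch of the same idea: shuffle the volunteers once and cut the shuffled list into equal groups. The function name `assign_groups` and its parameters are made up for this illustration, and the sketch assumes the number of volunteers divides evenly by the number of groups.

```python
import random


def assign_groups(volunteers, n_groups, seed=None):
    """Randomly split volunteers into n_groups groups of equal size.

    Hypothetical helper for illustration; assumes an even split is possible.
    """
    shuffled = list(volunteers)
    if len(shuffled) % n_groups != 0:
        raise ValueError("volunteers cannot be split into equal groups")
    # A private generator: the same seed always gives the same shuffle
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    # Group g takes the g-th slice of the shuffled list
    return {g + 1: shuffled[g * size:(g + 1) * size] for g in range(n_groups)}


groups = assign_groups(range(1, 16), n_groups=3, seed=1337)
for label, members in groups.items():
    print(label, members)
```

Calling the function twice with the same seed returns the same assignment. Leaving the seed as None gives a fresh assignment on each call.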
How about:
install.packages("randomizr")
library(randomizr)
Z <- complete_ra(15, num_arms = 3)
table(Z)
This gives
> table(Z)
Z
T1 T2 T3
5 5 5
I'm currently solving a problem that states:
A company filed for bankruptcy and decided to pay its employees with the last remaining valuable items in the company, but only if the items can be distributed evenly among them: every employee must receive at least 1 item, and the difference between the employee carrying the most valuable items and the employee carrying the least valuable items cannot exceed a certain value x.
Input:
The first row contains the number of employees;
The second row contains the value x that the difference between the employee carrying the most valuable items and the employee carrying the least valuable items cannot exceed;
The third row contains all the items with their values;
Output:
The first number is the value of the least valuable basket of items and the second is the value of the most valuable basket;
Example:
Input:
5
4
2 5 3 11 4 3 1 15 7 8 10
Output:
13 15
Input:
5
4
1 1 1 11 1 3 1 2 7 8
Output:
NO (It's impossible to distribute evenly)
Input:
5
10
1 1 1 1
Output:
NO (It's impossible to distribute evenly)
My solution to this problem, taking the first input, is to sort the items in ascending or descending order, so from
2 5 3 11 4 3 1 15 7 8 10 --> 1 2 3 3 4 5 7 8 10 11 15
then create an adjacency list, or just store it in simple variables, where we add the biggest remaining number to the lowest basket while iterating over the array of item values
Element 0: 15
Element 1: 11 <- 3 (sum 14)
Element 2: 10 <- 3 (sum 13)
Element 3: 8 <- 4 <- 1 (sum 13)
Element 4: 7 <- 5 <- 2 (sum 14)
So my solution would run in O(n log n + 2n): first a merge sort, then finding the max and min values. What do you think about this solution?
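The approach described above can be sketched in Python as follows. The function name `distribute` and its return convention (a tuple, or None for "NO") are assumptions for this illustration. It keeps the basket sums in a min-heap so the lowest basket is found in O(log k) time per item. Note that this greedy rule is a heuristic: when it returns None, that does not prove no valid distribution exists.

```python
import heapq


def distribute(num_employees, max_diff, items):
    """Greedy sketch: give each item, largest first, to the lowest basket.

    Returns (lowest_sum, highest_sum), or None when the greedy result
    does not satisfy the constraints.
    """
    # Every employee must receive at least one item
    if len(items) < num_employees:
        return None
    # A list of zeros is already a valid min-heap of basket sums
    baskets = [0] * num_employees
    for value in sorted(items, reverse=True):
        lowest = heapq.heappop(baskets)
        heapq.heappush(baskets, lowest + value)
    low, high = min(baskets), max(baskets)
    return (low, high) if high - low <= max_diff else None


print(distribute(5, 4, [2, 5, 3, 11, 4, 3, 1, 15, 7, 8, 10]))  # (13, 15)
print(distribute(5, 4, [1, 1, 1, 11, 1, 3, 1, 2, 7, 8]))       # None
print(distribute(5, 10, [1, 1, 1, 1]))                         # None
```

Because the items are sorted in descending order and there are at least as many items as employees, the first pass fills every empty basket, so the "at least 1 item" rule holds as long as all item values are positive.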
I have the following data:
a b c d
5 9 6 0
3 1 3 2
Characters in the first row, numbers in the second row.
How do I get the character corresponding to the highest number in the second row, and how do I increase the corresponding number in the second row? (For example, here, column b has the highest number, 9, so increase that number by 10%.)
I use Dyalog version 17.1.
With:
⎕←data←3 4⍴'a' 'b' 'c' 'd' 5 9 6 0 3 1 3 2
a b c d
5 9 6 0
3 1 3 2
You can extract the second row with:
2⌷data
5 9 6 0
Now grade it descending, that is, find the indices that would sort it from highest to lowest:
⍒2⌷data
2 3 1 4
The first number is the column we're looking for:
⊃⍒2⌷data
2
Now we can use this to extract the character from the first row:
data[⊂1,⊃⍒2⌷data]
b
But we only need the column index, not the actual character. The full index of the number we want to increase is:
2,⊃⍒2⌷data
2 2
Extracting the data to see that we got the right index:
data[⊂2,⊃⍒2⌷data]
9
Now we can either create a new array with the target value increased by 10%:
1.1×@(⊂2,⊃⍒2⌷data)⊢data
a b c d
5 9.9 6 0
3 1 3 2
Or change it in-place:
data[⊂2,⊃⍒2⌷data]×←1.1
data
a b c d
5 9.9 6 0
3 1 3 2
Hopefully the below makes sense.
I have a data set with a large number of variables (rows). Within each variable are sections that are scored 1-15. I need to subset the data frame to the three highest-scoring sections for each variable. Each section has additional data associated with it that needs to be kept, but that data plays no part in the selection.
Having trouble with this. Any help is appreciated.
Dummy layout below
Variable Aux_score
1 1
1 6
1 3
1 8
1 10
2 3
2 2
2 12
2 10
2 11
3 7
3 2
3 9
3 8
3 12
You can do it like this with base R:
do.call(rbind, lapply(split(df, df$Variable), function(df) df[ tail(order(df$Aux_score), 3), ]))
Or like this with tidyverse:
df %>% group_by(Variable) %>% top_n(3, Aux_score) %>% ungroup()
In Pandas 0.19 I have a large dataframe with a Multiindex of the following form
C0 C1 C2
A B
bar one 4 2 4
two 1 3 2
foo one 9 7 1
two 2 1 3
I want to sort the columns of bar and foo (and of many more row pairs like them) according to the values in "two" to get the following:
C0 C1 C2
A B
bar one 4 4 2
two 1 2 3
foo one 7 9 1
two 1 2 3
I am interested in speed (as I have many columns and many pairs of rows). I am also happy with re-arranging the data if it speeds up the sorting. Many thanks
Here is a mostly numpy solution that should yield good performance. It first selects only the 'two' rows and argsorts them. It then sets this order for each row of the original dataframe. It then unravels this order (after adding a constant to offset each row) and the original dataframe values. It then reorders all the original values based on this unraveled, offset and argsorted array before creating a new dataframe with the intended sort order.
import numpy as np
import pandas as pd

rows, cols = df.shape
df_a = np.argsort(df.xs('two', level=1))
order = df_a.reindex(df.index.droplevel(-1)).values
offset = np.arange(len(df)) * cols
order_final = order + offset[:, np.newaxis]
pd.DataFrame(df.values.ravel()[order_final.ravel()].reshape(rows, cols), index=df.index, columns=df.columns)
Output
C0 C1 C2
A B
bar one 4 4 2
two 1 2 3
foo one 7 9 1
two 1 2 3
Some Speed tests
# create much larger frame
import string
idx = pd.MultiIndex.from_product((list(string.ascii_letters), list(string.ascii_letters) + ['two']))
df1 = pd.DataFrame(index=idx, data=np.random.rand(len(idx), 3), columns=['C0', 'C1', 'C2'])
#scott boston
%timeit df1.groupby(level=0).apply(sortit)
10 loops, best of 3: 199 ms per loop
#Ted
1000 loops, best of 3: 5 ms per loop
Here is a solution, albeit kludgy:
Input dataframe:
C0 C1 C2
A B
bar one 4 2 4
two 1 3 2
foo one 9 7 1
two 2 1 3
Custom sorting function:
def sortit(x):
    xcolumns = x.columns.values
    x.index = x.index.droplevel()
    x.sort_values(by='two',axis=1,inplace=True)
    x.columns = xcolumns
    return x
df.groupby(level=0).apply(sortit)
Output:
C0 C1 C2
A B
bar one 4 4 2
two 1 2 3
foo one 7 9 1
two 1 2 3
I'm trying to fit a swimming pool onto this piece of terrain. The first line of the input is the size of the terrain (10x10 in this case) and the last line is the size the pool will be (2x2).
I've figured out how to read in the terrain and get its mean and standard deviation, but now I need to find the pool position with the lowest average height. I know I need to use a while loop, but I don't know how to go about this. Can anyone help me?
10
1 1 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
21
Here are two answers showing different styles. The first is faster (only important for HUGE terrain sizes), but less "Ruby-esque"; the second is more functional, but creates extra intermediary data. For your own best education, I encourage you to ensure that you understand these thoroughly, and choose how to proceed in a way that is best for you.
Also, I've assumed that the 21 you have in your question is a mistake, and you meant to have a 2 there.
First, both solutions start with the same code that creates an array of arrays for the terrain:
# Load the text file as an array of strings
lines = IO.readlines('pool.txt')
# Turn it into an array of arrays of numbers
terrain = lines.map{ |s| s.scan(/\d+/).map(&:to_i) }
# Throw out the silly grid size; we'll infer it from real data instead!
terrain.shift
# Take the last line (pool size) out of the terrain
pool_size = terrain.pop.first
The first solution walks through the terrain and calculates the average for each sub-grid, keeping track of the lowest number:
# For fun, we'll allow terrain that doesn't have to be square
rows = terrain.length
cols = terrain.first.length
best_size = Float::INFINITY
0.upto(rows-pool_size) do |y|
0.upto(cols-pool_size) do |x|
# x,y is the upper left corner of a valid pool_size × pool_size grid
average = 0.0
0.upto(pool_size-1) do |m|
0.upto(pool_size-1) do |n|
# Add up each point in the sub-grid
average += terrain[y+n][x+m]
end
end
# The number of points we added is the square of the size
average /= (pool_size*pool_size)
# Mark this as the best seen so far
best_size = average if average < best_size
end
end
p best_size
#=> 1.25
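For comparison, the same brute-force walk can be sketched in Python. The function name `lowest_average` is made up for this illustration. The terrain is assumed to be a non-empty list of equal-length rows, and the pool is assumed to fit inside it.

```python
def lowest_average(terrain, pool_size):
    """Return the lowest average height over all pool_size x pool_size sub-grids."""
    rows, cols = len(terrain), len(terrain[0])
    best = float("inf")
    # (x, y) is the upper-left corner; the last valid corner is size - pool_size
    for y in range(rows - pool_size + 1):
        for x in range(cols - pool_size + 1):
            total = sum(terrain[y + dy][x + dx]
                        for dy in range(pool_size)
                        for dx in range(pool_size))
            best = min(best, total / (pool_size * pool_size))
    return best


# The terrain from the question
terrain = [[1, 1, 3, 4, 5, 6, 7, 8, 9, 10]]
terrain += [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] for _ in range(6)]
terrain += [[1, 2, 3, 4, 5, 6, 7, 12, 12, 12] for _ in range(3)]
print(lowest_average(terrain, 2))  # 1.25
```

The ranges run up to and including `rows - pool_size` and `cols - pool_size`, so pools touching the bottom or right edge of the terrain are considered too.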
The second solution finds all the sub-grids, and then uses the Enumerable#min_by method to find the best. We also create a method for calculating the average on an array of numbers, just for fun and more self-describing code:
# See http://ruby-doc.org/stdlib-1.9.3/libdoc/matrix/rdoc/Matrix.html
require 'matrix'
class Matrix
# Average all values in the array (as a float)
def average
parts = to_a.flatten
parts.inject(:+) / parts.length.to_f
end
end
# Hey look, a nice 2D grid of elevations!
terrain = Matrix[ *terrain ]
# Create an array of matrices, each one representing a possible pool
rows = 0..(terrain.row_size - pool_size)
cols = 0..(terrain.column_size - pool_size)
pools = rows.flat_map{|x| cols.map{ |y| terrain.minor(x,pool_size,y,pool_size) } }
# Find the lowest pool by calling the above 'average' method on each
lowest = pools.min_by(&:average)
p lowest, lowest.average
#=> Matrix[[1, 1], [1, 2]]
#=> 1.25
On my computer the simple array-of-arrays method takes ~0.6s to find the lowest 3x3 pool in a random 400×400 terrain, while the matrix technique takes ~1.3s. So the matrix style is more than twice as slow, but still plenty fast for your assignment. :)
It's Ruby. You probably want to use iterators, not while loops.
But do your own homework. You'll learn more.