How to remove variables in large dataset i Rstudio?

How to remove variables in large dataset i Rstudio? - rstudio

I'm trying to permanently remove some of the variables in the dataframe 'd' individually, as they are no longer useful.
New to Rstudio and coding. Using Rstudio, version 0.99.491 on Windows. I am using a secure server, so downloading packages are not an option.
I have a very large dataset 'd' containing 122 variables of ~450.000 rows.
I'm using a Danish version of the program, so the error messages have been translated by me and might be incorrect.
I have tried:
Option 1:
> rm (d$variable121)
Error in rm(d$variable121):... must contain name or character strings
Option 2:
> rm('d$variable121')
Warning meaasage:
in rm('d$variable121'): object 'd$variable121' not found
Option 3:
> rm (list=c('d$variable121', 'd$variable122'))
Warning messages:
1: in rm (list=c('d$variable121', 'd$variable122')) object 'variable 121' not found.
2: in rm (list=c('d$variable121', 'd$variable122')) object 'variable 122' not found.
I'm able to remove other dataframes, but not any variable in the 'd' dataframe.
Does anyone know how to do this?

Suppose your data frame is called d with four columns and you want to delete variables with names var1 and var3. You can do
> d <- data.frame(var1=1:10, var2=2:11, var3=3:12, var4=4:13)
> d
var1 var2 var3 var4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
6 6 7 8 9
7 7 8 9 10
8 8 9 10 11
9 9 10 11 12
10 10 11 12 13
> dropped <- c("var1", "var3")
> d[, !(names(d) %in% dropped)]
var2 var4
1 2 4
2 3 5
3 4 6
4 5 7
5 6 8
6 7 9
7 8 10
8 9 11
9 10 12
10 11 13

Related

How to repeat columns in oracle reports just like matrix

I want below kind of format in oracle reports 6i.
A B C A B C
1 1 1 6 6 6
2 2 2 7 7 7
3 3 3 8 8 8
4 4 4 9 9 9
5 5 5 10 10 10
enter image description here

Algorithm for generating a cross-sum matrix game

I'm trying to generate a matrix, like for a cross-sum game, where in a matrix of random numbers, for a given a sum (or a product, depending on a chosen operation) for each row and column, there's exactly 1 way to "deactivate" (meaning, to exclude the number from the final sum or product) correct numbers so that each row and column end up summing the active numbers to the correct sum.
To illustrate this, let's say I have a 3x3 matrix, and chosen sums (numbers next to * represent the sum):
*12* *5* *3*
4* 1 2 3 *4
9* 4 5 6 *9
7* 7 8 9 *7
In order to solve this, I would need to deactivate numbers 2, 6, 9 and 8.
One way to generate a matrix with needed sums it to just generate the numbers, and then choose which ones to exclude at random. However, the drawback is that for bigger matrices, like 7x7, 8x8, there's a good possibility that there will be more than 1 solution.
Another solution I'm thinking of is to exclude the numbers that can add up to another for each row / column. For example if a required sum is 5, then 4 2 1 3 would be invalid because (4 + 1 and 3 + 2), but this seems rather complicated and inefficient.
If anyone has any pointers, I'd greatly appreciate it. This seems like it's a solved problem, but I have no idea what to look for.

Checking random grids with a solver
For a matrix of limited size, up to 10×10 or so, a simple solver can quickly find the solutions. If there is only one, even the quick'n'dirty solver I wrote in javaScript usually finds in it less than a second.
I used a simple recursive row-by-row solver which works like this:
For each column, iterate over every possible selection of numbers and check whether excluding them gives the column the correct sum. Then check whether any numbers are part of all or none of the valid selections; these will have to be included and avoided in all selections.
For each row, iterate over every possible selection of numbers and check whether excluding them gives the row the correct sum, and whether they contain all of the to-be-included and none of the to-be-avoided numbers identified in the previous step. Store all valid selections per row.
After these preparations, the recursive part of the algorithm is then called:
Receive a matrix with numbers, a list of sums per row, and a list of sums per column.
For the top row, check whether any of the numbers cannot be excluded (because the numbers below it add up to less that the sum for that column).
Iterate over all valid selections of numbers in the top row (as identified in the preparation phase). For each selection, check whether removing it gives the row its correct sum. If it does, recurse with a copy of the matrix with the top row removed, a list of sums per row with the first item removed, and a list of sums per column with the non-excluded numbers in the top row subtracted.
Starting from a pattern like this, where the X's indicate which cell will be excluded:
- - - X - - - X - -
- - - - X - X - - -
X - - - - X - - - -
- X - - - - - - - X
- - X - - - - - X -
- X - - - - - X - -
X - - - - - - - X -
- - - - X - - - - X
- - - X - X - - - -
- - X - - - X - - -
I let the matrix be filled with random numbers from 1 to 9, and then ran the solver on it, and about one in ten attempts results in a grid like this, which has exactly one solution:
4 1 3 8 1 3 4 1 1 8 25
9 9 7 8 1 1 3 2 1 7 44
9 8 8 1 5 5 9 2 2 6 41
4 6 8 1 9 2 1 7 1 5 33
9 4 2 4 4 5 8 6 3 8 48
8 5 6 9 6 6 6 4 1 8 50
4 3 2 4 8 7 6 7 9 1 38
6 7 8 1 9 9 9 4 6 7 50
7 7 1 7 9 6 2 7 1 2 36
3 3 8 8 9 2 4 9 6 8 48
50 42 43 36 51 35 45 44 19 48
When using only numbers from 1 to 9, grids with only one solution are easy to find for smaller grids (more than half of 8×8 grids have only one solution), but become hard to find for grid sizes over 10×10. Most larger grids have many solutions, like this one which has 16:
4 1 5 7 2 2 5 6 5 8 32
5 1 1 6 4 6 5 2 2 9 32
9 2 3 8 7 7 4 8 3 6 41
4 8 1 8 4 3 1 9 7 2 37
4 6 9 8 8 5 8 6 6 5 50
1 5 5 5 1 3 5 7 7 1 28
5 5 1 7 2 9 2 6 3 8 40
9 8 9 2 8 3 1 9 6 8 47
5 1 3 7 1 2 6 1 8 9 34
1 5 1 2 1 1 1 6 4 3 23
33 29 28 46 26 32 32 47 42 49
The number of solutions also depends on the number of excluded numbers per row and column. The results shown above are specifically for the pattern with two excluded numbers per row and column. The more excluded numbers, the greater the average number of solutions (I assume with a peak at 50% excluded numbers).
You can of course use a random pattern of cells to be excluded, or choose the numbers by hand, or have random numbers chosen with a certain distribution, or give the matrix any other property that you think will enhance its usefulness as a puzzle. Multiple solutions don't seem to be a big problem for smaller grids, but it is of course best to check for them; I first ran the solver on a grid I had made by hand, and it turned out to have three solutions.
Choosing the excluded values
Because the value of the excluded numbers can be chosen freely, this is the obvious way to improve the chance of a matrix having only one solution. If you choose numbers that don't occur anywhere else in the row and column, or only once, then the percentage of 10×10 grids that have only one solution rises from 10% to 50%.
(This simple method obviously gives a clue about which numbers should be excluded – it's not the numbers that occur several times in a row or column – so it's probably better to use how many times each number occurs in the whole grid, not just in its own row and column.)
You could of course choose excluded values that add up to a number that can't be made with any other combination of values in the row or column, and that would guarantee only one solution. The problem with this is of course that such a grid doesn't really work as a puzzle; there is only ever one way to exclude values and get the correct sum for every row and column. A variant would be to choose excluded values so that the sum of the row or column can be made in exactly two, or three, or ... ways. This would also give you a way to choose the difficulty level of the puzzle.
Sudoku – avoiding duplicate values
The fact that larger grids have a higher chance of having more than one solution is of course linked to using only values for 1 to 9. Grids of 10×10 and greater are guaranteed to have duplicate values in every row and column.
To check whether grids with no duplicate values per row or column are more likely to lead to only one solution, the obvious test data is the Sudoku.
When using random patterns of 1 to 3 cells per row and column to be excluded, around 90% of cross-sum matrix games based on Sudokus have only one solution, compared to around 60% when using random values.
(It could of course be interesting to create puzzles which work both as a Sudoku and as a cross-sum matrix puzzle. For every Sudoku it should be easy to find a visually pleasing pattern of excluded cells that has only one solution.)
Examples
For those who like a challenge (or want to test a solver), here's a cross-sum Sudoku and an 11×11, 12×12 and 13×13 cross-sum matrix puzzle that have just one solution:
. 3 . 4 . . . . . 36
. 6 . . 9 . . 4 5 35
4 . . . . . 9 . . 33
. . 3 . . 1 . . . 39
. . . . . 8 2 . 3 29
. 7 . . . 2 6 . 9 40
. 2 . . . . . . . 33
3 . 8 . . . . . . 31
. . 7 . 5 . . 6 4 36
33 34 35 37 27 42 34 32 38
6 6 5 2 9 4 4 6 7 1 8 44
1 8 1 1 4 7 3 3 3 1 2 25
5 8 7 7 5 5 6 1 7 6 5 43
8 9 6 2 9 1 6 2 9 8 3 59
8 8 2 3 6 3 7 7 5 9 8 53
8 2 7 2 6 2 9 4 7 1 2 47
3 9 2 8 8 4 2 9 3 6 6 50
3 1 8 2 6 4 1 7 9 4 6 42
8 3 6 7 8 5 4 4 2 8 4 46
8 3 8 6 5 7 9 8 6 9 2 59
9 6 8 4 6 2 4 8 5 6 2 49
52 50 47 40 58 34 46 50 54 48 38
1 5 8 6 6 5 4 9 9 7 7 8 66
5 6 2 5 5 4 8 5 7 7 3 6 54
8 2 8 2 8 6 9 4 9 5 9 9 67
1 2 8 2 3 4 5 8 8 7 6 2 48
8 9 4 8 7 2 8 2 2 3 7 7 57
2 2 1 9 4 1 1 1 5 6 1 5 36
2 1 4 2 9 1 2 8 1 6 9 7 49
3 6 5 7 5 5 7 9 4 7 7 5 59
8 2 3 4 8 2 2 3 3 1 6 1 35
4 2 1 7 7 1 7 9 6 7 9 7 51
7 4 3 2 8 3 6 7 8 3 1 8 54
3 8 9 8 7 6 5 7 1 1 7 3 59
48 45 51 47 62 38 61 59 57 50 60 57
4 3 9 3 7 6 6 9 7 7 5 9 1 71
2 7 4 7 1 1 9 8 8 3 3 5 4 52
6 9 6 5 6 4 6 7 3 6 6 8 8 68
5 7 8 8 1 5 3 4 5 7 2 9 6 60
5 3 1 3 3 5 4 5 9 1 8 2 7 50
3 8 3 1 8 4 8 2 2 9 7 3 6 58
6 6 9 8 3 5 9 1 4 6 9 8 2 69
8 1 8 2 9 7 1 3 8 5 2 1 5 50
9 9 4 5 4 9 7 1 8 8 1 2 6 60
9 2 4 8 4 5 3 3 7 9 6 1 6 58
5 2 7 6 8 5 6 6 1 3 4 7 2 47
8 3 5 2 7 2 4 5 8 1 2 6 2 49
7 1 7 4 9 2 9 8 9 3 5 2 3 59
66 50 69 50 58 49 64 57 65 66 56 47 54

Gnuplot animations, Nbodies solver

I was searching for this for few hours, bud didnt manage to find anything really useful to make everything work.
So ive got a txt file with data.
x1 y1 z1
x2 y2 z2
x3 y3 z3
x1' y1' z1'
x2' y2' z2'
x3' y3' z3'
x1'' y1'' z1''
x2'' y2'' z2''
x3'' y3'' z3''
.
.
.
x1^n y1^n z1^n
x2^n y2^n z2^n
x3^n y3^n z3^n
That means ive got 3 planets with 3d position each. After time step h planet 1(x1,y1,z1) moves to (x1',y1',z1').
How to visualisate that animation in GNUplot? Ive got N time steps and K bodies.
Thank You for any help.

Use gif terminal.
planets.txt:
1 1 1
2 2 2
3 3 3
2 2 2
3 3 3
4 4 4
3 3 3
4 4 4
5 5 5
4 4 4
5 5 5
6 6 6
5 5 5
6 6 6
7 7 7
6 6 6
7 7 7
8 8 8
7 7 7
8 8 8
9 9 9
8 8 8
9 9 9
10 10 10
9 9 9
10 10 10
11 11 11
10 10 10
11 11 11
12 12 12
planets.gp:
set term gif animate optimize delay 50 size 500,500
set output 'planets.gif'
set xrange [0:15]
set yrange [0:15]
set zrange [0:15]
unset key
unset colorbox
set palette rgb 3,11,6
do for [i=0:9] {
splot 'planets.txt' every :::i::i u 1:2:3 with points palette ps 3 pt 6
}

Checking for duplicate values on a line [duplicate]

This question already has answers here:
Bash simplified sudoku
(2 answers)
Closed 7 years ago.
So I am making a script to check whether a Sudoku is correct or not.
The task is to check for duplicate numbers in the lines and columns, now for the columns the job was fairly easy using awk with sort and uniq. But to check for the lines is another thing..Since I can not use sort and the likes of that, keep in mind that my sed knowledge is pretty much nothing.
Input:
5 3 4 6 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 3 8 3 4 2 5 6 7
8 5 9 7 6 1 4 2 3
4 2 6 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 1 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 1 7 8
Now on some of the lines there are duplicate numbers, and I want the output to tell something like
line 1 : good
line 2 : good
line 3 : bad
line 4 : good
line 5 : good
line 6 : good
line 7 : bad
line 8 : good
line 9 : bad

Assuming you've used awk for checking the columns, and that each column just have numbers from 1 to 9, you can do the check also with awk with something like this:
{ delete arr;
okline = 1;
for (i = 1; i <= NF; ++i) {
if (arr[$i] == 1) {
okline = 0;
break;
}
arr[$i] = 1;
}
if (okline == 1)
{
print "Line: OK";
}
else
{
print "Line: error";
}
}
It does not count the lines, but you can get the idea.

disappearing row names when using apply

Consider the following example (values in vectors are target practice results and I'm trying to automagically sort by shooting score). We generate three vectors. We sort values in columns 1:20 in ascending order and rows in descending order based on out.tot column.
# Generate data
shooter1 <- round(runif(n = 20, min = 1, max = 10))
shooter2 <- round(runif(n = 20, min = 1, max = 10))
shooter3 <- round(runif(n = 20, min = 1, max = 10))
out <- data.frame(t(data.frame(shooter1, shooter2, shooter3)))
colnames(out) <- 1:ncol(out)
out.sort <- t(apply(out, 1, sort, na.last = FALSE))
out.tot <- apply(out , 1, sum)
colnames(out.sort) <- 1:ncol(out.sort)
out2 <- cbind(out.sort, out.tot)
out3 <- apply(out2, 2, sort, decreasing = TRUE, na.last = FALSE)
out2 has row names attached while out3 lost them. The only difference is that I used MARGIN = 2, which is probably the culprit (because it takes in column by column). I can match rows by hand, but is there a way I can keep row names in out3 from disappearing?
> out2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 out.tot
shooter1 1 2 2 3 3 3 4 5 5 5 6 6 6 6 6 7 8 9 9 10 106
shooter2 1 3 3 3 3 4 4 4 5 5 5 5 5 6 7 8 8 9 9 10 107
shooter3 1 1 2 2 2 3 3 4 5 5 5 6 6 6 6 7 8 8 8 9 97
> out3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 out.tot
[1,] 1 3 3 3 3 4 4 5 5 5 6 6 6 6 7 8 8 9 9 10 107
[2,] 1 2 2 3 3 3 4 4 5 5 5 6 6 6 6 7 8 9 9 10 106
[3,] 1 1 2 2 2 3 3 4 5 5 5 5 5 6 6 7 8 8 8 9 97

If I understand your example, going from out2 to out3 you are sorting each column independently - meaning that the values on row 1 may not all come from the data generated from shooter1. It makes sense then that the rownames are dropped in as much as rownames are names of observations and you are no longer keeping data from one observation on one row.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to remove variables in large dataset i Rstudio? - rstudio

Related

How to repeat columns in oracle reports just like matrix

Algorithm for generating a cross-sum matrix game

Gnuplot animations, Nbodies solver

Checking for duplicate values on a line [duplicate]

disappearing row names when using apply

Categories

Resources