How to avoid row names in further analysis in R? - rstudio

I'm just running the following example from the GGEBiplotGUI package and, of course, it works properly.
library(GGEBiplotGUI)
data("Ontario")
Ontario
GGEBiplot(Data = Ontario)
But when I download the "Ontario" data and try to run the script above on my PC, it no longer works. See the example below.
Ontario <- read.csv("Book.csv")
library(GGEBiplotGUI)
GGEBiplot(Data = Ontario)
The result is the following table (columns 0 to 10), which treats the row numbers (1 to 17) as genotypes and "X" as another location.
See the result below, please.
X BH93 EA93 HW93 ID93 KE93 NN93 OA93 RN93 WP93
1 ann 4.460 4.150 2.849 3.084 5.940 4.450 4.351 4.039 2.672
2 ari 4.417 4.771 2.912 3.506 5.699 5.152 4.956 4.386 2.938
3 aug 4.669 4.578 3.098 3.460 6.070 5.025 4.730 3.900 2.621
4 cas 4.732 4.745 3.375 3.904 6.224 5.340 4.226 4.893 3.451
5 del 4.390 4.603 3.511 3.848 5.773 5.421 5.147 4.098 2.832
6 dia 5.178 4.475 2.990 3.774 6.583 5.045 3.985 4.271 2.776
7 ena 3.375 4.175 2.741 3.157 5.342 4.267 4.162 4.063 2.032
8 fun 4.852 4.664 4.425 3.952 5.536 5.832 4.168 5.060 3.574
9 ham 5.038 4.741 3.508 3.437 5.960 4.859 4.977 4.514 2.859
10 har 5.195 4.662 3.596 3.759 5.937 5.345 3.895 4.450 3.300
11 kar 4.293 4.530 2.760 3.422 6.142 5.250 4.856 4.137 3.149
12 kat 3.151 3.040 2.388 2.350 4.229 4.257 3.384 4.071 2.103
13 luc 4.104 3.878 2.302 3.718 4.555 5.149 2.596 4.956 2.886
14 m12 3.340 3.854 2.419 2.783 4.629 5.090 3.281 3.918 2.561
15 reb 4.375 4.701 3.655 3.592 6.189 5.141 3.933 4.208 2.925
16 ron 4.940 4.698 2.950 3.898 6.063 5.326 4.302 4.299 3.031
17 rub 3.786 4.969 3.379 3.353 4.774 5.304 4.322 4.858 3.382
How can I fix this problem? That is, how can I prevent the row names and the "X" column from being treated as variables in the GGEBiplotGUI analysis?
I have also tried the following, and none of it worked:
attributes(Ontario)$row.names <- NULL
print(Ontario, row.names = F)
row.names(Ontario) <- NULL
Ontario[, -1] ## It deletes the first column not the 0 one.
Many thanks in advance!

This code worked properly.
Ontario <- read.csv("Libro.csv")
rownames(Ontario) <- Ontario$X   # use the genotype names as row names
Ontario1 <- Ontario[, -1]        # drop the "X" column
library(GGEBiplotGUI)
GGEBiplot(Data = Ontario1)
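Alternatively, read.csv can assign the first column as row names at load time, assuming the genotype labels are stored in the first column of the file (a minimal sketch using the same file name as above):

library(GGEBiplotGUI)

# Use the first CSV column as row names instead of a data column
Ontario <- read.csv("Book.csv", row.names = 1)

# The data frame now contains only the location columns, as in data("Ontario")
GGEBiplot(Data = Ontario)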

Related

Why is my DXF file not working in AutoCAD, giving me "ID 11 incorrect: already used"?

I have generated a DXF file, but when I open it with AutoCAD, it crashes AutoCAD and gives the message "ID 11 incorrect: already used".
the dxf content: https://github.com/tarikjabiri/dxf/blob/dev/examples/latest.dxf
I can't spot the problem; I have been trying to solve it for 3 days.
I think something is wrong with the APPID, because it holds the ID 11, or the Handle in DXF terminology.
I have a dxf working: https://github.com/tarikjabiri/dxf/blob/dev/examples/Minimal_DXF_AC1021.dxf
Thanks in advance.
There are two minor issues:
DIMSTYLE table
0
TABLE
2
DIMSTYLE
105 <<< the handle group code of the table "head" should be 5, as usual
8
100
AcDbSymbolTable
100
AcDbDimStyleTable
70
1
0
DIMSTYLE
5 <<< the handle group code of the DIMSTYLE table entry should be 105
12
330
8
100
AcDbSymbolTableRecord
100
AcDbDimStyleTableRecord
2
STANDARD
70
0
40
1
BLOCK_RECORD table entries for *MODEL_SPACE and *PAPER_SPACE
0
TABLE
2
BLOCK_RECORD
5
9
330
0
100
AcDbSymbolTable
70
2
0
BLOCK_RECORD
5
14
330
9
100
AcDbSymbolTableRecord
100
AcDbRegAppTableRecord <<< the subclass marker should be the string "AcDbBlockTableRecord"
2
*MODEL_SPACE
70
0
70
0
280
After these changes the file opens in Autodesk DWG TrueView 2022.

SparkR - Retaining the previous value in another column

I have a Spark DataFrame that looks like this:
id dates value
1 11 2013-11-15 10
2 11 2013-11-16 15
3 22 2013-11-15 20
4 22 2013-11-16 21
5 22 2013-11-17 3
I wish to retain the value from the previous date per id.
The final result should look like this:
id dates value prev_value
1 11 2013-11-15 10 NA
2 11 2013-11-16 15 10
3 22 2013-11-15 20 NA
4 22 2013-11-16 21 20
5 22 2013-11-17 3 21
The solution from this question would not work for various reasons.
I would appreciate the help!
So after playing with it for a while, here's the workaround that I found:
First of all, here's the example DF
id <- c(11, 11, 22, 22, 22)
dates <- as.Date(c('2013-11-15', '2013-11-16', '2013-11-15', '2013-11-16', '2013-11-17'), "%Y-%m-%d")
value <- c(10, 15, 20, 21, 3)
example <- as.DataFrame(data.frame(id = id, dates = dates, value))
I copy the example DF and add 1 day to the original date, then rename the column
example_p <- example
example_p$dates <- date_add(example_p$dates, 1)
colnames(example_p) <- c("id", "dates", "prev_value")
Finally, I merge the new DF to the original one
result <- select(
  merge(example, example_p,
        by = intersect(names(example), names(example_p)),
        all.x = TRUE),
  c("id_x", "dates_x", "value", "prev_value"))
showDF(result)
+----+----------+-----+----------+
|id_x| dates_x|value|prev_value|
+----+----------+-----+----------+
|22.0|2013-11-15| 20.0| null|
|11.0|2013-11-15| 10.0| null|
|11.0|2013-11-16| 15.0| 10.0|
|22.0|2013-11-16| 21.0| 20.0|
|22.0|2013-11-17| 3.0| 21.0|
+----+----------+-----+----------+
Obviously, this is somewhat clumsy, and I will be happy to give the points to anyone who can suggest a faster solution.
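A less clumsy alternative, assuming a Spark version (2.0+) whose SparkR API exposes window functions, is to take the previous value with lag() over a window partitioned by id and ordered by dates; this is a sketch, not benchmarked against the merge approach above:

library(SparkR)

# Window: one partition per id, rows ordered by date
ws <- orderBy(windowPartitionBy("id"), "dates")

# prev_value is the value from the previous row within each id (null for the first row)
example$prev_value <- over(lag(example$value, 1), ws)

showDF(example)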

How can I convert the qblast XML output into the NCBI BLAST -outfmt 17?

I started my project with the NCBI standalone BLAST and used the -outfmt 17 option. For my purpose that formatting is extremely helpful. However, I had to change to Biopython and I'm now using qblast to align my sequences to the NCBI NT database. Can I save/convert the qblast XML in a format which is comparable to the NCBI BLAST standalone -outfmt 17 format?
Thank you very much for your help!
Cheers,
Philipp
I'm going to assume you meant -outfmt 7 and you need an output with columns.
from Bio.Blast import NCBIWWW, NCBIXML

# This is the BLASTN query which returns an XML handle in a StringIO
r = NCBIWWW.qblast(
    "blastn",
    "nr",
    "ACGGGGTCTCGAAAAAAGGAGAATGGGATGAGAAGGATATATGGGTAGTGTCATTTTTTAACTTGCAGAT" +
    "TTCATCCTAGTCTTCCAGTTATCGTTTCCTAGCACTCCATGTTCCCAAGATAGTGTCACCACCCCAAGGA" +
    "CTCTCTCTCATTTTCTTTGCCTGGGCCCTCTTTCTACTGAGGAGTCGTGGCCTTCCATCAGTAGAAGCCG",
    expect=1E-5)

# Now we read that XML, extracting the info
for record in NCBIXML.parse(r):
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            cols = "{}\t" * 10
            print(cols.format(hsp.positives / hsp.align_length,
                              hsp.align_length,
                              hsp.align_length - hsp.positives,
                              hsp.gaps,
                              hsp.query_start,
                              hsp.query_end,
                              hsp.sbjct_start,
                              hsp.sbjct_end,
                              hsp.expect,
                              hsp.score))
Outputs something like:
1 210 0 0 1 210 89250 89459 8.73028e-102 420.0
0 206 19 2 5 210 46259 46462 5.16461e-73 314.0
1 210 0 0 1 210 68822 69031 8.73028e-102 420.0
0 206 19 2 5 210 25825 26028 5.16461e-73 314.0
1 210 0 0 1 210 65887 66096 8.73028e-102 420.0
...

Suggestions for data extraction in Fortran

I use Fortran 90/95 and the IBM compiler. I am trying to extract the numerical values from a block and write them to a file. I am facing a strange error in the output which I cannot understand: every time I execute the program, it skips the loop between 'Beta' and 'END'. I am trying to read and store the values.
The number of lines inside the Alpha and Beta blocks is not fixed, so a simple 'do' loop is of no use to me. I tried a 'do while' loop and also 'if-else', but it still skips the 'Beta' part.
Alpha Singles Amplitudes
15 3 23 4 -0.186952
15 3 26 4 0.599918
15 3 31 4 0.105048
15 3 23 4 0.186952
Beta Singles Amplitudes
15 3 23 4 0.186952
15 3 26 4 -0.599918
15 3 31 4 -0.105048
15 3 23 4 -0.186952
END
The simple short code is:
program test_read
  implicit none
  integer :: nop, a, b, c, d, e, i, j, k, l, m, ios
  double precision :: r, t, rr
  character :: dummy*300
  character*15 :: du1, du2, du3, du4
  open (unit=10, file="1.txt", status='old', form='formatted')
100 read(10,'(a100)') dummy
  if (dummy(1:3)=='END') goto 200
  if (dummy(2:14)=='Alpha Singles') then
    i = 0
160 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') du1,b,du2,c,du3,d,du4,e,r
    do while (du1.ne.' Bet')
      write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') 'AS',du1,b,du2,c,du3,d,du4,e,r
      goto 160
    end do
  elseif (dummy(2:14)=='Beta Singles') then
170 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') du1,b,du2,c,du3,d,du4,e,r
    if (du1=='END') then
      stop
    else
      write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') 'BS',du1,b,du2,c,du3,d,du4,e,r
      goto 170
    end if
  end if
  goto 100
200 print*,'This is the end'
end program test_read
Your program never reaches the branch that checks for Beta because, when your while loop exits, it has already read the line containing 'Beta'. It then goes back to 100, which reads the next line after 'Beta', so the 'Beta Singles' header is never seen. Try the following:
character(len=2) :: tag
read(10,'(a100)') dummy
do while (dummy(1:3).ne.'END')
  if (dummy(2:14)=='Alpha Singles') then
    tag = 'AS'
  else if (dummy(2:14)=='Beta Singles') then
    tag = 'BS'
  else
    read(dummy,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') du1,b,du2,c,du3,d,du4,e,r
    write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)') tag,du1,b,du2,c,du3,d,du4,e,r
  end if
  read(10,'(a100)') dummy
end do
print*,'This is the end'
print*,'This is the end'

faster way to create variable that aggregates a column by id [duplicate]

This question already has answers here:
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 5 years ago.
Is there a faster way to do this? I guess this is unnecessarily slow and that a task like this can be accomplished with base functions.
df <- ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc)))
I'm quite new to R. I have looked at by(), aggregate() and tapply(), but didn't get them to work at all or in the way I wanted. Rather than returning a shorter vector, I want to attach the sum to the original dataframe. What is the best way to do this?
Edit: Here is a speed comparison of the answers applied to my data.
> # My original solution
> system.time( ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc))) )
user system elapsed
14.405 0.000 14.479
> # Paul Hiemstra
> system.time( ddply(df, "id", transform, perc.total = sum(cand.perc)) )
user system elapsed
15.973 0.000 15.992
> # Richie Cotton
> system.time( with(df, tapply(df$cand.perc, df$id, sum))[df$id] )
user system elapsed
0.048 0.000 0.048
> # John
> system.time( with(df, ave(cand.perc, id, FUN = sum)) )
user system elapsed
0.032 0.000 0.030
> # Christoph_J
> system.time( df[ , list(perc.total = sum(cand.perc)), by="id"][df])
user system elapsed
0.028 0.000 0.028
Since you are quite new to R and speed is apparently an issue for you, I recommend the data.table package, which is really fast. One way to solve your problem in one line is as follows:
library(data.table)
DT <- data.table(ID = rep(c(1:3), each=3),
cand.perc = 1:9,
key="ID")
DT <- DT[ , perc.total := sum(cand.perc), by = ID]
DT
ID Perc.total cand.perc
[1,] 1 6 1
[2,] 1 6 2
[3,] 1 6 3
[4,] 2 15 4
[5,] 2 15 5
[6,] 2 15 6
[7,] 3 24 7
[8,] 3 24 8
[9,] 3 24 9
Disclaimer: I'm not a data.table expert (yet ;-), so there might be faster ways to do that. Check out the package site to get started if you are interested in using the package: http://datatable.r-forge.r-project.org/
For any kind of aggregation where you want a resulting vector of the same length as the input vector, with the group result replicated across the grouping vector, ave is what you want.
df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)
Use tapply to get the group stats, then add them back into your dataset afterwards.
Reproducible example:
means_by_wool <- with(warpbreaks, tapply(breaks, wool, mean))
warpbreaks$means.by.wool <- means_by_wool[warpbreaks$wool]
Untested solution for your scenario:
sum_by_id <- with(df, tapply(cand.perc, id, sum))
df$perc.total <- sum_by_id[df$id]
ilprincipe, if none of the above fits your needs, you could try transposing your data:
dft=t(df)
then use aggregate
dfta=aggregate(dft,by=list(rownames(dft)),FUN=sum)
next, restore your row names
rownames(dfta)=dfta[,1]
dfta=dfta[,2:ncol(dfta)]
Transpose back to original orientation
df2=t(dfta)
and bind to original data
newdf=cbind(df,df2)
Why are you using cbind(x, ...)? The output of ddply will be appended automatically. This should work:
ddply(df, "id", transform, perc.total = sum(cand.perc))
Getting rid of the superfluous cbind should speed things up.
You can also load up your favorite foreach backend and try the .parallel=TRUE argument for ddply.
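For completeness, a grouped mutate with the dplyr package (not mentioned in the original answers, so treat the availability of dplyr as an assumption) achieves the same result while keeping the original number of rows:

library(dplyr)

# Add the per-id total as a new column without collapsing the data frame
df <- df %>%
  group_by(id) %>%
  mutate(perc.total = sum(cand.perc)) %>%
  ungroup()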
