Construct a correlation matrix from table with non-aligned dates - correlation

I had a look at this post to compute the correlation matrix given an input table.
My issue is that my columns are not consistently aligned.
For instance:
([]date:.z.d+til 100;a:100?10f;b:(10#0n),90?1f;c:(90?1f),(10#0n))
date       a         b         c
---------------------------------------------
2019.11.18 6.018138            0.1357346
2019.11.19 2.365495            0.9805366
2019.11.20 0.5136894           0.2821858
2019.11.21 9.013581            0.4946025
2019.11.22 1.0842              0.967023
2019.11.23 4.543989            0.6901084
2019.11.24 4.597627            0.6303566
2019.11.25 2.18889             0.01415349
2019.11.26 3.050233            0.2783062
2019.11.27 5.259109            0.6675121
2019.11.28 5.175593  0.1684333 0.3706485
2019.11.29 5.14162   0.5885103 0.4183277
I don't want to remove all the rows containing null values before computing the correlation matrix, as I have many columns and the intersection of all the dates could be the empty set.
Instead, I would like to perform n*(n-1)/2 operations to populate a correlation matrix that I construct myself: for each pair of columns, for example a and b, take their joint series, compute the correlation, and store the result in my correlation matrix C at C[1,2] and C[2,1].
I insist on n*(n-1)/2 operations because the answers in the post I mentioned above seem to perform n*n operations (my n is roughly 700).

This might get you something close to what you're looking for:
q)m:(1_cols t)!();
q){x{m::m,'key[x]!1f,value(1_x)cor\:y;1_x}/x}1_flip t
q)m^flip m
 | a          b           c
-| ----------------------------------
a| 1          0.01418217  0.04938382
b| 0.01418217 1           -0.06297328
c| 0.04938382 -0.06297328 1
This uses only 3 cor operations, one per pair of columns.
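For readers who do not use q, the same idea can be sketched in Python: visit each unordered pair of columns once, correlate only the rows where both values are present, and mirror the result across the diagonal. The function names, the use of None for missing values, and the tiny sample columns are assumptions made for illustration; they are not taken from the question's data.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough overlap to correlate
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(columns):
    """Fill a symmetric matrix using n*(n-1)/2 correlation calls."""
    names = list(columns)
    corr = {a: {b: (1.0 if a == b else float("nan")) for b in names}
            for a in names}
    calls = 0
    for a, b in combinations(names, 2):  # each unordered pair once
        r = pearson(columns[a], columns[b])
        calls += 1
        corr[a][b] = corr[b][a] = r      # mirror across the diagonal
    return corr, calls

# Hypothetical columns with non-aligned gaps (None marks a missing value).
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [None, None, 2.0, 4.0, 6.0, 8.0],
    "c": [6.0, 5.0, 4.0, 3.0, None, None],
}
corr, calls = pairwise_corr(columns)
```

With 3 columns this makes 3 correlation calls; with n around 700 it would make 244,650 instead of 490,000.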

Related

How to create a Matrix with p values from anova

I performed an ANOVA and corrected it with Tukey's test, so I got several p-values.
Now I would like to build a heatmap from these values, and for that I need to create a matrix of the p-values.
The first question would be: how do I fill a matrix with the ANOVA p-values?
Then I ran an ANCOVA and obtained other p-values.
Now I would like to make a heatmap to compare the p-values between the ANOVA and the ANCOVA.
Can someone help me?
Here is an example:
library(multcomp)  # provides glht() and mcp()
anova_model <- aov( X ~ groups , data = T1)
postHocs <- glht(anova_model, linfct = mcp(groups = "Tukey"))
summary(postHocs)
This ANOVA gave me several p-values.
ancova_model <- aov( X ~ groups + age , data = T1)
postHocs <- glht(ancova_model, linfct = mcp(groups = "Tukey"))
summary(postHocs)
This ANCOVA gave me several other p-values.
I would now like to create a heatmap to compare these p-values, for example to see where age has a large effect and where it does not. I believe the ideal first step is to create a matrix, but I'm actually kind of lost.
Could someone help me?
Thank you very much
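As a rough sketch of the matrix-filling step only, written in Python rather than R: assuming the pairwise comparisons are labelled in the form "B - A" and that the p-values have already been extracted into a dictionary, a symmetric matrix can be filled as below. The group names and all p-values are made up for illustration, and pvalue_matrix is a hypothetical helper, not part of any package.

```python
def pvalue_matrix(groups, pvalues):
    """Build a symmetric matrix from {'B - A': p} style comparison labels."""
    index = {g: i for i, g in enumerate(groups)}
    n = len(groups)
    # The diagonal stays None: a group is never compared with itself.
    matrix = [[None] * n for _ in range(n)]
    for label, p in pvalues.items():
        left, right = (part.strip() for part in label.split(" - "))
        i, j = index[left], index[right]
        matrix[i][j] = matrix[j][i] = p  # same p-value in both triangles
    return matrix

groups = ["A", "B", "C"]
# Made-up p-values, for illustration only.
anova_p = {"B - A": 0.03, "C - A": 0.20, "C - B": 0.001}
ancova_p = {"B - A": 0.04, "C - A": 0.35, "C - B": 0.002}

anova_matrix = pvalue_matrix(groups, anova_p)
ancova_matrix = pvalue_matrix(groups, ancova_p)
```

The two matrices share the same row and column order, so they can be drawn as side-by-side heatmaps or compared cell by cell. Group names that themselves contain " - " would need a different label parser.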

QuickSight add subset of fields

Total AWS QuickSight newbie here. I'm trying to import some cost data in CSV form into QuickSight and add some calculated fields.
The data I have is of the form:
Type  Units Consumed
A     2
B     3
A     1
B     5
... and so on
Unit Cost ($) is not part of the dataset and is something like
Unit Cost      Amount ($)
Unit Cost (A)  1
Unit Cost (B)  2
I would like to compute (either as part of the dataset or as part of an analysis visual, maybe) the total costs for A and B as separate line items. Something like
Total Cost (A) = Sum(Amount where Type = A) * Unit Cost (A)
Total Cost (B) = Sum(Amount where Type = B) * Unit Cost (B)
Here are the things I've tried which don't work:
sumOver({Units Consumed}, Type='A')
sumIf({Units Consumed}, Type='A')
To break it down and test smaller parts, I added a calculated field which simply does
sum({Units Consumed})
But it just adds a column to the dataset with every field as "Undefined".
How can I achieve what I'm trying to do?
I tried to replicate the code
sumIf({Units Consumed}, Type='A')
and it worked. Could you check whether Units Consumed is an integer column type?
How to change column type
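To make the intended arithmetic concrete outside QuickSight, here is a small Python sketch using the four rows and the two unit costs shown in the question. It only illustrates the calculation (sum the units per type, then multiply by that type's unit cost); it is not QuickSight syntax, and the variable names are assumptions.

```python
from collections import defaultdict

# The four sample rows from the question: (Type, Units Consumed).
rows = [("A", 2), ("B", 3), ("A", 1), ("B", 5)]
# Unit costs from the question, kept outside the dataset.
unit_cost = {"A": 1, "B": 2}

# Conditional sum: total units consumed per type.
units = defaultdict(int)
for type_, consumed in rows:
    units[type_] += consumed

# Total cost per type = summed units * unit cost for that type.
total_cost = {t: units[t] * unit_cost[t] for t in units}
```

For these rows, type A sums to 3 units and type B to 8 units, giving total costs of 3 and 16.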

Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' in (Col A) and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared Google Sheet with the starting data in columns A and B and the desired output in columns C and D. Each unique Name has associated Codes. Column D displays the most commonly occurring Code for each unique Name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4 and B5, but D3 shows only 98761 because it appears more frequently than the other 2 codes do in B2:B. I will explain what I mean by the closest value below.
Codes that have a count of 1 are unique, so for them the output in column D tries to find the closest match.
However, when the count of the code in B2:B is greater than 1, the output in column D is the most frequent code associated with the Name.
Approach when there are 2 or more of the same value in column B
Query
I thought I might use a QUERY with an ORDER BY count(B) DESC LIMIT 2, in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a VLOOKUP or INDEX with multiple criteria returns only the first match it finds, and the data would already be sorted by frequency, that first match would be the most frequent value.
Approach when there are fewer than 2 of the same value in column B
This is a little more complicated since the values can be numbers and letters.
A solution like that seen in the image below could be used if everything were a number. In our case there will usually be a 3 to 5 character alphanumeric code starting with 0 or 1 letters followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matched on the closest letter and then on the closest number. But I really am not sure what the best solution might be that works within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used, it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
In summary, I am looking for a way to find the most frequent code in column B associated with each unique name in column A in this sheet and, where none of the codes in B2:B repeat, a formula that will find the closest match for a number or alphanumeric code.
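The closest-match rules described above can be sketched in Python to pin down the tie-breaking. This assumes every code is an optional single letter followed by digits, that a missing letter ranks below A, and that the caller has already removed the target itself from the candidates; split_code and closest_code are hypothetical names, and this is not a Google Sheets formula.

```python
import re

def split_code(code):
    """Split 'A1234' into (letter rank, number); no letter gives rank 0."""
    match = re.fullmatch(r"([A-Za-z]?)(\d+)", code)
    if match is None:
        raise ValueError(f"unsupported code format: {code!r}")
    letter, digits = match.groups()
    rank = ord(letter.upper()) - ord("A") + 1 if letter else 0
    return rank, int(digits)

def closest_code(target, candidates):
    """Closest letter first, then closest number; ties go to the lower value."""
    t_rank, t_num = split_code(target)

    def key(code):
        rank, num = split_code(code)
        # Distance first, then the value itself so the lower one wins a tie.
        return (abs(rank - t_rank), rank, abs(num - t_num), num)

    return min(candidates, key=key)

numeric_tie = closest_code("8", ["6", "10"])      # equidistant numbers
letter_tie = closest_code("B1", ["A1", "C1"])     # equidistant letters
mixed = closest_code("A1234", ["A1200", "B1234", "A1300"])
```

Here the equidistant cases resolve to "6" and "A1", and the mixed case picks "A1200" because a matching letter outranks a matching number.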
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator  Count
Den 1        Count 1
Den 2        Count 2
use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
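The grouping logic of the formula above (count each Name and Code pair, order by count descending and then by code ascending, keep the first row per Name) can be mirrored in a short Python sketch. The sample rows are hypothetical apart from the name Buick La Sabre 1 and the code 98761, which are mentioned in the question; most_common_code is an illustrative name.

```python
from collections import Counter, defaultdict

def most_common_code(pairs):
    """Return the most frequent code per name; ties go to the smaller code."""
    counts = defaultdict(Counter)
    for name, code in pairs:
        counts[name][code] += 1
    return {
        name: min(counter, key=lambda code: (-counter[code], code))
        for name, counter in counts.items()
    }

# Hypothetical rows (only the first name and 98761 come from the question).
pairs = [
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "12345"),
    ("Buick La Sabre 1", "98761"),
    ("Example Car 2", "B200"),
    ("Example Car 2", "A100"),
]
result = most_common_code(pairs)
```

Codes are compared as text here, so a tie between "A100" and "B200" resolves to "A100".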

Eigenvalues for matrices in a for loop

I need to calculate eigenvalues of a series of matrices and then save them in a separate file. My data has 5 columns and 10,000 rows. I use the following functions:
R<-NULL
A <- setwd("c:/location of the file on this computer")
for(i in 0:1){
X<-read.table(file="Example.prn", skip=i*5, nrows=5)
M <- as.matrix(X)
E=eigen(M, only.values = TRUE)
R<-rbind(R,E)
print(E)
}
As an example I have used a data set with 10 rows and 5 columns. This gives me the following results:
$`values`
[1] 1.350000e+02+0.000e+00i -4.000000e+00+0.000e+00i 4.365884e-15+2.395e-15i 4.365884e-15-2.395e-15i
[5] 8.643810e-16+0.000e+00i
$vectors
NULL
$`values`
[1] 2.362320e+02+0.000000e+00i -4.960046e+01+1.258757e+01i -4.960046e+01-1.258757e+01i 9.689475e-01+0.000000e+00i
[5] 1.104994e-14+0.000000e+00i
$vectors
NULL
I have three questions and I would really appreciate any help:
1. I want to save the results in consecutive rows, such as:
Eigenvalue(1) Eigenvalue(3) Eigenvalue(5) Eigenvalue(7) Eigenvalue(9)
Eigenvalue(2) Eigenvalue(4) Eigenvalue(6) Eigenvalue(8) Eigenvalue(10)
Any thoughts?
2. Also, I don't understand the eigenvalues in the output. They are not plain numbers. For example, one of them is 2.362320e+02+0.000000e+00i. My first thought was that this is the sum of five determinants for a 5x5 matrix. However, "2.362320e+02+0.000000e+00i" seems to only have four numbers in it. Any thoughts? Doesn't the eigen() function calculate the final eigenvalues?
3. How can I save my outcome to an Excel file? I have used the following code
However, the result I get from the current code is:
> class(R)
[1] "matrix"
> print(R)
values vectors
E Complex,5 NULL
E Complex,5 NULL
I think you can easily get the values with the following code:
R<-NULL
A <- setwd("c:/location of the file on this computer")
for(i in 0:1){
X<-read.table(file="Example.prn", skip=i*5, nrows=5)
M <- as.matrix(X)
E=eigen(M, only.values = TRUE)
# keep only the eigenvalues, one row per matrix
R<-rbind(R,E$values)
}
and then use the answer to this question to save R to a file.
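On the second question: a value such as 2.362320e+02+0.000000e+00i is a single complex number, written as a real part plus an imaginary part, so here it is 236.232 + 0i. The Python sketch below parses the eigenvalues printed in the question, shows the two parts, and writes one row per matrix as CSV text, which Excel can open. The helper name and the choice of CSV are assumptions; it does not recompute the eigenvalues.

```python
import csv
import io

def parse_r_complex(text):
    """Convert R's printed form, e.g. '1.35e+02+0.0e+00i', to a Python complex."""
    return complex(text.replace("i", "j"))

# Eigenvalues exactly as printed in the question, one list per 5x5 matrix.
blocks = [
    ["1.350000e+02+0.000e+00i", "-4.000000e+00+0.000e+00i",
     "4.365884e-15+2.395e-15i", "4.365884e-15-2.395e-15i",
     "8.643810e-16+0.000e+00i"],
    ["2.362320e+02+0.000000e+00i", "-4.960046e+01+1.258757e+01i",
     "-4.960046e+01-1.258757e+01i", "9.689475e-01+0.000000e+00i",
     "1.104994e-14+0.000000e+00i"],
]
rows = [[parse_r_complex(value) for value in block] for block in blocks]

# One eigenvalue = one complex number with a real and an imaginary part.
first = rows[1][0]
real_part, imag_part = first.real, first.imag

# One CSV row per matrix, five eigenvalues per row.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    writer.writerow([f"{z.real:.6g}{z.imag:+.6g}i" for z in row])
csv_text = buffer.getvalue()
```

Eigenvalues with a nonzero imaginary part appear in conjugate pairs, as the second and third values of the second matrix show. Values around 1e-15 are effectively zero up to floating-point rounding.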

Stack multiple columns into one

I want to do a simple task but somehow I'm unable to do it. Assume that I have one column like:
a
z
e
r
t
How can I create a new column with the same value twice with the following result:
a
a
z
z
e
e
r
r
t
t
I've already tried duplicating my column and doing something like:
=TRANSPOSE(SPLIT(JOIN(";",A:A,B:B),";"))
but it creates:
a
z
e
r
t
a
z
e
r
t
I have been working from this answer so far.
Try this:
=SORT({A1:A5;A1:A5})
Here we use:
sort
{} to combine data
Taking your comment into account, you may use this formula:
=QUERY(SORT(ArrayFormula({row(A1:A5),A1:A5;row(A1:A5),A1:A5})),"select Col2")
The idea is to add a helper column holding the row number, sort by that column, and then use QUERY to keep only the values.
The join and split method will do the same:
=TRANSPOSE(SPLIT(JOIN(",",ARRAYFORMULA(CONCAT(A1:A5&",",A1:A5))),","))
Here the range is used only twice, so this is easier to use. Also see Concat + ArrayFormula sample.
A few hundred rows is nothing :)
I created index from 1 to n, then pasted it twice and sorted by index. But it's obviously fancier to do it with a formula :)
Assuming your list is in column A and (for now) the number of repeats is in C1 (it can be replaced by a number in the formula), then something simple like this will do (starting in B1):
=INDEX(A:A,INT((ROW()-1)/$C$1)+1)
Simply copy it down as far as you need (it will give just 0 after the last item). No sorting. No array. No Sheets/Excel problems. No heavy calculations.
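The index arithmetic behind that last formula is easy to check in a few lines of Python: output position i (counting from 0) reads source item i // times, which is what INT((ROW()-1)/$C$1)+1 computes with 1-based rows. The function name repeat_each is an assumption for illustration.

```python
def repeat_each(values, times):
    """Repeat every item `times` times in place, preserving the original order."""
    # Output position i maps back to source position i // times.
    return [values[i // times] for i in range(len(values) * times)]

stacked = repeat_each(["a", "z", "e", "r", "t"], 2)
```

Because no sorting is involved, the original order a, z, e, r, t is kept.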

Resources