Pivot Table in Oracle 11g

Could you please help me figure out how to pivot this table? Here is the original table:
Date         1     2     3     4     5
---------------------------------------
20130101  0.12  0.13  0.43  0.32  0.22
20130102  0.22  0.31  0.13  0.31  0.29
20130103  0.32  0.12  0.33  0.12  0.34
I want this table to be like this :
Date      Number  Values
---------------------------
20130101  1       0.12
20130101  2       0.13
20130101  3       0.43
20130101  4       0.32
20130101  5       0.22
20130102  1       0.22
20130102  2       0.31
20130102  3       0.13
20130102  4       0.31
20130102  5       0.29
20130103  1       0.32
20130103  2       0.12
20130103  3       0.33
20130103  4       0.12
20130103  5       0.34
I've tried to find a query for this, for example using DECODE, but it didn't work for me.
Here is a page I've consulted:
Advice Using Pivot Table in Oracle.
Could you please help me figure this out?
Thank you so much for your help.

You don't need a PIVOT but an UNPIVOT:
SELECT *
FROM table1
UNPIVOT
(
  "Values" FOR "Number" IN ("1", "2", "3", "4", "5")
);
Here is a sqlfiddle demo.
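For readers who want to prototype the same reshape outside the database, here is a sketch of the equivalent wide-to-long transformation using pandas' melt (an illustrative analogue, not part of the Oracle answer):

import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "Date": ["20130101", "20130102", "20130103"],
    "1": [0.12, 0.22, 0.32],
    "2": [0.13, 0.31, 0.12],
    "3": [0.43, 0.13, 0.33],
    "4": [0.32, 0.31, 0.12],
    "5": [0.22, 0.29, 0.34],
})

# melt() is pandas' unpivot: one row per (Date, Number) pair.
long_df = df.melt(id_vars="Date", var_name="Number", value_name="Values")
print(long_df.sort_values(["Date", "Number"]))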

Related

F1 score - sklearn

What is the F1-score of the model in the following? I used the scikit-learn package.
print(classification_report(y_true, y_pred, target_names=target_names))
              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
This article explains it pretty well. Basically, it's:
F1 = 2 * precision * recall / (precision + recall)
The per-class F1-scores are already in the f1-score column of the report above (0.67, 0.00, and 0.80), with the macro and weighted averages below them.
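As a minimal sketch, the report above can be reproduced with scikit-learn; the y_true and y_pred vectors below are assumptions chosen to match the printed numbers, since the question does not show them:

from sklearn.metrics import classification_report, f1_score

# Hypothetical labels that reproduce the report in the question.
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

print(classification_report(y_true, y_pred, target_names=target_names))

# F1 = 2 * precision * recall / (precision + recall), per class and averaged:
print(f1_score(y_true, y_pred, average=None))        # [0.67, 0.00, 0.80]
print(f1_score(y_true, y_pred, average='macro'))     # ~0.49
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.61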

Stata: Transposing panel rows to columns

I am trying to rearrange the following panel data set into a form where I can merge with another. I would like to transform this:
Gender Year IndA IndB IndC
1 2008 0.22 0.34 0.45
2 2008 0.78 0.66 0.55
1 2009 0.25 0.36 0.49
2 2009 0.75 0.64 0.51
1 2010 0.28 0.38 0.48
2 2010 0.72 0.62 0.52
Into:
(ID) Year Industry 1 2
1 2008 A 0.22 0.78
2 2009 A 0.25 0.75
3 2010 A 0.28 0.72
4 2008 B 0.34 0.66
5 2009 B 0.36 0.64
6 2010 B 0.38 0.62
7 2008 C 0.45 0.55
8 2009 C 0.49 0.51
9 2010 C 0.48 0.52
I am new to Stata and am having difficulties reshaping both the columns and the genders.
See help reshape. One way to do this is consecutive reshapes. You can execute the first line, look at the data in the data browser, then execute the second line to see how this works. You will also need to choose a name other than 1 and 2 for the final variables.
reshape long Ind, i(Year Gender) j(Industry) string  // industries: wide -> long
reshape wide Ind, i(Year Industry) j(Gender)         // genders: long -> wide
You can also replace the first reshape with a stack (less legible, but can sometimes be faster than a reshape):
stack Gender Year IndA Gender Year IndB Gender Year IndC, into(Gender Year Y) clear
rename _stack Industry
lab define Industry 1 "A" 2 "B" 3 "C"
lab val Industry Industry
reshape wide Y, i(Industry Year) j(Gender)
sort Industry Year
gen id = _n
order id Year Industry
list, sepby(Industry) noobs
As a third variation on the same theme, note that proportions for the two Genders sum to 1, so we only need one.
clear
input Gender Year IndA IndB IndC
1 2008 0.22 0.34 0.45
2 2008 0.78 0.66 0.55
1 2009 0.25 0.36 0.49
2 2009 0.75 0.64 0.51
1 2010 0.28 0.38 0.48
2 2010 0.72 0.62 0.52
end
drop if Gender == 1
drop Gender
reshape long Ind , i(Year) j(Type) string
list , sepby(Year)
+-------------------+
| Year Type Ind |
|-------------------|
1. | 2008 A .78 |
2. | 2008 B .66 |
3. | 2008 C .55 |
|-------------------|
4. | 2009 A .75 |
5. | 2009 B .64 |
6. | 2009 C .51 |
|-------------------|
7. | 2010 A .72 |
8. | 2010 B .62 |
9. | 2010 C .52 |
+-------------------+
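For comparison, here is a sketch of the same two-step reshape in pandas (an analogue, not part of the Stata answer; the frame below mirrors the input data):

import pandas as pd

df = pd.DataFrame({
    "Gender": [1, 2, 1, 2, 1, 2],
    "Year":   [2008, 2008, 2009, 2009, 2010, 2010],
    "IndA":   [0.22, 0.78, 0.25, 0.75, 0.28, 0.72],
    "IndB":   [0.34, 0.66, 0.36, 0.64, 0.38, 0.62],
    "IndC":   [0.45, 0.55, 0.49, 0.51, 0.48, 0.52],
})

# Step 1: industries wide -> long (like the first reshape long).
long_df = df.melt(id_vars=["Gender", "Year"],
                  var_name="Industry", value_name="Share")
long_df["Industry"] = long_df["Industry"].str.replace("Ind", "", regex=False)

# Step 2: genders long -> wide (like the reshape wide).
wide = (long_df.pivot(index=["Year", "Industry"],
                      columns="Gender", values="Share")
               .reset_index()
               .sort_values(["Industry", "Year"]))
print(wide)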

How to transform a correlation matrix into a single row?

I have a 200x200 correlation matrix text file that I would like to turn into a single row.
e.g.
a b c d e
a 1.00 0.33 0.34 0.26 0.20
b 0.33 1.00 0.40 0.48 0.41
c 0.34 0.40 1.00 0.59 0.35
d 0.26 0.48 0.59 1.00 0.43
e 0.20 0.41 0.35 0.43 1.00
I want to turn it into:
a_b a_c a_d a_e b_c b_d b_e c_d c_e d_e
0.33 0.34 0.26 0.20 0.40 0.48 0.41 0.59 0.35 0.43
I need code that can:
1. Join the variable names to make a single row of headers (e.g. turn "a" and "b" into "a_b") and
2. Turn only one half of the correlation matrix (bottom or top triangle) into a single row
A bit of extra information: I have around 500 participants in a study and each of them has a correlation matrix file. I want to consolidate these separate data files into one file where each row is one participant's correlation matrix.
Does anyone know how to do this?
Thanks!!
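This question has no answer in the thread, but here is a hedged sketch in Python of one way to do it, assuming each participant's matrix is a whitespace-delimited text file with row and column labels as in the example (the file names are hypothetical):

import numpy as np
import pandas as pd

def flatten_upper_triangle(path):
    """Read one labelled correlation matrix and return a single-row Series
    keyed by joined variable names, e.g. 'a_b'."""
    m = pd.read_csv(path, sep=r"\s+", index_col=0)
    r, c = np.triu_indices(len(m), k=1)  # upper triangle, diagonal excluded
    names = [f"{m.index[i]}_{m.columns[j]}" for i, j in zip(r, c)]
    return pd.Series(m.values[r, c], index=names)

# Hypothetical file names: one matrix file per participant.
files = ["participant_001.txt", "participant_002.txt"]
combined = pd.DataFrame([flatten_upper_triangle(f) for f in files], index=files)
combined.to_csv("all_participants.csv")  # one row per participant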

Storing multiway data from a for loop

I have the following three-way data (I x J x K) for my polymerization system, Z (23x4x3):
Z(:,:,1) = [0 6.70 NaN NaN
0.14 5.79 27212.52 17735.36
0.26 5.04 26545.98 17279.95
0.35 4.43 26007.91 16902.22
0.43 3.92 25567.61 16586.18
0.49 3.50 25202.48 16319.65
0.54 3.15 24898.99 16094.87
0.59 2.85 24648.07 15906.19
0.63 2.60 24441.06 15748.28
0.66 2.38 24270.42 15616.51
0.68 2.20 24130.05 15506.90
0.71 2.05 24014.78 15415.87
0.73 1.92 23921.74 15341.59
0.74 1.80 23847.57 15281.63
0.76 1.70 23789.06 15233.54
0.77 1.61 23744.29 15195.99
0.78 1.54 23710.83 15167.01
0.79 1.47 23687.05 15145.38
0.80 1.41 23671.47 15129.72
0.81 1.36 23662.99 15119.14
0.81 1.31 23660.58 15112.77
0.82 1.27 23663.32 15109.86
0.82 1.23 23670.44 15109.74];
Z(:,:,2) = [0 6.70 NaN NaN
0.17 5.63 24826.03 16191.26
0.30 4.80 24198.87 15757.83
0.40 4.14 23720.27 15417.52
0.47 3.61 23347.38 15147.16
0.54 3.19 23058.01 14933.52
0.59 2.85 22836.18 14766.65
0.63 2.57 22667.24 14637.38
0.66 2.34 22539.27 14537.68
0.69 2.15 22445.60 14463.08
0.71 2.00 22379.90 14409.04
0.73 1.87 22336.70 14371.44
0.75 1.76 22311.74 14347.04
0.76 1.66 22301.57 14333.13
0.77 1.58 22303.32 14327.31
0.78 1.51 22314.83 14327.75
0.79 1.45 22334.27 14333.00
0.80 1.40 22360.11 14341.81
0.81 1.36 22391.09 14353.22
0.81 1.32 22426.11 14366.39
0.82 1.28 22464.22 14380.67
0.82 1.25 22504.61 14395.53
0.82 1.23 22546.61 14410.57];
Z(:,:,3) = [0 6.70 NaN NaN
0.19 5.45 22687.71 14805.97
0.34 4.53 22119.24 14408.55
0.44 3.84 21720.37 14120.95
0.52 3.31 21437.68 13912.54
0.58 2.90 21244.60 13766.39
0.63 2.59 21117.60 13667.05
0.66 2.34 21040.03 13602.91
0.69 2.14 21000.70 13565.85
0.72 1.98 20990.89 13549.24
0.73 1.85 21003.53 13547.54
0.75 1.74 21033.19 13556.41
0.76 1.65 21075.85 13572.54
0.77 1.58 21128.37 13593.46
0.78 1.52 21188.17 13617.25
0.79 1.47 21253.16 13642.44
0.80 1.42 21321.69 13668.02
0.80 1.39 21392.34 13693.18
0.81 1.36 21463.83 13717.38
0.81 1.33 21535.27 13740.33
0.81 1.31 21605.87 13761.81
0.82 1.29 21674.84 13781.70
0.82 1.27 21741.68 13799.97];
where I is time (y-axis), J is variables (x-axis) and K is batch (z-axis). However, since I want to use this data for PCA and PLS analysis, I must change the (time x variables x batch) arrangement to (batch (I) x variables (J) x time (K)), meaning the new Z is Z(3 x 4 x 23).
To perform this I can extract the first row from each slab (along the K dimension) and rearrange them as a new matrix slab using the following command:
T1 = squeeze(Z(1,:,:))'
Thus, I use a for loop to get the results for all 23 slabs. But I can't (don't know how to) store the results in the workspace, except for the last one. The command I used:
[I,J,K] = size(Z);
SLAB = zeros(K,J,I); % preallocating the matrix; here I=23, J=4, K=3
for t = 1:I % here I = 23
    slab = squeeze(Z(t,:,:))'; % removing the semicolon here shows the wanted result in the command window
    SLAB = slab; % this overwrites SLAB on every pass, so only the last slab survives
end
Hope anyone here can help me with this.
Thank you
I found the solution:
since I know the result will have size (K,J,I), I must index the output with the loop counter inside the for loop:
[I,J,K] = size(Z);
SLAB = zeros(K,J,I); % preallocating the matrix; here I=23, J=4, K=3
for t = 1:I % here I = 23
    SLAB(:,:,t) = squeeze(Z(t,:,:))'; % store each K x J slab at index t
end
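As an aside, MATLAB's built-in permute should achieve the same rearrangement in one step, without a loop: SLAB = permute(Z, [3 2 1]). For reference, the equivalent axis permutation in NumPy looks like this (a sketch; the random array is a stand-in for the Z data above):

import numpy as np

Z = np.random.rand(23, 4, 3)       # stand-in for the (time x variables x batch) data
SLAB = np.transpose(Z, (2, 1, 0))  # -> (batch x variables x time)
print(SLAB.shape)                  # (3, 4, 23)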

Why is running "unique" faster on a data frame than a matrix in R?

I've begun to believe that data frames hold no advantages over matrices, except for notational convenience. However, I noticed this oddity when running unique on matrices and data frames: it seems to run faster on a data frame.
a = matrix(sample(2,10^6,replace = TRUE), ncol = 10)
b = as.data.frame(a)
system.time({
u1 = unique(a)
})
user system elapsed
1.840 0.000 1.846
system.time({
u2 = unique(b)
})
user system elapsed
0.380 0.000 0.379
The timing results diverge even more substantially as the number of rows is increased. So, there are two parts to this question.
1. Why is this slower for a matrix? It seems faster to convert to a data frame, run unique, and then convert back.
2. Is there any reason not to just wrap unique in myUnique, which does the conversions in part 1?
Note 1. Given that a matrix is atomic, it seems that unique should be faster for a matrix, rather than slower. Being able to iterate over fixed-size, contiguous blocks of memory should generally be faster than running over separate blocks of linked lists (I assume that's how data frames are implemented...).
Note 2. As demonstrated by the performance of data.table, running unique on a data frame or a matrix is a comparatively bad idea; see the answer by Matthew Dowle and the comments for relative timings. I've migrated a lot of objects to data tables, and this performance is another reason to do so. So although users would be well served to adopt data tables, for pedagogical / community reasons I'll leave the question open for now regarding why this takes longer on matrix objects. The answers below address where the time goes and how else we can get better performance (i.e. data tables). The answer to why is close at hand: the code can be found via unique.data.frame and unique.matrix. :) An English explanation of what it's doing and why is all that is lacking.
In this implementation, unique.matrix is the same as unique.array:
> identical(unique.array, unique.matrix)
[1] TRUE
unique.array has to handle multi-dimensional arrays, which requires additional processing to 'collapse' the extra dimensions (those extra calls to paste()) that are not needed in the 2-dimensional case. The key section of code is:
collapse <- (ndim > 1L) && (prod(dx[-MARGIN]) > 1L)
temp <- if (collapse)
apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
unique.data.frame is optimised for the 2D case; unique.matrix is not. It could be, as you suggest; it just isn't in the current implementation.
Note that in all cases (unique.{array,matrix,data.frame}) where there is more than one dimension, it is the string representation that is compared for uniqueness. For floating-point numbers this means 15 significant digits, so
NROW(unique(a <- matrix(rep(c(1, 1+4e-15), 2), nrow = 2)))
is 1 while
NROW(unique(a <- matrix(rep(c(1, 1+5e-15), 2), nrow = 2)))
and
NROW(unique(a <- matrix(rep(c(1, 1+4e-15), 1), nrow = 2)))
are both 2. Are you sure unique is what you want?
I'm not sure, but I guess that because a matrix is one contiguous vector, R copies it into column vectors first (like a data.frame), because paste needs a list of vectors. Note that both are slow because both use paste.
Perhaps because unique.data.table is already many times faster. Please upgrade to v1.6.7 by downloading it from the R-Forge repository, because that has the fix to unique that you raised in this question. data.table doesn't use paste to do unique.
a = matrix(sample(2,10^6,replace = TRUE), ncol = 10)
b = as.data.frame(a)
system.time(u1<-unique(a))
user system elapsed
2.98 0.00 2.99
system.time(u2<-unique(b))
user system elapsed
0.99 0.00 0.99
library(data.table) # as.data.table needs the data.table package loaded
c = as.data.table(b)
system.time(u3<-unique(c))
user system elapsed
0.03 0.02 0.05 # 60 times faster than u1, 20 times faster than u2
identical(as.data.table(u2),u3)
[1] TRUE
In attempting to answer my own question, especially part 1, we can see where the time is spent by looking at the results of Rprof. I ran this again, with 5M elements.
Here are the results for the first unique operation (for the matrix):
> summaryRprof("u1.txt")
$by.self
                     self.time self.pct total.time total.pct
"paste"                   5.70    52.58       5.96     54.98
"apply"                   2.70    24.91      10.68     98.52
"FUN"                     0.86     7.93       6.82     62.92
"lapply"                  0.82     7.56       1.00      9.23
"list"                    0.30     2.77       0.30      2.77
"!"                       0.14     1.29       0.14      1.29
"c"                       0.10     0.92       0.10      0.92
"unlist"                  0.08     0.74       1.08      9.96
"aperm.default"           0.06     0.55       0.06      0.55
"is.null"                 0.06     0.55       0.06      0.55
"duplicated.default"      0.02     0.18       0.02      0.18

$by.total
                     total.time total.pct self.time self.pct
"unique"                  10.84    100.00      0.00     0.00
"unique.matrix"           10.84    100.00      0.00     0.00
"apply"                   10.68     98.52      2.70    24.91
"FUN"                      6.82     62.92      0.86     7.93
"paste"                    5.96     54.98      5.70    52.58
"unlist"                   1.08      9.96      0.08     0.74
"lapply"                   1.00      9.23      0.82     7.56
"list"                     0.30      2.77      0.30     2.77
"!"                        0.14      1.29      0.14     1.29
"do.call"                  0.14      1.29      0.00     0.00
"c"                        0.10      0.92      0.10     0.92
"aperm.default"            0.06      0.55      0.06     0.55
"is.null"                  0.06      0.55      0.06     0.55
"aperm"                    0.06      0.55      0.00     0.00
"duplicated.default"       0.02      0.18      0.02     0.18

$sample.interval
[1] 0.02

$sampling.time
[1] 10.84
And for the data frame:
> summaryRprof("u2.txt")
$by.self
                     self.time self.pct total.time total.pct
"paste"                   1.72    94.51       1.72     94.51
"[.data.frame"            0.06     3.30       1.82    100.00
"duplicated.default"      0.04     2.20       0.04      2.20

$by.total
                        total.time total.pct self.time self.pct
"[.data.frame"                1.82    100.00      0.06     3.30
"["                           1.82    100.00      0.00     0.00
"unique"                      1.82    100.00      0.00     0.00
"unique.data.frame"           1.82    100.00      0.00     0.00
"duplicated"                  1.76     96.70      0.00     0.00
"duplicated.data.frame"       1.76     96.70      0.00     0.00
"paste"                       1.72     94.51      1.72    94.51
"do.call"                     1.72     94.51      0.00     0.00
"duplicated.default"          0.04      2.20      0.04     2.20

$sample.interval
[1] 0.02

$sampling.time
[1] 1.82
What we notice is that the matrix version spends a lot of time on apply, paste, and lapply. In contrast, the data frame version simply runs duplicated.data.frame, and most of the time is spent in paste, presumably aggregating results.
Although this explains where the time is going, it doesn't explain why these have different implementations, nor the effects of simply changing from one object type to another.
