Hello, I want to use a National Science Foundation dataset, but the raw Excel file's layout does not transpose into a format Stata can analyze. Does anyone have sample code showing how to transpose the dataset so that it fits the Stata format for analysis? Here is the raw Excel file so you can understand the problem.
Raw Excel File from NSF website
*EDITED to add view of import
Keeping in mind Nick's apt point about the nature of SO, I do have some example code that may be helpful here. Without knowing what you mean by "transpose" I cannot give an exact answer, but you can adapt the reshape commands below to your purposes.
Import and view
// Import Excel File defining TL and BR of table
import excel "nsb20197-tabs02-012.xlsx", cellrange(A6:K177) clear
list in 1/15
+----------------------------------------------------------------------------------------------------------------------------------------+
| A B C D E F G H I J K |
|----------------------------------------------------------------------------------------------------------------------------------------|
1. | Associate's level |
2. | American Indian or Alaska Native |
3. | All fields 6282 4131 2151 65.8 34.2 8935 5697 3238 63.8 36.2 |
4. | S&E 608 386 222 63.5 36.5 942 495 447 52.5 47.5 |
5. | Engineering 17 4 13 23.5 76.5 50 8 42 16 84 |
|----------------------------------------------------------------------------------------------------------------------------------------|
6. | Natural sciences 384 220 164 57.3 42.7 453 174 279 38.4 61.6 |
7. | Social and behavioral sciences 207 162 45 78.3 21.7 439 313 126 71.3 28.7 |
8. | Non-S&E 5674 3745 1929 66 34 7993 5202 2791 65.09999999999999 34.9 |
9. | Asian or Pacific Islander |
10. | All fields 27313 15522 11791 56.8 43.2 54809 30916 23893 56.4 43.6 |
|----------------------------------------------------------------------------------------------------------------------------------------|
11. | S&E 2649 1284 1365 48.5 51.5 7862 3492 4370 44.4 55.6 |
12. | Engineering 160 23 137 14.4 85.59999999999999 574 111 463 19.3 80.7 |
13. | Natural sciences 2010 939 1071 46.7 53.3 4419 1562 2857 35.3 64.7 |
14. | Social and behavioral sciences 479 322 157 67.2 32.8 2869 1819 1050 63.4 36.6 |
15. | Non-S&E 24664 14238 10426 57.7 42.3 46947 27424 19523 58.4 41.6 |
Structuring the data
// Create supercategories
* level = Column A if column A contains the word level (ignoring case)
gen level = word(A,1) if ustrregexm(A, "level", 1), before(A)
* demographic = Column A if the next observation in column A contains "all fields" (ignoring case)
gen demographic = A if ustrregexm(A[_n+1], "all fields", 1), before(A)
* fill down demographic and level, and drop blank rows
foreach v of varlist level demographic {
replace `v' = `v'[_n-1] if missing(`v')
}
drop if mi(demographic) | demographic == A | regexm(A, level)
// rename variables
* rename A
rename A field
* rename count columns
local list "B C D E F"
local year = 2000
rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
local list "G H I J K"
local year = 2017
rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
* destring
destring *_2000 *_2017, replace
Reshaping to long
* reshape long
drop perc*
reshape long all_ male_ female_, i(level demographic field) j(year)
rename *_ degrees*
reshape long degrees, i(level demographic field year) j(gender) string
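For readers more comfortable outside Stata, the two reshape long steps amount to emitting one row per (year, gender) pair from each wide observation. A minimal plain-Python sketch of that wide-to-long logic (the row values are taken from the "All fields" line in the listing above; the dict layout is purely illustrative, not part of the Stata solution):

```python
# One wide observation after the rename/destring steps above
wide = {"level": "Associate's", "demographic": "American Indian or Alaska Native",
        "field": "All fields",
        "all_2000": 6282, "female_2000": 4131, "male_2000": 2151,
        "all_2017": 8935, "female_2017": 5697, "male_2017": 3238}

# reshape long ..., j(year) then j(gender): one long row per (year, gender)
long_rows = []
for year in (2000, 2017):
    for gender in ("all", "female", "male"):
        long_rows.append({"level": wide["level"],
                          "demographic": wide["demographic"],
                          "field": wide["field"],
                          "year": year, "gender": gender,
                          "degrees": wide[f"{gender}_{year}"]})

for row in long_rows:
    print(row["year"], row["gender"], row["degrees"])
```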
An example of how one might reshape wide by field
* test reshape wide by field + ensure the variable name is less than 32 characters post reshape
replace field = lower(strtoname(field))
replace field = substr(field, 1, 32 - strlen("degrees") - 1)
reshape wide degrees, i(level demographic year gender) j(field) string
What would be the Big O of this function:
def foo(n):
if (n <= 1):
return 0
else:
return n + foo(n/2) + foo(n/2)
I think it might be O(2^log n) because in each call there are two further calls, and n is divided by 2 until it reaches 1, hence the log n.
Yes, it is O(2^log₂n), which really is equivalent to O(n).
You can analyse it as follows:
T(n) = 1 + 2T(n/2)
= 1 + 2(1 + 2T(n/4)) = (2²-1) + 2²T(n/4)
= (2²-1) + 2²(1 + 2T(n/8)) = (2³-1) + 2³T(n/8)
...
= (2^log₂n - 1) + 2^log₂n·T(1)
= (n - 1) + n        (since T(1) = 1: the base case still costs one call)
= 2n - 1
= O(n)
This may be a surprising result, but a snippet measuring the ratio between the number of calls of foo and n illustrates this time complexity:
let counter;
function foo(n) {
counter++;
if (n <= 1)
return 0;
else
return n + foo(n/2) + foo(n/2);
}
for (let n = 1; n < 100000; n*=2) {
counter = 0;
foo(n);
console.log(counter/n);
}
Moreover, you can use the master theorem as well (the first case). The recurrence here is T(n) = 2T(n/2) + 1; writing it in the form T(n) = aT(n/b) + f(n) gives:
a = 2, b = 2, f(n) = 1
=> c_crit = log₂(2) = 1
Since f(n) = 1 = O(n^c) for any c with 0 ≤ c < c_crit = 1 (take c = 0, for instance), the first case of the master theorem is satisfied. Therefore T(n) = Θ(n^c_crit) = Θ(n).
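As a quick sanity check on the closed form 2n - 1 derived above, a small Python counter (assuming n is a power of two, so the halving is exact) confirms the call count:

```python
def foo(n, counter):
    # Same recursion as in the question, plus a call counter.
    counter[0] += 1
    if n <= 1:
        return 0
    return n + foo(n // 2, counter) + foo(n // 2, counter)

for k in range(10):
    n = 2 ** k
    counter = [0]
    foo(n, counter)
    assert counter[0] == 2 * n - 1  # matches T(n) = 2n - 1
```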
I've been trying to implement a 16-bit adder, but I can't seem to get any meaningful results from my testbench. Any leads?
module adder(a,b,c,s,cout);
input a,b,c;
output s,cout;
xor #1
g1(w1,a,b),
g2(s,w1,c);
and #1
g3(w2,c,b),
g4(w3,c,a),
g5(w4,a,b);
or #1
g6(cout,w2,w3,w4);
endmodule
module sixteenbitAdder(x,y,s,cout,cin);
input [15:0] x,y;
output [15:0] s;
input cin;
output cout;
wire [15:0] c;
adder f0 (x[0],y[0],cin,s[0],c[0]);
adder f1 (x[1],y[1],c[0],s[0],c[1]);
adder f2 (x[2],y[2],c[1],s[2],c[2]);
adder f3 (x[3],y[3],c[2],s[3],c[3]);
adder f4 (x[4],y[4],c[3],s[4],c[4]);
adder f5 (x[5],y[5],c[4],s[5],c[5]);
adder f6 (x[6],y[6],c[5],s[6],c[6]);
adder f7 (x[7],y[7],c[6],s[7],c[7]);
adder f8 (x[8],y[8],c[7],s[8],c[8]);
adder f9 (x[9],y[9],c[8],s[9],c[9]);
adder f10 (x[10],y[10],c[9],s[10],c[10]);
adder f11 (x[11],y[11],c[10],s[11],c[11]);
adder f12 (x[12],y[12],c[11],s[12],c[12]);
adder f13 (x[13],y[13],c[12],s[13],c[13]);
adder f14 (x[14],y[14],c[13],s[14],c[14]);
adder f15 (x[15],y[15],c[14],s[15],cout);
endmodule
And here is my testbench. I don't know how wrong it is.
module test();
wire [15:0] x,y,s;
wire cin, cout;
testAdder testt (x,y,s,cout,cin);
sixteenbitAdder adderr (x,y,s,cout,cin);
endmodule
module testAdder(a,b,s,cout,cin);
input [15:0] s;
input cout;
output [15:0] a,b;
output cin;
reg [15:0] a,b;
reg cin;
initial
begin
$monitor($time,,"a=%d, b=%d, cin=%b, s=%d, cout=%b",a,b,cin,s,cout);
$display($time,,"a=%d, b=%d, cin=%b, s=%d, cout=%b",a,b,cin,s,cout);
#50 a=1; b=2; cin=0;
end
endmodule
This is what I get back:
0 a= x, b= x, cin=x, s= z, cout=z
0 a= x, b= x, cin=x, s= X, cout=x
50 a= 1, b= 2, cin=0, s= X, cout=x
52 a= 1, b= 2, cin=0, s= X, cout=0
53 a= 1, b= 2, cin=0, s= X, cout=0
55 a= 1, b= 2, cin=0, s= Z, cout=0
Your design has a bug. Change:
adder f1 (x[1],y[1],c[0],s[0],c[1]);
to:
adder f1 (x[1],y[1],c[0],s[1],c[1]);
//-------------------------^
Outputs:
0 a= x, b= x, cin=x, s= x, cout=x
50 a= 1, b= 2, cin=0, s= x, cout=x
52 a= 1, b= 2, cin=0, s= X, cout=0
53 a= 1, b= 2, cin=0, s= X, cout=0
55 a= 1, b= 2, cin=0, s= 3, cout=0
There is no need for the $display; the $monitor already prints the values at time 0 and again whenever any of them changes.
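As a cross-check on the expected numbers, here is a quick Python model of the same ripple-carry structure (one full adder per bit, using the same gate equations as the adder module); it is only a software sketch, not a replacement for the Verilog:

```python
def full_adder(a, b, c):
    # Same gate equations as the Verilog adder module:
    #   s = a ^ b ^ c;  cout = (c & b) | (c & a) | (a & b)
    s = a ^ b ^ c
    cout = (c & b) | (c & a) | (a & b)
    return s, cout

def ripple_adder16(x, y, cin):
    s, carry = 0, cin
    for i in range(16):                       # instances f0 .. f15
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        s |= bit << i
    return s, carry                           # (sum, cout)

print(ripple_adder16(1, 2, 0))  # (3, 0), matching the corrected simulation
```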
This question already has answers here:
Does SparkSQL support subquery?
(2 answers)
Closed 6 years ago.
When I run this query, I get this error:
select * from raw_2 where ip NOT IN (select * from raw_1);
org.apache.spark.sql.AnalysisException:
Unsupported language features in query:
select * from raw_2 where ip NOT IN (select * from raw_1)
TOK_QUERY 1, 0,24, 14
TOK_FROM 1, 4,6, 14
TOK_TABREF 1, 6,6, 14
TOK_TABNAME 1, 6,6, 14
raw_2 1, 6,6, 14
TOK_INSERT 0, -1,24, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 0, 0,2, 0
TOK_SELEXPR 0, 2,2, 0
TOK_ALLCOLREF 0, 2,2, 0
TOK_WHERE 1, 8,24, 29
NOT 1, 10,24, 29
TOK_SUBQUERY_EXPR 1, 14,10, 33
TOK_SUBQUERY_OP 1, 14,14, 33
IN 1, 14,14, 33
TOK_QUERY 1, 16,24, 51
TOK_FROM 1, 21,23, 51
TOK_TABREF 1, 23,23, 51
TOK_TABNAME 1, 23,23, 51
raw_1 1, 23,23, 51
TOK_INSERT 0, -1,19, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 0, 17,19, 0
TOK_SELEXPR 0, 19,19, 0
TOK_ALLCOLREF 0, 19,19, 0
TOK_TABLE_OR_COL 1, 10,10, 26
ip 1, 10,10, 26
scala.NotImplementedError: No parse rules for ASTNode type: 817, text:
TOK_SUBQUERY_EXPR :
TOK_SUBQUERY_EXPR 1, 14,10, 33
TOK_SUBQUERY_OP 1, 14,14, 33
IN 1, 14,14, 33
TOK_QUERY 1, 16,24, 51
TOK_FROM 1, 21,23, 51
TOK_
Spark 2.0.0+:
Since 2.0.0, Spark supports a full range of subqueries. See Does SparkSQL support subquery? for details.
Spark < 2.0.0
Does Spark support subqueries?
Generally speaking, it does. Constructs like SELECT * FROM (SELECT * FROM foo WHERE bar = 1) as tmp are perfectly valid queries in Spark SQL.
As far as I can tell from the Catalyst parser source, it doesn't support subqueries inside a NOT IN clause, only a plain list of expressions:
| termExpression ~ (NOT ~ IN ~ "(" ~> rep1sep(termExpression, ",")) <~ ")" ^^ {
case e1 ~ e2 => Not(In(e1, e2))
}
It is still possible to use an outer join followed by a filter to obtain the same effect.