I'm trying to run a for loop to make a balance table in Stata (comparing the demographics of my dataset with national-level statistics).
For this, I'm prepping my dataset and attempting to calculate the percentages/averages for some key demographics.
preserve
rename unearnedinc_wins95 unearninc_wins95
foreach var of varlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019 { //continuous or binary; to put categorical vars use kwallis test
dis "for variable `var':"
tabstat `var'
summ `var'
local `var'_samplemean=r(mean)
}
clear
set obs 11
gen var=""
gen sample=.
gen F=.
gen pvalue=.
replace var="% Female" if _n==1
replace var="Age" if _n==2
replace var="% Non-white" if _n==3
replace var="HH size" if _n==4
replace var="% Parent" if _n==5
replace var="% Employed" if _n==6
replace var="Savings stock ($)" if _n==7
replace var="Debt stock ($)" if _n==8
replace var="Earned income last mo. ($)" if _n==9
replace var="Unearned income last mo. ($)" if _n==10
replace var="% Under FPL 2019" if _n==11
foreach col of varlist sample {
replace `col'=100*round(`fem_`col'mean', 0.01) if _n==1
replace `col'=round(`age_`col'mean') if _n==2
replace `col'=100*round(`nonwhite_`col'mean', 0.01) if _n==3
replace `col'=round(`hhsize_`col'mean', 0.1) if _n==4
replace `col'=100*round(`parent_`col'mean', 0.01) if _n==5
replace `col'=100*round(`employed_`col'mean', 0.01) if _n==6
replace `col'=round(`savings_wins95_`col'mean') if _n==7
replace `col'=round(`debt_wins95_`col'mean') if _n==8
replace `col'=round(`earnedinc_wins95_`col'mean') if _n==9
replace `col'=round(`unearninc_wins95_`col'mean') if _n==10
replace `col'=100*round(`underfpl2019_`col'mean', 0.01) if _n==11
}
I'm trying to run the code above, but in the second half (after clearing the dataset) I keep getting an 'invalid syntax' error. For context, the first half (before clearing the dataset) stores the mean of each variable in a local macro (`var'_samplemean). Can someone help me fix this loop?
My sample data:
clear
input byte fem float(age nonwhite) byte(hhsize parent) float employed double(savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95) float underfpl2019
1 35 1 6 1 1 0 2500 0 0 0
0 40 0 4 1 1 0 10000 1043 0 0
0 40 0 4 1 1 0 20000 2400 0 0
0 40 0 4 1 1 .24 20000 2000 0 0
0 40 0 4 1 1 10 . 2600 0 0
Thanks!
Thanks for sharing the snippet of data. Apart from the fact the variable unearninc_wins95 has already been renamed in your sample data, the code runs fine for me without returning an error.
That being said, the columns for your F-statistics and p-values are empty once the loop at the bottom of your code completes. As far as I can see there is no local/varlist called sample which you're attempting to call with the line foreach col of varlist sample{. This could be because you haven't included it in your code, in which case please do, or it could be because you haven't created the local/varlist sample, in which case this could well be the source of your error message.
Taking a step back, there are more efficient ways of achieving what I think you're after. For example, you can get (part of) what you want using the package stat2data (if you don't have it installed already, run ssc install stat2data from the command prompt). You can then run the following code:
stat2data fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019, saving("~/yourstats.dta") stat(count mean)
*which returns:
preserve
use "~/yourstats.dta", clear
. list, sep(11)
+----------------------------+
| _name sN smean |
|----------------------------|
1. | fem 5 .2 |
2. | age 5 39 |
3. | nonwhite 5 .2 |
4. | hhsize 5 4.4 |
5. | parent 5 1 |
6. | employed 5 1 |
7. | savings_wins 5 2.048 |
8. | debt_wins95 4 13125 |
9. | earnedinc_wi 5 1608.6 |
10. | unearninc_wi 5 0 |
11. | underfpl2019 5 0 |
+----------------------------+
restore
This is missing the empty F-statistic and p-value variables you created in your code above, but you can always add them in the same way you have with gen F=. and gen pvalue=.. The presence of these variables though indicates you want to run some tests at some point and then fill the cells with values from them. I'd offer advice on how to do this but it's not obvious to me from your code what you want to test. If you can clarify this I will try and edit this answer to include that.
This doesn't answer your question directly; as others gently point out the question is hard to answer without a reproducible example. But I have several small comments on your code which are better presented in this form.
Assuming that all the variables needed are indeed present in the dataset, I would recommend something more like this:
local myvarlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019
local desc `" "% Female" "Age" "% Non-white" "HH size" "% Parent" "% Employed" "Savings stock ($)" "Debt stock ($)" "Earned income last mo. ($)" "Unearned income last mo. ($)" "% Under FPL 2019" "'
* i tracks both the position in `desc' and the observation being filled
local i = 1
gen variable = ""
gen mean = ""
foreach var of local myvarlist {
    summ `var', meanonly
    local this : word `i' of `desc'
    replace variable = "`this'" in `i'
    * binary variables are shown as percentages
    if inlist(`i', 1, 3, 5, 6, 11) {
        replace mean = strofreal(100 * r(mean), "%2.0f") in `i'
    }
    * household size keeps one decimal place
    else if `i' == 4 {
        replace mean = strofreal(r(mean), "%2.1f") in `i'
    }
    else replace mean = strofreal(r(mean), "%2.0f") in `i'
    local ++i
}
This has not been tested.
Points arising include:
Using in is preferable for what you want over testing the observation number with if.
round() is treacherous for rounding to so many decimal places. Most of the time you will get what you want, but occasionally you will get bizarre results arising from the fact that Stata works in binary, like any equivalent program. It is safer to treat rounding as a problem in string manipulation and use display formats as offering precisely what you want.
If the text you want to show is just the variable label for each variable, this code could be simplified further.
The code hints at intent to show other stuff, which is easily done compatibly with this design.
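The rounding point is not specific to Stata. As a minimal sketch in Ruby (used here only because it appears elsewhere on this page; the value 0.29 is an arbitrary illustration, not taken from the dataset), scaling a rounded binary double can expose representation error, while formatting to a string gives exactly the digits asked for:

```ruby
# 0.29 has no exact binary representation, so rounding to 2 places and
# then scaling by 100 can expose the representation error.
share = 0.29
scaled = 100 * share.round(2)

# Treating rounding as string formatting yields exactly the digits requested.
as_text = format('%.0f', 100 * share)

puts scaled   # very close to 29, but not exactly 29
puts as_text  # prints 29
```

Stata's round() and display formats are not identical to these Ruby calls, so this only illustrates the general binary-arithmetic issue.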
In an Excel sheet, for the input below, I first filter NET to APB, then filter CODE to the values WDL and LRTF, and then use a pivot table to get the counts.
But I need code in Oracle that produces the following output:
INPUT:
STTID        AMOUNT  NET  CODE
SVPC12309A   5000    NFS  SOP
SVPC12309A   10000   NFS  WDL
000DHP11291  2500    APB  WDL
SVPC12309A   3000    CMV  LRTF
SVPC12309A   3000    CMV  WDL
DHP12341     4500    APB  LRTF
DHP23451     9500    APB  LRTF
DHP12341     5500    APB  LRTF
OUTPUT AS:
STTID        LRTF  WDL  TOTAL
000DHP11291  0     1    1
DHP12341     2     0    2
DHP23451     1     0    1
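It appears you want something like the following (the where clause applies the NET and CODE filters from your Excel steps):
select sttid,
       sum( case when code = 'LRTF' then 1 else 0 end ) lrtf,
       sum( case when code = 'WDL' then 1 else 0 end ) wdl,
       sum( case when code in ('WDL', 'LRTF') then 1 else 0 end ) total
  from your_table_name
 where net = 'APB'
   and code in ('LRTF', 'WDL')
 group by sttid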
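To sanity-check the conditional-count logic against the sample rows without a database, here is a small Ruby sketch. The ROWS constant and the pivot_counts helper are names made up for this illustration; it assumes the NET = APB and CODE in (LRTF, WDL) filters described in the question.

```ruby
# Sample rows from the question: [sttid, amount, net, code]
ROWS = [
  ['SVPC12309A', 5000, 'NFS', 'SOP'],
  ['SVPC12309A', 10_000, 'NFS', 'WDL'],
  ['000DHP11291', 2500, 'APB', 'WDL'],
  ['SVPC12309A', 3000, 'CMV', 'LRTF'],
  ['SVPC12309A', 3000, 'CMV', 'WDL'],
  ['DHP12341', 4500, 'APB', 'LRTF'],
  ['DHP23451', 9500, 'APB', 'LRTF'],
  ['DHP12341', 5500, 'APB', 'LRTF']
].freeze

# Hypothetical helper: keep NET = APB rows whose CODE is LRTF or WDL,
# then count each code per STTID, mirroring the pivot table.
def pivot_counts(rows)
  kept = rows.select { |row| row[2] == 'APB' && %w[LRTF WDL].include?(row[3]) }
  grouped = kept.group_by(&:first).sort_by { |sttid, _| sttid }
  grouped.map do |sttid, group|
    lrtf = group.count { |row| row[3] == 'LRTF' }
    wdl  = group.count { |row| row[3] == 'WDL' }
    [sttid, lrtf, wdl, lrtf + wdl]
  end
end

pivot_counts(ROWS).each { |row| puts row.join(' ') }
```

The three printed rows should match the OUTPUT table in the question.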
Same issue I posted Friday but I will be more specific this time. I have this data:
UserId Action Id Date
1 1 1/1/2018
1 2 1/1/2018
1 2 2/1/2018
2 3 3/1/2018
2 4 4/1/2018
And I want a filter that will yield the following:
Count Instances from FirstDate to 2/1/2018
UserId ActionCount
1 3
2 0
In the data load editor you want to group by the User in order to get that first date:
GroupedUserData:
Load
UserId,
min(Date) as FirstDate
resident [The name of your original table]
group by UserId;
And then you want to use set analysis chart-side:
sum({<FirstDate = {'<=2/1/2018'}>} ActionCount)
Good evening,
I'm trying to solve a problem on Codewars:
In this little assignment you are given a string of space separated numbers, and have to return the highest and lowest number.
Example:
high_and_low("1 2 3 4 5") # return "5 1"
high_and_low("1 2 -3 4 5") # return "5 -3"
high_and_low("1 9 3 4 -5") # return "9 -5"
Notes:
All numbers are valid Int32, no need to validate them.
There will always be at least one number in the input string.
Output string must be two numbers separated by a single space, and highest number is first.
I came up with the following solution however I cannot figure out why the method is only returning "542" and not "-214 542". I also tried using #at, #shift and #pop, with the same result.
Is there something I am missing? I hope someone can point me in the right direction. I would like to understand why this is happening.
def high_and_low(numbers)
numberArray = numbers.split(/\s/).map(&:to_i).sort
numberArray[-1]
numberArray[0]
end
high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6")
EDIT
I also tried this and receive a failed test "Nil":
def high_and_low(numbers)
numberArray = numbers.split(/\s/).map(&:to_i).sort
puts "#{numberArray[-1]}" + " " + "#{numberArray[0]}"
end
When the return statement is omitted, a method returns only the value of the last expression in its body, so only one of your two values comes back. Your second attempt fails because puts prints the string but itself returns nil. To return both values as an Array, write:
def high_and_low(numbers)
  numberArray = numbers.split(/\s/).map(&:to_i).sort
  return numberArray[0], numberArray[-1]
end
p high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6")
# => [-214, 542]
Using sort would be inefficient for big arrays. Instead, use Enumerable#minmax:
numbers.split.map(&:to_i).minmax
# => [-214, 542]
Or use Enumerable#minmax_by if you'd like the results to remain strings:
numbers.split.minmax_by(&:to_i)
# => ["-214", "542"]
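Since the kata asks for a single string with the highest number first, a minimal sketch that combines minmax with string interpolation might look like this (one possible way to pass the tests, not the only one):

```ruby
# Returns "<highest> <lowest>" as a single string, as the kata requires.
def high_and_low(numbers)
  low, high = numbers.split.map(&:to_i).minmax
  "#{high} #{low}" # the last expression is the implicit return value
end

puts high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6") # prints 542 -214
```

The method builds and returns the string itself; puts is used only by the caller to display it.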
I use F95/90 and IBM compiler. I am trying to extract the numerical values from block and write in a file. I am facing a strange error in the output which I cannot understand. Every time I execute the program it skips the loop between 'Beta' and 'END'. I am trying to read and store the values.
The number of lines inside the Alpha- and Beta loops are not fixed. So a simple 'do loop' is of no use to me. I tried the 'do while' loop and also 'if-else' but it still skips the 'Beta' part.
Alpha Singles Amplitudes
15 3 23 4 -0.186952
15 3 26 4 0.599918
15 3 31 4 0.105048
15 3 23 4 0.186952
Beta Singles Amplitudes
15 3 23 4 0.186952
15 3 26 4 -0.599918
15 3 31 4 -0.105048
15 3 23 4 -0.186952
END
The simple short code is :
program test_read
implicit none
integer::nop,a,b,c,d,e,i,j,k,l,m,ios
double precision::r,t,rr
character::dummy*300
character*15::du1,du2,du3,du4
open (unit=10, file="1.txt", status='old',form='formatted')
100 read(10,'(a100)')dummy
if (dummy(1:3)=='END') goto 200
if(dummy(2:14)=='Alpha Singles') then
i=0
160 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
do while(du1.ne.' Bet')
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'AS',du1,b,du2,c,du3,d,du4,e,r
goto 160
end do
elseif (dummy(2:14)=='Beta Singles') then
170 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
if((du1=='END'))then
stop
else
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'BS',du1,b,du2,c,du3,d,du4,e,r
goto 170
end if
end if
goto 100
200 print*,'This is the end'
end program test_read
Your program never reaches the branch that checks for Beta, because by the time the while loop exits it has already consumed the line containing Beta. Control then goes to label 100, which reads the line after Beta, so the elseif test never sees 'Beta Singles'. Instead, read every line in one place and let the header lines only switch a tag. Try the following:
character(len=2):: tag
read(10,'(a100)')dummy
do while (dummy(1:3).ne.'END')
   if (dummy(2:14)=='Alpha Singles') then
      tag = 'AS'
   else if (dummy(2:14)=='Beta Singles') then
      tag = 'BS'
   else
      read(dummy,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
      write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')tag,du1,b,du2,c,du3,d,du4,e,r
   end if
   read(10, '(a100)') dummy
end do
print*,'This is the end'
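The same single-pass idea (header lines only change the current tag, every other line is a data row) can be sketched outside Fortran. The Ruby version below is an illustration only: it assumes the whitespace-separated layout shown in the question's sample, which may differ from the real file implied by the Fortran format string, and the INPUT constant and tag_amplitudes helper are hypothetical names.

```ruby
# Sample block as shown in the question.
INPUT = <<~TEXT
  Alpha Singles Amplitudes
  15 3 23 4 -0.186952
  15 3 26 4 0.599918
  15 3 31 4 0.105048
  15 3 23 4 0.186952
  Beta Singles Amplitudes
  15 3 23 4 0.186952
  15 3 26 4 -0.599918
  15 3 31 4 -0.105048
  15 3 23 4 -0.186952
  END
TEXT

# Hypothetical helper: one pass over the lines; header lines only switch
# the current tag, every other non-empty line is stored with that tag.
def tag_amplitudes(text)
  tag = nil
  rows = []
  text.each_line do |raw|
    line = raw.strip
    break if line.start_with?('END')

    if line.start_with?('Alpha Singles')
      tag = 'AS'
    elsif line.start_with?('Beta Singles')
      tag = 'BS'
    elsif !line.empty?
      *indices, amplitude = line.split
      rows << [tag, *indices.map(&:to_i), amplitude.to_f]
    end
  end
  rows
end

tag_amplitudes(INPUT).each { |row| puts row.join(' ') }
```

Because no branch reads ahead, the Beta header can never be swallowed by the Alpha loop, which was the cause of the original problem.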