I'm trying to run a foreach loop to build a balance table in Stata (comparing the demographics of my dataset with national-level statistics).
To do this, I'm prepping my dataset and calculating the percentages/averages for some key demographics.
preserve
rename unearnedinc_wins95 unearninc_wins95
foreach var of varlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019 { //continuous or binary; to put categorical vars use kwallis test
dis "for variable `var':"
tabstat `var'
summ `var'
local `var'_samplemean=r(mean)
}
clear
set obs 11
gen var=""
gen sample=.
gen F=.
gen pvalue=.
replace var="% Female" if _n==1
replace var="Age" if _n==2
replace var="% Non-white" if _n==3
replace var="HH size" if _n==4
replace var="% Parent" if _n==5
replace var="% Employed" if _n==6
replace var="Savings stock ($)" if _n==7
replace var="Debt stock ($)" if _n==8
replace var="Earned income last mo. ($)" if _n==9
replace var="Unearned income last mo. ($)" if _n==10
replace var="% Under FPL 2019" if _n==11
foreach col of varlist sample {
replace `col'=100*round(`fem_`col'mean', 0.01) if _n==1
replace `col'=round(`age_`col'mean') if _n==2
replace `col'=100*round(`nonwhite_`col'mean', 0.01) if _n==3
replace `col'=round(`hhsize_`col'mean', 0.1) if _n==4
replace `col'=100*round(`parent_`col'mean', 0.01) if _n==5
replace `col'=100*round(`employed_`col'mean', 0.01) if _n==6
replace `col'=round(`savings_wins95_`col'mean') if _n==7
replace `col'=round(`debt_wins95_`col'mean') if _n==8
replace `col'=round(`earnedinc_wins95_`col'mean') if _n==9
replace `col'=round(`unearninc_wins95_`col'mean') if _n==10
replace `col'=100*round(`underfpl2019_`col'mean', 0.01) if _n==11
}
I'm trying to run the code above, but in the second loop (after clearing the dataset) I keep getting an 'invalid syntax' error. For context, the first loop (before clearing the dataset) stores the mean of each variable in a local macro (`var'_samplemean). Can someone help me fix this loop?
My sample data:
clear
input byte fem float(age nonwhite) byte(hhsize parent) float employed double(savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95) float underfpl2019
1 35 1 6 1 1 0 2500 0 0 0
0 40 0 4 1 1 0 10000 1043 0 0
0 40 0 4 1 1 0 20000 2400 0 0
0 40 0 4 1 1 .24 20000 2000 0 0
0 40 0 4 1 1 10 . 2600 0 0
Thanks!
Thanks for sharing the snippet of data. Apart from the fact the variable unearninc_wins95 has already been renamed in your sample data, the code runs fine for me without returning an error.
That being said, the columns for your F-statistics and p-values are empty once the loop at the bottom of your code completes. As far as I can see there is no local/varlist called sample which you're attempting to call with the line foreach col of varlist sample{. This could be because you haven't included it in your code, in which case please do, or it could be because you haven't created the local/varlist sample, in which case this could well be the source of your error message.
Taking a step back, there are more efficient ways of achieving what I think you're after. For example, you can get (part of) what you want using the package stat2data (if you don't have it installed already, run ssc install stat2data from the command prompt). You can then run the following code:
stat2data fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019, saving("~/yourstats.dta") stat(count mean)
*which returns:
preserve
use "~/yourstats.dta", clear
. list, sep(11)
+----------------------------+
| _name sN smean |
|----------------------------|
1. | fem 5 .2 |
2. | age 5 39 |
3. | nonwhite 5 .2 |
4. | hhsize 5 4.4 |
5. | parent 5 1 |
6. | employed 5 1 |
7. | savings_wins 5 2.048 |
8. | debt_wins95 4 13125 |
9. | earnedinc_wi 5 1608.6 |
10. | unearninc_wi 5 0 |
11. | underfpl2019 5 0 |
+----------------------------+
restore
This is missing the empty F-statistic and p-value variables you created in your code above, but you can always add them in the same way you have with gen F=. and gen pvalue=.. The presence of these variables though indicates you want to run some tests at some point and then fill the cells with values from them. I'd offer advice on how to do this but it's not obvious to me from your code what you want to test. If you can clarify this I will try and edit this answer to include that.
This doesn't answer your question directly; as others gently point out the question is hard to answer without a reproducible example. But I have several small comments on your code which are better presented in this form.
Assuming that all the variables needed are indeed present in the dataset, I would recommend something more like this:
local myvarlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019
local desc `" "% Female" "Age" "% Non-white" "HH size" "% Parent" "% Employed" "Savings stock ($)" "Debt stock ($)" "Earned income last mo. ($)" "Unearned income last mo. ($)" "% Under FPL 2019" "'
local i = 1
gen variable = ""
gen mean = ""
foreach var of local myvarlist {
summ `var', meanonly
local this : word `i' of `desc'
replace variable = "`this'" in `i'
if inlist(`i', 1, 3, 5, 6, 11) {
replace mean = strofreal(100 * r(mean), "%2.0f") in `i'
}
else if `i' == 4 {
replace mean = strofreal(r(mean), "%2.1f") in `i'
}
else replace mean = strofreal(r(mean), "%2.0f") in `i'
local ++i
}
This has not been tested.
Points arising include:
Using in is preferable for what you want over testing the observation number with if.
round() is treacherous for rounding to so many decimal places. Most of the time you will get what you want, but occasionally you will get bizarre results arising from the fact that Stata works in binary, like any equivalent program. It is safer to treat rounding as a problem in string manipulation and use display formats as offering precisely what you want.
If the text you want to show is just the variable label for each variable, this code could be simplified further.
The code hints at intent to show other stuff, which is easily done compatibly with this design.
I have a hash whose keys are week numbers and whose values are attendance scores. I am trying to calculate the average attendance for each month based on the week numbers, i.e. the keys.
Below is the example of the hash
weekly_attendance = {31 => 40.0, 32 => 100.00, 33 => 34.00, 34 => 23.78, 35 => 56.79, 36 => 44.50, 37 => 67.00, 38 => 55.00 }
Since a month consists of 4 weeks and the last week of each month is divisible by 4, the attendance needs to be grouped as follows
Month 1 attendance consists of weeks 31,32 i.e. (40.00+100.00)/2 = 70.0
Month 2 attendance consists of weeks 33,34,35,36
i.e. (34.00+23.78+56.79+44.50)/4 = 39.77
Month 3 attendance consists of weeks 37, 38 i.e. (67.00+55.00)/2 = 61.0
The output should be
monthly_attendance = [70.0, 39.77, 61.0]
I had tried each and select approaches and used the modulo operator condition i.e. week % 4 == 0 to add the attendance values. But could not effectively group them based on months
tmp = 0
monthly_attendance = []
weekly_attendance.select do |k,v|
tmp += v
monthly_attendance << tmp if k % 4 == 0
end
I am unable to sort the week number in ranges using the above code.
You can try something like this:
results = weekly_attendance.group_by { |week, value| (week + 3) / 4 }.map do |month, groups|
values = groups.map(&:last)
average = values.inject(0) { |sum, val| sum + val } / values.length
[month, average]
end.to_h
p results # {8=>70.0, 9=>39.7675, 10=>61.0}
But the logic of converting weeks to months is flawed here; it's better to use a calendar function instead of just dividing by 4.
You can get the real month numbers using:
require 'date'
weekly_attendance.group_by { |week, value| Date.commercial(Time.now.year, week, 1).month }
But the result will not match the result you expect, because for example week 31 is in July, while week 32 is in August (this year), instead of being the same month like you expect.
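As a rough sketch of the calendar-based grouping carried through to averages: the year is pinned to 2015 here (an assumption, so the result does not depend on when the code is run), and each week is assigned to the month of its Monday, as in the snippet above.

```ruby
require 'date'

weekly_attendance = { 31 => 40.0, 32 => 100.0, 33 => 34.0, 34 => 23.78,
                      35 => 56.79, 36 => 44.5, 37 => 67.0, 38 => 55.0 }
year = 2015 # assumed fixed year instead of Time.now.year

# Group each week by the calendar month of its Monday, then average per month.
monthly_average = weekly_attendance
  .group_by { |week, _| Date.commercial(year, week, 1).month }
  .transform_values { |pairs| (pairs.sum { |_, v| v } / pairs.size).round(2) }

p monthly_average
```

With these inputs, week 31 lands in July, weeks 32 to 36 in August and weeks 37 and 38 in September, which is why the grouping differs from the 4-weeks-per-month scheme in the question.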
I assume that if x units are produced in a given week, x/7 units are produced on each day of that week. The code below could be easily changed if this assumption were changed.
First construct a hash whose keys are months (1-12) and whose values are hashes whose keys are weeks and whose values are the numbers of days in the given week for the given month. (Whew!)
require 'date'
def months_to_weeks(year)
day = Date.new(year)
days = day.leap? ? 366 : 365
days.times.with_object(Hash.new { |h,k| h[k] = Hash.new(0) }) do |_,h|
h[day.month][day.cweek] += 1
day = day.next
end
end
The doc for Hash#new provides an explanation of the statement:
Hash.new { |h,k| h[k] = Hash.new(0) }
In brief, this creates an empty hash with a default given by the block. If h is the hash that is created, and h does not have a key k, h[k] will cause the block to be executed, which adds that key to the hash and sets its value to an empty hash with a default value of 0. The latter hash is often referred to as a "counting hash". I realize this is still rather a mouthful for a Ruby newbie.
Let's generate this hash for the current year:
year = 2015
mon_to_wks = months_to_weeks(year)
#=> {1 =>{1 =>4, 2 =>7, 3 =>7, 4 =>7, 5=>6},
# 2 =>{5 =>1, 6 =>7, 7 =>7, 8 =>7, 9=>6},
# 3 =>{9 =>1, 10=>7, 11=>7, 12=>7, 13=>7, 14=>2},
# 4 =>{14=>5, 15=>7, 16=>7, 17=>7, 18=>4},
# 5 =>{18=>3, 19=>7, 20=>7, 21=>7, 22=>7},
# 6 =>{23=>7, 24=>7, 25=>7, 26=>7, 27=>2},
# 7 =>{27=>5, 28=>7, 29=>7, 30=>7, 31=>5},
# 8 =>{31=>2, 32=>7, 33=>7, 34=>7, 35=>7, 36=>1},
# 9 =>{36=>6, 37=>7, 38=>7, 39=>7, 40=>3},
# 10=>{40=>4, 41=>7, 42=>7, 43=>7, 44=>6},
# 11=>{44=>1, 45=>7, 46=>7, 47=>7, 48=>7, 49=>1},
# 12=>{49=>6, 50=>7, 51=>7, 52=>7, 53=>4}}
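A tiny standalone illustration of that default-block behaviour may help. The keys :jan and :feb and the week numbers are made up for the example.

```ruby
# Outer hash: a missing key runs the block, which stores a new counting hash.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }

counts[:jan][1] += 1   # :jan is created on first access; week 1 starts from 0
counts[:jan][1] += 1
counts[:feb][5] += 1

missing = counts[:jan][99] # reads the default 0; key 99 is not stored

p counts
p missing
```

Note the asymmetry: the outer hash stores the key because the block assigns h[k], whereas the inner Hash.new(0) only returns its default and stores nothing until += assigns.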
Because of how Date#cweek is defined, the weeks in this hash begin on Mondays. In January, for example, there are 4 days in week 1. These four days, Jan. 1-4, 2015, are the first Thursday, Friday, Saturday and Sunday of 2015. (Check your calendar.)
If the first day of each week is to be a day other than Monday (Sunday, for example) the hash calculation would have to be changed slightly.
This shows, for example, that in January of 2015, there are 4 days in week 1, 7 days in weeks 2, 3 and 4 and 6 days in week 5. The remaining day of week 5 is the first day in February.
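For the Sunday-based weeks mentioned above, one possible variant is sketched below. The method name months_to_sunday_weeks is hypothetical, and it relies on Date#strftime('%U'), which numbers weeks from the first Sunday of the year and puts any earlier days in week 0. That numbering differs from Date#cweek, so the week keys in the attendance hash would have to follow the same convention.

```ruby
require 'date'

# Hypothetical variant of months_to_weeks in which weeks start on Sunday.
def months_to_sunday_weeks(year)
  first = Date.new(year, 1, 1)
  last  = Date.new(year, 12, 31)
  (first..last).each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |day, h|
    # '%U' gives the Sunday-based week number as a two-digit string.
    h[day.month][day.strftime('%U').to_i] += 1
  end
end

sunday_weeks = months_to_sunday_weeks(2015)
p sunday_weeks[1]
```

For 2015, January 1 to 3 (Thursday to Saturday) fall in week 0 under this numbering, and week 1 starts on Sunday, January 4.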
Once this hash has been constructed, it is a simple matter to compute the averages for each month:
weekly_attendance = {31 => 40.00, 32 => 100.00, 33 => 34.00, 34 => 23.78,
35 => 56.79, 36 => 44.50, 37 => 67.00, 38 => 55.00 }
prod_by_mon = (1..12).each_with_object(Hash.new(0)) do |i,h|
mon_to_wks[i].each do |week, days|
h[i] += (days/7.0)*weekly_attendance[week] if weekly_attendance.key?(week)
end
end
#=> {7=>28.571428571428573, 8=>232.3557142857143, 9=>160.14285714285714}
prod_by_mon.merge(prod_by_mon) { |_,v| v.round(2) }
#=> {7=>28.57, 8=>232.36, 9=>160.14}
This shows that production in month 7 was 28.57, and so on. Note that:
28.57 + 232.36 + 160.14 #=> 421.07
weekly_attendance.values.reduce(:+) #=> 421.07
I want to split work across multiple processes. Each child process bubble-sorts part of the data and has to send its result back so that the sorted parts can be merged. This is part of my code:
rd1,wt1 = IO.pipe # reader & writer
pid1 = fork {
rd1.close
numbers = Marshal.load(Marshal.dump(copylist[0,p]))
bubble_sort(numbers)
sList[0] = numbers.clone
wt1.write Marshal.dump(sList[0])
Process.exit!(true)
}
Process.waitpid(pid1)
Process.waitpid(pid2)
wt1.close
wt2.close
pid5 = fork {
rd5.close
a = Marshal.load(rd1.gets)
b = Marshal.load(rd2.gets)
mList[0] = merge( a,b).clone
wt5.write Marshal.dump(mList[0])
Process.exit!(true)
}
There are pid1...pid7, rd1...rd7, wt1...wt7. pid1...pid4 are bubble-sort 4 part of data. pid5 and 6 merge data from pid1, 2 and pid 3, 4. Finally, pid7 merges the data from pid5 and 6.
When data size is small, it succeeds, but when I input larger data (10000):
Data example : 121 45 73 89 11 452 515 32 1 99 4 88 41 53 159 482 2013 2 ...
then errors occur: in 'load': marshal data too short (ArgumentError), and another kind of error: in 'load': instance of IO needed (TypeError). The first error is raised in pid5 at a = ... and in pid6 at b = .... The second kind of error is raised in pid7 at b = .... Is my data too big for this method?
Marshal.load and Marshal.dump work with binary data. The problem with the short reads is here:
a = Marshal.load(rd1.gets)
b = Marshal.load(rd2.gets)
#gets reads up to a new-line (or end of file) and then stops. The trouble is that new-line may be present in the binary data created by Marshal.dump.
Change gets to read in both lines.
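A minimal single-process sketch of the difference, using a small made-up array rather than the sorted lists from the question. Marshal happens to encode the Integer 5 as a newline byte, so even this tiny payload is cut short by gets.

```ruby
data = [1, 5, 9]                 # the 5 is serialized as the byte "\n"
payload = Marshal.dump(data)

# 1) gets stops at the first newline byte inside the binary payload.
rd, wt = IO.pipe
wt.write(payload)
wt.close
truncated = rd.gets
rd.close

# 2) read consumes everything up to end of file.
rd, wt = IO.pipe
wt.write(payload)
wt.close
via_read = Marshal.load(rd.read)
rd.close

# 3) Marshal.load also accepts an IO object directly.
rd, wt = IO.pipe
wt.write(payload)
wt.close
via_io = Marshal.load(rd)
rd.close

puts "payload: #{payload.bytesize} bytes, gets returned: #{truncated.bytesize} bytes"
```

Passing the truncated string to Marshal.load is what produces the "marshal data too short" error. Separately, and as a general property of pipes rather than something shown in the question, a writer blocks once the pipe buffer is full, so with large payloads it is usually safer to read from the pipe before calling Process.waitpid on the child.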
I use Fortran 90/95 with an IBM compiler. I am trying to extract the numerical values from a block of text and write them to a file. I am facing a strange error in the output which I cannot understand. Every time I execute the program it skips the block between 'Beta' and 'END'. I am trying to read and store the values.
The number of lines inside the Alpha- and Beta loops are not fixed. So a simple 'do loop' is of no use to me. I tried the 'do while' loop and also 'if-else' but it still skips the 'Beta' part.
Alpha Singles Amplitudes
15 3 23 4 -0.186952
15 3 26 4 0.599918
15 3 31 4 0.105048
15 3 23 4 0.186952
Beta Singles Amplitudes
15 3 23 4 0.186952
15 3 26 4 -0.599918
15 3 31 4 -0.105048
15 3 23 4 -0.186952
END
The simple short code is :
program test_read
implicit none
integer::nop,a,b,c,d,e,i,j,k,l,m,ios
double precision::r,t,rr
character::dummy*300
character*15::du1,du2,du3,du4
open (unit=10, file="1.txt", status='old',form='formatted')
100 read(10,'(a100)')dummy
if (dummy(1:3)=='END') goto 200
if(dummy(2:14)=='Alpha Singles') then
i=0
160 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
do while(du1.ne.' Bet')
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'AS',du1,b,du2,c,du3,d,du4,e,r
goto 160
end do
elseif (dummy(2:14)=='Beta Singles') then
170 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
if((du1=='END'))then
stop
else
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'BS',du1,b,du2,c,du3,d,du4,e,r
goto 170
end if
end if
goto 100
200 print*,'This is the end'
end program test_read
Your program never reaches the branch that checks for Beta, because by the time your while loop exits it has already read the line containing Beta. It then goes to 100, which reads the next line after Beta, so you never actually see Beta Singles. Try the following:
character(len=2):: tag
read(10,'(a100)')dummy
do while (dummy(1:3).ne.'END')
if (dummy(2:14)=='Alpha Singles') then
tag = 'AS'
else if (dummy(2:14)=='Beta Singles') then
tag = 'BS'
else
read(dummy,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')tag,du1,b,du2,c,du3,d,du4,e,r
end if
read(10, '(a100)') dummy
end do
print*,'This is the end'
I have written the following function that checks whether a pixel belongs to an image in MATLAB.
To begin with, I wanted to test it on a simpler case, checking whether a number belongs to a vector, like the following:
function traverse_pixels(img)
for i:1:length(img)
c(i) = img(i)
end
But, when I run the following commands for example, I get the error shown at the end:
>> A = [ 34 565 456 535 34 54 5 5 4532 434 2345 234 32332434];
>> traverse_pixels(A);
??? Error: File: traverse_pixels.m Line: 2 Column: 6
Unexpected MATLAB operator.
Why is that? How can I fix the problem?
Thanks.
There is a syntax error in the head of your for loop; it's supposed to be:
for i = 1:length(img)
Also, to check if an array contains a specific value you could use:
A = [1 2 3]
if sum(A==2)>0
disp('there is at least one 2 in A')
end
This should be faster since no for loop is included.
for i = 1:length(img)
It's a simple syntax error: the loop variable must be followed by =, not :.