Calculating year dummy variables using a for loop (foreach) in Stata - for-loop

I am attempting to generate a dummy variable for each year from 1996 to 2012 (inclusive) such that the 1996 dummy should equal 1 if it is 1996 and 0 if else using the foreach command in Stata to cut down on time (at least for future projects). What is currently happening is that the dummy for 1996 is being produced, but no others are generated.
I think that it has to do with how I am defining j, but I cannot quite figure out the formatting to achieve the results that I want. I have looked online and in the Stata help files and cannot find anything on this specific topic.
Here is what I have thus far:
local var year
local j = 1996
foreach j of var year {
gen d`j' = 1 if year==`j'
local ++j
}
I will continue to try and figure this out on my own, but if anyone has a suggestion I would be greatly appreciative.

Let us look at this line by line.
local var year
You defined a local macro var with content "year". This is legal but you never refer to that local macro in this code, so the definition is pointless.
local j = 1996
You defined a local macro j with content "1996". This is legal.
foreach j of var year {
You open a loop and define the loop index to be j. That means that within the loop any reference to local macro j will be interpreted in terms of the list of arguments you provide. (The previous definition of j is irrelevant within the loop, and so has no effect in the rest of your code.)
... of var year
You specify that the loop is over a variable list here. Note that the keyword var here is short for varlist and has absolutely nothing to do the local macro name var you just defined. The variable list consists of the single variable name year.
gen d`j' = 1 if year==`j'
This statement will be interpreted, the one and only time the loop is executed, as
gen dyear = 1 if year==year
as references to the local macro j are replaced with its contents, the variable name year. year==year is true for every observation. The effect is a new variable dyear which is 1 in every observation. That is not an indicator or dummy variable as you want it. If you look at your dataset carefully, you will see that is not a dummy variable for year being 1996.
local ++j
You are trying to increment the local macro j by 1. But you just set local macro j to contain the string "year", which is a variable name. But you can't add 1 to a string, and so the error message will be type mismatch. You don't report that error, which is a surprise. It is a little subtle, as in the previous command the context of generate allows interpretation of the reference to year as an instruction to calculate with the variable year, which is naturally numeric. But local commands are all about string manipulation, which may or may not have numeric interpretation, and your command is equivalent, first of all, to instructing Stata to add
"year" + 1
which triggers a type mismatch error.
Turning away from your code: Consider a loop
forval y = 1996/2012 {
gen d`y' = 1 if year == `y'
}
This is closer to what you want but makes clearer another bug in your code. This would create variables d1996 to d2012 but each will be 1 in the year specified but missing otherwise, which is not what you want.
You could fix that by adding a further line in the loop
replace d`y' = 0 if year != `y'
but a much cleaner way to do it is the single line
gen d`y' = year == `y'
The expression
year == `y'
is evaluated as 1 when true and 0 when false, which is what you want.
All this is standard technique documented in [U] or [P].
As #Roberto Ferrer pointed out, however, experienced Stata users would not define dummies this way, as tabulate offers an option to do it without a loop.
A tutorial that brings together comments on local macros, foreach and forvalues loops is within http://www.stata-journal.com/sjpdf.html?articlenum=pr0005
search foreach
within Stata would have pointed to that as one of various pieces you can read.

Looping is not necessary. Try the tabulate command with the gen() option. See help tabulate oneway.
See also help xi and help factor variables.
You are trying to loop through the distinct values of year but the syntax is not correct. You are actually looping through a list of variables with only one element: year. The command levelsof gives you the distinct values, but like I said, looping is not necessary.

Maybe this might help.
/*assuming the data is from 1970-2012*/
/*assuming your year variable name is fyear*/
forvalues x=1970/2012 {
gen fyear `x'=0
replace fyear `x'=1 if fyear==`x'
}
However, I do agree with Roberto Ferrer that loop may not be necessary.

Related

How to call Lua table value explicitly when using integer counter (i,j,k) in a for loop to make the table name/address?

I have to be honest that I don't quite understand Lua that well yet. I am trying to overwrite a local numeric value assigned to a set table address (is this the right term?).
The addresses are of the type:
project.models.stor1.inputs.T_in.default, project.models.stor2.inputs.T_in.default and so on with the stor number increasing.
I would like to do this in a for loop but cannot find the right expression to make the entire string be accepted by Lua as a table address (again, I hope this is the right term).
So far, I tried the following to concatenate the strings but without success in calling and then overwriting the value:
for k = 1,10,1 do
project.models.["stor"..k].inputs.T_in.default = 25
end
for k = 1,10,1 do
"project.models.stor"..j..".T_in.default" = 25
end
EDIT:
I think I found the solution as per https://www.lua.org/pil/2.5.html:
A common mistake for beginners is to confuse a.x with a[x]. The first form represents a["x"], that is, a table indexed by the string "x". The second form is a table indexed by the value of the variable x. See the difference:
for k = 1,10,1 do
project["models"]["stor"..k]["inputs"]["T_in"]["default"] = 25
end
You were almost close.
Lua supports this representation by providing a.name as syntactic sugar for a["name"].
Read more: https://www.lua.org/pil/2.5.html
You can use only one syntax in time.
Either tbl.key or tbl["key"].
The limitation of . is that you can only use constant strings in it (which are also valid variable names).
In square brackets [] you can evaluate runtime expressions.
Correct way to do it:
project.models["stor"..k].inputs.T_in.default = 25
The . in models.["stor"..k] is unnecessary and causes an error. The correct syntax is just models["stor"..k].

Random number in Lua script Load Impact

I'm trying to create a random number generator in Lua. I found out that I can just use math.random(1,100) to randomize a number between 1 and 100 and that should be sufficient.
But I don't really understand how to use the randomize number as variables in the script.
Tried this but of course it didn't work.
$randomCorr = math.random(1,100);
http.request_batch({
{"POST", "https://store.thestore.com/priceAndOrder/selectProduct", headers={["Content-Type"]="application/json;charset=UTF-8"}, data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\"$randomCorr\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}", auto_decompress=true},
{"GET", "https://store.thestore.com/api/checkout/getproduct?correlationId=$randomCorr", auto_decompress=true},
})
In Lua, you can not start a variable name with $. This is where your main issue is at. Once the $ is removed from your code, we can easily see how to refer to variables in Lua.
randomCorr = math.random(100)
print("The random number:", randomCorr)
randomCorr = math.random(100)
print("New Random Number:", randomCorr)
Also, concatenation does not work the way you are implying it into your Http array. You have to concatenate the value in using .. in Lua
Take a look at the following example:
ran = math.random(100)
data = "{\""..ran.."\"}"
print(data)
--{"14"}
The same logic can be implied into your code:
data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\""..randomCorr.."\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}"
Or you can format the value in using one of the methods provided by the string library
Take a look at the following example:
ran = math.random(100)
data = "{%q}"
print(string.format(data,ran))
--{"59"}
The %q specifier will take whatever you put as input, and safely surround it with quotations
The same logic can be applied to your Http Data.
Here is a corrected version of the code snippet:
local randomCorr = math.random(1,100)
http.request_batch({
{"POST", "https://store.thestore.com/priceAndOrder/selectProduct", headers={["Content-Type"]="application/json;charset=UTF-8"}, data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\"" .. randomCorr .. "\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}", auto_decompress=true},
{"GET", "https://store.thestore.com/api/checkout/getproduct?correlationId=" .. randomCorr, auto_decompress=true},
})
There is something called $$hashKey also, in the quoted string. Not sure if that is supposed to be referencing a variable or not. If it is, it also needs to be concatenated into the resulting string, using the .. operator (just like with the randomCorr variable).

Python Birthday paradox math not working

it run corectly but it should have around 500 matches but it only has around 50 and I dont know why!
This is a probelm for my comsci class that I am having isues with
we had to make a function that checks a list for duplication I got that part but then we had to apply it to the birthday paradox( more info here http://en.wikipedia.org/wiki/Birthday_problem) thats where I am runing into problem because my teacher said that the total number of times should be around 500 or 50% but for me its only going around 50-70 times or 5%
duplicateNumber=0
import random
def has_duplicates(listToCheck):
for i in listToCheck:
x=listToCheck.index(i)
del listToCheck[x]
if i in listToCheck:
return True
else:
return False
listA=[1,2,3,4]
listB=[1,2,3,1]
#print has_duplicates(listA)
#print has_duplicates(listB)
for i in range(0,1000):
birthdayList=[]
for i in range(0,23):
birthday=random.randint(1,365)
birthdayList.append(birthday)
x= has_duplicates(birthdayList)
if x==True:
duplicateNumber+=1
else:
pass
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000)*100),3),"%"
This code gave me a result in line with what you were expecting:
import random
duplicateNumber=0
def has_duplicates(listToCheck):
number_set = set(listToCheck)
if len(number_set) is not len(listToCheck):
return True
else:
return False
for i in range(0,1000):
birthdayList=[]
for j in range(0,23):
birthday=random.randint(1,365)
birthdayList.append(birthday)
x = has_duplicates(birthdayList)
if x==True:
duplicateNumber+=1
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000.0)*100),3),"%"
The first change I made was tidying up the indices you were using in those nested for loops. You'll see I changed the second one to j, as they were previously bot i.
The big one, though, was to the has_duplicates function. The basic principle here is that creating a set out of the incoming list gets the unique values in the list. By comparing the number of items in the number_set to the number in listToCheck we can judge whether there are any duplicates or not.
Here is what you are looking for. As this is not standard practice (to just throw code at a new user), I apologize if this offends any other users. However, I believe showing the OP a correct way to write a program should be could all do us a favor if said user keeps the lack of documentation further on in his career.
Thus, please take a careful look at the code, and fill in the blanks. Look up the python doumentation (as dry as it is), and try to understand the things that you don't get right away. Even if you understand something just by the name, it would still be wise to see what is actually happening when some built-in method is being used.
Last, but not least, take a look at this code, and take a look at your code. Note the differences, and keep trying to write your code from scratch (without looking at mine), and if it messes up, see where you went wrong, and start over. This sort of practice is key if you wish to succeed later on in programming!
def same_birthdays():
import random
'''
This is a program that does ________. It is really important
that we tell readers of this code what it does, so that the
reader doesn't have to piece all of the puzzles together,
while the key is right there, in the mind of the programmer.
'''
count = 0
#Count is going to store the number of times that we have the same birthdays
timesToRun = 1000 #timesToRun should probably be in a parameter
#timesToRun is clearly defined in its name as well. Further elaboration
#on its purpose is not necessary.
for i in range(0,timesToRun):
birthdayList = []
for j in range(0,23):
random_birthday = random.randint(1,365)
birthdayList.append(random_birthday)
birthdayList = sorted(birthdayList) #sorting for easier matching
#If we really want to, we could provide a check in the above nester
#for loop to check right away if there is a duplicate.
#But again, we are here
for j in range(0, len(birthdayList)-1):
if (birthdayList[j] == birthdayList[j+1]):
count+=1
break #leaving this nested for-loop
return count
If you wish to find the percent, then get rid of the above return statement and add:
return (count/timesToRun)
Here's a solution that doesn't use set(). It also takes a different approach with the array so that each index represents a day of the year. I also removed the hasDuplicate() function.
import random
sim_total=0
birthdayList=[]
#initialize an array of 0's representing each calendar day
for i in range(365):
birthdayList.append(0)
for i in range(0,1000):
first_dup=True
for n in range(365):
birthdayList[n]=0
for b in range(0, 23):
r = random.randint(0,364)
birthdayList[r]+=1
if (birthdayList[r] > 1) and (first_dup==True):
sim_total+=1
first_dup=False
avg = float(sim_total) / 1000 * 100
print "after 1000 simulations with 23 students there were", sim_total,"simulations with atleast one duplicate. The approximate problibility is", round(avg,3),"%"

How to I create a loop that uses the last column in an array as the starting point for the next loop? In matlab

I have output = A(:,Nout) Nout = points along the array..... : = all points in column
So, it is saying the values in the last column.
How do I use output as A at the first column for the next iteration?
Your question is not clear. You may mean a variety of things.
If you want to loop through values in the first column in some order specified by your last column you can:
Asort = A ( A (:, end), :);
and then loop through Asort.
You may also mean to loop N times for each row where N is defined by last column of A.
You can do it using a nested loop:
for Arow = A(:, end)
for ii = 1:Arow
% your code here
end
end
You may also mean several other things, but instead me guessing you could try clarifing a bit. :)
(it should be a comment but I can't add comments yet, sorry)

Creating a SPSS loop function

I need to create a syntax loop that runs a series of transformation
This is a simplified example of what I need to do
I would like to create five fruit variables
apple_variable
banana_variable
mango_variable
papaya_variable
orange_variable
in V1
apple=1
banana=2
mango=3
papaya=4
orange=5
First loop
IF (V1={number}) {fruit}_variable = VX.
IF (V2={number}) {fruit}_variable = VY.
IF (V3={number}) {fruit}_variable = VZ.
Run loop for next fruit
So what I would like is the scripte to check if V1, V2 or V3 contains the fruit number. If one of them does (only one can) The new {fruit}_variable should get the value from VX, VY or VZ.
Is this possible? The script need to create over 200 variables so a bit to time consuming to do manually
The first loop can be put within a DO REPEAT command. Essentially you define your two lists of variables and you can loop over the set of if statements.
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = 1 apple_variable = VA.
END REPEAT.
Now 1 and apple_variable are hard coded in the example above, but we can roll this up into a simple macro statement to take arbitrary parameters.
DEFINE !fruit (!POSITIONAL = !TOKENS(1)
/!POSITIONAL = !TOKENS(1)).
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = !1 !2 = VA.
END REPEAT.
!ENDDEFINE.
!fruit 1 apple_variable.
Now this will still be a bit tedious for over 200 variables, but should greatly simplify the task. After I get this far I typically just do text editing to my list to call the macro 200 times, which in this instance all that it would take is inserting !fruit before the number and the resulting variable name. This works well especially if the list is static.
Other approaches using the in-built SPSS facilities (mainly looping within the defined MACRO) IMO tend to be ugly, can greatly complicate the code and frequently not worth the time (although certainly doable). Although that is somewhat mitigated if you are willing to accept a solution utilizing python commands.
DO REPEAT is a good solution here, but I'm wondering what the ultimate goal is. This smells like a problem that might be solved by using the multiple response facilities in Statistics without the need to go through these transformations. Multiple response functionality is available in the old MULTIPLE RESPONSE procedure and in the newer CTABLES and Chart Builder facilities.
HTH,
Jon Peck
combination of loop statements: for,while, do while with nested if..else and switch case will do the trick. just make sure you have your initial value and final value for the loop to go
let's say:
for (initial; final; increment)
{
if (x == value) {
statements;
}else{
...
}

Resources