Creating a SPSS loop function - syntax

I need to create a syntax loop that runs a series of transformation
This is a simplified example of what I need to do
I would like to create five fruit variables
apple_variable
banana_variable
mango_variable
papaya_variable
orange_variable
in V1
apple=1
banana=2
mango=3
papaya=4
orange=5
First loop
IF (V1={number}) {fruit}_variable = VX.
IF (V2={number}) {fruit}_variable = VY.
IF (V3={number}) {fruit}_variable = VZ.
Run loop for next fruit
So what I would like is the scripte to check if V1, V2 or V3 contains the fruit number. If one of them does (only one can) The new {fruit}_variable should get the value from VX, VY or VZ.
Is this possible? The script need to create over 200 variables so a bit to time consuming to do manually

The first loop can be put within a DO REPEAT command. Essentially you define your two lists of variables and you can loop over the set of if statements.
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = 1 apple_variable = VA.
END REPEAT.
Now 1 and apple_variable are hard coded in the example above, but we can roll this up into a simple macro statement to take arbitrary parameters.
DEFINE !fruit (!POSITIONAL = !TOKENS(1)
/!POSITIONAL = !TOKENS(1)).
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = !1 !2 = VA.
END REPEAT.
!ENDDEFINE.
!fruit 1 apple_variable.
Now this will still be a bit tedious for over 200 variables, but should greatly simplify the task. After I get this far I typically just do text editing to my list to call the macro 200 times, which in this instance all that it would take is inserting !fruit before the number and the resulting variable name. This works well especially if the list is static.
Other approaches using the in-built SPSS facilities (mainly looping within the defined MACRO) IMO tend to be ugly, can greatly complicate the code and frequently not worth the time (although certainly doable). Although that is somewhat mitigated if you are willing to accept a solution utilizing python commands.

DO REPEAT is a good solution here, but I'm wondering what the ultimate goal is. This smells like a problem that might be solved by using the multiple response facilities in Statistics without the need to go through these transformations. Multiple response functionality is available in the old MULTIPLE RESPONSE procedure and in the newer CTABLES and Chart Builder facilities.
HTH,
Jon Peck

combination of loop statements: for,while, do while with nested if..else and switch case will do the trick. just make sure you have your initial value and final value for the loop to go
let's say:
for (initial; final; increment)
{
if (x == value) {
statements;
}else{
...
}

Related

Is there a way to use range with Z3ints in z3py?

I'm relatively new to Z3 and experimenting with it in python. I've coded a program which returns the order in which different actions is performed, represented with a number. Z3 returns an integer representing the second the action starts.
Now I want to look at the model and see if there is an instance of time where nothing happens. To do this I made a list with only 0's and I want to change the index at the times where each action is being executed, to 1. For instance, if an action start at the 5th second and takes 8 seconds to be executed, the index 5 to 12 would be set to 1. Doing this with all the actions and then look for 0's in the list would hopefully give me the instances where nothing happens.
The problem is: I would like to write something like this for coding the problem
list_for_check = [0]*total_time
m = s.model()
for action in actions:
for index in range(m.evaluate(action.number) , m.evaluate(action.number) + action.time_it_takes):
list_for_check[index] = 1
But I get the error:
'IntNumRef' object cannot be interpreted as an integer
I've understood that Z3 isn't returning normal ints or bools in their models, but writing
if m.evaluate(action.boolean):
works, so I'm assuming the if is overwritten in a way, but this doesn't seem to be the case with range. So my question is: Is there a way to use range with Z3 ints? Or is there another way to do this?
The problem might also be that action.time_it_takes is an integer and adding a Z3int with a "normal" int doesn't work. (Done in the second part of the range).
I've also tried using int(m.evaluate(action.number)), but it doesn't work.
Thanks in advance :)
When you call evaluate it returns an IntNumRef, which is an internal z3 representation of an integer number inside z3. You need to call as_long() method of it to convert it to a Python number. Here's an example:
from z3 import *
s = Solver()
a = Int('a')
s.add(a > 4);
s.add(a < 7);
if s.check() == sat:
m = s.model()
print("a is %s" % m.evaluate(a))
print("Iterating from a to a+5:")
av = m.evaluate(a).as_long()
for index in range(av, av + 5):
print(index)
When I run this, I get:
a is 5
Iterating from a to a+5:
5
6
7
8
9
which is exactly what you're trying to achieve.
The method as_long() is defined here. Note that there are similar conversion functions from bit-vectors and rationals as well. You can search the z3py api using the interface at: https://z3prover.github.io/api/html/namespacez3py.html

update aTable set a,b,c = func(x,y,z,…)

I need a quick advice how-to. I mention that the following scenario is based on the use of c_api available already to my monetdblite compilation on 64bit, intention is to use it with some adhoc C written functions.
Short: how can I achieve or simulate the following scenario:
update aTable set a,b,c = func(x,y,z,…)
Long. Many algorithms are returning more than one variable as, for instance, multiple regression.
bool m_regression(IN const double **data, IN const int cols, IN const int rows, OUT double *fit_values, OUT double *residuals, OUT double *std_residuals, OUT double &p_value);
In order to minimize the transfer of data between monetdb and heavy computational function, all those results are generated in one step. Question is how can I transfer them back at once, minimizing computational time and memory traffic between monetdb and external C/C++(/R/Python) function?
My first thought to solve this is something like this:
1. update aTable set dummy = func_compute(x,y,z,…)
where dummy is a temporary __int64 field and func_compute will compute all the necessary outputs and store the result into a dummy pointer. To make sure is no issue with constant estimation, first returned value in the array will be the real dummy pointer, the rest just an incremented value of dummy + i;
2. update aTable set a = func_ret(dummy, 1), b= func_ret (dummy, 2), c= func_ret (dummy, 3) [, dummy=func_free(dummy)];
Assuming the func_ret will get the dummy in the same order that it was returned on first call, I would just copy the prepared result into provided storage; In case the order is not preserved, I will need an extra step to get the minimum (real dummy pointer), then to use the offset of current value to lookup in my array.
__int64 real_dummy = __inputs[0][0];
double *my_pointer_data = (double *) (real_dummy + __inputs[1][0] * sizeof(double)* row_count);
memcpy(__outputs[0], my_pointer_data, sizeof(double)* row_count);
// or ============================
__int64 real_dummy = minimum(__inputs[0]);
double *my_pointer_data = (double *) (real_dummy + __inputs[0][1] * sizeof(double)* row_count);
for (int i=0;i<row_count;i++)
__outputs[0][i] = my_pointer_data[__inputs[0][i] - real_dummy];
It is less relevant how am I going to free the temporary memory, can be in the last statement in update or in a new fake update statement using func_free.
Problem is that it doesn’t look to me that, even if I save some computational (big) time, the passing of the dummy is still done 3 times (any chance that memory is actually not copied?).
Is it any other better way of achieving this?
I am not aware of a good way of doing this, sorry. You could retrieve the table, add your columns as BATs in whichever way you like and write it back.

Employ early bail-out in MATLAB

There is a example for Employ early bail-out in this book (http://www.amazon.com/Accelerating-MATLAB-Performance-speed-programs/dp/1482211297) (#YairAltman). for speed improvement we can convert this code:
data = [];
newData = [];
outerIdx = 1;
while outerIdx <= 20
outerIdx = outerIdx + 1;
for innerIdx = -100 : 100
if innerIdx == 0
continue % skips to next innerIdx (=1)
elseif outerIdx > 15
break % skips to next outerIdx
else
data(end+1) = outerIdx/innerIdx;
newData(end+1) = process(data);
end
end % for innerIdx
end % while outerIdx
to this code:
function bailableProcessing()
for outerIdx = 1 : 5
middleIdx = 10
while middleIdx <= 20
middleIdx = middleIdx + 1;
for innerIdx = -100 : 100
data = outerIdx/innerIdx + middleIdx;
if data == SOME_VALUE
return
else
process(data);
end
end % for innerIdx
end % while middleIdx
end % for outerIdx
end % bailableProcessing()
How we did this conversion? Why we have different middleIdx range in new code? Where is checking for innerIdx and outerIdx in new code? what is this new data = outerIdx/innerIdx + middleIdx calculation?
we have only this information for second code :
We could place the code segment that should be bailed-out within a
dedicated function and return from the function when the bail-out
condition occurs.
I am sorry that I did not clarify within the text that the second code segment is not a direct replacement of the first. If you reread the early bail-out section (3.1.3) perhaps you can see that it has two main parts:
The first part of the section (which includes the top code segment) illustrates the basic mechanism of using break/continue in order to bail-out from a complex processing loop, in order to save processing time in computing values that are not needed.
In contrast, the second part of the section deals with cases when we wish to break out of an ancestor loop that is not the direct parent loop. I mention in the text that there are three alternatives that we can use in this case, and the second code segment that you mentioned is one of them (the other alternatives are to use dedicated flags with break/continue and to use try/catch blocks). The three code segments that I provided in this second part of the section should all be equivalent to each other, but they are NOT equivalent to the code-segment at the top of the section.
Perhaps I should have clarified this in the text, or maybe I should have used the same example throughout. I will think about this for the second edition of the book (if and when it ever appears).
I have used a variant of these code segments in other sections of the book to illustrate various other aspects of performance speedups (for example, 3.1.4 & 3.1.6) - in all these cases the code segments are NOT equivalent to each other. They are merely used to illustrate the corresponding text.
I hope you like my book in general and think that it is useful. I would be grateful if you would place a positive feedback about it on Amazon (direct link).
p.s. - #SamRoberts was correct to surmise that mention of my name would act as a "bat-signal", attracting my attention :-)
it's all far more simple than you think!
How we did this conversion?
Irrationally. Those two codes are completely different.
Why we have different middleIdx range in new code?
Randomness. The point of the author is something different.
Where is checking for innerIdx and outerIdx in new code?
dont need that, as it's not intended to be the same code.
what is this new data = outerIdx/innerIdx + middleIdx calculation?
a random calculation as well as data(end+1) = outerIdx/innerIdx; in the original code.
i suppose the author wants to illustrate something far more profoundly: that if you wrap your code that does (possibly many) loops (fors/whiles, doesnt matter) inside a function and you issue a return statement if you somehow detect that you're done, it will result in an effectively "bailable" computation, e.g. the method that does the work returns earlier than it would normally do. that is illustrated here by the condition that checks on data == SOME_VALUE; you can have your favourite bailout condition there instead :-)
moreover, the keywords [continue/break] inside the first example are meant to illustrate that you can [skip the rest of/leave] the inner-most loop from whereever you call them. in principal, you can implement a bailout using these by e.g.
bailing = false;
for outer = 1:1000
for inner = 1:1000
if <somebailingcondition>
bailing = true;
break;
else
<do stuff>
end
end
if bailing
break;
end
end
but that would be very clumsy as that "cascade" of breaks will need to be as long as you have nested loops and messes up the code.
i hope that could clarify your issues.

Python Birthday paradox math not working

it run corectly but it should have around 500 matches but it only has around 50 and I dont know why!
This is a probelm for my comsci class that I am having isues with
we had to make a function that checks a list for duplication I got that part but then we had to apply it to the birthday paradox( more info here http://en.wikipedia.org/wiki/Birthday_problem) thats where I am runing into problem because my teacher said that the total number of times should be around 500 or 50% but for me its only going around 50-70 times or 5%
duplicateNumber=0
import random
def has_duplicates(listToCheck):
for i in listToCheck:
x=listToCheck.index(i)
del listToCheck[x]
if i in listToCheck:
return True
else:
return False
listA=[1,2,3,4]
listB=[1,2,3,1]
#print has_duplicates(listA)
#print has_duplicates(listB)
for i in range(0,1000):
birthdayList=[]
for i in range(0,23):
birthday=random.randint(1,365)
birthdayList.append(birthday)
x= has_duplicates(birthdayList)
if x==True:
duplicateNumber+=1
else:
pass
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000)*100),3),"%"
This code gave me a result in line with what you were expecting:
import random
duplicateNumber=0
def has_duplicates(listToCheck):
number_set = set(listToCheck)
if len(number_set) is not len(listToCheck):
return True
else:
return False
for i in range(0,1000):
birthdayList=[]
for j in range(0,23):
birthday=random.randint(1,365)
birthdayList.append(birthday)
x = has_duplicates(birthdayList)
if x==True:
duplicateNumber+=1
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000.0)*100),3),"%"
The first change I made was tidying up the indices you were using in those nested for loops. You'll see I changed the second one to j, as they were previously bot i.
The big one, though, was to the has_duplicates function. The basic principle here is that creating a set out of the incoming list gets the unique values in the list. By comparing the number of items in the number_set to the number in listToCheck we can judge whether there are any duplicates or not.
Here is what you are looking for. As this is not standard practice (to just throw code at a new user), I apologize if this offends any other users. However, I believe showing the OP a correct way to write a program should be could all do us a favor if said user keeps the lack of documentation further on in his career.
Thus, please take a careful look at the code, and fill in the blanks. Look up the python doumentation (as dry as it is), and try to understand the things that you don't get right away. Even if you understand something just by the name, it would still be wise to see what is actually happening when some built-in method is being used.
Last, but not least, take a look at this code, and take a look at your code. Note the differences, and keep trying to write your code from scratch (without looking at mine), and if it messes up, see where you went wrong, and start over. This sort of practice is key if you wish to succeed later on in programming!
def same_birthdays():
import random
'''
This is a program that does ________. It is really important
that we tell readers of this code what it does, so that the
reader doesn't have to piece all of the puzzles together,
while the key is right there, in the mind of the programmer.
'''
count = 0
#Count is going to store the number of times that we have the same birthdays
timesToRun = 1000 #timesToRun should probably be in a parameter
#timesToRun is clearly defined in its name as well. Further elaboration
#on its purpose is not necessary.
for i in range(0,timesToRun):
birthdayList = []
for j in range(0,23):
random_birthday = random.randint(1,365)
birthdayList.append(random_birthday)
birthdayList = sorted(birthdayList) #sorting for easier matching
#If we really want to, we could provide a check in the above nester
#for loop to check right away if there is a duplicate.
#But again, we are here
for j in range(0, len(birthdayList)-1):
if (birthdayList[j] == birthdayList[j+1]):
count+=1
break #leaving this nested for-loop
return count
If you wish to find the percent, then get rid of the above return statement and add:
return (count/timesToRun)
Here's a solution that doesn't use set(). It also takes a different approach with the array so that each index represents a day of the year. I also removed the hasDuplicate() function.
import random
sim_total=0
birthdayList=[]
#initialize an array of 0's representing each calendar day
for i in range(365):
birthdayList.append(0)
for i in range(0,1000):
first_dup=True
for n in range(365):
birthdayList[n]=0
for b in range(0, 23):
r = random.randint(0,364)
birthdayList[r]+=1
if (birthdayList[r] > 1) and (first_dup==True):
sim_total+=1
first_dup=False
avg = float(sim_total) / 1000 * 100
print "after 1000 simulations with 23 students there were", sim_total,"simulations with atleast one duplicate. The approximate problibility is", round(avg,3),"%"

When are numbers NOT Magic?

I have a function like this:
float_as_thousands_str_with_precision(value, precision)
If I use it like this:
float_as_thousands_str_with_precision(volts, 1)
float_as_thousands_str_with_precision(amps, 2)
float_as_thousands_str_with_precision(watts, 2)
Are those 1/2s magic numbers?
Yes, they are magic numbers. It's obvious that the numbers 1 and 2 specify precision in the code sample but not why. Why do you need amps and watts to be more precise than volts at that point?
Also, avoiding magic numbers allows you to centralize code changes rather than having to scour the code when for the literal number 2 when your precision needs to change.
I would propose something like:
HIGH_PRECISION = 3;
MED_PRECISION = 2;
LOW_PRECISION = 1;
And your client code would look like:
float_as_thousands_str_with_precision(volts, LOW_PRECISION )
float_as_thousands_str_with_precision(amps, MED_PRECISION )
float_as_thousands_str_with_precision(watts, MED_PRECISION )
Then, if in the future you do something like this:
HIGH_PRECISION = 6;
MED_PRECISION = 4;
LOW_PRECISION = 2;
All you do is change the constants...
But to try and answer the question in the OP title:
IMO the only numbers that can truly be used and not be considered "magic" are -1, 0 and 1 when used in iteration, testing lengths and sizes and many mathematical operations. Some examples where using constants would actually obfuscate code:
for (int i=0; i<someCollection.Length; i++) {...}
if (someCollection.Length == 0) {...}
if (someCollection.Length < 1) {...}
int MyRidiculousSignReversalFunction(int i) {return i * -1;}
Those are all pretty obvious examples. E.g. start and the first element and increment by one, testing to see whether a collection is empty and sign reversal... ridiculous but works as an example. Now replace all of the -1, 0 and 1 values with 2:
for (int i=2; i<50; i+=2) {...}
if (someCollection.Length == 2) {...}
if (someCollection.Length < 2) {...}
int MyRidiculousDoublinglFunction(int i) {return i * 2;}
Now you have start asking yourself: Why am I starting iteration on the 3rd element and checking every other? And what's so special about the number 50? What's so special about a collection with two elements? the doubler example actually makes sense here but you can see that the non -1, 0, 1 values of 2 and 50 immediately become magic because there's obviously something special in what they're doing and we have no idea why.
No, they aren't.
A magic number in that context would be a number that has an unexplained meaning. In your case, it specifies the precision, which clearly visible.
A magic number would be something like:
int calculateFoo(int input)
{
return 0x3557 * input;
}
You should be aware that the phrase "magic number" has multiple meanings. In this case, it specifies a number in source code, that is unexplainable by the surroundings. There are other cases where the phrase is used, for example in a file header, identifying it as a file of a certain type.
A literal numeral IS NOT a magic number when:
it is used one time, in one place, with very clear purpose based on its context
it is used with such common frequency and within such a limited context as to be widely accepted as not magic (e.g. the +1 or -1 in loops that people so frequently accept as being not magic).
some people accept the +1 of a zero offset as not magic. I do not. When I see variable + 1 I still want to know why, and ZERO_OFFSET cannot be mistaken.
As for the example scenario of:
float_as_thousands_str_with_precision(volts, 1)
And the proposed
float_as_thousands_str_with_precision(volts, HIGH_PRECISION)
The 1 is magic if that function for volts with 1 is going to be used repeatedly for the same purpose. Then sure, it's "magic" but not because the meaning is unclear, but because you simply have multiple occurences.
Paul's answer focused on the "unexplained meaning" part thinking HIGH_PRECISION = 3 explained the purpose. IMO, HIGH_PRECISION offers no more explanation or value than something like PRECISION_THREE or THREE or 3. Of course 3 is higher than 1, but it still doesn't explain WHY higher precision was needed, or why there's a difference in precision. The numerals offer every bit as much intent and clarity as the proposed labels.
Why is there a need for varying precision in the first place? As an engineering guy, I can assume there's three possible reasons: (a) a true engineering justification that the measurement itself is only valid to X precision, so therefore the display shoulld reflect that, or (b) there's only enough display space for X precision, or (c) the viewer won't care about anything higher that X precision even if its available.
Those are complex reasons difficult to capture in a constant label, and are probbaly better served by a comment (to explain why something is beng done).
IF the use of those functions were in one place, and one place only, I would not consider the numerals magic. The intent is clear.
For reference:
A literal numeral IS magic when
"Unique values with unexplained meaning or multiple occurrences which
could (preferably) be replaced with named constants." http://en.wikipedia.org/wiki/Magic_number_%28programming%29 (3rd bullet)

Resources