SPSS: generate 'fake' survey data using rv.uniform without losing value labels - label

I have a pretty straightforward survey dataset. Each row is a respondent, and each column is a question. Responses have a value that is a whole number, and each number has a label.
Now, I need to replace all of those values with fake data to use in a training. I need something that looks and feels like the original dataset, but isn't actually client data.
I started by replacing my variables with random number values:
COMPUTE Q1=RV.UNIFORM(1,2).
EXECUTE.
COMPUTE Q2=RV.UNIFORM(1,36).
EXECUTE.
COMPUTE Q3=RV.NORMAL(50, 13).
EXECUTE.
(rv.normal/rv.uniform depending on what kind of data I'm trying to fake - age versus multiple-choice question, for example).
This works, but then when I try and generate crosstabs, export the dataset w value labels, etc., the labels aren't applied to the columns with fake data. As far as I can tell, my fake numbers are in the exact same format they were in before - numeric, no decimals, width of 2, nominal. The labels still appear in the variable view, but they aren't actually being applied.
I'd really prefer not to have to manually re-label every one of these columns, because there's quite a few of them. Any ideas for how to get around this issue? Or is there a smarter way to generate fake data?

Your problem is the RV.UNIFORM and the RV.NORMAL functions do not generate integers - they generate decimal numbers. You may have your display hide the decimal numbers by having 0 decimals in the variable view, but they are still there (you can check this by adding decimals in the variable view).
So you neen another step of turning your decimals into integers. For example, the following are two ways to get a random 1 or 2 (integers):
COMPUTE Q1=rnd(RV.UNIFORM(1,2)).
or
COMPUTE Q1=trunc(RV.UNIFORM(1,3)).
Once the numbers generated are integers corresponding to the value labels definition, you should be able to see the labels in the output.

Related

The column of the csv file in google automl tables is recognised as text or categorical instead of numeric as i would like

I tried to train a model using google automl tables but i have the following problem
The csv file is correctly imported, it has 2 columns and about 1870 rows, all numeric.
The system recognises only 1 column as numeric but not the other.
The column, where the problem is, has 5 digits in each row separated with space.
Is there anything i should do in order for the system to properly recognise the data as numeric?
Thanks in advance for your help
The issue is with the Data type Numeric definition, the number needs to be comparable (greater than, smaller than, equal).
Two different list of numbers are not comparable, for example 2 4 7 is not comparable to 1 5 7. To solve this, without using strings and therefore losing the "information" of those numbers, you have several options.
For example:
Create an array of numbers, by inserting [ ] in the limits of the second entrance. Take into consideration the Array Data type relative weighted approach in AutoMl tables as it may affect the "information" extracted from the sequence.
Create additional columns for every entry of the second column so each one is a single number and hence truly numeric.
I would personally go for the second option.
If you are afraid of losing "information" by splitting the numbers take into consideration that after training, the model should deduce by itself the importance of the position and other "information" those number sequences might contain (mean, norm/modulus,relative increase,...) provided the training data is representative.

Is there any option to do FOR loop in excel?

I have an excel that I'm calculating my Scrum Task's completed average. I have Story point item also in the excel. My calculation is:
Result= SP * percentage of completion --> This calculation is for each row and after that I sum up all result and taking the summary.
But sometimes I am adding new task and for each task I am adding the calculation to the average result.
Is there any way to use for loop in the excel?
for(int i=0;i<50;i++){ if(SP!=null && task!=null)(B+i)*(L+i)}
My calculation is like below:
AVERAGE((B4*L4+B5*L5+B6*L6+B7*L7+B8*L8+B9*L9+B10*L10)/SUM(B4:B10))
First of all, AVERAGE is not doing anything in your formula, since the argument you pass to it is just one single value. You already do an average calculation by dividing by the sum. That average is in fact a weighted average, and so you could not even achieve that with a plain AVERAGE function.
I see several ways to make this formula more generic, so it keeps working when you add rows:
1. Use SUMPRODUCT
=SUMPRODUCT(B4:B100,L4:L100)/SUM(B4:B100)
The row number 100 is chosen arbitrarily, but should evidently encompass all data rows. If you have no data occurring below your table, then it is safe to add a large margin. You'll want to avoid the situation where you think you add a line to the table, but actually get outside of the range of the formula. Using proper Excel tables can help to avoid this situation.
2. Use an array formula
This would be a second resort for when the formula becomes more complicated and cannot be executed with a "simple" SUMPRODUCT. But the above would translate to this array formula:
=SUM(B4:B100*L4:L100)/SUM(B4:B100)
Once you have typed this in the formula bar, make sure to press Ctrl+Shift+Enter to enter it. Only then will it act as an array formula.
Again, the same remark about row number 100.
3. Use an extra column
Things get easy when you use an extra column for storing the product of B & L values for each row. So you would put in cell N4 the following formula:
=B4*L4
...and then copy that relative formula to the other rows. You can hide that column if you want.
Then the overal formula can be:
=SUM(N4:N100)/SUM(B4:B100)
With this solution you must take care to always copy a row when inserting a new row, as you need the N column to have the intermediate product formula also for any new row.

Random number that prefers a range

I'm trying to write a program that requires me to generator a random number.
I also need to make it so there's a variable chance to pick a set range.
In this case, I would be generating between 1-10 and the range with the percent chance is 7-10.
How would I do this? Could I be supplied with a formula or something like that?
So if I'm understanding your question, you want two number ranges, and a variable-defined probability that the one range will be selected. This can be described mathematically as a probability density function (PDF), which in this case would also take your "chance" variable as an argument. If, for example, your 7-10 range is more likely than the rest of the 1-10 range, your PDF might look something like:
One PDF such as a flat distribution can be transformed into another via a transformation function, which would allow you to generate a uniformly random number and transform it to your own density function. See here for the rigorous mathematics:
http://www.stat.cmu.edu/~shyun/probclass16/transformations.pdf
But since you're writing a program and not a mathematics thesis, I suggest that the easiest way is to just generate two random numbers. The first decides which range to use, and the second generates the number within your chosen range. Keep in mind that if your ranges overlap (1-10 and 7-10 obviously do) then the overlapping region will be even more likely, so you will probably want to make your ranges exclusive so you can more easily control the probabilities. You haven't said what language you're using, but here's a simple example in Python:
import random
range_chance = 30 #30 percent change of the 7-10 range
if random.uniform(0,100) < range_chance:
print(random.uniform(7,10))
else:
print(random.uniform(1,7)) #1-7 so that there is no overlapping region

Telerik report how to get non rounded queried sums

I'm currently using telerik reports to create bills.
For this I take the customer from the database and sum up the cost of all articles he has confirmed.
Thus the textbox field for the cost has the following code inside:
=Sum(IIf(Fields.IsConfirmed>0,Fields.Cost,0))
This shall make sure that I only sum up costs for the customer where he has confirmed that
he wants it on the bill.
When I use the sum without the IIf it functions as expected, displaying all the costs summed up
(in this case too many costs as also unconfirmed are included). But WITH the IIf included
the costs are off:
Not a single decimal digit is displayed
The sum values themselves are slightly off
In total it looks to me as if the IIf leads to the Fields.Cost values being rounded and THEN summed up which is completely unexpected and unwished behaviour.
An alternative would be that I use a view that does these calculations directly in the database instead of doing it in the report, but I would like to have the whole logic in the report if possible.
So the question is: Is there any way to sum these filtered lines up WITHOUT them getting rounded in the process?
On a special note: I can't reduce the number of returned lines through a where statement as I also need the number of total items the customer has including the nonconfirmed one for another textbox on the same report.. Also possibly relevant, the data is stored in the database as decimal(15,2) and I use the entity framework to get the data out of the database (although like I indicated before if I don't use IIf then the rounding problem does not appear and I have decimal digits).
I've found a solution there. The problem is a typical one from other programming languages and still as easy the overlook in each one.
In effect what I'm doing there is adding up a number of floats, but if the field isconfirmed is <= 0 I'm adding an INTEGER value (0). As is in many other such situations (in different programming languages) a conversion happens then. Thus the Integer value in the SUM field leads to the whole sum being seen as INT. Although what is still a bit of a surprise there is that it seems like that even the partial sums get then converted into INT values (at least that is the impression gained from tests).
The solution is now quite easy there and completely fixes this problem:
=Sum(IIf(Fields.IsConfirmed>0,Fields.Cost,0.00))
The 0.00 leads to the zero value being interpreted as a number with decimals and thus no int conversion happens.

Tableau - Calculated fields / grouping / Custom Dim

Tableau:
This may seem simple, but I ran out of the usual tricks I've used in other systems.
I want a variance column. Essentially adding a member 'Variance' to the Act/Plan dimension which only contains the members 'Actual' and 'Plan'
I've come in where the data structure and reporting is set up like so:
Actual | Plan
Profit measure
measure 2
measure 3
etc
The goal is to have a Variance column (calculated and not part of the Actual/Plan dimension)
Actual | Plan | Variance
Profit measure
measure 2
measure 3
etc
There are solutions where it works for one measure only, and I've looked into that.
ie, create calculated field as such
Profit_Actual | Profit_Plan | Variance
You put this on the columns, and you get a grid that I want... except a grid with only 1 measure.
This does not work if I want to run several measures on rows. Essentially the solution above will only display the Profit measure, not Measure 1_Actual , Measure 2_Plan etc.
So I tried a trick where I grouped a the 3 calculated measures, ie Profit_Actual | Profit_Plan | Profit_Variance as 'Profit_Measure'
Created a parameter list - 'Actual', 'Plan', 'Variance'
Now I can half achieve my goal, by having the parameter on columns and the 'Profit Measure' on Rows (so I can have Measure 123_group etc down on rows too). Trouble is, I found that parameters are single select only. Only if it can display all options in the custom paramater at once, I would've solved my problem.
Any ideas on how I can achieve the Variance column I want?
Virtually adding a member to a dimension/Calculated fieds/tricks/workaround
Thank you
Any leads is appreciated
Gemmo
Okay. First thing, I had a really hard time trying to understand how your data is organized, try to be more clear (say how each entry in your database looks like, and not how a specific view in Tableau looks like).
But I think I got it. I guess you have a collection of entries, and each entry has a number of measure fields (profits and etc.) and an Act/Plan field, to identify whether that entry is an actual value or a planned value. Is that correct?
Well, if that's the case, I'm sorry to say you have to calculate a variance field for each dimension. Think about it, how your original dataset is structured. Do you think you can add a single field "Variance" to represent the variance of each measure? Well, you can, store the values in a string, and then collect it back using some string functions, but it's not very practical. The problem is that each entry have many measures, if it had only 1 measure, than 1 single variance field would suffice.
So, if you can re-organize your data, what would be an easier to work set (but with many more entries) is something with the fields: Measure, Value, Actual/Plan. The measure field would have a string to identify what you're measuring in that entry. Value would be a number to represent the actual measure. And the Actual/Plan is the same. For instance:
Measure Value Actual/Plan
Profit 100 Actual
So, each line in your current model would become n entries, where n is the number of measures you have right now. So a larger dataset in a way, but easier to work with. Think about, now you can have a calculated field, and use some table calculations to calculate the variance only for that measure and/or Actual/Plan. Just use WINDOW_VAR, and put Measure and/or Actual/Plan in the partition.
Table calculations are awesome, take a look at this to understand it better. http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#calculations_tablecalculations_understanding_addressing.html
I generally like to have my data staged such that Actual is its own column and Plan is its own column in the data being fed to Tableau. It makes calculations so much easier.
If your data is such that there is a column called "Actual/Plan" and every row is populated with either "Actual" or "Plan" and there is another column called "Value" or "Measure" that is populated with the values, you can force Tableau to make them columns assuming you can't or won't rearrange your data.
Create a calculated field called "Actual" with the following calc:
IF [Actual/Plan] = 'Actual' THEN [Value] END
Similarly, create a calculated field called "Plan" with the following calc:
IF [Actual/Plan] = 'Plan' THEN [Value] END
Now, you can finally create your "Variance" and "Variance %" calculations (respectively):
SUM([Actual]) - SUM([Plan])
[Variance] / SUM([Plan])

Resources