I have two columns of data in Excel. Using Power Query, I am trying to divide one column by the other and call the result column X. The problem is that both columns contain zeros, so the division produces "#NUM!" in column X. How can I write an if statement in Power Query so that if the value of column X (the division) is NaN (#NUM!), it is set to zero?
The expression below doesn't change the NaNs to zeros:
if[Column1]/[Column2]="NaN" then 0 else[Column1]/[Column2]
This should be a FAQ, and the approach is similar in almost every language: test the denominator before dividing. I'd write your statement like this: if [Column2] = 0 then 0 else [Column1]/[Column2]. That returns 0 whenever the denominator is zero and the normal quotient otherwise.
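Another thought, which I just used: Power Pivot (DAX) has a DIVIDE function that is divide-by-zero-safe: DIVIDE([Column1], [Column2]). As far as I can tell this is a DAX function, so it belongs in a Power Pivot measure or calculated column rather than in a Power Query step. It is shorter to write and may perform better, since the denominator is only evaluated once, which matters especially with more complex denominators.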
Final thought: because ratios aren't additive, I tend not to store them in the Power Query results, choosing instead to calculate them dynamically in Power Pivot or elsewhere in the reporting. In Excel you can use =IFERROR(a/b, 0).
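To make the "check the denominator first" idea concrete outside Power Query, here is a minimal Python sketch. The function name safe_divide and the sample rows are made up for illustration; it is not a translation of any Power Query or DAX function.

```python
import math

def safe_divide(numerator, denominator, fallback=0.0):
    """Return numerator / denominator, or fallback when the result is undefined."""
    # Check the denominator first so the division is never attempted on zero.
    if denominator is None or denominator == 0:
        return fallback
    result = numerator / denominator
    # Guard against NaN or infinity arriving through the inputs themselves.
    if math.isnan(result) or math.isinf(result):
        return fallback
    return result

# Hypothetical (Column1, Column2) pairs, including zero denominators.
rows = [(10, 2), (5, 0), (0, 0), (3, 4)]
ratios = [safe_divide(a, b) for a, b in rows]
print(ratios)  # [5.0, 0.0, 0.0, 0.75]
```

Comparing the quotient to the text "NaN", as in the question, cannot work because the result is a number, not a string; testing the denominator avoids the problem altogether.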
JR
I have a pretty straightforward survey dataset. Each row is a respondent, and each column is a question. Responses have a value that is a whole number, and each number has a label.
Now, I need to replace all of those values with fake data to use in a training. I need something that looks and feels like the original dataset, but isn't actually client data.
I started by replacing my variables with random number values:
COMPUTE Q1=RV.UNIFORM(1,2).
EXECUTE.
COMPUTE Q2=RV.UNIFORM(1,36).
EXECUTE.
COMPUTE Q3=RV.NORMAL(50, 13).
EXECUTE.
(rv.normal/rv.uniform depending on what kind of data I'm trying to fake - age versus multiple-choice question, for example).
This works, but then when I try to generate crosstabs, export the dataset with value labels, etc., the labels aren't applied to the columns with fake data. As far as I can tell, my fake numbers are in the exact same format they were in before: numeric, no decimals, width of 2, nominal. The labels still appear in the variable view, but they aren't actually being applied.
I'd really prefer not to have to manually re-label every one of these columns, because there's quite a few of them. Any ideas for how to get around this issue? Or is there a smarter way to generate fake data?
Your problem is the RV.UNIFORM and the RV.NORMAL functions do not generate integers - they generate decimal numbers. You may have your display hide the decimal numbers by having 0 decimals in the variable view, but they are still there (you can check this by adding decimals in the variable view).
So you need another step that turns your decimals into integers. For example, the following are two ways to get a random 1 or 2 (integers):
COMPUTE Q1=rnd(RV.UNIFORM(1,2)).
or
COMPUTE Q1=trunc(RV.UNIFORM(1,3)).
Once the numbers generated are integers corresponding to the value labels definition, you should be able to see the labels in the output.
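The two approaches behave differently once there are more than two categories, which matters for a question like Q2 with 36 values. The Python sketch below imitates both under stated assumptions: the helper names are hypothetical, and Python's round() stands in for SPSS's rnd (they differ only in how exact .5 ties are handled, which is irrelevant for continuous draws).

```python
import random
from collections import Counter

def fake_by_rounding(low, high, rng):
    # Mirrors rnd(RV.UNIFORM(low, high)): the two end values each get only half an interval.
    return round(rng.uniform(low, high))

def fake_by_truncating(low, high, rng):
    # Mirrors trunc(RV.UNIFORM(low, high + 1)): every integer gets a full unit interval.
    # min() guards against the rare floating-point case where uniform() returns high + 1.
    return min(int(rng.uniform(low, high + 1)), high)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 72_000
rounded = Counter(fake_by_rounding(1, 36, rng) for _ in range(n))
truncated = Counter(fake_by_truncating(1, 36, rng) for _ in range(n))

print("share of 1s when rounding:  ", rounded[1] / n)
print("share of 1s when truncating:", truncated[1] / n)
```

With rounding, the lowest and highest codes come up about half as often as the others, so the truncating form is the safer choice when every answer option should be equally likely.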
I have an Excel sheet in which I calculate the completion average of my Scrum tasks. Each task also has a story point (SP) value. My calculation is:
Result = SP * percentage of completion --> This is calculated for each row; after that I sum up all the results and take the total.
But sometimes I add a new task, and for each new task I have to add its term to the average formula by hand.
Is there any way to use a for loop in Excel?
for(int i=0;i<50;i++){ if(SP!=null && task!=null)(B+i)*(L+i)}
My calculation is like below:
AVERAGE((B4*L4+B5*L5+B6*L6+B7*L7+B8*L8+B9*L9+B10*L10)/SUM(B4:B10))
First of all, AVERAGE is not doing anything in your formula, since the argument you pass to it is just one single value. You already do an average calculation by dividing by the sum. That average is in fact a weighted average, and so you could not even achieve that with a plain AVERAGE function.
I see several ways to make this formula more generic, so it keeps working when you add rows:
1. Use SUMPRODUCT
=SUMPRODUCT(B4:B100,L4:L100)/SUM(B4:B100)
The row number 100 is chosen arbitrarily, but should evidently encompass all data rows. If you have no data occurring below your table, then it is safe to add a large margin. You'll want to avoid the situation where you think you add a line to the table, but actually get outside of the range of the formula. Using proper Excel tables can help to avoid this situation.
2. Use an array formula
This would be a second resort for when the formula becomes more complicated and cannot be executed with a "simple" SUMPRODUCT. But the above would translate to this array formula:
=SUM(B4:B100*L4:L100)/SUM(B4:B100)
Once you have typed this in the formula bar, make sure to press Ctrl+Shift+Enter to enter it. Only then will it act as an array formula.
Again, the same remark about row number 100.
3. Use an extra column
Things get easy when you use an extra column for storing the product of B & L values for each row. So you would put in cell N4 the following formula:
=B4*L4
...and then copy that relative formula to the other rows. You can hide that column if you want.
Then the overall formula can be:
=SUM(N4:N100)/SUM(B4:B100)
With this solution you must take care to always copy a row when inserting a new row, as you need the N column to have the intermediate product formula also for any new row.
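For readers who think in loops, as the question does, all three variants compute the same weighted average. The following Python sketch shows that calculation; the function name and sample values are hypothetical, and empty rows are skipped in the spirit of the null check in the question's pseudocode.

```python
def weighted_average(points, completion):
    """Sum of points * completion divided by the sum of points, skipping empty rows."""
    pairs = [(p, c) for p, c in zip(points, completion) if p is not None and c is not None]
    total_points = sum(p for p, _ in pairs)
    if total_points == 0:
        return 0.0  # no story points entered yet, so avoid dividing by zero
    return sum(p * c for p, c in pairs) / total_points

# Hypothetical column B (story points) and column L (fraction completed).
story_points = [3, 5, 8, None, 2]
percent_done = [1.0, 0.5, 0.25, None, 0.0]
result = weighted_average(story_points, percent_done)
print(result)  # 7.5 / 18, about 0.4167
```

The numerator corresponds to SUMPRODUCT(B4:B100,L4:L100) and the denominator to SUM(B4:B100).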
I have a data frame with 9 columns and many rows. I want to filter all the rows that have observations greater than 3.0 in at least 3 columns. Which conditional statements should I use to subset my data frame?
Since I am a n00b, I only came up with this:
data_frame[data_frame > 3,]
Obviously, this gives me all the rows for which all values are > 3, regardless of what I actually need.
Thanks!
I figured out that you can also combine logical operators with rowSums:
data[rowSums(data > 3) >= 3, ]
Here data > 3 yields a logical matrix, rowSums counts the TRUE values in each row, and the subset keeps the rows in which three or more observations exceed 3. No columns need to be specified.
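The same counting logic can be written out in plain Python, which may make it clearer what rowSums is doing on the logical matrix. This is only a sketch: the function name and sample rows are invented, and the threshold of 3.0 and the minimum of 3 columns are taken from the question.

```python
def rows_with_enough_hits(rows, threshold, min_hits):
    """Keep rows where at least min_hits values are strictly greater than threshold."""
    # sum() over booleans counts the True values, like rowSums() on a logical matrix.
    return [row for row in rows if sum(value > threshold for value in row) >= min_hits]

data = [
    [3.5, 4.0, 1.0, 3.2, 0.5, 2.0, 1.1, 0.0, 2.9],  # three values above 3.0 -> kept
    [3.5, 1.0, 1.0, 3.2, 0.5, 2.0, 1.1, 0.0, 2.9],  # two values above 3.0 -> dropped
    [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0],  # nine values above 3.0 -> kept
]
kept = rows_with_enough_hits(data, threshold=3.0, min_hits=3)
print(len(kept))  # 2
```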
The logical operator, in this case, is the brain. I used sum(rowSum(data)) > x, where x is the limit value times the number of columns available.
I have a boolean/logic variable (values 0, 1) and I need to know the dataset size in order to do some math (like calculating percentages).
For example, if my dataset has 250 rows, I want to do something similar to this:
Count([variable]) / 250
The point is that I don't know the dataset's length (a different dataset will be used each time). That's why I need a function similar to R's length(data$variable), which gives me the number of rows in the variable.
I've tried different Count() combinations without success. Does anyone know a length() function or something similar that returns the number of rows?
Based on your question, I believe Count(RowId()) would work.
I'm currently using telerik reports to create bills.
For this I take the customer from the database and sum up the cost of all articles he has confirmed.
Thus the textbox field for the cost has the following code inside:
=Sum(IIf(Fields.IsConfirmed>0,Fields.Cost,0))
This is meant to ensure that I only sum up the costs of articles the customer has confirmed he wants on the bill.
When I use the Sum without the IIf, it works as expected and displays all the costs summed up (in this case too many, as unconfirmed costs are included as well). But with the IIf included, the costs are off:
Not a single decimal digit is displayed
The sum values themselves are slightly off
Overall, it looks to me as if the IIf leads to the Fields.Cost values being rounded and THEN summed up, which is completely unexpected and unwanted behaviour.
An alternative would be that I use a view that does these calculations directly in the database instead of doing it in the report, but I would like to have the whole logic in the report if possible.
So the question is: Is there any way to sum these filtered lines up WITHOUT them getting rounded in the process?
On a special note: I can't reduce the number of returned rows with a WHERE clause, as I also need the total number of items the customer has, including the unconfirmed ones, for another textbox on the same report. Also possibly relevant: the data is stored in the database as decimal(15,2), and I use the Entity Framework to read it (although, as I indicated before, without the IIf the rounding problem does not appear and I get decimal digits).
I've found a solution. The problem is a typical one known from other programming languages, and it is just as easy to overlook in each of them.
In effect, I'm adding up a number of values with decimals, but if the field IsConfirmed is <= 0 I'm adding an INTEGER value (0). As in many similar situations in other programming languages, a type conversion then happens: the integer value inside the Sum leads to the whole sum being treated as an INT. What is still a bit of a surprise is that even the partial sums seem to get converted to INT values (at least that is the impression I gained from tests).
The solution is quite easy and completely fixes the problem:
=Sum(IIf(Fields.IsConfirmed>0,Fields.Cost,0.00))
The 0.00 leads to the zero value being interpreted as a number with decimals and thus no int conversion happens.
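The effect can be imitated in Python to show why the symptoms looked the way they did. This is only a simulation under assumptions: Python itself does not coerce a decimal sum to an integer like this, so the rounding step is written out explicitly, and the bill lines are invented sample data.

```python
from decimal import Decimal

# Hypothetical bill lines: (is_confirmed, cost)
lines = [(1, Decimal("19.99")), (0, Decimal("5.50")), (1, Decimal("0.75"))]

# Equivalent of Sum(IIf(IsConfirmed > 0, Cost, 0.00)): the fallback is a decimal zero.
confirmed_total = sum(
    (cost if confirmed > 0 else Decimal("0.00") for confirmed, cost in lines),
    Decimal("0.00"),
)

# Simulation of the reported symptom: each term is rounded to an integer, then summed.
rounded_total = sum(round(cost) if confirmed > 0 else 0 for confirmed, cost in lines)

print(confirmed_total)  # 20.74
print(rounded_total)    # 21
print(len(lines))       # 3, the total item count is still available without a filter
```

The simulated total has no decimal digits and is slightly off, which matches both observations in the question.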