Count variable length in Spotfire - TIBCO

I have a boolean/logical variable (values 0 and 1), and I need to know the dataset size in order to do some math (like calculating percentages).
For example, if my dataset has 250 rows, I want to do something similar to this:
Count([variable]) / 250
The point is that I don't know the dataset's length (it will be a different dataset each time). That's why I need a function similar to R's length(data$variable), which gives me the number of rows in the variable.
I've tried different Count() combinations without success. Does anyone know of a length() function or similar to get the number of rows?

Based on your question, I believe Count(RowId()) would work.
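Putting that together with the expression from the question, the full percentage expression would be something like the following (a Spotfire custom-expression sketch, not tested against a live analysis):

```
Count([variable]) / Count(RowId())
```

Count(RowId()) counts every row in the current data table, so the ratio no longer hard-codes the 250.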

Related

SPSS: generate 'fake' survey data using rv.uniform without losing value labels

I have a pretty straightforward survey dataset. Each row is a respondent, and each column is a question. Responses have a value that is a whole number, and each number has a label.
Now, I need to replace all of those values with fake data to use in a training. I need something that looks and feels like the original dataset, but isn't actually client data.
I started by replacing my variables with random number values:
COMPUTE Q1=RV.UNIFORM(1,2).
EXECUTE.
COMPUTE Q2=RV.UNIFORM(1,36).
EXECUTE.
COMPUTE Q3=RV.NORMAL(50, 13).
EXECUTE.
(rv.normal/rv.uniform depending on what kind of data I'm trying to fake - age versus multiple-choice question, for example).
This works, but when I then try to generate crosstabs, export the dataset with value labels, etc., the labels aren't applied to the columns with fake data. As far as I can tell, my fake numbers are in the exact same format they were in before: numeric, no decimals, width of 2, nominal. The labels still appear in the variable view, but they aren't actually being applied.
I'd really prefer not to have to manually re-label every one of these columns, because there are quite a few of them. Any ideas for how to get around this issue? Or is there a smarter way to generate fake data?
Your problem is that the RV.UNIFORM and RV.NORMAL functions do not generate integers; they generate decimal numbers. Your display may hide the decimals because the variable view is set to 0 decimal places, but they are still there (you can check this by adding decimals in the variable view).
So you need another step to turn your decimals into integers. For example, the following are two ways to get a random 1 or 2 (integers):
COMPUTE Q1=rnd(RV.UNIFORM(1,2)).
or
COMPUTE Q1=trunc(RV.UNIFORM(1,3)).
Once the numbers generated are integers corresponding to the value labels definition, you should be able to see the labels in the output.
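The decimal-versus-integer distinction can be sketched in Python, with random.uniform standing in for SPSS's RV.UNIFORM (variable names here are illustrative, not SPSS syntax):

```python
import random
from collections import Counter

random.seed(1)

# RV.UNIFORM(1, 2) is a continuous draw: essentially every value has a
# fractional part, so it never matches integer value labels.
continuous = [random.uniform(1, 2) for _ in range(1000)]
assert any(x != int(x) for x in continuous)

# trunc(RV.UNIFORM(1, 3)) truncates toward zero, giving the integers
# 1 and 2 with (essentially) equal probability.
truncated = [int(random.uniform(1, 3)) for _ in range(10000)]
print(Counter(truncated))
```

Once every generated value is a true integer, it lines up with the value-label definitions again.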

PowerQuery syntax to overcome #NUM! error

I have two columns of data in Excel. Using PowerQuery I am trying to divide these two columns and call the result column X. The problem is that there are zeros in these two columns, meaning we get a "#NUM!" in column X when dividing. How can I write an IF statement in PowerQuery so that IF the value of column X (the division) is NaN (#NUM!), then it is set to zero?
The below doesn't change the NaNs to zeros:
if [Column1]/[Column2] = "NaN" then 0 else [Column1]/[Column2]
This should be a FAQ, but the approach is similar in almost every language. I'd write your statement like this: if [Column2] = 0 then 0 else [Column1]/[Column2]. That should work for all non-zero denominators.
Another thought: PowerQuery (and PowerPivot) has a divide function that is divide-by-zero-safe: divide(column1, column2). It's shorter to write and should perform better, since the calculation is only performed once, especially with more complex denominators.
Final thought: because ratios aren't additive, I tend not to store them in the PQ results, choosing instead to calculate them dynamically in PowerPivot or elsewhere in the reporting. In Excel you can use =IFERROR(a/b, 0).
JR
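The zero-safe pattern described above can be sketched in Python for illustration (safe_divide is a hypothetical name, not a PowerQuery function):

```python
def safe_divide(numerator, denominator, fallback=0):
    # Return the fallback instead of dividing when the denominator is zero,
    # mirroring "if [Column2] = 0 then 0 else [Column1]/[Column2]".
    return fallback if denominator == 0 else numerator / denominator

print(safe_divide(10, 4))  # 2.5
print(safe_divide(10, 0))  # 0
```

The benefit over checking the result for NaN afterwards is that the invalid division never happens in the first place.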

Clustsig with modified method.distance

I am attempting to perform a SIMPROF test using a Pearson correlation as the distance method. I am aware that it is designed for typical distance methods such as Euclidean or Bray-Curtis, but it supposedly allows any function that returns a dist object.
My issue lies with the creation of that function. My original data has 35 rows and 2146 columns, and I wish to correlate the columns. A small subset of that data is included below (the vectors a, b, and c).
I need a function that takes the absolute value of the Pearson correlation coefficient to be used as the method.distance function. I can calculate those values individually (the rcorr and abs steps below), but I have no idea how to wrap all of that into a single function. My attempt is the dist3 function below, but I know that as.dist needs the matrix of correlation coefficients, which you can only get from CorrelationSmall$r. I'm assuming it needs to be nested, but I'm at a loss. I apologize if I am asking something ridiculous. I have combed the forums and don't know who else to ask. Many thanks!
library(clustsig)
library(Hmisc)
library(readr)  # for read_csv
NetworkAnalysisSmall <- read_csv("C:/Users/WilhelmLab/Desktop/Lena/NetworkAnalysisSmall.csv")
NetworkAnalysisSmallMatrix <- as.matrix(NetworkAnalysisSmall)
#subset of NetworkAnalysisSmall
a<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000001505,0.0000000000685,0.0000000009909,0.0000000001543,0.0000000000000,0.0000000000000,0.0000000000000)
b<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000002228,0.0000000000000,0.0000000001375,0.0000000000000,0.0000000000000)
c<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000546,0.0000000000000,0.0000000000000,0.0000000002293,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000540,0.0000000002085,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000)
subset<-data.frame(a,b,c)
CorrelationSmall<-rcorr(as.matrix(NetworkAnalysisSmall),type=c("pearson"))
CCsmall<-CorrelationSmall$r
CCsmallAbs<-abs(CCsmall)
dist3 = function(x) {
  as.dist(rcorr(as.matrix(x), type = c("pearson")))
}
NetworkSimprof<-simprof(NetworkAnalysisSmall,num.expected=1000,num.simulated=1000,method.cluster=c("ward"),method.distance=c("dist3"),method.transform=c("log"),alpha=0.05,sample.orientation="column")
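For illustration, the object the dist3 function needs to produce — the condensed vector of pairwise distances that a dist object holds, here using the common convention of converting the similarity |r| to a distance via 1 - |r| — can be sketched in plain Python (function names are hypothetical):

```python
import math
from itertools import combinations

def pearson(u, v):
    # Plain Pearson correlation of two equal-length sequences.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [x - mu for x in u]
    dv = [x - mv for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

def abs_pearson_dist(columns):
    # Condensed lower-triangle vector of 1 - |r| over all column pairs:
    # the same shape of object that as.dist() produces in R.
    return [1 - abs(pearson(u, v)) for u, v in combinations(columns, 2)]

# Three toy columns, each perfectly (anti-)correlated with the others,
# so every pairwise distance is 0.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
print(abs_pearson_dist(cols))  # → [0.0, 0.0, 0.0]
```

In R, one plausible dist3 along these lines would be as.dist(1 - abs(rcorr(as.matrix(x), type = "pearson")$r)), passed to simprof as method.distance = dist3 (the function object itself rather than the string "dist3").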

SPSS Partial Correlation not giving a number?

I am trying to run a partial correlation on my data that should include the high temp, low temp, and total count, while controlling for three other factors. When I run a simple correlation (Analyze > Correlate > Bivariate), I am able to obtain correlation values. When I run Analyze > Correlate > Partial, and select the high temp, low temp, and total count as my variables and the rest as my control variables, I do not get any correlation values, and it gives me a df of 0. There are only five rows for each variable; could it be that there is just not enough data to do a partial correlation? Any help as to why the simple correlation works but the partial correlation does not would be great.
First thing to check would be the pattern of missing values in the variables. Also note that with five cases and three control variables, the degrees of freedom for a partial correlation (n - 2 - number of controls = 5 - 2 - 3) are already 0, so there is no information left to estimate it even before any missing data.

Tableau - Calculated fields / grouping / Custom Dim

Tableau:
This may seem simple, but I ran out of the usual tricks I've used in other systems.
I want a variance column. Essentially adding a member 'Variance' to the Act/Plan dimension which only contains the members 'Actual' and 'Plan'
I've come in where the data structure and reporting is set up like so:
Actual | Plan
Profit measure
measure 2
measure 3
etc
The goal is to have a Variance column (calculated and not part of the Actual/Plan dimension)
Actual | Plan | Variance
Profit measure
measure 2
measure 3
etc
There are solutions where it works for one measure only, and I've looked into that.
ie, create calculated field as such
Profit_Actual | Profit_Plan | Variance
You put this on the columns, and you get a grid that I want... except a grid with only 1 measure.
This does not work if I want to run several measures on rows. Essentially the solution above will only display the Profit measure, not Measure 1_Actual , Measure 2_Plan etc.
So I tried a trick where I grouped the 3 calculated measures, i.e. Profit_Actual | Profit_Plan | Profit_Variance, as 'Profit_Measure'.
I then created a parameter list: 'Actual', 'Plan', 'Variance'.
Now I can half achieve my goal by putting the parameter on columns and the 'Profit_Measure' on rows (so I can have Measure 123_group etc. down on rows too). Trouble is, parameters are single-select only. If only the parameter could display all of its options at once, my problem would be solved.
Any ideas on how I can achieve the Variance column I want? Virtually adding a member to a dimension, calculated fields, tricks, workarounds...
Thank you. Any leads are appreciated.
Gemmo
Okay. First thing: I had a really hard time trying to understand how your data is organized. Try to be clearer (say what each entry in your database looks like, not how a specific view in Tableau looks).
But I think I got it. I guess you have a collection of entries, and each entry has a number of measure fields (profit and so on) and an Act/Plan field to identify whether that entry is an actual value or a planned value. Is that correct?
Well, if that's the case, I'm sorry to say you have to calculate a variance field for each measure. Think about how your original dataset is structured: can you add a single "Variance" field that represents the variance of every measure? You could store the values in a string and then pull them back out with string functions, but that's not very practical. The problem is that each entry has many measures; if it had only 1 measure, then 1 variance field would suffice.
So, if you can re-organize your data, an easier set to work with (though with many more entries) is one with the fields Measure, Value, and Actual/Plan. The Measure field would be a string identifying what you're measuring in that entry, Value would be a number holding the actual measurement, and Actual/Plan is the same as before. For instance:
Measure Value Actual/Plan
Profit 100 Actual
So each line in your current model would become n entries, where n is the number of measures you have right now. It is a larger dataset, but easier to work with. Think about it: now you can have a calculated field and use table calculations to compute the variance only for that measure and/or Actual/Plan. Just use WINDOW_VAR, and put Measure and/or Actual/Plan in the partition.
Table calculations are awesome; take a look at this to understand addressing and partitioning better: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#calculations_tablecalculations_understanding_addressing.html
I generally like to have my data staged such that Actual is its own column and Plan is its own column in the data being fed to Tableau. It makes calculations so much easier.
If your data is such that there is a column called "Actual/Plan" and every row is populated with either "Actual" or "Plan" and there is another column called "Value" or "Measure" that is populated with the values, you can force Tableau to make them columns assuming you can't or won't rearrange your data.
Create a calculated field called "Actual" with the following calc:
IF [Actual/Plan] = 'Actual' THEN [Value] END
Similarly, create a calculated field called "Plan" with the following calc:
IF [Actual/Plan] = 'Plan' THEN [Value] END
Now, you can finally create your "Variance" and "Variance %" calculations (respectively):
SUM([Actual]) - SUM([Plan])
[Variance] / SUM([Plan])
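The pivot those two calculated fields perform — spreading the Actual/Plan rows into columns and then differencing them — can be sketched in plain Python (column names taken from the hypothetical example above):

```python
# Long-format rows as described above: (Measure, Actual/Plan, Value).
rows = [
    ("Profit", "Actual", 100),
    ("Profit", "Plan", 90),
    ("Measure 2", "Actual", 40),
    ("Measure 2", "Plan", 50),
]

# Pivot Actual/Plan into columns: one dict of columns per measure...
wide = {}
for measure, kind, value in rows:
    wide.setdefault(measure, {})[kind] = value

# ...then the Variance column is just Actual minus Plan per measure.
for measure, cols in wide.items():
    cols["Variance"] = cols["Actual"] - cols["Plan"]

print(wide["Profit"])  # → {'Actual': 100, 'Plan': 90, 'Variance': 10}
```

This is exactly what the IF [Actual/Plan] = 'Actual' THEN [Value] END calculations do inside Tableau, with SUM() handling the aggregation across rows.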
