Tableau scatter-plot: splitting measure using dimension from other data set - scatter-plot

I have a question about Tableau scatter-plots. Suppose I have two data sources, "data" and "meta". Data contains two columns, "variable" and "value" (like a melted R data frame). Let's say variable takes the values a to d, and value ranges from 1 to 10. Meta classifies the variables (columns: "variable", "class") so that a and b are "x", and c and d are "y". Tableau automatically links "variable" of data to "variable" of meta.
Now I would like to create a scatter-plot in which the x's give the x coordinates and the y's give the y coordinates. (Later I want to split the data further and add detail.) I thought this could be done by dragging "value" to the chart area and "class" somewhere, but I cannot get it to work.
One thing I've tried is to create a calculated field for x's and y's:
IF [meta].[class] == "x" THEN [value] END
This doesn't work; Tableau says: "All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources."
I feel I must be missing something obvious. I know I could create the scatter-plot by reshaping the data, but that's not a good option in the case I have in mind.

Related

how to fetch previous values of a table in oracle forms

My first task is to add two new columns to a table. The first column stores the values of the M and X fields as a single unit with a pipe separator, and the second column stores the values of the O and Z fields in the same way.
My second task: after selecting the agency and external letter rating (shown in the image) from the drop-down and saving the form, the values from fields M and X should move to N and Y, and these values should be stored in the table columns created in the first task. If we save the form again, the values should move on to the O and Z fields in the form, and so on.
Can anyone help me with how to proceed? I also don't know how to split a column value into pieces and display them on the form.
If you can propose a different method that does the same work, even better.
Adding columns:
That's a bad idea. Concatenating values is easy; storing them into a column as well. But, then - in the next step - you have to split those values into two values (columns? rows?) to be joined to another value and produce result. Can you do it? Sure. Should you? No.
What to do? If you want to store 4 values, then add 4 columns to a table.
Alternatively, see if you can create a master-detail relationship between two tables: you'd create a new table (with a foreign key to the existing table) with two additional columns:
one that says which field (M or Y) the stored value relates to
the value itself
It looks like more work, but it should pay off in the future.
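The master-detail idea can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module rather than Oracle; the table names (rating, rating_history), the column names and the sample values are all hypothetical and not taken from the question.

```python
import sqlite3

# In-memory database, used only to illustrate the schema shape (not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Master table (hypothetical name and columns).
    CREATE TABLE rating (
        rating_id INTEGER PRIMARY KEY,
        agency    TEXT NOT NULL
    );
    -- Detail table: one row per stored value instead of a pipe-separated string.
    CREATE TABLE rating_history (
        history_id INTEGER PRIMARY KEY,
        rating_id  INTEGER NOT NULL REFERENCES rating(rating_id),
        field_name TEXT NOT NULL,   -- which form field the value relates to
        value      TEXT NOT NULL    -- the value itself
    );
""")

conn.execute("INSERT INTO rating (rating_id, agency) VALUES (1, 'Agency A')")
conn.executemany(
    "INSERT INTO rating_history (rating_id, field_name, value) VALUES (?, ?, ?)",
    [(1, "M", "AA+"), (1, "X", "Stable")],  # made-up sample values
)

# Reading the values back needs no string splitting.
rows = conn.execute(
    "SELECT field_name, value FROM rating_history "
    "WHERE rating_id = ? ORDER BY history_id",
    (1,),
).fetchall()
```

Because each value lives in its own row, there is nothing to concatenate on save and nothing to split on display.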
Layout:
That really looks like a tabular form, which only reinforces what I said above. You can't "dynamically" add rows; even if you could, you should avoid it, because you'd have to add (actually, display) separate items rather than rows that share the same item name.

G_sheet: Find the same title and bring to another sheet all values (under the same title)

I am building an application in google sheet for marketing purposes.
In sheet #1, I want my colleagues to copy paste their data (date, clicks, conversions, etc).
In the next sheet (#2) I am using the data to generate ideas for experiments.
The ask:
I want a formula (or script) to use in sheet #2 so I can reposition certain columns in a certain order according to my needs.
Why?
To be able to generate these ideas, I need certain columns in a certain order, and I have zero trust that my colleagues will use the same reporting order (for example, if conversions are not in column X, the application doesn't work).
Sheet #1 [DATA input]
Column A: Conversions
Column B: Click
Column C: Conversion rate
In Sheet #2 I want to have
Column A: Conversion rate
Column B: Conversions
I am using:
=ArrayFormula({INDEX('Sheet#1'!$A$2:$Q$997,0,MATCH(A1, 'Sheet#1'!$A$1:$Q$1, 0))})
But it doesn't work all the time.
I need something scalable so my colleagues can use as well.
Example of the Google sheet: https://drive.google.com/file/d/1vRTxDAMrXQAsmZw-LtYJ5IISYjdWA9TM/view?usp=sharing
I'm not sure that I understand exactly what you want, but possibly the following formula pulls the data in the order you want:
=QUERY('IMPORT YOUR DATA'!A:C,"select B, C, A where A<>''",1)
In your sample sheet, you placed Clicks in column A, but your question says you want just Conversion Rate and Conversions. If you don't want the Clicks, change the "B, C, A" portion of my equation to just "C, A".
If this is not what you want, can you manually enter the result you would like to have, given the data you have?
Edit
When I opened your sheet as a Google Sheet, all of your data columns were text strings, even though they look like numeric values. This may have been just a mistake on my part, but in any case it is easy to fix by using VALUE, if necessary. It did mean that I treated the data as text strings, which made a minor difference in the formula.
If the data was numeric values, the formula would be:
=QUERY('IMPORT YOUR DATA'!A:C,"select B, C, A where A > 0 ",1)
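The underlying idea in both the INDEX/MATCH attempt and a header-aware lookup is to find each column by its header name rather than by its position. A minimal Python sketch of that logic is below; it is not a Sheets formula, and the function name reorder_columns and the sample numbers are made up for illustration.

```python
def reorder_columns(rows, wanted_headers):
    """Return rows with only the wanted columns, in the wanted order.

    rows[0] is the header row. Each wanted header is located by name, so the
    input column order does not matter. Raises ValueError if a header is missing.
    """
    header = rows[0]
    positions = [header.index(name) for name in wanted_headers]
    return [[row[i] for i in positions] for row in rows]


# Made-up data laid out like Sheet #1 in the question.
sheet1 = [
    ["Conversions", "Click", "Conversion rate"],
    [12, 300, 0.04],
    [7, 140, 0.05],
]

sheet2 = reorder_columns(sheet1, ["Conversion rate", "Conversions"])
```

If a colleague pastes the same columns in a different order, the result is unchanged as long as the header names match exactly.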

Possible to address nodes with Xpath in a 2-dimensional hierarchy?

Let's take the Excel Pivot data structure (or concept), where we have a hierarchy on the Rows (x-) and on the Cols (y-axis).
Would it be possible (or have any attempts been made) to address a location in the pivot table using XPath? I know there is MDX for a cube, which I'm familiar with (it is n-dimensional, or so it says, but in practice the display is almost always two-dimensional), but what about using XPath to do the same? For example, to address the Cat (subtotal) row, it seems like the following could be used:
Format: (Rows(XPath), Cols(XPath), Vals(List))
(
Rows: '//Animal[@value="Cat"]',
Cols: '//' (or empty, meaning everything),
Vals: '' (empty for all values, or a list of the specific values)
)
A few more examples:
Row for Dog named Sally
('//Animal[@value="Dog"]/Name[@value="Sally"]',,)
Column for F(emale) dogs
(,'//Gender[@value="F"]',)
Value ("cell") for Booker, Male
('//Animal[@value="Cat"]/Name[@value="Booker"]', '//Gender[@value="M"]',)
Rows for Book, Pebbles
('//Animal[@value="Cat"]/Name[@value="Booker" or @value="Tood"]',,)
Would this be a valid way to address a two-dimensional Pivot? What might be the challenges if any of using this approach? Note the above pivot table probably isn't the best example because an animal will be either M or F but not both, so that column is in effect irrelevant, but even so hopefully it's a good-enough of an example to communicate my intent.
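To make the idea concrete, here is a small sketch that resolves a (Rows, Cols) pair of paths against two XML trees using Python's standard xml.etree.ElementTree. The XML layout, the element names and the helper functions select and address are assumptions for illustration, not an existing pivot-table API. Note that ElementTree implements only a subset of XPath: paths must be relative (.// rather than //) and the "or" operator is not supported, so the last example above would need a fuller XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML mirror of the pivot's row and column hierarchies.
ROWS = ET.fromstring(
    '<Rows>'
    '<Animal value="Cat"><Name value="Booker"/><Name value="Tood"/></Animal>'
    '<Animal value="Dog"><Name value="Sally"/></Animal>'
    '</Rows>'
)
COLS = ET.fromstring('<Cols><Gender value="M"/><Gender value="F"/></Cols>')


def select(root, path, default_path):
    """Return the 'value' attributes matched by path (empty path = everything)."""
    return [node.get("value") for node in root.findall(path or default_path)]


def address(row_path="", col_path=""):
    """Resolve a (Rows, Cols) pair of paths to the (name, gender) cells it covers."""
    row_names = select(ROWS, row_path, ".//Name")
    col_names = select(COLS, col_path, ".//Gender")
    return [(r, c) for r in row_names for c in col_names]


# Row for the dog named Sally (all columns).
sally_row = address(".//Animal[@value='Dog']/Name[@value='Sally']")
# Single cell: Booker, Male.
booker_m = address(".//Animal[@value='Cat']/Name[@value='Booker']",
                   ".//Gender[@value='M']")
# All leaf rows under the Cat subtotal.
cat_rows = address(".//Animal[@value='Cat']/Name")
```

One design question this surfaces is whether a path that matches an inner node (such as the Cat subtotal) should address the subtotal row itself or the leaf rows beneath it; the sketch picks the leaves only because the path asks for Name elements explicitly.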

How do h2o models determine what columns to use for predictions (position, name, etc.)?

I'm using the h2o Python API to train some models and am a bit confused about how to correctly use some parts of the API. Specifically: which columns should be ignored in a training dataset, and how do models find the actual predictor features in a dataset when the model's predict() method is called? Also, how should weight columns be handled when the actual prediction datasets don't have weights?
The details of the code here (I think) are not majorly important, but the basic training logic looks something like
drf_dx = h2o.h2o.H2ORandomForestEstimator(
    # denoting update version name by epoch timestamp
    model_id='drf_dx_v'+str(version)+'t'+str(int(time.time())),
    response_column='dx_outcome',
    ignored_columns=[
        'ucl_id', 'patient_id', 'account_id', 'tar_id', 'charge_line', 'ML_data_begin',
        'procedure_outcome', 'provider_outcome',
        'weight'
    ],
    weights_column='weight',
    ntrees=64,
    nbins=32,
    balance_classes=True,
    binomial_double_trees=True)
.
.
.
drf_dx.train(x=X_train, y=Y_train,
             training_frame=train_u, validation_frame=val_u,
             max_runtime_secs=max_train_time_hrs*60*60)
(note the ignored columns) and the prediction logic just looks like
preds = model.predict(X)
where X is some (h2o) dataframe with more (or fewer) columns than the X_train used to train the model (it includes some columns for post-processing exploration in a Jupyter notebook). E.g. the X_train columns may look like
<columns to ignore (as seen in the code)> <columns to use as features for training> <outcome label>
and X columns may look like
<columns to ignore (as seen in the code)> <EVEN MORE COLUMNS TO IGNORE> <columns to use as features for training>
My question is: is this going to confuse the model when making predictions? I.e. does the model find the feature columns by column name (in which case I don't think the different dataframe width would be a problem), by column position (in which case adding more data columns to each sample would shift the positions and become a problem), or by something else? And what happens given that these extra columns were not listed explicitly in the ignored_columns arg of the model constructor?
** Slight aside: should the weights_column name be in the ignored_columns list or not? The example in the docs (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/weights_column.html#weights-column) seems to use it as a predictor feature as well, and the docs seem to recommend it:
For scoring, all computed metrics will take the observation weights into account (for Gains/Lift, AUC, confusion matrices, logloss, etc.), so it’s important to also provide the weights column for validation or test sets if you want to up/down-weight certain observations (ideally consistently between training and testing).
but these weight values are not something that comes with the data used in actual predictions.
I've summarized your question into a few distinct parts, so the answers will be in a Q/A type fashion.
1) When I use my_model.predict(X), how does H2O-3 know which columns to predict with?
H2O-3 will use the columns that you passed as predictors when you built your model: whatever you passed to the x argument when training, or else all the columns in your training_frame that were not (a) ignored using ignored_columns, (b) passed as the target to the y argument, or (c) dropped because the column has a constant value. My recommendation is to use the x argument to specify your predictors and not to use the ignored_columns parameter. If X, the new dataframe you are predicting on, includes columns that were not used when building the model, those columns will be ignored. In other words, columns are matched by name, not by position.
2) Should the weights column name be in the ignored column list?
No. If you put the weights column in the ignored columns list, that column will not be considered in any way during model building. In fact, if you test this out, you should get a null pointer error or something similar.
3) Why is the "weights" column specified as a predictor and as the weights_column in the following code example?
This is a great question! I've created two Jira tickets: one to update the documentation to clear up the confusion, and another to potentially add a user warning.
The short answer is that if you pass the same column to the predictors argument x and to the weights_column argument, that column will only be used as a weight; it will not be used as a feature.
4) Does the user guide recommend using the weights as a feature and as a weight?
No, in the paragraph you are pointing to, the recommendation is to ensure that the column you pass as your weights_column exists in your training frame and validation frame - not that it should also be included as a feature.
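To illustrate what matching columns by name rather than by position means in practice, here is a toy sketch. It is not H2O code and does not use the h2o package: the NameBasedModel class, its coefficients and the column names are hypothetical stand-ins, chosen only to show that extra or reordered columns in the scoring data do not change the result when lookup is by name.

```python
class NameBasedModel:
    """Toy stand-in (not H2O): remembers predictor names from training time."""

    def __init__(self, coefs):
        # coefs maps predictor name -> made-up coefficient.
        self.coefs = dict(coefs)
        self.predictors = list(coefs)

    def predict(self, rows):
        """Score rows (dicts of column name -> value), using predictors by name.

        Columns that were not predictors at training time are simply ignored.
        A missing predictor raises KeyError in this toy version.
        """
        return [
            sum(self.coefs[name] * row[name] for name in self.predictors)
            for row in rows
        ]


model = NameBasedModel({"age": 2, "visits": 3})

# Same values, but the second frame has extra columns and a different order.
narrow = [{"age": 10, "visits": 1}]
wide = [{"notebook_only_col": 99, "visits": 1, "patient_id": "p1", "age": 10}]

narrow_preds = model.predict(narrow)
wide_preds = model.predict(wide)
```

Both calls produce the same predictions, which is the behaviour described in answer 1) above.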

How to skip columns in “List from a range” Criteria?

Is it possible to create a "List from a range" Data Validation rule in Google Sheets where the range skips columns?
For example:
Cells A6:A11 are limited to the range A1:B3. Cells B6:B11 are limited to the ranges A1:A3 and C1:C3 (skipping column B).
Creating a Data Validation rule for cells A6:A11 is trivial as I simply need to create a Criteria of "List from a range = A1:B3".
However, creating the Data Validation rule for cells B6:B11 is not so intuitive since Google Sheets does not allow me to create a Criteria using the syntax "List from a range = A1:A3, C1:C3".
Does the "List from a range" Criteria support a syntax that allows us to skip columns within a range?
Note: I currently have a workaround for this: I defined an array formula in D1, =ArrayFormula(if({1,""},A1:A3,C1:C3)), and then use D1:E3 as the Data Validation range. But this is a hacky solution and I'm hoping there is a better way to accomplish my goal.
The solution is to use { } to create a combination of columns or rows that will result in some sort of virtual table on-the-fly.
Example:
Assuming you have a spreadsheet with Name, Age, Gender, Phone and Address in columns A, B, C, D and E, and you want to skip Gender (column C) while using the UNIQUE function, you can use something like this.
Put in G1 the following formula:
=UNIQUE({A1:B, D1:E})
From the cell G1, the spreadsheet will populate the columns G, H, I and J with unique combinations of A, B, D and E, excluding the column C (Gender).
The same combined-range technique can be used in any formula, and you can also combine multiple different ranges, including ranges across spreadsheets and files.
It is a very useful trick if you need to combine pieces of multiple spreadsheets for data visualization or reports. However, always remember that you cannot manipulate the displayed data. You can still search through it, format it, etc., but you cannot change it. On the other hand, it will always auto-update when the data source is updated, which is very useful.
Note: Try it with LOOKUP, VLOOKUP or HLOOKUP.
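For readers who think in code, the effect of =UNIQUE({A1:B, D1:E}) can be approximated in Python as below. This is only a sketch of the logic (keep selected columns, drop repeated rows, preserve first-seen order), not how Sheets implements it; the function name unique_skipping and the sample rows are made up.

```python
def unique_skipping(rows, keep):
    """Keep only the columns at the given indexes, then drop duplicate rows.

    The first occurrence of each combination wins and order is preserved.
    """
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[i] for i in keep)
        if key not in seen:
            seen.add(key)
            result.append(list(key))
    return result


# Columns: Name, Age, Gender, Phone, Address (made-up sample data).
data = [
    ["Ann", 30, "F", "555-0100", "1 Main St"],
    ["Bob", 41, "M", "555-0101", "2 Oak Ave"],
    ["Ann", 30, "X", "555-0100", "1 Main St"],  # differs only in the skipped column
]

# Skip column C (index 2), as {A1:B, D1:E} does.
combined = unique_skipping(data, keep=[0, 1, 3, 4])
```

The third row collapses into the first because the two differ only in the skipped Gender column.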
