I have two similar data tables that look like the following:
Data 1:                  Data 2:
categorical  value       categorical  value
Sex                      Sex
Male         2           Male         3
Female       3           Female       1
Weight                   Weight
Mean         50          Mean         49
Median       53          Median       51
I would like to merge them without having to run PROC SORT first. How can I do so? I know that classically I would sort both tables by categorical and then merge by categorical, but I don't want the categorical column to end up alphabetized.
Desired output:
categorical  value  value2
Sex
Male         2      3
Female       3      1
Weight
Mean         50     49
Median       53     51
If it's one to one, each line with each line, just omit the BY statement in a data step merge.
data want;
  /* No BY statement: rows are paired by position. RENAME= takes a parenthesized list. */
  merge t1 t2 (rename=(value=new_value));
run;
proc sql;
create table dataMerged as
select data1.categorical, data1.value, data2.value as value2
from data1 LEFT JOIN data2
on data1.categorical = data2.categorical;
quit;
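To make the difference between the two answers concrete, here is a rough sketch in Python rather than SAS. The lists `data1` and `data2` are hypothetical stand-ins for the two tables (header rows such as Sex carry `None` as their value), and both function names are made up for illustration. The first function pairs rows by position, like a MERGE without BY; the second looks each key up but keeps the row order of the first table, and it assumes every categorical value appears only once.

```python
# Hypothetical stand-ins for the two tables; header rows have no value.
data1 = [("Sex", None), ("Male", 2), ("Female", 3),
         ("Weight", None), ("Mean", 50), ("Median", 53)]
data2 = [("Sex", None), ("Male", 3), ("Female", 1),
         ("Weight", None), ("Mean", 49), ("Median", 51)]


def merge_by_position(left, right):
    """Pair row i of left with row i of right (no key comparison at all)."""
    return [(cat, v1, v2) for (cat, v1), (_, v2) in zip(left, right)]


def merge_by_key(left, right):
    """Look up each left key in right, keeping left's original row order.

    Assumes keys are unique within each table.
    """
    lookup = dict(right)
    return [(cat, v1, lookup.get(cat)) for cat, v1 in left]


merged = merge_by_position(data1, data2)
for row in merged:
    print(row)
```

The positional version is only safe when both tables have the same rows in the same order; the keyed version tolerates a differently ordered second table.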
I have a matrix visual in PowerBI with two row fields, rf1 and rf2. This groups row field 2 (rf2) by row field 1 (rf1) such that each value in rf1 contains multiple values from rf2. rf1 and rf2 are stored in different tables in the data model, but the tables are connected directly.
I would like to show on the matrix visual the number of unique rf2 values within each rf1 against the corresponding row.
For example (first two columns as collapsible groups, as in the matrix visual):
rf1      rf2   rf2 count   Values
group1         3           10
         a                 3
         b                 1
         c                 6
group2         2           5
         a                 2
         d                 3
--------------
Tot            5           15
What measure do I need to be able to generate this view?
How do I select a random row from the database based on the probability assigned to each row?
Example:
Make Chance Value
ALFA ROMEO 0.0024 20000
AUDI 0.0338 35000
BMW 0.0376 40000
CHEVROLET 0.0087 15000
CITROEN 0.016 15000
........
How do I select a random make and its value based on the probability that it should be chosen?
Would a combination of rand() and ORDER BY work? If so, what is the best way to do this?
You can do this by using rand() and then using a cumulative sum. Assuming they add up to 100%:
select t.*
from (select t.*, (@cumep := @cumep + chance) as cumep
      from t cross join
           (select @cumep := 0, @r := rand()) params
     ) t
where @r between cumep - chance and cumep
limit 1;
Notes:
rand() is called once in a subquery to initialize a variable. Multiple calls to rand() are not desirable.
There is a remote chance that the random number will be exactly on the boundary between two values. The limit 1 arbitrarily chooses 1.
This could be made more efficient by stopping the subquery when cumep > @r.
The values do not have to be in any particular order.
This can be modified to handle chances where the sum is not equal to 1, but that would be another question.
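For anyone who wants to see the cumulative-sum idea outside SQL, here is a minimal sketch in Python. The list `cars` holds only the five rows shown in the question, and `pick_weighted` is a made-up name. Because those five chances do not add up to 1, the sketch scales the random draw by their total; that scaling is my own assumption and is not part of the query above.

```python
import random
from itertools import accumulate

# The five rows visible in the question: (make, chance, value).
cars = [
    ("ALFA ROMEO", 0.0024, 20000),
    ("AUDI", 0.0338, 35000),
    ("BMW", 0.0376, 40000),
    ("CHEVROLET", 0.0087, 15000),
    ("CITROEN", 0.016, 15000),
]


def pick_weighted(rows, r=None):
    """Return the row whose cumulative-chance interval contains the draw.

    r is a number in [0, 1); it is drawn once with random.random() if omitted.
    """
    chances = [chance for _, chance, _ in rows]
    if r is None:
        r = random.random()  # a single draw, as in the SQL version
    target = r * sum(chances)  # assumption: rescale because chances sum to < 1
    for row, cumulative in zip(rows, accumulate(chances)):
        if target < cumulative:
            return row
    return rows[-1]  # guard against floating-point round-off at the top end


print(pick_weighted(cars))
```

Passing `r` explicitly makes the selection reproducible, which is handy for checking that each interval maps to the expected make.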
I want to filter a pivot table in the following set up:
My Table:
Key Value1 Value2
1 23 a
2 33 b
3 1 c
4 5 d
My pivot table (simplified):
Key SUM of Value1 COUNTA of Value2
1 23 1
2 33 1
3 1 1
4 5 1
Grand Total 62 4
I now want to filter the pivot table by the values in this list:
Keys
1
2
4
So the resulting pivot table should look like this:
Key SUM of Value1 COUNTA of Value2
1 23 1
2 33 1
4 5 1
Grand Total 61 3
I thought this should be possible by using a custom formula in the pivot filter but it seems I have no way of using the current cell in the pivot e.g. to make a lookup.
I created a simple example of this setup here: https://docs.google.com/spreadsheets/d/1GlQDYtW8v8ri5L68RhryTZxwTikV_NXZQlccSI6_7pU/edit?usp=sharing
Paste this formula into Filters!B1:
=ARRAYFORMULA(IFERROR(VLOOKUP(A1:A, Table!A1:C, {2,3}, 0), ))
and create the resulting pivot table from that range:
demo spreadsheet
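As a sanity check on the expected totals, the same lookup-then-aggregate logic can be sketched in Python. The names `table`, `keys` and `by_key` are hypothetical; the data is copied from the question. Keys with no match are simply skipped here, which approximates the blank cells that IFERROR leaves behind in the sheet.

```python
# Rows of the source table from the question: (Key, Value1, Value2).
table = [(1, 23, "a"), (2, 33, "b"), (3, 1, "c"), (4, 5, "d")]
keys = [1, 2, 4]  # the filter list

by_key = {row[0]: row for row in table}

# Like VLOOKUP inside IFERROR: keep a row only when the key is found.
filtered = [by_key[k] for k in keys if k in by_key]

total_value1 = sum(v1 for _, v1, _ in filtered)             # SUM of Value1
count_value2 = sum(1 for _, _, v2 in filtered if v2 != "")  # COUNTA of Value2
print(total_value1, count_value2)  # 61 3
```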
The data format I have is shown below.
When I use s2 <- fill_(s1, c("Time")), it fills in the last seen value.
However, I would like all of the Time values listed below to repeat for each value of Animal:
Group  Animal  Sex  Time
1      1001    M    0
                    4
                    8
                    24
                    48
1      1002    M
1      1003    M
I have the following mock-up table:
#n a b group
1 1 1 1
2 1 2 1
3 2 2 1
4 2 3 1
5 3 4 2
6 3 5 2
7 4 5 2
I am using SAS for this problem. In column group, the rows that are interconnected through a and b are grouped. I will try to explain why these rows are in the same group
Rows 1 and 2 are in group 1 since they both have a = 1.
Row 3 is in group 1 since b = 2 in rows 2 and 3, and row 2 is in group 1.
Rows 3 and 4 are in group 1 since a = 2 in both rows, and row 3 is in group 1.
The overall logic is that if a row x contains the same value of a or b as a row y, then row x belongs to the same group as row y.
Following the same logic, rows 5, 6 and 7 are in group 2.
Is there any way to make an algorithm to find these groups?
Case I:
Grouping defined as item linkage within contiguous rows.
Use the LAG function to examine both variables' prior values. Increase the group value if both have changed. For example:
group + ( a ne lag(a) and b ne lag(b) );
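The same contiguous-row rule, sketched in Python for readers who want to trace it by hand. `contiguous_groups` and `pairs` are made-up names, the data is the mock-up table from the question, and the sketch assumes neither column contains missing values.

```python
def contiguous_groups(rows):
    """Start a new group whenever both a and b differ from the previous row."""
    groups = []
    group = 0
    prev_a, prev_b = None, None  # plays the role of the missing first LAG value
    for a, b in rows:
        if a != prev_a and b != prev_b:
            group += 1
        groups.append(group)
        prev_a, prev_b = a, b
    return groups


# (a, b) columns of the mock-up table, in row order.
pairs = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(contiguous_groups(pairs))  # [1, 1, 1, 1, 2, 2, 2]
```

This only links a row to the row directly before it, so it depends on the input order; rows that are connected through a row appearing later in the data are not merged.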
Case II:
Grouping determined from pair item slot value linkages over all data.
From grouping pairs by either key
General statement of problem:
-----------------------------
Given: P = p{i} = (p{i,1},p{i,2}), a set of pairs (key1, key2).
Find: The distinct groups, G = g{x}, of P,
such that each pair p in a group g has this property:
key1 matches key1 of any other pair in g.
-or-
key2 matches key2 of any other pair in g.
Demonstrates
… an iterative way using hashes.
Two hashes maintain the groupId assigned to each key value.
Two additional hashes are used to maintain group mapping paths.
When the data can be passed without causing a mapping, then the groups have
been fully determined.
A final pass is done, at which point the groupIds are assigned to each
pair and the data is output to a table.
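For comparison, here is a rough Python sketch of Case II. Note that it swaps in a different technique: a union-find (disjoint-set) structure instead of the iterative hash passes described above, so it is not a translation of the SAS approach. `link_groups` is a made-up name, and group ids are numbered in order of first appearance.

```python
def link_groups(pairs):
    """Give each (key1, key2) pair a group id.

    Pairs that share a key1 value or a key2 value, directly or through a
    chain of other pairs, receive the same id.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for key1, key2 in pairs:
        # Tag each key with its slot so that a = 1 and b = 1 stay distinct.
        root_a, root_b = find(("a", key1)), find(("b", key2))
        if root_a != root_b:
            parent[root_b] = root_a

    ids = {}
    result = []
    for key1, _ in pairs:
        root = find(("a", key1))
        result.append(ids.setdefault(root, len(ids) + 1))
    return result


# (a, b) columns of the mock-up table from the question.
print(link_groups([(1, 1), (1, 2), (2, 2), (2, 3), (3, 4), (3, 5), (4, 5)]))
# [1, 1, 1, 1, 2, 2, 2]
```

Unlike the LAG approach, the result does not depend on row order: `link_groups([(1, 1), (2, 2), (1, 2)])` puts all three pairs in one group because the third pair connects the first two.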