PROC TABULATE Rearrange header subgroups independent of the data - sorting

This is a follow-up to a previous question, which considered only a single variable with two subgroups. In that question, the solution was to order the subgroups by sorting the data.
That approach, however, breaks down when there are several variables. It becomes a game of whack-a-mole, sorting the variables into a sequence that produces the desired result, because each subsequent BY-variable subgroup depends on the previous levels. If not all subgroups are present in a higher-level group, then not all subgroups can be arranged into the desired order.
Either PROC TABULATE is not the appropriate tool for this task (i.e. obtaining percents of various groups across several variables), or there is a technique that allows subgroups to be arranged independently of the data.
Do I go digging for the needle in the documentation haystack, or do I reinvent the wheel? Any insights you could give me would be appreciated.
Example:
To give an illustration, say I want to create a table with the subgroups of each variable arranged in (Y, N) order. Notice how var4 is not output in the correct order: by the time the other variables have been sorted, there aren't enough var4 values left to be sorted into the desired order.
data example;
input group $ var1 $ var2 $ var3 $ var4 $;
datalines;
1 Y Y N Y
1 N Y N N
2 Y N Y N
2 Y Y Y N
3 N N N Y
3 N Y Y N
;
run;
proc sort data = example out = sorted;
by descending var1
descending var2
descending var3
descending var4
;
run;
title 'Percent';
proc tabulate data = sorted order = data;
class group var1 var2 var3 var4;
table group='Group',
all = 'Total'*pctn=''
var1 = 'Variable 1'*pctn=''
var2 = 'Variable 2'*pctn=''
var3 = 'Variable 3'*pctn=''
var4 = 'Variable 4'*pctn='';
run;
It may be possible to devise a combination of BY variables in the PROC SORT which gives a (Y, N) subgrouping order, but it would involve a lot of fiddling that is not robust against changes in the data. If the table needs to be updated monthly, you would have to re-fiddle the sorting each month.

No workaround or reinventing the wheel required - this is exactly what classdata datasets are for:
data example;
input group (var1-var4) ($1. +1);
datalines;
1 Y Y N Y
1 N Y N N
2 Y N Y N
2 Y Y Y N
3 N N N Y
3 N Y Y N
;
run;
data classtypes;
do group = 1 to 3;
do var1 = 'Y','N';
do var2 = 'Y','N';
do var3 = 'Y','N';
do var4 = 'Y','N';
output;
end;
end;
end;
end;
end;
run;
title 'Percent';
proc tabulate data = example order = data classdata=classtypes;
class group var1 var2 var3 var4;
table group='Group',
all = 'Total'*pctn=''
var1 = 'Variable 1'*pctn=''
var2 = 'Variable 2'*pctn=''
var3 = 'Variable 3'*pctn=''
var4 = 'Variable 4'*pctn='';
run;
As a bonus, this also avoids having to sort your main input dataset - row/column order in the output table is determined by the order of the classdata dataset.
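Outside SAS, the same classdata idea (enumerate every class level in the desired display order, independent of which levels occur in the data) can be sketched in plain Python; the names here are illustrative only, not part of any SAS feature:

```python
from itertools import product
from collections import Counter

# Toy data mirroring the SAS example: (group, var1) observations.
rows = [(1, "Y"), (1, "N"), (2, "Y"), (2, "Y"), (3, "N"), (3, "N")]
counts = Counter(rows)

# The classdata analogue: list every (group, var1) level in the desired
# (Y, N) order, then look the counts up against it, so levels absent
# from the data still appear (with a zero) and the display order never
# depends on the data.
levels = list(product([1, 2, 3], ["Y", "N"]))
table = [(g, v, counts.get((g, v), 0)) for g, v in levels]
```

As with CLASSDATA=, the level list fully controls both completeness and ordering of the output.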

Related

How to extract optimization problem matrices A,b,c using JuMP in Julia

I create an optimization model in Julia-JuMP using the symbolic variables and constraints e.g. below
using JuMP
using CPLEX
# model
Mod = Model(CPLEX.Optimizer)
# sets
I = 1:2;
# Variables
x = @variable( Mod , [I] , base_name = "x" )
y = @variable( Mod , [I] , base_name = "y" )
# constraints
Con1 = @constraint( Mod , [i in I] , 2 * x[i] + 3 * y[i] <= 100 )
# objective
ObjFun = @objective( Mod , Max , sum( x[i] + 2 * y[i] for i in I) ) ;
# solve
optimize!(Mod)
I guess JuMP creates the problem in the form minimize c'*x subject to Ax <= b before it is passed to the solver CPLEX. I want to extract the matrices A, b, c. In the above example I would expect something like:
A
2×4 Array{Int64,2}:
2 0 3 0
0 2 0 3
b
2-element Array{Int64,1}:
100
100
c
4-element Array{Int64,1}:
1
1
2
2
In MATLAB the function prob2struct can do this https://www.mathworks.com/help/optim/ug/optim.problemdef.optimizationproblem.prob2struct.html
Is there a JuMP function that can do this?
This is not easily possible as far as I am aware.
The problem is stored in the underlying MathOptInterface (MOI) specific data structures. For example, constraints are always stored as MOI.AbstractFunction - in - MOI.AbstractSet. The same is true for the MOI.ObjectiveFunction. (see MOI documentation: https://jump.dev/MathOptInterface.jl/dev/apimanual/#Functions-1)
You can, however, try to recompute the objective function terms and the constraints in matrix-vector form.
For example, assuming you still have your JuMP.Model Mod, you can examine the objective function closer by typing:
using MathOptInterface
const MOI = MathOptInterface
# this only works if you have a linear objective function (the model has a ScalarAffineFunction as its objective)
obj = MOI.get(Mod, MOI.ObjectiveFunction{MOI.ScalarAffineFunction{Float64}}())
# take a look at the terms
obj.terms
# from this you could extract your vector c
c = zeros(4)
for term in obj.terms
c[term.variable_index.value] = term.coefficient
end
@show(c)
This indeed gives: c = [1.; 1.; 2.; 2.].
You can do something similar for the underlying MOI.constraints.
# list all the constraints present in the model
cons = MOI.get(Mod, MOI.ListOfConstraints())
@show(cons)
In this case we only have one type of constraint, i.e. (MOI.ScalarAffineFunction{Float64} in MOI.LessThan{Float64}).
# get the constraint indices for this combination of F(unction) in S(et)
F = cons[1][1]
S = cons[1][2]
ci = MOI.get(Mod, MOI.ListOfConstraintIndices{F,S}())
You get two constraint indices (stored in the array ci), because there are two constraints for this combination F - in - S.
Let's examine the first one of them closer:
ci1 = ci[1]
# to get the function and set corresponding to this constraint (index):
moi_backend = backend(Mod)
f = MOI.get(moi_backend, MOI.ConstraintFunction(), ci1)
f is again of type MOI.ScalarAffineFunction which corresponds to one row a1 in your A = [a1; ...; am] matrix. The row is given by:
a1 = zeros(4)
for term in f.terms
a1[term.variable_index.value] = term.coefficient
end
@show(a1) # gives [2.0 0 3.0 0] (the first row of your A matrix)
To get the corresponding first entry b1 of your b = [b1; ...; bm] vector, you have to look at the constraint set of that same constraint index ci1:
s = MOI.get(moi_backend, MOI.ConstraintSet(), ci1)
@show(s) # MathOptInterface.LessThan{Float64}(100.0)
b1 = s.upper
I hope this gives you some intuition on how the data is stored in MathOptInterface format.
You would have to do this for all constraints and all constraint types and stack them as rows in your constraint matrix A and vector b.
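That stacking step is language-agnostic. As a hedged sketch in plain Python (not MOI; the per-constraint data below is assumed to be what the loops above recover), each constraint contributes one scattered row to A and one entry to b:

```python
# Assumed per-constraint data: a dict of 1-based variable index ->
# coefficient, plus the upper bound from the LessThan set.
constraints = [
    ({1: 2.0, 3: 3.0}, 100.0),  # 2*x[1] + 3*y[1] <= 100
    ({2: 2.0, 4: 3.0}, 100.0),  # 2*x[2] + 3*y[2] <= 100
]
nvar = 4

A = [[0.0] * nvar for _ in constraints]
b = []
for row, (terms, upper) in enumerate(constraints):
    for var_index, coeff in terms.items():
        A[row][var_index - 1] = coeff  # scatter the sparse terms into the dense row
    b.append(upper)
```

Other constraint types (GreaterThan, EqualTo) would need sign flips or separate blocks, which is part of why no single built-in extraction exists.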
Use the following lines:
Pkg.add("NLPModelsJuMP")
using NLPModelsJuMP
nlp = MathOptNLPModel(model) # the input "model" is the name of the JuMP model you created before, with variables and constraints (and optionally the objective function) attached to it
x = zeros(nlp.meta.nvar)
c = NLPModelsJuMP.grad(nlp, x) # gradient of the linear objective at zero, i.e. the vector c
A = Matrix(NLPModelsJuMP.jac(nlp, x))
I didn't try it myself, but the MathProgBase package seems to be able to provide A, b, and c in matrix form.

Pseudocode or C# algorithm that returns all possible combinations sets for a number of variables

I have 3 variables with some possible values.
For example:
Var1 - possible values: 1,2,3
Var2 - possible values: a, b, c
var3 - possible values: false, true
Can you please help with an approach that returns all possible combinations?
The result would be like:
 1,a,false
 1,a,true
 1,b,false
 1,b,true
 1,c,false
 1,c,true
 2,a,false
 2,a,true
 2,b,false
 Etc..
I wish the algorithm could apply to any number of variables, for example, to work on 4 or 5 variables with other possible values.
It looks like you're trying to enumerate Cartesian products. Assuming your items are in list_of_lists, this recursive function in pseudo-code will do it:
enumerate_cartesian_products(list_of_lists):
if list_of_lists is empty:
return [[]]
this_list = list_of_lists[0]
other_lists = list_of_lists[1: ]
other_cartesian_products = enumerate_cartesian_products(other_lists)
return [(e + other_cartesian_product) \
for e in this_list and other_cartesian_product in other_cartesian_products]
Note how the last line would probably be a double loop in most languages: it iterates over all the elements in the first list and all the lists in the Cartesian product of the rest, and creates a list of all the appended results.
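A concrete Python version of that recursive sketch, with the recursive call written out:

```python
def enumerate_cartesian_products(list_of_lists):
    # Base case: the product of zero lists is one empty combination.
    if not list_of_lists:
        return [[]]
    this_list, other_lists = list_of_lists[0], list_of_lists[1:]
    other_products = enumerate_cartesian_products(other_lists)
    # Prepend each element of the first list to every product of the rest.
    return [[e] + rest for e in this_list for rest in other_products]

combos = enumerate_cartesian_products([[1, 2, 3], ["a", "b", "c"], [False, True]])
```

In practice Python's itertools.product does the same thing in one call; the hand-rolled version is shown only to mirror the pseudocode.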
The simplest solution is to have n nested loops:
for each possible value v1 in var1
for each possible value v2 in var2
for each possible value v3 in var3
print(v1,v2,v3);
end for v3
end for v2
end for v1
In the more general case, assume you have a list of lists that contains n lists (one for every var), and each of these lists contains the possible values for that variable. You can solve the problem with the following recursive function all_combinations.
list_of_lists=[[1...][a...][false...]];
current_comb=[];
all_combinations(list_of_lists,current_comb);
function all_combinations(list_of_lists,current_comb)
if (list_of_lists=[])
print(current_comb);
return;
end if
current_list=list_of_lists[0];
remaining_lists=list_of_lists[1:end];
for each v in current_list
tmp=copy(current_comb);tmp.Append(v); % copy, so sibling branches don't share state
all_combinations(remaining_lists,tmp);
end for v
Of course when adding variables, soon you will need to deal with combinatorial explosion.
The only clean solution is:
have a function mix( A, B ) which takes two lists and returns a list. That's trivial.
Your final code just looks like this:
result = null
result = mix( result, one of your lists );
result = mix( result, another of your lists );
result = mix( result, yet another of your lists );
result = mix( result, yet another list );
result = mix( result, one more list );
example of mix(A,B) ...
mix(A,B)
result = null
for each A
for each B
result += AB
return result
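A hedged Python sketch of that mix-and-fold idea (the names are illustrative): treating an empty partial result as a single empty combination lets the first mix call work like all the others:

```python
def mix(partial, items):
    # Cross every combination built so far with every value in `items`.
    if not partial:            # seed: one empty combination
        partial = [[]]
    return [combo + [v] for combo in partial for v in items]

result = []
for lst in [[1, 2, 3], ["a", "b", "c"], [False, True]]:
    result = mix(result, lst)
```

Folding mix over the lists one at a time is what lets the same two-argument function handle any number of variables.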
Assume that each variable has a set or vector associated with it. That is:
set1 = [1, 2, 3]
set2 = [a, b, c]
set3 = [F, T]
Then, one way is to loop over these sets in nested "for" loops. Assume that your output structure is a list of 3-element lists. That is, your output desired looks like this:
[[1,a,F], [1,a,T], [1,b,F],......]
Also assume that (like in Python) you can use a function like "append" to append a 3-element list to your big list. Then try this:
myList = [] #empty list
for i in set1:
for j in set2:
for k in set3:
myList.append([i, j, k]) #Appends 3-element list to big list
You may need to do a deep copy in the append statement so that all the i's, j's, and k's aren't updated in your master list each time you run through an iteration. This may not be the most efficient, but I think it's relatively straightforward.
Here's something in JavaScript that's pseudocode-like. (I've never coded in C#; maybe I'll try to convert it.)
var sets = [[1,2,3],["a","b","c"],[false,true]],
result = [];
function f(arr,i){
if (i == sets.length){
result.push(arr);
return;
}
for (var j=0; j<sets[i].length; j++){
_arr = arr.slice(); // make a copy of arr
_arr.push(sets[i][j]);
f(_arr,i+1);
}
}
f([],0)
Output:
console.log(result);
[[1,"a",false]
,[1,"a",true]
,[1,"b",false]
,[1,"b",true]
,[1,"c",false]
,[1,"c",true]
,[2,"a",false]
,[2,"a",true]
,[2,"b",false]
,[2,"b",true]
,[2,"c",false]
,[2,"c",true]
,[3,"a",false]
,[3,"a",true]
,[3,"b",false]
,[3,"b",true]
,[3,"c",false]
,[3,"c",true]]
You really ought to look for this elsewhere, and it's not a good Stack Overflow question. It's homework, and there is already an algorithm for this if you search using the proper terms.
It's quite simple in fact: if you generalize the algorithm for generating all combinations of digits in a binary string, you should be able to get it:
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
Notice that the right-most column alternates its value every cell, while the second-from-right column alternates every 2 cells, the next column over alternates every 4 cells, and the left-most digit alternates every 8 cells.
For your case, think of the above as what happens when your sets are:
Var1 - possible values: 0,1
Var2 - possible values: 0,1
Var3 - possible values: 0,1
Var4 - possible values: 0,1
Start a counter that keeps track of your position in each set, and begin by cycling through the "rightmost" set a full time before bumping the position of the "next-from-right" set by 1. Continue cycling the sets in this way, bumping a set when the one to its "right" cycles over, until you've finished cycling the set in the "most significant" position. You will then have generated all possible combinations in the sets.
The other answers have focused on "give the codez", which really just rewards you for posting your homework question here... so I thought I would at least explain a little.
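That counter scheme can be sketched in Python as an odometer over position indices (illustrative code, not from any of the answers above):

```python
def odometer_product(sets):
    pos = [0] * len(sets)          # current position in each set
    combos = []
    while True:
        combos.append([s[p] for s, p in zip(sets, pos)])
        # Bump the rightmost counter; carry left when a set cycles over.
        i = len(sets) - 1
        while i >= 0:
            pos[i] += 1
            if pos[i] < len(sets[i]):
                break
            pos[i] = 0
            i -= 1
        if i < 0:                  # most significant position cycled: done
            return combos

combos = odometer_product([[1, 2, 3], ["a", "b", "c"], [False, True]])
```

Unlike the recursive versions, this uses constant extra state beyond the output, which matters when the combinatorial explosion kicks in and you want to stream combinations instead of storing them.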

Compute matrix product in Base SAS (not using IML)

In order to compute the product of 2 matrices I am using this method:
First I put my matrices in the long format (col, row, value).
I use PROC SQL to compute the product of the 2 matrices.
I use PROC TRANSPOSE to put the result of the preceding step back in the wide format.
My question: is there a simpler method? Or at least, how can I simplify my code?
Here is my code:
/* macro to put a matrix in the long format*/
%macro reshape(in_A = , ou_A= );
data &ou_A.;
set &in_A.;
array arr_A{*} _numeric_;
row = _n_;
do col = 1 to dim(arr_A);
value = arr_A{col};
output;
end;
keep row col value;
run;
%mend;
%macro prod_mat( in_A = , in_B= ,ou_AB =);
/* put the matrix in the long format */
%reshape(in_A=&in_A.,ou_A=lA);
%reshape(in_A=&in_B.,ou_A=lB);
/* compute product */
PROC SQL ;
CREATE TABLE PAB AS
SELECT lA.row, lB.col, SUM(lA.value * lB.value) as value
FROM lA JOIN lB ON lA.col = lB.row
GROUP BY lA.row, lB.col;
QUIT;
/* reshape the output to the wide format */
proc transpose data=PAB out=&ou_AB.(DROP=_name_) prefix=x;
by row ;
id col;
var value;
run;
%mend;
data A ;
input x1 x2 x3;
datalines ;
1 2 3
3 4 4
5 6 9
;
data B ;
input x1 x2;
datalines ;
1 2
3 4
4 5
;
%prod_mat(in_A =A,in_B=B,ou_AB=AB)
Well, here's my variant. It's not that the code itself is shorter than yours, but for big matrices it'll work faster because it avoids an SQL join over the Cartesian product of all elements.
The main idea: a full join (Cartesian product) of the rows of A with the rows of transposed B, then multiplying the corresponding columns. E.g. in the case of 3x3 and 3x2 matrices, we'll need to:
1) multiply and sum up, in each row of the merged dataset, column1*column4+column2*column5+column3*column6;
2) repeat it for the second row;
3) output both values in one row.
%macro prod_mat_merge(in_A =,in_B=,ou_AB=);
/*determine number of rows and columns in the 2nd matrix*/
%let B_id=%sysfunc(open(&in_B));
%let B_rows=%sysfunc(attrn(&B_id,nobs));
%let B_cols=%sysfunc(attrn(&B_id,nvars));
%let rc=%sysfunc(close(&B_id));
/*transpose the 2nd matrix*/
proc transpose data=&in_B out=t&in_B(drop=_:);run;
/*making Cartesian product of the 1st and transposed 2nd matrices*/
data &ou_AB;
do until(eofA);
set &in_A end=eofA;
do i=1 to n;
set t&in_B nobs=n point=i;
output;
end;
end;
run;
/*multiplication*/
data &ou_AB;
/*new columns for products, equal to number of columns in the 2nd matrix*/
array p[&B_cols];
do j=1 to &B_cols;
p[j]=0;
set &ou_AB;
array col _ALL_;
/*multiply corresponding pairs of columns*/
do i=&B_cols+2 to &B_cols+1+&B_rows;
p[j]+col[i]*col[i+&B_rows];
end;
end;
output;
keep p:;
run;
%mend prod_mat_merge;
I've tested both methods by multiplying two random 100x100 matrices. The method with reshaping and an SQL join takes ~1.5 sec, while the method with merging takes ~0.2 sec.
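For readers outside SAS, the reshape-and-join method can be sketched in Python: each matrix becomes (row, col, value) triplets, the join matches A's column index to B's row index, and the products are summed per output cell (a sketch of the technique, not SAS code):

```python
from collections import defaultdict

def to_long(M):
    # Wide matrix -> (row, col, value) triplets, like the %reshape macro.
    return [(i, j, v) for i, row in enumerate(M) for j, v in enumerate(row)]

def prod_long(A, B):
    acc = defaultdict(float)
    cols_B = len(B[0])
    for (i, k, a) in to_long(A):
        for (k2, j, b) in to_long(B):
            if k == k2:            # join on A's column = B's row (the SQL ON clause)
                acc[(i, j)] += a * b
    # Back to wide format, like the PROC TRANSPOSE step.
    return [[acc[(i, j)] for j in range(cols_B)] for i in range(len(A))]

AB = prod_long([[1, 2, 3], [3, 4, 4], [5, 6, 9]],
               [[1, 2], [3, 4], [4, 5]])
```

The if plays the role of the SQL ON clause, and the accumulation into acc corresponds to the GROUP BY ... SUM.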

Using non-continuous integers as identifiers in cells or structs in Matlab

I want to store some results in the following way:
Res.0 = magic(4); % or Res.baseCase = magic(4);
Res.2 = magic(5); % I would prefer to use integers on all other
Res.7 = magic(6); % elements than the first.
Res.2000 = 1:3;
I want to use numbers between 0 and 3000, but I will only use approx. 100-300 of them. Is it possible to use 0 as an identifier, or will I have to use a minimum value of 1? (The numbers have meaning, so I would prefer not to change them.) Can I use numbers as identifiers in structs?
I know I can do the following:
Res{(last number + 1)} = magic(4);
Res{2} = magic(5);
Res{7} = magic(6);
Res{2000} = 1:3;
And just remember that the last element is really the "number zero" element.
In this case I will create a bunch of empty cell elements [] in the non-populated positions. Does this cause a problem? I assume it is best to assign the last element first, to avoid growing the cell array repeatedly, or does this have no effect? Is this an efficient way of doing this?
Which will be most efficient, struct's or cell's? (If it's possible to use struct's, that is).
My main concern is computational efficiency.
Thanks!
Let's review your options:
Indexing into a cell arrays
MATLAB indices start from 1, not from 0. If you want to store your data in cell arrays, in the worst case, you could always use the subscript k + 1 to index into cell corresponding to the k-th identifier (k ≥ 0). In my opinion, using the last element as the "base case" is more confusing. So what you'll have is:
Res{1} = magic(4); %// Base case
Res{2} = magic(5); %// Corresponds to identifier 1
...
Res{k + 1} = ... %// Corresponds to identifier k
Accessing fields in structures
Field names in structures are not allowed to begin with numbers, but they are allowed to contain them starting from the second character. Hence, you can build your structure like so:
Res.c0 = magic(4); %// Base case
Res.c1 = magic(5); %// Corresponds to identifier 1
Res.c2 = magic(6); %// Corresponds to identifier 2
%// And so on...
You can use dynamic field referencing to access any field, for instance:
k = 3;
kth_field = Res.(sprintf('c%d', k)); %// Access field k = 3 (i.e field 'c3')
I can't say which alternative seems more elegant, but I believe that indexing into a cell should be faster than dynamic field referencing (but you're welcome to check that out and prove me wrong).
As an alternative to EitanT's answer, it sounds like MATLAB's map containers are exactly what you need. They can deal with any type of key, and the value may be a struct or cell.
EDIT:
In your case this will be:
k = {0,2,7,2000};
Res = {magic(4),magic(5),magic(6),1:3};
ResMap = containers.Map(k, Res)
ResMap(0)
ans =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
I agree with the idea in @wakjah's comment. If you are concerned about the efficiency of your program, it's better to change the interpretation of the problem. In my opinion there is definitely a way you could prioritize your data: according to the time you acquired them, or with respect to the inputs from which they are calculated. If you set any kind of priority among them, you can sort them into a structure or cell (a structure might be faster).
So
Priority   (Your Current Index Meaning)   Data
1          0                              magic(4)
2          2                              magic(5)
3          7                              magic(6)
4          2000                           1:3
Then:
% Initialize Result structure, which is different from your Res.
Result(300).Data = 0; % 300 the maximum number of data
Result(300).idx = 0; % idx or anything that represent the meaning of your current index.
% Assigning
k = 1; % Priority index
Result(k).idx = 0; Result(k).Data = magic(4); k = k + 1;
Result(k).idx = 2; Result(k).Data = magic(5); k = k + 1;
Result(k).idx = 7; Result(k).Data = magic(6); k = k + 1;
...

Matrix input based on random value output (Matlab)

I am looking to use the output of a random value to choose the column which will be input into a new matrix, called Matrix1.
I have something like the following:
a = [1 2 3 4; 5 3 6 2; 9 8 1 4];
n = length(a(1,:))-1;
RandomValue = round(rand()*n+1);
Matrix1 = [];
L=3;
for i=n:-1:1
RandomValue
if RandomValue < L
Matrix1 = [a(:,i) Matrix1];
a(:, i) = [];
Matrix1
end
end
E.g. if the random value is 2, I would like to place [2;3;8] into Matrix1 (based on the value of the first row). How could I modify the code so that, instead of i, it uses that RandomValue number?
I don't follow you exactly, but I can't see why some variant of
Matrix1 = [a(:,round(rand()*n+1)) Matrix1]
isn't appropriate. Better than rounding a rand would be to use the randi function, which returns a pseudo-random integer; maybe
Matrix1 = [a(:,randi(n)) Matrix1]
But if, as @angainor has suggested, you are trying to permute the columns of your input matrix, then look to the permute function.
