PivotTables in Excel (or, cross tabulations) are quite useful. Has anyone already thought about how to implement a similar function in Mathematica?
I am not familiar with the use of pivot tables, but taking the example on the page linked above, I propose this:
Needs["Calendar`"]
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 150] & /# choices // Transpose;
This creates data that looks like:
{"East", "Girl", "Golf", "03/2011", 6, 12.29`, 6.18`},
{"West", "Boy", "Fancy", "08/2011", 6, 13.01`, 12.39`},
{"North", "Girl", "Golf", "05/2011", 1, 14.87`, 12.89`},
{"East", "Girl", "Golf", "09/2011", 3, 13.99`, 6.25`},
{"North", "Girl", "Golf", "09/2011", 13, 12.66`, 8.57`},
{"East", "Boy", "Fancy", "10/2011", 2, 14.46`, 6.85`},
{"South", "Boy", "Golf", "11/2011", 13, 12.45`, 11.23`}
...
Then:
h1 = Union#data[[All, "Region" /. key]];
h2 = Union#data[[All, "Ship Date" /. key]];
Reap[
Sow[#[[{"Units", "Ship Date"} /. key]], #[["Region" /. key]]] & ~Scan~ data,
h1,
Reap[Sow ### #2, h2, Total # #2 &][[2]] &
][[2]];
TableForm[Join ## %, TableHeadings -> {h1, h2}]
This is a rough example, but it gives an idea of how this may be done. If you have more specific requirements I will attempt to address them.
Here is an update in the manner of Sjoerd's answer.
The Manipulate block is largely copied, but I believe my pivotTableData is more efficient, and I sought to localize symbols correctly, since this is now presented as usable code rather than a rough example.
I start with the same sample data, but I embed the field headings, since I feel this is more representative of normal use.
data = ImportString[#, "TSV"][[1]] & /# Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];
data = Transpose[{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];
PrependTo[data,
{"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marriatal status"}
];
My pivotTableData is self contained.
pivotTableData[data_, field1_, field2_, dependent_, op_] :=
Module[{key, sow, h1, h2, ff},
(key## = #2[[1]]) & ~MapIndexed~ data[[1]];
sow = #[[key /# {dependent, field2}]] ~Sow~ #[[key#field1]] &;
{h1, h2} = Union#data[[2 ;;, key##]] & /# {field1, field2};
ff = # /. {{} -> Missing#"NotAvailable", _ :> op ## #} &;
{
{h1, h2},
Join ## Reap[sow ~Scan~ Rest#data, h1, ff /# Reap[Sow ### #2, h2][[2]] &][[2]]
}
]
pivotTable relies only on pivotTableData:
pivotTable[data_?MatrixQ] :=
DynamicModule[{raw, t, header = data[[1]], opList =
{Mean -> "Mean of \[Rule]",
Total -> "Sum of \[Rule]",
Length -> "Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]",
Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"}},
Manipulate[
raw = pivotTableData[data, f1, f2, f3, op];
t = ConstantArray["", Length /# raw[[1]] + 2];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{f2, header}];
t[[2, 1]] = Control[{f1, header}];
t[[1, 2]] = Control[{f3, header}];
{{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
TableView[N#t, Dividers -> All],
Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
]
]
Use is simply:
pivotTable[data]
A quick-and-dirty pivot table visualization:
I'll start with a more interesting real-life data set:
data = ImportString[#, "TSV"][[1]] & /#
Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
];
A bit of post-processing:
data =
{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical",
4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}\[Transpose];
header = {"Education", "South", "Sex", "Experience", "Union", "Wage",
"Age", "Race", "Occupation", "Sector", "Marriatal status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /# Transpose[data];
levelLength = Length /# levelNames;
Now for the real stuff. It also uses the function SelectEquivalents defined in What is in your Mathematica tool bag?
pivotTableData[levelName1_, levelName2_, dependent_, op_] :=
Table[
SelectEquivalents[data,
FinalFunction -> (If[Length[#] == 0, Missing["NotAvailable"], op[# // Flatten]] &),
TagPattern ->
_?(#[[headerNumber[levelName1]]] == levelMember1 &&
#[[headerNumber[levelName2]]] == levelMember2 &),
TransformElement -> (#[[headerNumber[dependent]]] &)
],
{levelMember1, levelNames[[headerNumber[levelName1]]]},
{levelMember2, levelNames[[headerNumber[levelName2]]]}
]
DynamicModule[
{opList =
{Mean ->"Mean of \[Rule]", Total ->"Sum of \[Rule]", Length ->"Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]", Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"
}, t},
Manipulate[
t=Table["",{levelLength[[headerNumber[h1]]]+2},{levelLength[[headerNumber[h2]]]+2}];
t[[3 ;; -1, 1]] = levelNames[[headerNumber[h1]]];
t[[2, 3 ;; -1]] = levelNames[[headerNumber[h2]]];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{h2, header}];
t[[2, 1]] = Control[{h1, header}];
t[[1, 2]] = Control[{h3, header}];
t[[3 ;; -1, 3 ;; -1]] = pivotTableData[h1, h2, h3, op] // N;
TableView[t, Dividers -> All],
Initialization :> {op = Mean, h1 = "Sector", h2 = "Union", h3 = "Wage"}
]
]
There's still a bit of work to do. The DynamicModule should be turned into a fully standalone function, with the header stuff more streamlined, but this should be sufficient for a first impression.
#Mr.Wizard's answer is indeed robust and long-lasting as it grounds on ReapSow method suitable for some map reduce jobs in Mathematica. Due to the fact that MMA itself develops, consider a new option as well.
GroupBy (introduced in Mathematica v.10.0) provides a generalization of the map reduce operation.
So, the above data job may be implemented as follows (partly an overkill for readability):
headings = Union # data[[All, #]] & /# {1, 4}
{{"East", "North", "South", "West"}, {"01/2011", "02/2011", "03/2011",
"04/2011", "05/2011", "06/2011", "07/2011", "08/2011", "09/2011",
"10/2011", "11/2011", "12/2011"}}
We may use Outer to set up a rectangular template for TableForm:
template = Outer[List, Apply[Sequence][headings]];
Main job with GroupBy and Total as third argument:
pattern = Append[Normal #
GroupBy[data, (#[[{1, 4}]] &) -> (#[[-1]] &), Total],
_ -> Null];
Finally, inject pattern into template (and apply TableForm headings for beauty):
TableForm[Replace[template, pattern, {2}], TableHeadings -> headings]
This outputs some:
Note: we have made a total of last column in data. (Many other aggregations are, of course, possible.)
Use http://www.wolfram.com/products/applications/excel_link/ , this way you have the best of both worlds. This product creates a flawless link between Excel and mma, 2-ways.
Here's what I've come up with. It uses the function SelectEquivalents defined in What is in your Mathematica tool bag?. Function1 and Function2 are meant to have different grouping possibilities of criteria1 and criteria2. FilterFunction is here in order to define an arbitrary filter formula on the data based on the header names.
Using the data example of Mr. Wizard here are some usages of this function.
criteria={"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
criteria1 = "Region";
criteria2 = "Ship Date";
consideredData = "Units";
PivotTable[data,criteria,criteria1,criteria2,consideredData]
A neat example
function2 = If[ToExpression#StringTake[#, 2] <= 6, "First Semester", "Second Semester"] &;
PivotTable[data,criteria,criteria1,criteria2,consideredData,FilterFunction->("Gender"=="Girl"&&"Units"*"Price"<=100&),Function2->function2]
Here's the definition of the function
keysToIndex[keys_] :=
Module[{keyIndex},
(keyIndex[#1] = #2[[1]])&~MapIndexed~keys;
keyIndex
];
InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
Options[PivotTable]={Function1->Identity,Function2->Identity,FilterFunction->(True &),AggregationFunction->Total,FormatOutput->True};
PivotTable[data_,criteria_,criteria1_,criteria2_,consideredData_,OptionsPattern[]]:=
Module[{criteriaIndex, criteria1Index, criteria2Index, consideredDataIndex, criteria1Function, criteria2Function, filterFunctionTranslated, filteredResult, keys1, keys1Index, keys2, keys2Index, resultTable, function1, function2, filterFunction, aggregationFunction, formatOutput,p,sharp},
function1 = OptionValue#Function1;
function2 = OptionValue#Function2;
filterFunction = OptionValue#FilterFunction;
aggregationFunction = OptionValue#AggregationFunction;
formatOutput=OptionValue#FormatOutput;
criteriaIndex=keysToIndex[criteria];
criteria1Index=criteriaIndex#criteria1;
criteria2Index=criteriaIndex#criteria2;
consideredDataIndex=criteriaIndex#consideredData;
criteria1Function=Composition[function1,#[[criteria1Index]]&];
criteria2Function=Composition[function2,#[[criteria2Index]]&];
filterFunctionTranslated = filterFunction/.(# -> p[sharp, criteriaIndex##]& /# criteria /. sharp -> #)/.p->Part;
filteredResult=
SelectEquivalents[
data
,
TagElement->({criteria1Function##,criteria2Function##,filterFunctionTranslated##}&)
,
TransformElement->(#[[consideredDataIndex]]&)
,
TagPattern->_?(#[[3]]&)
,
TransformResults->(Append[Most##1,aggregationFunction##2]&)
];
If[formatOutput,
keys1=filteredResult[[All,1]]//Union//Sort;
keys2=filteredResult[[All,2]]//Union//Sort;
resultTable=
SelectEquivalents[
filteredResult
,
TagElement->(#[[{1,2}]]&)
,
TransformElement->(#[[3]]&)
,
TagPattern->Flatten[Outer[List, keys1, keys2], 1]
,
FinalFunction-> (InverseFlatten[Flatten[#/.{}->Missing[]],{Length#keys1,Length#keys2}]&)
];
TableForm[resultTable,TableHeadings->{keys1,keys2}]
,
filteredResult
]
];
I little latter in the game. Here is another self contained solution with object like form.
Using random data created by #Mr.Wizard:
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 5000] & /# choices // Transpose;
Using an MapIndexed and SparseArray as key functions, here is the code:
Options[createPivotTable]={"RowColValueHeads"-> {1,2,3},"Function"-> Total};
createPivotTable[data_,opts:OptionsPattern[{createPivotTable}]]:=Module[{r,c,v,aggDataIndex,rowRule,colRule,pivot},
{r,c,v}=OptionValue["RowColValueHeads"];
pivot["Row"]= Union#data[[All,r]];
pivot["Col"]= Union#data[[All,c]];
rowRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Row"]];
colRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Col"]];
aggDataIndex={#[[1,r]]/.rowRule,#[[1,c]]/.colRule}->OptionValue["Function"]##[[All,v]]&/#GatherBy[data,#[[{r,c}]]&];
pivot["Data"]=Normal#SparseArray#aggDataIndex;
pivot["Properties"]={"Data","Row","Col"};
pivot["Table"]=TableForm[pivot["Data"], TableHeadings -> {pivot["Row"], pivot["Col"]}];
Format[pivot]:="PivotObject";
pivot
]
That you can use as:
pivot=createPivotTable[data,"RowColValueHeads"-> ({"Ship Date","Region","Units"}/.key)];
pivot["Table"]
pivot["Data"]
pivot["Row"]
pivot["Col"]
To get:
I believe that the speed is faster than #Ms.Wizard, but I have to make a better test, and don't have time now.
Ruby (and Perl) has a concept of the flip flop:
file = File.open("ordinal")
while file.gets
print if ($_ =~ /third/) .. ($_ =~ /fifth/)
end
which given a list of ordinals, such as
first
second
third
fourth
fifth
sixth
would start printing when it reached "third" and stop when it reached "fifth":
third
fourth
fifth
Is there a functional programming concept analogous to this, or would this normally be described in terms of takewhiles? I'm not asking about a particular language, just what term you'd use to describe it.
In a functional language such as haskell, you would pass in the flip and flop conditions as predicates, and filter an input list based on it. For example, the following is a definition of flipflop in haskell (don't worry about the implementation if you don't know haskell - the key part is how it is used):
flipflop flip flop =
uncurry (++) . second (take 1) . break flop . dropWhile (not . flip)
This is how it can be used:
> flipflop (== 3) (== 5) [1..10]
[3,4,5]
It is an example of making an effectively new language construct just by using higher ordered function.
I don't know if there is a special name for that construct in functional languages.
Depends on functional language. How about this?
ff_gen =
lambda{ |first, *conditions|
flipflop = false
condition = first
lambda{ |v|
if condition && condition[v]
condition = conditions.shift
flipflop = !flipflop
true
else
flipflop
end
}
}
ff = ff_gen[lambda{|v| v == 3}, lambda{|v| v == 5}, lambda{|v| v == 7}, lambda{|v| v == 11}]
puts (0..20).select{ |i| ff[i] }.inspect # => [3, 4, 5, 7, 8, 9, 10, 11]
Added: Of course, Ruby is not a pure functional language, so I decided to rewrite it in Erlang:
#!/usr/bin/env escript
flipflop(E, {[H|T] = Conditions, FlipFlop}) ->
case H(E) of
true ->
{true, {T, not FlipFlop}};
false ->
{FlipFlop, {Conditions, FlipFlop}}
end;
flipflop(_, {[], FlipFlop}) ->
{FlipFlop, {[], FlipFlop}}.
flipflop_init(Conditions) ->
{[], {Conditions, false}}.
main([]) ->
{L, _} =
lists:foldl(
fun(E, {L2, FFState}) ->
case flipflop(E, FFState) of
{true, FFState2} ->
{[E|L2], FFState2};
{false, FFState2} ->
{L2, FFState2}
end
end,
flipflop_init([
fun(E) -> E == 3 end,
fun(E) -> E == 5 end,
fun(E) -> E == 7 end,
fun(E) -> E == 11 end
]),
lists:seq(0,20)
),
io:format("~p~n", [lists:reverse(L)]),
ok.
Note: In fact, classic flip-flop should work like dropwhile(!first) -> takewhile(!second), so Ruby's flip-flop is ad hoc one (compare with flip-flop in electronics).
Same as #nanothief's solution, but in Scala:
def flipFlop[A](flip: A => Boolean, flop: A => Boolean, seq: Seq[A]): Seq[A] = {
val (p, q) = seq.dropWhile(!flip(_)).span(!flop(_))
p ++ q.take(1)
}
Sample runs:
> flipFlop[Int](_ == 3, _ == 5, Nil)
List()
> flipFlop[Int](_ == 3, _ == 5, 1 to 19)
Vector(3, 4, 5)