Mathematica : Write matrix-data to XML ; Read matrix-data from XML - wolfram-mathematica

( Part of ) a matrix representation of the objects I work with is the following:
{
{1, A,{100, 20, 30},10},
{2, B,{100}, 0},
{3, X,{120,20},0},
{4, C,{},11}
}
I want to store this data externally in XML Format as follows
<data>
<row key="1" val1="A" val2="10"> <occ>100</occ><occ>20</occ><occ>30</occ></row>
<row key="2" val1="B" val2="0"><occ>100</occ></row>
<row key="3" val1="X" val2="0"><occ>120</occ><occ>20</occ></row>
<row key="4" val1="C" val2="11"></row>
</data>
I am looking for an example on how to:
- transfer the the matrix to XML ( which Mathematica commands ? )
- parse the XML string back to matrix format after having imported the XML file.

Here we import your data as symbolic XML:
In[50]:= xml = Import["C:\\Temp\\matrixData.xml"]
Out[50]= XMLObject["Document"][{},
XMLElement["data", {},
{XMLElement["row", {"key" -> "1", "val1" -> "A", "val2" -> "10"},
{XMLElement["occ", {}, {"100"}], XMLElement["occ", {}, {"20"}],
XMLElement["occ", {}, {"30"}]}],
XMLElement["row", {"key" -> "2", "val1" -> "B", "val2" -> "0"},
{XMLElement["occ", {}, {"100"}]}],
XMLElement[ "row", {"key" -> "3", "val1" -> "X","val2" -> "0"},
{XMLElement["occ", {}, {"120"}], XMLElement["occ", {}, {"20"}]}],
XMLElement["row", {"key" -> "4", "val1" -> "C", "val2" -> "11"}, {}]}], {}]
Here we parse into a matrix:
In[51]:= matr =
xml /. XMLObject["Document"][{}, data_, _] :> data /.
XMLElement["data", _, children_] :> children /.
XMLElement["row", attrs_, vals_] :> {"key" /. attrs,
"val1" /. attrs, vals /. XMLElement["occ", _, {val_}] :> val,
"val2" /. attrs} /. s_String :> ToExpression[s]
Out[51]= {{1, A, {100, 20, 30}, 10}, {2, B, {100}, 0}, {3,X, {120, 20}, 0}, {4, C, {}, 11}}
The code is short and economical due to the use of rule-within-a-rule technique. I discuss it here. One nice application of this technique can be found here.
Here is the reverse:
XMLObject["Document"][{}, XMLElement["data", {},
Replace[matr, {key_, val1_, vals_List, val2_} :>
XMLElement[ "row", {"key" -> ToString[key], "val1" -> ToString[val1],
"val2" -> ToString[val2]},
XMLElement["occ", {}, {ToString[#]}] & /# vals], {1}]], {}]
I do not show the result but it is the same as the imported symbolic XML. You can call Export with this symbolic XML and it will know how to handle it.

Related

Can I get AST for wolfram language expressions?

In Mathematica, we use FullForm or TreeForm or Developer'WriteExpressionJSONString to get the syntax details for given expressions. How can I get a complete AST (Abstract Syntax Tree) for any expression? For example, is there any function toAST such that
toAST["a +b c\nSin[%];"]
which will give the result like this:
{
{
Plus,
0,
6,
{a, 0, 1},
{Multiply, 3, 6, {b, 3, 4}, {c, 5, 6}}
},
{
CompoundExpression,
7,
14,
{Sin, 7, 13, {Out, 11, 12}},
{Null, 14, 14}
}
}
Probably your best option ATM is codeparser. It'll be bundled with the next version of Mathematica, but you can use PacletInstall["CodeParser"] to install it for now.
The function you want to use (i.e. for ASTs) is CodeParse. (You can get CSTs with CodeConcreteParse.) The documentation seems to be a bit scarce.
Needs["CodeParser`"];
ast = CodeParse[parseStr]
(* Output *)
ContainerNode[String, {CallNode[
LeafNode[Symbol,
"Plus", <||>], {LeafNode[Symbol,
"a", <|Source -> {{1, 1}, {1, 2}}|>],
CallNode[
LeafNode[Symbol,
"Times", <||>], {LeafNode[Symbol,
"b", <|Source -> {{1, 4}, {1, 5}}|>],
LeafNode[Symbol,
"c", <|Source -> {{1, 6}, {1, 7}}|>]}, <|Source -> {{1, 4}, {1,
7}}|>]}, <|Source -> {{1, 1}, {1, 7}}|>],
CallNode[LeafNode[Symbol,
"CompoundExpression", <||>], {CallNode[
LeafNode[Symbol,
"Sin", <|Source -> {{2, 1}, {2, 4}}|>], {CallNode[
LeafNode[Symbol,
"Out", <||>], {}, <|Source -> {{2, 5}, {2,
6}}|>]}, <|Source -> {{2, 1}, {2, 7}}|>],
LeafNode[Symbol,
"Null", <|Source -> {{2, 8}, {2, 8}}|>]}, <|Source -> {{2,
1}, {2, 8}}|>]}, <||>]
You can use Developer`WriteExpressionJSONString that you mentioned or ExportString[ast, "ExpressionJSON"] to get output pretty close to what you wanted, albeit more verbose (so I've squashed it down here):
ExportString[ast[[2;;]], "ExpressionJSON", Compact -> 3]
(* Output *)
[
"ContainerNode",
[
"List",
[
"CallNode",
["LeafNode","Symbol","'Plus'",["Association"]],
["List",["LeafNode","Symbol","'a'",["Association",["Rule","Source",["List",["List",1,1],["List",1,2]]]]],["CallNode",["LeafNode","Symbol","'Times'",["Association"]],["List",["LeafNode","Symbol","'b'",["Association",["Rule","Source",["List",["List",1,4],["List",1,5]]]]],["LeafNode","Symbol","'c'",["Association",["Rule","Source",["List",["List",1,6],["List",1,7]]]]]],["Association",["Rule","Source",["List",["List",1,4],["List",1,7]]]]]],
["Association",["Rule","Source",["List",["List",1,1],["List",1,7]]]]
],
[
"CallNode",
["LeafNode","Symbol","'CompoundExpression'",["Association"]],
["List",["CallNode",["LeafNode","Symbol","'Sin'",["Association",["Rule","Source",["List",["List",2,1],["List",2,4]]]]],["List",["CallNode",["LeafNode","Symbol","'Out'",["Association"]],["List"],["Association",["Rule","Source",["List",["List",2,5],["List",2,6]]]]]],["Association",["Rule","Source",["List",["List",2,1],["List",2,7]]]]],["LeafNode","Symbol","'Null'",["Association",["Rule","Source",["List",["List",2,8],["List",2,8]]]]]],
["Association",["Rule","Source",["List",["List",2,1],["List",2,8]]]]
]
],
[
"Association"
]
]

Mathematica: Using CheckboxBar to generate a string as a combination of phrases

I started out with something really long and annoying to write like this
Manipulate[
thetext = Switch[Total[obj],
0, Text["", {0, 5}],
1, Text["How", {0, 5}],
2, Text["Does One", {0, 5}],
4, Text["Use CheckboxBar", {0, 5}],
8, Text["=A=", {0, 5}],
3, Text["How Does One", {0, 5}],
5, Text["How Use CheckboxBar", {0, 5}],
9, Text["How =A=", {0, 5}],
6, Text["Does One Use CheckboxBar", {0, 5}],
10, Text["Does One =A=", {0, 5}],
12, Text["Use CheckboxBar =A=", {0, 5}],
7, Text["How Does One Use CheckboxBar", {0, 5}],
11, Text["How Does One =A=", {0, 5}],
13, Text["How Use CheckboxBar =A=", {0, 5}],
14, Text["Does One Use CheckboxBar =A=", {0, 5}],
15, Text["How Does One Use CheckboxBar =A=", {0, 5}]];
Graphics[thetext],
{{obj, {1, 2, 4, 8}, "Text"}, {1 -> "How", 2 -> "Does One",
4 -> "Use CheckBoxBar", 8 -> "=A="}, CheckboxBar}]
But I quickly realized that I could probably replace 1,2,4,8 with four binary digits, ie 0000 = 0, 1101 = 11 etc. So I wrote this function:
g[{d_, f_, g_, h_}] :=
StringJoin[
If[d == 1, "d", ""], If[f == 1, "f", ""], If[g == 1, "g", ""], If[h == 1, "h", ""]];
which works fine. However, CheckboxBar generates a list that only includes entries that are checked. In Addition to that, the order of the checked entries changes based on which one you check first.
Question: How should i make a function that is much shorter/more concise than what I have right now. preferably with checkboxbar, since it was a requirement in my assignment. Without checkboxbar is ok too, since my assignment isnt graded on how concise my code is. I just want to improve it for the sake of improvement.
CheckboxBar has the (sometimes annoying) property that is will add elements to a list in the order you click the boxes. In order to overcome this limitation, you can add some numbers to the list elements you throw into CheckboxBar and use those number to order the phases. So suppose I have the phases phrases = {"I ", "guess ", "this ", "works?"}, then I want to call CheckboxBar like this:
CheckboxBar[
Dynamic[stringList],
{
{{1}, "I "} -> "I ",
{{2}, "guess "} -> "guess ",
{{3}, "this "} -> "this ",
{{4}, "works?"} -> "works?"
}
]
Next, you can get your string by sorting stringList by the first element (i.e, SortBy[stringList, First]) and then stringing the elements together by Applying (##) StringJoin to the list of strings you want to combine. Putting that all together, I propose this solution:
DynamicModule[{
stringList = {},
phrases = {"I ", "guess ", "this ", "works?"}
},
Column[{
CheckboxBar[
Dynamic[stringList],
MapIndexed[{#2, #1} -> #1 &, phrases]
],
Dynamic[StringJoin ## SortBy[stringList, First][[All, 2]]]
}]
]
edit
You can also handle the sorting in the 2nd argument of Dynamic if you want. That's probably a bit cleaner (though a bit more difficult to understand):
DynamicModule[{
stringList = {},
phrases = {"I ", "guess ", "this ", "works?"}
},
Column[{
CheckboxBar[
Dynamic[
stringList,
Function[{val, expr}, stringList = SortBy[val, First]]
],
MapIndexed[{#2, #1} -> #1 &, phrases]
],
Dynamic[StringJoin ## stringList[[All, 2]]]
}]
]

Combining Lists of Word Frequency Data

This seems like it should be an obvious question, but the tutorials and documentation on lists are not forthcoming. Many of these issues stem from the sheer size of my text files (hundreds of MB) and my attempts to boil them down to something manageable by my system. As a result, I'm doing my work in segments and am now trying to combine the results.
I have multiple word frequency lists (~40 of them). The lists can either be taken through Import[ ] or as variables generated in Mathematica. Each list appears as the following and has been generated using the Tally[ ] and Sort[ ] commands:
{{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n", 16850}, {"in",
16164}, {"de", 14930}, {"a", 14660}, {"to", 14175}, {"la", 7347},
{"was", 6030}, {"l", 5981}, {"le", 5735}, <<51293>>, {"abattoir",
1}, {"abattement", 1}, {"abattagen", 1}, {"abattage", 1},
{"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1},
{"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}
Here is an example of the second file:
{{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of", 16262}, {"and",
14488}, {"to", 12726}, {"a", 12635}, {"in", 11141}, {"la", 10739},
{"et", 9016}, {"les", 8675}, {"le", 7748}, <<101032>>,
{"abattement", 1}, {"abattagen", 1}, {"abattage", 1}, {"abated",
1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1}, {"aase", 1},
{"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}
I want to combine them so that the frequency data aggregates: i.e. if the second file has 30,419 occurrences of 'the' and is joined to the first file, it should return that there are 72,635 occurrences (and so on as I move through the entire collection).
It sounds like you need GatherBy.
Suppose your two lists are named data1 and data2, then use
{#[[1, 1]], Total[#[[All, 2]]]} & /# GatherBy[Join[data1, data2], First]
This easily generalizes to any number of lists, not just two.
Try using a hash table, like this. First set things up:
ClearAll[freq];
freq[_] = 0;
Now eg freq["safas"] returns 0. Next, if the lists are defined as
lst1 = {{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n",
16850}, {"in", 16164}, {"de", 14930}, {"a", 14660}, {"to",
14175}, {"la", 7347}, {"was", 6030}, {"l", 5981}, {"le",
5735}, {"abattoir", 1}, {"abattement", 1}, {"abattagen",
1}, {"abattage", 1}, {"abated", 1}, {"abandonn", 1}, {"abaiss",
1}, {"aback", 1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah",
1}, {"aaa", 1}};
lst2 = {{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of",
16262}, {"and", 14488}, {"to", 12726}, {"a", 12635}, {"in",
11141}, {"la", 10739}, {"et", 9016}, {"les", 8675}, {"le",
7748}, {"abattement", 1}, {"abattagen", 1}, {"abattage",
1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback",
1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}};
you may run this
Scan[(freq[#[[1]]] += #[[2]]) &, lst1]
after which eg
freq["the"]
(*
42216
*)
and then the next list
Scan[(freq[#[[1]]] += #[[2]]) &, lst2]
after which eg
freq["the"]
72635
while still
freq["safas"]
(*
0
*)
Here is a direct Sow/Reap function:
Reap[#2~Sow~# & ### data1~Join~data2;, _, {#, Tr##2} &][[2]]
Here is a concise form of acl's method:
Module[{c},
c[_] = 0;
c[#] += #2 & ### data1~Join~data2;
{#[[1, 1]], #2} & ### Most#DownValues#c
]
This appears to be a bit faster than Szabolcs code on my system:
data1 ~Join~ data2 ~GatherBy~ First /.
{{{x_, a_}, {x_, b_}} :> {x, a + b}, {x : {_, _}} :> x}
There's an old saying, "if all you have is a hammer, everything becomes a nail." So, here's my hammer: SelectEquivalents.
This can be done a little quicker using SelectEquivalents:
SelectEquivalents[data1~Join~data2, #[[1]]&, #[[2]]&, {#1, Total[#2]}&]
In order, the first param is obviously just the joined lists, the second one is what they're grouped by (in this case the first element), the third param strips off the string leaving just the count, and the fourth param puts it back together with the string as #1 and the counts in a list as #2.
Try ReplaceRepeated.
Join the lists. Then use
//. {{f1___, {a_, c1_}, f2___, {a_, c2_}, f3___} -> {f1, f2, f3, {a, c1 + c2}}}

How would you do a PivotTable function in Mathematica?

PivotTables in Excel (or, cross tabulations) are quite useful. Has anyone already thought about how to implement a similar function in Mathematica?
I am not familiar with the use of pivot tables, but taking the example on the page linked above, I propose this:
Needs["Calendar`"]
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 150] & /# choices // Transpose;
This creates data that looks like:
{"East", "Girl", "Golf", "03/2011", 6, 12.29`, 6.18`},
{"West", "Boy", "Fancy", "08/2011", 6, 13.01`, 12.39`},
{"North", "Girl", "Golf", "05/2011", 1, 14.87`, 12.89`},
{"East", "Girl", "Golf", "09/2011", 3, 13.99`, 6.25`},
{"North", "Girl", "Golf", "09/2011", 13, 12.66`, 8.57`},
{"East", "Boy", "Fancy", "10/2011", 2, 14.46`, 6.85`},
{"South", "Boy", "Golf", "11/2011", 13, 12.45`, 11.23`}
...
Then:
h1 = Union#data[[All, "Region" /. key]];
h2 = Union#data[[All, "Ship Date" /. key]];
Reap[
Sow[#[[{"Units", "Ship Date"} /. key]], #[["Region" /. key]]] & ~Scan~ data,
h1,
Reap[Sow ### #2, h2, Total # #2 &][[2]] &
][[2]];
TableForm[Join ## %, TableHeadings -> {h1, h2}]
This is a rough example, but it gives an idea of how this may be done. If you have more specific requirements I will attempt to address them.
Here is an update in the manner of Sjoerd's answer.
The Manipulate block is largely copied, but I believe my pivotTableData is more efficient, and I sought to localize symbols correctly, since this is now presented as usable code rather than a rough example.
I start with the same sample data, but I embed the field headings, since I feel this is more representative of normal use.
data = ImportString[#, "TSV"][[1]] & /# Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];
data = Transpose[{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];
PrependTo[data,
{"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marriatal status"}
];
My pivotTableData is self contained.
pivotTableData[data_, field1_, field2_, dependent_, op_] :=
Module[{key, sow, h1, h2, ff},
(key## = #2[[1]]) & ~MapIndexed~ data[[1]];
sow = #[[key /# {dependent, field2}]] ~Sow~ #[[key#field1]] &;
{h1, h2} = Union#data[[2 ;;, key##]] & /# {field1, field2};
ff = # /. {{} -> Missing#"NotAvailable", _ :> op ## #} &;
{
{h1, h2},
Join ## Reap[sow ~Scan~ Rest#data, h1, ff /# Reap[Sow ### #2, h2][[2]] &][[2]]
}
]
pivotTable relies only on pivotTableData:
pivotTable[data_?MatrixQ] :=
DynamicModule[{raw, t, header = data[[1]], opList =
{Mean -> "Mean of \[Rule]",
Total -> "Sum of \[Rule]",
Length -> "Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]",
Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"}},
Manipulate[
raw = pivotTableData[data, f1, f2, f3, op];
t = ConstantArray["", Length /# raw[[1]] + 2];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{f2, header}];
t[[2, 1]] = Control[{f1, header}];
t[[1, 2]] = Control[{f3, header}];
{{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
TableView[N#t, Dividers -> All],
Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
]
]
Use is simply:
pivotTable[data]
A quick-and-dirty pivot table visualization:
I'll start with a more interesting real-life data set:
data = ImportString[#, "TSV"][[1]] & /#
Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
];
A bit of post-processing:
data =
{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical",
4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}\[Transpose];
header = {"Education", "South", "Sex", "Experience", "Union", "Wage",
"Age", "Race", "Occupation", "Sector", "Marriatal status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /# Transpose[data];
levelLength = Length /# levelNames;
Now for the real stuff. It also uses the function SelectEquivalents defined in What is in your Mathematica tool bag?
pivotTableData[levelName1_, levelName2_, dependent_, op_] :=
Table[
SelectEquivalents[data,
FinalFunction -> (If[Length[#] == 0, Missing["NotAvailable"], op[# // Flatten]] &),
TagPattern ->
_?(#[[headerNumber[levelName1]]] == levelMember1 &&
#[[headerNumber[levelName2]]] == levelMember2 &),
TransformElement -> (#[[headerNumber[dependent]]] &)
],
{levelMember1, levelNames[[headerNumber[levelName1]]]},
{levelMember2, levelNames[[headerNumber[levelName2]]]}
]
DynamicModule[
{opList =
{Mean ->"Mean of \[Rule]", Total ->"Sum of \[Rule]", Length ->"Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]", Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"
}, t},
Manipulate[
t=Table["",{levelLength[[headerNumber[h1]]]+2},{levelLength[[headerNumber[h2]]]+2}];
t[[3 ;; -1, 1]] = levelNames[[headerNumber[h1]]];
t[[2, 3 ;; -1]] = levelNames[[headerNumber[h2]]];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{h2, header}];
t[[2, 1]] = Control[{h1, header}];
t[[1, 2]] = Control[{h3, header}];
t[[3 ;; -1, 3 ;; -1]] = pivotTableData[h1, h2, h3, op] // N;
TableView[t, Dividers -> All],
Initialization :> {op = Mean, h1 = "Sector", h2 = "Union", h3 = "Wage"}
]
]
There's still a bit of work to do. The DynamicModule should be turned into a fully standalone function, with the header stuff more streamlined, but this should be sufficient for a first impression.
#Mr.Wizard's answer is indeed robust and long-lasting as it grounds on ReapSow method suitable for some map reduce jobs in Mathematica. Due to the fact that MMA itself develops, consider a new option as well.
GroupBy (introduced in Mathematica v.10.0) provides a generalization of the map reduce operation.
So, the above data job may be implemented as follows (partly an overkill for readability):
headings = Union # data[[All, #]] & /# {1, 4}
{{"East", "North", "South", "West"}, {"01/2011", "02/2011", "03/2011",
"04/2011", "05/2011", "06/2011", "07/2011", "08/2011", "09/2011",
"10/2011", "11/2011", "12/2011"}}
We may use Outer to set up a rectangular template for TableForm:
template = Outer[List, Apply[Sequence][headings]];
Main job with GroupBy and Total as third argument:
pattern = Append[Normal #
GroupBy[data, (#[[{1, 4}]] &) -> (#[[-1]] &), Total],
_ -> Null];
Finally, inject pattern into template (and apply TableForm headings for beauty):
TableForm[Replace[template, pattern, {2}], TableHeadings -> headings]
This outputs some:
Note: we have made a total of last column in data. (Many other aggregations are, of course, possible.)
Use http://www.wolfram.com/products/applications/excel_link/ , this way you have the best of both worlds. This product creates a flawless link between Excel and mma, 2-ways.
Here's what I've come up with. It uses the function SelectEquivalents defined in What is in your Mathematica tool bag?. Function1 and Function2 are meant to have different grouping possibilities of criteria1 and criteria2. FilterFunction is here in order to define an arbitrary filter formula on the data based on the header names.
Using the data example of Mr. Wizard here are some usages of this function.
criteria={"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
criteria1 = "Region";
criteria2 = "Ship Date";
consideredData = "Units";
PivotTable[data,criteria,criteria1,criteria2,consideredData]
A neat example
function2 = If[ToExpression#StringTake[#, 2] <= 6, "First Semester", "Second Semester"] &;
PivotTable[data,criteria,criteria1,criteria2,consideredData,FilterFunction->("Gender"=="Girl"&&"Units"*"Price"<=100&),Function2->function2]
Here's the definition of the function
keysToIndex[keys_] :=
Module[{keyIndex},
(keyIndex[#1] = #2[[1]])&~MapIndexed~keys;
keyIndex
];
InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
Options[PivotTable]={Function1->Identity,Function2->Identity,FilterFunction->(True &),AggregationFunction->Total,FormatOutput->True};
PivotTable[data_,criteria_,criteria1_,criteria2_,consideredData_,OptionsPattern[]]:=
Module[{criteriaIndex, criteria1Index, criteria2Index, consideredDataIndex, criteria1Function, criteria2Function, filterFunctionTranslated, filteredResult, keys1, keys1Index, keys2, keys2Index, resultTable, function1, function2, filterFunction, aggregationFunction, formatOutput,p,sharp},
function1 = OptionValue#Function1;
function2 = OptionValue#Function2;
filterFunction = OptionValue#FilterFunction;
aggregationFunction = OptionValue#AggregationFunction;
formatOutput=OptionValue#FormatOutput;
criteriaIndex=keysToIndex[criteria];
criteria1Index=criteriaIndex#criteria1;
criteria2Index=criteriaIndex#criteria2;
consideredDataIndex=criteriaIndex#consideredData;
criteria1Function=Composition[function1,#[[criteria1Index]]&];
criteria2Function=Composition[function2,#[[criteria2Index]]&];
filterFunctionTranslated = filterFunction/.(# -> p[sharp, criteriaIndex##]& /# criteria /. sharp -> #)/.p->Part;
filteredResult=
SelectEquivalents[
data
,
TagElement->({criteria1Function##,criteria2Function##,filterFunctionTranslated##}&)
,
TransformElement->(#[[consideredDataIndex]]&)
,
TagPattern->_?(#[[3]]&)
,
TransformResults->(Append[Most##1,aggregationFunction##2]&)
];
If[formatOutput,
keys1=filteredResult[[All,1]]//Union//Sort;
keys2=filteredResult[[All,2]]//Union//Sort;
resultTable=
SelectEquivalents[
filteredResult
,
TagElement->(#[[{1,2}]]&)
,
TransformElement->(#[[3]]&)
,
TagPattern->Flatten[Outer[List, keys1, keys2], 1]
,
FinalFunction-> (InverseFlatten[Flatten[#/.{}->Missing[]],{Length#keys1,Length#keys2}]&)
];
TableForm[resultTable,TableHeadings->{keys1,keys2}]
,
filteredResult
]
];
I little latter in the game. Here is another self contained solution with object like form.
Using random data created by #Mr.Wizard:
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 5000] & /# choices // Transpose;
Using an MapIndexed and SparseArray as key functions, here is the code:
Options[createPivotTable]={"RowColValueHeads"-> {1,2,3},"Function"-> Total};
createPivotTable[data_,opts:OptionsPattern[{createPivotTable}]]:=Module[{r,c,v,aggDataIndex,rowRule,colRule,pivot},
{r,c,v}=OptionValue["RowColValueHeads"];
pivot["Row"]= Union#data[[All,r]];
pivot["Col"]= Union#data[[All,c]];
rowRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Row"]];
colRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Col"]];
aggDataIndex={#[[1,r]]/.rowRule,#[[1,c]]/.colRule}->OptionValue["Function"]##[[All,v]]&/#GatherBy[data,#[[{r,c}]]&];
pivot["Data"]=Normal#SparseArray#aggDataIndex;
pivot["Properties"]={"Data","Row","Col"};
pivot["Table"]=TableForm[pivot["Data"], TableHeadings -> {pivot["Row"], pivot["Col"]}];
Format[pivot]:="PivotObject";
pivot
]
That you can use as:
pivot=createPivotTable[data,"RowColValueHeads"-> ({"Ship Date","Region","Units"}/.key)];
pivot["Table"]
pivot["Data"]
pivot["Row"]
pivot["Col"]
To get:
I believe that the speed is faster than #Ms.Wizard, but I have to make a better test, and don't have time now.

Can I check with the Stackoverflow API which SO answerers are sleep-deprived?

In how-do-i-access-the-stackoverflow-api-from-mathematica I outlined how one could use the SO API to get Mathematica to make some interesting reputation graphs of top answerers. Could this API also be used to provide some privacy-invading insights in the answerers' nocturnal habits?
Certainly, for instance using this MMA8 code:
getActionDates[userID_Integer] :=
Module[{total},
total =
"total" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <> "/timeline?pagesize=1&page=1", "JSON"];
DateList[# + AbsoluteTime["January 1, 1970"]] & /# Join ##
Table[
"creation_date" /. ("user_timelines" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <> "/timeline?pagesize=100&page=" <>
ToString[p], "JSON"])
, {p, Ceiling[total/100]}
]
]
makeWeekHistogram[userID_Integer] :=
Module[{dates2Positions},
dates2Positions =
ToExpression[
DateString[#, {"{", "DayNameShort", "+", "Hour", "+", "Minute",
"/60./.{Sun->0,Mon->24,Tue->2*24,Wed->3*24,Thu->4*24,Fri->5*\
24,Sat->6*24}}"}]] & /# getActionDates[userID] // Flatten;
Histogram[dates2Positions, {1}, "Count",
GridLines -> {Table[24 i, {i, 1, 6}], None},
BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16},
FrameTicks -> {{Automatic,
None}, {{{12, "Sun"}, {24 + 12, "Mon"}, {2 24 + 12,
"Tue"}, {3 24 + 12, "Wed"}, {4 24 + 12, "Thu"}, {5 24 + 12,
"Fri"}, {6 24 + 12, "Sat"}}, None}},
FrameLabel -> {"Day of week", "Number of actions",
First["display_name" /. ("users" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID], "JSON"])], ""}, Frame -> True,
PlotRangePadding -> 0]
]
makeDayHistogram[userID_Integer] :=
Module[{dates2Positions},
dates2Positions =
ToExpression[DateString[#, {"Hour", "+", "Minute", "/60."}]] & /#
getActionDates[userID] // Flatten;
Histogram[dates2Positions, {1}, "Count",
FrameTicks -> {{Automatic,
None}, {Table[{i + 0.5, i}, {i, 0, 20, 5}], None}},
BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16},
FrameLabel -> {"Hour", "Number of actions",
First["display_name" /. ("users" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID], "JSON"])], ""}, Frame -> True,
PlotRangePadding -> 0]
]
Of course, we only have server time and dates, but the pattern should tell something about localisation, not? Although... Mr.Wizard... you got no life!
makeWeekHistogram[353410]
EDIT
Hourly histogram requested by Mr.Wizard:

Resources