How do I access the StackOverflow API from Mathematica - wolfram-mathematica

I was wondering the other day if StackOverflow had an API I could access from Mathematica, and apparently it does: "Saving plot annotations"
What's the best way to get data from StackOverflow into Mathematica? Sjoerd used the information to make a plot. I'm interested in adding SO-related notifications into a docked cell I keep in my notebooks, so I can tell when there are updates or responses without leaving Mathematica.

By popular demand, the code to generate the top-10 SO answerers plot (except annotations) using the SO API (it's a pretty neat and complete API; lots of goodies there. Easy too - see my code).
Update: added App-key to ensure the code co-operates better with the SO-API (higher daily call cap). Please use it only for this app.
April 2011
August 2011
MMA 8 version! MMA7 version further down
getRepChanges[userID_Integer] :=
Module[{totalChanges},
totalChanges =
"total" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <> "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=1&page=1",
"JSON"
];
Join ##
Table[
"rep_changes" /.
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <>
"/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=100&page="
<> ToString[page],
"JSON"
],
{page, 1, Ceiling[totalChanges/100]}
]
]
topAnswerers =
({"display_name","user_id", "email_hash"} /. #) & /#
("user" /.
("top_users" /.
Import[
"http://api.stackoverflow.com/1.1/tags/mathematica/top-answerers/all-time",
"JSON"
]
)
)
topAnswerers = {#, #2,
Import["http://www.gravatar.com/avatar/" <> #3 <> ".jpg?s=36&d=identicon&d=identicon"]
} & ### topAnswerers
repChangesTopUsers =
Table[
repChange =
ReleaseHold[
(
Hold[
{
DateList["on_date" + AbsoluteTime["January 1, 1970"]],
"positive_rep" - "negative_rep"
}
] /. #
) & /# getRepChanges[userID]
] // Sort;
accRepChange = {repChange[[All, 1]],Accumulate[repChange[[All, 2]]]}\[Transpose],
{userID, topAnswerers[[All, 2]]}
];
pl = DateListLogPlot[
Tooltip ###
Take[({repChangesTopUsers, Row /# topAnswerers[[All, {3, 1}]]}\[Transpose]),
10], Joined -> True, Mesh -> None, ImageSize -> 1000,
PlotRange -> {All, {10, All}},
BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16},
DateTicksFormat -> {"MonthNameShort", " ", "Year"},
GridLines -> {True, None},
FrameLabel -> (Style[#, FontSize -> 18] & /# {"Date", "Reputation",
"Top-10 answerers", ""})]
EDIT
Note that you can plot up to and including a top-20 by changing the value in the Take function. It gets busy pretty soon.
Tried to improve the readability of Markup code somewhat. I'm afraid this will yield some spurious spaces when copied.
EDIT
Page size back to 100 elements/page ==> fewer API calls
Please note that the first call to the API is to determine the amount of posts the user has. This data is present no matter the page size, so this is preferably chosen small (10 or so, possibly 1, didn't check). Then the data is fetched in successive pages until the last page is reached. You can use the maximum page size (100) for that. Just take care that the maximum number of pages in the loop count is adjusted accordingly.
EDIT: better MMA 7 code (Fri Apr 22)
MMA 7 doesn't do JSON imports, so I do a text import instead followed by a bare-bones JSON translation. I've tested this version several times now (in MMA 8) and it seems to work without the errors I got yesterday.
getRepChanges[userID_Integer] :=
Module[{totalChanges},
totalChanges =
"total" /.
ImportString[
StringReplace[(Import[
"http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <>
"/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=1&page=1", "Text"]), {":" ->
"->", "[" -> "{", "]" -> "}"}], "NB"];
Join ##
Table["rep_changes" /.
ImportString[
StringReplace[
Import["http://api.stackoverflow.com/1.1/users/" <>
ToString[userID] <>
"/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=100&page=" <> ToString[page],
"Text"], {":" -> "->", "[" -> "{", "]" -> "}"}],
"NB"], {page, 1, Ceiling[totalChanges/100]}]]
topAnswerers = ({"display_name", "user_id",
"email_hash"} /. #) & /# ("user" /. ("top_users" /.
ImportString[
StringReplace[
" " <> Import[
"http://api.stackoverflow.com/1.1/tags/mathematica/top-answerers/all-time", "Text"], {":" -> "->", "[" -> "{", "]" -> "}"}],
"NB"]))
topAnswerers = {#, #2,
Import["http://www.gravatar.com/avatar/" <> #3 <>
".jpg?s=36&d=identicon&d=identicon"]} & ### topAnswerers
repChangesTopUsers =
Table[repChange =
ReleaseHold[(Hold[{DateList[
"on_date" + AbsoluteTime["January 1, 1970"]],
"positive_rep" - "negative_rep"}] /. #) & /#
getRepChanges[userID]] // Sort;
accRepChange = {repChange[[All, 1]],
Accumulate[repChange[[All, 2]]]}\[Transpose], {userID,
topAnswerers[[All, 2]]}];
DateListLogPlot[
Tooltip ###
Take[({repChangesTopUsers,
Row /# topAnswerers[[All, {3, 1}]]}\[Transpose]), 10],
Joined -> True, Mesh -> None, ImageSize -> 1000,
PlotRange -> {All, {10, All}},
BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16},
DateTicksFormat -> {"MonthNameShort", " ", "Year"},
GridLines -> {True, None},
FrameLabel -> (Style[#, FontSize -> 18] & /# {"Date", "Reputation",
"Top-10 answerers", ""})]
EDIT: auxiliary functions to filter on post tags
These functions can be used to filter reputation gains, in order to find gains for certain tags only.
tagLookup gets a post_ID integer as input and yields the specific post's tags. getQuestionIDs and getAnswerIDsFrom... go the other way. Given a tag they find all the question and answer IDs so that one can test with MemberQ whether a given post_ID belongs to this tag. Both tagLookup and getAnswerIDs are slow since many API calls are necessary. I couldn't test the last two function as either API access is down or my IP has been capped.
tagLookup[postID_Integer] :=
Module[{im},
im = Import["http://api.stackoverflow.com/1.1/questions/" <> ToString[postID],"JSON"];
If[("questions" /. im) != {},
First[("tags" /. ("questions" /. im))],
im = Import["http://api.stackoverflow.com/1.1/answers/" <> ToString[postID],"JSON"];
First[("tags" /. ("questions" /. Import["http://api.stackoverflow.com/1.1/questions/" <>
ToString[First["question_id" /. ("answers" /. im)]], "JSON"]))]
]
]
getQuestionIDs[tagName_String] := Module[{total},
total =
"total" /.
Import["http://api.stackoverflow.com/1.1/questions?tagged=" <>
tagName <> "&pagesize=1", "JSON"];
Join ##
Table[("question_id" /. ("questions" /.
Import["http://api.stackoverflow.com/1.1/questions?key=NgVJ4Y6vFkuF-oqI-eOvOw&tagged=" <>
tagName <> "&pagesize=100&page=" <> ToString[i],
"JSON"])), {i, 1, Ceiling[total/100]}]
]
getAnswerIDsFromQuestionID[questionID_Integer] :=
Module[{total},
total =
Import["http://api.stackoverflow.com/1.1/questions/" <>
ToString[questionID] <> "/answers?key=NgVJ4Y6vFkuF-oqI-eOvOw&pagesize=1", "JSON"];
If[total === $Failed, Return[$Failed], total = "total" /. total];
Join ## Table[
"answer_id" /. ("answers" /.
Import["http://api.stackoverflow.com/1.1/questions/" <>
ToString[questionID] <> "/answers?key=NgVJ4Y6vFkuF-oqI-eOvOw&pagesize=100&page=" <>
ToString[i], "JSON"]), {i, 1, Ceiling[total/100]}]
]
getAnswerIDsFromTag[tagName_String] :=
Module[{},
Join ## (getAnswerIDsFromQuestionID /#
Cases[getQuestionIDs[tagName], Except[$Failed]])
]

Brett, unrelated to SO API, but you could use RSS feed for the newest Mathematica-tagged questions. Here is my naive implementation:
QuestionHyperlink[data_] :=
Function[{name, title, link},
Hyperlink[Tooltip[title, name], link]] ## Join[
Cases[data,
XMLElement[
"author", _, {___, XMLElement["name", {}, {name_}], ___}] :>
name],
Cases[data, XMLElement["title", _, {title_}] :> title],
Cases[data, XMLElement["link", rules_, {}] :> ("href" /. rules)]]
Cases[Import[
"http://stackoverflow.com/feeds/tag?tagnames=mathematica&sort=\
newest", "XML"],
XMLElement["entry", attrs_, data_] :>
QuestionHyperlink[data], Infinity]

Related

any trick to add a Divider inside a Grid between rows at the point it is needed

To add a Divider between rows in a Grid, one way I know how to do it is by using the Dividers option. But this is used at the end, and one must then know the row number to which a divider needs to be below it. So, for large grid, I find myself using trial and error until I find the correct row number. Then when I change the grid later, I have to do the trial and error again to put the divider in the correct place in case it moved after my changes.
In this example, there is a grid with 3 rows and I want to add divider below say the second row, so I can do this:
Grid[{
{1},
{2},
{3}
}, Frame -> True, Dividers -> {All, 3 -> True}]
Another way, is to put False and True, in correct order where I want a divider, like this
Grid[{
{1},
{2},
{3}
}, Frame -> True, Dividers -> {All, {True, False, True, True}}]
It would be nice if I could do something like this (like one can do for Manipulate) (ofcourse this below does not work here)
Grid[{
{1},
{2},
Delimiter,
{3}
}, Frame -> True]
or
Grid[{
{1},
{Item[2,Dividers->True]},
{3}
}, Frame -> True]
or such thing.
It will make code maintenance easier.
I looked at using Item and such for this, but not able to figure it out.
Any one knows of a trick to do it?
edit:
btw, This trick if possible, does not only apply to Dividers. But it would be useful to be able to do many of the other Grid options, which now is done at the Grid level, be done as well at the item level. For example, if I want to add an extra space after some row, it will easier to say so right at the pace I wanted this done. Also, if I wanted to change the Item size, easier to do it at the spot, same for Spacings, etc.. So that when moving/copying code for a whole row or Item, it is self contained and copy it with all its options together.
I think now this might require a new option add to Mathematica Grid to make it work right and be consistent with the over all grid design.
This is all until a GUI builder is made for Mathematica.
I find that I spend more than 60% of my time when I write a demo getting the GUI to fit and look right. With a GUI builder, I can spend this time working on the algorithm instead. When I use Matlab GUIDE to make GUI, it takes me less than 5% of my time to make a similar GUI.
I wish that WRI would make a GUI builder for Mathematica, I think this will be the killer app for Mathematica if you ask me. But no body asked me :)
edit(2)
comment on Mr Wizard nice solution below.
I mainly wanted this feature for the Grids I use to layout controls for Manipulate. Here is a simple example:
Manipulate[x,
Grid[{
{Control[{{x, 0, "x"}, 0, 10, 1}]},
{Control[{{y, 0, "y"}, 0, 10, 1}]}
}, Frame -> None, Dividers -> {None, {False, True, False}}
]
]
(and I have to use Grid for setting up the controls). I can't use function calls here. I can't write, using Wizard's solution below, the following:
Manipulate[x,
myGrid[{
{Control[{{x, 0, "x"}, 0, 10, 1}]},
spec["Divider"],
{Control[{{y, 0, "y"}, 0, 10, 1}]}
}, Frame -> None
],
Initialization :>
{
specfunc["Divider", lst_] := Dividers -> {None, # -> True & /# lst};
myGrid[dat_, opts : OptionsPattern[]] :=
Module[{x = 1},
Grid[#, opts, Sequence ## #2] & ##
Reap[If[MatchQ[#, _spec], Sow[x, #[[1]]]; ## &[], x++; #] & /#
dat, _, specfunc]
]
}
]
This gives an error, since Mathematica tries to first read the body of Manipulate to parse it, BEFORE reading and processing the initialization section.
But outside Manipulate, it will ofcourse work:
myGrid[{
{Control[{{x, 0, "x"}, 0, 10, 1}]},
spec["Divider"],
{Control[{{y, 0, "y"}, 0, 10, 1}]}
}, Frame -> None
]
specfunc["Divider", lst_] := Dividers -> {None, # -> True & /# lst};
myGrid[dat_, opts : OptionsPattern[]] :=
Module[{x = 1},
Grid[#, opts, Sequence ## #2] & ##
Reap[If[MatchQ[#, _spec], Sow[x, #[[1]]]; ## &[], x++; #] & /#
dat, _, specfunc]
]
I need to spend more time on this, to see if I can get it to work inside Manipulate.
btw, getting stuff like this to work inside Manipulate is really hard. Only trick I know is using macros with the With[{},.... Grid....] pattern I learned from Leonid.
For example of such difficulties, see this question of mine
How to define constants for use with With[] in one place and then apply them later?
edit(3)
My be I doing something wrong, but I am getting some errors inside Manipulate:
first example:
Manipulate[x,
Evaluate#grid[{
{Control[{{x, 0, "x"}, 0, 10, 1}]}
}
],
Initialization :>
{
grid[tab_, opts___] :=
Module[{divlocal, divglobal, div, pos},
divglobal = (Dividers /. opts) /. Dividers -> {False, False};
If[Depth[divglobal] == 1, divglobal = {divglobal, divglobal}];
If[Length[divglobal] == 1, AppendTo[divglobal, False]];
pos = Position[tab, Dividers -> _, 1];
divlocal =
MapIndexed[# - #2[[1]] + 1 -> Dividers /. tab[[#]] &,
Flatten[pos]];
divglobal[[2]] = {divglobal[[2]], divlocal};
Grid[Delete[tab, pos], Dividers -> divglobal, opts]];
}
]
gives error:
ReplaceAll::rmix: Elements of {False,{}} are a mixture of lists and nonlists. >>
same if I replace the above with
Evaluate#grid[{
Dividers -> {Thick, Blue},
{Control[{{x, 0, "x"}, 0, 10, 1}]}
}
],
Tried Dynamic# in place of Evaluate# no luck. may be small fix is all what is needed? or I am not using it right?
This solutions should allow you to combine specifications for dividers between rows specified within the table with those specified using the Dividers option.
grid[tab_, opts___] :=
Module[{divlocal, divglobal, div, pos},
(* extract option value of Dividers from opts to divglobal *)
(* default value is {False, False} *)
divglobal = (Dividers /. {opts}) /. Dividers -> {False, False};
(* transform divglobal so that it is in the form {colspecs, rowspecs} *)
If[Head[divglobal] =!= List, divglobal = {divglobal, divglobal}];
If[Length[divglobal] == 1, AppendTo[divglobal, False]];
(* Extract positions of dividers between rows from tab *)
pos = Position[tab, Dividers -> _, 1];
(* Build list of rules for divider specifications between rows *)
divlocal = MapIndexed[# - #2[[1]] + 1 -> Dividers /. tab[[#]] &, Flatten[pos]];
(* Final settings for dividers are {colspecs, {rowspecs, divlocal}} *)
divglobal[[2]] = {divglobal[[2]], divlocal};
Grid[Delete[tab, pos], Dividers -> divglobal, opts]]
To specify a divider between rows you need to insert Dividers->spec at the desired position where spec is either False, True, or a graphics directive (colour, thickness, etc.). For example
tab = {{1, 2, 3}, Dividers -> {Thick, Blue}, {4, 5, 6}, {7, 8, 9},
Dividers -> False, {10, 11, 12}};
grid[tab, Dividers -> All]
Edit
I've added some comments to my code at the request of Mr. Wizard.
I propose that you use a new head that can hold instructions (spec below), and a new function that processes these instructions (specfunc below) as individually required. The benefit is that this is easily generalized for different interleaved instructions, and each can be processed in an arbitrary way.
specfunc["Divider", lst_] := Dividers -> {All, # -> True & /# lst}
myGrid[dat_, opts:OptionsPattern[]] :=
Module[{x = 1},
Grid[#, opts, Sequence ## #2] & ##
Reap[
If[MatchQ[#, _spec], Sow[x, #[[1]]]; ## &[], x++; #] & /# dat,
_,
specfunc
]
]
Usage:
dat =
{
{1, 2, 3},
{4, 5, 6},
spec["Divider"],
{7, 8, 9},
spec["Divider"],
{"a", "b", "c"}
};
myGrid[dat, Frame -> True]
If each instruction can be a single string such as "Divider" and you have no conflict using them this way, you could eliminate spec and use MatchQ[#, _String] and Sow[x, #].
Addressing your updated question, as I noted in a comment below I believe that it makes more sense to use the most basic elements in your final Manipulate object, and write tools to help you generate this object most easily. I believe that attempting to make these kinds of customization inside the Manipulate block itself is doomed to fail, and probably in weird and opaque ways.
Nevertheless, for this particular case this appears to work, though I doubt it is robust:
Manipulate[x,
Evaluate[
{specfunc["Divider", lst_] := Dividers -> {All, # -> True & /# lst};
myGrid[dat_, opts : OptionsPattern[]] :=
Module[{x = 1},
Grid[#, opts, Sequence ## #2] & ##
Reap[If[MatchQ[#, _spec], Sow[x, #[[1]]]; ## &[], x++; #] & /#
dat, _, specfunc]]};
myGrid[{{Control[{{x, 0, "x"}, 0, 10, 1}]},
spec["Divider"], {Control[{{y, 0, "y"}, 0, 10, 1}]}},
Frame -> True]
]
]

histogram without vertical lines in Mathematica

I am trying to make an histogram without vertical lines. I'd like to have a plot which looks like a function. Like this:
The same question has been asked for R before ( histogram without vertical lines ) but I'm on Mathematica.
I have been looking into the ChartStyle options without success.
You could also use ListPlot with InterpolationOrder->0:
(* example data *)
data = RandomVariate[NormalDistribution[], 10^3];
hist = HistogramList[data, {.5}];
ListPlot[Transpose[{hist[[1]], ArrayPad[hist[[2]], {0, 1}, "Fixed"]}],
InterpolationOrder -> 0,
Joined -> True,
AxesOrigin -> {hist[[1, 1]], 0}]
There probably are ways to do this by fiddling with EdgeForm[] and FaceForm[] in Histogram, but I've found it simpler to roll one on my own, whenever I need it. Here's a very simple and quick example:
histPlot[data_, bins_, color_: Blue] := Module[{
countBorder =
Partition[Riffle[Riffle[#1, #1[[2 ;;]]], Riffle[#2, #2]], 2] & ##
HistogramList[data, bins, "PDF"]
},
ListLinePlot[countBorder, PlotStyle -> color]
]
Doing histPlot[RandomReal[NormalDistribution[],{1000}],{-3,3,0.1}] gives
You can then extend this to take any option instead of just "PDF", and for cases when you'd like to choose the bins automatically. I dislike automatic binning, because I like to control my bin widths and extents for predictability and easy comparison against other plots.
Here are two methods that work in version 7, using post-processing:
rdat = RandomReal[NormalDistribution[0, 1], 200];
MapAt[
{Blue,
Line[# /. {{Rectangle[{x_, y_}, {X_, Y_}]}} :> Sequence[{x, Y}, {X, Y}]] } &,
Histogram[rdat, PerformanceGoal -> "Speed"],
{1, 2, 2, 2}
]
Cases[
Histogram[rdat, PerformanceGoal -> "Speed"],
Rectangle[{x_, y_}, {X_, Y_}] :> {{x, Y}, {X, Y}},
\[Infinity]
];
Graphics[Line[Join ## %], AspectRatio -> 1/GoldenRatio, Axes -> True]

Nested Manipulate in Mathematica

Please consider:
Function[subID,
pointSO[subID] = RandomInteger[{1, 4}, {5, 2}]] /# {"subA", "subB"};
Manipulate[
Manipulate[
Graphics[{
Black, Rectangle[{0, 0}, {5, 5}],
White,Point#pointSO[subID][[i]]
},
ImageSize -> {400, 300}],
{i,Range[Length#pointSO[subID]]}],
{subID, {"subA", "subB"}}]
Provided that pointSO[subID] actually yields to lists of different length, is there a way to avoid having 2 Manipulate given that one of the manipulated variable depends on the other?
I am not sure that I got exactly what you are asking for, but I figured what you want is something like the following:
Given a UI with one variable, say an array that can change in size, and another (dependent) variable, which represents say an index into the current array that you want to use from the UI to index into the array.
But you do not want to fix the index variable layout in the UI, since it depends, at run time, on the size of the array, which can change using the second variable.
Here is a one manipulate, which has a UI that has an index control variable, which updates dynamically on the UI as the size of the array changes.
I used SetterBar for the index (the dependent variable) but you can use slider just as well. SetterBar made it more clear on the UI what is changing.
When you change the length of the array, the index control variable automatically updates its maximum allowed index to be used to match the current length of the array.
When you shrink the array, the index will also shrink.
I am not sure if this is what you want, but if it, you can adjust this approach to fit into your problem
Manipulate[
Grid[{
{Style[Row[{"data[[", i, "]]=", data[[i]]}], 12]},
{MatrixForm[data], SpanFromLeft}
},
Alignment -> Left, Spacings -> {0, 1}
],
Dynamic#Grid[{
{Text["select index into the array = "],
SetterBar[Dynamic[i, {i = #} &], Range[1, Length[data]],
ImageSize -> Tiny,
ContinuousAction -> False]
},
{
Text["select how long an array to build = "],
Manipulator[
Dynamic[n, {n = #; If[i > n, i = n];
data = Table[RandomReal[], {n}]} &],
{1, 10, 1}, ImageSize -> Tiny, ContinuousAction -> False]
, Text[Length[data]], SpanFromLeft
}
}, Alignment -> Left
],
{{n, 2}, None},
{{i, 2}, None},
{{data, Table[RandomReal[], {2}]}, None},
TrackedSymbols -> {n, i}
]
update 8:30 PM
fyi, just made a fix to the code above to add a needed extra logic.
To avoid the problem of i being too large when switching lists, you could add an If[] statement at the beginning of the Manipulate, e.g.
Clear[pointSO];
MapThread[(pointSO[#] = RandomInteger[{1, 4}, {#2, 2}]) &,
{{"subA", "subB"}, {5, 7}}];
Manipulate[
If[i > Length[pointSO[subID]], i = Length[pointSO[subID]]];
Graphics[{Black, Rectangle[{0, 0}, {5, 5}], White,
Point#pointSO[subID][[i]]}, ImageSize -> {400, 300}],
{{subID, "subA"}, {"subA", "subB"}, SetterBar},
{{i, {}}, Range[Length#pointSO[subID]], SetterBar}]
Maybe nicer is to reset i when switching between lists. This can be done by doing something like
Manipulate[
Graphics[{Black, Rectangle[{0, 0}, {5, 5}], White,
Point#pointSO[subID][[i]]}, ImageSize -> {400, 300}],
{{subID, "subA"},
SetterBar[Dynamic[subID, (i = {}; subID = #) &], {"subA", "subB"}] &},
{{i, {}}, Range[Length#pointSO[subID]], SetterBar}
]
An alternative implementation that preserves selection settings for each data set:
listlength["subA"] = 5; listlength["subB"] = 9;
Function[subID,
pointSO[subID] =
RandomInteger[{1, 4}, {listlength[subID], 2}]] /# {"subA", "subB"};
Manipulate[
Graphics[{Black, Rectangle[{0, 0}, {5, 5}],
Dynamic[If[subID == "subA", Yellow, Cyan]], PointSize -> .05,
Dynamic#Point#pointSO[subID][[k]]}, ImageSize -> {400, 300}],
Row[{Panel[
SetterBar[
Dynamic[subID,
(subID = #; k = If[subID == "subA", j, i]) &],{"subA", "subB"},
Appearance -> "Button", Background -> GrayLevel[.8]]], " ",
PaneSelector[{"subA" ->
Dynamic#Panel[
SetterBar[Dynamic[j, (k = j; j = #) &],
Range[Length#pointSO["subA"]], Appearance -> "Button",
Background -> Yellow]],
"subB" ->
Dynamic#Panel[
SetterBar[Dynamic[i, (k = i; i = #) &],
Range[Length#pointSO["subB"]], Appearance -> "Button",
Background -> Cyan]]}, Dynamic[subID]]}]]
Output examples:

How would you do a PivotTable function in Mathematica?

PivotTables in Excel (or, cross tabulations) are quite useful. Has anyone already thought about how to implement a similar function in Mathematica?
I am not familiar with the use of pivot tables, but taking the example on the page linked above, I propose this:
Needs["Calendar`"]
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 150] & /# choices // Transpose;
This creates data that looks like:
{"East", "Girl", "Golf", "03/2011", 6, 12.29`, 6.18`},
{"West", "Boy", "Fancy", "08/2011", 6, 13.01`, 12.39`},
{"North", "Girl", "Golf", "05/2011", 1, 14.87`, 12.89`},
{"East", "Girl", "Golf", "09/2011", 3, 13.99`, 6.25`},
{"North", "Girl", "Golf", "09/2011", 13, 12.66`, 8.57`},
{"East", "Boy", "Fancy", "10/2011", 2, 14.46`, 6.85`},
{"South", "Boy", "Golf", "11/2011", 13, 12.45`, 11.23`}
...
Then:
h1 = Union#data[[All, "Region" /. key]];
h2 = Union#data[[All, "Ship Date" /. key]];
Reap[
Sow[#[[{"Units", "Ship Date"} /. key]], #[["Region" /. key]]] & ~Scan~ data,
h1,
Reap[Sow ### #2, h2, Total # #2 &][[2]] &
][[2]];
TableForm[Join ## %, TableHeadings -> {h1, h2}]
This is a rough example, but it gives an idea of how this may be done. If you have more specific requirements I will attempt to address them.
Here is an update in the manner of Sjoerd's answer.
The Manipulate block is largely copied, but I believe my pivotTableData is more efficient, and I sought to localize symbols correctly, since this is now presented as usable code rather than a rough example.
I start with the same sample data, but I embed the field headings, since I feel this is more representative of normal use.
data = ImportString[#, "TSV"][[1]] & /# Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];
data = Transpose[{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];
PrependTo[data,
{"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marriatal status"}
];
My pivotTableData is self contained.
pivotTableData[data_, field1_, field2_, dependent_, op_] :=
Module[{key, sow, h1, h2, ff},
(key## = #2[[1]]) & ~MapIndexed~ data[[1]];
sow = #[[key /# {dependent, field2}]] ~Sow~ #[[key#field1]] &;
{h1, h2} = Union#data[[2 ;;, key##]] & /# {field1, field2};
ff = # /. {{} -> Missing#"NotAvailable", _ :> op ## #} &;
{
{h1, h2},
Join ## Reap[sow ~Scan~ Rest#data, h1, ff /# Reap[Sow ### #2, h2][[2]] &][[2]]
}
]
pivotTable relies only on pivotTableData:
pivotTable[data_?MatrixQ] :=
DynamicModule[{raw, t, header = data[[1]], opList =
{Mean -> "Mean of \[Rule]",
Total -> "Sum of \[Rule]",
Length -> "Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]",
Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"}},
Manipulate[
raw = pivotTableData[data, f1, f2, f3, op];
t = ConstantArray["", Length /# raw[[1]] + 2];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{f2, header}];
t[[2, 1]] = Control[{f1, header}];
t[[1, 2]] = Control[{f3, header}];
{{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
TableView[N#t, Dividers -> All],
Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
]
]
Use is simply:
pivotTable[data]
A quick-and-dirty pivot table visualization:
I'll start with a more interesting real-life data set:
data = ImportString[#, "TSV"][[1]] & /#
Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
];
A bit of post-processing:
data =
{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical",
4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}\[Transpose];
header = {"Education", "South", "Sex", "Experience", "Union", "Wage",
"Age", "Race", "Occupation", "Sector", "Marriatal status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /# Transpose[data];
levelLength = Length /# levelNames;
Now for the real stuff. It also uses the function SelectEquivalents defined in What is in your Mathematica tool bag?
pivotTableData[levelName1_, levelName2_, dependent_, op_] :=
Table[
SelectEquivalents[data,
FinalFunction -> (If[Length[#] == 0, Missing["NotAvailable"], op[# // Flatten]] &),
TagPattern ->
_?(#[[headerNumber[levelName1]]] == levelMember1 &&
#[[headerNumber[levelName2]]] == levelMember2 &),
TransformElement -> (#[[headerNumber[dependent]]] &)
],
{levelMember1, levelNames[[headerNumber[levelName1]]]},
{levelMember2, levelNames[[headerNumber[levelName2]]]}
]
DynamicModule[
{opList =
{Mean ->"Mean of \[Rule]", Total ->"Sum of \[Rule]", Length ->"Count of \[Rule]",
StandardDeviation -> "SD of \[Rule]", Min -> "Min of \[Rule]",
Max -> "Max of \[Rule]"
}, t},
Manipulate[
t=Table["",{levelLength[[headerNumber[h1]]]+2},{levelLength[[headerNumber[h2]]]+2}];
t[[3 ;; -1, 1]] = levelNames[[headerNumber[h1]]];
t[[2, 3 ;; -1]] = levelNames[[headerNumber[h2]]];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{h2, header}];
t[[2, 1]] = Control[{h1, header}];
t[[1, 2]] = Control[{h3, header}];
t[[3 ;; -1, 3 ;; -1]] = pivotTableData[h1, h2, h3, op] // N;
TableView[t, Dividers -> All],
Initialization :> {op = Mean, h1 = "Sector", h2 = "Union", h3 = "Wage"}
]
]
There's still a bit of work to do. The DynamicModule should be turned into a fully standalone function, with the header stuff more streamlined, but this should be sufficient for a first impression.
#Mr.Wizard's answer is indeed robust and long-lasting as it grounds on ReapSow method suitable for some map reduce jobs in Mathematica. Due to the fact that MMA itself develops, consider a new option as well.
GroupBy (introduced in Mathematica v.10.0) provides a generalization of the map reduce operation.
So, the above data job may be implemented as follows (partly an overkill for readability):
headings = Union # data[[All, #]] & /# {1, 4}
{{"East", "North", "South", "West"}, {"01/2011", "02/2011", "03/2011",
"04/2011", "05/2011", "06/2011", "07/2011", "08/2011", "09/2011",
"10/2011", "11/2011", "12/2011"}}
We may use Outer to set up a rectangular template for TableForm:
template = Outer[List, Apply[Sequence][headings]];
Main job with GroupBy and Total as third argument:
pattern = Append[Normal #
GroupBy[data, (#[[{1, 4}]] &) -> (#[[-1]] &), Total],
_ -> Null];
Finally, inject pattern into template (and apply TableForm headings for beauty):
TableForm[Replace[template, pattern, {2}], TableHeadings -> headings]
This outputs some:
Note: we have made a total of last column in data. (Many other aggregations are, of course, possible.)
Use http://www.wolfram.com/products/applications/excel_link/ , this way you have the best of both worlds. This product creates a flawless link between Excel and mma, 2-ways.
Here's what I've come up with. It uses the function SelectEquivalents defined in What is in your Mathematica tool bag?. Function1 and Function2 are meant to have different grouping possibilities of criteria1 and criteria2. FilterFunction is here in order to define an arbitrary filter formula on the data based on the header names.
Using the data example of Mr. Wizard here are some usages of this function.
criteria={"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
criteria1 = "Region";
criteria2 = "Ship Date";
consideredData = "Units";
PivotTable[data,criteria,criteria1,criteria2,consideredData]
A neat example
function2 = If[ToExpression#StringTake[#, 2] <= 6, "First Semester", "Second Semester"] &;
PivotTable[data,criteria,criteria1,criteria2,consideredData,FilterFunction->("Gender"=="Girl"&&"Units"*"Price"<=100&),Function2->function2]
Here's the definition of the function
keysToIndex[keys_] :=
Module[{keyIndex},
(keyIndex[#1] = #2[[1]])&~MapIndexed~keys;
keyIndex
];
InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
Options[PivotTable]={Function1->Identity,Function2->Identity,FilterFunction->(True &),AggregationFunction->Total,FormatOutput->True};
PivotTable[data_,criteria_,criteria1_,criteria2_,consideredData_,OptionsPattern[]]:=
Module[{criteriaIndex, criteria1Index, criteria2Index, consideredDataIndex, criteria1Function, criteria2Function, filterFunctionTranslated, filteredResult, keys1, keys1Index, keys2, keys2Index, resultTable, function1, function2, filterFunction, aggregationFunction, formatOutput,p,sharp},
function1 = OptionValue#Function1;
function2 = OptionValue#Function2;
filterFunction = OptionValue#FilterFunction;
aggregationFunction = OptionValue#AggregationFunction;
formatOutput=OptionValue#FormatOutput;
criteriaIndex=keysToIndex[criteria];
criteria1Index=criteriaIndex#criteria1;
criteria2Index=criteriaIndex#criteria2;
consideredDataIndex=criteriaIndex#consideredData;
criteria1Function=Composition[function1,#[[criteria1Index]]&];
criteria2Function=Composition[function2,#[[criteria2Index]]&];
filterFunctionTranslated = filterFunction/.(# -> p[sharp, criteriaIndex##]& /# criteria /. sharp -> #)/.p->Part;
filteredResult=
SelectEquivalents[
data
,
TagElement->({criteria1Function##,criteria2Function##,filterFunctionTranslated##}&)
,
TransformElement->(#[[consideredDataIndex]]&)
,
TagPattern->_?(#[[3]]&)
,
TransformResults->(Append[Most##1,aggregationFunction##2]&)
];
If[formatOutput,
keys1=filteredResult[[All,1]]//Union//Sort;
keys2=filteredResult[[All,2]]//Union//Sort;
resultTable=
SelectEquivalents[
filteredResult
,
TagElement->(#[[{1,2}]]&)
,
TransformElement->(#[[3]]&)
,
TagPattern->Flatten[Outer[List, keys1, keys2], 1]
,
FinalFunction-> (InverseFlatten[Flatten[#/.{}->Missing[]],{Length#keys1,Length#keys2}]&)
];
TableForm[resultTable,TableHeadings->{keys1,keys2}]
,
filteredResult
]
];
I little latter in the game. Here is another self contained solution with object like form.
Using random data created by #Mr.Wizard:
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /# Range#12,
Range#15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 5000] & /# choices // Transpose;
Using an MapIndexed and SparseArray as key functions, here is the code:
Options[createPivotTable]={"RowColValueHeads"-> {1,2,3},"Function"-> Total};
createPivotTable[data_,opts:OptionsPattern[{createPivotTable}]]:=Module[{r,c,v,aggDataIndex,rowRule,colRule,pivot},
{r,c,v}=OptionValue["RowColValueHeads"];
pivot["Row"]= Union#data[[All,r]];
pivot["Col"]= Union#data[[All,c]];
rowRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Row"]];
colRule= Dispatch[#->#2[[1]]&~MapIndexed~pivot["Col"]];
aggDataIndex={#[[1,r]]/.rowRule,#[[1,c]]/.colRule}->OptionValue["Function"]##[[All,v]]&/#GatherBy[data,#[[{r,c}]]&];
pivot["Data"]=Normal#SparseArray#aggDataIndex;
pivot["Properties"]={"Data","Row","Col"};
pivot["Table"]=TableForm[pivot["Data"], TableHeadings -> {pivot["Row"], pivot["Col"]}];
Format[pivot]:="PivotObject";
pivot
]
That you can use as:
pivot=createPivotTable[data,"RowColValueHeads"-> ({"Ship Date","Region","Units"}/.key)];
pivot["Table"]
pivot["Data"]
pivot["Row"]
pivot["Col"]
To get:
I believe that the speed is faster than #Ms.Wizard, but I have to make a better test, and don't have time now.

Efficient way to remove empty lists from lists without evaluating held expressions?

In previous thread an efficient way to remove empty lists ({}) from lists was suggested:
Replace[expr, x_List :> DeleteCases[x, {}], {0, Infinity}]
Using the Trott-Strzebonski in-place evaluation technique this method can be generalized for working also with held expressions:
f1[expr_] :=
Replace[expr,
x_List :> With[{eval = DeleteCases[x, {}]}, eval /; True], {0, Infinity}]
This solution is more efficient than the one based on ReplaceRepeated:
f2[expr_] := expr //. {left___, {}, right___} :> {left, right}
But it has one disadvantage: it evaluates held expressions if they are wrapped by List:
In[20]:= f1[Hold[{{}, 1 + 1}]]
Out[20]= Hold[{2}]
So my question is: what is the most efficient way to remove all empty lists ({}) from lists without evaluating held expressions? The empty List[] object should be removed only if it is an element of another List itself.
Here are some timings:
In[76]:= expr = Tuples[Tuples[{{}, {}}, 3], 4];
First#Timing[#[expr]] & /# {f1, f2, f3}
pl = Plot3D[Sin[x y], {x, 0, Pi}, {y, 0, Pi}];
First#Timing[#[pl]] & /# {f1, f2, f3}
Out[77]= {0.581, 0.901, 5.027}
Out[78]= {0.12, 0.21, 0.18}
Definitions:
Clear[f1, f2, f3];
f3[expr_] :=
FixedPoint[
Function[e, Replace[e, {a___, {}, b___} :> {a, b}, {0, Infinity}]], expr];
f1[expr_] :=
Replace[expr,
x_List :> With[{eval = DeleteCases[x, {}]}, eval /; True], {0, Infinity}];
f2[expr_] := expr //. {left___, {}, right___} :> {left, right};
How about:
Clear[f3];
f3[expr_] :=
FixedPoint[
Function[e,
Replace[e, {a___, {}, b___} :> {a, b}, {0, Infinity}]],
expr]
It seems to live up to the specs:
In[275]:= f3[{a, {}, {b, {}}, c[d, {}]}]
Out[275]= {a, {b}, c[d, {}]}
In[276]:= f3[Hold[{{}, 1 + 1, {}}]]
Out[276]= Hold[{1 + 1}]
You can combine the solutions you mentioned with a minimal performance hit and maintain the code unevaluated by using a technique from this post, with a modification that the custom holding wrapper will be made private by using Module:
ClearAll[removeEmptyListsHeld];
removeEmptyListsHeld[expr_Hold] :=
Module[{myHold},
SetAttributes[myHold, HoldAllComplete];
Replace[MapAll[myHold, expr, Heads -> True],
x : myHold[List][___] :>
With[{eval = DeleteCases[x, myHold[myHold[List][]]]},
eval /; True],
{0, Infinity}]//. myHold[x_] :> x];
The above function assumes that the input expression is wrapped in Hold. Examples:
In[53]:= expr = Tuples[Tuples[{{}, {}}, 3], 4];
First#Timing[#[expr]] & /# {f1, f2, f3, removeEmptyListsHeld[Hold[#]] &}
Out[54]= {0.235, 0.218, 1.75, 0.328}
In[56]:= removeEmptyListsHeld[Hold[{{},1+1,{}}]]
Out[56]= Hold[{1+1}]
I'm just a bit late with this one. ;-)
Though rather complicated this tests about an order of magnitude faster than your f1:
fx[expr_] :=
Module[{s},
expr //
Quiet[{s} /. {x_} :> ({} /. {x___} -> (# /. {} -> x //. {x ..} -> x) &)]
]
It does not evaluate:
Hold[{{}, 1 + 1}] // fx
Hold[{1 + 1}]
Timings
expr = Tuples[Tuples[{{}, {}}, 3], 4];
First # Timing # Do[# # expr, {100}] & /# {f1, fx}
pl = Plot3D[Sin[x y], {x, 0, Pi}, {y, 0, Pi}];
First # Timing # Do[# # pl, {100}] & /# {f1, fx}
{10.577, 0.982} (* 10.8x faster *)
{1.778, 0.266} (* 6.7x faster *)
Check
f1#expr === fx#expr
f1#pl === fx#pl
True
True
Explanation
The basic version of this function would look like this:
{} /. {x___} -> (# //. {} | {x ..} -> x) &
The idea is to first reduce the expression with //. {} | {x ..} -> x and then use the injector pattern with an empty expression to remove all instances of x, as though they were replaced with Sequence[] but without evaluation.
The first change is to optimize this somewhat by splitting the replacement into /. {} -> x //. {x ..} -> x. The second change is to somehow localize x in the patterns so that it does not fail if x appears in the expression itself. Because of the way Mathematica handles nested scoping constructs I cannot simply use Module[{x}, . . . ] but instead have to use the injector pattern again to get a unique symbol into x___ etc., and Quiet to keep it from complaining about the nonstandard use.

Resources