Mathematica, efficient way to compare dates

I have a list like this:
{{{2002, 4, 10}, 9.61}, {{2002, 4, 11}, 9.53}, {{2002, 4, 12}, 9.58}, ...}
I need to look up a date in this list: if there is an exact match I want that entry; if there is no match, I want the next available date in the list. Here is my code:
Select[history, DateDifference[#[[1]], {2012, 3, 17}] <= 0 &, 1]
but it's a lot slower than just looking for an exact match. Is there a faster way to do this? Thank you very much!

It is true that DateDifference is rather slow. This can be worked around by converting all dates to "absolute times", which in Mathematica means the number of seconds elapsed since 1900 January 1.
Here's an example. This is the data:
data = {AbsoluteTime[#1], #2} & @@@
  FinancialData["GOOG", {{2010, 1, 1}, {2011, 1, 1}}];
We're looking for this date or the next one if this is not available:
date = AbsoluteTime[{2010, 8, 1}]
One way to retrieve it is:
data[[1 + LengthWhile[data[[All, 1]], # < date &]]]
You'll find other methods, including an already implemented binary search, in the answers to this question.
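A minimal self-contained sketch of the same idea, applied to the sample list from the question. The names history, times and nextOnOrAfter are made up for illustration. Unlike the one-liner above, it returns Missing["NotAvailable"] instead of failing with a Part error when the requested date lies after the last entry.

```mathematica
history = {{{2002, 4, 10}, 9.61}, {{2002, 4, 11}, 9.53}, {{2002, 4, 12}, 9.58}};

(* convert every date to seconds once, up front, instead of calling DateDifference per comparison *)
times = AbsoluteTime /@ history[[All, 1]];

(* hypothetical helper: first entry whose date is on or after the requested date (linear scan) *)
nextOnOrAfter[date_] := With[{t = AbsoluteTime[date]},
  With[{i = 1 + LengthWhile[times, # < t &]},
   If[i <= Length[history], history[[i]], Missing["NotAvailable"]]]]

nextOnOrAfter[{2002, 4, 11}]            (* exact match: {{2002, 4, 11}, 9.53} *)
nextOnOrAfter[{2002, 4, 10, 12, 0, 0}]  (* no exact match, next date: {{2002, 4, 11}, 9.53} *)
nextOnOrAfter[{2012, 3, 17}]            (* after the last date: Missing["NotAvailable"] *)
```

The conversion to absolute times is done only once, so each lookup is a plain numeric comparison per element.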

finddate[data:{{{_Integer, _Integer, _Integer}, _}..},
date:{_Integer, _Integer, _Integer}] :=
First[Extract[data, (Position[#1, First[Nearest[#1, AbsoluteTime[date]]]] & )[
AbsoluteTime /@ data[[All, 1]]]]]
will do what you want.
E.g.,
finddate[{{{2002, 4, 10}, 9.61}, {{2002, 4, 11}, 9.53}, {{2002, 4, 12}, 9.58}},
{2012, 3, 17}]
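gives {{2002, 4, 12}, 9.58}
Note that Nearest returns the closest date, which can lie before the requested one (as in this example), rather than strictly the next available one.
It seems to be reasonably fast (half a second for 10^5 dates).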

Would it be faster to write a binary search, assuming that your history is ordered?
That should find the date in O(log n) comparisons, which is much better than the linear filter you appear to be using now.
It will give you the date if it exists; if it does not, it will give you the position at which the new date would be inserted.
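A sketch of such a binary search, assuming the dates have already been converted with AbsoluteTime and sorted in ascending order. The name firstAtOrAfter is hypothetical. It returns the index of the first element that is not smaller than t, which is also the insertion point mentioned above, and Length[ts] + 1 when every element is smaller.

```mathematica
(* hypothetical helper: lower-bound binary search on an ascending list ts *)
firstAtOrAfter[ts_List, t_] := Module[{lo = 1, hi = Length[ts] + 1, mid},
  (* invariant: the answer always lies in the range lo..hi *)
  While[lo < hi,
   mid = Quotient[lo + hi, 2];
   If[ts[[mid]] < t, lo = mid + 1, hi = mid]];
  lo]

times = AbsoluteTime /@ {{2002, 4, 10}, {2002, 4, 11}, {2002, 4, 12}};
firstAtOrAfter[times, AbsoluteTime[{2002, 4, 11}]]            (* exact match: 2 *)
firstAtOrAfter[times, AbsoluteTime[{2002, 4, 10, 12, 0, 0}]]  (* next date: 2 *)
firstAtOrAfter[times, AbsoluteTime[{2012, 3, 17}]]            (* past the end: 4 *)
```

Each lookup then costs about log2(n) comparisons, which pays off when the same sorted history is queried many times.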

The fastest approach for many lookups into the same dataset is to create an interpolation function based on the AbsoluteTime[] of each date and its value. If by default it picks the neighbor on the wrong side, you can negate all the "seconds" to make it pick the other side.

Related

How do I apply Map[] to a function using two arguments in Mathematica?

In general, I was trying to compute the norm of the difference between every pair of elements in a list which looks something like
X = {{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}}
therefore evaluating
Norm[X[[1]]-X[[2]]]
Norm[X[[1]]-X[[3]]]
Norm[X[[2]]-X[[3]]]
Now, applying Outer[] is one possible way how to do this
Outer[Norm[X[[#1]] - X[[#2]]] &, {1,2,3}, {1,2,3}]
but unfortunately it results in a quite slow code if I increase the number of elements in X and the length of each element.
Is there any possible way to construct a Map[] operation? Something like
MapThread[Norm[X[[#1]] - X[[#2]]] &,{{1,2,3},{1,2,3}}]
does not give the desired "currying" I was looking for.
I'm using Mathematica Version 11.2.0.0, so I don't have access to Curry[].
Would be happy about any advice!
mX = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}}
Apply[Norm@*Subtract, #] & /@ Subsets[mX, {2}]
Two equivalent approaches:
(Norm@*Subtract) @@@ Subsets[mX, {2}]
Apply[Norm@*Subtract, Subsets[mX, {2}], {1}]
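For comparison, a short sketch with the question's data. The MapThread attempt only pairs element i with element i, so every norm it computes is 0. Outer with a level specification of 1 treats each row as a single element, so no Part indexing is needed; whether this is faster than the index-based Outer on large inputs is an assumption worth timing.

```mathematica
mX = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}};

(* only the pairs i < j, in the order Subsets produces them: {1,2}, {1,3}, {2,3} *)
pairNorms = Norm[#1 - #2] & @@@ Subsets[mX, {2}]
(* {5 Sqrt[5], 10 Sqrt[5], 5 Sqrt[5]} *)

(* full symmetric matrix: level 1 makes Outer combine whole rows, not individual numbers *)
allNorms = Outer[Norm[#1 - #2] &, mX, mX, 1]
(* {{0, 5 Sqrt[5], 10 Sqrt[5]}, {5 Sqrt[5], 0, 5 Sqrt[5]}, {10 Sqrt[5], 5 Sqrt[5], 0}} *)
```

The Subsets version computes each distance once; the Outer version computes everything twice plus the zero diagonal, but returns a matrix that is convenient for further matrix operations.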

Combining Sublists Sequentially Based on Similar Elements in Mathematica

I am attempting to combine sublists in a list of data given below:
Data={{{2, 6, 3, 5}, {4, 2, 5, 1}}, {{2, 6, 3, 5}, {6, 4, 7, 3}},
{{8, 12, 9, 11}, {12, 8, 13, 7}},
{{10, 13, 11, 14}, {14, 9, 1, 10}}};
The goal is to combine sublists based on whether each pair has a similar term, like this:
FinalData={{{2,6,3,5},{4,2,5,1},{6,4,7,3}},
{{8, 12 ,9 ,11}, {12 ,8 , 13, 7}},
{{10, 13, 11, 14}, {14, 9, 1, 10}}};
I've attempted to solve this problem using multiple methods such as For loops, While loops, Gather, Union, and Select, but am still stuck. Would anyone be willing to help me out? This is my first post here, and I am hoping to get some advice! Thank you in advance.
This reproduces your example:
Union[Flatten[#, 1]] & /@ GatherBy[Data, First]
Note this is only grouping where the first sublist is the same, and Union sorts the results. If you need it more general you should give a more general example.
This
Data //. {{h___,{p_,q_},m___,{p_,r_},t___}->{h,{p,q,r},m,t},
{h___,{p_,q_},m___,{r_,q_},t___}->{h,{p,q,r},m,t}}
searches your data to find any list {p,q} and another list {p,r} and turns those into {p,q,r}. It also searches to find any list {p,q} and another list {r,q} and turns those into {p,q,r}. And it does that over and over until no further lists match. You should test that carefully to make certain that it is correct in all cases. You should look up //. which is also called ReplaceRepeated in the documentation to try to understand how that works. You should also look up "triple blank" which is three underscores in a row and is in the documentation as BlankNullSequence to try to understand how that works. And look up how putting a symbol in front of _ or ___ "names the pattern" to try to understand how that works. Understanding all this will give you new power to write programs to control Mathematica.
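If pairs can also be linked through chains (A with B, B with C), a different technique from the two answers above may be easier to reason about: treat each sublist as a graph vertex and each pair as an edge, then take connected components. This is a sketch that assumes sublists should be merged whenever they appear together in some pair; the order in which ConnectedComponents lists the groups is not guaranteed to follow the input order.

```mathematica
Data = {{{2, 6, 3, 5}, {4, 2, 5, 1}}, {{2, 6, 3, 5}, {6, 4, 7, 3}},
   {{8, 12, 9, 11}, {12, 8, 13, 7}}, {{10, 13, 11, 14}, {14, 9, 1, 10}}};

(* each pair becomes an undirected edge; a sublist shared by several pairs is a single vertex *)
g = Graph[UndirectedEdge @@@ Data];

(* each connected component is one merged group; Sort mimics the ordering Union produces *)
groups = Sort /@ ConnectedComponents[g]
```

Because it follows edges transitively, this groups sublists regardless of whether the shared element is in the first or second position of a pair.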

How to find modulus patterns using Mathematica

Is there any way to find the lowest modulus of a list of integers? I'm not sure how to say it correctly, so I'm going to clarify with an example.
I'd like to input a list (mod x) and output the "same" list, modulus y (< x). For example, the list {0, 4, 6, 10, 12, 16, 18, 22} (mod 24) is essentially the same as {0, 4} (mod 6).
Thank you for all your help.
You are looking for a set of arithmetic sequences. We'll consider your example
ee = {0, 4, 6, 10, 12, 16, 18, 22};
which has two such sequences, and an example with four of them.
ff = {0, 3, 7, 11, 17, 20, 24, 28, 34, 37, 41, 45};
In this second one we start with {0,3,7,11} and then increase by 17. So what is the general way to get from the nth term to the n+1th? If the set has k sequences (k=2 for ee and 4 for ff) then add the modulus to the n-k+1th term. What is the modulus? It is the difference between the nth and n-kth terms.
Putting this together and assuming we know k (we don't in general, but we'll get to that) we have a recurrence of the form f(n+1)=f(n-k+1) + (f(n)-f(n-k)). So we need to find a recurrence (if one exists), check that it is of the correct form, and post-process if so.
Here is code to do all this. Note that it in effect solves for k.
findArithmeticSequences[ll : {_Integer ..}] := With[
{rec = FindLinearRecurrence[ll]},
{Take[ll, Length[rec] - 1], ll[[Length[rec]]]} /;
ListQ[rec] &&
(rec === {1, 1, -1} || MatchQ[rec, {1, 0 .., 1, -1}])
]
(Aficionados of pure functions might prefer the variant below. Failure cases are handled a bit differently, for no compelling reason.)
findArithmeticSequences2[ll : {_Integer ..}] :=
If[ListQ[#] &&
(# === {1, 1, -1} || MatchQ[#, {1, 0 .., 1, -1}]), {Take[ll,
Length[#] - 1], ll[[Length[#]]]}, $Failed] &[
FindLinearRecurrence[ll]]
Tests:
In[115]:= findArithmeticSequences[ee]
Out[115]= {{0, 4}, 6}
In[116]:= findArithmeticSequences[ff]
Out[116]= {{0, 3, 7, 11}, 17}
Note that one can "almost" do such problems by polynomial factorization (if the input has no partial sequences at the end). For example, the polynomial
In[117]:= poly = Plus @@ (x^ee)
Out[117]= 1 + x^4 + x^6 + x^10 + x^12 + x^16 + x^18 + x^22
factors into
(1+x^4)*(1+x^6+x^12+x^18)
which contains the needed information in a way that is easy to see. Unfortunately for this particular purpose, Factor will factor beyond this point, and obscure the information in so doing.
I keep wondering if there might be a signal processing way to go about this sort of thing, e.g. via DFTs. But I've not come up with anything.
Daniel Lichtblau
Wow, thank you Daniel for this! It works nearly the way I want it to. Your method is just a bit "too restrictive". It doesn't return anything useful if 'FindLinearRecurrence' doesn't find any recurrence. I've modified your method a bit, so it suits my needs better. I hope you don't mind. Here's my code.
findArithmeticSequences[ll_List] := Module[{rec = FindLinearRecurrence[ll]},
If[! MatchQ[rec, {1, 0 ..., 1, -1}], Return[ll],
Return[{ll[[Length[rec]]], Take[ll, Length[rec] - 1]}];
];
];
I had a feeling it'd have to involve recurrence, I just don't have enough experience with Mathematica to implement it. Thank you again for your time!
Mod is listable, and you can remove duplicate elements by DeleteDuplicates. So
DeleteDuplicates[Mod[{0, 4, 6, 10, 12, 16, 18, 22}, 6]]
(*
-> {0,4}
*)
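The Mod one-liner needs the smaller modulus (6) to be known in advance. A sketch for finding it, assuming the input is the complete set of residues (mod x) that you want to describe: a residue pattern modulo y can only be restated modulo x when y divides x, so it is enough to try the divisors of x in increasing order and keep the first one that reproduces the list. The name minimalModulus is hypothetical.

```mathematica
(* hypothetical helper: returns {residues, smallest modulus} describing the same set mod x *)
minimalModulus[list : {__Integer}, x_Integer?Positive] :=
 Module[{target = Union[Mod[list, x]], m},
  m = SelectFirst[Divisors[x],  (* Divisors is ascending and always contains x itself *)
    Function[d, With[{res = Union[Mod[target, d]]},
      (* expand the residues mod d back into 0..x-1 and compare with the input *)
      Select[Range[0, x - 1], MemberQ[res, Mod[#, d]] &] === target]]];
  {Union[Mod[target, m]], m}]

minimalModulus[{0, 4, 6, 10, 12, 16, 18, 22}, 24]  (* {{0, 4}, 6} *)
```

Since x always qualifies, SelectFirst never comes back empty; in the worst case the list is returned unchanged with modulus x.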

Conditional Data Manipulation in Mathematica

I am trying to prepare the best tools for efficient data analysis in Mathematica.
I have approximately 300 columns and 100,000 rows.
What would be the best tricks to:
"Remove", "Extract" or simply "Consider" parts of the data structure, e.g. for plotting?
One of the trickiest examples I could think of is :
Given a data structure,
Extract columns 1 to 3 and 6 to 9, as well as the last one, for every line where the value in column 2 is equal to x and the value in column 8 is different from y
I also welcome any general advice on data manipulation.
For a generic manipulation of data in a table with named columns, I refer you to this solution of mine, for a similar question. For any particular case, it might be easier to write a function for Select manually. However, for many columns, and many different queries, chances to mess up indexes are high. Here is the modified solution from the mentioned post, which provides a more friendly syntax:
Clear[getIds];
getIds[table : {colNames_List, rows__List}] := {rows}[[All, 1]];
ClearAll[select, where];
SetAttributes[where, HoldAll];
select[cnames_List, from[table : {colNames_List, rows__List}], where[condition_]] :=
With[{colRules = Dispatch[ Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]],
indexRules = Dispatch[Thread[colNames -> Range[Length[colNames]]]]},
With[{selF = Apply[Function, Hold[condition] /. colRules]},
Select[{rows}, selF @@ # &][[All, cnames /. indexRules]]]];
What happens here is that the function used in Select gets generated automatically from your specifications. For example (using @Yoda's example):
rows = Array[#1 #2 &, {5, 15}];
We need to define the column names (must be strings or symbols without values):
In[425]:=
colnames = "c" <> ToString[#] & /@ Range[15]
Out[425]= {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10", "c11", "c12",
"c13", "c14", "c15"}
(in practice, usually names are more descriptive, of course). Here is the table then:
table = Prepend[rows, colnames];
Here is the select statement you need (I picked x = 4 and y=2):
select[{"c1", "c2", "c3", "c6", "c7", "c8", "c9", "c15"}, from[table],
where["c2" == 4 && "c8" != 2]]
{{2, 4, 6, 12, 14, 16, 18, 30}}
Now, for a single query, this may look like a complicated way to do this. But you can do many different queries, such as
In[468]:= select[{"c1", "c2", "c3"}, from[table], where[EvenQ["c2"] && "c10" > 10]]
Out[468]= {{2, 4, 6}, {3, 6, 9}, {4, 8, 12}, {5, 10, 15}}
and similar.
Of course, if there are specific correlations in your data, you might find a particular special-purpose algorithm which will be faster. The function above can be extended in many ways, to simplify common queries (include "all", etc), or to auto-compile the generated pure function (if possible).
EDIT
On a philosophical note, I am sure that many Mathematica users (myself included) have found themselves from time to time writing similar code again and again. Mathematica's concise syntax often makes it very easy to write code for any particular case. However, as long as one works in some specific domain (for example, data manipulation in a table), the cost of repeating yourself will be high for many operations. What my example illustrates in a very simple setting is one possible way out: create a Domain-Specific Language (DSL). For that, one generally needs to define a syntax/grammar for it, and write a compiler from it to Mathematica (to generate Mathematica code automatically). The example above is a very primitive realization of this idea, but my point is that Mathematica is generally very well suited for DSL creation, which I think is a very powerful technique.
data = RandomInteger[{1, 20}, {40, 20}]
x = 5;
y = 8;
Select[data, (#[[2]] == x && #[[8]] != y &)][[All, {1, 2, 3, 6, 7, 8, 9, -1}]]
==> {{5, 5, 1, 4, 18, 6, 3, 5}, {10, 5, 15, 3, 15, 14, 2, 5}, {18, 5, 6, 7, 7, 19, 14, 6}}
Some useful commands to get pieces of matrices and lists are Span (;;), Drop, Take, Select, Cases and more. See tutorial/GettingAndSettingPiecesOfMatrices and guide/PartsOfMatrices.
Part ([[...]]) in combination with ;; can be quite powerful. a[[All, 1;;-1;;2]], for instance, means take all rows and all odd columns (-1 having the usual meaning of counting from the end).
Select can be used to pick elements from a list (and remember a matrix is a list of lists), based on a logical function. Its twin brother is Cases, which does selection based on a pattern. The function I used here is a 'pure' function, where # refers to the argument on which this function is applied (the elements of the list in this case). Since the elements are lists themselves (the rows of the matrix), I can refer to the columns by using the Part ([[..]]) function.
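A small sketch of the Cases variant and of a stepped Span, using the deterministic test matrix from the other answers instead of random data (x = 4 and y = 2 are chosen only for illustration). The Condition (/;) attached to the pattern plays the role of Select's test function.

```mathematica
data = Array[#1 #2 &, {5, 15}];  (* row i is i*{1, ..., 15} *)
x = 4; y = 2;

(* Cases: keep rows matching the pattern, then take columns 1-3, 6-9 and the last one *)
picked = Cases[data, row_ /; row[[2]] == x && row[[8]] != y][[All, {1, 2, 3, 6, 7, 8, 9, -1}]]
(* {{2, 4, 6, 12, 14, 16, 18, 30}} *)

(* Span with a step: every other element, starting with the first *)
odds = Range[10][[1 ;; -1 ;; 2]]
(* {1, 3, 5, 7, 9} *)
```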
To pull out columns (or rows) you can do it by part indexing
data = Array[#1 #2 &, {5, 15}];
data[[All, Flatten@{Range@3, Range @@ {6, 9}, -1}]]
MatrixForm@%
The last line is just to view it pretty.
As Sjoerd mentioned in his comment (and in the explanation in his answer), indexing a single range can be easily done with the Span (;;) command. If you are joining multiple disjoint ranges, using Flatten to combine the separate ranges created with Range is easier than entering them by hand.
I read:
Extract Column 1 to 3, 6 to 9 as well as the last One for every lines where the value in Column 2 is equal to x and the value in column 8 is different than y
as meaning that we want:
elements 1-3 and 6-9 from each row
AND
the last element from rows wherein [[2]] == x && [[8]] != y.
This is what I hacked together:
a = RandomInteger[5, {20, 10}]; (*define the array*)
x = 4; y = 0; (*define the test values*)
Join @@ Range @@@ {1 ;; 3, 6 ;; 9}; (*define the column ranges*)
#2 == x && #8 != y & @@@ a; (*test the rows*)
Append[%%, #] & /@ % /. {True -> -1, False :> Sequence[]}; (*complete the ranges according to the test*)
MapThread[Part, {a, %}] // TableForm (*extract and display*)

Shuffling a list in Mathematica

What's the best/easiest way to shuffle a long list in Mathematica?
RandomSample[list]
Yes, it's really that simple. At least since version 6.
Before RandomSample was introduced, one might use:
#[[ Ordering[Random[] & /@ #] ]] & @ list
Before RandomSample was introduced, I used the MathGroup function below heavily, though RandomSample is faster by at least an order of magnitude on my machine.
In[128]:= n = 10;
set = Range@n
Out[129]= {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
In[130]:= Take[set[[Ordering[RandomReal[] & /@ Range@n]]], n]
Out[130]= {8, 4, 5, 2, 3, 10, 7, 9, 6, 1}
Another problem besides performance is that if the same random real is generated twice (improbable, though possible), Ordering will not put those two elements in random order.
Currently I use
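With[{n = Length[list]}, list[[PermutationList[RandomPermutation[n], n]]]]
This is for Mathematica 8. The second argument of PermutationList pads the result to length n; without it, a permutation that leaves the last elements in place yields a shorter list, and those elements would be dropped. Combinatorica also has a RandomPermutation function (earlier versions).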
I am looking for other/better solutions, if there are any.
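A quick sanity check for the approaches in this thread, written as a sketch (RandomPermutation needs version 8 or later): whatever the order, each result must contain exactly the original elements. The Ordering line is a vectorized variant of the trick above that draws all n random reals in one call instead of mapping over the list.

```mathematica
list = Range[10];
n = Length[list];

s1 = RandomSample[list];
(* one list of n random reals, no mapping needed *)
s2 = list[[Ordering[RandomReal[1, n]]]];
(* the second argument of PermutationList pads the result to length n *)
s3 = list[[PermutationList[RandomPermutation[n], n]]];

(* every shuffle must be a permutation of the input *)
Sort /@ {s1, s2, s3} === {list, list, list}
```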
