Algorithm to find line segments closest to the X-Axis

We have a list of line segments (intervals), given as an input array of objects in the following order:
// start and end represent the x coordinates and distance represents the y coordinate.
{start: 13, end: 15, distance: 1}, // S[0] -- pale red
{start: 12, end: 15, distance: 2}, // S[1] -- pale orange
{start: 2, end: 5, distance: 1}, // S[2] -- pale yellow
{start: 7, end: 9, distance: 2}, // S[3] -- pale green 1
{start: 7, end: 9, distance: 2}, // S[4] -- pale green 2
{start: 6, end: 8, distance: 2}, // S[5] -- fresh green
{start: 2, end: 5, distance: 4}, // S[6] -- pale gray
{start: 5, end: 11, distance: 4}, // S[7] -- air blue
{start: 9, end: 10, distance: 1}, // S[8] -- cyan blue
{start: 1, end: 11, distance: 3}, // S[9] -- magenta purple
We want to find the part of each interval that is closest to the x-axis:
{start: 1, end: 2, in: S[9]},
{start: 2, end: 5, in: S[2]},
{start: 5, end: 6, in: S[9]},
{start: 6, end: 8, in: S[5]},
{start: 8, end: 9, in: S[3]},
{start: 9, end: 10, in: S[8]},
{start: 10, end: 11, in: S[9]},
{start: 12, end: 13, in: S[1]},
{start: 13, end: 15, in: S[0]},
One way of achieving this is:
First, sort the array of objects.
Then, use N stacks to push each interval, keep the one that is closest to the x-axis (lowest distance), and build the final set.
But this won't be optimal.
What would be the optimal solution for this?
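One possible approach, sketched here in Python rather than as a definitive answer, is a sweep over the sorted endpoints with a min-heap of the currently active segments, which runs in O(n log n). The function name `lower_envelope` and the tie-breaking rule (lowest distance, then earliest start, then lowest index) are assumptions chosen so that the output matches the expected result listed above.

```python
import heapq

def lower_envelope(segments):
    """Return the pieces of the x-axis covered by the closest segment.

    Assumed tie-break: lowest distance, then earliest start, then lowest index.
    """
    order = sorted(range(len(segments)), key=lambda i: segments[i]["start"])
    xs = sorted({s["start"] for s in segments} | {s["end"] for s in segments})
    heap = []      # entries are (distance, start, index) of activated segments
    result = []
    nxt = 0        # next position in `order` that has not been activated yet
    for left, right in zip(xs, xs[1:]):
        # Activate every segment that starts at or before `left`.
        while nxt < len(order) and segments[order[nxt]]["start"] <= left:
            i = order[nxt]
            heapq.heappush(heap, (segments[i]["distance"], segments[i]["start"], i))
            nxt += 1
        # Lazily discard segments that have already ended.
        while heap and segments[heap[0][2]]["end"] <= left:
            heapq.heappop(heap)
        if not heap:
            continue  # gap: nothing covers [left, right]
        winner = heap[0][2]
        if result and result[-1]["in"] == winner and result[-1]["end"] == left:
            result[-1]["end"] = right  # extend the previous piece
        else:
            result.append({"start": left, "end": right, "in": winner})
    return result

segments = [
    {"start": 13, "end": 15, "distance": 1},  # S[0]
    {"start": 12, "end": 15, "distance": 2},  # S[1]
    {"start": 2, "end": 5, "distance": 1},    # S[2]
    {"start": 7, "end": 9, "distance": 2},    # S[3]
    {"start": 7, "end": 9, "distance": 2},    # S[4]
    {"start": 6, "end": 8, "distance": 2},    # S[5]
    {"start": 2, "end": 5, "distance": 4},    # S[6]
    {"start": 5, "end": 11, "distance": 4},   # S[7]
    {"start": 9, "end": 10, "distance": 1},   # S[8]
    {"start": 1, "end": 11, "distance": 3},   # S[9]
]

for piece in lower_envelope(segments):
    print(piece)
```

Here the `in` field holds the index of the winning segment (9 stands for S[9]). Each segment is pushed and popped at most once, so the heap work is bounded by O(n log n), the same order as the initial sort.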

Related

Find Top N Most Frequent Sequence of Numbers in List of a Billion Sequences

Let's say I have the following list of lists:
x = [[1, 2, 3, 4, 5, 6, 7], # sequence 1
[6, 5, 10, 11], # sequence 2
[9, 8, 2, 3, 4, 5], # sequence 3
[12, 12, 6, 5], # sequence 4
[5, 8, 3, 4, 2], # sequence 5
[1, 5], # sequence 6
[2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6], # sequence 7
[7, 1, 7, 3, 4, 1, 2], # sequence 8
[9, 4, 12, 12, 6, 5, 1], # sequence 9
]
Essentially, for any list that contains the target number 5 (i.e., target=5) anywhere within the list, what are the top N=2 most frequently observed subsequences with length M=4?
So, the conditions are:
if target doesn't exist in the list then we ignore that list completely
if the list length is less than M then we ignore the list completely
if the list is exactly length M but target is not in the Mth position then we ignore it (but we count it if target is in the Mth position)
if the list length, L, is longer than M and target is in the i=M position (or i=M+1 position, or i=M+2 position, ..., i=L position) then we count the subsequence of length M where target is in the final position in the subsequence
So, using our list-of-lists example, we'd count the following subsequences:
subseqs = [[2, 3, 4, 5], # taken from sequence 1
[2, 3, 4, 5], # taken from sequence 3
[12, 12, 6, 5], # taken from sequence 4
[8, 8, 3, 5], # taken from sequence 7
[1, 4, 12, 5], # taken from sequence 7
[12, 12, 6, 5], # taken from sequence 9
]
Of course, what we want are the top N=2 subsequences by frequency. So, [2, 3, 4, 5] and [12, 12, 6, 5] are the top two most frequent sequences by count. If N=3 then all of the subsequences (subseqs) would be returned since there is a tie for third.
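To make the counting rules concrete, here is a minimal brute-force sketch in Python. It is only a baseline for checking results on small inputs, not something that would scale to billions of lists. The function name `top_subsequences_bruteforce` is hypothetical, and it assumes N >= 1 and that ties at the N-th place are all returned, as described above.

```python
from collections import Counter

def top_subsequences_bruteforce(sequences, target, n, m):
    """Count every length-m window that ends on `target`, then keep the top n.

    Ties at the n-th place are all returned (assumption taken from the text).
    """
    counts = Counter()
    for seq in sequences:
        # `end` is the 1-based position of the window's last element.
        for end in range(m, len(seq) + 1):
            if seq[end - 1] == target:
                counts[tuple(seq[end - m:end])] += 1
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(sub, c) for sub, c in ranked if c >= cutoff]

x = [[1, 2, 3, 4, 5, 6, 7],
     [6, 5, 10, 11],
     [9, 8, 2, 3, 4, 5],
     [12, 12, 6, 5],
     [5, 8, 3, 4, 2],
     [1, 5],
     [2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6],
     [7, 1, 7, 3, 4, 1, 2],
     [9, 4, 12, 12, 6, 5, 1]]

print(top_subsequences_bruteforce(x, target=5, n=2, m=4))
```

Lists shorter than M, or lists where the target only appears before position M, contribute nothing because the inner loop never finds a matching window.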
Important
This is super simplified but, in reality, my actual list-of-sequences
consists of a few billion lists of positive integers (between 1 and 10,000)
each list can be as short as 1 element or as long as 500 elements
N and M can be as small as 1 or as big as 100
My questions are:
Is there an efficient data structure that would allow for fast queries assuming that N and M will always be less than 100?
Are there known algorithms for performing this kind of analysis for various combinations of N and M? I've looked at suffix trees but I'd have to roll my own custom version to even get close to what I need.
For the same dataset, I need to repeatedly query the dataset for different combinations of target, N, and M (where target <= 10,000, N <= 100 and M <= 100). How can I do this efficiently?
Extending on my comment. Here is a sketch how you could approach this using an out-of-the-box suffix array:
1) reverse and concatenate your lists with a stop symbol (I used 0 here).
[7, 6, 5, 4, 3, 2, 1, 0, 11, 10, 5, 6, 0, 5, 4, 3, 2, 8, 9, 0, 5, 6, 12, 12, 0, 2, 4, 3, 8, 5, 0, 5, 1, 0, 6, 5, 12, 4, 1, 9, 5, 3, 8, 8, 2, 0, 2, 1, 4, 3, 7, 1, 7, 0, 1, 5, 6, 12, 12, 4, 9]
2) Build a suffix array
[53, 45, 24, 30, 12, 19, 33, 7, 32, 6, 47, 54, 51, 38, 44, 5, 46, 25, 16, 4, 15, 49, 27, 41, 37, 3, 14, 48, 26, 59, 29, 31, 40, 2, 13, 10, 20, 55, 35, 11, 1, 34, 21, 56, 52, 50, 0, 43, 28, 42, 17, 18, 39, 60, 9, 8, 23, 36, 58, 22, 57]
3) Build the LCP array. The LCP array will tell you how many numbers a suffix has in common with its neighbour in the suffix array. However, you need to stop counting when you encounter a stop symbol.
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 0, 2, 1, 1, 2, 0, 1, 3, 2, 2, 1, 0, 1, 1, 1, 4, 1, 2, 4, 1, 0, 1, 2, 1, 3, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 2, 1, 2, 0]
4) When a query comes in (target = 5, M = 4) you search for the first occurrence of your target in the suffix array and scan the corresponding part of the LCP array until the starting number of the suffixes changes. Below is the part of the LCP array that corresponds to all suffixes starting with 5.
[..., 1, 1, 1, 4, 1, 2, 4, 1, 0, ...]
This tells you that there are two sequences of length 4 that occur two times. Brushing over some details, you can use the indexes to find the sequences and reverse them to get your final results.
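The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a production implementation: the suffix array is built by naive sorting (far slower than the linear-time constructions), and instead of a precomputed LCP array the query compares the length-M prefixes of neighbouring suffixes directly. The names `build_index` and `query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

STOP = 0  # stop symbol; assumes every real value is a positive integer

def build_index(sequences):
    """Reverse and concatenate the lists, then build a (naive) suffix array."""
    text = []
    for k, seq in enumerate(sequences):
        if k > 0:
            text.append(STOP)
        text.extend(reversed(seq))
    # Naive construction: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    firsts = [text[i] for i in sa]  # first symbol of each suffix, in sorted order
    return text, sa, firsts

def query(index, target, n, m):
    """Top-n length-m subsequences ending on `target` (ties at n-th place kept)."""
    text, sa, firsts = index
    lo = bisect_left(firsts, target)   # first suffix starting with target
    hi = bisect_right(firsts, target)  # one past the last such suffix
    prefixes = []
    for i in sa[lo:hi]:
        p = text[i:i + m]
        if len(p) == m and STOP not in p:  # do not read across a list boundary
            prefixes.append(tuple(p))
    # Suffixes are sorted, so equal prefixes sit next to each other.
    counts = [(key, sum(1 for _ in grp)) for key, grp in groupby(prefixes)]
    counts.sort(key=lambda kc: kc[1], reverse=True)
    if len(counts) > n:
        cutoff = counts[n - 1][1]
        counts = [kc for kc in counts if kc[1] >= cutoff]
    # Undo the reversal from build_index to recover the original order.
    return [(list(reversed(key)), c) for key, c in counts]

x = [[1, 2, 3, 4, 5, 6, 7],
     [6, 5, 10, 11],
     [9, 8, 2, 3, 4, 5],
     [12, 12, 6, 5],
     [5, 8, 3, 4, 2],
     [1, 5],
     [2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6],
     [7, 1, 7, 3, 4, 1, 2],
     [9, 4, 12, 12, 6, 5, 1]]

index = build_index(x)
print(query(index, target=5, n=2, m=4))
```

The index is built once and reused across queries; each query only touches the suffixes that start with the target.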
Complexity
Building up the suffix array is O(n) where n is the total number of elements in all lists and O(n) space
Building the LCP array is also O(n) in both time and space
Searching for a target number in the suffix array is O(log n) on average
The cost of scanning through the relevant subsequences is linear in the number of times the target occurs, which should be about 1/10000 of n on average given your parameters.
The first two steps happen offline. Querying is technically O(n) (due to step 4) but with a small constant (0.0001).

Element-wise maximum value for two lists

Given two Mathematica sets of data such as
data1 = {0, 1, 3, 4, 8, 9, 15, 6, 5, 2, 0};
data2 = {0, 1, 2, 5, 8, 7, 16, 5, 5, 2, 1};
how can I create a set giving me the maximum value of the two lists, i.e. how to obtain
data3 = {0, 1, 3, 5, 8, 9, 16, 6, 5, 2, 1};
?
data1 = {0, 1, 3, 4, 8, 9, 15, 6, 5, 2, 0};
data2 = {0, 1, 2, 5, 8, 7, 16, 5, 5, 2, 1};
Max /@ Transpose[{data1, data2}]
(* {0, 1, 3, 5, 8, 9, 16, 6, 5, 2, 1} *)
Another possible solution is to use the MapThread function:
data3 = MapThread[Max, {data1, data2}]
belisarius's solution, however, is much faster.
Simplest, though not the fastest:
Inner[Max,data1,data2,List]

Group time intervals by date (in d3.js)

For instance, there is an array of objects with start, end and duration (in hours) attributes.
[{start: new Date(2013, 2, 4, 0),
end: new Date(2013, 2, 4, 8),
duration: 8},
{start: new Date(2013, 2, 4, 22),
end: new Date(2013, 2, 5, 2),
duration: 4},
{start: new Date(2013, 2, 5, 5),
end: new Date(2013, 2, 7, 5),
duration: 48}]
I'd like to visualize them into something like the following (y - hours, x - dates):
I'm thinking about creating additional objects to fill the empty spaces between events like this
[{start: new Date(2013, 2, 4, 0),
end: new Date(2013, 2, 4, 8),
status: "busy"},
{start: new Date(2013, 2, 4, 8, 0, 1),
end: new Date(2013, 2, 4, 21, 59, 59),
status: "free"},
{start: new Date(2013, 2, 4, 22),
end: new Date(2013, 2, 4, 23, 59, 59),
status: "busy"},
{start: new Date(2013, 2, 5, 0),
end: new Date(2013, 2, 5, 2),
status: "busy"}]
And then map this to Stack Layout.
So my question is, what would be a better way to split and group the array to make this visualization easier? Maybe there are some built-in D3.js features for this?
I would consider changing the data format to
[{start: new Date(2013, 2, 4, 0),
end: new Date(2013, 2, 4, 8)},
{start: new Date(2013, 2, 4, 22),
end: new Date(2013, 2, 5, 2)},
{start: new Date(2013, 2, 5, 5),
end: new Date(2013, 2, 7, 5)}]
Since you have the start and end date, you don't really need a duration. Alternatively you could have just the start date and a duration.
I'm not extremely familiar with the stack layout, but it might be sufficient (and easier) for this project to simply append rect elements at the right position. I made an example here: http://tributary.io/inlet/5841372 which doesn't take into account the fact that you need to wrap events that start one day and end the next. This just displays all events in the same column, with the white space representing free time.
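For the wrapping that the linked example leaves out, one option is to split each event at midnight before drawing, so that every piece belongs to exactly one day column. The logic is language-agnostic; it is sketched here in Python and would need porting to JavaScript for use with D3. The helper name `split_by_day` is hypothetical. Note that JavaScript `Date` months are zero-based, so `new Date(2013, 2, 4, 22)` corresponds to 4 March 2013, 22:00.

```python
from datetime import datetime, timedelta

def split_by_day(start, end):
    """Split the interval [start, end) into pieces that never cross midnight."""
    pieces = []
    cursor = start
    while cursor < end:
        # Midnight at the start of the day after `cursor`.
        next_midnight = datetime(cursor.year, cursor.month, cursor.day) + timedelta(days=1)
        piece_end = min(end, next_midnight)
        pieces.append((cursor, piece_end))
        cursor = piece_end
    return pieces

# The second event from the question: 4 March 22:00 to 5 March 02:00.
for piece_start, piece_end in split_by_day(datetime(2013, 3, 4, 22), datetime(2013, 3, 5, 2)):
    print(piece_start, piece_end)
```

Each resulting piece can then be drawn as one rect in the column of its date, with the vertical position and height taken from the start and end hours.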

How to plot 2 lines and specify that the first column in list is the x-axis?

I have a list that has 6 elements:
{{1, 6, 4.5}, {2, 7, 4.5}, {3, 5, 5}, {4, 8, 5}, {5, 9, 5}, {6, 10, 5}}
I would like to plot this so that the first column is the x-axis, and the other 2 columns would be plotted as lines. Like this:
Line 1 would have these values: 6, 7, 5, 8, 9, 10, and Line 2 would be built in the same manner.
I tried:
ListPlot[data3, Joined -> True, Frame -> True,
FrameLabel -> {"Test Number", "TS Index"}, PlotRange -> All]
The resulting chart was not what I wanted.
How can I plot this as a 2 line plot? Also, what is the technical name for the numbers inside each element?
Thank you in advance.
lst = {{1,6,4.5},{2,7,4.5},{3,5,5},{4,8,5},{5,9,5},{6,10,5}}
line1 = lst[[All,{1,2}]]
line2 = lst[[All,{1,3}]]
ListPlot[{line1,line2},Joined->True,Mesh->All,AxesOrigin->{0,0}]

How can I implement such a map-like operation in mathematica

I have a list and an arbitrary function taking 4 parameters, let's say {1, 11, 3, 13, 9, 0, 12, 7} and f[{x,y,z,w}]={x+y, z+w}, what I want to do is to form a new list such that 4 consecutive elements in the original list are evaluated to get a new value as the new list's component, and the evaluation has to be done in every 2 positions in the original list, in this case, the resulting list is:
{{12, 16}, {16, 9}, {9, 19}}
Note here 4 and 2 can change. How to do this conveniently in Mathematica? I imagine this as something like Map, but not sure how to relate.
There's an alternative to Map[f, Partition[...]]: Developer`PartitionMap, which works exactly like Map[f, Partition[list, n, ...]]. So, your code would be
Needs["Developer`"]
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
PartitionMap[f,list, 4, 2]
giving the same result as Mark's answer.
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
f /@ Partition[list, 4, 2]

Resources