Binning in d3.js (reducing the number of elements being displayed)

I'm quite new to d3.js and I have a question regarding binning. I'm working on a visualization that uses a matrix layout to visualize sets (rows) and elements (columns). As the number of sets is very high, I'd like to use some sort of binning, i.e., I'd like to map values from an input domain to some output range in order to reduce the number of items shown on screen.
Let's consider the following basic code snippet:
var input = [
  { degree: 1, count: 2070 },
  { degree: 2, count: 1311 },
  { degree: 3, count: 398 },
  { degree: 4, count: 93 },
  { degree: 5, count: 9 }
];
var desired_bins = 3;
In the example above I have an array of length 5 which serves as the input. The desired length of my output is 3 items in this case, given by desired_bins=3.
What I'd like to compute is something like the following:
var output = [
  { range: "1", count: 2070 },
  { range: "2", count: 1311 },
  { range: "3-5", count: 500 }
];
The logic behind the binning should be the following:
Each output bin should contain no more than n/k values, where n is the total number of elements in input (2070 + 1311 + ... + 9 = 3881) and k is the number of desired output bins, 3 in this case.
So each bin should contain at most ceil(3881 / 3) = 1294 elements. If one item from the input domain already contains more than 1294 elements, then it has to become a separate output bin, as it can't be split up.
I was looking into d3 scales, which can obviously be used for a lot of things, but I'm not sure they are suitable for my particular use case.
Any ideas how to solve this with the built-in functionality of d3? Or do I just need to implement the binning algorithm from scratch?
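If a from-scratch pass turns out to be needed, here is a minimal greedy sketch of the rule described above (it assumes the input stays sorted by degree; binByCapacity is just an illustrative name, not a d3 built-in):
// Walk the input in order, closing a bin whenever adding the next
// item would push it past the n/k capacity.
function binByCapacity(input, desiredBins) {
  var total = input.reduce(function (s, d) { return s + d.count; }, 0);
  var capacity = Math.ceil(total / desiredBins); // at most n/k per bin
  var bins = [];
  var current = null;
  input.forEach(function (d) {
    // close the current bin if adding this item would overflow it
    if (current && current.count + d.count > capacity) {
      bins.push(current);
      current = null;
    }
    if (!current) current = { from: d.degree, to: d.degree, count: 0 };
    current.to = d.degree;
    current.count += d.count;
  });
  if (current) bins.push(current);
  return bins.map(function (b) {
    return { range: b.from === b.to ? String(b.from) : b.from + "-" + b.to,
             count: b.count };
  });
}
// binByCapacity(input, desired_bins)
// -> [{ range: "1", count: 2070 }, { range: "2", count: 1311 },
//     { range: "3-5", count: 500 }]
An item that already exceeds the capacity ends up alone in its own bin, because the overflow check happens before the next item is added.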

Related

Ranking algorithm with multiple factors

So I'd like to rank items depending on multiple factors, but some are more important than others.
Concretely, I've a list of products which all have the following properties:
A price
A weight (in kilogrammes)
A time to build (in minutes)
A size (in centimetres)
Each property has a different scale, and I know the min & max range of them.
For example, the prices are between 10 and 200, while the weights are between 1.2 and 3.4, etc.
I'd like to give priority to the size, then to the time to build, then the weight, and finally the price.
However, I'd like to ensure that no matter what the time to build, the weight, or the price are, the size should be the first thing that matters.
For example:
[{
  price: 320,
  size: 10,
  weight: 0.4,
  time: 4
},
{
  price: 230,
  size: 5,
  weight: 1.2,
  time: 23
},
{
  price: 230,
  size: 10,
  weight: 1.2,
  time: 23
}]
should result in:
[{
  price: 230,
  size: 5, // the lower the better
  weight: 1.2,
  time: 23
},
{
  price: 320, // the higher the better
  size: 10,
  weight: 0.4,
  time: 4
},
{
  price: 230,
  size: 10,
  weight: 1.2,
  time: 23
}]
I'm not very good at math and I don't really know where to start.
I'm thinking of something like scaling each value to the same range (for example from 0 to 100), then applying a factor to each scaled value, then adding them all up before sorting, or something like that.
Any ideas?
It looks like you want to sort by size in increasing order, then if two objects have the same size then by time in increasing order, then by weight in increasing order, then by price in decreasing order (according to the comment in your example).
If you are using a language like Python, just put the four values into a tuple, negating the values of the items that should sort in decreasing order. For each item your key is
(size, time, weight, -price)
Then Python itself will sort those tuples appropriately; lexicographic tuple comparison is built into the language. This is the easiest thing to do.
If you are using a language without sorted tuples, or for some reason you really want the key to be a floating point value, you can do this. For each factor, look at the known minimum and maximum. Use those to scale that factor to a number between 1 and 10, perhaps including 1 but not including 10. Make sure 1 goes with the value to be sorted first. This can be done with
scale1to10 = (value - min) / (max + 1 - min) * 9 + 1
for increasing factors, and with
scale1to10 = (max - value) / (max + 1 - min) * 9 + 1
for decreasing factors. Then combine all the factors into one "4-digit" number, as in
scale = scalesize * 1000 + scaletime * 100 + scaleweight * 10 + scaleprice
Note that the most important factor is multiplied by the highest power of 10, while the lowest has no multiplier.
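As a rough illustration of this single-key scheme in JavaScript (the min/max ranges below are invented for the example, and products is assumed to be the array from the question):
function scaleAsc(value, min, max) {  // ~1 (sorts first) up to just under 10
  return (value - min) / (max + 1 - min) * 9 + 1;
}
function scaleDesc(value, min, max) { // reversed: larger values sort first
  return (max - value) / (max + 1 - min) * 9 + 1;
}
function sortKey(p) {                 // most important factor gets the
  return scaleAsc(p.size, 5, 10) * 1000       // highest power of ten
       + scaleAsc(p.time, 4, 23) * 100
       + scaleAsc(p.weight, 0.4, 1.2) * 10
       + scaleDesc(p.price, 230, 320);
}
products.sort(function (a, b) { return sortKey(a) - sortKey(b); });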
Your sorting function should have the ability to pass a comparison function. Pass a comparison function that tests each field in order, only moving to the next if everything so far is equal. In most languages this is not too hard to do. My guess (based on your posting history) is that you're using PHP so http://www.the-art-of-web.com/php/sortarray/ is likely to be the best guide to how to do that.
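The chained comparison itself is short in most languages; here is a sketch in JavaScript rather than PHP (again assuming products is the array from the question):
function compareProducts(a, b) {
  return (a.size - b.size)       // size ascending (most important)
      || (a.time - b.time)       // then time ascending
      || (a.weight - b.weight)   // then weight ascending
      || (b.price - a.price);    // finally price descending
}
products.sort(compareProducts);
Each subtraction returns 0 on a tie, so || falls through to the next, less important field.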

What algorithm should I use to see if I can get n from given numbers

I'm given a number n and x different numbers; I should find out whether I can get n from the given numbers using + and -. What algorithm should I use?
For example:
Input: n = 10 and the numbers 15 25 30
Output: 15 + 25 - 30 = 10
You can do it dynamically. Iterate through the given numbers and store the values that are achievable by taking the previous results and adding/subtracting the present number. In your example the sets are going to be:
{ 0 }
{ -15, 15 }
{ -40, 10, -10, 40 }
{ -70, -10, -20, 40, -40, 20, 10, 70 }
Finally, just check whether your value is in the final set. It looks like an exponential algorithm (the set size doubles with each iteration), though the numbers repeat quickly. Also, since each set is symmetric around zero, you can actually store only the positive values.
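A minimal sketch of that set-building idea in JavaScript (assuming each number must be used exactly once with a sign, as in the example):
function canReach(n, numbers) {
  var reachable = new Set([0]);   // sums achievable so far
  numbers.forEach(function (x) {
    var next = new Set();
    reachable.forEach(function (v) {
      next.add(v + x);            // take x with +
      next.add(v - x);            // take x with -
    });
    reachable = next;
  });
  return reachable.has(n);
}
// canReach(10, [15, 25, 30]) -> true, via 15 + 25 - 30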

d3.ticks() in d3 v4 often gives a top tick value lower than the highest value in the range

In D3 version 4's D3-array module, d3.ticks() takes an array and outputs an array of nicely-rounded values suitable for chart axis labels:
d3.ticks(start, stop, count)
Returns an array of approximately count + 1 uniformly-spaced, nicely-rounded values between start and stop (inclusive). Each value is a power of ten multiplied by 1, 2 or 5. See also tickStep and linear.ticks...
Ticks are inclusive in the sense that they may include the specified start and stop values if (and only if) they are exact, nicely-rounded values consistent with the inferred step. More formally, each returned tick t satisfies start ≤ t and t ≤ stop.
However, I don't get the results I expect. For example, for this:
d3.ticks( 0, 63500, 7 );
I'd expect output like:
[ 0, 10000, 20000, 30000, 40000, 50000, 60000, 70000 ]
Instead, what I get is:
[ 0, 10000, 20000, 30000, 40000, 50000, 60000 ]
... where my highest value (63500) is greater than my chart's highest tick, meaning I'd expect a value to extend off the chart.
Requesting a different number of ticks doesn't solve it:
( 0, 63500, 8 ) gives the same thing
( 0, 63500, 9 ) also gives the same
( 0, 63500, 10 ) gives [0, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000] which is no better
From random trial and error, the only count I could find that actually exceeded my maximum value was 24, which gave [0, 2000, 4000, ..., 62000, 64000].
This is for a function that can be passed any data set, so it can't require a hand-picked number of ticks for each data set.
Why is this happening (have I misunderstood something, or doesn't this violate "t ≤ stop" from the docs?), and how do I ensure that my ticks cover my range?
I've seen the question d3.js scale doesn't place tick at the top, which describes a similar problem (with an illustration) in D3 version 3, but the functions are different in version 4 and the documentation links in the answer don't apply to D3 version 4.
I've looked at D3-Scale (the link to linear.ticks() appears to be broken), but I'm not really understanding why what I'm trying to do would need a whole new D3 submodule and scaling algorithm.
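As a side note on the "why": d3-array first picks a nicely-rounded step and then emits only whole multiples of that step inside [start, stop], so ticks never extend past stop. A sketch of the arithmetic (d3.tickStep is part of d3-array):
var step = d3.tickStep(0, 63500, 7);        // 10000
var top = Math.floor(63500 / step) * step;  // 60000, the last tick emitted
So "t ≤ stop" is honored; the fix is to round the stop value up before asking for ticks.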
Try this:
// Round a number up to the next multiple of its largest power of ten,
// e.g. 63500 -> 70000, so the top tick can cover the data maximum.
function getRoundedYMaxValue(number) {
  var magnitude = Math.pow(10, number.toString().length - 1);
  return Math.ceil(number / magnitude) * magnitude;
}
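Used with the original call, the rounded value becomes the new stop (a sketch; the exact tick list follows from d3's 1/2/5 stepping):
var yMax = getRoundedYMaxValue(63500);  // 70000
d3.ticks(0, yMax, 7);                   // [0, 10000, 20000, ..., 70000]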

Method for associating vector elements with items they represent

Imagine an "item" structure (represented as a JSON hash)
{
  id: 1,
  value: 5
}
Now imagine I have a set of 100,000 items, and I need to perform calculations on the value associated with each. At the end of the calculation, I update each item with the new value.
To do this quickly, I have been using GSL vector libraries, loading each value as an element of the vector.
For example, the items:
{ id: 1, value: 5 }
{ id: 2, value: 6 }
{ id: 3, value: 7 }
Becomes:
GSL::Vector[5, 6, 7]
Element 1 corresponds to item id 1, element 2 corresponds to item id 2, etc. I then proceed to perform element-wise calculations on each element in the vector, multiplying, dividing etc.
While this works, it bothers me that I have to depend on the list of items being sorted by ID.
Is there another structure that acts like a hash (allowing me to say with certainty a particular result value corresponds to a particular item), but allows me to do fast, memory efficient element-wise operations like a vector?
I'm using Ruby and the GSL bindings, but willing to re-write this in another language if necessary.
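One pattern (sketched here in JavaScript rather than Ruby/GSL, since the idea is language-neutral) is to build a parallel array of ids in the same pass as the vector, so element i always maps back to ids[i] regardless of how the items were ordered:
var items = [{ id: 3, value: 7 }, { id: 1, value: 5 }, { id: 2, value: 6 }];
var ids = items.map(function (d) { return d.id; });                 // [3, 1, 2]
var values = Float64Array.from(items, function (d) { return d.value; });

// element-wise work on the packed vector...
for (var i = 0; i < values.length; i++) values[i] *= 2;

// ...then write results back by position, not by id order
items.forEach(function (d, i) { d.value = values[i]; });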

Composing an average stream piecewise

I have a list of n floating point streams, each having a different size.
The streams can be composed together using the following rules:
You can place a stream starting at any point in time (it's zero before it starts). You can use the same stream several times (it can overlap itself and even be placed at the same position more than once), and you are allowed to not use a certain stream at all.
e.g.:
input streams:
1 2 3 4
2 4 5 6 7
1 5 6
Can be composed like:
  1 2 3 4
1 5 6
        1 5 6
After the placements, an output stream is composed by the rule that each output float equals the square root of the sum of the squares of the overlapping terms.
e.g.:
If the streams at a position are:
1
2
3
The output is:
sqrt(1*1 + 2*2 + 3*3) = sqrt(14) = 3.74...
So for the example composition:
  1 2 3 4
1 5 6
        1 5 6
The output is:
1 5.09 6.32 3 4.12 5 6
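A small sketch of this combination rule in JavaScript (placements given as {stream, offset} pairs, matching the example above):
function compose(placements, length) {
  var sumSq = new Array(length).fill(0); // sum of squares per position
  placements.forEach(function (p) {
    p.stream.forEach(function (v, j) {
      sumSq[p.offset + j] += v * v;
    });
  });
  return sumSq.map(Math.sqrt);
}
// compose([{ stream: [1, 2, 3, 4], offset: 1 },
//          { stream: [1, 5, 6], offset: 0 },
//          { stream: [1, 5, 6], offset: 4 }], 7)
// -> [1, 5.09..., 6.32..., 3, 4.12..., 5, 6]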
What I have is the output stream and the input streams. I need to compute the composition that led to that output. An exact composition doesn't have to exist; I need a composition as close as possible to the output (smallest accumulated difference).
e.g.:
Input:
Stream to mimic:
1 5.09 6.32 3 4.12 5 6
and a list:
1 2 3 4
2 4 5 6 7
1 5 6
Expected output:
Stream 0 starting at 1,
Stream 2 starting at 0,
Stream 2 starting at 4.
This seems like an NP problem; is there any fast way to solve it? It can be somewhat brute force (but not totally, it's not a theoretic problem), and it can give a non-optimal answer as long as it's close enough.
The algorithm will usually be used with a stream to mimic of very long length (it can be a few megabytes), while there will be around 20 streams to compose from, each around a kilobyte long.
I think you can speed up a greedy search a bit over the obvious. First of all, square each element in all of the streams involved. Then you are looking for a sum of squared streams that looks a lot like the squared target stream. Suppose that "looks like" means the euclidean distance between the squared streams, considered as vectors.
Then we have (a-b)^2 = a^2 + b^2 - 2a.b. So if we can find the dot product of two vectors quickly, and we know their absolute sizes, we can find the distance quickly. Using the FFT and the convolution theorem (http://en.wikipedia.org/wiki/Convolution_theorem), we can work out a.b_i, where a is the target stream and b_i is stream b at offset i, by convolving a with a reversed version of b: for the cost of an FFT on a, an FFT on reversed b, and an inverse FFT on their product, we get a.b_i for every offset i.
If we do a greedy search, the first step will be to find the b_i that makes (a-b_i)^2 smallest and subtract it from a. Then we are looking for a stream c_j that makes (a-b_i-c_j)^2 as small as possible. But this is a^2 + b_i^2 + c_j^2 - 2a.b_i - 2a.c_j + 2b_i.c_j and we have already calculated everything except b_i.c_j in the step above. If b and c are shorter streams it will be cheap to calculate b_i.c_j, and we can use the FFT as before.
So we have a not-too-horrible way to do a greedy search: at each stage, subtract from the adjusted target stream whichever placed stream makes the residual smallest (considered as vectors in euclidean space), and carry on from there. At some stage we will find that none of the streams available makes the residual any smaller. We can stop there, because our calculation above shows that using two streams at once won't help either at that point: this follows because b_i.c_j >= 0, since each element of b_i is >= 0, because it is a square.
If you do a greedy search and are not satisfied, but have more cpu to burn, try Limited Discrepancy Search.
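For concreteness, a sketch of one greedy step without the FFT speed-up (all streams assumed already squared, per the first step above; the FFT only replaces the inner dot-product loop when the streams are long):
function bestPlacement(target, streams) {
  var best = null;
  streams.forEach(function (s, si) {
    for (var off = 0; off + s.length <= target.length; off++) {
      var dot = 0, norm = 0;
      for (var j = 0; j < s.length; j++) {
        dot += target[off + j] * s[j]; // a.b_i
        norm += s[j] * s[j];           // b_i^2
      }
      var gain = 2 * dot - norm;       // drop in squared residual
      if (gain > 0 && (best === null || gain > best.gain)) {
        best = { stream: si, offset: off, gain: gain };
      }
    }
  });
  return best; // null once no placement shrinks the residual: stop
}
Subtracting b_i from residual a changes the squared residual by norm - 2*dot, so picking the largest positive gain is exactly the greedy choice described above.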
If I can use C#, LINQ & the Rx framework's System.Interactive extensions then this works:
First up - define a jagged array for the allowable arrays.
int[][] streams =
    new []
    {
        new [] { 1, 2, 3, 4, },
        new [] { 2, 4, 5, 6, 7, },
        new [] { 1, 5, 6, },
    };
Need an infinite iterator on integers to represent each step.
IEnumerable<int> steps =
EnumerableEx.Generate(0, x => true, x => x + 1, x => x);
Need a random number generator to randomly select which streams to add to each step.
var rnd = new Random();
In my LINQ query I've used these operators:
Scan^ - runs an accumulator function over a sequence, producing an output value for every input value
Where - filters the sequence based on the predicate
Empty - returns an empty sequence
Concat - concatenates two sequences
Skip - skips over the specified number of elements in a sequence
Any - returns true if the sequence contains any elements
Select - projects the sequence using a selector function
Sum - sums the values in the sequence
^ - from the Rx System.Interactive library
Now for the LINQ query that does all of the hard work.
IEnumerable<double> results =
steps
// Randomly select which streams to add to this step
.Scan(Enumerable.Empty<IEnumerable<int>>(), (xs, _) =>
streams.Where(st => rnd.NextDouble() > 0.8).ToArray())
// Create a list of "Heads" & "Tails" for each step
// Heads are the first elements of the current streams in the step
// Tails are the remaining elements to push forward to the next step
.Scan(new
{
Heads = Enumerable.Empty<int>(),
Tails = Enumerable.Empty<IEnumerable<int>>()
}, (acc, ss) => new
{
Heads = acc.Tails.Concat(ss)
.Select(s => s.First()),
Tails = acc.Tails.Concat(ss)
.Select(s => s.Skip(1)).Where(s => s.Any()),
})
// Keep the Heads only
.Select(x => x.Heads)
// Filter out any steps that didn't produce any values
.Where(x => x.Any())
// Calculate the square root of the sum of the squares
.Select(x => System.Math.Sqrt((double)x.Select(y => y * y).Sum()));
Nice lazy evaluation per step - scary though...
