Estimate the storage capacity of a 20-minute vinyl in bits, but how?

Estimate the storage capacity of a 20-minute vinyl in bits.
We are meant to estimate, since it depends on a lot of factors, but I have no clue how to approximate the number of bits that can be stored on a 20-minute vinyl. How would you estimate the storage capacity?
The exercise is more about reasoning and less about how many bits a 20-minute vinyl can actually store.

Assuming you're talking about the old LP (or SP) records, the basic idea is this:
It was a single groove cut into the vinyl, spiralling towards the center, which is how the stylus followed it. Hence the storage capacity needs to take into account:
The length of the groove (which can be calculated from the diameter of the record's playing surface and the distance between adjacent turns of the spiral);
The number of "digits" that can be stored per distance in the groove; and
The number of values each digit can have (i.e., distinguishable groove depths).
So, for example (you will need to figure out your own figures), let's say there are two distinguishable depths in the groove (shallow and deep). Further assume that 1,000 "digits" can be stored per linear meter, and that the total groove length is 40m.
The storage capacity would then be 40,000 bits.
Another approach, if you have only the playing time and cannot easily get the groove length, is to use different figures.
For example, assume that decent music is a sequence of frequency/volume pairs (at 8 bits each, so 16 bits per sample) and that you need sixty of these per second to get decent sound quality. That would be 16 * 60 = 960 bits per second, which would give you a capacity of 960 * 60 * 20 = 1,152,000 bits for a 20-minute duration.
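To make the arithmetic concrete, here is a tiny sketch (in Java) that just multiplies out the same made-up figures; nothing in it is a real measurement:

public class VinylEstimate {
    public static void main(String[] args) {
        // Approach 1: groove length x digit density x bits per digit
        double grooveMeters = 40;        // assumed total groove length
        double digitsPerMeter = 1000;    // assumed "digits" per linear meter
        double bitsPerDigit = 1;         // two distinguishable depths = 1 bit
        System.out.println("groove-based estimate: "
                + (grooveMeters * digitsPerMeter * bitsPerDigit) + " bits");   // 40,000 bits

        // Approach 2: bits per sample x samples per second x playing time
        double bitsPerSample = 16;       // 8-bit frequency + 8-bit volume
        double samplesPerSecond = 60;    // assumed "decent" sample rate
        double seconds = 20 * 60;        // 20-minute record
        System.out.println("time-based estimate: "
                + (bitsPerSample * samplesPerSecond * seconds) + " bits");     // 1,152,000 bits
    }
}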
Just remember those are example values, the actual values may be far different in reality. But, since you've stated the intent here is just to demonstrate estimation skills, that should be okay.

Related

Get all possible valid positions of ships in battleship game

I'm creating a probability assistant for the Battleship game - in essence, for a given game state (field state and available ships), it should produce a field where every free cell is assigned a probability of containing a hit.
My current approach is a Monte Carlo style computation: pick a random free cell, a random ship and a random rotation, check whether this placement is valid, and if so continue with the next ship from the available set. If the available set becomes empty, add the resulting ship layout to an output stack. Repeat this many times and use the outputs to compute the probability of each cell.
Is there a sane algorithm to process all possible ship placements for a given field state?
An exact solution is possible, but it does not qualify as sane in my book.
Still, here is the idea.
There are many variants of the game, but let's say that we start with a worst case scenario of 1 ship of size 5, 2 of size 4, 3 of size 3 and 4 of size 2.
The "discovered state" of the board is all spots where shots have been taken, or ships have been discovered, plus the number of remaining ships. The discovered state naively requires 100 bits for the board (10x10, any can be shot) plus 1 bit for the count of remaining ships of size 5, 2 bits for the remaining ships of size 4, 2 bits for remaining ships of size 3 and 3 bits for remaining ships of size 2. This makes 108 bits, which fits in 14 bytes.
Now conceptually the idea is to figure out the map by shooting each square in turn in the first row, the second row, and so on, and recording the game state along with transitions. We can record the forward transitions and counts to find how many ways there are to get to any state.
Then find the end state of everything finished and all ships used and walk the transitions backwards to find how many ways there are to get from any state to the end state.
Now walk the data structure forward, knowing the probability of arriving at any state while on the way to the end, but this time we can figure out the probability of each way of finding a ship on each square as we go forward. Sum those and we have our probability heatmap.
Is this doable? In memory, no. In a distributed system it might be though.
Remember that I said that recording a state took 14 bytes? Adding a count to that takes another 8 bytes which takes us to 22 bytes. Adding the reverse count takes us to 30 bytes. My back of the envelope estimate is that at any point in our path there are on the order of a half-billion states we might be in with various ships left, killed ships sticking out and so on. That's 15 GB of data. Potentially for each of 100 squares. Which is 1.5 terabytes of data. Which we have to process in 3 passes.

How do I efficiently find the fastest segments from a sequence of distances and times?

My input is a gpx-file containing a sequence of timestamped positions, like the one you'd get if you go for a run with a GPS and tell it to record your track.
The timestamped positions are not necessarily equal in distance from each other, or equal in time-delta between each other.
Given this input, I want to efficiently find the highest speed the gpx-file indicates for all different distances.
Example:
12:00:00 start
12:00:05 moved 100m
12:00:15 moved 100m
12:00:35 moved 200m
In this example the correct answer is:
20.0 m/s at 100m
13.3 m/s at 200m
11.4 m/s at 400m
What is a good algorithm for (preferably reasonably efficiently) calculating this?
Clarification: I'm not looking solely for the fastest segment, that's trivial. I'm looking for the fastest speed represented by the track for ALL distances up to the length of the track in sum total.
If someone uploaded a gpx-track of a marathon they ran, I'd want to know the fastest 100m they ran in that marathon, the fastest 200m, the fastest 300m and so on.
Let's say you have a gpx track for a 1,500 meter run, and you want to do this. So you want the fastest 100, 200, 300, 400, ... 1,500 meters. There are:
15 100-meter segments
14 200-meter segments
13 300-meter segments
12 400-meter segments
...
2 1,400-meter segments
1 1,500-meter segment
That works out to 15+14+13+12+...+2+1 = (15^2+15)/2 = 120 different segments to examine if you want to calculate the 15 different distances.
You can do this in a single pass of the array. Simply initialize an array that contains the current running total and maximum speed for each of the distances you're interested in. As you read each segment, you subtract the value for the oldest segment, add the new segment value, recompute the average speed, and update the max speed if appropriate.
The algorithm will require you to look at (n^2+n)/2 individual splits. Regardless of how you do it, you have to look at every possible split for every distance you want to compute. You have n data points and you're trying to determine n different best split times. That's O(n^2) any way you slice it.
But the amount of data you're talking about isn't huge, certainly not by today's standards. A marathon is only 42,195 meters. You'll need an array of 422 distances if you want 100-meter resolution, and your code will do on the order of 90,000 split calculations. That's quite doable with even a low-end computer these days.
As for the data, I would recommend pre-processing the .gpx file to generate a stream of data points that are spaced exactly 100 meters apart. You can do that separately, or you can do it as part of reading the data in while you're computing the splits. It's not difficult, and it'll make the rest of your code much easier to work with.
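For illustration, here is a minimal sketch (in Java, not from the answer itself) of that single-pass scan, assuming the preprocessing step has already produced the cumulative elapsed time at every 100-meter mark:

// timesAt100m[i] = elapsed seconds at the (i+1)-th 100 m mark, strictly increasing.
// Returns best[k-1] = fastest average speed (m/s) over any window of k*100 meters.
static double[] bestSpeeds(double[] timesAt100m) {
    int n = timesAt100m.length;
    double[] best = new double[n];
    for (int end = 1; end <= n; end++) {          // single pass over the 100 m marks
        double tEnd = timesAt100m[end - 1];
        for (int k = 1; k <= end; k++) {          // every window length that fits so far
            double tStart = (end == k) ? 0.0 : timesAt100m[end - k - 1];
            double speed = (k * 100.0) / (tEnd - tStart);
            best[k - 1] = Math.max(best[k - 1], speed);
        }
    }
    return best;
}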

If a Bitcoin mining nonce is just 32 bits long, how come it is increasingly difficult to find the winning hash?

I'm learning about mining, and the first thing that surprised me is that the nonce part of the algorithm - which is supposed to be varied (looped over) until you get a hash smaller than the target - is just 32 bits long.
Can you explain why it is then so difficult to loop over an unsigned int, and how it can become increasingly difficult over time? Thank you.
The task is: try different nonce values in your potential block until you reach a block having a hash value below some given threshold.
I can't find the source right now, but I'm quite sure that since the introduction of special mining ASICs the 32-bit nonce is no longer enough to keep the miners busy for the planned 10-minute interval between blocks. They are able to compute all 4 billion block hashes in less than 10 minutes.
Increasing the difficulty didn't help anymore, as that reached the point where none of the 4 billion possible nonce values gave a hash below the threshold.
So they found some additional fields in the block that are now used as nonce-extension. The principle is still the same: try different values until you reach a block with a hash below the threshold, only now it's more than 32 bits that can be varied, allowing for the threshold to be lowered beyond the former 32-bit-implied barrier.
Because it's not just the 32-bit nonce that is involved in the calculation. The transaction data (roughly 1 MB of it) is also part of the mining input, via the Merkle root in the block header. There is then a non-trivial amount of arithmetic to arrive at the output, which can then be compared with the target.
Bitcoin mining is looping over all 4 billion uints until you find a "right" one.
The way that difficulty is increased is that only some of the bits of the output matter. E.g. early on, the lowest 11 bits had to be some specific pattern and the remaining 21 bits could be anything. In theory there would be about 2 million "right" values for each transaction block, uniformly distributed across the range of a uint. Then the "difficulty" is increased so that 13 bits have to match some pattern, so now there are 4x fewer "right" answers and it takes (on average) 4x longer to find one.
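For intuition, here is a deliberately simplified sketch (in Java) of that brute-force loop; the header layout, the difficulty encoding and the extraNonce handling of real Bitcoin are all omitted, so treat it as an illustration only:

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Try every 32-bit nonce appended to some fixed header bytes, looking for a
// double-SHA-256 hash below the target. Illustrative only, not Bitcoin's real format.
class SimplifiedMiner {
    static Long findNonce(byte[] headerWithoutNonce, BigInteger target) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (long nonce = 0; nonce <= 0xFFFFFFFFL; nonce++) {        // all 2^32 values
            ByteBuffer buf = ByteBuffer.allocate(headerWithoutNonce.length + 4);
            buf.put(headerWithoutNonce).putInt((int) nonce);
            byte[] hash = sha.digest(sha.digest(buf.array()));        // double SHA-256
            if (new BigInteger(1, hash).compareTo(target) < 0) {
                return nonce;                                         // found a "winning" nonce
            }
        }
        return null;  // no nonce worked: vary something else (e.g. extraNonce) and try again
    }
}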

How to work out the complexity of the game 2048?

Edit: This question is not a duplicate of What is the optimal algorithm for the game 2048?
That question asks 'what is the best way to win the game?'
This question asks 'how can we work out the complexity of the game?'
They are completely different questions. I'm not interested in which steps are required to move towards a 'win' state - I'm interested in finding out whether the total number of possible steps can be calculated.
I've been reading this question about the game 2048 which discusses strategies for creating an algorithm that will perform well playing the game.
The accepted answer mentions that:
the game is a discrete state space, perfect information, turn-based game like chess
which got me thinking about its complexity. For deterministic games like chess, it's possible (in theory) to work out all the possible moves that lead to a win state and work backwards, selecting the best moves that keep leading towards that outcome. I know this leads to a large number of possible moves (something in the range of the number of atoms in the universe), but is 2048 more or less complex?
Pseudocode:
for the current arrangement of tiles
- work out the possible moves
- work out what the board will look like if the program adds a 2 to the board
- work out what the board will look like if the program adds a 4 to the board
- move on to working out the possible moves for the new state
At this point I'm thinking I will be here a while waiting on this to run...
So my question is - how would I begin to write this algorithm - what strategy is best for calculating the complexity of the game?
The big difference I see between 2048 and chess is that the program can select randomly between 2 and 4 when adding new tiles - which seems to add a massive number of additional possible moves.
Ultimately I'd like the program to output a single figure showing the number of possible permutations in the game. Is this possible?!
Let's determine how many possible board configurations there are.
Each tile can be either empty, or contain a 2, 4, 8, ..., 512 or 1024 tile.
That's 12 possibilities per tile. There are 16 tiles, so we get 12^16 ≈ 1.8 * 10^17 possible board states - and this most likely includes a few unreachable ones.
Assuming we could store all of these in memory, we could work backwards from all board states that would generate a 2048 tile in the next move, doing a constant amount of work to link reachable board states to each other, which should give us a probabilistic best move for each state.
To store a board state, let's say we'd need 4 bits per tile, i.e. 64 bits = 8 bytes per board state.
12^16 board states would then require 8 * 12^16 ≈ 1.5 * 10^18 bytes - well over an exabyte (not to mention added overhead to keep track of the best boards). That's far beyond what a desktop computer these days has, although it might be possible to cleverly limit the number of boards required at any given time so as to get down to something that will fit on, say, a 3 TB hard drive, or perhaps even in RAM.
For reference, chess has an upper bound of 2^155 possible positions.
If we were to actually calculate, from the start, every possible move (in a breadth-first search-like manner), we'd get a massive number.
This isn't the exact number, but rather a rough estimate of the upper bound.
Let's make a few assumptions: (which definitely aren't always true, but, for the sake of simplicity)
There are always 15 open squares
You always have 4 moves (left, right, up, down)
Once the total sum of all tiles on the board reaches 2048, it will take the minimum number of combinations to get a single 2048 (so, if placing a 2 makes the sum 2048, the combinations will be 2 -> 4 -> 8 -> 16 -> ... -> 2048, i.e. taking 10 moves)
A 2 will always get placed, never a 4 - the algorithm won't assume this, but, for the sake of calculating the upper bound, we will.
We won't consider the fact that there may be duplicate boards generated during this process.
To reach 2048, there needs to be 2048 / 2 = 1024 tiles placed.
You start with 2 randomly placed tiles, then repeatedly make a move and another tile gets placed, so there's about 1022 'turns' (a turn consisting of making a move and a tile getting placed) until we get a sum of 2048, then there's another 10 turns to get a 2048 tile.
In each turn, we have 4 moves, and there can be one of two tiles placed in one of 15 positions (30 possibilities), so that's 4*30 = 120 possibilities.
This would, in total, give us 120^1032 possible states.
If we instead assume a 4 will always get placed, we get 120^519 states.
Calculating the exact number will likely involve working our way through all these states, which won't really be viable.
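To see how these back-of-the-envelope numbers come out, here is a tiny sketch (in Java) that just evaluates them; it contains no game logic and simply follows the assumptions stated in the answers above:

import java.math.BigInteger;

public class Estimate2048 {
    public static void main(String[] args) {
        BigInteger boards = BigInteger.valueOf(12).pow(16);          // 12 values per tile, 16 tiles
        BigInteger bytes = boards.multiply(BigInteger.valueOf(8));   // 8 bytes per board state
        System.out.println("board states  ~ " + boards);             // ~1.8 * 10^17
        System.out.println("storage bytes ~ " + bytes);              // ~1.5 * 10^18

        BigInteger upperBound = BigInteger.valueOf(120).pow(1032);   // 120 options per turn, ~1032 turns
        System.out.println("upper bound on move sequences has "
                + upperBound.toString().length() + " digits");       // a 2146-digit number
    }
}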

Percentiles of Live Data Capture

I am looking for an algorithm that determines percentiles for live data capture.
For example, consider the development of a server application.
The server might have response times as follows:
17 ms
33 ms
52 ms
60 ms
55 ms
etc.
It is useful to report the 90th percentile response time, 80th percentile response time, etc.
The naive algorithm is to insert each response time into a list. When statistics are requested, sort the list and get the values at the proper positions.
Memory usage scales linearly with the number of requests.
Is there an algorithm that yields "approximate" percentile statistics given limited memory usage? For example, let's say I want to solve this problem in a way that I process millions of requests but only want to use say one kilobyte of memory for percentile tracking (discarding the tracking for old requests is not an option since the percentiles are supposed to be for all requests).
I also require that there be no a priori knowledge of the distribution. For example, I do not want to specify any ranges of buckets ahead of time.
If you want to keep the memory usage constant as you get more and more data, then you're going to have to resample that data somehow. That implies that you must apply some sort of rebinning scheme. You can wait until you acquire a certain amount of raw inputs before beginning the rebinning, but you cannot avoid it entirely.
So your question is really asking "what's the best way of dynamically binning my data?" There are lots of approaches, but if you want to minimise your assumptions about the range or distribution of values you may receive, then a simple approach is to average over buckets of fixed size k, with logarithmically distributed widths. For example, let's say you want to hold 1000 values in memory at any one time. Pick a size for k, say 100. Pick your minimum resolution, say 1 ms. Then
The first bucket deals with values between 0-1ms (width=1ms)
Second bucket: 1-3ms (w=2ms)
Third bucket: 3-7ms (w=4ms)
Fourth bucket: 7-15ms (w=8ms)
...
Tenth bucket: 511-1023ms (w=512ms)
This type of log-scaled approach is similar to the chunking systems used in some hash table, filesystem and memory-allocation algorithms. It works well when your data has a large dynamic range.
As new values come in, you can choose how you want to resample, depending on your requirements. For example, you could track a moving average, use a first-in, first-out approach, or some other more sophisticated method. See the Kademlia algorithm for one approach (used by BitTorrent).
Ultimately, rebinning must lose you some information. Your choices regarding the binning will determine the specifics of what information is lost. Another way of saying this is that the constant size memory store implies a trade-off between dynamic range and the sampling fidelity; how you make that trade-off is up to you, but like any sampling problem, there's no getting around this basic fact.
If you're really interested in the pros and cons, then no answer on this forum can hope to be sufficient. You should look into sampling theory. There's a huge amount of research on this topic available.
For what it's worth, I suspect that your server times will have a relatively small dynamic range, so a more relaxed scaling to allow higher sampling of common values may provide more accurate results.
Edit: To answer your comment, here's an example of a simple binning algorithm.
You store 1000 values, in 10 bins. Each bin therefore holds 100 values. Assume each bin is implemented as a dynamic array (a 'list', in Perl or Python terms).
When a new value comes in:
Determine which bin it should be stored in, based on the bin limits you've chosen.
If the bin is not full, append the value to the bin list.
If the bin is full, remove the value at the top of the bin list, and append the new value to the bottom of the bin list. This means old values are thrown away over time.
To find the 90th percentile, sort bin 10. The 90th percentile is the first value in the sorted list (element 900/1000).
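A minimal sketch of that scheme (in Java; the class and method names are illustrative, with the 10-bins-of-100 layout and log-scaled bucket widths from the example above):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// 10 log-scaled bins (0-1 ms, 1-3 ms, 3-7 ms, ...), each holding at most 100
// values; when a bin is full, its oldest value is thrown away.
class LogBins {
    private static final int BINS = 10, PER_BIN = 100;
    private final List<ArrayDeque<Double>> bins = new ArrayList<>();

    LogBins() {
        for (int i = 0; i < BINS; i++) bins.add(new ArrayDeque<>());
    }

    void add(double millis) {
        ArrayDeque<Double> bin = bins.get(binIndex(millis));
        if (bin.size() == PER_BIN) bin.removeFirst();   // evict the oldest value
        bin.addLast(millis);
    }

    private int binIndex(double millis) {               // bin i covers [2^i - 1, 2^(i+1) - 1) ms
        int i = (int) Math.floor(Math.log(millis + 1) / Math.log(2));
        return Math.min(Math.max(i, 0), BINS - 1);
    }

    double p90() {                                       // smallest value in the top bin
        List<Double> top = new ArrayList<>(bins.get(BINS - 1));
        Collections.sort(top);
        return top.get(0);
    }
}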
If you don't like throwing away old values, then you can implement some alternative scheme to use instead. For example, when a bin becomes full (reaches 100 values, in my example), you could take the average of the oldest 50 elements (i.e. the first 50 in the list), discard those elements, and then append the new average element to the bin, leaving you with a bin of 51 elements that now has space to hold 49 new values. This is a simple example of rebinning.
Another example of rebinning is downsampling; throwing away every 5th value in a sorted list, for example.
I hope this concrete example helps. The key point to take away is that there are lots of ways of achieving a constant memory aging algorithm; only you can decide what is satisfactory given your requirements.
I've once published a blog post on this topic. The blog is now defunct but the article is included in full below.
The basic idea is to give up on an exact calculation in favor of a statement like "95 percent of responses take 500 ms-600 ms or less" (i.e. report a bucket range such as 500 ms-600 ms instead of an exact percentile value).
As we'd recently started feeling that the response times of one of our webapps had gotten worse, we decided to spend some time tweaking the app's performance. As a first step, we wanted to get a thorough understanding of current response times. For performance evaluations, using minimum, maximum or average response times is a bad idea: "The 'average' is the evil of performance optimization and often as helpful as 'average patient temperature in the hospital'" (MySQL Performance Blog). Instead, performance tuners should be looking at percentiles: "A percentile is the value of a variable below which a certain percent of observations fall" (Wikipedia). In other words: the 95th percentile is the time within which 95% of requests finished. Therefore, a performance goal related to a percentile could be something like "The 95th percentile should be lower than 800 ms". Setting such performance goals is one thing, but efficiently tracking them for a live system is another.
I've spent quite some time looking for existing implementations of percentile calculations (e.g. here or here). All of them required storing response times for each and every request and calculating the percentile on demand, or adding new response times in sorted order. This was not what I wanted. I was hoping for a solution that would allow memory- and CPU-efficient live statistics for hundreds of thousands of requests. Storing response times for hundreds of thousands of requests and calculating the percentile on demand sounds neither CPU nor memory efficient.
Such a solution as I was hoping for simply seems not to exist. On second thought, I came up with another idea: For the type of performance evaluation I was looking for, it’s not necessary to get the exact percentile. An approximate answer like “the 95th percentile is between 850ms and 900ms” would totally suffice. Lowering the requirements this way makes an implementation extremely easy, especially if upper and lower borders for the possible results are known. For example, I’m not interested in response times higher than several seconds – they are extremely bad anyway, regardless of being 10 seconds or 15 seconds.
So here is the idea behind the implementation:
Define an arbitrary number of response time buckets (e.g. 0-100 ms, 100-200 ms, 200-400 ms, 400-800 ms, 800-1200 ms, …)
Count the total number of responses and the number of responses in each bucket (for a response time of 360 ms, increment the counter for the 200 ms - 400 ms bucket)
Estimate the n-th percentile by summing bucket counters until the sum exceeds n percent of the total
It’s that simple. And here is the code.
Some highlights:
public void increment(final int millis) {
    final int i = index(millis);      // find the bucket for this response time
    if (i < _limits.length) {
        _counts[i]++;                 // times beyond the last bucket limit only show up in the total
    }
    _total++;
}

public int estimatePercentile(final double percentile) {
    if (percentile < 0.0 || percentile > 100.0) {
        throw new IllegalArgumentException("percentile must be between 0.0 and 100.0, was " + percentile);
    }
    for (final Percentile p : this) {
        if (percentile - p.getPercentage() <= 0.0001) {
            return p.getLimit();      // first bucket whose cumulative percentage covers the request
        }
    }
    return Integer.MAX_VALUE;         // the requested percentile lies beyond the last bucket limit
}
This approach only requires two int values (= 8 bytes) per bucket, allowing you to track 128 buckets with 1 KB of memory - more than sufficient for analysing the response times of a web application at a granularity of 50 ms. Additionally, for the sake of performance, I've intentionally implemented this without any synchronization (e.g. using AtomicIntegers), knowing that some increments might get lost.
By the way, using Google Charts and 60 percentile counters, I was able to create a nice graph out of one hour of collected response times.
I believe there are many good approximate algorithms for this problem. A good first-cut approach is to simply use a fixed-size array (say 1K worth of data). Fix some probability p. For each request, with probability p, write its response time into the array (replacing the oldest time in there). Since the array is a subsampling of the live stream and since subsampling preserves the distribution, doing the statistics on that array will give you an approximation of the statistics of the full, live stream.
This approach has several advantages: it requires no a-priori information, and it's easy to code. You can build it quickly and experimentally determine, for your particular server, at what point growing the buffer has only a negligible effect on the answer. That is the point where the approximation is sufficiently precise.
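A minimal sketch of that idea (in Java; the buffer size, the probability p and all names are arbitrary choices for illustration):

import java.util.Arrays;
import java.util.Random;

// Fixed-memory subsample of the live stream: the buffer is filled once, then each
// new response time replaces the oldest stored one with probability p.
class SampledPercentiles {
    private final double[] buffer;
    private final double p;
    private final Random rng = new Random();
    private int oldest = 0;       // ring-buffer index of the oldest slot
    private int filled = 0;

    SampledPercentiles(int size, double p) {
        this.buffer = new double[size];
        this.p = p;
    }

    void record(double responseTimeMs) {
        if (filled < buffer.length) {               // fill the buffer first
            buffer[filled++] = responseTimeMs;
        } else if (rng.nextDouble() < p) {          // then overwrite the oldest with probability p
            buffer[oldest] = responseTimeMs;
            oldest = (oldest + 1) % buffer.length;
        }
    }

    double percentile(double pct) {                 // e.g. 90.0; assumes at least one recorded value
        double[] copy = Arrays.copyOf(buffer, filled);
        Arrays.sort(copy);
        int idx = Math.min(filled - 1, (int) (pct / 100.0 * filled));
        return copy[idx];
    }
}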
If you find that you need too much memory to give you statistics that are precise enough, then you'll have to dig further. Good keywords are: "stream computing", "stream statistics", and of course "percentiles". You can also try "ire and curses"'s approach.
(It's been quite some time since this question was asked, but I'd like to point out a few related research papers)
There has been a significant amount of research on approximate percentiles of data streams in the past few years. A few interesting papers with full algorithm definitions:
A fast algorithm for approximate quantiles in high speed data streams
Space- and time-efficient deterministic algorithms for biased quantiles over data streams
Effective computation of biased quantiles over data streams
All of these papers propose algorithms with sub-linear space complexity for the computation of approximate percentiles over a data stream.
Try the simple algorithm defined in the paper “Sequential Procedure for Simultaneous Estimation of Several Percentiles” (Raatikainen). It’s fast, requires 2*m+3 markers (for m percentiles) and tends to an accurate approximation quickly.
Use a dynamic array T[] of large integers or something where T[n] counts the number of times the response time was n milliseconds. If you really are doing statistics on a server application then possibly 250 ms response times are your absolute limit anyway. So your 1 KB holds one 32-bit integer for every ms between 0 and 250, and you have some room to spare for an overflow bin.
If you want something with more bins, go with 8-bit numbers for 1000 bins, and the moment a counter would overflow (i.e. the 256th request at that response time) you shift the bits in all bins down by 1 (effectively halving the value in every bin). This means you disregard all bins that capture less than 1/127th of the delays that the most visited bin catches.
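A minimal sketch of the 1000-bin variant with the halving trick (in Java; all names are illustrative):

// bins[i] counts responses that took i ms; when any 8-bit counter would overflow,
// every bin is halved so the relative proportions (and hence the percentiles) are kept.
class MillisHistogram {
    private final short[] bins = new short[1000];   // 8-bit semantics, stored in shorts

    void record(int millis) {
        int i = Math.min(millis, bins.length - 1);  // last bin doubles as the overflow bin
        if (bins[i] == 255) {                       // an 8-bit counter would overflow
            for (int j = 0; j < bins.length; j++) bins[j] >>= 1;
        }
        bins[i]++;
    }

    // Approximate n-th percentile: the smallest bin index where the cumulative count crosses n%.
    int percentile(double n) {
        long kept = 0;
        for (short b : bins) kept += b;
        long threshold = (long) Math.ceil(kept * n / 100.0);
        long sum = 0;
        for (int i = 0; i < bins.length; i++) {
            sum += bins[i];
            if (sum >= threshold) return i;
        }
        return bins.length - 1;
    }
}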
If you really, really need a set of specific bins I'd suggest using the first day of requests to come up with a reasonable fixed set of bins. Anything dynamic would be quite dangerous in a live, performance-sensitive application. If you choose that path you'd better know what you're doing, or one day you're going to get called out of bed to explain why your statistics tracker is suddenly eating 90% CPU and 75% memory on the production server.
As for additional statistics: for mean and variance there are some nice recursive algorithms that take up very little memory. These two statistics can be useful enough in themselves for a lot of distributions, because the central limit theorem states that distributions that arise from a sufficiently large number of independent variables approach the normal distribution (which is fully defined by mean and variance). You can use one of the normality tests on the last N values (where N is sufficiently large but constrained by your memory requirements) to monitor whether the assumption of normality still holds.
#thkala started off with some literature citations. Let me extend that.
Implementations
T-digest from the 2019 Dunning paper has a reference implementation in Java, and ports (linked on that page) to Python, Go, JavaScript, C++, Scala, C, Clojure, C# and Kotlin, plus a C++ port by Facebook and a further Rust port of that C++ port
Spark implements "GK01" from the 2001 Greenwald/Khanna paper for approximate quantiles
Beam: org.apache.beam.sdk.transforms.ApproximateQuantiles has approximate quantiles
Java: Guava:com.google.common.math.Quantiles implements exact quantiles, thus taking more memory
Rust: quantiles crate has implementations for the 2001 GK algorithm "GK01", and the 2005 CKMS algorithm. (caution: I found the CKMS implementation slow - issue)
C++: boost quantiles has some code, but I didn't understand it.
I did some profiling of the options in Rust [link] for up to 100M items, and found GK01 the best, T-digest the second, and "keep 1% top values in priority queue" the third.
Literature
2001: Space-efficient online computation of quantile summaries (by Greenwald, Khanna). Implemented in Rust: quantiles::greenwald_khanna.
2004: Medians and beyond: new aggregation techniques for sensor networks (by Shrivastava, Buragohain, Agrawal, Suri). Introduces "q-digests", used for fixed-universe data.
2005: Effective computation of biased quantiles over data streams (by Cormode, Korn, Muthukrishnan, Srivastava)... Implemented in Rust: quantiles::ckms, which notes that the IEEE presentation is correct but the self-published one has flaws. With carefully crafted data, space can grow linearly with input size. ("Biased" means it focuses on P90/P95/P99 rather than all the percentiles.)
2006: Space- and time-efficient deterministic algorithms for biased quantiles over data streams (by Cormode, Korn, Muthukrishnan, Srivastava)... improved space bound over the 2005 paper
2007: A fast algorithm for approximate quantiles in high speed data streams (by Zhang, Wang). Claims 60-300x speedup over GK. The 2020 literature review below says this has state-of-the-art space upper bound.
2019: Computing extremely accurate quantiles using t-digests (by Dunning, Ertl). Introduces t-digests: O(log n) space, O(1) updates, O(1) final calculation. Its neat feature is that you can build partial digests (e.g. one per day) and merge them into months, then merge months into years. This is what the big query engines use.
2020: A survey of approximate quantile computation on large-scale data (technical report) (by Chen, Zhang).
2021: The t-digest: Efficient estimates of distributions - an approachable wrap-up paper about t-digests.
Cheap hack for P99 of <10M values: just store top 1% in a priority queue!
This'll sound stupid, but when I wanted to calculate the P99 of 10M float64s, I just created a priority queue holding 100k float32s (which takes 400 kB). This takes only 4x as much space as "GK01" and is much faster. For 5M or fewer items, it takes less space than GK01!
struct TopValues {
    // min-heap (via Reverse) holding the largest ~1% of the values seen so far
    values: std::collections::BinaryHeap<std::cmp::Reverse<ordered_float::NotNan<f32>>>,
}

impl TopValues {
    fn new(count: usize) -> Self {
        let capacity = std::cmp::max(count / 100, 1); // room for the top 1%
        let values = std::collections::BinaryHeap::with_capacity(capacity);
        TopValues { values }
    }

    fn render(&mut self) -> String {
        // the smallest stored value is the P99 estimate, the largest is the overall max
        let p99 = self.values.peek().unwrap().0;
        let max = self.values.drain().min().unwrap().0;
        format!("TopValues, p99={:.4}, max={:.4}", p99, max)
    }

    fn insert(&mut self, value: f64) {
        let value = value as f32;
        let value = std::cmp::Reverse(unsafe { ordered_float::NotNan::new_unchecked(value) });
        if self.values.len() < self.values.capacity() {
            self.values.push(value);
        } else if self.values.peek().unwrap().0 < value.0 {
            // the new value beats the current P99 threshold: evict the smallest, keep the new one
            self.values.pop();
            self.values.push(value);
        }
        // otherwise the value is below the threshold and is simply ignored
    }
}
You can try the following structure:
Take n as an input parameter, e.g. n = 100.
We'll keep an array of ranges [min, max], sorted by min, each with a count.
Insertion of a value x: binary search for the range whose min matches x. If none is found, take the preceding range (the one with min < x). If the value belongs to that range (x <= max), increment its count. Otherwise insert a new range [min = x, max = x, count = 1].
If the number of ranges hits 2*n, collapse/merge the array down to n by merging adjacent pairs: take the min from the first entry of each pair and the max from the second, and sum their counts.
To get e.g. p95, walk the ranges summing counts until the running sum crosses 95% of the total; estimate p95 = min + (max - min) * partial, where partial is how far into that range's count the 95% mark falls.
It will settle on dynamic ranges of measurements. n can be modified to trade accuracy for memory (and to a lesser extent CPU). If you make the values more discrete, e.g. by rounding to 0.01 before insertion, it'll stabilise on a set of ranges sooner.
You could improve accuracy by not assuming that each range holds uniformly distributed entries; something cheap like keeping a sum of values per range (which gives you avg = sum / count) would help you read a closer p95 value from the range where it sits.
You can also rotate the arrays, e.g. after m = 1,000,000 entries start filling a new array, and take p95 as a count-weighted combination of the arrays (if array B has 10% of the count of A, then it contributes 10% of the weight to the p95 value).
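A minimal sketch of that structure (in Java; the class and method names, the linear scan in place of the binary search, and the interpolation details are all illustrative):

import java.util.ArrayList;
import java.util.List;

// Self-adjusting range histogram: at most 2*n ranges [min, max] with counts,
// merged down to n ranges in adjacent pairs whenever the limit is hit.
class RangeHistogram {
    private static class Range {
        double min, max; long count;
        Range(double min, double max, long count) { this.min = min; this.max = max; this.count = count; }
    }

    private final int n;
    private final List<Range> ranges = new ArrayList<>();   // kept sorted by min

    RangeHistogram(int n) { this.n = n; }

    void insert(double x) {
        int i = 0;
        while (i < ranges.size() && ranges.get(i).min <= x) i++;   // find insertion point
        Range prev = (i > 0) ? ranges.get(i - 1) : null;
        if (prev != null && x <= prev.max) {
            prev.count++;                                          // x falls inside an existing range
        } else {
            ranges.add(i, new Range(x, x, 1));                     // start a new single-value range
        }
        if (ranges.size() >= 2 * n) merge();
    }

    private void merge() {                                         // 2n ranges -> n ranges
        List<Range> merged = new ArrayList<>();
        for (int i = 0; i + 1 < ranges.size(); i += 2) {
            Range a = ranges.get(i), b = ranges.get(i + 1);
            merged.add(new Range(a.min, b.max, a.count + b.count));
        }
        if (ranges.size() % 2 == 1) merged.add(ranges.get(ranges.size() - 1));
        ranges.clear();
        ranges.addAll(merged);
    }

    double percentile(double pct) {                                // assumes uniform values within a range
        long total = 0;
        for (Range r : ranges) total += r.count;
        double threshold = total * pct / 100.0;
        double sum = 0;
        for (Range r : ranges) {
            if (sum + r.count >= threshold) {
                double partial = (threshold - sum) / r.count;
                return r.min + (r.max - r.min) * partial;
            }
            sum += r.count;
        }
        return ranges.isEmpty() ? Double.NaN : ranges.get(ranges.size() - 1).max;
    }
}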

Resources