amCharts 4: How to handle real gaps? - amcharts

I have date-based data with real gaps in it: not only is the value missing, the date is missing too. In amCharts 3, with graph.connect = false, a date-based chart produced gaps.
In amCharts 4, series.connect = false only works if the gap contains at least one data point that has a date but no value.
Is it somehow possible to produce gaps when the whole data point is missing?
The demos with gaps always have at least one data point for the gap.

To continue with David Liang's answer, you have to set connect = false and use autoGapCount. But be careful, as it won't work with XYCharts!
Taken from amcharts:
The feature (available since version 4.2.4) responsible for that is called autoGapCount, and it works like this: if the distance between two adjacent data points is bigger than baseInterval * autoGapCount and connect = false the line breaks.
The default for this setting is 1.1 which means that if the distance is at least 10% greater than the granularity of data, we consider it a gap.
Looking at the sample data above, we have daily granularity: distance between each data point is one day (24 hours). Now, since April 22nd is missing the distance between 21st and 23rd becomes two days (48 hours) which satisfies "gap" criteria (24 hours * 1.1 = 26.4 hours). Boom - a gap.
This allows fine-grained control of what to consider a gap in the data. Say we want a line to break only if there's a distance of three days between data points: we set autoGapCount = 3. Now, if there's a distance of one or two days between two data points, they'll be connected, even if connect = false.
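The gap rule quoted above is plain arithmetic, so it can be checked outside the chart. The following is a minimal Python sketch of that rule, not amCharts code: the function name `find_gaps`, the year 2019, and the sample dates are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(dates, base_interval, auto_gap_count=1.1):
    """Return (previous, next) pairs whose distance exceeds
    base_interval * auto_gap_count, i.e. where the line would break."""
    threshold = base_interval * auto_gap_count
    return [(a, b) for a, b in zip(dates, dates[1:]) if b - a > threshold]

# Daily data with April 22nd missing (the year is an assumption).
dates = [datetime(2019, 4, d) for d in (20, 21, 23, 24)]
one_day = timedelta(days=1)

gaps_default = find_gaps(dates, one_day)     # threshold: 26.4 hours
gaps_three = find_gaps(dates, one_day, 3)    # threshold: 72 hours
```

With the default of 1.1, the 48-hour distance between the 21st and the 23rd is reported as a gap. With a count of 3, the same distance stays below the threshold and the points would remain connected.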

Related

Transposing two columns into rows of 4 each with a date sense checker - Google Sheets

I don't quite know how to word this.
I have a set of data that comes in two columns (essentially, containers in storage or containers arriving soon).
I then take the two sets of data and plug them into a "production schedule" using the formula below, but only after I have combined the unique values of both columns into one workable column (seen in column "O", or "Combined"). The formula, which plots them into 4 workable slots per day, is as follows:
=ArrayFormula(split(transpose(split(regexreplace(query(array_constrain(if(mod(row(O5:O), 4) = 2, "^", "|") & O5:O,max(row(O5:O) * (O5:O<> "")),1),,9^99), "^^",), "^", true, false)), "|", true, false))
For a reason I have yet to work out, the above formula leaves a couple of blank spaces within the first couple of days instead of filling them with potential containers.
The issue is that the formula now plots each day on which I can work the containers, but it does not sense-check whether the containers will actually have arrived in time to be worked on the day the formula assigns them.
MY QUESTION:
How do I add to the above formula so that it also checks that the date a container is automatically plotted for "production" satisfies a 2-day logic test against the container's arrival? I.e., if the container is to arrive on 10th September, it can logically only be worked from 12th September, not on or before 10th September. If this check says the container can only be worked later, it should be plotted from the first day it can be worked, and each container after it must then be placed in a subsequent "production slot".
It also needs to take into account what is in storage first before plotting, but I think this is resolved by the unique formula I am using to adjust the plotting.
The sample sheet I am working off of:
https://docs.google.com/spreadsheets/d/1_niDFQtTcL9c3yROeh4qHmD_irBGbF066Pz2jEaQ674/edit?usp=sharing
I hope you can assist me here; this is a mammoth formula I am trying to build to make a manual job substantially easier going forward.
Thanks in advance.

PowerBI - Displaying the average of row figures in a matrix

I've been Googling around this problem for hours and haven't found a solution that suits my needs.
I have a large data set with agent activities and the total time in seconds each activity lasts. I'm pulling this together in a matrix, to display agent names on the left and the start date of each week across the top like so:
This is working as intended (I've used a measure to convert the seconds into hours) but I need the average of the displayed weeks as another column, or to replace the Total column.
I've tried solutions involving DAX measures, but none are applicable, likely because I'm using a custom column (WeekStart) to roll up my numbers into weeks. Adding more complexity, I have 2 filters on the matrix: one to exclude any weeks older than 5 weeks in the past and another to exclude any future weeks.
In Excel I'd just add another column next to the table, averaging the 5 cells to the left of it. I could add it to the data table with a SUMIFS checking the Activity date is within the week range and dividing the result by 5. I can't do either of these in PowerBI and I'm new to the software so I'm at a loss as to how to do this.

Statistics/Algorithm: How do I compare a weekly graph with its own history to see when in the past it was almost the same?

I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on a research project where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of this as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date        Result
08/10/18    52.5
08/07/18    60.2
08/06/18    58.5
08/05/18    55.4
08/04/18    55.2
and so on...
What I really want is the output to be a form of correlation between the current data points with the other set of 5 concurrent data points in history. So, something like:
Date range           Correlation
07/10/18-07/15/18    0.98
We’ll be getting code written in Python so that the software does this automatically (as new data is added, it runs and finds the closest set of numbers to match the current one).
Here’s where the difficulty sets in: since the numbers are on a general upward trend over time, we don’t want to compare absolute values (since the numbers might never really match). One suggestion has been to compare the delta (rate of change as a percentage over the previous day), or to use a log scale.
I’m wondering: how do I go about this? What kind of calculation can I use to get the desired results? I’ve looked at the different kinds of correlation equations, but they don’t account for the “shape” of the data, and they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide the data of each week by their average (i.e., normalize them to an average of 1), then sum the squares of the differences of each day of each pair of weeks. This sum is what you want to minimize.
If you don't care about how much a graph oscillates relative to its mean, you can also normalize the variance. For each week, calculate the mean and variance, then subtract the mean and divide by the square root of the variance (the standard deviation). Each week will then have mean 0 and variance 1. Then minimize the sum of squares of differences as before.
If normalizing the data is all you can change in your workflow, just leave out the step that minimizes the sum of squared differences.
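The approach in this answer can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a finished tool: the function names, the sliding one-step window over the history, and the sample numbers are all made up for the example, and a completely flat window would need special handling when the variance is normalized.

```python
from statistics import mean, pstdev

def normalize(window, scale_variance=False):
    """Scale a window to mean 1, or to mean 0 and variance 1."""
    m = mean(window)
    if scale_variance:
        s = pstdev(window)  # zero for a flat window; not handled here
        return [(x - m) / s for x in window]
    return [x / m for x in window]

def shape_distance(a, b, scale_variance=False):
    """Sum of squared differences between two normalized windows."""
    na = normalize(a, scale_variance)
    nb = normalize(b, scale_variance)
    return sum((x - y) ** 2 for x, y in zip(na, nb))

def closest_match(history, current, scale_variance=False):
    """Return (start index, distance) of the best-matching window."""
    n = len(current)
    scores = [
        (shape_distance(history[i:i + n], current, scale_variance), i)
        for i in range(len(history) - n + 1)
    ]
    distance, index = min(scores)
    return index, distance

# Made-up data: the current week has the same shape as the window
# starting at index 5, only at twice the level.
history = [10, 11, 13, 12, 10, 20, 24, 22, 26, 24, 30, 29, 31, 28, 33]
current = [40, 48, 44, 52, 48]

best_index, best_distance = closest_match(history, current)
best_index_scaled, _ = closest_match(history, current, scale_variance=True)
```

The smaller the distance, the closer the shapes. Because each window is divided by its own mean, the general upward trend in the level of the numbers does not affect the comparison.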

How to create value over time line chart in Kibana 4?

I'm facing the following problem. In Kibana 4 I've created a line chart based on my input from Elasticsearch, but I can only display the average, min, or max instead of the actual value of the field over time, e.g. sent bytes.
Most answers to that question on Stack Overflow are about Kibana 3 (How to create value over time chart with Kibana 3?) and seem to involve a Histogram on the X axis, yet I can't find a way to apply them to Kibana 4. I was unable to find the histogram panel, and once I click on the Discover tab the "Searching" indicator keeps loading indefinitely.
If I have the following fields in my _source:
{"timestamp":"2015-06-02T10:16:44.0855","time":587,"threadName":"Thread Group 1-957","byte":1372,"status":"false","latence":306,"registerCall":"404"}
and I would like to have the number of bytes on the Y-axis and on the X-axis my timestamp.
Any help in the right direction will be appreciated :)
To create a value over time line chart in Kibana, follow these steps:
Go to the Visualize tab and select line chart.
For the X-axis, select X-Axis, set the aggregation to Date Histogram, and then select your timestamp field as the date field.
Next, for the Y-axis, select Sum as the aggregation and then bytes as the field.
For the X axis, what Alcanzar said is good, but as you notice, the Y axis is problematic.
Sum (suggested by "Limit") works, but since it's an aggregation, it shows the total for each aggregated bucket, which may be meaningless depending on what you are trying to show. Your question isn't clear on what you want, so I'm just guessing here. One hour of requests, each of which ran for one minute and sent 1 megabyte, is indeed 60 megabyte-minutes, if you are trying to show total capacity used over that hour (maybe you are paying a bill based on usage per time). On the other hand, if you are trying to show peak usage in each time period, it would be wrong.
You said you already looked at Max and Min and they don't meet your needs. I don't suppose Standard Deviation would be any better?
I have the same concern. The best I've been able to do so far is display Min and Max simultaneously in the Y axis. When they diverge, I know I'm zoomed out too far, so I zoom in until they align. This is how I know I'm seeing individual events.
In any case, I share your frustration. I too would like to be able to show time series as easily as I can in, say, Excel.
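The difference between the aggregations discussed in these answers can be shown with a small Python sketch. It is not Kibana or Elasticsearch code: the `bucket_stats` helper and the three sample events are assumptions made for illustration, and it only mimics what bucketing by time does to the values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def bucket_stats(events, bucket_seconds):
    """Group (timestamp, value) pairs into fixed-width time buckets
    and compute count, sum, min and max for each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        t = datetime.fromisoformat(timestamp)
        elapsed = int((t - EPOCH).total_seconds())
        start = elapsed // bucket_seconds * bucket_seconds
        buckets[EPOCH + timedelta(seconds=start)].append(value)
    return {
        start: {"count": len(v), "sum": sum(v), "min": min(v), "max": max(v)}
        for start, v in sorted(buckets.items())
    }

# Made-up events: (timestamp, bytes sent).
events = [
    ("2015-06-02T10:16:05", 1372),
    ("2015-06-02T10:16:44", 800),
    ("2015-06-02T10:17:10", 500),
]

per_minute = bucket_stats(events, 60)  # two events share the 10:16 bucket
per_second = bucket_stats(events, 1)   # every bucket holds one event
```

In the one-minute buckets, Sum, Min and Max give three different numbers for 10:16. In the one-second buckets each bucket holds a single event, so Min and Max coincide with the raw value, which is the effect the zoom-in trick relies on.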

How to handle recurring times?

First off, I marked this question as language agnostic, but I'm using PHP and MySQL. It shouldn't affect the question itself very much, though.
I'm creating an application which shows the times of certain shows throughout the week. Every single show is recurring (on a weekly basis), and there might be shows which span 2 days, e.g. starting on Sunday at 23:30 and ending on Monday at 00:30. I'm storing the start of the show (day of the week, e.g. Monday or Tuesday, never an exact date, plus the time) and its duration. No show ever lasts more than 24 hours.
My problem is validating that newly added shows don't overlap existing ones, especially when it comes to Sunday-Monday shows.
How are such recurring events usually handled on both DB side and server side?
tl;dr version with stuff I considered
My first idea was to create a custom validation algorithm, but it seemed too cumbersome and complicated. Not that I'd whine about complicated hand-made solutions, but I'm interested in whether there is something more basic that I'm missing.
The other alternative that came to mind was to change the table structure to use datetime (instead of "day of week" and "time") and use a fake fixed date range to store the data. For example, all Mondays would be set to 5th Jan 1970, and Sundays would use 11th Jan 1970. There would be one exception to this rule: if a show starts on Sunday and ends on Monday, the Monday would be stored as 12th Jan 1970. This solution would allow more flexible querying of the DB than the original one, and it would also simplify queries for shows which overlap between individual weeks (since we can do the comparison directly in the query). There are some disadvantages to this solution as well (for one, using fake dates might make it confusing).
Both solutions smell of wrong algorithms to me, and I would love to hear some opinions from more experienced fellow developers.
Sounds like you could just store the starting minute of each show as an integer number of minutes since the start of the week (10,080 possible values).
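Then a show starting at minute $a with duration $dur_a overlaps a show starting at minute $b if $b falls within it, that is, if
(10080 + $b - $a) % 10080 < $dur_a
For example, consider a show starting at 11pm Sunday and another starting at 12:30am Monday, counting minutes from Monday 00:00. Here $a == 10020 and $dur_a == 120 and $b == 30. (10080 + $b - $a) % 10080 == 90. This is less than $dur_a and hence the shows overlap. To also catch a show $b that starts before $a and is still running when $a begins, apply the same test with the two shows swapped.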
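The modular test from this answer can be sketched in Python as follows. The function names are made up for the example, and the two-directional wrapper `shows_overlap` is an addition: the formula as given only detects a show that starts while another is running, so the sketch applies it once in each direction. The duration of the second show (60 minutes) is also an assumption, since the answer does not state one.

```python
WEEK = 7 * 24 * 60  # 10080 minutes in a week

def starts_during(a_start, a_dur, b_start):
    """True if a show starting at b_start begins while the show
    (a_start, a_dur) is running, wrapping around the end of the week."""
    return (WEEK + b_start - a_start) % WEEK < a_dur

def shows_overlap(a_start, a_dur, b_start, b_dur):
    """True if the two weekly shows overlap, checked in both directions.
    Assumes starts are in [0, WEEK) and durations are shorter than a week."""
    return (starts_during(a_start, a_dur, b_start)
            or starts_during(b_start, b_dur, a_start))

# The example from the answer, counting minutes from Monday 00:00:
# Sunday 11pm for 120 minutes versus Monday 12:30am.
sunday_late = (10020, 120)
monday_early = (30, 60)   # the 60-minute duration is an assumption

example = starts_during(10020, 120, 30)
swapped = shows_overlap(*monday_early, *sunday_late)
back_to_back = shows_overlap(10020, 120, 60, 30)  # starts exactly as the first ends
```

Here `example` and `swapped` are both true, while `back_to_back` is false because the second show starts at the exact minute the first one ends.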
This problem could be simplified by converting the data into a format that is amenable to the calculations that are required. I recommend creating a type that represents the start times as the number of minutes from Sunday at midnight. Then simple integer range comparisons could be used to find overlapping shows.
The internal representation must, of course, be hidden and abstracted. You may, at some point, want to change the representation from minutes to seconds, for example.
I would opt for a custom validation algorithm:
For each show, compute all showing intervals [start1, end1], [start2, end2], ... [startN, endN], where N is the number of recurrences of the show.
For a new show, also compute these intervals.
Now check if any of these new intervals intersect any old intervals. This is the case if the start or the end of one interval is contained in the other.
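The interval-based validation above can be sketched in Python under a few assumptions: shows are stored as (start minute of the week, duration in minutes), each show recurs once per week, and a show that runs past the end of the week gets a second interval shifted back by one week so that it can be compared with early-week shows. The intersection test is the standard half-open interval check, used here in place of testing each endpoint separately. All names are hypothetical.

```python
WEEK_MINUTES = 7 * 24 * 60

def weekly_intervals(start, duration):
    """Intervals [start, end) covered by a weekly show within one week,
    plus a copy shifted back a week if the show wraps past the week's end."""
    intervals = [(start, start + duration)]
    if start + duration > WEEK_MINUTES:
        intervals.append((start - WEEK_MINUTES, start + duration - WEEK_MINUTES))
    return intervals

def intersects(a, b):
    """Half-open intervals intersect if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def conflicts(new_show, existing_shows):
    """Return the existing shows that the new show would overlap."""
    new_intervals = weekly_intervals(*new_show)
    return [
        old for old in existing_shows
        if any(intersects(n, o)
               for n in new_intervals
               for o in weekly_intervals(*old))
    ]

# Made-up schedule: a Sunday 23:00 show lasting two hours and a Monday 10:00 show.
existing = [(10020, 120), (600, 60)]

clash = conflicts((30, 60), existing)      # Monday 00:30 for an hour
no_clash = conflicts((60, 30), existing)   # Monday 01:00 for half an hour
```

Only the Sunday-to-Monday show is reported for the first candidate, and the second candidate fits without conflicts.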
