D3.js Time series graph with epoch time

I have data captured every 5 minutes; the data comes in pairs of a Unix epoch timestamp and a value. I would like to render a bar chart with this data. Can this be easily accomplished in D3.js, or is there a better tool for the job? Any examples? Ideally I'd like it to refresh as additional data points come in.

Maybe EpochJS is worth looking at? They have examples of different real-time charts on their website:
http://epochjs.github.io/epoch/
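If you'd rather stay with plain D3, something along these lines is a reasonable starting point. This is only a sketch: it assumes D3 v4+, a container div with id "chart", and a placeholder fetchLatestData() function standing in for however your new points actually arrive.

    // Minimal D3 (v4+) bar chart from [epochSeconds, value] pairs -- a sketch, not a drop-in solution.
    const width = 600, height = 200, margin = {top: 10, right: 10, bottom: 20, left: 30};

    const svg = d3.select('#chart')            // assumes a <div id="chart"> exists
      .append('svg')
      .attr('width', width)
      .attr('height', height);

    const x = d3.scaleTime().range([margin.left, width - margin.right]);
    const y = d3.scaleLinear().range([height - margin.bottom, margin.top]);

    function render(data) {                    // data: [[epochSeconds, value], ...]
      x.domain(d3.extent(data, d => new Date(d[0] * 1000)));  // epoch seconds -> JS Date (ms)
      y.domain([0, d3.max(data, d => d[1])]);

      const bars = svg.selectAll('rect').data(data, d => d[0]); // key by timestamp so updates are cheap
      bars.enter().append('rect')
        .merge(bars)
        .attr('x', d => x(new Date(d[0] * 1000)))
        .attr('width', 4)
        .attr('y', d => y(d[1]))
        .attr('height', d => y(0) - y(d[1]))
        .attr('fill', 'steelblue');
      bars.exit().remove();
    }

    // Re-render whenever new points arrive, e.g. by polling every 5 minutes.
    // fetchLatestData() is a placeholder for however you actually retrieve the data.
    setInterval(() => fetchLatestData().then(render), 5 * 60 * 1000);

Because the data is keyed by timestamp, the same render() call handles both the initial draw and each refresh.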

Related

Historical data on Google Maps Distance Matrix API

Can I get travel time data from the Google Maps Distance Matrix API for past dates, such as November 2017, using driving mode? And if it is possible, how can I do it?
Thank you so much.
I've been trying to, but it looks like there is a time threshold beyond which you just get a ZERO_RESULTS response.
I tried some dates in 2017, and even two weeks back, with no luck.
You can try it yourself: find the date you want in epoch time [1] and fill in this request with that time and your API key.
https://maps.googleapis.com/maps/api/distancematrix/json?origins=San%20Francisco,%20CA&destinations=San%20Jose,%20CA&mode=transit&departure_time=(epoch_time)&key=(your_api_key)
[1]https://www.epochconverter.com/
I have been testing, and I believe the limit for historical data is one week, and only for transit mode.
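For reference, the request above can also be assembled programmatically. This is only a sketch of the URL shape: the date is illustrative, the API key is a placeholder, and the fetch() call is just for demonstration (the web service is normally called server-side).

    // Build the Distance Matrix request from the answer above with an epoch departure_time.
    const departureTime = Math.floor(new Date('2017-11-01T08:00:00Z').getTime() / 1000); // unix seconds

    const url = 'https://maps.googleapis.com/maps/api/distancematrix/json'
      + '?origins=' + encodeURIComponent('San Francisco, CA')
      + '&destinations=' + encodeURIComponent('San Jose, CA')
      + '&mode=transit'
      + '&departure_time=' + departureTime
      + '&key=YOUR_API_KEY';                  // replace with your own key

    fetch(url)
      .then(res => res.json())
      .then(body => console.log(body.rows));  // for dates too far back, expect ZERO_RESULTS elements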

Reading Jennifer5 Monitor

I am using Jennifer5 to monitor my web services, but I am confused about the information on the monitor. I have attached an image; the circled part of the graphs shows future times for the current day, yet it already contains some data. Is this data an average of past data, or is some algorithm applied to past data to predict possible future values? I cannot tell exactly what those values are.
It was the previous day's data, as described for one of the charts in the Jennifer5 manual.

K-Means on time series data with Apache Spark

I have a data pipeline system where all events are stored in Apache Kafka. There is an event processing layer, which consumes and transforms that data (time series) and then stores the resulting data set into Apache Cassandra.
Now I want to use Apache Spark in order to train some machine learning models for anomaly detection. The idea is to run the k-means algorithm on past data, for example for every single hour of the day.
For example, I can select all events from 4pm-5pm and build a model for that interval. If I apply this approach, I will get exactly 24 models (centroids for every single hour).
If the algorithm performs well, I can reduce the size of my interval to be for example 5 minutes.
Is it a good approach to do anomaly detection on time series data?
I have to say that this strategy is good for finding outliers, but you need to take care of a few steps. First, using all the events from every 5-minute window to create a new centroid is probably not a good idea,
because with too many centroids it becomes really hard to find the outliers, and that is exactly what you don't want.
So let's see a good strategy:
Find a good value of k for your k-means.
This is really important: with too many or too few clusters you get a bad representation of reality. So select a good k.
Take a good training set.
You don't need to use all the data to create a model every time, every day. Take a sample of what is normal for you; leave out what is not normal, because that is exactly what you want to detect. Use this sample to create your model and then find the clusters.
Test it!
You need to test whether it is working well. Do you have examples of events you consider strange, and a set that you know is not strange? Use them to check whether the model works; cross-validation can help with this.
So, is your idea good? Yes, it works, but make sure you don't overwork the cluster. You can of course keep using each day's data to train your model further, but run the process of finding the centroids only once a day, and let the Euclidean distance to those centroids decide what does or does not belong in your groups.
I hope that helps!
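The training itself would happen in Spark MLlib, but the scoring idea above (one centroid set per hour, Euclidean distance as the anomaly signal) can be sketched conceptually. This plain JavaScript sketch is only an illustration: the threshold, the feature shape, and the centroid values are made up.

    // Conceptual sketch (not Spark code): score a new event against the centroids trained for its hour.
    // centroidsByHour[h] is the centroid list produced by k-means for hour h; the threshold is an assumption.
    function euclidean(a, b) {
      return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
    }

    function isAnomaly(event, centroidsByHour, threshold) {
      const hour = new Date(event.timestamp * 1000).getUTCHours(); // event.timestamp in epoch seconds
      const centroids = centroidsByHour[hour];                     // 24 models, one per hour
      const nearest = Math.min(...centroids.map(c => euclidean(event.features, c)));
      return nearest > threshold;  // far from every "normal" centroid for this hour => flag it
    }

    // Example: one 2-dimensional centroid set per hour; the threshold would come from a validation set.
    const centroidsByHour = Array.from({length: 24}, () => [[0.2, 0.5], [0.8, 0.1]]);
    console.log(isAnomaly({timestamp: 1510129800, features: [5.0, 4.0]}, centroidsByHour, 1.0)); // true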

Predicting Data Set

In Mathematica I have 10 data sets. I am trying to figure out how to predict future outcomes based on this data. The data roughly follows a normal distribution. I want to predict what the average curve looks like based on the data I have. Is there any way to do this?
See the documentation for getting results from fitted models:
https://reference.wolfram.com/language/howto/GetResultsForFittedModels.html
or, for a time series, TimeSeriesModelFit:
https://reference.wolfram.com/language/ref/TimeSeriesModelFit.html

d3.js not being able to visualize a large dataset

I need some suggestions on using d3.js for visualizing big data. I am pulling data from HBase and storing it in a JSON file for visualization with d3.js. When I pull a few hours of data, the JSON file is around 100 MB and can easily be visualized by d3.js, although filtering with dc.js and crossfilter is a little slow. But when I pull one week of data, the JSON file grows to more than 1 GB; the visualization no longer works properly and filtering is not possible. Can anyone tell me whether there is a good solution to this, or do I need to work on a different platform instead of d3?
I definitely agree with what both Mark and Gordon have said before. But I must add what I have learnt in the past months as I scaled up a dc.js dashboard to deal with pretty big datasets.
One bottleneck is, as pointed out, the size of your datasets when it translates into thousands of SVG/DOM or Canvas elements. Canvas is lighter on the browser, but you still have a huge amount of elements in memory, each with their attributes, click events, etc.
The second bottleneck is the complexity of your data. The responsiveness of dc.js depends not only on d3.js, but also on crossfilter.js. If you inspect the Crossfilter example dashboard, you will see that the size of the data they use is quite impressive: over 230,000 entries. However, the complexity of those data is rather low: just five variables per entry. Keeping your datasets simple helps a lot with scaling up. Keep in mind that five variables per entry here means about one million values in the browser's memory during visualization.
Final point: you mention that you pull the data in JSON format. While that is very handy in JavaScript, parsing and validating big JSON files is quite demanding. Besides, it is not the most compact format. The Crossfilter example data are formatted as a really simple and tight CSV file.
In summary, you will have to find the sweet spot between size and complexity of your data. One million data values (size times complexity) is perfectly feasible. Increase that by one order of magnitude and your application might still be usable.
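To make the CSV point concrete, here is a small sketch of loading a compact CSV with d3 (v4 callback style) and coercing types before handing the rows to crossfilter. The file name events.csv and the column names time and value are assumptions about your data.

    // Load a compact CSV instead of a big JSON blob and coerce types before building the crossfilter.
    d3.csv('events.csv', function(error, rows) {  // d3 v4 callback style; v5+ returns a Promise instead
      if (error) throw error;
      rows.forEach(d => {
        d.time = new Date(+d.time * 1000);        // epoch seconds -> Date
        d.value = +d.value;                       // string -> number
      });
      const cf = crossfilter(rows);
      // ... define dimensions/groups and wire up the dc.js charts here
    });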
As @Mark says, canvas versus DOM rendering is one thing to consider. For sure, the biggest expense in web visualization is DOM elements.
However, to some extent crossfilter can mitigate this by aggregating the data into a smaller number of visual elements. It can get you up into the hundreds of thousands of rows of data. 1GB might be pushing it, but 100s of megabytes is possible.
But you do need to be aware of what level you are aggregating at. So, for example, if it's a week of time series data, probably bucketing by the hour is a reasonable visualization, for 7*24 = 168 points. You won't actually be able to perceive many more points, so it is pointless asking the browser to draw thousands of elements.
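As an illustration of that hour-bucketing, a minimal crossfilter/dc.js sketch might look like the following; the chart element id, the field names, and the reduceSum aggregation are assumptions about your data and charts.

    // Hour-bucketing as described above: a week of raw rows collapses to 7*24 = 168 bars.
    // Assumes each row has a `time` Date field and a numeric `value` field.
    const cf = crossfilter(rows);
    const hourDim = cf.dimension(d => d3.timeHour(d.time));   // floor each timestamp to the hour
    const valueByHour = hourDim.group().reduceSum(d => d.value);

    dc.barChart('#hourly-chart')                              // assumes a <div id="hourly-chart">
      .dimension(hourDim)
      .group(valueByHour)
      .x(d3.scaleTime().domain(d3.extent(rows, d => d.time)))
      .xUnits(d3.timeHours);

    dc.renderAll();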
