I have data being written to Elasticsearch that I wanted to visualize in Kibana, but I'm having problems with the visualization.
I have a process that writes a record when it starts ({ProcessStartTime}) and another when it stops ({ProcessStopTime}).
I'm trying to create what I thought was a simple visualization:
A vertical bar chart with Count as the Y-Axis and {ProcessStartTime} and {ProcessStopTime} as bars on the X-Axis.
The problem is that, instead of showing a count of 480 for {ProcessStartTime} as one vertical bar and a count of 389 for {ProcessStopTime} as another, it separates out every unique {ProcessStartTime} entry, so I get a thousand vertical bars with a count of 1 each. Moreover, it appears I cannot add more than one term, only sub-categories, so {ProcessStopTime} isn't on the bar chart at all. So I decided to try the Filter aggregation, which let me get a count of all entries with "ProcessStartTime" in the body. However, I cannot add "ProcessStopTime" as another filter, as the two don't coexist.
My current solution is to have two charts, each using the Filter aggregation, and to put them side by side to compare the counts. For obvious reasons, I'd like those combined, but I just don't see how to have two X-Axis buckets, or how to group the data as it needs to be.
Am I missing something obvious?
I may be misunderstanding what you are trying to do, and I can't comment on your question to ask for details, but here are a few things you can try:
Get all entries regardless of their content (empty search query). Keep the Y-axis metrics for Aggregation-Count.
After that you can set a bucket for the X-axis with Filters aggregation, and use 2 filters.
Filter 1: ProcessStartTime: *
Filter 2: ProcessStopTime: *
This setup should give you 2 bars with the count of records that have the given attributes.
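For reference, the two-filter setup above corresponds roughly to an Elasticsearch filters aggregation. Here is a minimal sketch of such a request body, built in JavaScript. The helper name and the bucket labels (process_events, started, stopped) are made up for illustration, and the exists queries assume the field is simply absent on documents that don't carry it:

```javascript
// Hypothetical helper: builds a search body that counts documents per filter.
// Bucket names are illustrative; only the field names come from the question.
function buildStartStopCountQuery() {
  return {
    size: 0, // only the aggregation buckets are needed, not the hits
    aggs: {
      process_events: {
        filters: {
          filters: {
            started: { exists: { field: 'ProcessStartTime' } },
            stopped: { exists: { field: 'ProcessStopTime' } }
          }
        }
      }
    }
  };
}

console.log(JSON.stringify(buildStartStopCountQuery(), null, 2));
```

Each named filter becomes one bucket with its own doc count, which is what the two bars show.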
The other option is to make a new attribute, for example 'event', and give this attribute the values 'ProcessStartTime' and 'ProcessStopTime', and make a Terms aggregation bucket setup on event.keyword.
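As a rough sketch of that second option, assuming you can touch the documents before they are indexed: the function name below is hypothetical, and it relies on the two fields never appearing in the same document, as described in the question.

```javascript
// Hypothetical pre-indexing step: tag each document with an `event` value
// derived from which timestamp field it carries.
function withEventField(doc) {
  if (doc.ProcessStartTime !== undefined) {
    return Object.assign({}, doc, { event: 'ProcessStartTime' });
  }
  if (doc.ProcessStopTime !== undefined) {
    return Object.assign({}, doc, { event: 'ProcessStopTime' });
  }
  return doc; // neither field present: leave the document untouched
}

console.log(withEventField({ ProcessStartTime: '2020-01-01T10:00:00Z' }));
```

A Terms aggregation on event.keyword then produces one bucket per event value.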
I hope this helps.
dc.js has been great, and now I'm trying to understand how to use it for data with multiple dimensions.
I have time series data (csv), which contains the number of people that fit a certain attribute on a given day - e.g. the number of brown-haired people age 65+. A simplified version of it looks like this (There are 5 options for hair color, 5 for age group, and about 200 dates):
Date, Hair Color, 0-18, 19-39, 40-64, 65+
1/1/21, Brown, 5, 3, 10, 2
1/1/21, Blonde, 15, 2, 4, 1
1/2/21, Brown, 2, 8, 0, 2
1/2/21, Blonde, 11, 6, 7, 4
...
I'd like to be able to plot the cumulative counts over time for each sub-population. The complication is that I'd like to show:
A plot aggregated by hair color (so summing over all age groups), which can then be toggled (ideally by clicking on one of the lines) to show:
A plot for a given hair color, disaggregated by age group.
(Note that in the mockups, I'm normalizing counts to show it as a cumulative percentage. I've been doing that calculation straightforwardly with valueAccessors.)
My question is: how do I create the dimensions and groups to create these plots?
I'd prefer not to create individual variables for each age group (I'd like it to be generic enough to expand to finer categories). But I'm having trouble understanding how to use reduce and filters to achieve my desired outcome.
Also, should I be doing it all as linecharts in a compositeChart, or in a series chart? There is the added wrinkle that I plan to then annotate the chart with extra trendlines added in from d3.
Thanks!
The series chart is a convenience class that generates a composite chart underneath.
It allows you to specify your data using a 2D key, where one component is the key to be used for the X values in the chart, and one component is another key to be used for splitting the data into multiple layers - lines, in your case. You also give it the "prototype" of the layer chart, in the form of a function that returns a partially-initialized chart.
It sounds like you are on the right track, so I won't attempt to give a complete answer, just a few hints. Please feel free to follow up in the comments, and I will edit this answer to fill in details.
Flattening the data
You will probably want to flatten your data so that there is only one value per row, i.e. structure it with an Age column and a Value column. This is a general best practice for working with crossfilter.
It's possible to work with the data as you have it, but
you won't be able to filter by age, since filtering in crossfilter is by row
aggregating across ages will be more complicated, requiring custom reductions
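A minimal sketch of the flattening step, assuming the rows have already been parsed (e.g. by d3.csv) into objects keyed by the column names in your sample. The function name is made up, and Age and Value are just the column names suggested above:

```javascript
// Turn each wide row (one column per age group) into one row per age group.
// 'Date' and 'Hair Color' follow the sample CSV; every other column is
// assumed to be an age group.
function flattenRows(rows) {
  const idColumns = ['Date', 'Hair Color'];
  const flat = [];
  rows.forEach(row => {
    Object.keys(row)
      .filter(col => !idColumns.includes(col))
      .forEach(age => {
        flat.push({
          Date: row.Date,
          'Hair Color': row['Hair Color'],
          Age: age,
          Value: +row[age] // CSV values arrive as strings
        });
      });
  });
  return flat;
}

const wide = [
  { Date: '1/1/21', 'Hair Color': 'Brown', '0-18': '5', '19-39': '3', '40-64': '10', '65+': '2' }
];
console.log(flattenRows(wide));
```

Because the age groups are read from the column names, finer categories only require more columns in the CSV, not more code.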
Using multikeys and series chart
Following the series chart example, you might define your dimension as
const colorDateDimension = cf.dimension(d => [d['Hair Color'], d.Date]);
Now any group on this dimension will aggregate by both hair color and date.
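To make concrete what such a group would contain, here is a plain-JavaScript stand-in that does not use crossfilter at all. It assumes flattened rows with a numeric Value column and returns key/value pairs in the same shape that group.all() produces; the function name is made up:

```javascript
// Plain-JavaScript stand-in for a sum group on a [hair color, date] key.
// Not crossfilter code: it only illustrates the shape of the aggregated data.
function groupByColorAndDate(rows) {
  const totals = new Map();
  rows.forEach(d => {
    const key = [d['Hair Color'], d.Date];
    const id = JSON.stringify(key); // arrays need a string form to act as Map keys
    const entry = totals.get(id) || { key: key, value: 0 };
    entry.value += d.Value;
    totals.set(id, entry);
  });
  return Array.from(totals.values());
}

console.log(groupByColorAndDate([
  { Date: '1/1/21', 'Hair Color': 'Brown', Age: '0-18', Value: 5 },
  { Date: '1/1/21', 'Hair Color': 'Brown', Age: '19-39', Value: 3 },
  { Date: '1/1/21', 'Hair Color': 'Blonde', Age: '0-18', Value: 15 }
]));
```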
Now if you're using the series chart, you can extract the components with
chart
    .seriesAccessor(({key}) => key[0])
    .keyAccessor(({key}) => key[1]);
You could use the third parameter of the series chart's chart function to determine the color or dash style of the layer, e.g.:
const dashStyles = {
    '0-18': [3,1],
    '19-39': [4,1,1,1],
    // ...
};
chart
    .chart(function(c, _, subkey) {
        return new dc.LineChart(c).dashStyle(dashStyles[subkey]);
    });
Interaction
dc.js does not natively support the kind of drill-down you are describing. It would be easier to have one chart which is by hair color and another chart which is by age. Then when no hair color is selected, the age chart will show all hair colors, and when no age is selected, the hair color chart will show all ages.
If you want drill-down as you describe, you will have to write custom code to apply the filter and swap the chart definition when a hair color is clicked. It's not terribly complicated but please ask a follow-up question if you can't figure it out - it's better to keep SO questions on a single topic.
Annotating with D3
This part is pretty simple no matter how you implement the charts.
You will implement a pretransition handler and use chart.selectAll to add the content you need. There are many examples here on SO, so I won't go into it here.
Conclusion
I hope this gets you started. I've answered your specific question and given some hints about other assumptions or implicit questions within your question. It will be some work to get the results you want, but it is definitely possible.
I have created a bar chart in Kibana using the percentage bar mode, with two filters on the x axis. I simply want to show the distribution of the filters as a percentage of the total results from the search query. The problem is that all of the filters are showing as 100%, when this is not correct. So the visualisation is not actually showing the amount of results in the filter as a percentage of the total results. My visualisation options are shown in the images below:
And the Options tab:
Old post, but in case somebody is looking for a similar answer:
You need to use a bucket type "Split Series" instead of "X-axis".
I need to display a line in a line chart, with the ability to move the tiles, see a max bitrate value line, and see labels and axis pointers on hover, grouped with a table and a time slider. The Y dimension needs to display "bitrate total" or "bitrate Avg" (as defined in code). The X dimension needs to display 15-minute intervals within a scope of weeks.
I can load my data into a table but not into the line graph. I can see points on the graph using .renderDataPoints() but no lines.
I checked the data and could not find any null/NaN values being returned, and I am not using any old version of colors.
The code can be found at https://jsfiddle.net/dani2011/bu2ag0f7/8/. I tried to replace my CSV with var data, but nothing is being displayed in the fiddle at the moment. The code as a whole is shown at https://groups.google.com/forum/#!topic/dc-js-user-group/MEslyF2RWRI
Any help would be greatly appreciated.
Here's my go-to-answer for how to put data into a jsFiddle. Basically it's easiest to stick it in an unused tag in the HTML. bl.ocks.org / blockbuilder.org is easier for this.
Here's a fork of your fiddle with the data loaded that way:
http://jsfiddle.net/gordonwoodhull/bu2ag0f7/17/
I also had to remove the spaces from the column names, because those confused d3.csv and caused the BITRATE calculations to fail.
There was also some stray code inside the renderlet which was failing with a complaint about dim not existing.
The main reason the data was not displaying was that the input groups were not producing usable aggregated data. Your data is very close together in time, so aggregating by week would aggregate everything.
The way to debug this is to put a breakpoint or a console.log before the chart initialization and look at the results of group.all().
In this case bitrateWeekMinIntervalGroupMove and minIntervalWeekBitrateGroup were returning an array with one key/value pair. No lines can be drawn with one point. :)
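Something along these lines is enough for that check. The helper name is made up, and it works with anything that exposes all(), as crossfilter groups do:

```javascript
// Hypothetical debugging helper: logs what a group would feed to the chart
// and returns the number of key/value pairs.
function inspectGroup(name, group) {
  const entries = group.all();
  console.log(name, 'has', entries.length, 'key/value pair(s):', entries);
  return entries.length;
}

// Stand-in object for illustration; in the fiddle you would pass the real group.
const fakeGroup = { all: () => [{ key: new Date(2016, 0, 3), value: 1234 }] };
inspectGroup('minIntervalWeekBitrateGroup', fakeGroup);
```

If the count comes back as 1, the line chart has nothing to connect.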
It looks like you originally wanted to aggregate by 15 minute intervals, so let's get that working.
For whatever reason, there are two levels of aggregation in crossfilter, the dimension level and the group level. The dimension will have first crack at generating a key, and then the group will further refine these keys.
Your min15 function will map each time-key to the 15-minute mark before it, but it needs data that is finer than 15 minutes in resolution. So let's put these groups on the dateDimension, which hasn't already been mapped to a lower resolution:
var minIntervalWeekBitrateGroup = dateDimension.group(min15).reduceSum(function (d) {
    return +d.BITRATE;
});
var bitrateWeekMinIntervalGroupMove = dateDimension.group(min15).reduce(
...
Great, now there are 30 data points. And it draws lines.
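For anyone reading without the fiddle open: a function like min15 only has to floor a Date to the preceding 15-minute mark. This is an assumed shape under a different name, not the code from the fiddle, which may differ:

```javascript
// Assumed shape of a 15-minute flooring function; the min15 in the fiddle
// may be written differently.
function floorTo15Minutes(date) {
  const intervalMs = 15 * 60 * 1000;
  return new Date(Math.floor(date.getTime() / intervalMs) * intervalMs);
}

const sample = new Date(Date.UTC(2021, 0, 1, 14, 37, 20));
console.log(floorTo15Minutes(sample).toISOString()); // 2021-01-01T14:30:00.000Z
```

Passed to dimension.group(), a function like this makes every record within the same quarter hour share one key.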
I made the dots a bit smaller :) because at 30 pixels it was hard to see the lines.
Zooming in using the range chart reveals more of the lines:
There still seem to be glitches in the reduce function (or somewhere) because the lines drop to zero when you zoom in too far, but hopefully this is enough to get you moving again.
My fork of your fiddle: http://jsfiddle.net/gordonwoodhull/bu2ag0f7/25/
I have a few million documents in an Elasticsearch index with some numeric fields, say foo and bar. Is there any way to use Kibana 4 to create a graph with foo values on the X axis and bar values on the Y axis? Like a very, very basic chart one might create using Excel.
I'm fine with sampling/aggregations of some kind. I understand that these tools won't show me a plot with 20 million data points. I'm just trying to see if there's some obvious relationship between foo and bar by creating a graph.
To just plot the correlation between revenue and employee count I would just use a line chart like this:
In order to justify creating a scatter plot chart though (since they're awesome and I wanted to) I generated some fake data that looked something like this:
{
name: faker.company.companyName(),
employees: _.random(3, 30),
revenue: _.random(10000, 100000),
industry: _.sample(industries)
}
And plotted it in visualize by breaking it down piece-by-piece:
Start with a line chart
Switch to the Options tab of the sidebar (since 4.1)
    Uncheck "Show Connecting Lines"
    Check "Scale Y-Axis to Data Bounds"
Switch back to the Data tab
Modify the "Y-Axis"
    use the Average aggregation
    on the employees field
Add a "Dot Size" metric
    use the Unique Count aggregation
    on the company field
Add a "Split Lines" bucket
    use the Terms aggregation
    on the industry field
    I like to set the size close to the cardinality of my data
Add an "X-Axis"
    use the Histogram aggregation
    on the revenue field
    guess an interval, you will need to play with this a bit
Finally, click Apply
This configuration is pretty complex, but the resulting visualization shows a lot of information.
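If it helps to see the same thing outside the UI, the steps above correspond roughly to a nested Elasticsearch aggregation like the one sketched here. The aggregation names, the helper name, and the interval and size arguments are placeholders I picked for illustration; the request Kibana actually generates may be structured differently:

```javascript
// Rough sketch of the aggregation behind the chart: one series per industry,
// one point per revenue bucket, with two metrics on each point.
function buildScatterAggs(interval, industryCount) {
  return {
    size: 0, // aggregation results only
    aggs: {
      by_industry: {
        terms: { field: 'industry', size: industryCount },
        aggs: {
          by_revenue: {
            histogram: { field: 'revenue', interval: interval },
            aggs: {
              avg_employees: { avg: { field: 'employees' } },        // Y-Axis
              company_count: { cardinality: { field: 'company' } }   // Dot Size
            }
          }
        }
      }
    }
  };
}

console.log(JSON.stringify(buildScatterAggs(10000, 10), null, 2));
```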
I found a hack for this.
Create a line chart
X axis is a Terms aggregation of foo
Add sub-aggregation (Split Lines) on the same field
Y Axis is sum of your other column (bar)
I don't see any way of making the legend meaningful, though
I am trying to show machine states over time. Part of this is to reproduce/automate a report that used to be done by hand. It consists of coloring 2-minute 'time slices' in Excel based on what the machine is doing.
(Sorry, not enough reputation to post a picture, but it is a classic heatmap where the state drives the color. Some non DC-JS fiddle: http://jsfiddle.net/ww6Lbnc5/4/)
I was able to generate most of what I want in the following jsfiddle:
http://jsfiddle.net/hwhfxz2t/14/
See fiddle for code.
The total state duration (for the selected time frame) is shown in the pieChart, followed by the individual state lines and then the heatmap that people are used to. (The ZOOM and date selection buttons do not work in the fiddle but are there to select specific data ranges or zoom in if you like.)
The line charts use the original representation of the states, which consists of the time a state is entered and a duration.
In order to make the heat map work, I had to (I think) take the original data and convert it into individual minute chunks and mark them with a state. So for instance the original data specifying:
RUN state starting 14:30 for 300 seconds
becomes:
14:30=RUN, 14:31=RUN, 14:32=RUN, 14:33=RUN and 14:34=RUN
The code in lines 233-297 loops through the original data and generates a new one that does this. In cases where there is more than one state within a given minute, the last state survives.
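In outline, the conversion looks something like the following. This is a simplified sketch rather than the fiddle's actual code: the record shape ({ state, start, durationSeconds }) and the function names are assumed for illustration, and records are assumed to be processed in chronological order.

```javascript
// Expand one state record into per-minute slots.
function expandToMinutes(record) {
  const minuteMs = 60 * 1000;
  const slots = [];
  const first = Math.floor(record.start.getTime() / minuteMs) * minuteMs;
  const end = record.start.getTime() + record.durationSeconds * 1000;
  for (let t = first; t < end; t += minuteMs) {
    slots.push({ minute: new Date(t), state: record.state });
  }
  return slots;
}

// Later records overwrite earlier ones, so when several states fall into
// the same minute the last one survives.
function buildMinuteStates(records) {
  const byMinute = new Map();
  records.forEach(r => {
    expandToMinutes(r).forEach(s => byMinute.set(s.minute.getTime(), s.state));
  });
  return byMinute;
}

const run = { state: 'RUN', start: new Date(Date.UTC(2015, 0, 1, 14, 30)), durationSeconds: 300 };
console.log(expandToMinutes(run).length); // 5
```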
This works okay but it seems that this code is exactly what is normally done in group().reduce(add,remove,init). But in this case I need to add multiple timeslots depending on the duration of a state.
Also, because it is now using a different crossfilter, maps do not update each other.
Here are my questions related to this:
Can I display a heatmap without supplying information for all individual 'cells'? (i.e. straddle cells based on a value, similar to rowspan in a table)
Can I add multiple values at once inside group().reduce()?
Is there an easy way to invert the yAxis so 0 is at the top?
Why does clicking a row in the heatmap select a column, and vice versa?
I'm not sure if this should be in the crossfilter group. If so please ignore my rambling. If someone knows how to keep the charts linked by grouping better, please let me know.
--Nico
Concerning Question 3:
DC.js heatmaps currently do not support custom ordering functions on the axes, but there is a pull request that has been merged into the develop branch and should be available to the public soon.
You could manually edit the dc.js file to set the sorting in heatmaps to a custom function. In the latest (2.0.0-beta10) version it is the following line:
rowValues.sort(d3.ascending);
and accordingly
colValues.sort(d3.ascending);
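For example, to reverse the row order you could swap in a descending comparator (d3 ships one as d3.descending). The sketch below uses a local stand-in so it runs without d3. Whether the reversed order actually puts 0 at the top depends on how the heatmap lays out its rows, so treat that as an assumption to verify:

```javascript
// Local stand-in for d3.descending, so the sketch runs without d3.
function descending(a, b) {
  return b < a ? -1 : b > a ? 1 : 0;
}

const rowValues = [2, 0, 1];
rowValues.sort(descending); // in dc.js this would replace rowValues.sort(d3.ascending)
console.log(rowValues); // [ 2, 1, 0 ]
```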