I want to show the same metric on 2 different charts, each filtered by a different 'static' filter.
For example, say I have sales data for all cities in the US, with an extra criterion, Region, that is either East or West.
City,Region,Sales
NY,East,100
LA,West,200
Boston,East,300
SanDiego,West,400
I want to show the cities from the East and West coasts side by side in row charts.
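var xfilter = crossfilter(data);
// crossfilter's dimension() and reduceSum() are given accessor functions
var city_dim_east = xfilter.dimension(function(d) { return d.City; });
var city_dim_west = xfilter.dimension(function(d) { return d.City; });
var city_group_east = city_dim_east.group().reduceSum(function(d) { return d.Sales; });
var city_group_west = city_dim_west.group().reduceSum(function(d) { return d.Sales; });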
These two dimensions/groups are obviously identical. How do I add a static filter to them that won't get reset when dc.js applies and clears filters on the other dimensions/groups?
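One common way to get a filter that dc.js can never reset is to bake the condition into the group's reduce functions instead of filtering a dimension. A minimal sketch, assuming crossfilter's `group.reduce(add, remove, initial)` signature; `reduceSumWhere` and `foldByKey` are hypothetical helper names, and `foldByKey` is only a stand-in that folds the rows by hand so the example runs without crossfilter.

```javascript
// Hypothetical helper: reduce functions that only count rows matching
// `predicate`, so the "static" filter lives inside the group itself.
function reduceSumWhere(predicate, valueOf) {
  return {
    add: function (p, v) {
      return predicate(v) ? p + valueOf(v) : p;
    },
    remove: function (p, v) {
      return predicate(v) ? p - valueOf(v) : p;
    },
    init: function () {
      return 0;
    }
  };
}

var data = [
  { City: 'NY', Region: 'East', Sales: 100 },
  { City: 'LA', Region: 'West', Sales: 200 },
  { City: 'Boston', Region: 'East', Sales: 300 },
  { City: 'SanDiego', Region: 'West', Sales: 400 }
];

var eastReducer = reduceSumWhere(
  function (d) { return d.Region === 'East'; },
  function (d) { return d.Sales; }
);

// Assumed crossfilter usage:
//   var cityDim = xfilter.dimension(function (d) { return d.City; });
//   var eastGroup = cityDim.group().reduce(eastReducer.add, eastReducer.remove, eastReducer.init);

// Stand-in for crossfilter: fold every row into its City bin.
function foldByKey(rows, keyOf, r) {
  var bins = {};
  rows.forEach(function (row) {
    var k = keyOf(row);
    if (!(k in bins)) bins[k] = r.init();
    bins[k] = r.add(bins[k], row);
  });
  return bins;
}

var eastBins = foldByKey(data, function (d) { return d.City; }, eastReducer);
console.log(eastBins); // { NY: 100, LA: 0, Boston: 300, SanDiego: 0 }
```

A second reducer with `d.Region === 'West'` feeds the other chart. Since no dimension filter is involved, there is nothing for dc.js to clear. The West cities still appear in the East group with a value of 0; the dc.js FAQ describes wrapping a group in a "fake group" (such as `remove_empty_bins`) whose `all()` drops such bins.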
I am trying to create visualizations for the recent Commonwealth Games medal tally dataset.
I would like to create a grouped bar chart of the top ten countries by total number of medals won.
Y axis = total
X axis = country name
How can I split each total into three bars showing the number of gold, silver and bronze medals won by each country?
I created one using Excel, but I don't know how to do it with seaborn.
P.S. I have already tried using a list of columns for hue.
df_10 = df.head(10)
sns.barplot(data=df_10, x='team', y='total',
            hue=df_10[['gold', 'silver', 'bronze']].apply(tuple, axis=1))
Here is the chart that I created using Excel:
[Image: the grouped bar chart created in Excel]
To plot the graph, you will need to reshape the dataframe into a format that allows easy plotting. One way to do this is with pandas.melt(). The approach you tried will not work as intended: hue expects a single categorical variable, so seaborn treats each distinct (gold, silver, bronze) tuple as its own hue level instead of splitting each country into three bars. Once the data is in a long format that seaborn understands, plotting becomes simple. As you have not provided the format of df_10, I have assumed the data has 4 columns: Country, Gold, Silver and Bronze. Below is the code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
## Melt: keep Country as the ID column and turn Gold, Silver, Bronze into rows
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
df_10.rename(columns={'value':'Count', 'variable':'Medals'}, inplace=True) ##Rename so the plot has informative texts
fig, ax = plt.subplots(figsize=(12, 7)) ## Set figure size
sns.barplot(data=df_10, x='Country', y='Count', hue='Medals', ax=ax) ## One bar per medal type within each country
plt.show()
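To make the reshaping concrete, here is a plain-JavaScript sketch of the wide-to-long transformation that pd.melt performs, stacking one value column after another as pandas does. The `melt` function and the sample countries "A" and "B" are hypothetical illustrations, not part of pandas.

```javascript
// Hypothetical sketch of pd.melt: each wide row (one country, three medal
// columns) becomes three long rows with a medal-type column and a count column.
function melt(rows, idVar, valueVars, varName, valueName) {
  var out = [];
  // Outer loop over the value columns, as pandas stacks them column by column.
  valueVars.forEach(function (col) {
    rows.forEach(function (row) {
      var rec = {};
      rec[idVar] = row[idVar];
      rec[varName] = col;
      rec[valueName] = row[col];
      out.push(rec);
    });
  });
  return out;
}

// Hypothetical sample rows in the assumed Country/Gold/Silver/Bronze format.
var wide = [
  { Country: 'A', Gold: 3, Silver: 1, Bronze: 2 },
  { Country: 'B', Gold: 0, Silver: 4, Bronze: 1 }
];

var longRows = melt(wide, 'Country', ['Gold', 'Silver', 'Bronze'], 'Medals', 'Count');
console.log(longRows.length); // 6
```

Each long row carries exactly one (Country, Medals, Count) triple, which is why `hue='Medals'` can then split every country into three bars.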
I am working on a stacked bar chart in Tableau.
Each bar shows the percent of total, so all bars have the same length.
Now I would like to sort the dimension (referee) by the values of the legend categories (highest to lowest).
Can anyone suggest how to do this?
I have also attached the packaged workbook.
Here is a picture of the sort screen:
The level of detail of the data source is shown below:
Below is a screenshot based on the final answer provided:
Thanks,
Zep
To get this, you first need a calculated field that computes the win %:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)/COUNTD([Game ID])
This can then be used to rank the referees:
The reason your technique may not be working is that you're sorting on COUNTD(Wins), which is the total number of wins, not the referee's win percentage. So a referee who has simply officiated more games may rank higher.
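To illustrate the difference, here is a plain-JavaScript sketch that evaluates the win % calculation per referee and compares the two sort orders. The field names (Referee, GameID, FTR), the 'Draw' value and the sample rows are hypothetical; only the 'AWins'/'Hwins' test comes from the calculated field.

```javascript
// Hypothetical rows: Ref1 officiated 6 games (3 wins), Ref2 only 2 (2 wins).
var rows = [
  { Referee: 'Ref1', GameID: 1, FTR: 'Hwins' },
  { Referee: 'Ref1', GameID: 2, FTR: 'AWins' },
  { Referee: 'Ref1', GameID: 3, FTR: 'Hwins' },
  { Referee: 'Ref1', GameID: 4, FTR: 'Draw' },
  { Referee: 'Ref1', GameID: 5, FTR: 'Draw' },
  { Referee: 'Ref1', GameID: 6, FTR: 'Draw' },
  { Referee: 'Ref2', GameID: 7, FTR: 'Hwins' },
  { Referee: 'Ref2', GameID: 8, FTR: 'AWins' }
];

function refereeStats(rows) {
  var stats = {};
  rows.forEach(function (r) {
    var s = stats[r.Referee] || (stats[r.Referee] = { wins: 0, games: new Set() });
    // SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)
    if (r.FTR === 'AWins' || r.FTR === 'Hwins') s.wins += 1;
    // COUNTD([Game ID])
    s.games.add(r.GameID);
  });
  return Object.keys(stats).map(function (ref) {
    var s = stats[ref];
    return { referee: ref, wins: s.wins, winPct: s.wins / s.games.size };
  });
}

var result = refereeStats(rows);
var byWins = result.slice().sort(function (a, b) { return b.wins - a.wins; });
var byPct = result.slice().sort(function (a, b) { return b.winPct - a.winPct; });
console.log(byWins.map(function (r) { return r.referee; })); // [ 'Ref1', 'Ref2' ]
console.log(byPct.map(function (r) { return r.referee; }));  // [ 'Ref2', 'Ref1' ]
```

Sorting by the raw count puts the busier referee first, while sorting by win % reverses the order.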
Now that you have the calc field, you can go back to your report and sort on the new field:
I rearranged the legend so you can see that the referees with the best win % are shown first (red and blue bars).
If you would rather sort by the raw number of wins than by win %, change the calc field to:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)
For the COUNTD of games: if you only have the date and the game available, you can build a unique ID from them with a calculated field like this:
game-date-id: STR([game]) + ' ' + STR([date])
Then use it in the COUNTD of the win % calculation:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)/COUNTD([game-date-id])
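Why the combined key matters can be sketched in plain JavaScript: a distinct count on the game label alone merges the same fixture played on different dates, while the game plus date key keeps them apart. The sample fixtures and the `countDistinct` helper are hypothetical.

```javascript
// Hypothetical fixtures: the same pairing appears on two different dates.
var games = [
  { game: 'Team A v Team B', date: '2017-01-07' },
  { game: 'Team A v Team B', date: '2017-03-18' },
  { game: 'Team C v Team D', date: '2017-01-07' }
];

// Equivalent of COUNTD over whatever key the function returns.
function countDistinct(rows, keyOf) {
  return new Set(rows.map(keyOf)).size;
}

var byGame = countDistinct(games, function (r) { return String(r.game); });
// Mirrors STR([game]) + ' ' + STR([date])
var byGameDate = countDistinct(games, function (r) { return String(r.game) + ' ' + String(r.date); });
console.log(byGame, byGameDate); // 2 3
```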
I have attached a picture of the dashboard.
I want to sort the referees based on Hwin.
Yeah, it did not work out as expected.
Good Evening Everyone,
I'm trying to take the data from a database full of hour reports (name, timestamp, hours worked, etc.) and create a plot using dc.js to visualize the data. I would like the timestamp to be on the x-axis, the sum of hours for the particular timestamp on the y-axis, and a new bar graph for each unique name all on the same chart.
It appears, based on my objectives, that with crossfilter.js the timestamp should be my 'dimension' and the sum of hours should be my 'group'.
Question 1: how would I then use the dimension and group to further split the data by the person's name, and create a bar graph for each person to add to my composite graph? I would like the crossfilter.js functionality to remain intact, so that if I add a date range tool or some other user-controllable filter, everything updates accordingly.
Question 2: my timestamps are in MySQL datetime format (YYYY-mm-dd HH:MM:SS), so how would I go about dropping precision? For instance, I might want to combine all entries from the same day into one entry (day precision) or all entries from the same month into a single entry (month precision).
Thanks in advance!
---- Added on 2017/01/28 16:06
To further clarify, I'm referencing the Crossfilter and dc.js APIs alongside the dc.js NASDAQ and Composite examples. The Composite example has shown me how to place multiple line/bar charts on a single graph. For each bar chart I've added to my composite chart, I've used a dimension based on the timestamps in the dataset. Now I'm trying to figure out how to define the groups for each. I want each bar chart to represent the total time worked per timestamp.
For example, I have five people in my database, so I want there to be five bar charts within the single composite chart. Today all five submitted reports saying they worked 8 hours, so now all five bar charts should show a mark at 01/28/2017 on the x-axis and 8 hours on the y-axis.
var parseDate = d3.time.format('%Y-%m-%d %H:%M:%S').parse;
data.forEach(function(d) {
d.timestamp = parseDate(d.timestamp);
});
var ndx = crossfilter(data);
var writtenDimension = ndx.dimension(function(d) {
return d.timestamp;
});
var hoursSumGroup = writtenDimension.group().reduceSum(function(d) {
return d.time_total;
});
var minDate = parseDate('2017-01-01 00:00:00');
var maxDate = parseDate('2017-01-31 23:59:59');
var mybarChart = dc.compositeChart("#my_chart");
mybarChart
.width(window.innerWidth)
.height(480)
.x(d3.time.scale().domain([minDate,maxDate]))
.brushOn(false)
.clipPadding(10)
.yAxisLabel("This is the Y Axis!")
.compose([
dc.barChart(mybarChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup, "Top Line")
]);
So, based on what I have right now and the example I've provided, the compose section should have 5 charts because there are 5 people (obviously this needs to be dynamic in the end), and each of those charts should only show the timestamp: time_total data for that person.
At this point I don't know how to further break up the group hoursSumGroup by person; this is where my Question #1 comes in and where I need help.
Related to Question #2 above, I want to make sure that the code is dynamic: more people can be handled without code changes; when minDate and maxDate are later tied to user input fields, the charts update automatically (I assume by adjusting the dimension in some way); and if I add a names filter, unselecting a name removes that person's data from the chart.
A Question #3 that I'm now realizing I'll need to figure out is how to get the person's name to show up in the pointer tooltip (the title) along with the timestamp and time_total values.
There are a number of ways to go about this, but I think the easiest thing to do is to create a custom reduction which reduces each person into a sub-bin.
First off, addressing question #2, you'll want to set up your dimension based on the time interval you're interested in. For instance, if you're looking at hours:
var writtenDimension = ndx.dimension(function(d) {
return d3.time.hour(d.timestamp);
});
chart.xUnits(d3.time.hours);
This rounds each timestamp down to the nearest hour and tells the chart to calculate the bar width accordingly. For day or month precision, use d3.time.day with d3.time.days, or d3.time.month with d3.time.months.
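For readers without d3 at hand, here is a plain-JavaScript sketch of the same idea: parse the MySQL datetime string and floor it to the hour, day or month in local time, so that all entries from one period share a single key. The function names are hypothetical; the d3 interval functions are what the answer actually uses.

```javascript
// Hypothetical parser for 'YYYY-mm-dd HH:MM:SS' into a local-time Date.
function parseMysqlDatetime(s) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(s);
  if (!m) throw new Error('Unexpected datetime: ' + s);
  return new Date(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]);
}

// Floor to the start of the hour, day or month (local time).
function floorHour(d) { return new Date(d.getFullYear(), d.getMonth(), d.getDate(), d.getHours()); }
function floorDay(d) { return new Date(d.getFullYear(), d.getMonth(), d.getDate()); }
function floorMonth(d) { return new Date(d.getFullYear(), d.getMonth(), 1); }

var t = parseMysqlDatetime('2017-01-28 16:06:42');
console.log(floorDay(t).getDate(), floorDay(t).getHours()); // 28 0
```

Returning one of these floored dates from the dimension accessor puts every entry from the same period into one bin.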
Next, here's a custom reduction (from the FAQ) which will create an object for each reduced value, with values for each person's name:
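var hoursSumGroup = writtenDimension.group().reduce(
function(p, v) { // add: accumulate this row's hours under the person's name
p[v.name] = (p[v.name] || 0) + v.time_total;
return p;
},
function(p, v) { // remove: subtract this row's hours when it is filtered out
p[v.name] -= v.time_total;
return p;
},
function() { // init: one empty object per bin
return {};
});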
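To see what each bin holds, here is the same per-name reduction (with `v.time_total` in both functions) run through a hand-written stand-in for crossfilter. The sample rows, the names "alice"/"bob" and the string `day` keys are hypothetical simplifications of the real Date keys.

```javascript
function reduceAdd(p, v) {
  p[v.name] = (p[v.name] || 0) + v.time_total;
  return p;
}
function reduceRemove(p, v) {
  p[v.name] -= v.time_total;
  return p;
}
function reduceInit() {
  return {};
}

// Hypothetical rows; `day` stands in for the floored timestamp key.
var rows = [
  { name: 'alice', day: '2017-01-28', time_total: 8 },
  { name: 'bob', day: '2017-01-28', time_total: 8 },
  { name: 'alice', day: '2017-01-27', time_total: 6 },
  { name: 'alice', day: '2017-01-27', time_total: 2 }
];

// Fold rows into one bin per day, as group().reduce(...) would.
var bins = {};
rows.forEach(function (r) {
  if (!(r.day in bins)) bins[r.day] = reduceInit();
  bins[r.day] = reduceAdd(bins[r.day], r);
});

console.log(bins['2017-01-28']); // { alice: 8, bob: 8 }
console.log(bins['2017-01-27']); // { alice: 8 }

// When a filter excludes a row, crossfilter calls remove for it.
reduceRemove(bins['2017-01-27'], rows[3]);
console.log(bins['2017-01-27'].alice); // 6
```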
I did not go with the series example I mentioned in the comments, because I think composite keys can be difficult to deal with. That's another option, and I'll expand my answer if that's necessary.
Next, we can feed the composite line charts with value accessors that can fetch the value by name.
Assume we have an array, names, containing each person's name.
var colorScale = d3.scale.category10(); // one color per name
compositeChart.shareTitle(false);
compositeChart.compose(
names.map(function(name) {
return dc.lineChart(compositeChart)
.dimension(writtenDimension)
.colors(colorScale(name))
.group(hoursSumGroup)
.valueAccessor(function(kv) {
return kv.value[name] || 0; // 0 when this person has no entry in the bin
})
.title(function(kv) {
return name + ' ' + kv.key + ': ' + (kv.value[name] || 0);
});
}));
Again, it wouldn't make sense to use bar charts here, because they would obscure each other.
If you filter a name elsewhere, it will cause the line for the name to drop to zero. Having the line disappear entirely would probably not be so simple.
The above shareTitle(false) ensures that the child charts will draw their own titles; the title functions just add the current name to those titles (which would usually just be key:value).
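To keep the chart dynamic, the `names` array can be derived from the data instead of being hard-coded. A minimal sketch, assuming each row has a `name` field; `uniqueNames` and the sample rows are hypothetical.

```javascript
// Hypothetical helper: collect each distinct person's name from the rows,
// so new people are picked up without code changes.
function uniqueNames(rows) {
  var seen = new Set();
  rows.forEach(function (r) {
    seen.add(r.name);
  });
  return Array.from(seen).sort();
}

// Hypothetical sample rows; only the name field matters here.
var data = [
  { name: 'carol', time_total: 8 },
  { name: 'alice', time_total: 8 },
  { name: 'carol', time_total: 4 },
  { name: 'bob', time_total: 6 }
];

var names = uniqueNames(data);
console.log(names); // [ 'alice', 'bob', 'carol' ]
```

With crossfilter, a name dimension gives the same list through `nameDim.group().all().map(function (kv) { return kv.key; })`, and that dimension could also drive the names filter from Question #2.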
I'm having a bit of trouble getting Kibana to produce a certain bar chart.
In a drastically simplified form, my data consists of documents with the following structure:
FULL NAME: "Michael Jordan"
PROPERTIES: "53Y MALE 198cm"
DEPARTMENT: Parquet

FULL NAME: "Sasha Digiulian"
PROPERTIES: "24Y FEMALE 157cm"
DEPARTMENT: Rock, Ice

FULL NAME: "Ueli Steck"
PROPERTIES: "40Y MALE 187cm"
DEPARTMENT: Ice
Eventually, I'd like to display a two-colored bar chart with the department on the X axis and a double bar on the Y axis: one color for the number of males in the department and another for the number of females.
In this case there will be 3 (double) bars for [Rock, Ice, Parquet], with the Y values (males, females) being [(0,1), (1,1), (1,0)].
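As a check on the expected result, here is a plain-JavaScript sketch that computes the (males, females) counts per department from the three sample documents. It assumes the gender is the second space-separated token of PROPERTIES and that DEPARTMENT is comma-separated; the camelCase keys are hypothetical renamings. It compares the token exactly, because a substring test for "MALE" would also match "FEMALE". Each document is a different person, so counting documents equals a unique count on FULL NAME here.

```javascript
var docs = [
  { fullName: 'Michael Jordan', properties: '53Y MALE 198cm', department: 'Parquet' },
  { fullName: 'Sasha Digiulian', properties: '24Y FEMALE 157cm', department: 'Rock, Ice' },
  { fullName: 'Ueli Steck', properties: '40Y MALE 187cm', department: 'Ice' }
];

// For each department, count [males, females].
function genderCounts(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    var gender = doc.properties.split(' ')[1]; // 'MALE' or 'FEMALE' (assumed format)
    doc.department.split(',').forEach(function (raw) {
      var dept = raw.trim();
      var c = counts[dept] || (counts[dept] = [0, 0]);
      if (gender === 'MALE') c[0] += 1;
      else if (gender === 'FEMALE') c[1] += 1;
    });
  });
  return counts;
}

var counts = genderCounts(docs);
console.log(counts); // { Parquet: [ 1, 0 ], Rock: [ 0, 1 ], Ice: [ 1, 1 ] }
```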
For each color on its own this is easy: define a filter as a query on PROPERTIES, then a (unique) count aggregation on FULL NAME. But the filter applies to the whole plot, so how can I use a different filter for each color?
Alternatively, I could try to define a scripted field, something like
MALE_NAME: doc['PROPERTIES'].value=~"MALE"?doc['FULL NAME'].value:""
and the same for female. But scripted fields don't work on strings...
Any Ideas are greatly welcomed. Thanks!
It seems this issue is still open, but there is a workaround using the Split Bar aggregation. Give it a go.
I'm not sure how to do this. I have the following data:
Date, Country, QuantityA, QuantityB
I want to make a timeline chart with the ratio between QuantityA and QuantityB. I also want to create a bar chart by Country, which will show the ratio in every country.
The problem is that the ratios are not additive, so if I do this:
var timeDim = ndx.dimension(function(d) { return d.Date; });
var ratioAB = timeDim.group().reduceSum(function(d) { return d.QuantityA / d.QuantityB; });
This will return the ratios for every country separately and will add them up. What I want is to add up QuantityA and QuantityB and then do the ratio.
Thus, the timeline chart only shows the right ratio when I filter to a single country.
Is there a way to add both the country and the date as a dimension?
You can create a custom grouping to calculate the sum of QuantityA, the sum of QuantityB, and the ratio between the 2. Or you could just create 2 sum groups, one summing QuantityA, the other QuantityB, and then calculate the ratio when you build the visualization.
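A sketch of the custom-grouping option: keep running sums of QuantityA and QuantityB in each bin and compute the ratio only when the chart reads the value. It assumes crossfilter's `group.reduce(add, remove, initial)` and dc.js's `valueAccessor`; the function names and sample rows are hypothetical, and the fold at the end is a hand-written stand-in for crossfilter.

```javascript
function ratioAdd(p, v) {
  p.sumA += v.QuantityA;
  p.sumB += v.QuantityB;
  return p;
}
function ratioRemove(p, v) {
  p.sumA -= v.QuantityA;
  p.sumB -= v.QuantityB;
  return p;
}
function ratioInit() {
  return { sumA: 0, sumB: 0 };
}
// Value accessor: ratio of the sums (0 when the bin has no QuantityB).
function ratioOf(kv) {
  return kv.value.sumB ? kv.value.sumA / kv.value.sumB : 0;
}

// Hypothetical rows: two countries on the same date.
var rows = [
  { Date: '2017-01-01', Country: 'X', QuantityA: 1, QuantityB: 2 },
  { Date: '2017-01-01', Country: 'Y', QuantityA: 3, QuantityB: 2 }
];

// Stand-in for timeDim.group().reduce(ratioAdd, ratioRemove, ratioInit).
var bins = {};
rows.forEach(function (r) {
  if (!(r.Date in bins)) bins[r.Date] = ratioInit();
  bins[r.Date] = ratioAdd(bins[r.Date], r);
});

var kv = { key: '2017-01-01', value: bins['2017-01-01'] };
console.log(ratioOf(kv)); // 1  (summing the per-country ratios would give 0.5 + 1.5 = 2)
```

The same three reduce functions can be attached to a Country dimension for the bar chart, with `.valueAccessor(ratioOf)` on both charts, so date and country each get their own dimension and filtering one updates the other.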