Write actual values to bar chart using Gruff within Ruby - ruby

I am generating a bar chart with values [1,5,10,23]. Currently, I have no way of knowing those exact values when looking at the image generated by Gruff. I just know that 23 falls somewhere between the lines of 20 and 25.
Is it possible to write the exact values within the image?

I think you are looking for labels:
g = Gruff::Bar.new
g.title = 'Wow! Look at this!'
g.data("something", [1, 5, 10, 23])
g.labels = { 0 => '1', 1 => '5', 2 => '10', 3 => '23' }
Read the documentation for more info on labels.

I think this is what you are looking for:
g.show_labels_for_bar_values = true

Related

Bar Chart on Dimension-1 and Stacked by Dimension-2

Summary
I want to display a bar chart whose dimension is days and is stacked by a different category (i.e. x-axis = days and stack = category-1). I can do this "manually" in that I can write if-then's to zero or display the quantity, but I'm wondering if there's a systematic way to do this.
JSFiddle https://jsfiddle.net/wostoj/rum53tn2/
Details
I have data with dates, quantities, and other classifiers. For the purpose of this question I can simplify it to this:
data = [
{day: 1, cat: 'a', quantity: 25},
{day: 1, cat: 'b', quantity: 15},
{day: 1, cat: 'b', quantity: 10},
{day: 2, cat: 'a', quantity: 90},
{day: 2, cat: 'a', quantity: 45},
{day: 2, cat: 'b', quantity: 15},
]
I can set up a bar chart, by day, that shows total units and I can manually add the stacks for 'a' and 'b' as follows.
var dayDim = xf.dimension(_ => _.day);
var bar = dc.barChart("#chart");
bar
.dimension(dayDim)
.group(dayDim.group().reduceSum(
_ => _.cat === 'a' ? _.quantity : 0
))
.stack(dayDim.group().reduceSum(
_ => _.cat === 'b' ? _.quantity : 0
));
This is easy when my data has only 2 categories, but I'm wondering how I'd scale it to 10 or an unknown number of categories. I'd imagine the pseudo-code for what I'm trying to do is something like
dc.barChart("#chart")
.dimension(xf.dimension(_ => _.day))
.stackDim(xf.dimension(_ => _.cat))
.stackGroup(xf.dimension(_ => _.cat).group().reduceSum(_ => _.quantity));
I mentioned this in my answer to your other question, but why not expand on it a little bit here.
In the dc.js FAQ there is a standard pattern for custom reductions to reduce more than one value at once.
Say that you have a field named type which determines which type of value is in the row, and the value is in a field named value (in your case these are cat and quantity). Then
var group = dimension.group().reduce(
function(p, v) { // add
p[v.type] = (p[v.type] || 0) + v.value;
return p;
},
function(p, v) { // remove
p[v.type] -= v.value;
return p;
},
function() { // initial
return {};
});
will reduce all the rows for each bin to an object where the keys are the types and the values are the sum of values with that type.
The way this works is that when crossfilter encounters a new key, it first uses the "initial" function to produce a new value. Here that value is an empty object.
Then for each row it encounters which falls into the bin labelled with that key, it calls the "add" function. p is the previous value of the bin, and v is the current row. Since we started with a blank object, we have to make sure we initialize each value; (p[v.type] || 0) will make sure that we start from 0 instead of undefined, because undefined + 1 is NaN and we hate NaNs.
We don't have to be as careful in the "remove" function, because the only way a row will be removed from a bin is if it was once added to it, so there must be a number in p[v.type].
Now that each bin contains an object with all the reduced values, the stack mixin has helpful extra parameters for .group() and .stack() which allow us to specify the name of the group/stack, and the accessor.
For example, if we want to pull items a and b from the objects for our stacks, we can use:
.group(group, 'a', kv => kv.value.a)
.stack(group, 'b', kv => kv.value.b)
It's not as convenient as it could be, but you can use these techniques to add stacks to a chart programmatically (see source).
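To make the shape of the reduced bins concrete, here is a rough re-enactment of the add/remove/initial pattern in plain Python, using the sample data from the question. This is not crossfilter or dc.js code; the function names (reduce_initial, reduce_add, reduce_remove, group_by_day) are made up for illustration, and the grouping loop only imitates what crossfilter does internally.

```python
# Sample rows from the question.
data = [
    {"day": 1, "cat": "a", "quantity": 25},
    {"day": 1, "cat": "b", "quantity": 15},
    {"day": 1, "cat": "b", "quantity": 10},
    {"day": 2, "cat": "a", "quantity": 90},
    {"day": 2, "cat": "a", "quantity": 45},
    {"day": 2, "cat": "b", "quantity": 15},
]

def reduce_initial():
    # Each new bin starts as an empty object.
    return {}

def reduce_add(p, v):
    # Start from 0 when the category has not been seen in this bin yet.
    p[v["cat"]] = p.get(v["cat"], 0) + v["quantity"]
    return p

def reduce_remove(p, v):
    # A row can only be removed after it was added, so the key exists.
    p[v["cat"]] -= v["quantity"]
    return p

def group_by_day(rows):
    # Imitates the grouping step: one bin per distinct day.
    bins = {}
    for row in rows:
        key = row["day"]
        if key not in bins:
            bins[key] = reduce_initial()
        bins[key] = reduce_add(bins[key], row)
    return bins

bins = group_by_day(data)

# Discover the categories from the data instead of hard-coding 'a' and 'b',
# then build one series per category, ordered by day.
categories = sorted({row["cat"] for row in data})
stacks = {
    cat: [bins[day].get(cat, 0) for day in sorted(bins)]
    for cat in categories
}
print(bins)
print(stacks)
```

Each entry in stacks corresponds to one .group() or .stack() call with an accessor for that category, so the same loop over categories is where the chart's stacks would be added for an unknown number of categories.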

Extracting vectors from Doc2Vec

I am trying to extract the documents vector to feed into a regression model for prediction.
I have fed around 1,400,000 labelled sentences into doc2vec for training; however, I was only able to retrieve 10 vectors using model.docvecs.
This is a snapshot of the labelled sentences I used to train the doc2vec model:
In : documents[0]
Out: TaggedDocument(words=['descript', 'yet'], tags='0')
In : documents[-1]
Out: TaggedDocument(words=['new', 'tag', 'red', 'sparkl', 'firm', 'price', 'free', 'ship'], tags='1482534')
This is the code used to train the doc2vec model:
model = gensim.models.Doc2Vec(min_count=1, window=5, size=100, sample=1e-4, negative=5, workers=4)
model.build_vocab(documents)
model.train(documents, total_examples =len(documents), epochs=1)
This is the dimension of the documents vectors:
In : model.docvecs.doctag_syn0.shape
Out: (10, 100)
On which part of the code did I mess up?
Update:
Adding on to the comment from sophros, it appears that I made a mistake when creating the TaggedDocument objects prior to training, which resulted in 1.4 million documents appearing as 10 documents.
Courtesy of Irene Li's tutorial on Doc2vec, I made some slight edits to the function she used to generate the TaggedDocument objects:
# Imports are assumed; the original snippet did not show them.
import re

import gensim
from gensim.models.doc2vec import TaggedDocument
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer

def get_doc(data):
    tokenizer = RegexpTokenizer(r'\w+')
    en_stop = stopwords.words('english')
    p_stemmer = PorterStemmer()
    taggeddoc = []
    texts = []
    for index, i in enumerate(data):
        # for tagged doc
        wordslist = []
        tagslist = []
        i = str(i)
        # clean and tokenize document string
        raw = i.lower()
        tokens = tokenizer.tokenize(raw)
        # remove stop words from tokens
        stopped_tokens = [i for i in tokens if i not in en_stop]
        # remove numbers
        number_tokens = [re.sub(r'[\d]', ' ', i) for i in stopped_tokens]
        number_tokens = ' '.join(number_tokens).split()
        # stem tokens
        stemmed_tokens = [p_stemmer.stem(i) for i in number_tokens]
        # remove empty
        length_tokens = [i for i in stemmed_tokens if len(i) > 1]
        # add tokens to list
        texts.append(length_tokens)
        # the tag is passed as a plain string here; this is the line that caused the problem
        td = TaggedDocument(gensim.utils.to_unicode(str.encode(' '.join(stemmed_tokens))).split(),str(index))
        taggeddoc.append(td)
    return taggeddoc
The mistake was fixed when I made the change from
td = TaggedDocument(gensim.utils.to_unicode(str.encode(' '.join(stemmed_tokens))).split(),str(index))
to this
td = TaggedDocument(gensim.utils.to_unicode(str.encode(' '.join(stemmed_tokens))).split(),[str(index)])
It appears that the tags of a TaggedDocument must be given as a list for TaggedDocument to work properly. For more details as to why, please refer to this answer by gojomo.
The gist of the error was: the tags for each individual TaggedDocument were being provided as plain strings, like '101' or '456'.
But, tags should be a list-of-separate tags. By providing a simple string, it was treated as a list-of-characters. So '101' would become ['1', '0', '1'], and '456' would become ['4', '5', '6'].
Across any number of TaggedDocument objects, there were thus only 10 unique tags, single digits ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']. Every document just caused some subset of those tags to be trained.
Correcting tags to be a list-of-one tag, eg ['101'], allows '101' to be seen as the actual tag.
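The effect is easy to reproduce without gensim. The sketch below uses a stand-in namedtuple called TaggedDoc (a hypothetical name, mimicking the words/tags shape of TaggedDocument) and a helper, unique_tags, that iterates over the tags the way any consumer treating tags as a list would.

```python
from collections import namedtuple

# Hypothetical stand-in with the same two fields as gensim's TaggedDocument.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])

def unique_tags(docs):
    """Collect every distinct tag by iterating over each document's tags."""
    seen = set()
    for doc in docs:
        for tag in doc.tags:  # iterating over a str yields single characters
            seen.add(tag)
    return seen

n_docs = 1000
wrong = [TaggedDoc(["w"], str(i)) for i in range(n_docs)]    # tags='123'
right = [TaggedDoc(["w"], [str(i)]) for i in range(n_docs)]  # tags=['123']

wrong_tags = unique_tags(wrong)
right_tags = unique_tags(right)
print(len(wrong_tags), len(right_tags))  # 10 1000
```

With string tags, only the ten digit characters ever show up as tags, which matches the (10, 100) shape seen in the question.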

LINQ : Distinct and Orderby

I am trying to use LINQ (to EF) to get a DISTINCT list and then sort it. All the examples I found sort the result based on the DISTINCT value. But I want to sort it on a different field.
Example: Table with 2 fields (canvasSize and canvasLength);
var sizes = (from s in ent.competitors
select s.canvasSize).Distinct().OrderBy(x => x);
All the examples I found give this type of answer. But it sorts by canvasSize, whereas I want to sort by canvasLength.
I'm stuck ... Any tips are greatly appreciated ...
Per J. Skeet > Additional info:
company  canvasSize  canvasLength
abc      8x10        8
d        8x10        8
e        10x10       10
f        10x10       10
g        40x40       40
I would like it to be distinct on canvasSize. The problem is that when sorted, it results in this order:
10x10
40x40
8x10
I would like the same result set but sorted using canvasLength so the result is:
8x10
10x10
40x40
I think what you're after may be something like this:
var sizes = (from s in ent.competitors
select new { s.canvasSize, s.canvasLength })
.Distinct()
.OrderBy(x => x.canvasLength);
Update
Based on the extra information in your question, the following should do what you want:
var sizes = ent.competitors
.Select(c => new {c.canvasSize, c.canvasLength})
.Distinct()
.OrderBy(x => x.canvasLength)
.Select(x => x.canvasSize);
var sizes = ent.competitors
.GroupBy(s => s.canvasSize)
.Select(g => g.First())
.OrderBy(s => s.canvasLength);
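As a language-neutral illustration of why the two orderings differ, here is a small plain-Python sketch using the sample rows from the question. It is not LINQ or EF code; it only mirrors the "distinct on the pair, order by length, then project the size" idea, and the variable names are made up for the example.

```python
# (company, canvasSize, canvasLength) rows from the question.
rows = [
    ("abc", "8x10", 8),
    ("d", "8x10", 8),
    ("e", "10x10", 10),
    ("f", "10x10", 10),
    ("g", "40x40", 40),
]

# Distinct on the (size, length) pair, like selecting an anonymous type
# and calling Distinct() on it.
distinct_pairs = {(size, length) for _, size, length in rows}

# Sorting the size strings compares them character by character,
# so "10x10" and "40x40" come before "8x10".
by_size = sorted(size for size, _ in distinct_pairs)

# Sorting by the numeric length and then projecting the size
# gives the order the question asks for.
by_length = [size for size, _ in sorted(distinct_pairs, key=lambda p: p[1])]

print(by_size)
print(by_length)
```

This only works cleanly because each canvasSize maps to a single canvasLength in the sample data; if one size had several lengths, the pair-based distinct would return that size more than once.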

CSS Selector for Table Row with X number of Cells

I'm trying to scrape some content off of a website and I am having trouble selecting the correct elements.
I'm using Nokogiri, and, as I know CSS best, I am trying to use it to select the data I want.
There is a big table with rows I do not want, but these can change; they are not always rows 4, 5, 6, 10 and 14, for example.
The only way I can tell if it's a row I want is if the row has TD tags in it.
What is the right CSS selector to do this?
# Search for nodes by css
doc.css('#mainContent p table tr').each do |td|
throw td
end
EDIT:
I'm trying to scrape boxrec.com/schedule.php. I want the rows for each match, but, it's a very large table with numerous rows which aren't the match. The first couple rows of each date section aren't needed, including every other line which has "bout subject to change....", and also spacing rows between days.
SOLUTION:
require 'time' # for Time.parse

myscrape = []    # collected matches (assumed; not defined in the original snippet)
match_date = nil # declared outside the block so the last date seen carries over to later rows
doc.xpath("//table[@align='center'][not(@id) and not(@class)]/tr").each do |trow|
  # Try to get the date
  if trow.css('.show_left b').length == 1
    match_date = trow.css('.show_left b').first.content
  end
  if trow.css('td a').length == 2 and trow.css('* > td').length > 10
    first_boxer_td = trow.css('td:nth-child(5)').first
    # Column 9 is assumed from the commented-out BoxerB line below; the original used 5 for both boxers
    second_boxer_td = trow.css('td:nth-child(9)').first
    match = {
      :round => trow.css('td:nth-child(3)').first.content.to_i,
      :weight => trow.css('td:nth-child(4)').first.content.to_s,
      :first_boxer_name => first_boxer_td.css('a').first.content.to_s,
      :first_boxer_link => first_boxer_td.css('a').first.attribute('href').to_s,
      :second_boxer_name => second_boxer_td.css('a').first.content.to_s,
      :second_boxer_link => second_boxer_td.css('a').first.attribute('href').to_s,
      :date => Time.parse(match_date)
    }
    #:Weight => trow.css('td:nth-child(4)').to_s
    #:BoxerA => trow.css('td:nth-child(5)').to_s
    #:BoxerB => trow.css('td:nth-child(9)').to_s
    myscrape.push(match)
  end
end
You won't be able to tell how many td elements a tr contains, but you can tell if it is empty or not:
doc.css('#mainContent p table tr:not(:empty)').each do |td|
throw td
end
You can do something like this:
To select tr rows that have a 4th td:
doc.xpath('//tr/td[4]/..')
Another way, with CSS:
doc.css('tr').select{|tr| tr.css('td').length >= 4}

Gruff Bar Graph Data in array

I need a Bar Graph in Gruff with two Bars.
For two subjects, I loop through to get the values for activity and grade :
sub = ["English", "Maths"]
activity = []
grade = []
sub.each do |sub|
activity.push(sub["activity"].to_i)
grade.push(sub["grade"].to_i)
end
Now, I am using these values for my Bar Graph.
g = Gruff::Bar.new('500x250')
g.maximum_value = 100
g.minimum_value = 0
g.y_axis_increment = 15
g.data( "Activity", activity.inspect)
g.data( "Summative", grade.inspect)
g.labels = {0 => 'English', 1 => 'Language II'}
g.write('images/overall_score.png')
But this throws the error "comparison of String with 0 failed". I need the data to be passed as
g.data( "Activity", [10,20])
puts activity.inspect prints the array as above, e.g. [10,20].
It looks like the values are being treated as strings. What should I do to resolve this?
Any help is greatly appreciated.
Cheers!
Actually, the inspect method returns a String. I think you should just pass your arrays to the data method like this:
g.data("Activity", activity)
g.data("Summative", grade)
