Active Record Association after ".where" query - ruby

I've got three models: Decks, Slots and Cards. I put them together like so...
Decks are made of many slots, each slot contains one card and any one card can show up in a number of different slots.
I modeled it after the "Order - Order Line Item - Product" structure, hope that makes sense.
Anyways, Decks have an integer field called :deck_type, and suppose I want to get all of the decks of a certain type and then see all of their cards. I EXPECT to be able to run this query but I get an error of undefined method 'cards':
Deck.where(:deck_type => 1).cards
To get all decks of type 1 and then spit out their cards. I have an association established of "deck has many cards through slots", and when I call ".cards" on a single deck it works fine to return the cards.
I feel like this should be a pretty basic query - what am I missing?
Thanks in advance for any insight.

The method cards is for one deck only. So the following should work:
Deck.where(deck_type: 1).first.cards
The first will fetch 1 deck.
If you want cards that belong to decks with deck_type 1, then you've got a few options:
Deck.where(deck_type: 1).map(&:cards).flatten.uniq
That will apply the cards method on each found deck and then get all cards. The flatten will make the results into a 1D array and then uniq will ensure that no duplicates are present, if any.
But the following might be faster:
deck_ids = Deck.where(deck_type: 1).pluck(:id)
Card.where(deck_id: deck_ids)
I think it's safe to assume your Card model has a deck_id attribute. From the above, you will fetch only those cards that have deck_id in the deck_ids variable.
Even better however would be the following as it'll be a single database query. Assuming you've got the right associations setup, you can do:
# replace 'decks' with Deck.table_name if necessary
Card.joins(:deck).where(decks: {deck_type: 1})
I hope that last one is self-explanatory.

Related

Algorithm for translating MLB play-by-play records into descriptive text

I'm trying to collect a dataset that could be used for automatically generating baseball articles.
I have play-by-play records of MLB games from retrosheet.org that I would like to be written out to plain text, as those that could possibly appear as part of a recap news article.
Here are some examples of the play-by-play records:
play,2,0,semim001,32,.CBFFFBBX,9/F
play,2,0,phegj001,01,FX,S7/G
play,2,0,martn003,01,CX,3/G
play,2,1,youne003,00,,NP
The following is what I would like to achieve:
For the first example
play,2,0,semim001,32,.CBFFFBBX,9/F,
I want it to be written out as something like:
"semim001 (Marcus Semien) was on three balls and two strikes in the second inning as the away player. He hit the ball into play after one called strike, one ball, three fouls, and another two balls. The fly ball was caught by the right outfielder."
The plays are formatted in the following way:
The first field is the inning, an integer starting at 1.
The second field is either 0 (for visiting team) or 1 (for home team).
The third field is the Retrosheet player id of the player at the plate.
The fourth field is the count on the batter when this particular event (play) occurred. Most Retrosheet games do not have this information, and in such cases, "??" appears in this field.
The fifth field is of variable length and contains all pitches to this batter in this plate appearance and is described below. If pitches are unknown, this field is left empty, nothing is between the commas.
The sixth field describes the play or event that occurred.
Explanations for all the symbols in the fifth and sixth field can be found on this Retrosheet page.
With Python 3, I've been able to format all the info of invariable length into a formatted sentence, which is all but the last two fields. I'm having difficulty in thinking of an efficient way to unparse (correct me if this is the wrong term to use here) the fifth and sixth fields, the pitches and the events that occurred, due to their variable length and wide variety of things that can occur.
I think I could write out all the rules based on the info on the Retrosheet website, but I'm looking for suggestions for a smarter way to do this. I wrote natural language processing as tags, hoping this could be a trivial problem in that field. Any pointers will be greatly appreciated!

Reporting Multiple Values & Sorting

Having a bit of an issue and unsure if it's actually possible to do.
I'm working on a file that I will enter target progression vs actual target reporting the % outcome.
PAGE 1
¦NAME ¦TAR 1 %¦TAR 2 %¦TAR 3 %¦TAR 4 %¦OVERALL¦SUB 1¦SUB 2¦SUB 3¦
¦NAME1¦ 114%¦ 121%¦ 100%¦ 250%¦ 146%¦ 2¦ 0¦ 0%¦
¦NAME2¦ 88%¦ 100%¦ 90%¦ 50%¦ 82%¦ 0¦ 1¦ 0%¦
¦NAME3¦ 82%¦ 54%¦ 64%¦ 100%¦ 75%¦ 6¦ 6¦ 15%¦
¦NAME4¦ 103%¦ 64%¦ 56%¦ 43%¦ 67%¦ 4¦ 4¦ 24%¦
¦NAME5¦ 87%¦ 63%¦ 89%¦ 0%¦ 60%¦ 3¦ 2¦ 16%¦
Now I already have it sorting all rows by the Overall % column so I can quickly see at a glance but I am creating a second page that I need to reference points.
So on the second page I would like to somehow sort and reference different columns for example
PAGE 2
TOP TAR 1¦Name of top %¦Top %¦
TOP TAR 2¦Name of top %¦Top %¦
Is something like this possible to do?
Essentially I'm creating an Employee of the Month form that automatically works out who has topped what.
I'm willing to drop a paypal donation for whoever can figure this out for me as I've been doing it manually every month and would appreciate the time saved
I don't think a complicated array formula is necessary for this - I am suggesting a fairly standard Index/Match approach.
First set up the row titles - you can just copy and transpose them from Page 1, or use a formula in A2 of Page 2 like
=transpose('Page 1'!B1:E1)
The use them in an index/match to get the data in the corresponding column of the main sheet and find its maximum (in C2)
=max(index('Page 1'!A:E,0,match(A2,'Page 1'!A$1:E$1,0)))
Finally look up the maximum in the main sheet to find the corresponding name:
=index('Page 1'!A:A,match(C2,index('Page 1'!A:E,0,match(A2,'Page 1'!A$1:E$1,0)),0))
If you think there could be a tie for first place with two or more people getting the same score, you could use a filter to get the different names:
So if the max score is in B8 this time (same formula)
=max(index('Page 1'!A:E,0,match(A8,'Page 1'!A$1:E$1,0)))
the different names could be spread across the corresponding row using transpose (in C8)
=ArrayFormula(TRANSPOSE(filter('Page 1'!A:A,index('Page 1'!A:E,0,match(A8,'Page 1'!A$1:E$1,0))=B8)))
I have changed the test data slightly to show these different scenarios
Results

Count nodes based on two or more attribute values

I need to count how many times a particular node occurs in a document based on the values of two if its attributes. So, given the following small sample of XML:
<p:entry timestamp="2012-11-15T17:53:34.642-05:00" ticks="89709622449012" system="OSD" component="OSD5" marker=".\Launcher.cpp:1741" severity="Info" type="Driver" subtype="Start" tags="" sensitivity="false">
This can occur one or more times in the document with different attribute sets. I need to count how many show up with type="Driver" AND subtype="Start". I am able to count how many just have type="Driver" using:
count(//p:entry[#type="Driver"])
but haven't been able to combine them. This didn't work:
count(//p:entry[#type="Driver" and #subtype="Start"])
This works for the OP. Specify 2 predicates in succession instead of using operator and result in the same effect:
count(//p:entry[#type="Driver"][#subtype="Start"])
By right, the original code count(//p:entry[#type="Driver" and #subtype="Start"]) should work, as far as my knowledge goes.

Sorting by counting the intersection of two lists in MongoDB

We have a posting analyzing requirement, that is, for a specific post, we need to return a list of posts which are mostly related to it, the logic is comparing the count of common tags in the posts. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
there are may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
so basically, postB gets 2 counts, postC gets 1 counts when comparing to the tags in the postA. postD gets 0 and will not be included in the result.
My understanding for now is to use map/reduce to produce the result, I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way like custom sorting function to work it out? I'm currently using the pymongodb as I'm python developer.
You should create an index on tags:
db.posts.ensure_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({_id: {$ne: postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: len(tag for tag in post['tags'] if tag in postA['tags'])
posts.sort(key=key, reverse=True)
Note that if postA shares at least one tag with a large number of other posts this won't perform well, because you'll send so much data from Mongo to your application; unfortunately there's no way to sort and limit by the size of the intersection using Mongo itself.

Storing cvs data for further manipulation using Ruby

I am dealing with a csv file that has some customer information (email, name, address, amount, [shopping_list: item 1, item 2]).
I would like work with the data and produce some labels for printing... as well as to gather some extra information (total amounts, total items 1...)
My main concern is to find the appropriate structure to store the data in ruby for future manipulation. For now I have thought about the following possibilities:
multidimensional arrays: pretty simple to build, but pretty hard to access the data in a beautiful ruby way.
hashes: having the email as key, and storing the information in different hashes (one hash for name, another hash for address, another hash for shopping list...)
(getting the cvs data in to a Database and working with the data from ruby??)
I would really appreciate your advice and guidance!!
Once you have more than a couple pieces of information that you need to group together, it's time to consider moving from a generic hash/array to something more specialized. A good candidate for what you've described is Ruby's struct module:
Customer = Struct.new(:email, :name, :address) # etc.
bill = Customer.new('bill#asdf.com', 'Bill Foo', '123 Bar St.')
puts "#{bill.name} lives at #{bill.address} and can be reached at #{bill.email}"
Output:
Bill Foo lives at 123 Bar St. and can be reached at bill#asdf.com
Struct#new simply creates a class with an attr_accessor for each symbol you pass in. Well, it actually creates a bit more than that, but for starters, that's all you need to worry about.
Once you've got the data from each row packed into an object of some sort (whether it's a struct or a class of your own), then you can worry about how to store those objects.
A hash will be ideal for random access by a given key (perhaps the customer's name or other unique ID)
A one-dimensional array works fine for iterating over the entire set of customers in the same order they were inserted

Resources