Ruby Nested Hash with Composite Unique Keys

Given a comma separated CSV file in the following format:
Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
"Jul 25, 2012","abc123",3,0,0,13855,3287,10568
"Jul 25, 2012","abc230",1,0,0,1192,331,861
"Jul 25, 2012",,7,0,0,10990,2288,8702
"Jul 24, 2012","123456",3,0,0,3530,770,2760
"Jul 24, 2012","abc123",19,1,30,85879,67791,18088
I wanted to drop the entire dataset (1000 users over 30 days = 30,000 records) into a hash such that Key 1 may be a duplicate key, key 2 may be a duplicate key, but Key 1 & 2 will be unique together.
Example using line 1 above:
report_hash = {"Jul 25, 2012" => {"abc123" => {"PageRequest" => 3, "PageViews" => 0, "BrowseTime" => 0, "TotalBytes" => 13855, "BytesReceived" => 3287, "BytesSent" => 10568}}}
require 'csv'

def hashing(file)
  # Read the CSV file into an array and drop the header row.
  # drop returns a new array, so its result has to be assigned.
  report_arr = CSV.read(file).drop(1)
  # Create an empty hash to save the data to
  report_hash = {}
  report_arr.each { |row|
    # If the first element of the row is not a key in the hash yet, make one
    if report_hash[row[0]].nil?
      report_hash[row[0]] = Hash.new
    # If the key exists, does the 2nd key exist? If not, make one
    elsif report_hash[row[0]][row[1]].nil?
      report_hash[row[0]][row[1]] = Hash.new
    end
    # Throw all the other data into the 2-key hash
    report_hash[row[0]][row[1]] = {"PageRequest" => row[2].to_i, "PageViews" => row[3].to_i, "BrowseTime" => row[4].to_i, "TotalBytes" => row[5].to_i, "BytesReceived" => row[6].to_i, "BytesSent" => row[7].to_i}
  }
  return report_hash
end
I spent several hours learning hashes and associated content to get this far, but feel like there is a much more efficient method to do this. Any suggestions on the proper/more efficient way of creating a nested hash with the first two keys being the first two elements of the array such that they create a "composite" unique key?

You could use the array [day, user] as the hash key.
report_hash = {
["Jul 25, 2012","abc123"] =>
{
"PageRequest" => 3,
"PageViews" => 0,
"BrowseTime" => 0,
"TotalBytes" => 13855,
"BytesReceived" => 3287,
"BytesSent" => 10568
}
}
You just have to make sure the date and user always appear the same. If your date (for example) appears in a different format sometimes, you'll have to normalize it before using it to read or write the hash.
A similar way would be to convert the day + user into a string, using some delimiter between them. But you have to be more careful that the delimiter doesn't appear in the day or the user.
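As a rough sketch of the array-key idea, assuming the rows have already been read with something like CSV.read(file).drop(1) (so every value is still a string, and a missing user is nil), the whole report could be built in one pass. The names rows and fields are made up for this illustration.

```ruby
# Rows as they might come out of the CSV after the header is dropped.
rows = [
  ["Jul 25, 2012", "abc123", "3", "0", "0", "13855", "3287", "10568"],
  ["Jul 25, 2012", "abc230", "1", "0", "0", "1192", "331", "861"],
  ["Jul 25, 2012", nil, "7", "0", "0", "10990", "2288", "8702"],
  ["Jul 24, 2012", "abc123", "19", "1", "30", "85879", "67791", "18088"]
]

# Names for the six numeric columns, in file order.
fields = %w[PageRequest PageViews BrowseTime TotalBytes BytesReceived BytesSent]

report_hash = rows.each_with_object({}) do |(day, user, *counts), report|
  # [day, user] is the composite key; freezing it guards against later mutation.
  report[[day, user].freeze] = fields.zip(counts.map(&:to_i)).to_h
end

report_hash[["Jul 25, 2012", "abc123"]]["TotalBytes"] # => 13855
```

Lookups pass the same two values in the same order. Two rows with an identical day and user would overwrite each other, which matches the requirement that the pair is unique.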
EDIT:
Also make sure you don't modify the hash keys. Using arrays as keys makes this a very easy mistake to make. If you really wanted to, you could modify a copy using dup, like this:
new_key = report_hash.keys.first.dup
new_key[1] = 'another_user'
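One way to make that mistake harder, sketched here under the assumption that the keys are frozen when they are inserted, is to let Ruby refuse in-place changes and to move an entry to a new key explicitly. The blocked flag exists only for this illustration.

```ruby
key = ["Jul 25, 2012", "abc123"].freeze
report_hash = { key => { "PageRequest" => 3 } }

blocked = false
begin
  # Modifying a frozen key in place raises (FrozenError on Ruby 2.5+,
  # which is a subclass of RuntimeError).
  report_hash.keys.first[1] = "another_user"
rescue RuntimeError
  blocked = true
end

# dup returns an unfrozen copy, so the copy can be changed safely.
new_key = report_hash.keys.first.dup
new_key[1] = "another_user"

# Move the stored value from the old key to the new one.
report_hash[new_key.freeze] = report_hash.delete(key)
```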


JSON is object instead of array, if array_diff returns assoc array on Collection->toArray()

My issue is that I expect an array in my JSON, but I am getting an object.
Details:
I have an array of numbers:
$numbers = [1];
I select from relationship, the "drawn numbers":
$drawnNumbers = Ball::whereIn('number', $numbers)->where('game_id', $card->game->id)->get()->map(function($ball) {
return $ball->number;
})->toArray();
I do a ->toArray() here. I want to find the numbers in $numbers that do not occur in $drawnNumbers. I do so like this:
$numbersNotYetDrawn = array_diff($numbers, $drawnNumbers);
My method then returns $numbersNotYetDrawn (my Accept header is application/json).
So now the issue. When $drawnNumbers is an empty array, then the printed json is a regular array like this:
[
1
]
However if the relationship returns $drawnNumbers to be an array with numbers, then json is printed as an object:
{
"0" => 1
}
Does anyone know why this is? Is there any way to ensure that the JSON is an array?
Edit:
Here is my actual data:
$drawnNumbers = Ball::whereIn('number', $numbers)->where('game_id', $card->game->id)->get()->map(function($ball) {
return $ball->number;
})->toArray();
$undrawnNumbers = array_diff($numbers, $drawnNumbers);
// $undrawnNumbers = array_values(array_diff($numbers, $drawnNumbers)); // temp fix
Replace
$numbersNotYetDrawn = array_diff($numbers, $drawnNumbers);
with
$numbersNotYetDrawn = array_values(array_diff($numbers, $drawnNumbers));
to make sure element keys are reset and array is treated as a simple list and serialized to a JSON list - instead of being treated as an associative array and serialized to a JSON object.
I recently had this same problem and wondered the same thing.
I solved it by adding "array_values", but I was wondering how to reproduce it.
I found that it is reproduced when array_diff removes an element that isn't the last element of the array, so the remaining keys no longer run from 0 to n-1. So:
>>> $x
=> [
1,
2,
3,
4,
5,
]
>>> array_diff($x, [5]);
=> [
1,
2,
3,
4,
]
>>> array_diff($x, [1]);
=> [
1 => 2,
2 => 3,
3 => 4,
4 => 5,
]
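The behaviour above follows from two PHP facts: array_diff keeps the original keys, and json_encode only emits a JSON list when the keys are exactly 0 to n-1. The sketch below is not PHP. It is a small Ruby model of those two rules, with a PHP array represented as a Hash with integer keys and all three lambdas being hypothetical stand-ins for the PHP functions.

```ruby
require 'json'

# Model of array_diff: drop values found in the other array, keep the keys.
php_array_diff = ->(a, b) { a.reject { |_key, value| b.value?(value) } }

# Model of array_values: renumber the keys from 0.
php_array_values = ->(a) { a.values.each_with_index.to_h { |value, i| [i, value] } }

# Model of json_encode: a list only if the keys are exactly 0..n-1.
php_json_encode = lambda do |a|
  if a.keys == (0...a.size).to_a
    a.values.to_json
  else
    a.transform_keys(&:to_s).to_json
  end
end

x = { 0 => 1, 1 => 2, 2 => 3, 3 => 4, 4 => 5 }

php_json_encode.call(php_array_diff.call(x, { 0 => 5 }))
# => "[1,2,3,4]"
php_json_encode.call(php_array_diff.call(x, { 0 => 1 }))
# => "{\"1\":2,\"2\":3,\"3\":4,\"4\":5}"
php_json_encode.call(php_array_values.call(php_array_diff.call(x, { 0 => 1 })))
# => "[2,3,4,5]"
```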

Ruby put in order columns when creating CSV document from Mongoid

I need to create a CSV document from a database. I want to arrange the columns in a particular order, and I have a template of this order stored as an array of headers:
header = ["header1", "header2", "header3", "header4", "header5"]
record = [{"header4" =>"value4"}, {"header3" =>"value3"}, {"header5"=>"value5"}, {"header1"=>"value1"}, {"header2"=>"value2"}]
I need to get an array like this:
record = [{"header1" =>"value1"}, {"header2" =>"value2"}, {"header3"=>"value3"}, {"header4"=>"value4"}, {"header5"=>"value5"}]
but when I do
csv << mymodel.attributes.values.sort_by! { |h| header.index(h.keys[0]) }
it does not work.
When you call mymodel.attributes, you get a Hash back which maps attribute names (as strings) to their values. If your attribute names are header1 through header5 then mymodel.attributes will be something like this:
{
'header1' => 'value1',
'header2' => 'value2',
'header3' => 'value3',
'header4' => 'value4',
'header5' => 'value5'
}
Of course, the order depends on how things come out of MongoDB. The easiest way to extract a bunch of values from a Hash in a specified order is to use values_at:
mymodel.attributes.values_at(*header)
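A small stand-alone sketch of the same call, using a plain Hash named attributes in place of mymodel.attributes (the deliberately shuffled order is an assumption made for illustration):

```ruby
header = %w[header1 header2 header3 header4 header5]

# Stand-in for mymodel.attributes, in an arbitrary order.
attributes = {
  "header4" => "value4",
  "header3" => "value3",
  "header5" => "value5",
  "header1" => "value1",
  "header2" => "value2"
}

# values_at returns the values in the order of the keys passed in.
row = attributes.values_at(*header)
# => ["value1", "value2", "value3", "value4", "value5"]
```

A header with no matching attribute yields nil in that position, so the row always has as many columns as the template.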

Most efficient way to extract an item from a Ruby array of hashes

I have some large Ruby structures that I need to quickly extract data from. I have no control over the format of the data, although I'm open to transforming it under certain circumstances. What is the most efficient way to extract a single item from the following array of hashes, using the displayName as the 'key'?
[
{'displayName'=>'Some Key 1', 'values'=>[1,2,3]},
{'displayName'=>'Some Key 2', 'values'=>["Some text"]},
{'displayName'=>'Some Key 3', 'values'=>["Different text","More text"]},
{'displayName'=>'Some Key 4', 'values'=>[2012-12-12]}
]
Each hash has other keys in it that I've removed to assist understanding.
The challenge is that in certain circumstances, the displayName field will need to be matched on a prefix sub-string. Does anybody have practical experience of when to use .each and match manually, or .select to get the common-case exact matches with a fallback for the prefixes afterwards? Or is there some common trick I'm missing?
If you're doing this once, you'll probably just have to iterate over the set and find what you need:
row = data.find do |row|
row['displayName'] == name
end
row && row['values']
If you're doing it more than once, you should probably make an indexed structure out of it with a simple transform to create a temporary derivative structure:
hashed = Hash[
data.collect do |row|
[ row['displayName'], row['values'] ]
end
]
hashed[name]
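For the prefix case mentioned in the question, one possible sketch is to try the index first and scan the keys only on a miss. It assumes the search term is a prefix of displayName and that the first match in the original order is acceptable. The lookup lambda is a made-up helper.

```ruby
data = [
  { 'displayName' => 'Some Key 1', 'values' => [1, 2, 3] },
  { 'displayName' => 'Some Key 2', 'values' => ["Some text"] },
  { 'displayName' => 'Some Key 3', 'values' => ["Different text", "More text"] }
]

# Build the index once (Array#to_h with a block needs Ruby 2.6+).
index = data.to_h { |row| [row['displayName'], row['values']] }

lookup = lambda do |name|
  # Exact match is a single hash lookup; the block runs only on a miss.
  index.fetch(name) do
    key = index.keys.find { |candidate| candidate.start_with?(name) }
    key && index[key]
  end
end

lookup.call('Some Key 2') # => ["Some text"]
lookup.call('Some')       # => [1, 2, 3]
lookup.call('Other')      # => nil
```

The exact path stays a constant-time lookup, while the prefix fallback is still a linear scan over the keys.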
You can use a simple select, though it may not be as fast as it could be with large arrays:
data = [
{'displayName'=>'Some Key 1', 'values'=>[1,2,3]},
{'displayName'=>'Some Key 2', 'values'=>["Some text"]},
{'displayName'=>'Some Key 3', 'values'=>["Different text","More text"]},
{'displayName'=>'Some Key 4', 'values'=>[2012-12-12]}
]
data.select { |e| e['displayName'] == 'Some Key 2' }.first
You can group_by the desired key instead, which will make access faster:
hashed_data = data.group_by { |e| e['displayName'] }
hashed_data['Some Key 4']
=> [{"displayName"=>"Some Key 4", "values"=>[1988]}]

CodeIgniter: Using array within array

I am following the Nettuts+ tutorial on pagination and on storing POST inputs as query strings in the db. Everything works fine until I get an array as POST input: I am unable to loop through it, get all the array values and store them in query_array (i.e., store an array within an array).
The snippets below:
$query_array = array(
'gender' => $this->input->post('gender'),
'minage' => $this->input->post('minage'),
'maxage' => $this->input->post('maxage'),
'Citizenship' => $this->input->post('citizenship'), // checkboxes with name citizenship[]
);
This returns only the last stored array value in Citizenship.
The output array:
Array ( [gender] => 1 [minage] => 18 [maxage] => 24 [Citizenship] => 2 )
makes the query string as:
&gender=1&minage=18&maxage=24&Citizenship=2
But my requirement is to get all the values of the 'Citizenship' array instead of only the last stored value.
The output required to make query string:
Array ( [gender] => 1 [minage] => 18 [maxage] => 24 [Citizenship] => 2 [Citizenship] => 4 [Citizenship] => 6 )
The query string :
&gender=1&minage=18&maxage=24&Citizenship[]=2&Citizenship[]=4&Citizenship[]=6
Any help appreciated..
Thanks.
It doesn't look like CodeIgniter supports unnamed multidimensional arrays as input without a bit of hacking.
If you can access the raw $_POST data, try replacing
$this->input->post('citizenship')
with
array_map('intval',$_POST['citizenship'])
Alternatively, add keys to your post data:
&gender=1&minage=18&maxage=24&Citizenship[0]=2&Citizenship[1]=4&Citizenship[2]=6
I fixed it myself. I just looped through the POST array and got the individual key/value pairs.
foreach($_POST['Citizenship'] as $k => $v) {
$Citizenship[$v] = $v;
}
Hope this helps someone who faces a similar problem.

increment value in a hash

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories, but I will want to get the count of voters eventually, so it seems to me the best way is to increment the categories_hash and then add the voters.length. But one thing at a time: I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:
# Hash.new(0) makes missing keys default to 0, so += 1 works on first use
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
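The same counting can be tried outside Rails. The sketch below assumes plain hashes shaped like the post document in the question (only the categories and voters fields are kept), and the posts array is made-up sample data.

```ruby
posts = [
  { "categories" => ["world", "sports"], "voters" => [] },
  { "categories" => ["tech"], "voters" => [] },
  { "categories" => ["world"], "voters" => [] }
]

# A default of 0 means a category never seen before starts counting from zero.
counts = Hash.new(0)
posts.each do |post|
  post["categories"].each { |category| counts[category] += 1 }
end
counts # => {"world"=>2, "sports"=>1, "tech"=>1}

# On Ruby 2.7+ the same result is available through Enumerable#tally.
tallied = posts.flat_map { |post| post["categories"] }.tally
```

Reading a category that never occurred, such as counts["entertainment"], returns 0 without adding the key to the hash.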
If you're using MongoDB, an elegant way to aggregate tag usage is to use a map/reduce operation. MongoDB supports map/reduce operations using JavaScript code. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags)
The map function simply emits 1 on every used tag to count it:
function () {
if (this.tags) {
this.tags.forEach(function (tag) {
emit(tag, 1);
});
}
}
The reduce function sums up the counts:
function (key, values) {
var total = 0;
values.forEach(function (v) {
total += v;
});
return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
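To see what the two functions do together, here is a plain-Ruby walk-through of the same data flow. It is only a model run in application code, not how MongoDB executes map/reduce, and the articles array is made-up sample data.

```ruby
articles = [
  { "tags" => ["ruby", "rails"] },
  { "tags" => ["ruby"] },
  { "title" => "no tags here" }
]

# Map step: emit a [tag, 1] pair for every tag; documents without tags emit nothing.
emitted = articles.flat_map do |doc|
  (doc["tags"] || []).map { |tag| [tag, 1] }
end

# Grouping step (done by the database): collect the emitted values per key.
grouped = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

# Reduce step: sum the values collected for each key.
result = grouped.transform_values(&:sum)
# => {"ruby"=>2, "rails"=>1}
```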
