I am relatively new to coding and am learning Ruby right now. I have a large data set (more than 100k records) in which each record consists of a unique ID and a date of birth, so it is basically a 2D array. How do I write a method so that calling method(year) returns the unique IDs of everyone born in the year I choose? And how do I loop over this?
The method I tried is as follows:
def Id_with_year(year)
emPloyee_ID_for_searching_year = [ ]
employeelist.sort_by!{|a,b|b}
if employeelist.select{|a,b| b == year}.map{|a,b| a}
return emPloyee_ID_for_searching_year
end
end
I should point out that the IDs are sorted. That's why I am sorting by year in this method, so that it gives me all the IDs for the year I key in. Instead of the IDs, the method returned an empty array, [ ].
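Sidenote: methods in Ruby are conventionally named in snake_case (this is not mandatory, though).
The problem is that you return emPloyee_ID_for_searching_year, which is never changed after it is set to [ ]; the result of select and map is only tested by the if and then discarded. The code below should work: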
def id_with_year(year)
employeelist.sort_by(&:last) # sorting by last element of array
.select{|_,b| b == year} # select
.map(&:first) # map to the first element
end
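If the method is called many times on more than 100k records, it may be cheaper to build a lookup table once instead of scanning the whole list on every call. The following is only a sketch under assumptions: the records are [id, year] pairs with the year stored as an integer, and the names employee_list, build_year_index and ids_for_year are made up for illustration.

```ruby
# Hypothetical sample data: each record is [id, birth_year].
employee_list = [[1, 1990], [2, 1985], [3, 1990], [4, 1972]]

# Build the index once: { year => [id, id, ...] }.
# IDs keep their original order, so sorted input gives sorted output.
def build_year_index(records)
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(id, year), index|
    index[year] << id
  end
end

# Look up a year; unknown years give an empty array.
def ids_for_year(index, year)
  index.fetch(year, [])
end

year_index = build_year_index(employee_list)

p ids_for_year(year_index, 1990)  # => [1, 3]
p ids_for_year(year_index, 2000)  # => []
```

Building the index takes one pass over the data; every later lookup is a single hash access.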
Of the 10 columns in the original CSV, 4 need to be converted to integers (for later processing in MATLAB; the other 6 columns already contain integer values). These 4 columns are: (1) platform, (2) push, (3) timestamp, and (4) udid.
An example input is: #other_column, Android, Y, 10-05-2015 3:59:59 PM, #other_column, d0155049772de9, #other_columns
The corresponding output should be: #other_column, 2, 1, 1431273612198, #other_column, 17923, #other_columns
So, I wrote the following code:
require 'csv'
CSV.open('C:\Users\hp1\Desktop\Datasets\NewColumns2.csv', "wb") do |csv|
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push']=='Y'
row['push']=1
elsif row['push']=='N'
row['push']=0
end
row['timestamp'].to_time.to_i
row['udid'].to_i
csv<<row
end
end
Now, the first 3 columns (weekday, platform and push) have only a small number of unique values across the whole file (7, 2 and 2 respectively), which is why I used the approach above. The other 2 columns, timestamp and udid, are different: some values are shared by several rows, but there are thousands of unique values. Hence I thought of converting them to integers in the manner shown above.
Anyhow, none of the columns are getting converted at all. Plus, there is another problem with the datetime column as it is in a format which Ruby apparently does not recognize as a legitimate time format (a sample looks like this: 10-05-2015 3:59:59 PM). So, what should I do? Thanks.
Edit - Redo, I missed part of the problem with the udids
Problems
Map - you are using map when you don't need to; CSV.foreach already iterates through all of the rows, so remove it
Date - require Ruby's standard time library so that Time.parse is available
Unique ids - it sounds like you want to convert the udid into a shorter unique id, since there may be more than one entry per mobile device; use an array to build a collection without repeats, and use the index of the device udid in that array as your new shorter unique id
I used this as my input csv:
othercol1,platform,push,timestamp,othercol2,udid,othercol3,othercol4,othercol5,othercol6
11,Android, N, 10-05-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,iPhone, N, 10-05-2015 5:59:59 PM,22, d0155044772de9,33,44,55,66
11,iPhone, Y, 10-06-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,Android, Y, 11-05-2015 3:59:59 PM,22, d0155249772de9,33,44,55,66
Here is my output csv:
11,2,0,1431298799,22,1,33,44,55,66
11,1,0,1431305999,22,2,33,44,55,66
11,1,1,1433977199,22,1,33,44,55,66
11,2,1,1431385199,22,3,33,44,55,66
Here is the script I used:
require 'time' # use ruby standard time library to parse for you
require 'csv'
udids = [] # collects each distinct udid once; its position becomes the shorter unique id
CSV.open('new.csv', "wb") do |csv|
CSV.foreach('old.csv', headers: true) do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push'].strip =='Y'
row['push']=1
elsif row['push'].strip =='N'
row['push']=0
end
row['timestamp'] = Time.parse(row['timestamp']).to_i
# turn the udid into a shorter unique id
unless udids.include?(row['udid'])
udids << row['udid']
end
row['udid'] = udids.index(row['udid']) + 1
csv << row
end
end
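With thousands of distinct udids, Array#include? and Array#index each scan the array on every row. A Hash can hand out the same 1, 2, 3, ... numbering with a single lookup per row. This is a minimal sketch, not part of the script above; short_ids and short_id_for are names chosen for illustration.

```ruby
# Maps each distinct udid to 1, 2, 3, ... in order of first appearance.
short_ids = {}

# Returns the id already assigned to this udid, or assigns the next free one.
short_id_for = lambda do |udid|
  short_ids[udid] ||= short_ids.size + 1
end

udids = ['d0155049772de9', 'd0155044772de9', 'd0155049772de9', 'd0155249772de9']
p udids.map { |udid| short_id_for.call(udid) }  # => [1, 2, 1, 3]
```

Inside the CSV loop this would replace the unless block and the index call with row['udid'] = short_id_for.call(row['udid']).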
This is the wrong use of map; it is not the method you need. map is for applying a function to every element of a collection and returning an array of the results. What you are doing is iterating, making some changes, and then pushing each modified row into the new CSV, so you can just iterate; there is no need for map:
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true) instead of CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map
About the date: you can use strptime to turn the string into a date: DateTime.strptime("10-05-2015 3:59:59 PM", "%d-%m-%Y %l:%M:%S %p"). Here are the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html
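As a small sketch of the strptime call, here it is followed through to an integer. It assumes the sample is day-first (10 May 2015) and, since the string carries no zone, that UTC is acceptable; DateTime.strptime uses a +00:00 offset when none is parsed.

```ruby
require 'date'

# Day-first date, 12-hour clock with AM/PM, as in the question's sample.
format = '%d-%m-%Y %l:%M:%S %p'

parsed = DateTime.strptime('10-05-2015 3:59:59 PM', format)
epoch_seconds = parsed.to_time.to_i

puts parsed.iso8601  # 2015-05-10T15:59:59+00:00
puts epoch_seconds   # 1431273599
```

If the timestamps were recorded in a local zone, the integer will differ by that zone's offset.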
Add :converters => :all to your options, so that dates and numbers in formats the converters recognize are automatically converted. Then, instead of
row['timestamp'].to_time.to_i
which does the conversion but doesn't put it anywhere (it is not in-place), do this:
row['timestamp'] = row['timestamp'].to_time.to_i
Note that this only works with converters; otherwise row['timestamp'] is a String, and a plain Ruby String has no .to_time method.
I want to compare every object in lectures with each other and if some_condition is true, the second object has to be deleted:
toDelete=[]
lectures.combination(2).each do |first, second|
if (some_condition)
toDelete << second
end
end
toDelete.uniq!
lectures=lectures-toDelete
I got some weird errors while trying to delete inside the .each loop, so I came up with this approach.
Is there a more efficient way to do this?
EDIT after first comments:
I wanted to keep the source code free of unnecessary things, but now that you ask:
The elements of the lectures array are hashes containing data about different university lectures, such as the name, the rooms, the calendar weeks in which they are taught, and the begin and end times.
I parse the timetables of all student groups to get this data, but because some lectures are held in more than one student group and these sometimes differ in the weeks they are taught, I compare them with each other. If the compared ones only differ in certain values, I add the values from the second object to the first object and delete the second object. That's why.
The errors when deleting inside the .each loop: when using the Rails Hash.diff method, I got something like "Cannot convert Symbol to Integer". It turned out there was suddenly an Integer value of 16 in the array, although I had tested before the loop that the array contained only hashes...
Debugging is really hard if you have 9000 hashes.
EDIT:
Sample Data:
lectures = [ {:day=>0, :weeks=>[11, 12, 13, 14], :begin=>"07:30", :end=>"09:30", :rooms=>["Li201", "G221"], :name=>"TestSubject1", :kind=>"Vw", :lecturers=>["WALDM"], :tut_groups=>["11INM"]},
{:day=>0, :weeks=>[11, 12, 13, 14], :begin=>"07:30", :end=>"09:30", :rooms=>["Li201", "G221"], :name=>"TestSubject1", :kind=>"Vw", :lecturers=>["WALDM"], :tut_groups=>["11INM"]} ]
You mean something like this?
cleaned_lectures = lectures.combination(2).reject{|first, second| some_condition}
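Note that combination(2).reject returns the surviving pairs, not the lectures themselves. If "only differ in certain values" means that every key except a known few must match, grouping by the matching keys avoids the pairwise comparison altogether. The sketch below assumes the mergeable keys are :weeks and :tut_groups and that their values are arrays; the question does not say which keys these are, and merge_lectures is a made-up name.

```ruby
# Merges lectures that are identical apart from the array-valued keys in merge_keys.
def merge_lectures(lectures, merge_keys)
  lectures
    .group_by { |lecture| lecture.reject { |key, _| merge_keys.include?(key) } }
    .map do |shared, group|
      # Combine the differing values of the whole group into one lecture.
      merged = merge_keys.each_with_object({}) do |key, acc|
        acc[key] = group.flat_map { |lecture| lecture[key] }.uniq.sort
      end
      shared.merge(merged)
    end
end

# Hypothetical, shortened sample data.
lectures = [
  { :day => 0, :weeks => [11, 12], :name => 'TestSubject1', :tut_groups => ['11INM'] },
  { :day => 0, :weeks => [13, 14], :name => 'TestSubject1', :tut_groups => ['11INB'] },
  { :day => 1, :weeks => [11],     :name => 'TestSubject2', :tut_groups => ['11INM'] }
]

cleaned = merge_lectures(lectures, [:weeks, :tut_groups])
p cleaned.size       # => 2
p cleaned[0][:weeks] # => [11, 12, 13, 14]
```

This makes one pass over the array instead of comparing every pair, which matters with 9000 hashes.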
I believe that I may be missing something here, so please bear with me as I explain two scenarios in the hope of clearing up my misunderstanding:
My end goal is to create a dataset that is accepted by Highcharts via lazy_high_charts; however, in this quest I'm finding that it is rather particular about the format of the data it receives.
A) I have found that when data is formatted like this going into it, it draws the points just fine:
[0.0000001240,0.0000000267,0.0000000722, ..., 0.0000000512]
I'm able to generate an array like this simply with:
array = Array.new
data.each do |row|
array.push row[:datapoint1].to_f
end
B) Yet, if I attempt to use the map function, I end up with a result like this, and Highcharts fails to render it:
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
From code like:
array = data.map{|row| [(row.datapoint1.to_f)] }
Is there a way to coax the map function into producing results in B that are more akin to the data structure in scenario A?
This gets more involved, as I also have to add datetime into this; however, that's another topic, and I just want to understand this first and what can be done to better control where I'm going.
Ultimately, EVEN SCENARIO B SHOULD WORK according to the data in the example here: http://www.highcharts.com/demo/spline-irregular-time (press the "View options" button at bottom)
Heck, I'll send you a sucker in the mail if you can fill me in on that part! ;)
You can fix arrays like this
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
that have nested arrays inside them by using the flatten method on the array.
But you should be able to avoid generating nested arrays in the first place. Just remove the square brackets from your map line:
array = data.map{|row| row.datapoint1.to_f }
Code
a = [[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], [3.79e-09]]
b = a.flatten.map{|el| "%.10f" % el }
puts b.inspect
Output
["0.0000000067", "0.0000000044", "0.0000000021", "0.0000000025", "0.0000000038"]
Unless I, too, am missing something, your problem is that you're returning a single-element array from your block (thereby creating an array of arrays) instead of just the value. This should do you:
array = data.map {|row| row.datapoint1.to_f }
# => [ 6.67e-09, 4.39e-09, 2.1e-09, 2.52e-09, ..., 3.79e-09 ]
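On why scenario B fails while the linked demo works: Highcharts accepts a series either as a flat array of numbers or as an array of [x, y] pairs, and a one-element array is neither. The sketch below builds both accepted shapes. The recorded_at attribute and the Struct standing in for a row are assumptions made for illustration, and the x value is taken to be milliseconds since the epoch, as Highcharts datetime axes expect.

```ruby
# Hypothetical stand-in for the rows in the question.
row_class = Struct.new(:recorded_at, :datapoint1)
data = [
  row_class.new(Time.utc(2015, 5, 10, 15, 0, 0), '6.67e-09'),
  row_class.new(Time.utc(2015, 5, 10, 16, 0, 0), '4.39e-09')
]

# Shape A: a flat array of numbers.
values = data.map { |row| row.datapoint1.to_f }

# Shape for a datetime axis: [milliseconds, value] pairs.
pairs = data.map { |row| [row.recorded_at.to_i * 1000, row.datapoint1.to_f] }

p values  # => [6.67e-09, 4.39e-09]
p pairs   # => [[1431270000000, 6.67e-09], [1431273600000, 4.39e-09]]
```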
I'm trying to build a hash from an array. Basically I want to take the unique string values of the array and build a hash with a key. I'm also trying to figure out how to record how many times that unique word happens.
#The text from the .txt file:
# Bob and George are great! George and Sam are great.
# Bob, George, and sam are great!
#The source code:
count_my_rows = File.readlines("bob.txt")
row_text = count_my_rows.join
puts row_text.split.uniq #testing to make sure array is getting filled
Anyways I've tried http://ruby-doc.org/core/classes/Hash.html
I think I need to declare an empty hash with name.new to start, but I have no idea how to fill it. I'm assuming some iteration through the array fills the hash. I'm starting to think I need to record the value as a separate array storing the word and the number of times it occurs, then assign the hash key to it.
Example = { ["Bob",2] => 1 , ["George",3] => 2 }
Leave some code so I can mull over it.
To get you started,
h={}
h.default=0
File.read("myfile").split.each do |x|
h[x]+=1
end
p h
Note: this is not a complete solution
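One gap in the starter code is that split keeps punctuation and case, so "great!", "great." and "Sam"/"sam" are counted separately. A possible next step, offered as a sketch rather than the intended solution, is to lowercase the text and pick out runs of word characters; word_counts is a name made up here, and the text is inlined instead of read from bob.txt.

```ruby
# Counts words case-insensitively, ignoring punctuation.
def word_counts(text)
  counts = Hash.new(0)  # missing keys start at 0
  text.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  counts
end

text = "Bob and George are great! George and Sam are great.\n" \
       "Bob, George, and sam are great!"
counts = word_counts(text)

puts counts['george']  # 3
puts counts['sam']     # 2
puts counts.keys.size  # 6
```

The word itself is the hash key and the count is the value, so no separate array is needed.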