Out of the 10 columns there in the original CSV, I have 4 columns which I need to make integers (to process with MATLAB later; the other 6 columns already contain integer values). These 4 columns are: (1) platform (2) push (3) timestamp, and (4) udid.
An example input is: #other_column, Android, Y, 10-05-2015 3:59:59 PM, #other_column, d0155049772de9, #other_columns
The corresponding output should be: #other_column, 2, 1, 1431273612198, #other_column, 17923, #other_columns
So, I wrote the following code:
require 'csv'
CSV.open('C:\Users\hp1\Desktop\Datasets\NewColumns2.csv', "wb") do |csv|
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push']=='Y'
row['push']=1
elsif row['push']=='N'
row['push']=0
end
row['timestamp'].to_time.to_i
row['udid'].to_i
csv<<row
end
end
Now, the first 3 columns, weekday, platform and push, are having a small number of unique values for the whole file (i.e., 7, 2 and 2 respectively), which is why I used the above approach. However, the other 2 columns, timestamp and udid, are different - they have several values, a few of them common to some rows in the CSV, but there are thousands of unique values. And hence I thought of converting them to integers in the manner I showed above.
Anyhow, none of the columns are getting converted at all. Plus, there is another problem with the datetime column as it is in a format which Ruby apparently does not recognize as a legitimate time format (a sample looks like this: 10-05-2015 3:59:59 PM). So, what should I do? Thanks.
Edit - Redo, I missed part of the problem with the udids
Problems
You are using map when you don't need to, CSV#foreach already iterates through all of the rows - remove this
Date - include the ruby standard Time library
Unique ids - it sounds like you want to convert the udid into a shorter unique id since there may be more than one entry per mobile device - use an array to make a collection without repeats and use the index of the device udid in the array as your new shorter unique id
I used this as my input csv:
othercol1,platform,push,timestamp,othercol2,udid,othercol3,othercol4,othercol5,othercol6
11,Android, N, 10-05-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,iPhone, N, 10-05-2015 5:59:59 PM,22, d0155044772de9,33,44,55,66
11,iPhone, Y, 10-06-2015 3:59:59 PM,22, d0155049772de9,33,44,55,66
11,Android, Y, 11-05-2015 3:59:59 PM,22, d0155249772de9,33,44,55,66
Here is my output csv:
11,2,0,1431298799,22,1,33,44,55,66
11,1,0,1431305999,22,2,33,44,55,66
11,1,1,1433977199,22,1,33,44,55,66
11,2,1,1431385199,22,3,33,44,55,66
Here is the script I used:
require 'time' # use ruby standard time library to parse for you
require 'csv'
udids = [] # turn the udid in to a shorter unique id
CSV.open('new.csv', "wb") do |csv|
CSV.foreach('old.csv', headers: true) do |row|
if row['platform']=='Android'
row['platform']=2
elsif row['platform']=='iPhone'
row['platform']=1
end
if row['push'].strip =='Y'
row['push']=1
elsif row['push'].strip =='N'
row['push']=0
end
row['timestamp'] = Time.parse(row['timestamp']).to_i
# turn the udid in to a shorter unique id
unless udids.include?(row['udid'])
udids << row['udid']
end
row['udid'] = udids.index(row['udid']) + 1
csv << row
end
end
This is a wrong usage of map, this is not the function you need. Map is if you want to apply a function to all values in the array, and return the array. What you are doing is iterate, doing some changes, then pushing the modified row into a new array - you can just iterate, no need for the map function to be there:
CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true) instead of CSV.foreach('C:\Users\hp1\Desktop\Datasets\NewColumns.csv', :headers=>true).map
About the date, you can use strptime to transform string into date: DateTime.strptime("10-05-2015 3:59:59 PM", "%d-%m-%Y %l:%M:%S %p"). Here the docs: http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html
add :converters => :all to your options, so that the dates and numbers are automatically converted. Then, instead of
row['timestamp'].to_time.to_i
which does the conversion but doesn't put it anywhere (it is not in-place), do this:
row['timestamp'] = row['timestamp'].to_time.to_i
note that this only works with converters, otherwise row['timestamp'] is a string and there is no .to_time method.
Currently trying to generate a random number in a specific range;
and ensure that it would be unique against others stored records.
Using Mysql. Could be like an id, incremented; but can't be it.
Currently testing other existing records in an 'expensive' manner;
but I'm pretty sure that there would be a clean 1/2 lines of code to use
Currently using :
test = 0
Order.all.each do |ord|
test = (0..899999).to_a.sample.to_s.rjust(6, '0')
if Order.find_by_number(test).nil? then
break
end
end
return test
Thanks for any help
Here your are my one-line solution. It is also the quicker one since calls .pluck to retrieve the numbers from the Order table. .select instantiates an "Order" object for every record (that is very costly and unnecessary) while .pluck does not. It also avoids to iterate again each object with a .map to get the "number" field. We can avoid the second .map as well if we convert, using CAST in this case, to a numeric value from the database.
(Array(0...899999) - Order.pluck("CAST('number' AS UNSIGNED)")).sample.to_s.rjust(6, '0')
I would do something like this:
# gets all existing IDs
existing_ids = Order.all.select(:number).map(&:number).map(&:to_i)
# removes them from the acceptable range
available_numbers = (0..899999).to_a - existing_ids
# choose one (which is not in the DB)
available_numbers.sample.to_s.rjust(6, '0')
I think, you can do something like below :
def uniq_num_add(arr)
loop do
rndm = rand(1..15) # I took this range as an example
# random number will be added to the array, when the number will
# not be present
break arr<< "%02d" % rndm unless arr.include?(rndm)
end
end
array = []
3.times do
uniq_num_add(array)
end
array # => ["02", "15", "04"]
I'm trying to display total calls from a twilio object as well as unique calls.
The total calls is simple enough:
# set up a client to talk to the Twilio REST API
#sub_account_client = Twilio::REST::Client.new(#account_sid, #auth_token)
#subaccount = #sub_account_client.account
#calls = #subaccount.calls
#total_calls = #calls.list.count
However, I'm really struggling to figure out how to display unique calls (people sometimes call back form the same number and I only want to count calls from the same number once). I'm thinking this is a pretty simple method or two but I've burnt quite a few hours trying to figure it out (still a ruby noob).
Currently I've been working it in the console as follows:
#sub_account_client = Twilio::REST::Client.new(#account_sid, #auth_token)
#subaccount = #sub_account_client.account
#subaccount.calls.list({})each do |call|
#"from" returns the phone number that called
print call.from
end
This returns the following strings:
+13304833615+13304833615+13304833615+13304833615+13304567890+13304833615+13304833615+13304833615
There are only two unique numbers there so I'd like to be able to return '2' for this.
Calling class on that output shows strings. I've used "insert" to add a space then have done a split(" ") to turn them into arrays but the output is the following:
[+13304833615][+13304833615][+13304833615][+13304833615][+13304567890][+13304833615][+13304833615][+13304833615]
I can't call 'uniq' on that and I've tried to 'flatten' as well.
Please enlighten me! Thanks!
If what you have is a string that you want to manipulate the below works:
%{+13304833615+13304833615+13304833615+13304833615+13304567890+13304833615+13304833615+13304833615}.split("+").uniq.reject { |x| x.empty? }.count
=> 2
However this is more ideal:
#subaccount.calls.list({}).map(&:from).uniq.count
Can you build an array directly instead of converting it into a string first? Try something like this perhaps?
#calllist = []
#subaccount.calls.list({})each do |call|
#"from" returns the phone number that called
#calllist.push call.from
end
you should then be able to call uniq on #calllist to shorten it to the unique members.
Edit: What type of object is #subaccount.calls.list anyway?
uniq should work for creating a unique list of strings. I think you may be getting confused by other non-related things. You don't want .split, that's for turning a single string into an array of word strings (default splits by spaces). Which has turned each single number string, into an array containing only that number. You may also have been confused by performing your each call in the irb console, which will return the full array iterated on, even if your inner loop did the right thing. Try the following:
unique_numbers = #subaccount.calls.list({}).map {|call| call.from }.uniq
puts unique_numbers.inspect
I know this code is not optimal, any ideas on how to improve it?
job_and_cost_code_found = false
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, clean_cost_code].each do |row|
job_and_cost_code_found = true
end
if job_and_cost_code_found == false then
info = linenum + "," + id + ",,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
add_to_exception_output_file(info)
end
You're breaking a lot of simple rules here.
Don't select what you don't use.
You select a number of columns, then completely ignore the result data. What you probably want is a count:
SELECT COUNT(*) AS cost_code_count FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?'
Then you'll get one row that will have either a zero or non-zero value in it. Save this into a variable like:
job_and_cost_codes_found = timberline_db[...][0]['cost_code_count']
Don't compare against false unless you need to differentiate between that and nil
In Ruby only two things evaluate as false, nil and false. Most of the time you will not be concerned about the difference. On rare occasions you might want to have different logic for set true, set false or not set (nil), and only then would you test so specifically.
However, keep in mind that 0 is not a false value, so you will need to compare against that.
Taking into account the previous optimization, your if could be:
if job_and_cost_codes_found == 0
# ...
end
Don't use then or other bits of redundant syntax
Most Ruby style-guides spurn useless syntax like then, just as they recommend avoiding for and instead use the Enumerable class which is far more flexible.
Manipulate data, not strings
You're assembling some kind of CSV-like line in the end there. Ideally you'd be using the built-in CSV library to do the correct encoding, and libraries like that want data, not a string they'd have to parse.
One step closer to that is this:
line = [
linenum,
id,
nil,
"Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
].join(',')
add_to_exception_output_file(line)
You'd presumably replace join(',') with the proper CSV encoding method that applies here. The library is more efficient when you can compile all of the data ahead of time into an array-of-arrays, so I'd recommend doing that if this is the end goal.
For example:
lines = [ ]
# ...
if (...)
# Append an array to the lines to write to the CSV file.
lines << [ ... ]
end
Keep your data in a standard structure like an Array, a Hash, or a custom object, until you're prepared to commit it to its final formatted or encoded form. That way you can perform additional operations on it if you need to do things like filtering.
It's hard to refactor this when I'm not exactly sure what it's supposed to be doing, but assuming that you want to log an error when there's no entry matching a job & code pair, here's what I've come up with:
def fetch_by_job_and_cost_code(job, cost_code)
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, cost_code]
end
if fetch_by_job_and_cost_code(job, clean_cost_code).none?
add_to_exception_output_file "#{linenum},#{id},,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
end
I'm building a Sinatra based app for deployment on Heroku. You can imagine it like a standard URL shortener but where old shortcodes expire and become available for new URLs (I realise this is a silly concept but its easier to explain this way). I'm representing the shortcode in my database as an integer and redefining its reader to give a nice short and unique string from the integer.
As some rows will be deleted, I've written code that goes thru all the shortcode integers and picks the first free one to use just before_save. Unfortunately I can make my code create two rows with identical shortcode integers if I run two instances very quickly one after another, which is obviously no good! How should I implement a locking system so that I can quickly save my record with a unique shortcode integer?
Here's what I have so far:
Chars = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
CharLength = Chars.length
class Shorts < ActiveRecord::Base
before_save :gen_shortcode
after_save :done_shortcode
def shortcode
i = read_attribute(:shortcode).to_i
return '0' if i == 0
s = ''
while i > 0
s << Chars[i.modulo(CharLength)]
i /= 62
end
s
end
private
def gen_shortcode
shortcode = 0
self.class.find(:all,:order=>"shortcode ASC").each do |s|
if s.read_attribute(:shortcode).to_i != shortcode
# Begin locking?
break
end
shortcode += 1
end
write_attribute(:shortcode,shortcode)
end
def done_shortcode
# End Locking?
end
end
This line:
self.class.find(:all,:order=>"shortcode ASC").each
will do a sequential search over your entire record collection. You'd have to lock the entire table so that, when one of your processes is scanning for the next integer, the others will wait for the first one to finish. This will be a performance killer. My suggestion, if possible, is for the process to be as follows:
Add a column that indicates when a record has expired (do you expire them by time of creation? last use?). Index this column.
When you need to find the next lowest usable number, do something like
Shorts.find(:conditions => {:expired => true},:order => 'shortcode')
This will have the database doing the hard work of finding the lowest expired shortcode. Recall that, in the absence of the :all parameter, the find method will only return the first matching record.
Now, in order to prevent race conditions between processes, you can wrap this in a transaction and lock while doing the search:
Shorts.transaction do
Shorts.find(:conditions => {:expired => true},:order => 'shortcode', :lock => true)
#Do your thing here. Be quick about it, the row is locked while you work.
end #on ending the transaction the lock is released
Now when a second process starts looking for a free shortcode, it will not read the one that's locked (so presumably it will find the next one). This is because the :lock => true parameter gets an exclusive lock (both read/write).
Check this guide for more on locking with ActiveRecord.