I have some time now, so I'm doing some challenges from SPOJ in Ruby. One thing that bothers me is how to read user input faster.
For example, this problem: http://www.spoj.com/problems/TEST/
I have tried many solutions, all based on gets:
while ((i=STDIN.gets.to_i) != 42) do
puts i
end
$stdin.each_line do |line|
exit if line.strip! == "42"
puts line
end
def input
while (true)
gets
exit if ($_.chomp == "42")
puts $_.chomp
end
end
input
and other variations with gets. The best time I get is 0.01s with a memory footprint of 7.2 MB. But looking at the best Ruby submissions, the first 5 pages are all 0.00s and 3.1 MB of memory used.
Any idea how I can read the input faster?
Also, all the tests there use STDIN to pass the test cases to the app, some of them very large (hundreds of MB), and I suspect that gets is too slow for reading that kind of input (or chomp might be). Is there some other way that is faster than gets?
I can get 0.0 times by keeping the code simple and knowing what's going to be a faster way to do things. I won't show my code, because people are supposed to figure out the problem on their own:
while ((i=STDIN.gets.to_i) != 42) do
puts i
end
Ugh. Don't convert from the retrieved string to an integer just to compare it to 42. Instead compare to '42', though, remember, you could be getting trailing line-endings on the strings.
$stdin.each_line do |line|
exit if line.strip! == "42"
puts line
end
Again, don't strip; use a smarter comparison instead. Also, strip! could bite you: when nothing in the string changed, it returns nil instead of the string you're expecting. I'd use strip instead, because it is guaranteed to return a string.
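To make the strip! pitfall concrete, here is a minimal sketch. The helper name stop_line? is hypothetical and only there for illustration; it is not taken from any of the submissions.

```ruby
# strip! mutates the receiver and returns nil when there was nothing to remove.
clean  = '42'.dup      # no surrounding whitespace
padded = "42\n".dup    # trailing newline, as gets would return it

clean_result  = clean.strip!    # nil, because nothing was stripped
padded_result = padded.strip!   # "42"

# Hypothetical helper: the non-bang strip always returns a String,
# so the comparison is safe for both kinds of input.
def stop_line?(line)
  line.strip == '42'
end

puts clean_result.inspect
puts padded_result.inspect
puts stop_line?("42\n")
```

With strip!, the comparison against "42" silently fails for a line that is already clean, which is why the loop in the question would never exit on such input.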
def input
while (true)
gets
exit if ($_.chomp == "42")
puts $_.chomp
end
end
input
Calling chomp twice is costly. Call $_.chomp! once, as a separate statement, and then use $_.
Something to know: Ruby's regular expression engine is very fast, and an anchored regular expression pattern will outrun a regular in-string search.
Test your code variations using Ruby's Benchmark class and you can narrow down which differences help.
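As a rough sketch of what such a comparison could look like, the snippet below times two loop variants against generated input. The method names, the SAMPLE_INPUT constant and the input size are assumptions made for illustration, and the numbers will differ between machines and Ruby versions, so treat it as a starting point rather than a verdict on which variant wins.

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical test data: 50,000 small numbers followed by the 42 sentinel.
SAMPLE_INPUT = (Array.new(50_000) { |i| "#{i % 40}\n" } << "42\n").join

# Variant 1: convert every line to an integer before comparing.
def copy_with_to_i(input, output)
  while (line = input.gets)
    break if line.to_i == 42
    output.puts line
  end
end

# Variant 2: compare with an anchored pattern, no conversion.
def copy_with_regex(input, output)
  while (line = input.gets)
    break if line =~ /\A42\s*\z/
    output.write line
  end
end

Benchmark.bm(8) do |bm|
  bm.report('to_i')  { copy_with_to_i(StringIO.new(SAMPLE_INPUT), StringIO.new) }
  bm.report('regex') { copy_with_regex(StringIO.new(SAMPLE_INPUT), StringIO.new) }
end
```

Passing the streams in as arguments keeps the variants easy to time with StringIO; on the judge you would pass $stdin and $stdout instead.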
I have one version that uses a loop, very similar to your middle solution, and another that's a single line of code. Both were 0.0 sec.
In a blog post about unconditional programming Michael Feathers shows how limiting if statements can be used as a tool for reducing code complexity.
He uses a specific example to illustrate his point. Now, I've been thinking about other specific examples that could help me learn more about unconditional/ifless/forless programming.
For example in this cat clone there is an if..else block:
#!/usr/bin/env ruby
if ARGV.length > 0
ARGV.each do |f|
puts File.read(f)
end
else
puts STDIN.read
end
It turns out ruby has ARGF which makes this program much simpler:
#!/usr/bin/env ruby
puts ARGF.read
I'm wondering if ARGF didn't exist how could the above example be refactored so there is no if..else block?
Also interested in links to other illustrative specific examples.
Technically you can,
inputs = { ARGV => ARGV.map { |f| File.open(f) }, [] => [STDIN] }[ARGV]
inputs.map(&:read).map(&method(:puts))
Though that's code golf and too clever for its own good.
Still, how does it work?
It uses a hash to store two alternatives.
Map ARGV to an array of open files
Map [] to an array with STDIN, effectively overwriting the ARGV entry if it is empty
Access ARGV in the hash, which returns [STDIN] if it is empty
Read all open inputs and print them
Don't write that code though.
As mentioned in my answer to your other question, unconditional programming is not about avoiding if expressions at all costs but about striving for readable and intention revealing code. And sometimes that just means using an if expression.
You can't always get rid of a conditional (maybe with an insane number of classes) and Michael Feathers isn't advocating that. Instead it's sort of a backlash against overuse of conditionals. We've all seen nightmare code that's endless chains of nested if/elsif/else and so has he.
Moreover, people do routinely nest conditionals inside of conditionals. Some of the worst code I've ever seen is a cavernous nightmare of nested conditions with odd bits of work interspersed within them. I suppose that the real problem with control structures is that they are often mixed with the work. I'm sure there's some way that we can see this as a form of single responsibility violation.
Rather than slavishly try to eliminate the condition, you could simplify your code by first creating an array of IO objects from ARGV, and use STDIN if that list is empty.
io = ARGV.map { |f| File.new(f) }
io = [STDIN] if io.empty?  # 0 is truthy in Ruby, so !io.length would never fire
Then your code can do what it likes with io.
While this has strictly the same number of conditionals, it eliminates the if/else block and thus a branch: the code is linear. More importantly, since it separates gathering data from using it, you can put it in a function and reuse it further reducing complexity. Once it's in a function, we can take advantage of early return.
# I don't have a really good name for this, but it's a
# common enough idiom. Perl provides the same feature as <>
def arg_files
  # ARGV.length is always truthy (even 0), so test for emptiness instead
  return ARGV.map { |f| File.new(f) } unless ARGV.empty?
  return [STDIN]
end
Now that it's in a function, your code to cat all the files or stdin becomes very simple.
arg_files.each { |f| puts f.read }
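A variation on the same idea, sketched under the assumption that you want the function to be testable: the argument list and the fallback stream are passed in rather than read from the ARGV and STDIN globals. The names input_streams and cat are hypothetical.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: returns the streams to read from.
# Falls back to the given IO when no file names were supplied.
def input_streams(argv, fallback)
  return [fallback] if argv.empty?
  argv.map { |path| File.open(path) }
end

# Copies every stream to out, closing the files this code opened itself.
def cat(argv, fallback, out)
  input_streams(argv, fallback).each do |io|
    out.write io.read
    io.close unless io.equal?(fallback)
  end
end

# Usage with an in-memory stand-in for STDIN:
out = StringIO.new
cat([], StringIO.new("from stdin\n"), out)
print out.string
```

In a real script the call would be cat(ARGV, STDIN, $stdout). There is still a single guard clause, but the reading code itself stays linear.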
First, although the principle is good, you have to consider other things that are more important, such as readability and perhaps speed of execution.
That said, you could monkeypatch the String class to add a read method, put the arguments and STDIN in one array, and read from the start of that array up to index ARGV.length - 1. With arguments present, that range stops just before STDIN; with no arguments, the end index is -1 (the end of the array), so STDIN is read.
class String
def read
File.read self if File.exist? self
end
end
puts [*ARGV, STDIN][0..ARGV.length-1].map{|a| a.read}
Before someone notices that I still use an if to check whether a file exists: your example should have used two ifs to check this as well, and if you don't check, use a rescue to properly inform the user.
EDIT: if you would use the patch, read about the possible problems at these links
http://blog.jayfields.com/2008/04/alternatives-for-redefining-methods.html
http://www.justinweiss.com/articles/3-ways-to-monkey-patch-without-making-a-mess/
Since the read method isn't part of String, the solutions using alias and super are not necessary. If you plan to use a Module, here is how to do that:
module ReadString
def read
File.read self if File.exist? self
end
end
class String
include ReadString
end
EDIT: just read about a safe way to monkey patch, for your documentation see https://solidfoundationwebdev.com/blog/posts/writing-clean-monkey-patches-fixing-kaminari-1-0-0-argumenterror-comparison-of-fixnum-with-string-failed?utm_source=rubyweekly&utm_medium=email
Select key words in a string to change their format in Ruby
I have a big string (text) and an Array of strings (key_words) as below:
text = 'So in this election, we cannot sit back and hope that everything works out for the best. We cannot afford to be tired or frustrated or cynical. No, hear me. Between now and November, we need to do what we did eight years ago and four years ago…'
key_words = ['frustrated', 'tired', 'hope']
My objective is to print each word in ‘text’ while changing the colour and case of the words that are included in key_words. I’ve been able to do that by doing:
require 'colorize'
text.split(/\b/).each do |x|
  if key_words.include?(x.downcase)
    print x.colorize(:red)  # single-quoted '#{x}' would print the literal text
  else
    print x
  end
end
However, since I don’t want to include many words in key_words I want to make the selection more sensitive going beyond an exact match. Such as if, for example:
key_words = ['frustrat', 'tire', 'hope'] => the algorithm would select both 'Frustration', 'Frustrated' or 'Tiring' and 'Tired' or 'Hope' and 'Hopeful'.
I’ve tried playing with word lengths in both the string and the array as below, but it seems a very inefficient solution and I’m getting very confused by the usage of the .any? and .include? methods in this scenario.
key_words = ['frustrated', 'tired', 'hope']
key_words_abb = []
key_words.each { |x| key_words_abb << x.downcase[0][0..x.length-2]}
text.split(/\b/).each do |x|
if key_words_abb.include?(x.downcase[0][0..x.length-2]); print x.colorize(:red)
else print x
end
end
Since I can’t find a specific solution online I would appreciate your help.
It's worth noting that when doing repeated substitutions on strings, especially longer ones, you'll want your substitution method to be as efficient as possible. Spinning through an array of things to switch out is painfully expensive, especially as that list grows.
Here's a variation on your approach:
replacement = Regexp.new('\b%s\b' % [ Regexp.union(key_words) ])
replaced = text.gsub(replacement) do |s|
s.colorize(:red)
end
puts replaced
If you're using that substitution repeatedly you should persist the Regexp object into a constant. That avoids having to compile it for each string you're adjusting. If the list changes based on factors hard to predict, leave it like this and produce it dynamically.
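For the stem matching the question asks about, one possible adaptation is to keep the single compiled pattern but let each keyword match as a word prefix. This is a sketch, not the answer's own code: the helper names are hypothetical, and plain ANSI escape codes stand in for the colorize gem so the snippet has no dependencies.

```ruby
RED   = "\e[31m"
RESET = "\e[0m"

# Hypothetical helper: one case-insensitive pattern that matches any
# word starting with one of the given stems.
def stem_pattern(stems)
  Regexp.new('\b(?:%s)\w*' % Regexp.union(stems).source, Regexp::IGNORECASE)
end

# Wraps every matching word in red and upcases it.
def highlight(text, stems)
  text.gsub(stem_pattern(stems)) { |word| "#{RED}#{word.upcase}#{RESET}" }
end

puts highlight('We cannot afford to be tired or Frustrated; we hope.',
               %w[frustrat tir hope])
```

Note that the stem has to be a true prefix: 'tire' would match 'Tired' but not 'Tiring', so the example uses 'tir', which in turn also matches unrelated words such as 'tirade'. How short to make the stems is a judgment call.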
One thing to note about using Ruby is it's often best to express your code as a series of transformations with output as a final step. Putting things like print in the middle of a loop complicates things unnecessarily. If you want to add an additional step to your loop you have to do a lot of extra work to move that print to a later stage. With the approach here you can just chain on the end and do whatever you want.
I'd like to point out I tried quite extensively to find a solution for this and the closest I got was this. However I couldn't see how I could use map to solve my issue here. I'm brand new to Ruby so please bear that in mind.
Here's some code I'm playing with (simplified):
def base_word input
input_char_array = input.split('') # split string to array of chars
@file.split("\n").each do |dict_word|
input_text = input_char_array
dict_word.split('').each do |char|
if input_text.include? char.downcase
input_text.slice!(input_text.index(char))
end
end
end
end
I need to reset the value of input_text back to the original value of input_char_array after each cycle, but from what I gather since Ruby is reference-based, the modifications I make with the line input_text.slice!(input_text.index(char)) are reflected back in the original reference, and I end up assigning input_text to an empty array fairly quickly as a result.
How do I mitigate that? As mentioned I've tried to use .map but maybe I haven't fully wrapped my head around how I ought to go about it.
You can get an independent copy by duplicating the array, so that changes to the copy no longer affect the original. This, obviously, has some RAM usage implications.
input_text = input_char_array.dup
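A minimal sketch of how the copy could be used in the loop, assuming the goal is to find which input letters are left over after removing each dictionary word's letters. The method name leftover_letters and the sample words are made up for illustration.

```ruby
# Hypothetical helper: for each dictionary word, remove its letters from a
# fresh copy of the input and report what remains.
def leftover_letters(input, dict_words)
  input_chars = input.downcase.chars
  dict_words.map do |dict_word|
    remaining = input_chars.dup   # fresh copy; input_chars is never mutated
    dict_word.downcase.each_char do |char|
      idx = remaining.index(char)
      remaining.delete_at(idx) if idx
    end
    [dict_word, remaining.join]
  end.to_h
end

p leftover_letters('doge', %w[dog god cat])
```

Because dup is taken inside the loop, every dictionary word starts from the full set of input letters. The copy is shallow, which is fine here since the characters themselves are never modified.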
The Short and Quite Frankly Not Very Good Answer
Using slice! modifies the array in place: it removes the selected element and returns it. With an integer index it is equivalent to
removed = input_text.delete_at(index) # etc.
If you use plain old slice instead, nothing is removed from input_text.
The Longer and Quite Frankly Much Better Answer
In Ruby, code nested four levels deep is often a smell. Let's refactor, and avoid the need to reset a loop at all.
Instead of splitting the file by newline, we'll use Ruby's built-in File class to read through the lines. Memoizing it (the ||= operator) may prevent it from reopening the file each time it's referenced, if we're running this more than once.
def dictionary
  @dict ||= File.open('/path/to/dictionary')
end
We could also immediately make all the words lowercase when we open the file, since every character is downcased individually in the original example.
def downcased_dictionary
  # map (not each) so the downcased lines are actually kept
  @dict ||= File.open('/path/to/dictionary').map(&:downcase)
end
Next, we'll use Ruby's built-in file and string functions, including #each_char, to do the comparisons and output the results. We don't need to convert any inputs into Arrays (at all!), because #include? works on strings, and #each_char iterates over the characters of a string.
We'll decompose the string-splitting into its own method, so the loop logic and string logic can be understood more clearly.
Lastly, by using #slice instead of #slice!, we don't overwrite input_text and entirely avoid the need to reset the variable later.
def base_word(input)
input_text = input.to_s # Coerce in case it's not a string
# Read through each line in the dictionary
dictionary.each do |word|
word.each_char {|char| slice_base_word(input_text, char) }
end
end
def slice_base_word(input, char)
input.slice(input.index(char)) if input.include?(char)
end
So I'm trying to find a way to Donald Duck-ify statements entered by users (judge me later).
This is my code so far:
puts "Wanna get Donald Duck-ified?"
print "Type some text here:"
user_input = gets.chomp
if user_input.gsub!(/s/,"th").gsub!(/ce/,"th").gsub!(/ci/,"th").gsub!(/cy/,"th")
puts "Boop - there go your s's and soft c's!"
else
puts "Dang, you didn't have any s's or soft c's!"
end
puts "#{user_input}"
Upon testing it with some input of my own ("square cycle caesar circle", specifically), I'm getting "undefined method `gsub!' for nil:NilClass" as an error.
How is gsub! undefined? If the code runs with user_input.gsub!(/s/,"th") on its own, without any other methods behind it, it works fine. Once a second method is added, the else code runs and only replacements for "s" are made. With all four, I get the error above.
Does there happen to be another way of substituting multiple patterns (as named by the Ruby docs) with a single replacement? I've spent the last hours researching the problem and I still can't totally tell what the issue is.
New to Ruby. Encouraged and motivated.
Many thanks in advance.
Don't use #gsub! chained. (Actually, don't use #gsub! at all for most code.)
[gsub!] Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed.
Switch the code to #gsub which doesn't cause side-effects (yay!) and always returns a string (yay!) - simply compare the result with the original (unmodified) string.
Also, one could use the gsub form that accepts a hash (since Ruby 1.9.something). This has a subtle difference that replaced values will not be themselves replaced, although it doesn't matter here.
user_input.gsub(/s|ce|ci|cy/, { "s"=>"th", "ce"=>"th", "ci"=>"th", "cy"=>"th" })
# or since all are replaced with "th" (which I just noticed =^_^=) ..
user_input.gsub(/s|ce|ci|cy/, "th")
(I still recommend against gsub! because I find side effects upon strings disconcerting. However, it would work reliably when used with the non-chained forms above.)
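Putting the pieces together, a small sketch of the non-destructive approach might look like this. The duckify method name is hypothetical; the pattern and the messages are the ones from the question.

```ruby
# gsub! returns nil when nothing matched, which is what breaks the chain.
no_match = 'quack'.dup.gsub!(/s/, 'th')   # nil, not 'quack'

# Hypothetical helper: returns the replaced text plus a flag telling
# whether anything actually changed.
def duckify(text)
  replaced = text.gsub(/s|ce|ci|cy/, 'th')
  [replaced, replaced != text]
end

result, changed = duckify('square cycle caesar circle')
puts changed ? "Boop - there go your s's and soft c's!" : "Dang, you didn't have any s's or soft c's!"
puts result
```

Comparing the result with the original string replaces the truthiness check on gsub!, so the user's input is never modified behind your back.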
Ruby's gsub! returns nil if it performs no substitutions. This means you can't reliably chain it like you do. If you want to verify that any of the gsubs have made any change, you can chain non-destructive gsubs (without the bang; return a new string instead of modifying the current one) instead:
input = gets.chomp
replaced = input.gsub(/s/,"th").gsub(/ce/,"th").gsub(/ci/,"th").gsub(/cy/,"th")
if input == replaced
...
I'm pretty new to programming, so be gentle. I'm trying to extract ISBN numbers from a library database .dat file. I have written code that works, but it is only searching through about half of the 180MB file. How can I adjust it to search the whole file? Or how can I write a program that will split the .dat file into manageable chunks?
edit: Here's my code:
export = File.new("resultsfinal.txt","w+")
File.open("bibrec2.dat").each do |line|
line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
export.puts x
end
line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
export.puts x
end
end
You should try catching exceptions to check whether the problem is really in the read block or not.
Just so you know, I already wrote a script with much the same syntax to search a really big file of ~8GB without any problem.
export = File.new("resultsfinal.txt","w+")
File.open("bibrec2.dat").each do |line|
begin
line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
export.puts x
end
line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
export.puts x
end
rescue
puts "Problem while adding the result"
end
end
The main thing is to clean up and combine the regex for performance benefits. Also you should always use block syntax with files to ensure the fd's are getting closed properly. File#each doesn't load the whole file into memory, it does one line at a time:
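File.open("resultsfinal.txt", "w+") do |output|
  # File.foreach opens the file, yields one line at a time, and closes it
  File.foreach("bibrec2.dat") do |line|
    matches = line.scan(/a[\dxX]{10}(?:[\dxX]{3}|\W)/)
    output.puts matches unless matches.empty?  # skip lines without matches
  end
end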
file = File.new("bibrec2.dat", "r")
while (line = file.gets)
line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
export.puts x
end
line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
export.puts x
end
end
file.close
As to the performance issue, I can't see anything particularly worrying about the file size: 180MB shouldn't pose any problems. What happens to memory use when you're running your script?
I'm not sure, however, that your Regular Expressions are doing what you want. This, for example:
/[a]{1}[1234567890xX]{10}\W/
does (I think) this:
one "a". Do you really want to match for an "a"? "a" would suffice, rather than "[a]{1}", in that case.
exactly 10 of (digit or "x" or "X")
a single "non-word" character i.e. not a-z, A-Z, 0-9 or underscore
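To check that reading of the pattern, a quick sketch against a few made-up sample strings can help. The constant and method names, and the sample values, are illustrative only and not taken from the real data file.

```ruby
# Simplified but equivalent form of /[a]{1}[1234567890xX]{10}\W/
TEN_CHAR_PATTERN = /a[\dxX]{10}\W/

# Hypothetical helper: returns every match found in one line.
def ten_char_hits(line)
  line.scan(TEN_CHAR_PATTERN)
end

p ten_char_hits("a0306406152 some title\n")  # "a", ten characters, then a space
p ten_char_hits("a030640615X\n")             # the newline counts as the \W
p ten_char_hits("a0306406152")               # nothing after the digits
p ten_char_hits("a030640615 \n")             # only nine characters after "a"
```

Two details are worth noticing: the trailing non-word character is part of each match, so it ends up in the output, and a candidate at the very end of the input with no following character is not matched at all.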
There are a couple of sample ISBN matchers here and here, although they seem to be matching something more like the format that we see on the back cover of a book and I'm guessing your input file has stripped out some of that formatting.
You can look into using File#truncate and IO#seek and employ a binary-search-style approach. Truncating is destructive, so you should duplicate the file first (I know this is a hassle).
require 'fileutils'
middle = File.size("my_huge_file.dat") / 2
# work on a copy so the original file is left intact
FileUtils.cp("my_huge_file.dat", "first_half.dat")
File.truncate("first_half.dat", middle)
# run search algorithm on 'first_half.dat'
File.open("my_huge_file.dat") do |huge_file|
  huge_file.seek(middle)  # the copy holds bytes 0..middle-1
  # run search algorithm from here
end
The code is highly untested, brittle and incomplete. But I hope it gives you a platform to build off of.
If you are programming on a modern operating system and the computer has enough memory (say 512 MB), Ruby should have no problem reading the entire file into memory.
Things typically get iffy when you get to about a 2 gigabyte working set on a typical 32-bit OS.