Ruby grep with line number - ruby

What is the best way to get the matching lines together with their line numbers using Ruby's Enumerable#grep method (as with the -n or --line-number switch of the grep command)?

Enumerable#grep doesn't let you do that, at least by default. Instead, I came up with:
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
hits = text.each_line.with_index(1).inject([]) { |m, i| m << i if i[0][regex]; m }
hits # => [["to come to the aid\n", 3]]

Maybe something like this:
module Enumerable
  def lgrep(pattern)
    each_with_index.select { |e, _| e =~ pattern }
  end
end

This isn't elegant or efficient, but why not just number the lines before grepping?
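A minimal sketch of that idea, assuming the text is already in a string and that a "N: line" prefix is acceptable (the prefix format and variable names are my own choice, not from the question). One caveat is that the pattern now also sees the number prefix, so a regex like /3/ would match on the line number itself.

```ruby
text = "now is the time\nfor all good men\nto come to the aid\nof their country\n"

# Prefix every line with its 1-based number, then grep the numbered lines.
numbered = text.each_line.map.with_index(1) { |line, n| "#{n}: #{line.chomp}" }
matches = numbered.grep(/aid/)

puts matches
# prints: 3: to come to the aid
```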

You can kludge it in Ruby 1.8.6 like so:
require 'enumerator'
class Array
  def grep_with_index(regex)
    self.enum_for(:each_with_index).select { |x, i| x =~ regex }
  end
end
arr = ['Foo', 'Bar', 'Gah']
arr.grep_with_index(/o/) # => [["Foo", 0]]
arr.grep_with_index(/a/) # => [["Bar", 1], ["Gah", 2]]
Or, if you're looking for tips on writing a grep-like utility in Ruby, something like this should work:
def greplines(filename, regex)
  lineno = 0
  File.open(filename) do |file|
    file.each_line do |line|
      lineno += 1 # count every line, not only the matching ones
      puts "#{lineno}: #{line}" if line =~ regex
    end
  end
end

>> lines=["one", "two", "tests"]
=> ["one", "two", "tests"]
>> lines.grep(/test/){|x| puts "#{lines.index(x)+1}, #{x}" }
3, tests

To mash up the Tin Man's and ghostdog74's answers
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
text.lines.grep(/aid/){|x| puts "#{text.lines.find_index(x)+1}, #{x}" }
# => 3, to come to the aid

A modification to the solution given by the Tin Man. This snippet will return a hash having line numbers as keys, and matching lines as values. This one also works in ruby 1.8.7.
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
hits = text.lines.each_with_index.inject({}) { |m, i| m.merge!({(i[1]+1) => i[0].chomp}) if (i[0][regex]); m}
hits #=> {3=>"to come to the aid"}

Put the text in a file (test.log in the commands below):
now is the time
for all good men
to come to the aid
of their country
From the command line (as an alternative to the grep or awk commands):
ruby -ne 'puts $_ if $_ =~ /to the/' test.log
You can also try:
ruby -na -e 'puts $F[2] if $_ =~ /the/' test.log
ruby -na -e 'puts $_.split[2] if $_ =~ /the/' test.log
This is similar to the awk command.

Another suggestion:
lines.find_index{ |l| l=~ regex }.
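Note that find_index only reports the first match, and the index is zero-based. A rough sketch of the difference, using a made-up lines array with a repeated entry; the each_index/select variant is just one way to collect every matching line number:

```ruby
lines = ["one", "tests", "two", "tests"]
regex = /test/

# find_index stops at the first matching element (zero-based index).
first_index = lines.find_index { |l| l =~ regex }

# Collect the 1-based line numbers of all matches, duplicates included.
all_numbers = lines.each_index.select { |i| lines[i] =~ regex }.map { |i| i + 1 }

p first_index  # prints 1
p all_numbers  # prints [2, 4]
```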


Print Horizontal Line in Ruby

Can Ruby's puts or print draw a horizontal line, the way bash does with printf and tr?
printf '%20s\n' | tr ' ' -
This will draw:
--------------------
You can use the following snippet
puts "-"*20
You might be interested in formatting using ljust, rjust and center as well.
I use a quick puts "*"*80 for debug purposes. I'm sure there are better ways.
For fancy lines:
p 'MY_LINE'.center(80,'_-')
#=> "_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-MY_LINE_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_"
You could also have the following:
puts "".center(20, "-")
irb(main):005:0> puts "".center(20, '-')
--------------------
=> nil
This could be more flexible if you wanted to add additional information:
irb(main):007:0> puts "end of task".center(20, "-")
----end of task-----
=> nil
You can also use String#ljust or String#rjust.
puts ''.rjust(20,"-")
# >> --------------------
puts ''.ljust(20,"-")
# >> --------------------
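If you draw these lines often, the center trick can be wrapped in a small helper. This is only a sketch; the name rule and its default width and fill character are my own assumptions, not anything built into Ruby:

```ruby
# Hypothetical helper: returns a horizontal rule, optionally with a centered label.
def rule(label = "", width = 20, fill = "-")
  label.center(width, fill)
end

puts rule                      # prints 20 dashes
puts rule("x", 5)              # prints --x--
puts rule(" end of task ", 40, "=")
```

Because String#center returns the string unchanged when it is already wider than width, a long label is never cut off.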

Ruby: Use condition result in condition block

I have code like this:
reg = /(.+)_path/
if reg.match('home_path')
  puts reg.match('home_path')[0]
end
This evaluates the regex twice :(
reg = /(.+)_path/
result = reg.match('home_path')
if result
  puts result[0]
end
But this keeps the variable result in memory.
I have one functional-programming idea:
/(.+)_path/.match('home_path').compact.each do |match|
  puts match[0]
end
But it seems there should be a better solution, shouldn't there?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $1 - the first capture group of the last successful match ($2, $3, ... for later groups)
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
The full list of predefined global variables is in the Ruby documentation.
Note, that in the examples above puts won't be executed at all if you pass a string that doesn't match the regexp.
And speaking of the general case, you can always put the assignment into the condition itself:
if m = /(.+)_path/.match('home_path')
  puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
  puts prefix
end
If you really want a one-liner: puts /(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false
See the Ruby Docs for a few limitations of named captures and #=~.
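The main limitation, as far as I know, is that the local variables are only created when a regexp literal (without interpolation) is on the left-hand side of =~. If the regexp lives in a variable, you can still read the named group from the MatchData. A small sketch of both forms; the variable names re, m and from_match are just for illustration:

```ruby
# Regexp literal on the left of =~ : the named group becomes a local variable.
if /(?<prefix>.+)_path/ =~ "home_path"
  puts prefix            # prints home
end

# Regexp stored in a variable: no local variable is created,
# so read the group from the MatchData instead.
re = /(?<prefix>.+)_path/
m = re.match("home_path")
from_match = m && m[:prefix]
puts from_match          # prints home
```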
From the docs: If a block is given, invoke the block with MatchData if match succeed.
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
How about this?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match.
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s

How do I count the number of instances of particular words in a paragraph?

I'd like to count the number of times a set of words appear in each paragraph in a text file. I am able to count the number of times a set of words appears in an entire text.
It has been suggested to me that my code is really buggy, so I'll just ask what I would like to do, and if you want, you can look at the code I have at the bottom.
So, given that "frequency_count.txt" has the words "apple pear grape melon kiwi" in it, I want to know how often "apple" shows up in each paragraph of a separate file "test_essay.txt", how often pear shows up, etc., and then for these numbers to be printed out in a series of lines of numbers, each corresponding to a paragraph.
For instance:
apple, pear, grape, melon, kiwi
Where each line corresponds to one of the paragraphs.
I am very, very new to Ruby, so thank you for your patience.
output_file = '/Users/yirenlu/Quora-Personal-Analytics/weka_input6.csv'
o = File.open(output_file, "r+")
common_words = '/Users/yirenlu/Quora-Personal-Analytics/frequency_count.txt'
c = File.open(common_words, "r")
c.each_line do |line1|
  $line1 = line1
  words1 = $line1.split
  words1.each do |w1|
    the_file = '/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt'
    f = File.open(the_file, "r")
    rows = File.readlines("/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt")
    text = rows.join
    paragraph = text.split(/\n\n/)
    paragraph.each do |para|
      h = Hash.new
      puts "this is each paragraph"
      para.each_line do |line|
        puts "this is each line"
        words = line.split
        words.each do |w|
          if w1 == w
            if h.has_key?(w)
              h[w1] = h[w1] + 1
            else
              h[w1] = 1
            end
            $x = h[w1]
          end
        end
      end
      o.print "#{$x},"
    end
  end
  o.print "\n"
  o.print "#{$line1}"
end
If you're used to PHP or Perl you may be under the impression that a variable like $line1 is local, but in Ruby it is a global. Globals are highly discouraged, and the number of cases where they are strictly required is very small. In most cases you can just omit the $ and use ordinary, properly scoped variables.
This example also suffers from nearly unreadable indentation, though perhaps that was an artifact of the cut-and-paste procedure.
Generally what you want for counters is to create a hash with a default of zero, then add to that as required:
# Create a hash where the default values for each key is 0
counter = Hash.new(0)
# Add to the counters where required
counter['foo'] += 1
counter['bar'] += 2
puts counter['foo']
# => 1
puts counter['baz']
# => 0
You basically have what you need, but everything is all muddled and just needs to be organized better.
Here are two one-liners to calculate frequencies of words in a string.
The first one is a bit easier to understand, but it's less efficient:
txt.scan(/\w+/).group_by{|word| word.downcase}.map{|k,v| [k, v.size]}
# => [['word1', 1], ['word2', 5], ...]
The second solution is:
txt.scan(/\w+/).inject(Hash.new(0)) { |hash, w| hash[w.downcase] += 1; hash }
# => {'word1' => 1, 'word2' => 5, ...}
This could be shorter and easier to read if you use:
The CSV library.
A more functional approach using map and blocks.
require 'csv'
common_words = %w(apple pear grape melon kiwi)
text = File.read("test_essay.txt")
# One count per word, in the same order as the word list
def word_frequency(words, text)
  words.map { |word| text.scan(/\b#{word}\b/).length }
end
CSV.open("file.csv", "wb") do |csv|
  paragraphs = text.split(/\n\n/)
  paragraphs.each do |para|
    csv << word_frequency(common_words, para)
  end
end
Note this is currently case-sensitive but it's a minor adjustment if you want case-insensitivity.
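For what it's worth, the adjustment could look roughly like this: add the i flag to the regexp. The Regexp.escape call and the sample text are my additions for the sketch, not part of the answer above.

```ruby
# Case-insensitive variant: the /i flag makes "Apple" and "PEAR" count too.
# Regexp.escape guards against words containing regexp metacharacters.
def word_frequency(words, text)
  words.map { |word| text.scan(/\b#{Regexp.escape(word)}\b/i).length }
end

common_words = %w(apple pear grape melon kiwi)
sample = "Apple pie and apple tart.\nA pear, a PEAR, and a grapefruit."

p word_frequency(common_words, sample)
# prints [2, 2, 0, 0, 0]  ("grapefruit" is not counted because of \b)
```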
Here's an alternate answer, which has been tweaked for conciseness (though it's not as easy to read as my other answer).
require 'csv'
words = %w(apple pear grape melon kiwi)
text = File.read("test_essay.txt")
CSV.open("file.csv", "wb") do |csv|
  text.split(/\n\n/).map { |p| csv << words.map { |w| p.scan(/\b#{w}\b/).length } }
end
I prefer the slightly longer but more self-documenting code, but it's fun to see how small it can get.
What about this:
# Create an array of regexes to be used in `scan' in the loop.
# `\b' makes sure that `barfoobar' does not match `bar' or `foo'.
p word_list = File.open("frequency_count.txt") { |io| io.read.scan(/\w+/) }.map { |w| /\b#{w}\b/ }
File.open("test_essay.txt") do |io|
  loop do
    # Add lines to `paragraph' as long as there is a continuous line
    paragraph = ""
    # A `l.chomp.empty?' becomes true at paragraph border
    while l = io.gets and !l.chomp.empty?
      paragraph << l
    end
    p word_list.map { |re| paragraph.scan(re).length }
    # The end of file has been reached when `l == nil'
    break unless l
  end
end
To count how many times one word appears in a text:
text = "word aaa word word word bbb ccc ccc"
text.scan(/\w+/).count("word") # => 4
To count a set of words:
text = "word aaa word word word bbb ccc ccc"
wlist = text.scan(/\w+/)
wset = ["word", "ccc"]
result = {}
wset.each {|word| result[word] = wlist.count(word) }
result # => {"word" => 4, "ccc" => 2}
result["ccc"] # => 2

Looking to clean up a small ruby script

I'm looking for a more idiomatic way to write the following little Ruby script.
File.open("channels.xml").each do |line|
  if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
    puts line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
  end
end
Thanks in advance for any suggestions.
The original:
File.open("channels.xml").each do |line|
  if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
    puts line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
  end
end
can be changed into this:
m = nil
open("channels.xml").each do |line|
  puts m if m = line.match(%r|(mms://{1}[\w\./-]+)|)
end
File.open can be changed to just open.
if XYZ
  puts XYZ
end
can be changed to puts x if x = XYZ as long as x has occurred at some place in the current scope before the if statement.
The Regexp '(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)' can be refactored a little bit. Using the %rXX notation, you can create regular expressions without the need for so many backslashes, where X is any matching character, such as ( and ) or in the example above, | |.
This character class [a-zA-Z\.\d\/\w-] (read: A to Z, case insensitive, the period character, 0 to 9, a forward slash, any word character, or a dash) is a little redundant. \w denotes "word characters", i.e. A-Za-z0-9 and underscore. Since you specify \w as a positive match, A-Za-z and \d are redundant.
Using those 2 cleanups, the Regexp can be changed into this: %r|(mms://{1}[\w\./-]+)|
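As a quick sanity check, here is a sketch comparing the two patterns on a made-up line (the sample URL is hypothetical; your channels.xml may look different). It only shows that both capture the same text for this one input, not that they are equivalent in general.

```ruby
# The original pattern (given as a string) and the cleaned-up %r version.
old_re = Regexp.new('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
new_re = %r|(mms://{1}[\w\./-]+)|

# Hypothetical sample line, not taken from a real channels.xml.
line = '<ref href="mms://media.example.com/live/channel-1"/>'

old_match = line[old_re]
new_match = line[new_re]

puts new_match               # prints mms://media.example.com/live/channel-1
puts old_match == new_match  # prints true
```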
If you'd like to avoid the weird m = nil scoping sorcery, this will also work, but is less idiomatic:
open("channels.xml").each do |line|
  m = line.match(%r|(mms://{1}[\w\./-]+)|) and puts m
end
or the longer, but more readable version:
open("channels.xml").each do |line|
  if m = line.match(%r|(mms://{1}[\w\./-]+)|)
    puts m
  end
end
One very easy-to-read approach is just to store the result of the match, then only print if there's a match:
File.open("channels.xml").each do |line|
  m = line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
  puts m if m
end
If you want to start getting clever (and have less-readable code), use $&, which is the global variable that holds the text of the last match:
File.open("channels.xml").each do |line|
  puts $& if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
Personally, I would probably just use the POSIX grep command. But there is Enumerable#grep in Ruby, too:
puts File.readlines('channels.xml').grep(%r|mms://{1}[\w\./-]+|)
Alternatively, you could use some of Ruby's file and line processing magic that it inherited from Perl. If you pass the -p flag to the Ruby interpreter, it will assume that the script you pass in is wrapped with while gets; ...; end and at the end of each loop it will print the current line. You can then use the $_ special variable to access the current line and use the next keyword to skip iteration of the loop if you don't want the line printed:
ruby -pe 'next unless $_ =~ %r|mms://{1}[\w\./-]+|' channels.xml
ruby -pe 'next unless $_ =~ /re/' file
is equivalent to
grep -E re file

Ruby: How to get the first character of a string

How can I get the first character in a string using Ruby?
Ultimately what I'm doing is taking someone's last name and just creating an initial out of it.
So if the string was "Smith" I just want "S".
You can use Ruby's open classes to make your code much more readable. For instance, this:
class String
  def initial
    self[0,1]
  end
end
will allow you to use the initial method on any string. So if you have the following variables:
last_name = "Smith"
first_name = "John"
Then you can get the initials very cleanly and readably:
puts first_name.initial # prints J
puts last_name.initial # prints S
The other method mentioned here doesn't work on Ruby 1.8 (not that you should be using 1.8 anymore anyway!--but when this answer was posted it was still quite common):
puts 'Smith'[0] # prints 83
Of course, if you're not doing it on a regular basis, then defining the method might be overkill, and you could just do it directly:
puts last_name[0,1]
If you use a recent version of Ruby (1.9.0 or later), the following should work:
'Smith'[0] # => 'S'
If you use either 1.9.0+ or 1.8.7, the following should work:
'Smith'.chars.first # => 'S'
If you use a version older than 1.8.7, this should work:
'Smith'.split(//).first # => 'S'
Note that 'Smith'[0,1] does not work on 1.8, it will not give you the first character, it will only give you the first byte.
works in both ruby 1.8 and ruby 1.9.
For completeness' sake: since Ruby 1.9, String#chr returns the first character of a string. It's still available in 2.0 and 2.1.
"Smith".chr #=> "S"
In MRI 1.8.7 or greater:
Try this:
>> a = "Smith"
>> a[0]
=> "S"
>> "Smith".chr
#=> "S"
In Rails
name = 'Smith'
>> s = 'Smith'
=> "Smith"
>> s[0]
=> "S"
Another option that hasn't been mentioned yet:
> "Smith".slice(0)
#=> "S"
Because of an annoying design choice in Ruby before 1.9 — some_string[0] returns the character code of the first character — the most portable way to write this is some_string[0,1], which tells it to get a substring at index 0 that's 1 character long.
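On Ruby 1.9 or later, the variants mentioned in this thread should all agree. A short sketch to check that on your own interpreter; the forms array and the accented example are only there for illustration:

```ruby
name = "Smith"

# Different spellings of "first character" that appear in this thread.
forms = [
  name[0],          # character on 1.9+, character code on 1.8
  name[0, 1],       # substring of length 1 starting at index 0
  name[0..0],       # range form
  name.chars.first,
  name.chr,
  name.slice(0)
]
p forms.uniq        # prints ["S"]

# On 1.9+ indexing is by character, so multi-byte characters stay whole.
accented = "\u00C9mile"
p accented[0, 1] == "\u00C9"   # prints true
```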
Try this:
def word(string, num)
string = 'Smith'
If you're using Rails, you can also use truncate:
> 'Smith'.truncate(1, omission: '')
#=> "S"
or for additional formatting:
> 'Smith'.truncate(4)
#=> "S..."
> 'Smith'.truncate(2, omission: '.')
#=> "S."
While this is definitely overkill for the original question, for a pure ruby solution, here is how truncate is implemented in rails
# File activesupport/lib/active_support/core_ext/string/filters.rb, line 66
def truncate(truncate_at, options = {})
  return dup unless length > truncate_at

  omission = options[:omission] || "..."
  length_with_room_for_omission = truncate_at - omission.length
  stop = if options[:separator]
    rindex(options[:separator], length_with_room_for_omission) || length_with_room_for_omission
  else
    length_with_room_for_omission
  end

  "#{self[0, stop]}#{omission}"
end
Another way is to use chars on the string:
def abbrev_name
  first_name.chars.first.capitalize + '.' + ' ' + last_name
end
Any of these methods will work:
name = 'Smith'
puts name[0..0] # => S
puts name[0] # => S
puts name[0,1] # => S
puts name[0].chr # => S
