Ruby grep with line number - ruby

What is the best way to get the matching lines together with their line numbers using Ruby's Enumerable#grep method (as we do with the -n or --line-number switch of the grep command)?

Enumerable#grep doesn't let you do that, at least by default. Instead, I came up with:
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
# On Ruby 2.0+ String#lines returns an Array (no with_index), so use the each_line enumerator
hits = text.each_line.with_index(1).inject([]) { |m,i| m << i if (i[0][regex]); m }
hits # => [["to come to the aid\n", 3]]

Maybe something like this (note that the index is zero-based):
module Enumerable
  def lgrep(pattern)
    each_with_index.select { |e, _| e =~ pattern }
  end
end

This isn't elegant or efficient, but why not just number the lines before grepping?
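A minimal sketch of that idea, assuming the text is already in a string: prefix each line with its number, then run a plain Enumerable#grep over the numbered lines. The `n:line` prefix format is only an illustrative choice, and a pattern that contains digits or a colon could also match the prefix.

```ruby
text = "now is the time\nfor all good men\nto come to the aid\nof their country\n"

# Number the lines first (1-based), in the style of grep -n output.
numbered = text.each_line.with_index(1).map { |line, n| "#{n}:#{line.chomp}" }

# Then grep the numbered lines as usual.
hits = numbered.grep(/aid/)
puts hits
```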

You can kludge it in Ruby 1.8.6 like so:
require 'enumerator'
class Array
  def grep_with_index(regex)
    self.enum_for(:each_with_index).select {|x,i| x =~ regex}
  end
end
arr = ['Foo', 'Bar', 'Gah']
arr.grep_with_index(/o/) # => [["Foo", 0]]
arr.grep_with_index(/a/) # => [["Bar", 1], ["Gah", 2]]
Or, if you're looking for tips on writing a grep-like utility in Ruby, something like this should work:
def greplines(filename, regex)
  File.open(filename) do |file|
    file.each_line do |line|
      # file.lineno counts every line read so far, not only the matching ones
      puts "#{file.lineno}: #{line}" if line =~ regex
    end
  end
end

>> lines=["one", "two", "tests"]
=> ["one", "two", "tests"]
>> lines.grep(/test/){|x| puts "#{lines.index(x)+1}, #{x}" }
3, tests

To mash up the Tin Man's and ghostdog74's answers:
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
text.lines.grep(regex){|x| puts "#{text.lines.find_index(x)+1}, #{x}" }
# prints: 3, to come to the aid

A modification of the solution given by the Tin Man. This snippet returns a hash with line numbers as keys and matching lines as values. It also works in Ruby 1.8.7.
text = 'now is the time
for all good men
to come to the aid
of their country'
regex = /aid/
hits = text.lines.each_with_index.inject({}) { |m, i| m.merge!({(i[1]+1) => i[0].chomp}) if (i[0][regex]); m}
hits #=> {3=>"to come to the aid"}

Put the text in a file, test.log:
now is the time
for all good men
to come to the aid
of their country
Command line (an alternative to the grep or awk commands):
ruby -ne ' puts $_ if $_=~/to the/' test.log
Try this also:
ruby -na -e ' puts $F[2] if $_=~/the/' test.log
Similarly:
ruby -na -e ' puts $_.split[2] if $_=~/the/' test.log
This is similar to the awk command.

Another suggestion:
lines.find_index{ |l| l=~ regex }
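Note that find_index returns only the zero-based position of the first match (or nil when nothing matches). A small sketch, with made-up sample lines, of how that compares with collecting the 1-based numbers of every matching line:

```ruby
lines = ["one", "two tests", "three", "more tests"]
regex = /test/

# Zero-based index of the first matching line only.
first_index = lines.find_index { |l| l =~ regex }

# 1-based line numbers of all matching lines.
all_numbers = lines.each_index.select { |i| lines[i] =~ regex }.map { |i| i + 1 }

p first_index
p all_numbers
```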

Related

Print Horizontal Line in Ruby

Can Ruby's puts or print draw a horizontal line, the way bash does with printf and tr?
printf '%20s\n' | tr ' ' -
this will draw:
--------------------
You can use the following snippet
puts "-"*20
Check this for more help.
You might be interested in formatting using ljust, rjust and center as well.
I use a quick puts "*"*80 for debug purposes. I'm sure there are better ways.
For fancy lines:
p 'MY_LINE'.center(80,'_-')
#=> "_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-MY_LINE_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_"
You could also have the following:
puts "".center(20, "-")
irb(main):005:0> puts "".center(20, '-')
--------------------
=> nil
This could be more flexible if you wanted to add additional information:
irb(main):007:0> puts "end of task".center(20, "-")
----end of task-----
=> nil
You can also use String#ljust or String#rjust.
puts ''.rjust(20,"-")
# >> --------------------
puts ''.ljust(20,"-")
# >> --------------------

Ruby: Use condition result in condition block

I have such code
reg = /(.+)_path/
if reg.match('home_path')
puts reg.match('home_path')[0]
end
This will eval regex twice :(
So...
reg = /(.+)_path/
result = reg.match('home_path')
if result
puts result[0]
end
But this keeps the variable result around in the enclosing scope after the check.
I have one functional-programming idea:
/(.+)_path/.match('home_path').compact.each do |match|
puts match[0]
end
But it seems there should be a better solution, shouldn't there?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $1, $2, ... - the n-th group of the last successful match
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
You can find full list of predefined global variables here.
Note that in the examples above puts won't be executed at all if you pass a string that doesn't match the regexp.
In the general case, you can always put the assignment into the condition itself:
if m = /(.+)_path/.match('home_path')
puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
puts prefix
end
If you really want a one-liner: puts(/(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false)
See the Ruby Docs for a few limitations of named captures and #=~.
From the docs: If a block is given, invoke the block with MatchData if match succeed.
So:
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
How about this?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match. More info: http://www.regular-expressions.info/ruby.html
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s

How do I count the number of instances of particular words in a paragraph?

I'd like to count the number of times a set of words appear in each paragraph in a text file. I am able to count the number of times a set of words appears in an entire text.
It has been suggested to me that my code is really buggy, so I'll just ask what I would like to do, and if you want, you can look at the code I have at the bottom.
So, given that "frequency_count.txt" has the words "apple pear grape melon kiwi" in it, I want to know how often "apple" shows up in each paragraph of a separate file "test_essay.txt", how often pear shows up, etc., and then for these numbers to be printed out in a series of lines of numbers, each corresponding to a paragraph.
For instance:
apple, pear, grape, melon, kiwi
3,5,2,7,8
2,3,1,6,7
5,6,8,2,3
Where each line corresponds to one of the paragraphs.
I am very, very new to Ruby, so thank you for your patience.
output_file = '/Users/yirenlu/Quora-Personal-Analytics/weka_input6.csv'
o = File.open(output_file, "r+")
common_words = '/Users/yirenlu/Quora-Personal-Analytics/frequency_count.txt'
c = File.open(common_words, "r")
c.each_line{|$line1|
words1 = $line1.split
words1.each{|w1|
the_file = '/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt'
f = File.open(the_file, "r")
rows = File.readlines("/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt")
text = rows.join
paragraph = text.split(/\n\n/)
paragraph.each{|p|
h = Hash.new
puts "this is each paragraph"
p.each_line{|line|
puts "this is each line"
words = line.split
words.each{|w|
if w1 == w
if h.has_key?(w)
h[w1] = h[w1] + 1
else
h[w1] = 1
end
$x = h[w1]
end
}
}
o.print "#{$x},"
}
}
o.print "\n"
o.print "#{$line1}"
}
If you're used to PHP or Perl you may be under the impression that a variable like $line1 is local, but in Ruby it is a global. Use of globals is highly discouraged, and the number of cases where they are strictly required is very small. In most cases you can just omit the $ and use ordinary variables with proper scoping.
This example also suffers from nearly unreadable indentation, though perhaps that was an artifact of the cut-and-paste procedure.
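To illustrate the point about globals: a block parameter written without the $ is local to the block, so nothing leaks between iterations or into the surrounding code. (On Ruby 1.9 and later a global such as $line1 is not accepted as a block parameter at all.) The sample lines below are made up for the sketch.

```ruby
lines = ["apple pear\n", "grape melon kiwi\n"]

# `line` and `words` exist only inside the block.
word_lists = lines.map do |line|
  words = line.split
  words
end

# Outside the block `line` is not defined, so defined? returns nil.
leaked = defined?(line)

p word_lists
p leaked
```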
Generally what you want for counters is to create a hash with a default of zero, then add to that as required:
# Create a hash where the default values for each key is 0
counter = Hash.new(0)
# Add to the counters where required
counter['foo'] += 1
counter['bar'] += 2
puts counter['foo']
# => 1
puts counter['baz']
# => 0
You basically have what you need, but everything is all muddled and just needs to be organized better.
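One possible way to organize it, as a rough sketch rather than a finished solution: build one zero-default counter per paragraph, then read off the counts for the words of interest. The inline essay string stands in for the contents of test_essay.txt, paragraphs are assumed to be separated by blank lines, and splitting on whitespace means punctuation attached to a word is not stripped.

```ruby
words = %w[apple pear grape melon kiwi]
essay = "apple pear apple\nkiwi\n\npear pear melon\n"

# One row of counts per paragraph, in the same order as `words`.
rows = essay.split(/\n\n+/).map do |paragraph|
  counter = Hash.new(0)
  paragraph.split.each { |w| counter[w] += 1 }
  words.map { |w| counter[w] }
end

puts words.join(", ")
rows.each { |row| puts row.join(",") }
```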
Here are two one-liners to calculate frequencies of words in a string.
The first one is a bit easier to understand, but it's less efficient:
txt.scan(/\w+/).group_by{|word| word.downcase}.map{|k,v| [k, v.size]}
# => [['word1', 1], ['word2', 5], ...]
The second solution is:
txt.scan(/\w+/).inject(Hash.new(0)) { |hash, w| hash[w.downcase] += 1; hash}
# => {'word1' => 1, 'word2' => 5, ...}
This could be shorter and easier to read if you use the CSV library and a more functional approach with map and blocks:
require 'csv'
common_words = %w(apple pear grape melon kiwi)
text = File.read("test_essay.txt")
# Count whole-word occurrences of each word in the given text
def word_frequency(words, text)
  words.map { |word| text.scan(/\b#{word}\b/).length }
end
CSV.open("file.csv", "wb") do |csv|
  paragraphs = text.split(/\n\n/)
  paragraphs.each do |para|
    csv << word_frequency(common_words, para)
  end
end
Note this is currently case-sensitive but it's a minor adjustment if you want case-insensitivity.
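A sketch of that adjustment: adding the i flag to the regexp makes the match case-insensitive. The method name word_frequency_ci and the sample sentence are hypothetical, and Regexp.escape is an extra precaution (not in the original) for words that contain regexp metacharacters.

```ruby
# Case-insensitive variant of the counting method (hypothetical name).
def word_frequency_ci(words, text)
  words.map { |word| text.scan(/\b#{Regexp.escape(word)}\b/i).length }
end

counts = word_frequency_ci(%w[apple pear], "Apple pie, apple tart and a PEAR.")
p counts
```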
Here's an alternate answer, which has been tweaked for conciseness (though it is not as easy to read as my other answer).
require 'csv'
words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
CSV.open("file.csv", "wb") do |csv|
text.split(/\n\n/).map {|p| csv << words.map {|w| p.scan(/\b#{w}\b/).length}}
end
I prefer the slightly longer but more self-documenting code, but it's fun to see how small it can get.
What about this:
# Create an array of regexes to be used in `scan' in the loop.
# `\b' makes sure that `barfoobar' does not match `bar' or `foo'.
p word_list = File.open("frequency_count.txt"){|io| io.read.scan(/\w+/)}.map{|w| /\b#{w}\b/}
File.open("test_essay.txt") do |io|
  loop do
    # Add lines to `paragraph' as long as there is a continuous line
    paragraph = ""
    # A `l.chomp.empty?' becomes true at paragraph border
    while l = io.gets and !l.chomp.empty?
      paragraph << l
    end
    p word_list.map{|re| paragraph.scan(re).length}
    # The end of file has been reached when `l == nil'
    break unless l
  end
end
To count how many times one word appears in a text:
text = "word aaa word word word bbb ccc ccc"
text.scan(/\w+/).count("word") # => 4
To count a set of words:
text = "word aaa word word word bbb ccc ccc"
wlist = text.scan(/\w+/)
wset = ["word", "ccc"]
result = {}
wset.each {|word| result[word] = wlist.count(word) }
result # => {"word" => 4, "ccc" => 2}
result["ccc"] # => 2

Looking to clean up a small ruby script

I'm looking for a much more idiomatic way to do the following little ruby script.
File.open("channels.xml").each do |line|
if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
puts line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
end
Thanks in advance for any suggestions.
The original:
File.open("channels.xml").each do |line|
if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
puts line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
end
can be changed into this:
m = nil
open("channels.xml").each do |line|
puts m if m = line.match(%r|(mms://{1}[\w\./-]+)|)
end
File.open can be changed to just open.
if XYZ
puts XYZ
end
can be changed to puts x if x = XYZ as long as x has occurred at some place in the current scope before the if statement.
The Regexp '(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)' can be refactored a little bit. Using the %rXX notation, you can create regular expressions without the need for so many backslashes, where X is any matching character, such as ( and ) or in the example above, | |.
This character class [a-zA-Z\.\d\/\w-] (read: A to Z, case insensitive, the period character, 0 to 9, a forward slash, any word character, or a dash) is a little redundant. \w denotes "word characters", i.e. A-Za-z0-9 and underscore. Since you specify \w as a positive match, A-Za-z and \d are redundant.
Using those 2 cleanups, the Regexp can be changed into this: %r|(mms://{1}[\w\./-]+)|
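A quick way to convince yourself that the cleanup does not change what is matched is to run both patterns over a sample line. This is only a sketch: the XML line and the example.com URL are made up, and the third pattern, which also drops the {1} quantifier (it just means "exactly one") and the capture group, is a further simplification not taken from the answer above.

```ruby
verbose  = Regexp.new('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')  # pattern from the question
concise  = %r|(mms://{1}[\w\./-]+)|                       # cleaned-up pattern
simplest = %r|mms://[\w./-]+|                             # assumption: {1} and the group dropped too

line = '<ref href="mms://example.com/stream-1/live.asf"/>'

# String#[] with a regexp returns the matched text (or nil).
matches = [verbose, concise, simplest].map { |re| line[re] }
puts matches
```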
If you'd like to avoid the weird m = nil scoping sorcery, this will also work, but is less idiomatic:
open("channels.xml").each do |line|
m = line.match(%r|(mms://{1}[\w\./-]+)|) and puts m
end
or the longer, but more readable version:
open("channels.xml").each do |line|
if m = line.match(%r|(mms://{1}[\w\./-]+)|)
puts m
end
end
One very easy to read approach is just to store the result of the match, then only print if there's a match:
File.open("channels.xml").each do |line|
m = line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
puts m if m
end
If you want to start getting clever (and have less-readable code), use $&, the global variable that holds the text of the last successful match:
File.open("channels.xml").each do |line|
puts $& if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
Personally, I would probably just use the POSIX grep command. But there is Enumerable#grep in Ruby, too:
puts File.readlines('channels.xml').grep(%r|mms://{1}[\w\./-]+|)
Alternatively, you could use some of Ruby's file and line processing magic that it inherited from Perl. If you pass the -p flag to the Ruby interpreter, it will assume that the script you pass in is wrapped with while gets; ...; end and at the end of each loop it will print the current line. You can then use the $_ special variable to access the current line and use the next keyword to skip iteration of the loop if you don't want the line printed:
ruby -pe 'next unless $_ =~ %r|mms://{1}[\w\./-]+|' channels.xml
Basically,
ruby -pe 'next unless $_ =~ /re/' file
is equivalent to
grep -E re file

Ruby: How to get the first character of a string

How can I get the first character in a string using Ruby?
Ultimately what I'm doing is taking someone's last name and just creating an initial out of it.
So if the string was "Smith" I just want "S".
You can use Ruby's open classes to make your code much more readable. For instance, this:
class String
  def initial
    self[0,1]
  end
end
will allow you to use the initial method on any string. So if you have the following variables:
last_name = "Smith"
first_name = "John"
Then you can get the initials very cleanly and readably:
puts first_name.initial # prints J
puts last_name.initial # prints S
The other method mentioned here doesn't work on Ruby 1.8 (not that you should be using 1.8 anymore anyway!--but when this answer was posted it was still quite common):
puts 'Smith'[0] # prints 83
Of course, if you're not doing it on a regular basis, then defining the method might be overkill, and you could just do it directly:
puts last_name[0,1]
If you use a recent version of Ruby (1.9.0 or later), the following should work:
'Smith'[0] # => 'S'
If you use either 1.9.0+ or 1.8.7, the following should work:
'Smith'.chars.first # => 'S'
If you use a version older than 1.8.7, this should work:
'Smith'.split(//).first # => 'S'
Note that 'Smith'[0,1] does not work on 1.8, it will not give you the first character, it will only give you the first byte.
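The byte-versus-character difference only shows up with multibyte strings. Here is a small sketch for a modern Ruby (1.9.3 or later), using a hypothetical name that starts with a two-byte UTF-8 character and String#byteslice to mimic the byte-based result that 1.8 indexing would give:

```ruby
name = "\u00C9mile"   # "Émile": hypothetical name with a multibyte first character

first_char = name[0]                # character-based indexing on Ruby 1.9+
first_byte = name.byteslice(0, 1)   # byte-based, similar to what Ruby 1.8 returned

puts first_char
puts first_char.bytesize            # the first character occupies two bytes
puts first_byte.valid_encoding?     # a lone first byte is not valid UTF-8
```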
"Smith"[0..0]
works in both ruby 1.8 and ruby 1.9.
For completeness' sake: since Ruby 1.9, String#chr returns the first character of a string. It's still available in 2.0 and 2.1.
"Smith".chr #=> "S"
http://ruby-doc.org/core-1.9.3/String.html#method-i-chr
In MRI 1.8.7 or greater:
'foobarbaz'.each_char.first
Try this:
>> a = "Smith"
>> a[0]
=> "S"
OR
>> "Smith".chr
#=> "S"
In Rails
name = 'Smith'
name.first
>> s = 'Smith'
=> "Smith"
>> s[0]
=> "S"
Another option that hasn't been mentioned yet:
> "Smith".slice(0)
#=> "S"
Because of an annoying design choice in Ruby before 1.9 — some_string[0] returns the character code of the first character — the most portable way to write this is some_string[0,1], which tells it to get a substring at index 0 that's 1 character long.
Try this:
def word(string, num)
  # Return the first num characters of the given string
  string[0, num]
end
word('Smith', 1) # => "S"
If you're using Rails, you can also use truncate:
> 'Smith'.truncate(1, omission: '')
#=> "S"
or for additional formatting:
> 'Smith'.truncate(4)
#=> "S..."
> 'Smith'.truncate(2, omission: '.')
#=> "S."
While this is definitely overkill for the original question, for a pure Ruby solution, here is how truncate is implemented in Rails:
# File activesupport/lib/active_support/core_ext/string/filters.rb, line 66
def truncate(truncate_at, options = {})
  return dup unless length > truncate_at
  omission = options[:omission] || "..."
  length_with_room_for_omission = truncate_at - omission.length
  stop = if options[:separator]
    rindex(options[:separator], length_with_room_for_omission) || length_with_room_for_omission
  else
    length_with_room_for_omission
  end
  "#{self[0, stop]}#{omission}"
end
Another way is to use the string's chars:
def abbrev_name
first_name.chars.first.capitalize + '.' + ' ' + last_name
end
Any of these methods will work:
name = 'Smith'
puts name[0..0] # => S
puts name[0] # => S (Ruby 1.9+)
puts name[0,1] # => S
puts name[0].chr # => S

Resources