I have a hash like this:
v_cp={"29000"=>["Quimper"],
"29100"=>["Douarnenez",
"Kerlaz",
"Le Juch",
"Pouldergat",
"Poullan-sur-Mer"],
"29120"=>["Combrit",
"Plomeur",
"Saint Jean Trolimon","Pont-L\'Abbe","Tremeoc"],
"29140"=>["Melgven","Rosporden","Tourch"]
I would like to print out the keys in a table layout on screen, several keys per row, as in my first screenshot.
I use:
v_cp.each_key { |k| puts "*" + k + "*" }
But of course that prints each key on its own line, which is not what I'm aiming for.
I thought of sprintf or printf, but I'm really lost here.
Any help? Thanks!
If the length of each key is fixed, you can just slice the keys into subgroups and print them out:
v_cp.keys.each_slice(5) { |a| puts a.join(' ') }
If the length can vary, you should also ljust strings:
str_length = 6
v_cp.keys.each_slice(5) do |a|
  puts a.map { |e| e.ljust(str_length, ' ') }.join(' ')
end
You can use #print instead of #puts to put line feeds exactly where you want them. Unlike #puts, which automatically adds a new line every time it's called, #print prints out only the string that is passed to it, so you have to specifically print a new line character to get a new line.
For example, to get five keys that are the same size on each row, as in your first image:
example = {
  29000 => ['Bonjour'],
  29100 => ['Ça va?'],
  29200 => ['Hello'],
  29300 => ['Doing ok?'],
  29400 => ['Some text'],
  29500 => ['Something else'],
  29600 => ['More stuff'],
  29700 => ['This'],
  29800 => ['That'],
  29900 => ['The other']
}
example.keys.each_with_index do |key, index|
  print key.to_s
  print(((index + 1) % 5).zero? ? "\n" : '  ')
end
# Result:
=begin
29000  29100  29200  29300  29400
29500  29600  29700  29800  29900
=end
=end
(I liked two spaces better than one.)
If the length varies, you can use #ljust to pad smaller strings with trailing spaces, as Fizvlad mentions.
Consider preferring #print over #puts to output anything more complex than a simple string. You can often do with one call to #print what would take multiple calls to #puts, so overall #print is more efficient.
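For the v_cp hash in the question, a minimal sketch combining both ideas (the column width is taken from the longest key, so it adapts if the key lengths vary):
# Derive the column width from the longest key, then print five keys per row.
width = v_cp.keys.map(&:length).max
v_cp.keys.each_slice(5) do |row|
  puts row.map { |k| k.ljust(width) }.join('  ')
end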
Hi, I'm building a function that takes any instance of "u" or "you" in a string and replaces it with a specific word. I can isolate the instances no problem, but I cannot get the words to output properly. So far I have:
def autocorrect(input)
  #replace = [['you','u'], ['your sister']]
  #replace.each{|replaced| input.gsub!(replaced[0], replaced[1])}
  input.split(" ")
  if (input == "u" && input.length == 1) || input == "you"
    input.replace("your sister")
  end
  input.join(" ")
end
The ideal output would be:
autocorrect("I am so smitten with you")
"I am so smitten with your sister"
I don't know how to get the last part right; I can't think of a good method to use. Any help would be greatly appreciated.
The problem with your code is that you call input.split(" ") but don't save the result to anything, and you then check input == "u" # ..., where input is still the entire string. So if you called autocorrect('u') or autocorrect('you') you would get "your sister" back, except that the next line, input.join(" "), will throw an error.
The error occurs because, remember, input is still the original string, not an array of each word, and strings don't have a join method.
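For example (the exact wording of the error varies by Ruby version):
"you".join(" ")
# NoMethodError: undefined method `join' for "you":String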
To get your code working with the fewest changes possible, you can change it to:
def autocorrect(input)
  #replace = [['you','u'], ['your sister']]
  #replace.each{|replaced| input.gsub!(replaced[0], replaced[1])}
  input.split(" ").map do |word|
    if (word == "u" && word.length == 1) || word == "you"
      "your sister"
    else
      word
    end
  end.join(" ")
end
So, now, you are doing something with each word after you split(" ") the input, and you are checking each word against "u" and "you", instead of the entire input string. You then map either the replacement word, or the original, and then join them back into a single string to return them.
As an alternative, shorter way, you can use String#gsub which can take a Hash as the second parameter to do substitutions:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
def autocorrect(input)
  replace = { 'you' => 'your sister',
              'u' => 'your sister',
              'another word' => 'something else entirely' }

  input.gsub(/\b(#{replace.keys.join('|')})\b/, replace)
end
autocorrect("I am u so smitten with utopia you and another word")
# => "I am your sister so smitten with utopia your sister and something else entirely"
The regex in that example comes out looking like:
/\b(you|u|another word)\b/
with \b being any word boundary.
Simple array mapping would do the job:
"I am u so smuitten with utopia you".split(' ').map{|word| %w(you u).include?(word) ? 'your sister' : word}.join(' ')
#=> "I am your sister so smuitten with utopia your sister"
Your method would be:
def autocorrect(input)
input.split(' ').map{|word| %w(you u).include?(word) ? 'your sister' : word}.join(' ')
end
autocorrect("I am so smitten with you")
#=> "I am smitten with your sister"
I'd like to search through a txt file for a particular word. If I find that word, I'd like to retrieve the word that immediately follows it in the file. If my text file contained:
"My name is Jay and I want to go to the store"
I'd be searching for the word "want", and would want to add the word "to" to my array. I'll be looking through a very big text file, so any notes on performance would be great too.
The most literal way to read that might look like this:
a = []
str = "My name is Jack and I want to go to the store"
str.scan(/\w+/).each_cons(2) {|x, y| a << y if x == 'to'}
a
#=> ["go", "the"]
To read the file into a string use File.read.
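For example, a minimal sketch assuming the text lives in a file named input.txt (a made-up name) and you're searching for "want":
a = []
File.read('input.txt').scan(/\w+/).each_cons(2) { |x, y| a << y if x == 'want' }
a #=> ["to"] for the example sentence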
This is one way:
Code
def find_next(fname, word)
  enum = IO.foreach(fname)
  loop do
    e = (enum.next).scan(/\w+/)
    ndx = e.index(word)
    if ndx
      return e[ndx+1] if ndx < e.size-1
      loop do
        e = enum.next
        break if e =~ /\w+/
      end
      return e[/\w+/]
    end
  end
  nil
end
Example
text =<<_
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
. . . . .
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair…
_
FName = "two_cities"
File.write(FName, text)
find_next(FName, "worst")
# of
find_next(FName, "wisdom")
# it
find_next(FName, "foolishness")
# it
find_next(FName, "dispair")
#=> nil
find_next(FName, "magpie")
#=> nil
Shorter, but less efficient, and problematic with large files (here word holds the word being searched for):
word = "worst"
File.read(FName)[/(?<=\b#{word}\b)\W+(\w+)/,1]
#=> "of"
This is probably not the fastest way to do it, but something along these lines should work:
filename = "/path/to/filename"
target_word = "weasel"
next_word = ""
File.open(filename).each_line do |line|
  line.split.each_with_index do |word, index|
    if word == target_word
      next_word = line.split[index + 1]
    end
  end
end
Given a File, String, or StringIO stored in file:
pattern, match = 'want', nil
catch :found do
  file.each_line do |line|
    line.split.each_cons(2) do |words|
      if words[0] == pattern
        match = words.pop
        throw :found
      end
    end
  end
end
match
#=> "to"
Note that this answer will find at most one match per file for speed, and linewise operation will save memory. If you want to find multiple matches per file, or find matches across line breaks, then this other answer is probably the way to go. YMMV.
This is the fastest I could come up with, assuming your file is loaded in a string:
word = 'want'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
This will find ALL words that follow your particular word. So for example:
word = 'want'
string = 'My name is Jay and I want to go and I want a candy'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
p array #=> ["to", "a"]
Testing this on my machine, where I duplicated this string 500,000 times, I was able to reach 0.6 seconds execution time. I also tried other approaches, like splitting the string, but this was the fastest solution:
require 'benchmark'
Benchmark.bm do |bm|
  bm.report do
    word = 'want'
    string = 'My name is Jay and I want to go and I want a candy' * 500_000
    array = []
    string.scan(/\b#{word}\b\s(\w+)/) do
      array << $1
    end
  end
end
What would be a good way to remove the hash-tags from a string and then join the hash-tag words together in another string separated by commas:
'Some interesting tweet #hash #tags'
The result would be:
'Some interesting tweet'
And:
'hash,tags'
str = 'Some interesting tweet #hash #tags'
a,b = str.split.partition{|e| e.start_with?("#")}
# => [["#hash", "#tags"], ["Some", "interesting", "tweet"]]
a
# => ["#hash", "#tags"]
b
# => ["Some", "interesting", "tweet"]
a.join(",").delete("#")
# => "hash,tags"
b.join(" ")
# => "Some interesting tweet"
An alternate path is to use scan then remove the hash tags:
tweet = 'Some interesting tweet #hash #tags'
tags = tweet.scan(/#\w+/).uniq
tweet = tweet.gsub(/(?:#{ Regexp.union(tags).source })\b/, '').strip.squeeze(' ') # => "Some interesting tweet"
tags.join(',').tr('#', '') # => "hash,tags"
Dissecting it shows:
tweet.scan(/#\w+/) returns an array ["#hash", "#tags"].
uniq would remove any duplicated tags.
Regexp.union(tags) returns (?-mix:\#hash|\#tags).
Regexp.union(tags).source returns \#hash|\#tags. We don't want the pattern-flags at the start, so using source fixes that.
/(?:#{ Regexp.union(tags).source })\b/ returns the regular expression /(?:\#hash|\#tags)\b/.
tr is an extremely fast way to translate one character or characters to another, or strip them.
The final regex isn't the most optimized that can be generated. I'd actually write code to generate:
/#(?:hash|tags)\b/
but how to do that is left as an exercise for you. And, for short strings it won't make much difference as far as speed goes.
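For what it's worth, one way to build that pattern from the tags array above (a sketch; the bare variable name is mine):
# Strip the leading '#' from each tag, union the bare words,
# then put a single '#' in front of the non-capturing group.
bare = tags.map { |t| t.delete('#') }          # => ["hash", "tags"]
/#(?:#{ Regexp.union(bare).source })\b/        # => /#(?:hash|tags)\b/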
This keeps an array of hashtags that starts out empty.
It then splits the tweet on spaces.
It then looks for a hashtag and grabs the rest of the word.
It then stores it in the array.
array_of_hashtags = []
array_of_words = []
str = "Some interesting tweet #hash #tags"

str.split.each do |x|
  if /\#\w+/ =~ x
    array_of_hashtags << x.gsub(/\#/, "")
  else
    array_of_words << x
  end
end
Hope this helps.
I'd like to count the number of times a set of words appears in each paragraph of a text file. I am able to count the number of times the set appears in an entire text.
It has been suggested to me that my code is really buggy, so I'll just ask what I would like to do, and if you want, you can look at the code I have at the bottom.
So, given that "frequency_count.txt" has the words "apple pear grape melon kiwi" in it, I want to know how often "apple" shows up in each paragraph of a separate file "test_essay.txt", how often pear shows up, etc., and then for these numbers to be printed out in a series of lines of numbers, each corresponding to a paragraph.
For instance:
apple, pear, grape, melon, kiwi
3,5,2,7,8
2,3,1,6,7
5,6,8,2,3
Where each line corresponds to one of the paragraphs.
I am very, very new to Ruby, so thank you for your patience.
output_file = '/Users/yirenlu/Quora-Personal-Analytics/weka_input6.csv'
o = File.open(output_file, "r+")
common_words = '/Users/yirenlu/Quora-Personal-Analytics/frequency_count.txt'
c = File.open(common_words, "r")
c.each_line{|$line1|
words1 = $line1.split
words1.each{|w1|
the_file = '/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt'
f = File.open(the_file, "r")
rows = File.readlines("/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt")
text = rows.join
paragraph = text.split(/\n\n/)
paragraph.each{|p|
h = Hash.new
puts "this is each paragraph"
p.each_line{|line|
puts "this is each line"
words = line.split
words.each{|w|
if w1 == w
if h.has_key?(w)
h[w1] = h[w1] + 1
else
h[w1] = 1
end
$x = h[w1]
end
}
}
o.print "#{$x},"
}
}
o.print "\n"
o.print "#{$line1}"
}
If you're used to PHP or Perl you may be under the impression that a variable like $line1 is local, but this is a global. Use of globals is highly discouraged, and the number of instances where they are strictly required is very small. In most cases you can just omit the $ and use properly scoped local variables.
This example also suffers from nearly unreadable indentation, though perhaps that was an artifact of the cut-and-paste procedure.
Generally what you want for counters is to create a hash with a default of zero, then add to that as required:
# Create a hash where the default values for each key is 0
counter = Hash.new(0)
# Add to the counters where required
counter['foo'] += 1
counter['bar'] += 2
puts counter['foo']
# => 1
puts counter['baz']
# => 0
You basically have what you need, but everything is all muddled and just needs to be organized better.
Here are two one-liners to calculate frequencies of words in a string.
The first one is a bit easier to understand, but it's less efficient:
txt.scan(/\w+/).group_by{|word| word.downcase}.map{|k,v| [k, v.size]}
# => [['word1', 1], ['word2', 5], ...]
The second solution is:
txt.scan(/\w+/).inject(Hash.new(0)) { |hash, w| hash[w.downcase] += 1; hash}
# => {'word1' => 1, 'word2' => 5, ...}
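As an aside, if you're on Ruby 2.7 or later, Enumerable#tally gives you the same hash of counts even more directly:
txt.scan(/\w+/).map(&:downcase).tally
# => {'word1' => 1, 'word2' => 5, ...}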
This could be shorter and easier to read if you use:
The CSV library.
A more functional approach using map and blocks.
require 'csv'

common_words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read

def word_frequency(words, text)
  words.map { |word| text.scan(/\b#{word}\b/).length }
end

CSV.open("file.csv", "wb") do |csv|
  paragraphs = text.split(/\n\n/)
  paragraphs.each do |para|
    csv << word_frequency(common_words, para)
  end
end
Note this is currently case-sensitive but it's a minor adjustment if you want case-insensitivity.
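For instance, a sketch of the same helper with the i flag added to the interpolated pattern makes the matching case-insensitive:
def word_frequency(words, text)
  # /i makes the interpolated word pattern match regardless of case
  words.map { |word| text.scan(/\b#{word}\b/i).length }
end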
Here's an alternate answer, which has been tweaked for conciseness (though it's not as easy to read as my other answer).
require 'csv'
words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
CSV.open("file.csv", "wb") do |csv|
text.split(/\n\n/).map {|p| csv << words.map {|w| p.scan(/\b#{w}\b/).length}}
end
I prefer the slightly longer but more self-documenting code, but it's fun to see how small it can get.
What about this:
# Create an array of regexes to be used in `scan' in the loop.
# `\b' makes sure that `barfoobar' does not match `bar' or `foo'.
p word_list = File.open("frequency_count.txt"){|io| io.read.scan(/\w+/)}.map{|w| /\b#{w}\b/}
File.open("test_essay.txt") do |io|
loop do
# Add lines to `paragraph' as long as there is a continuous line
paragraph = ""
# A `l.chomp.empty?' becomes true at paragraph border
while l = io.gets and !l.chomp.empty?
paragraph << l
end
p word_list.map{|re| paragraph.scan(re).length}
# The end of file has been reached when `l == nil'
break unless l
end
end
To count how many times one word appears in a text:
text = "word aaa word word word bbb ccc ccc"
text.scan(/\w+/).count("word") # => 4
To count a set of words:
text = "word aaa word word word bbb ccc ccc"
wlist = text.scan(/\w+/)
wset = ["word", "ccc"]
result = {}
wset.each {|word| result[word] = wlist.count(word) }
result # => {"word" => 4, "ccc" => 2}
result["ccc"] # => 2
What is the preferred way of removing the last n characters from a string?
irb> 'now is the time'[0...-4]
=> "now is the "
If the characters you want to remove are always the same characters, then consider chomp:
'abc123'.chomp('123') # => "abc"
The advantages of chomp are: no counting, and the code more clearly communicates what it is doing.
With no arguments, chomp removes the DOS or Unix line ending, if either is present:
"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
From the comments, there was a question of the speed of using #chomp versus using a range. Here is a benchmark comparing the two:
require 'benchmark'
S = 'asdfghjkl'
SL = S.length
T = 10_000
A = 1_000.times.map { |n| "#{n}#{S}" }
GC.disable
Benchmark.bmbm do |x|
  x.report('chomp') { T.times { A.each { |s| s.chomp(S) } } }
  x.report('range') { T.times { A.each { |s| s[0...-SL] } } }
end
Benchmark Results (using CRuby 2.1.3p242):
Rehearsal -----------------------------------------
chomp 1.540000 0.040000 1.580000 ( 1.587908)
range 1.810000 0.200000 2.010000 ( 2.011846)
-------------------------------- total: 3.590000sec
user system total real
chomp 1.550000 0.070000 1.620000 ( 1.610362)
range 1.970000 0.170000 2.140000 ( 2.146682)
So chomp is faster than using a range, by ~22%.
Ruby 2.5+
As of Ruby 2.5 you can use delete_suffix or delete_suffix! to achieve this in a fast and readable manner.
The docs on the methods are here.
If you know what the suffix is, this is idiomatic (and I'd argue, even more readable than other answers here):
'abc123'.delete_suffix('123') # => "abc"
'abc123'.delete_suffix!('123') # => "abc"
It's even significantly faster (almost 40% with the bang method) than the top answer. Here's the result of the same benchmark:
user system total real
chomp 0.949823 0.001025 0.950848 ( 0.951941)
range 1.874237 0.001472 1.875709 ( 1.876820)
delete_suffix 0.721699 0.000945 0.722644 ( 0.723410)
delete_suffix! 0.650042 0.000714 0.650756 ( 0.651332)
I hope this is useful. Note that the method doesn't currently accept a regex, so if you don't know the suffix it's not viable for the time being. However, as the accepted answer (at the time of writing) has the same limitation, I thought this might be useful to some people.
str = str[0..-1-n]
Unlike the [0...-n], this handles the case of n=0.
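For example, with n = 0 the two forms behave differently:
n = 0
'abc'[0...-n]    # => ""    (-0 is just 0, so the range is empty)
'abc'[0..-1-n]   # => "abc"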
I would suggest chop. I think it has been mentioned in one of the comments but without links or explanations so here's why I think it's better:
It simply removes the last character from a string and you don't have to specify any values for that to happen.
If you need to remove more than one character then chomp is your best bet. This is what the ruby docs have to say about chop:
Returns a new String with the last character removed. If the string
ends with \r\n, both characters are removed. Applying chop to an empty
string returns an empty string. String#chomp is often a safer
alternative, as it leaves the string unchanged if it doesn’t end in a
record separator.
Although this is used mostly to remove separators such as \r\n I've used it to remove the last character from a simple string, for example the s to make the word singular.
name = "my text"
x.times do name.chop! end
Here in the console:
>name = "Nabucodonosor"
=> "Nabucodonosor"
> 7.times do name.chop! end
=> 7
> name
=> "Nabuco"
Dropping the last n characters is the same as keeping the first length - n characters.
Active Support includes String#first and String#last methods which provide a convenient way to keep or drop the first/last n characters:
require 'active_support/core_ext/string/access'
"foobarbaz".first(3) # => "foo"
"foobarbaz".first(-3) # => "foobar"
"foobarbaz".last(3) # => "baz"
"foobarbaz".last(-3) # => "barbaz"
If you are using Rails, try:
"my_string".last(2) # => "ng"
[EDITED]
To get the string WITHOUT the last 2 chars:
n = "my_string".size
"my_string"[0..n-3] # => "my_stri"
Note: the last string char is at n-1. So, to remove the last 2, we use n-3.
Check out the slice() method:
http://ruby-doc.org/core-2.5.0/String.html#method-i-slice
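For instance, slice accepts the same range form used at the top of this thread; a quick sketch:
'now is the time'.slice(0...-4) # => "now is the "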
You can always use something like
"string".sub!(/.{X}$/,'')
Where X is the number of characters to remove.
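If X comes from a variable, remember to interpolate it into the pattern; a small sketch:
n = 3
"string".sub(/.{#{n}}$/, '') # => "str"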
Or with assigning/using the result:
myvar = "string"[0..-X]
where X is the number of characters plus one to remove.
If you're ok with creating class methods and want the characters you chop off, try this:
class String
  def chop_multiple(amount)
    amount.times.inject([self, '']){ |(s, r)| [s.chop, r.prepend(s[-1])] }
  end
end
hello, world = "hello world".chop_multiple 5
hello #=> 'hello '
world #=> 'world'
Using regex:
str = 'string'
n = 2 #to remove last n characters
str[/\A.{#{str.size-n}}/] #=> "stri"
x = "my_test"
last_char = x.split('').last
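Note that if you only need the last character, negative indexing does the same thing more directly:
x = "my_test"
x[-1] # => "t"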