Ruby - Check the length of each word and group them

I can't find the methods used to check the length of each word and group them per length.
arr = ["john","roger","matt","john", "james", "Jennifer"]
The method should return:
There are 3 names with 4 characters
There are 2 names with 5 characters
There is 1 name with 8 characters
I tried this one and it's working
arr.group_by(&:length).transform_values(&:count)
Thank you

this one will do
arr.map(&:size).tally
but you need Ruby versions >= 2.7
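For Rubies older than 2.7, where tally is not available, the same counts can be built by hand. This is only a minimal sketch using the arr from the question; length_counts is an illustrative name, not something from the original posts.

```ruby
arr = ["john", "roger", "matt", "john", "james", "Jennifer"]

# Hash.new(0) makes every missing key start at 0, so += 1 works directly.
length_counts = arr.each_with_object(Hash.new(0)) do |name, counts|
  counts[name.length] += 1
end

p length_counts
```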

each_with_object is your key.
arr = %w[john roger matt john james Jennifer]
res =
arr.each_with_object({}) do |name, obj|
obj[name.length] ||= []
obj[name.length].push(name)
end
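To get from the grouped hash to the sentences the question asks for, something along these lines might do. It is a sketch only: the variable name sentences is made up here, and the wording of the output is assumed from the question.

```ruby
arr = %w[john roger matt john james Jennifer]

# Group the names by length, then turn each group into one sentence.
sentences = arr.group_by(&:length).map do |length, names|
  verb = names.size == 1 ? "is" : "are"
  noun = names.size == 1 ? "name" : "names"
  "There #{verb} #{names.size} #{noun} with #{length} characters"
end

puts sentences
```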

Related

Space before first word after .join array to string

To convert array to string I used Array#join and got space between the
beginning of the string, first quote mark, and the first word. I do not understand why this is happening.
I resolved with String#strip but I would like to understand
def order(words)
arr_new = []
arr = words.split(" ")
nums = ["1","2","3","4","5","6","7","8","9"]
arr.each do |word|
nums.each do |num|
if word.include? num
arr_new[num.to_i] = word
end
end
end
arr_new.join(" ").strip
end
order("is2 Thi1s T4est 3a")
Without .strip the output is:
" Thi1s is2 3a T4est"
After .strip:
"Thi1s is2 3a T4est"
The reason you're seeing the extra space is that arrays in Ruby are 0-indexed, so you end up with a nil element at index 0 because your first insert is at index 1
x = []
x[1] = "test"
This creates an array as such:
[
nil,
"test"
]
If you created an empty array named x and assigned x[10] = "test" you'd have 10 nil values, and the word "test" in your array.
So, your array, before joining, is actually:
[nil, "Thi1s", "is2", "3a", "T4est"]
You have a couple options:
Change your strings to start with zero
Change your assignment to adjust the offset (subtract one)
Use compact before you join (this will remove nils)
Use strip as you noted
I'd suggest compact because it would address a few edge cases (such as "gaps" in your numbers).
More info in the array docs
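A small sketch of the nil padding and of the compact option follows. order_compact is a hypothetical variant of the question's method, written here only for illustration; it places each word by the first digit 1-9 it contains.

```ruby
x = []
x[3] = "test"
p x          # the three skipped slots are filled with nil

# Hypothetical variant of the question's method that uses compact.
def order_compact(words)
  slots = []
  words.split.each do |word|
    digit = word[/[1-9]/]          # first digit 1-9 in the word, or nil
    slots[digit.to_i] = word if digit
  end
  slots.compact.join(" ")          # compact drops every nil, including gaps
end

p order_compact("is2 Thi1s T4est 3a")
p order_compact("b5 a2")
```

The second call shows the "gaps" case: positions 1, 3 and 4 are never filled, and compact removes them along with index 0.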
@Jay's explanation is indeed correct.
I'll simply suggest a cleaner version of your code that doesn't have the same problem.
This assumes the words are always ordered by the digits 1-9. It wouldn't work if you wanted to sort by arbitrary characters, for example.
def order(words)
words.split.sort_by { |word| word[/\d/].to_i }.join ' '
end
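For readers unfamiliar with the word[/\d/] idiom, here is the same method again with the intermediate step printed. Nothing new is assumed beyond the input string from the question.

```ruby
def order(words)
  words.split.sort_by { |word| word[/\d/].to_i }.join(' ')
end

p "T4est"[/\d/]          # String#[] with a regexp returns the first match
p order("is2 Thi1s T4est 3a")
```

Because the words are sorted rather than placed at an index, no nil slot is ever created, so there is nothing to strip.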

Counting words in Ruby with some exceptions

Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
total_words = 0
text.each_line do |line|
  total_words += line.split.reject do |word|
    word.match(/\A(\d+|\w|\S*@\S+\.\S+)\z/)
  end.length
end
Roughly speaking that defines an approximate email address.
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
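Wrapped in a method, the idea could look roughly like this. count_words, SKIP and the sample text are names and data invented for this sketch, and the email part of the pattern (using @ as the separator) is only an approximation.

```ruby
# Hypothetical helper: counts words, skipping numbers, single letters
# and anything that roughly looks like an email address.
SKIP = /\A(\d+|\w|\S*@\S+\.\S+)\z/

def count_words(text)
  text.each_line.sum do |line|
    line.split.count { |word| !word.match?(SKIP) }
  end
end

sample = "I have 2 cats\nmail me at joe@example.com today\n"
p count_words(sample)
```

Note that a word with trailing punctuation, such as "2," or "a.", would not match the pattern and would still be counted.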
Assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line { |line| totalWords = totalWords + line.split.count { |wrd| wrd !~ regex_variable } }
Your regular expression could look something like:
regex_variable = /\A\d+\z|\A[a-z]\z|\A[^@\s]+@(?:[-a-z0-9]+\.)+[a-z]{2,}\z/i
I don't claim to be a regex expert, so you may want to double check that, particularly the email validation part
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count individual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
  refine String do
    def count_words
      remove_punctuation.split.count { |w|
        !(w.is_number? || w.size == 1 || w.is_email_address?) }
    end
    def remove_punctuation
      gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
    end
    def is_number?
      self =~ /\A-?\d+(?:\.\d+)?\z/
    end
    def is_email_address?
      include?('@') # for testing only
    end
  end
end
module CountWords
  using StringHelpers
  def self.count_words_in_file(fname)
    IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
  end
end
Note that using is called inside a module here (a class works as well). In current Ruby versions it can also be called at the top level of a file, where the refinement stays active until the end of that file, but it cannot be called inside a method, and some interactive sessions do not support it.
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
  using StringHelpers
  s = "You can reach my dog, a 10-year-old golden, at fido@dogs.org."
  p s = s.remove_punctuation
  #=> "You can reach my dog a 10 year old golden at fido@dogs.org "
  p words = s.split
  #=> ["You", "can", "reach", "my", "dog", "a", "10",
  #    "year", "old", "golden", "at", "fido@dogs.org"]
  p '123'.is_number? #=> 0
  p '-123'.is_number? #=> 0
  p '1.23'.is_number? #=> 0
  p '123.'.is_number? #=> nil
  p "fido@dogs.org".is_email_address? #=> true
  p "fido(at)dogs.org".is_email_address? #=> false
  p s.count_words #=> 9 (`'a'`, `'10'` and "fido@dogs.org" excluded)
  s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
  p s = s.remove_punctuation
  p s.count_words
end
All looks OK. Next, I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido@dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
#    You can reach my dog, a 10-year-old golden, at fido@dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in each line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
  return false if word.size < 2
  return false if word =~ /\A[0-9]+\z/
  return false if is_an_email_address(word) # you need a gem for this...
  true
end
wordCount = text.split.count { |word| is_countable(word) }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
line.split.each{|word| wordCount += 1 if is_countable(word) }
end

Find the longest substring in a string

I would like to find the longest sequence of repeated characters in a string.
ex:
"aabbccc" #=> ccc
"aabbbddccdddd" #=> dddd
etc
In the first example, ccc is the longest sequence because c is repeated 3 times. In the second example, dddd is the longest sequence because d is repeated 4 times.
It should be something like this:
b = []
a.scan(/(.)(.)(.)/) do |x,y,z|
b<<x<<y<<z if x==y && y==z
end
but with some flags to keep the count of repeating, I guess
This should work:
string = 'aabbccc'
string.chars.chunk {|a| a}.max_by {|_, ary| ary.length}.last.join
Update:
Explanation of |_, ary|: at this point we have an array of 2-element arrays. We only need the second element of each pair and ignore the first. If we wrote |char, ary| instead, some IDEs would complain about an unused local variable. Naming it _ signals that the value is intentionally unused.
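To make the intermediate structure visible, the one-liner can be split into two steps. This is just an unrolled sketch of the same expression; runs and longest are illustrative names.

```ruby
string = 'aabbccc'

# chunk groups consecutive equal characters into [key, members] pairs.
runs = string.chars.chunk { |c| c }.to_a
p runs

# max_by compares the member arrays; _ marks the key as unused.
longest = runs.max_by { |_, ary| ary.length }.last.join
p longest
```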
Using regex:
We can achieve same thing with regex:
string.scan(/([a-z])(\1*)/).map(&:join).max_by(&:length)
Here's a solution using a regular expression:
LETTER_MATCH = Regexp.new(('a'..'z').collect do |letter|
"#{letter}+"
end.join('|'))
def repeated(string)
string.scan(LETTER_MATCH).sort_by(&:length).last
end
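Both regex versions above are limited to the letters a-z. If other characters can occur, a backreference on any character is one possible alternative. longest_run is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper: longest run of any repeated character.
def longest_run(string)
  # Group 1 is the whole run, group 2 the single character that repeats.
  string.scan(/((.)\2*)/).map(&:first).max_by(&:length)
end

p longest_run("aabbccc")
p longest_run("aabbbddccdddd")
p longest_run("11222-3")
```

For an empty string, scan finds nothing and the method returns nil.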
Here's another solution. It's bigger, but it still works:
def most_friquent_char_in_a_row(my_str)
my_str = my_str.chars
temp=[]
ary=[]
for i in 0..my_str.count-1
if my_str[i]==my_str[i+1]
temp<<my_str[i] unless temp.include?(my_str[i])
temp<<my_str[i+1]
else
ary<<temp
temp=[]
end
end
result = ary.max_by(&:size).join
p "#{result} - #{result.size}"
end

Find out which words in a large list occur in a small string

I have a static 'large' list of words, about 300-500 words, called 'list1'
given a relatively short string str of about 40 words, what is the fastest method in ruby to get:
the number of times a word in list1 occurs in str (counting multiple occurrences)
a list of which words in list1 occur one or more times in the string str
the number of words in (2)
'Occuring' in str means either as a whole word in str, or as a partial within a word in str. So if 'fred' is in list1 and str contained 'fred' and 'freddie' that would be two matches.
Everything is lowercase, so any matching does not have to care about case.
For example:
list1 ="fred sam sandy jack sue bill"
str = "and so sammy went with jack to see fred and freddie"
so str contains sam, jack, fred (twice)
for part (1) the expression would return 4 (sam+jack+fred+fred)
for part (2) the expression would return "sam jack fred"
and part (3) is 3
The 'ruby way' to do this eludes me after 4 hours... with iteration it's easy enough (but slow). Any help would be appreciated!
Here's my shot at it:
def match_freq(exprs, strings)
rs, ss, f = exprs.split.map{|x|Regexp.new(x)}, strings.split, {}
rs.each{|r| ss.each{|s| f[r] = f[r] ? f[r]+1 : 1 if s=~r}}
[f.values.inject(0){|a,x|a+x}, f, f.size]
end
list1 = "fred sam sandy jack sue bill"
str = "and so sammy went with jack to see fred and freddie"
x = match_freq(list1, str)
x # => [4, {/sam/=>1, /fred/=>2, /jack/=>1}, 3]
The output of "match_freq" is an array of your output items (a,b,c). The algorithm itself is O(n*m) where n is the number of items in list1 and m is the size of the input string, I don't think you can do better than that (in terms of big-oh). But there are smaller optimizations that might pay off like keeping a separate counter for the total number of matches instead of computing it afterwards. This was just my quick hack at it.
You can extract just the matching words from the output as follows:
matches = x[1].keys.map{|x|x.source}.join(" ") # => "sam fred jack"
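Note that on Ruby 1.8 the order won't necessarily be preserved; if that's important you'll have to keep a separate list of the order in which the words were found. Since Ruby 1.9, hashes keep insertion order, so the keys come out in the order of list1.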
Here's an alternative implementation, for your edification:
def match_freq( words, str )
words = words.split(/\s+/)
counts = Hash[ words.map{ |w| [w,str.scan(w).length] } ]
counts.delete_if{ |word,ct| ct==0 }
occurring_words = counts.keys
[
counts.values.inject(0){ |sum,ct| sum+ct }, # Sum of counts
occurring_words,
occurring_words.length
]
end
list1 = "fred sam sandy jack sue bill"
str = "and so sammy went with jack to see fred and freddie"
x = match_freq(list1, str)
p x #=> [4, ["fred", "sam", "jack"], 3]
Note that if I needed this data I would probably just return the 'counts' hash from the method and then do whatever analysis I wanted on it. If I was going to return multiple 'values' from an analysis method, I might return a Hash of named values. Although, returning an array allows you to unsplat the results:
hits, words, word_count = match_freq(list1, str)
p hits, words, word_count
#=> 4
#=> ["fred", "sam", "jack"]
#=> 3
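The "Hash of named values" idea mentioned above could be sketched as follows. match_stats and the key names hits, words and word_count are assumptions made for this illustration, not part of the original answer.

```ruby
# Hypothetical variant that returns named values instead of an array.
def match_stats(words, str)
  counts = words.split.map { |w| [w, str.scan(w).length] }.to_h
  counts.reject! { |_word, ct| ct.zero? }
  { hits: counts.values.sum, words: counts.keys, word_count: counts.size }
end

list1 = "fred sam sandy jack sue bill"
str   = "and so sammy went with jack to see fred and freddie"
stats = match_stats(list1, str)
p stats
```

The caller can then read stats[:hits] or stats[:words] without having to remember the position of each value.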
For faster regular expressions, use https://github.com/mudge/re2. It is a ruby wrapper for Google re2 https://code.google.com/p/re2/

How to generate a random string in Ruby

I'm currently generating an 8-character pseudo-random uppercase string for "A" .. "Z":
value = ""; 8.times{value << (65 + rand(25)).chr}
but it doesn't look clean, and it can't be passed as an argument since it isn't a single statement. To get a mixed-case string "a" .. "z" plus "A" .. "Z", I changed it to:
value = ""; 8.times{value << ((rand(2)==1?65:97) + rand(25)).chr}
but it looks like trash.
Does anyone have a better method?
(0...8).map { (65 + rand(26)).chr }.join
I spend too much time golfing.
(0...50).map { ('a'..'z').to_a[rand(26)] }.join
And a last one that's even more confusing, but more flexible and wastes fewer cycles:
o = [('a'..'z'), ('A'..'Z')].map(&:to_a).flatten
string = (0...50).map { o[rand(o.length)] }.join
If you want to generate some random text then use the following:
50.times.map { (0...(rand(10))).map { ('a'..'z').to_a[rand(26)] }.join }.join(" ")
This code generates a string of 50 random words, each fewer than 10 characters long, joined with spaces.
Why not use SecureRandom?
require 'securerandom'
random_string = SecureRandom.hex
# outputs: 5b5cd0da3121fc53b4bc84d0c8af2e81 (i.e. 32 chars of 0..9, a..f)
SecureRandom also has methods for:
base64
random_bytes
random_number
see: http://ruby-doc.org/stdlib-1.9.2/libdoc/securerandom/rdoc/SecureRandom.html
I use this for generating random URL friendly strings with a guaranteed maximum length:
string_length = 8
rand(36**string_length).to_s(36)
It generates random strings of lowercase a-z and 0-9. It's not very customizable but it's short and clean.
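Because a small random number converts to fewer base-36 digits, the result can be shorter than string_length. If an exact length is needed, left-padding is one option. random_base36 is a hypothetical helper name for this sketch.

```ruby
# Hypothetical helper: always returns exactly `length` characters.
def random_base36(length = 8)
  # to_s(36) can come out shorter when the random number is small,
  # so pad on the left with "0".
  rand(36**length).to_s(36).rjust(length, "0")
end

p random_base36
p random_base36(12)
```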
This solution generates a string of easily readable characters for activation codes; I didn't want people confusing 8 with B, 1 with I, 0 with O, L with 1, etc.
# Generates a random string from a set of easily readable characters
def generate_activation_code(size = 6)
charset = %w{ 2 3 4 6 7 9 A C D E F G H J K M N P Q R T V W X Y Z}
(0...size).map{ charset.to_a[rand(charset.size)] }.join
end
Others have mentioned something similar, but this uses the URL safe function.
require 'securerandom'
p SecureRandom.urlsafe_base64(5) #=> "UtM7aa8"
p SecureRandom.urlsafe_base64 #=> "UZLdOkzop70Ddx-IJR0ABg"
p SecureRandom.urlsafe_base64(nil, true) #=> "i0XQ-7gglIsHGV2_BNPrdQ=="
The result may contain A-Z, a-z, 0-9, “-” and “_”. “=” is also used if padding is true.
Since Ruby 2.5, it's really easy with SecureRandom.alphanumeric:
len = 8
SecureRandom.alphanumeric(len)
=> "larHSsgL"
It generates random strings containing A-Z, a-z and 0-9 and should therefore be applicable in most use cases. The strings also come from a cryptographically secure random source, which might be a benefit, too.
This is a benchmark to compare it with the solution having the most upvotes:
require 'benchmark'
require 'securerandom'
len = 10
n = 100_000
Benchmark.bm(12) do |x|
x.report('SecureRandom') { n.times { SecureRandom.alphanumeric(len) } }
x.report('rand') do
o = [('a'..'z'), ('A'..'Z'), (0..9)].map(&:to_a).flatten
n.times { (0...len).map { o[rand(o.length)] }.join }
end
end
                   user     system      total        real
SecureRandom   0.429442   0.002746   0.432188 (  0.432705)
rand           0.306650   0.000716   0.307366 (  0.307745)
So the rand solution only takes about 3/4 of the time of SecureRandom. That might matter if you generate a lot of strings, but if you just create some random string from time to time I'd always go with the more secure implementation since it is also easier to call and more explicit.
[*('A'..'Z')].sample(8).join
Generate a random 8 letter string (e.g. NVAYXHGR)
([*('A'..'Z'),*('0'..'9')]-%w(0 1 I O)).sample(8).join
Generate a random 8 character string (e.g. 3PH4SWF2), excludes 0/1/I/O. Ruby 1.9
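One property of sample(8) worth knowing: it picks distinct elements, so no character appears twice in the result. The sketch below contrasts that with sampling one element at a time; the variable names are illustrative.

```ruby
letters = [*('A'..'Z')]

# sample(8) picks 8 distinct elements, so no letter appears twice.
no_repeats = letters.sample(8).join

# Sampling one element at a time allows repeats.
with_repeats = Array.new(8) { letters.sample }.join

p no_repeats
p with_repeats
```

Whether repeats should be allowed depends on the use case; excluding them reduces the number of possible strings.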
I can't remember where I found this, but it seems like the best and the least process intensive to me:
def random_string(length=10)
chars = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ0123456789'
password = ''
length.times { password << chars[rand(chars.size)] }
password
end
require 'securerandom'
SecureRandom.urlsafe_base64(9)
If you want a string of specified length, use:
require 'securerandom'
randomstring = SecureRandom.hex(n)
It will generate a random string of length 2n containing 0-9 and a-f
Array.new(n){[*"0".."9"].sample}.join,
where n=8 in your case.
Generalized: Array.new(n){[*"A".."Z", *"0".."9"].sample}.join, etc.
From: "Generate pseudo random string A-Z, 0-9".
Here is one line simple code for random string with length 8:
random_string = ('0'..'z').to_a.shuffle.first(8).join
You can also use it for random password having length 8:
random_password = ('0'..'z').to_a.shuffle.first(8).join
require 'digest/sha1'
srand
seed = "--#{rand(10000)}--#{Time.now}--"
Digest::SHA1.hexdigest(seed)[0,8]
Ruby 1.9+:
ALPHABET = ('a'..'z').to_a
#=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
10.times.map { ALPHABET.sample }.join
#=> "stkbssowre"
# or
10.times.inject('') { |s| s + ALPHABET.sample }
#=> "fdgvacnxhc"
Be aware: rand is predictable for an attacker and therefore probably insecure. You should definitely use SecureRandom if this is for generating passwords. I use something like this:
require 'securerandom'
length = 10
characters = ('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a
password = SecureRandom.random_bytes(length).each_char.map do |char|
  characters[(char.ord % characters.length)]
end.join
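One caveat with the modulo approach: a byte has 256 possible values and 256 is not a multiple of 62, so the first 8 characters of the list come up slightly more often than the rest. A sketch that avoids this by asking SecureRandom for an index directly is shown below; CHARACTERS and secure_password are names invented for the example.

```ruby
require 'securerandom'

CHARACTERS = [*'A'..'Z', *'a'..'z', *'0'..'9'].freeze

# Hypothetical helper; random_number(n) returns an integer in 0...n,
# so each character is equally likely.
def secure_password(length = 10)
  Array.new(length) { CHARACTERS[SecureRandom.random_number(CHARACTERS.size)] }.join
end

p secure_password
```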
Here is one simple code for random password with length 8:
rand_password=('0'..'z').to_a.shuffle.first(8).join
Another method I like to use:
rand(2**256).to_s(36)[0..7]
Add ljust if you are really paranoid about the correct string length:
rand(2**256).to_s(36).ljust(8,'a')[0..7]
SecureRandom.base64(15).tr('+/=lIO0', 'pqrsxyz')
Something from Devise
I think this is a nice balance of conciseness, clarity and ease of modification.
characters = ('a'..'z').to_a + ('A'..'Z').to_a
# Prior to 1.9, use .choice, not .sample
(0...8).map{characters.sample}.join
Easily modified
For example, including digits:
characters = ('a'..'z').to_a + ('A'..'Z').to_a + (0..9).to_a
Uppercase hexadecimal:
characters = ('A'..'F').to_a + (0..9).to_a
For a truly impressive array of characters:
characters = (32..126).to_a.pack('U*').chars.to_a
Just adding my cents here...
def random_string(length = 8)
rand(32**length).to_s(32)
end
This solution needs an external dependency, but seems prettier than the others.
Install gem faker
Faker::Lorem.characters(10) # => "ang9cbhoa8"
You can use String#random from the Facets of Ruby Gem facets.
It basically does this:
class String
def self.random(len=32, character_set = ["A".."Z", "a".."z", "0".."9"])
characters = character_set.map { |i| i.to_a }.flatten
characters_len = characters.length
(0...len).map{ characters[rand(characters_len)] }.join
end
end
My favorite is (:A..:Z).to_a.shuffle[0,8].join. Note that shuffle requires Ruby > 1.9.
Given:
chars = [*('a'..'z'),*('0'..'9')].flatten
Single expression, can be passed as an argument, allows duplicate characters:
Array.new(len) { chars.sample }.join
I was doing something like this recently to generate an 8 byte random string from 62 characters. The characters were 0-9,a-z,A-Z. I had an array of them and was looping 8 times, picking a random value out of the array each time. This was inside a Rails app.
str = ''
8.times {|i| str << ARRAY_OF_POSSIBLE_VALUES[rand(SIZE_OF_ARRAY_OF_POSSIBLE_VALUES)] }
The weird thing is that I got a good number of duplicates. By chance alone this should pretty much never happen. 62^8 is huge, but out of 1200 or so codes in the db I had a good number of duplicates. I noticed them happening on hour boundaries of each other. In other words I might see a duplicate at 12:12:23 and 2:12:22 or something like that...not sure if time is the issue or not.
This code was in the before create of an ActiveRecord object. Before the record was created this code would run and generate the 'unique' code. Entries in the DB were always produced reliably, but the code (str in the above line) was being duplicated much too often.
I created a script to run through 100000 iterations of this above line with small delay so it would take 3-4 hours hoping to see some kind of repeat pattern on an hourly basis, but saw nothing. I have no idea why this was happening in my Rails app.
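Whatever the cause of the duplicates, one defensive option is to check each generated code and retry on a collision. The sketch below is an assumption-heavy illustration: unique_code is a made-up helper, and the Set called taken merely stands in for whatever uniqueness check the application has (for instance a database lookup). It also uses SecureRandom.alphanumeric, which needs Ruby 2.5 or later.

```ruby
require 'securerandom'
require 'set'

# Hypothetical helper: keeps generating until the code is not in `taken`.
# `taken` stands in for whatever uniqueness check the application has.
def unique_code(taken, length = 8)
  loop do
    code = SecureRandom.alphanumeric(length)
    return code if taken.add?(code)   # add? returns nil when already present
  end
end

taken = Set.new
codes = Array.new(1_000) { unique_code(taken) }
p codes.uniq.size
```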
We've been using this on our code:
class String
def self.random(length=10)
('a'..'z').sort_by {rand}[0,length].join
end
end
The maximum length supported is 26, since each letter is used at most once (we're only using it with the default anyway, so it hasn't been a problem).
Someone mentioned that 'a'..'z' is suboptimal if you want to completely avoid generating offensive words. One of the ideas we had was removing vowels, but you still end up with WTFBBQ etc.
With this method you can pass in an arbitrary length. The default is 6.
def generate_random_string(length=6)
string = ""
chars = ("A".."Z").to_a
length.times do
string << chars[rand(chars.length)]
end
string
end
I like Radar's answer best, so far, I think. I'd tweak a bit like this:
CHARS = ('a'..'z').to_a + ('A'..'Z').to_a
def rand_string(length=8)
s=''
length.times{ s << CHARS[rand(CHARS.length)] }
s
end
''.tap {|v| 4.times { v << ('a'..'z').to_a.sample} }
My 2 cents:
def token(length=16)
chars = [*('A'..'Z'), *('a'..'z'), *(0..9)]
(0...length).map {chars.sample}.join
end

Resources