I'm looking to split a random numeric string like "12345567" into the array ["12","345","567"] as simply as possible, basically turning a number into a human-readable array of groups split at thousands, millions, billions, etc.
My previous solution cuts it from the front rather than the back:
"12345567".to_s.scan(/.{1,#{3}}/)
#=> ["123","455","67"]
If you are on Rails, you can use the number_with_delimiter helper. In plain Ruby, you can include it.
require 'action_view'
require 'action_view/helpers'
include ActionView::Helpers::NumberHelper
number_with_delimiter("12345567", :delimiter => ',')
# => "12,345,567"
You can then split the result on the comma to get an Array.
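If pulling in ActionView is not an option, the same format-then-split idea can be sketched in plain Ruby. This is only an illustration: `with_delimiter` is a hypothetical helper name, not part of Rails, and it assumes the input consists solely of digits.

```ruby
# Hypothetical helper (not a Rails API): insert a delimiter between
# groups of three digits, counting from the right.
def with_delimiter(digits, delimiter = ",")
  # A digit gets a delimiter after it when everything that follows,
  # up to the end of the string, is a whole number of three-digit groups.
  digits.to_s.gsub(/(\d)(?=(?:\d{3})+\z)/) { "#{$1}#{delimiter}" }
end

formatted = with_delimiter("12345567") # "12,345,567"
groups = formatted.split(",")          # ["12", "345", "567"]
```

Splitting on the delimiter afterwards gives the array the question asks for.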
You could try the below.
> "12345567".scan(/\d+?(?=(?:\d{3})*$)/)
=> ["12", "345", "567"]
The pattern lazily matches one or more digits, on the condition that they are followed by zero or more groups of exactly three digits and then the end of the line. Piece by piece:
\d+? does a non-greedy match of one or more digits.
(?=..) is a positive lookahead assertion, which asserts that the match must be followed by:
(?:\d{3})* exactly three digits, zero or more times. So this matches an empty string, 111, 111111, or any other run of digits whose length is a multiple of 3.
$ the end-of-line anchor, which matches the boundary at the end of the line.
OR
> "12345567".scan(/.{1,3}(?=(?:.{3})*$)/)
=> ["12", "345", "567"]
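To see how the lookahead behaves on different lengths, here is a small sketch. `group_digits` and `GROUPS` are names made up for this illustration, and `\z` (end of string) is swapped in for `$` on the assumption that the input is a single line.

```ruby
# Lazily take digits until what remains is a multiple of three digits.
GROUPS = /\d+?(?=(?:\d{3})*\z)/

# Hypothetical wrapper around String#scan for the pattern above.
def group_digits(digits)
  digits.scan(GROUPS)
end

one   = group_digits("1")        # ["1"]
three = group_digits("123")      # ["123"]
four  = group_digits("1234")     # ["1", "234"]
eight = group_digits("12345567") # ["12", "345", "567"]
```

Only the first group can be shorter than three digits, because every later match starts where the remaining length is already a multiple of three.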
Here's one non-regex solution:
s = "12345567"
sz = s.size
n_first = sz % 3
((n_first>0) ? [s[0,n_first]] : []) + (n_first...sz).step(3).map { |i| s[i,3] }
#=> ["12", "345", "567"]
Another:
s.reverse.chars.each_slice(3).map { |a| a.join.reverse }.reverse
#=> ["12", "345", "567"]
A recursive approach:
def split(str)
str.size <= 3 ? [str] : (split(str[0..-4]) + [str[-3..-1]])
end
Hardly readable, though. Perhaps a more explicit code layout:
def split(str)
if str.size <= 3 then
[str] # Too short, keep it all.
else
split(str[0..-4]) + [str[-3..-1]] # Append the last 3, and recurse on the head.
end
end
Disclaimer: No test whatsoever on performance (or attempt to go for a clear tail recursion)! Just an alternative to explore.
It's hard to tell what you want, but maybe:
"12345567".scan(/^..|.{1,3}/)
=> ["12", "345", "567"]
Related
Given a sentence, I want to count all the duplicated words:
It is the Word Count exercise from Exercism.io.
For example, for the input "olly olly in come free":
olly: 2
in: 1
come: 1
free: 1
For example, I have this test:
def test_with_quotations
phrase = Phrase.new("Joe can't tell between 'large' and large.")
counts = {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
assert_equal counts, phrase.word_count
end
This is my method:
def word_count
phrase = @phrase.downcase.split(/\W+/)
counts = phrase.group_by{|word| word}.map {|k,v| [k, v.count]}
Hash[*counts.flatten]
end
For the test above I have this failure when I run it in the terminal:
2) Failure:
PhraseTest#test_with_apostrophes [word_count_test.rb:69]:
--- expected
+++ actual
@@ -1 +1 @@
-{"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
+{"first"=>1, "don"=>2, "t"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
My problem is to remove all characters except the apostrophe.
The regex in the method almost works:
phrase = @phrase.downcase.split(/\W+/)
but it removes the apostrophes too.
I don't want to keep the single quotes around a word: 'Hello' => Hello
but: Don't be cruel => Don't be cruel
Maybe something like:
string.scan(/\b[\w']+\b/i).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
The regex employs word boundaries (\b).
scan returns an array of the words it finds. Each word is then added to a hash whose default value is zero, and its count is incremented.
It turns out that my solution, while it finds all the words regardless of case, still leaves each one in the case it was originally found in.
It is now a decision for Nelly whether to accept that as is, or to downcase either the original string or each word as it is added to the hash.
I'll leave that decision up to you :)
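For reference, here is a minimal sketch of the "downcase the original string" option. The `word_count` method below is a free-standing illustration, not the `Phrase#word_count` from the exercise.

```ruby
# Count words case-insensitively, keeping apostrophes inside words
# but not the single quotes wrapped around a word.
def word_count(text)
  text.downcase
      .scan(/\b[\w']+\b/)
      .each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

counts = word_count("Joe can't tell between 'large' and large.")
# {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
```

The surrounding quotes drop out because `\b` only matches next to a word character, so a match can neither start nor end on a quote.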
Given:
irb(main):015:0> phrase
=> "First: don't laugh. Then: don't cry."
Try:
irb(main):011:0> Hash[phrase.downcase.scan(/[a-z']+/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
With your update, if you want to remove single quotes, do that first:
irb(main):038:0> p2
=> "Joe can't tell between 'large' and large."
irb(main):039:0> p2.gsub(/(?<!\w)'|'(?!\w)/,'')
=> "Joe can't tell between large and large."
Then use the same method.
But, you say, gsub(/(?<!\w)'|'(?!\w)/,'') will also remove the apostrophe in 'Twas the night before. To which I reply: if /(?<!\w)'|'(?!\w)/ is not sufficient, you will eventually need to build a parser that can distinguish an apostrophe from a single quote.
You can also use word boundaries:
irb(main):041:0> Hash[p2.downcase.scan(/\b[a-z']+\b/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
But that does not solve 'Tis the night either.
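The behaviour of the quote-stripping pattern, including the leading-apostrophe limitation mentioned above, can be sketched like this. `strip_quotes` is a hypothetical helper name used only for the illustration.

```ruby
# Remove a single quote that is not preceded by a word character,
# or that is not followed by one. Quotes inside a word survive.
def strip_quotes(text)
  text.gsub(/(?<!\w)'|'(?!\w)/, "")
end

quoted  = strip_quotes("Joe can't tell between 'large' and large.")
# "Joe can't tell between large and large."
leading = strip_quotes("'Twas the night before")
# "Twas the night before" (the genuine apostrophe is lost too)
```

The pattern cannot tell an opening quote from a leading apostrophe, since both sit at the start of a word.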
Another way:
str = "First: don't 'laugh'. Then: 'don't cry'."
reg = /
[a-z] #single letter
[a-z']+ #one or more letters or apostrophe
[a-z] #single letter
'? #optional single apostrophe
/ix #case-insensitive and free-spacing regex
str.scan(reg).group_by(&:itself).transform_values(&:count)
#=> {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}
An anagram group is a group of words such that any one can be converted into any other just by rearranging the letters. For example, "rats", "tars" and "star" are an anagram group.
I have an array of words and I want to find the anagrams among them.
To do this I have written the following code.
It works for some words, like scar and cars, but it doesn't work
for [scar, carts].
temp=[]
words.each do |e|
temp=e.split(//) # make an array of letters
words.each do |z|
if z.match(/#{temp}/) # match to find scar and cars
puts "exp is True"
else
puts "exp is false"
end
end
end
My thinking was that since [abc] means a or b or c, I could split each word into letters and then look for the matching words in the array.
Your algorithm is incorrect and inefficient (quadratic time complexity). Why regex?
Here's another idea. Define the signature of a word such that all the letters of a word are sorted. For example, the signature of hello is ehllo.
By this definition, anagrams are words that have the same signature, for example, rats, tars and star all have the signature arst. The code to implement this idea is straight-forward.
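A minimal sketch of the signature idea follows. The names `signature` and `anagram_groups` are made up for this example, and downcasing is an added assumption so that "Rats" and "star" land in the same group.

```ruby
# The signature of a word is its letters in sorted order.
def signature(word)
  word.downcase.chars.sort.join
end

# Group words by signature and keep only groups with at least two members.
def anagram_groups(words)
  words.group_by { |word| signature(word) }
       .values
       .select { |group| group.size > 1 }
end

sig    = signature("hello")                           # "ehllo"
groups = anagram_groups(%w[rats tars star scar carts])
# [["rats", "tars", "star"]]
```

Each word is sorted once, so the grouping avoids comparing every pair of words.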
Two words are anagrams if they contain the same letters. There are several ways to figure out whether they do, the most obvious one is sorting the letters alphabetically. Then you want to separate the words into groups. Here's an idea:
words = %w[cats scat rats tars star scar cars carts]
words.group_by {|word| word.each_char.sort }.values
# => [['cats', 'scat'], ['rats', 'tars', 'star'], ['scar', 'cars'], ['carts']]
The problem is that /#{e.split(//)}/ here is pretty much nonsensical.
To illustrate this, let's see what happens:
word = 'wtf'
letters = word.split(//) # => ["w", "t", "f"]
regex = /#{letters}/ # => /["w", "t", "f"]/
'"' =~ regex # => 0
',' =~ regex # => 0
' ' =~ regex # => 0
't' =~ regex # => 0
Interpolating something into a regex replaces it with the result of its to_s method. And since a character set matches any single character listed inside it, you get a regex that matches a double quote, a comma, a space, or any of the letters in the original word.
Therefore, I will unfortunately call your solution unsalvageable.
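To make the interpolation point concrete, the sketch below prints what the regex source becomes. It also builds a properly formed character class, purely to show that even then a character class cannot test for anagrams. The variable names are illustrative only.

```ruby
letters = "scar".chars

# Array#to_s is interpolated, brackets, quotes, commas and all.
broken = /#{letters}/
broken_source = broken.source # ["s", "c", "a", "r"]

# A well-formed character class built from the same letters.
char_class = /\A[#{Regexp.escape(letters.join)}]+\z/

matches_cars = char_class.match?("cars") # true, and it is an anagram
matches_sass = char_class.match?("sass") # true, but it is NOT an anagram
```

A character class only checks that each character belongs to the set. It says nothing about how many times each letter occurs, which is what an anagram test needs.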
A very easy way to check if two words are anagrams is to sort their characters and see if the result is the same.
The faster way would be:
def is_anagram? w1, w2
w1.chars.sort == w2.chars.sort
end
You could also do something like this I suppose:
def is_anagram? w1, w2
w2 = w2.chars
w1.chars.permutation.to_a.include?(w2)
end
then run it like this:
is_anagram? "rats", "star"
=> true
Note:
This post has been edited as per Cary Swoveland's advice.
words = ['demo', 'none', 'tied', 'evil', 'dome', 'mode', 'live',
'fowl', 'veil', 'wolf', 'diet', 'vile', 'edit', 'tide',
'flow', 'neon']
groups = words.group_by { |word| word.split('').sort }
groups.each { |x, y| p y }
Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
text.each_line do |line|
total_words += line.split.reject do |word|
word.match(/\A(\d+|\w|\S*@\S+\.\S+)\z/)
end.length
end
Roughly speaking that defines an approximate email address.
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
Assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line { |line| totalWords = totalWords + line.split.count { |wrd| wrd !~ regex_variable } }
Your regular expression could look something like:
regex_variable = /\A\d+\z|\A[a-z]\z|\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
I don't claim to be a regex expert, so you may want to double check that, particularly the email validation part
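Putting the pieces together, a self-contained sketch might look like the following. `EXCLUDED` and `count_words` are hypothetical names, and the email part is only a rough approximation rather than a real validator.

```ruby
# Words that should not be counted: plain numbers, single letters,
# and things that look roughly like an email address.
EXCLUDED = /\A(?:\d+|[a-z]|[^@\s]+@(?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

def count_words(text)
  text.each_line.sum do |line|
    line.split.count { |word| word !~ EXCLUDED }
  end
end

total = count_words("I have 2 cats\nWrite to me at a@b.com today\n")
# 7: "I", "2" and "a@b.com" are skipped
```

Punctuation attached to a word (for example "2," or "a@b.com.") is not handled here and would need stripping first.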
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count individual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
refine String do
def count_words
remove_punctuation.split.count { |w|
!(w.is_number? || w.size == 1 || w.is_email_address?) }
end
def remove_punctuation
gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
end
def is_number?
self =~ /\A-?\d+(?:\.\d+)?\z/
end
def is_email_address?
include?('#') # for testing only
end
end
end
module CountWords
using StringHelpers
def self.count_words_in_file(fname)
IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
end
end
Note that using is called inside a module here. Ruby also permits using at the top level of a file, where the refinement stays active until the end of that file, but it cannot be called inside a method body. Keeping it inside a module limits the refined methods to that module's code, which is the point of Refinements.
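The scoping of a refinement can be checked with a tiny sketch. `Shouting`, `Greeter` and `shout` are names invented for this illustration and are unrelated to the helpers above.

```ruby
module Shouting
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Greeter
  using Shouting # active from here to the end of this module body

  def self.greet(name)
    name.shout
  end
end

inside = Greeter.greet("hi") # "HI!": the refined method is visible in Greeter

outside = begin
  "hi".shout # not visible out here
rescue NoMethodError
  :no_method
end
```

Outside `Greeter`, `String` is untouched, so the call raises `NoMethodError`.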
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
using StringHelpers
s = "You can reach my dog, a 10-year-old golden, at fido#dogs.org."
p s = s.remove_punctuation
#=> "You can reach my dog a 10 year old golden at fido#dogs.org "
p words = s.split
#=> ["You", "can", "reach", "my", "dog", "a", "10",
# "year", "old", "golden", "at", "fido#dogs.org"]
p '123'.is_number? #=> 0
p '-123'.is_number? #=> 0
p '1.23'.is_number? #=> 0
p '123.'.is_number? #=> nil
p "fido#dogs.org".is_email_address? #=> true
p "fido(at)dogs.org".is_email_address? #=> false
p s.count_words #=> 9 (`'a'`, `'10'` and "fido#dogs.org" excluded)
s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
p s = s.remove_punctuation
p s.count_words
end
All looks OK. Next, I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido#dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
# You can reach my dog, a 10-year-old golden, at fido#dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in each line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
return false if word.size < 2 # standalone letter
return false if word =~ /^[0-9]+$/ # number
return false if is_an_email_address(word) # you need a gem for this...
true
end
wordCount = text.split.inject(0) { |count, word| is_countable(word) ? count + 1 : count }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
line.split.each{|word| wordCount += 1 if is_countable(word) }
end
I tried to write a function which will be able to randomly change letters in word except first and last one.
def fun(string)
z=0
s=string.size
tab=string
a=(1...s-1).to_a.sample s-1
for i in 1...(s-1)
puts tab[i].replace(string[a[z]])
z=z+1
end
puts tab
end
fun("sample")
My output is:
p
l
a
m
sample
Does anybody know how to make my tab come out correctly?
It seems to change inside the for block, because the output was 'plamp', so it is random as I wanted, but when I print the whole word (splampe) it doesn't work. :(
What about:
def fun(string)
first, *middle, last = string.chars
[first, middle.shuffle, last].join
end
fun("sample") #=> "smalpe"
s = 'sample'
[s[0], s[1..-2].chars.shuffle, s[-1]].join
# => "slpmae"
Here is my solution:
def fun(string)
first = string[0]
last = string[-1]
middle = string[1..-2]
puts "#{first}#{middle.split('').shuffle.join}#{last}"
end
fun('sample')
There are some problems with your function. First, when you say tab=string, tab becomes a reference to the same object as string, so when you change characters in tab you change the characters of string too. I also think that, for clarity, it is better to keep the indexes from the sample (1...s-1) to reference positions in the original string.
I suggest making tab a new array.
def fun(string)
return string if string.length <= 2 # nothing to shuffle
z=1
s=string.size
tab = []
tab[0] = string[0]
a=(1...s-1).to_a.sample(s-1)
(1...s-1).to_a.each do |i|
tab[z] = string[a[i - 1]]
z=z+1
end
tab.push string[string.size-1]
tab.join('')
end
fun("sample")
=> "spalme"
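The aliasing issue described above (tab = string) can be seen in isolation with a short sketch. The variable names are illustrative, and `String.new` is used so the example also runs where string literals are frozen.

```ruby
original = String.new("sample")

same_object = original # no copy: both names point at one String
same_object[1] = "X"
after_alias = original.dup # snapshot: "sXmple", original changed too

copy = original.dup # an independent copy
copy[2] = "Y"
# original is still "sXmple", copy is "sXYple"
```

Assignment never copies a string in Ruby. Use `dup` (or build a new array, as suggested above) when the original must stay intact.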
Another way, using String#gsub with a block:
def inner_jumble(str)
str.gsub(/(?<=\w)\w{2,}(?=\w)/) { |s| s.chars.shuffle.join }
end
inner_jumble("pneumonoultramicroscopicsilicovolcanoconiosis") # *
#=> "poovcanaiimsllinoonroinuicclprsciscuoooomtces"
inner_jumble("what ho, fellow coders?")
#=> "waht ho, folelw coedrs?"
(?<=\w) is a ("zero-width") positive look-behind that requires the match to immediately follow a word character.
(?=\w) is a ("zero-width") positive look-ahead that requires the match to be followed immediately by a word character.
You could use \w\w+ in place of \w{2,} for matching two or more consecutive word characters.
gsub jumbles every word in the string, while sub jumbles only the first one.
*A lung disease caused by inhaling very fine ash and sand dust, supposedly the longest word in some English dictionaries.
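Because the output is random, one way to check the lookaround behaviour is to test properties rather than exact strings. The sketch below assumes a variant of `inner_jumble` with an optional `random:` keyword (an addition for reproducibility, not part of the answer above).

```ruby
# Shuffle the inner letters of every word, leaving the first and last
# word characters in place. The random: keyword is an added assumption.
def inner_jumble(text, random: Random.new)
  text.gsub(/(?<=\w)\w{2,}(?=\w)/) do |inner|
    inner.chars.shuffle(random: random).join
  end
end

source  = "what ho, fellow coders?"
jumbled = inner_jumble(source, random: Random.new(42))

# Every word keeps its first and last character and its set of letters.
pairs = source.split.zip(jumbled.split)
ends_kept    = pairs.all? { |a, b| a[0] == b[0] && a[-1] == b[-1] }
letters_kept = pairs.all? { |a, b| a.chars.sort == b.chars.sort }
```

Words shorter than four characters, such as "ho", are never touched, because the pattern needs at least two inner characters with a word character on either side.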
I have a string composed of words separated by '#', for instance 'this#is#an#example', and I need to extract the last word or the last two words, depending on the second-to-last word.
If the second-to-last word is 'myword', I need the last two words; otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second-to-last word? Perhaps using a regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
*, a, b = str.split('#')
if a == 'myword'
"#{a}##{b}"
else
b
end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"