Counting the number of digits and characters inside a string using regex in Ruby - ruby

How do I count the digits and the other characters inside a string using regex in Ruby?
If I have a case like this, how do I solve it?
Example:
abc = "12345678a"
After counting with regex, I want to get a result like this:
number = 8
char = 1
How can I do that?

Try the following:
abc = "12345678a"
abc.scan(/\d/).length
# => 8
abc.scan(/\D/).length
# => 1
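Note that \D matches every non-digit, so spaces and punctuation are counted as well. A minimal sketch of the difference, using a made-up sample string (the variable names are only for illustration):

```ruby
sample = "12 ab!"   # hypothetical input, not from the question

digits     = sample.scan(/\d/).length        # counts 1 and 2
non_digits = sample.scan(/\D/).length        # counts space, a, b and !
letters    = sample.scan(/[a-zA-Z]/).length  # counts a and b only

puts "digits=#{digits} non_digits=#{non_digits} letters=#{letters}"
# prints: digits=2 non_digits=4 letters=2
```

If only letters should count as "char", the character class is the safer choice.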

No regex:
abc = "12345678a"
p abc.count("0-9") # => 8
p abc.count("a-zA-Z") # => 1

This is another option, but I still think regex is better.
irb(main):051:0> number, char = abc.bytes.to_a.partition { |e| e >= 48 and e <= 57}
=> [[49, 50, 51, 52, 53, 54, 55, 56], [97]]
irb(main):053:0> number.count
=> 8
irb(main):054:0> char.count
=> 1
partition: Returns two arrays, the first containing the elements of enum for which the block evaluates to true, the second containing the rest.

Related

ruby delete hidden characters from entire file

I have a text file that contains the hidden characters below, which I found with sed.
\033[H\033[2J\033
When I open the file with vi, the above code shows up as follows:
^[[H^[[2J^[[H^[[2J
These hidden characters cause problems when I process the file. Is there any way to get rid of them in the entire file before processing it?
If the file is not too big, you can read its whole contents in and then remove all the escape sequences.
content = File.read('your_input_file_path')
content.gsub!(/\033\[(?:H|2J)/, '')
content.split(/\r?\n/).each do |line|
# process line
end
You can generalize the regex according to the pattern of the escape sequences. In your example, it seems to be \033[ followed by an optional digit and then a letter, so the regex can be updated to:
content.gsub!(/\033\[\d?[A-Z]/, '')
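If the file contains other terminal control sequences, a slightly broader pattern may help. The sketch below assumes the sequences have the shape ESC, "[", optional numeric parameters, then one letter (for example \e[1;31m); the constant and method names are made up for illustration.

```ruby
# Assumed shape: ESC, "[", optional digits/semicolons, then a single letter.
ESCAPE_SEQUENCE = /\e\[[\d;]*[A-Za-z]/

# Returns a copy of text with every matching escape sequence removed.
def strip_escape_sequences(text)
  text.gsub(ESCAPE_SEQUENCE, '')
end

sample = "\e[H\e[2Jhello \e[1;31mworld\e[0m"
puts strip_escape_sequences(sample)
# prints: hello world
```

A lone escape character that is not followed by "[" (like the trailing \033 in the question) is left untouched by this pattern.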
One way to remove non-printing characters with ASCII values less than 32 (" ".ord #=> 32) follows.
def remove_invisible(infile, outfile)
# Note: this also drops tabs (9), newlines (10) and carriage returns (13).
File.write(outfile,
File.read(infile).
codepoints.
reject { |n| n < 32 }.
map(&:chr).
join
)
end
Suppose File.read(infile) returns
str = "\033[H\033[2J\033"
#=> "\e[H\e[2J\e"
then
a = str.codepoints
#=> [27, 91, 72, 27, 91, 50, 74, 27]
b = a.reject { |n| n < 32 }
#=> [91, 72, 91, 50, 74]
c = b.map(&:chr)
#=> ["[", "H", "[", "2", "J"]
c.join
#=> "[H[2J"
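Because tab (9), newline (10) and carriage return (13) are also below 32, the approach above removes line breaks too. A variant that keeps them could look like the sketch below; the method name and the list of kept code points are assumptions for illustration.

```ruby
KEEP = [9, 10, 13].freeze   # tab, newline, carriage return

# Drops control characters below 32 except the ones listed in KEEP.
def remove_invisible_keep_newlines(str)
  str.codepoints
     .reject { |n| n < 32 && !KEEP.include?(n) }
     .pack("U*")            # rebuild a UTF-8 string from the code points
end

p remove_invisible_keep_newlines("a\tb\n\e[Hc")
# prints "a\tb\n[Hc"
```

As in the original version, only the escape character itself is removed; the visible remainder of each sequence ("[H") stays in the text.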

Bracket notation on Ruby numbers

I found that when using bracket notation on the number 100 in Ruby, I get this:
irb(main):001:0> 100[0]
=> 0
irb(main):002:0> 100[1]
=> 0
irb(main):003:0> 100[2]
=> 1
So I assumed it was getting the digits, indexed like this:
NUMBER: 1|0|0
-----
INDEX: 2|1|0
I tried this on the number 789 with unexpected results.
irb(main):004:0> 789[0]
=> 1
irb(main):005:0> 789[1]
=> 0
irb(main):006:0> 789[2]
=> 1
I would expect it to return 9, then 8, then 7 if it was getting the digits. From this result, that is clearly not happening, so what exactly does using bracket notation on a number do?
These are the binary bits that you're pulling off. Another way to see this is using to_s with an argument indicating the desired base.
>> 789.to_s(2)
=> "1100010101"
String indexing runs from left to right, so [] on the string does not line up with [] on the number, but note how (from right to left) the digits are 1, 0, 1.
Here are the docs if you're interested: http://ruby-doc.org/core-1.9.3/Fixnum.html#method-i-5B-5D
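To make the bit indexing concrete, here is a short sketch. It relies on Integer#digits, which is available in Ruby 2.4 and later, to get the decimal digits the question was expecting.

```ruby
n = 789

# n[i] is the i-th binary bit, counted from the least significant end.
bits = (0...n.bit_length).map { |i| n[i] }
p bits                  # => [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]
p bits.reverse.join     # => "1100010101", the same as n.to_s(2)

# Decimal digits, least significant first.
p n.digits              # => [9, 8, 7]
```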

How to convert more than one string variable to an integer and subtract 1?

I want to convert 8 strings to integers and subtract 1 from each. I wrote the code as:
foo1, foo2, too3 ... = foo1.to_i - 1, foo2.to_i - 1, foo3.to_i -1, ...
But I think it's too complex. Is there a better way to achieve this?
[:foo1, :foo2, ... etc. ...].each { |foo| eval "#{foo} = #{foo}.to_i - 1" }
That works if you've decided to do it this way, although it's a bad idea.
Putting them in an Array would be the simplest thing.
%w{ 1 2 3 4 5 6 7 8 }.map!(&:to_i).map!(&:pred)
=> [0, 1, 2, 3, 4, 5, 6, 7]
You should not use this: eval is dangerous and dynamic. I would recommend moving your values into a hash where you can control the keys. If you want to do literally what you asked:
(1..2).map {|n | eval("foo#{n}").to_i - 1}
Example:
> foo1 = 2
=> 2
> foo2 = 3
=> 3
> (1..2).map {|n | eval("foo#{n}").to_i - 1}
=> [1, 2]
... non-terrifying way to store/process these values:
> hash = { :foo1 => 2, :foo2 => 3 }
=> {:foo1=>2, :foo2=>3}
hash.keys.inject({}) { |h, key| h[key] = hash[key].to_i - 1; h }
=> {:foo1=>1, :foo2=>2}
When you are working with 8 variables and need to do the same operation on them it usually means that they are linked somehow and can be grouped, for example in a hash:
data = {:foo1 => "1", :foo2 => "2", ...}
data2 = Hash[data.map { |key, value| [key, value.to_i - 1] }]
Note that instead of updating the values in place I create a new object; a functional approach is usually clearer.
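On Ruby 2.4 and later, Hash#transform_values expresses the same idea more directly. A small sketch, with made-up sample values:

```ruby
data = { foo1: "1", foo2: "2", foo3: "10" }   # sample values are assumptions

# Builds a new hash with the same keys; the original hash is left untouched.
data2 = data.transform_values { |value| value.to_i - 1 }

p data2.values   # => [0, 1, 9]
p data.values    # => ["1", "2", "10"]
```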
Based on your comment on @Winfield's solution, this might be enough:
foo1 = "123"
foo2 = "222"
too3 = "0"
def one_less_int(*args) # the splat operator (*) collects all arguments into an array
args.map{|str| str.to_i-1}
end
p one_less_int(foo1, foo2, too3) #=> [122, 221, -1]
But putting everything in an array beforehand, as others are suggesting, is more explicit.

Anagrams Code Kata, Ruby Solution very slow

I've been having a play with Ruby recently and I've just completed the Anagrams Code Kata from http://codekata.pragprog.com.
The solution was test driven and utilises the unique prime factorisation theorem, however it seems to run incredibly slowly. Just on the 45k file it has been running for about 10 minutes so far. Can anyone give me any pointers on improving the performance of my code?
class AnagramFinder
def initialize
@words = self.LoadWordsFromFile("dict45k.txt")
end
def OutputAnagrams
hash = self.CalculatePrimeValueHash
@words.each_index{|i|
word = @words[i]
wordvalue = hash[i]
matches = hash.select{|key,value| value == wordvalue}
if(matches.length > 1)
puts("--------------")
matches.each{|key,value|
puts(@words[key])
}
end
}
end
def CalculatePrimeValueHash
hash = Hash.new
@words.each_index{|i|
word = @words[i]
value = self.CalculatePrimeWordValue(word)
hash[i] = value
}
hash
end
def CalculatePrimeWordValue(word)
total = 1
hash = self.GetPrimeAlphabetHash
word.downcase.each_char {|c|
value = hash[c]
total = total * value
}
total
end
def LoadWordsFromFile(filename)
contentsArray = []
f = File.open(filename)
f.each_line {|line|
line = line.gsub(/[^a-z]/i, '')
contentsArray.push line
}
contentsArray
end
def GetPrimeAlphabetHash
hash = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
end
end
Frederick Cheung has a few good points, but I thought I might provide you with a few descriptive examples.
I think your main problem is that you create your index in a way that forces you to do linear searches in it.
Your word list (@words) seems to look something like this:
[
"ink",
"foo",
"kin"
]
That is, it is just an array of words.
Then you create your hash index with CalculatePrimeValueHash, with hash keys being equal to the word's index in @words.
{
0 => 30659, # 23 * 43 * 31, matching "ink"
1 => 28717, # 13 * 47 * 47, matching "foo"
2 => 30659 # 31 * 23 * 43, matching "kin"
}
I would consider this a good start, but if you keep it like this, you will have to iterate through the hash to find which hash keys (i.e. indexes in @words) belong together, and then iterate through those to join them. That is, the basic problem here is that you do things at too fine a granularity.
If you instead were to build this hash with the prime values as hash keys, and have them point to an array of the words with that key, you would get a hash index like this instead:
{
30659 => ["ink", "kin"],
28717 => ["foo"]
}
With this kind of structure, all you have to do to write your output is iterate over the hash values and print them, since they are already grouped.
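A minimal sketch of building such an index. The prime table is written compactly here and the names are made up for illustration, but the letter-to-prime values match the ones in the question.

```ruby
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101].freeze
CHAR_PRIMES = ("a".."z").zip(PRIMES).to_h.freeze   # "a" => 2, "b" => 3, ...

# Product of the primes assigned to each letter; anagrams share the same product.
def prime_value(word)
  word.downcase.each_char.reduce(1) { |total, c| total * CHAR_PRIMES.fetch(c) }
end

# Prime value => list of words with that value.
index = Hash.new { |hash, key| hash[key] = [] }
%w[ink foo kin].each { |word| index[prime_value(word)] << word }

index.each_value { |group| puts group.join(", ") if group.length > 1 }
# prints: ink, kin
```

Each word is visited once while building the index, and the output loop touches each group once.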
Another thing about your code is that it seems to generate a whole bunch of throwaway objects, which will keep your garbage collector busy, and that is generally quite a big choke point in Ruby.
It might also be a good idea to find a benchmark tool and/or a profiler to analyze your code and see where it could be improved.
Fundamentally your code is slow because for each word (45k of them) you iterate over the entire hash (45k entries) looking for words with the same signature, so you're doing 45k * 45k of these comparisons. Another way of phrasing that is to say that your complexity is n^2 in the number of words.
The code below implements your basic idea but runs in a few seconds on the 236k word file I happen to have lying around. It could definitely be faster: the second pass over the data to find the groups with more than one item could be eliminated, but the code would be less readable.
It's also a lot shorter than your code, around a third, while staying readable, largely because I used more standard library functions and idiomatic ruby.
For example, the load_words method uses collect to turn one array into another, rather than iterating over one array and adding things to a second one. Similarly the signature function uses inject rather than iterating over the characters. Lastly I've used group_by to do the actual grouping. All of these methods happen to be in Enumerable - it's well worth becoming very familiar with these.
signature_for_word could become even pithier with
word.each_char.map {|c| CHAR_MAP[c.downcase]}.reduce(:*)
This takes the word, splits it into characters and then maps each one of those to the right number. reduce(:*) (reduce is an alias for inject) then multiplies them all together.
class AnagramFinder
CHAR_MAP ={ "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
def initialize
@words = load_words("/usr/share/dict/words")
end
def find_anagrams
words_by_signature = @words.group_by {|word| signature_for_word word}
words_by_signature.each do |signature, words|
if words.length > 1
puts '----'
puts words.join('; ')
end
end
end
def signature_for_word(word)
word.downcase.each_char.inject(1) {| total, c| total * CHAR_MAP[c]}
end
def load_words(filename)
File.readlines(filename).collect {|line| line.gsub(/[^a-z]/i, '')}
end
end
You can start narrowing down the slowness with the Benchmark module. Some examples here:
http://www.skorks.com/2010/03/timing-ruby-code-it-is-easy-with-benchmark/
First of all it would be interesting to see how long it takes to run self.calculate_prime_value_hash and after that the calculate_prime_word_value.
Quite often the slowness boils down to the number of times the inner loops are run, so you can also log how many times they are run.
One very quick improvement you can make is to define the prime alphabet hash as a constant, because it never changes:
PRIME_ALPHABET_HASH = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
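For instance, Benchmark.realtime from the standard library returns the elapsed seconds for a block. The sketch below times a quadratic comparison against group_by on a small made-up word list; it uses sorted letters as a stand-in signature rather than the prime product, purely to keep the example short.

```ruby
require 'benchmark'

words = %w[ink kin foo listen silent enlist] * 200   # made-up sample data
signature = ->(word) { word.chars.sort.join }        # stand-in signature

signatures = words.map(&signature)

quadratic = Benchmark.realtime do
  # every signature is compared against every other signature
  signatures.each { |s| signatures.count { |other| other == s } }
end

grouped = Benchmark.realtime do
  words.group_by(&signature)   # single pass over the words
end

puts format("quadratic: %.4fs, group_by: %.4fs", quadratic, grouped)
```

The exact timings depend on the machine, so no expected output is shown; the point is only to see how the two approaches scale as the word list grows.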

Ruby regex for a split every four characters not working

I'm trying to split a sizeable string every four characters. This is how I'm trying to do it:
big_string.split(/..../)
This is yielding a nil array. As far as I can see, this should be working. It even does when I plug it into an online ruby regex test.
Try scan instead:
$ irb
>> "abcd1234beefcake".scan(/..../)
=> ["abcd", "1234", "beef", "cake"]
or
>> "abcd1234beefcake".scan(/.{4}/)
=> ["abcd", "1234", "beef", "cake"]
If the number of characters isn't divisible by 4, you can also grab the remaining characters:
>> "abcd1234beefcakexyz".scan(/.{1,4}/)
=> ["abcd", "1234", "beef", "cake", "xyz"]
(The {1,4} will greedily grab between 1 and 4 characters)
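For comparison, the sketch below shows what split returns for the same kind of input, plus a regex-free alternative using each_slice. The sample strings are the ones used above.

```ruby
s = "abcd1234beefcakexyz"

# split treats every 4-character match as a delimiter and keeps what lies between.
p s.split(/..../)                     # => ["", "", "", "", "xyz"]
# Trailing empty strings are dropped, so an exact multiple of 4 gives an empty array.
p "abcd1234".split(/..../)            # => []

# scan keeps the matches themselves.
p s.scan(/.{1,4}/)                    # => ["abcd", "1234", "beef", "cake", "xyz"]

# Without a regex: slice the characters into groups of four.
p s.chars.each_slice(4).map(&:join)   # => ["abcd", "1234", "beef", "cake", "xyz"]
```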
Hmm, I don't know what Rubular is doing there and why - but
big_string.split(/..../)
does translate into
split the string at every 4-character-sequence
which should correctly result in something like
["", "", "", "abc"]
Whoops.
str = 'asdfasdfasdf'
c = 0 # start offset of the current chunk
out = []
inum = 4 # chunk size
# ceil so that a shorter final chunk is kept as well
(str.length.to_f / inum).ceil.times do
out.push(str[c, inum])
c += inum
end
