I'm wondering if there is a way to return the first letter of a word. For example, if you call word("hey"), it would return just the letter h, or, if you wanted, the letter e, each by itself. I was considering using the break method or scan, but I can't seem to make them work.
Another method you can look at is chr, which returns the first character of a string:
>> 'hey'.chr # 'h'
You can also look at http://www.ruby-doc.org/core-1.9.3/String.html#method-i-slice to see how to combine regexps and indexes to get part of a string.
UPDATE: If you are on Ruby 1.8, where 'hey'[0] returns the character code rather than a one-character string, it's a bit hackish, but:
>> 'hey'[0] # 104
>> 'hey'[0..0] # 'h'
>> 'hey'.slice(0,1) # 'h'
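On Ruby 1.9 and later, String#[] (alias slice) returns a one-character string for an integer index, and it also accepts start/length pairs, ranges and regexps, as the documentation linked above describes. A short sketch of those forms; the endless range word[1..] assumes Ruby 2.6+:

```ruby
word = "hey"

# Integer index: 0 is the first character, negative indexes count from the end
first  = word[0]            # "h"
second = word[1]            # "e"
last   = word[-1]           # "y"

# Start and length, or a range, return substrings
head   = word.slice(0, 2)   # "he"
tail   = word[1..]          # "ey" (endless range, Ruby 2.6+)

# A regexp returns the first match; a second argument picks a capture group
vowel  = word[/[aeiou]/]    # "e"
group  = word[/h(.)/, 1]    # "e"

# An out-of-range index returns nil instead of raising
missing = word[10]          # nil

p [first, second, last, head, tail, vowel, group, missing]
```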
Yes:
def first_letter(word)
word[0]
end
Or, if using Ruby 1.8:
def first_letter(word)
word[0, 1] # word[0] returns an Integer character code on 1.8
end
Use the syntax str[index] to get a specific letter of a word (0 is the first letter, 1 the second, and so on).
This is a naive implementation, but you could use method_missing to create a DSL that lets you query a word for the letter at different positions:
def self.method_missing(method, *args)
number_dictionary = {
first: 1,
second: 2,
third: 3,
fourth: 4,
fifth: 5,
sixth: 6,
seventh: 7,
eighth: 8,
ninth: 9,
tenth: 10
}
if method.to_s =~ /\A(.+)_letter\z/ && number = number_dictionary[$1.to_sym]
args[0][number - 1] # return the letter (puts would print it and return nil)
else
super
end
end
first_letter('hey') # => 'h'
second_letter('hey') # => 'e'
third_letter('hey') # => 'y'
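A hedged alternative to the method_missing approach, not part of the original answer: define the ten ordinal methods up front with define_method. Because the methods really exist, respond_to? reports them, and a misspelled name such as frist_letter raises NoMethodError instead of being routed through method_missing. The module name LetterHelpers is made up for this sketch.

```ruby
# Hypothetical LetterHelpers module: defines first_letter .. tenth_letter explicitly
module LetterHelpers
  ORDINALS = %w[first second third fourth fifth
                sixth seventh eighth ninth tenth].freeze

  ORDINALS.each_with_index do |ordinal, index|
    # first_letter(word) returns word[0], second_letter(word) returns word[1], ...
    define_method("#{ordinal}_letter") do |word|
      word[index]
    end
  end

  # Make the helpers callable as LetterHelpers.first_letter(...)
  extend self
end

p LetterHelpers.first_letter('hey')          # prints "h"
p LetterHelpers.respond_to?(:second_letter)  # prints true
```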
Using your example, the word "hey":
h = "hey"
puts h[0]
This prints h.
Related
I'm creating a function that takes a string and creates an acronym, but I'm getting the wrong result.
When I input "Complementary metal-oxide semiconductor", I get "CS" instead of the expected "CMOS". Any idea why this happens? It works for plenty of other strings, just not this one.
class Acronym
def self.abbreviate(phrase)
letters = phrase.split("")
acronym = []
letters.each do |letter|
previous = letters.index(letter) - 1
if previous == -1
acronym.push(letter)
elsif letters[previous] == " " || letters[previous] == "-"
acronym.push(letter)
end
end
acronym.join("").upcase
end
end
Simplifies to
def acronym(str)
str.split(/ |-/).map(&:first).join.upcase
end
The above depends on the Rails ActiveSupport library, which adds String#first. Here's a plain-Ruby variation:
str.split(/ |-/).map { |s| s[0] }.join.upcase
The issue with your code is that index() returns the first occurrence of the given letter. This causes two problems:
The 'm' in 'metal' is not the first occurrence of 'm' in the string; that one is in the word 'complementary'. So whenever it sees an 'm', letters[previous] is always 'o', and no push() is triggered.
Anytime the first letter in your string recurs (regardless of position), it will trigger your first condition. You can see the effect if you change the initial 'C' to 'c' in your test string. The result will be CSCC because there are two 'c's in 'semiconductor'.
As an alternative, here is an option that uses regex:
def self.abbreviate(phrase)
phrase.gsub('-', ' ')
.scan(/(\A\w|(?<=\s)\w)/)
.flatten
.join.upcase
end
Step by step:
Borrowing the .gsub from DollarChills's answer to turn the '-' into a space.
scan() returns an array of all matches. The regex matches the first word character of the string and any word character preceded by whitespace, i.e. the first letter of each word.
Because the regex contains a capture group, scan returns an array of arrays, so flatten un-nests them.
join combines the letters into a string, and upcase capitalizes it.
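A variation on the same idea, assuming every word starts at a regex word boundary: \b already treats the hyphen (and the space) as the end of a word, so the gsub and the capture group, and therefore flatten, can be dropped. A caveat of this sketch: it would also pick up letters after apostrophes, such as the t in "don't".

```ruby
def abbreviate(phrase)
  # \b\w matches a word character at the start of a word;
  # hyphens and spaces both end a word, so no gsub is needed first
  phrase.scan(/\b\w/).join.upcase
end

puts abbreviate("Complementary metal-oxide semiconductor")  # prints CMOS
```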
You could try using gsub to replace the hyphen with a space.
<%= ('Complementary metal-oxide semiconductor').gsub('-', ' ') %>
Returns: Complementary metal oxide semiconductor
You have a bug in previous = letters.index(letter) - 1.
See if you can spot it:
arr = [:a, :b, :c, :a]
previous_indexes = arr.map { |n| arr.index(n) - 1 }
you_are_expecting = [-1, 0, 1, 2]
previous_indexes == you_are_expecting
# => false
arr.index(:a) # => 0
arr.index(:b) # => 1
arr.index(:c) # => 2
arr.index(:a) # => 0
To get indexes while iterating, use with_index:
arr = %i[a b c a]
arr.map.with_index { |x, i| [x, i] }
# => [[:a, 0], [:b, 1], [:c, 2], [:a, 3]]
If you make that fix, your code does what you intended.
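For reference, a sketch of the asker's Acronym class with only that fix applied: each_with_index supplies each letter's actual position, so the previous character is looked up by position instead of through letters.index.

```ruby
class Acronym
  def self.abbreviate(phrase)
    letters = phrase.split("")
    acronym = []
    # i is the real position of this letter, even when the letter repeats
    letters.each_with_index do |letter, i|
      if i.zero?
        acronym.push(letter)
      elsif letters[i - 1] == " " || letters[i - 1] == "-"
        acronym.push(letter)
      end
    end
    acronym.join("").upcase
  end
end

puts Acronym.abbreviate("Complementary metal-oxide semiconductor")  # prints CMOS
```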
A suggestion, though: you can often avoid dealing with the details of array indexes. Take a look at how Mori's answer works by operating at a higher level.
I want to parse the formal list from https://www.loc.gov/marc/bibliographic/ecbdlist.html into a nested structure of hashes and arrays.
At first, I used a recursive approach, but ran into the problem that Ruby (and, incidentally, Python) supports only a limited recursion depth (Ruby raises "stack level too deep").
I found slice_before and it seemed great:
require 'pp'
# read list into array and get rid of unnecessary lines
marc = File.readlines('marc21.txt', encoding: 'utf-8').map(&:chomp).select { |line| line !~ /^\s*$/ && line !~ /^--.+/ }
# magic starts here
marc = marc.slice_before { |line| line[/^ */].size == 0 }.to_a
marc = marc.inject({}) { |hash, arr| hash = hash.merge( arr[0] => arr[1..-1] ) }
I now want to iterate these steps throughout the array. As the indentation levels in the list vary ([0, 2, 3, 4, 5, 6, 8, 9, 10, 12] not all of them always present), I use a helper method get_indentation_map to use only the smallest amount of indentation in each iteration.
But after adding only one level (far from the goal of turning the whole array into the new structure), I get the error "no implicit conversion of Regexp into Integer", and I can't see why:
def get_indentation_map( arr )
arr.map { |line| line[/^ */].size }
end
# starting again after slice_before of the unindented lines (== 0)
marc = marc.inject({}) do |hash, arr|
hash = hash.merge( arr[0] => arr[1..-1] ) # so far like above
# now trying to do the same on the next level
hash = hash.inject({}) do |h, a|
indentation_map = get_indentation_map( a ).uniq.sort
# only slice before smallest indentation
a = a.slice_before { |line| line[/^ */].size == indentation_map[0] }.to_a
h = h.merge( a[0] => a[1..-1] )
end
hash
end
I would be very grateful for hints on how best to parse this list. I'm aiming for a JSON-like structure in which every entry is the key for the further indented lines (if there are any). Thanks in advance.
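One way to sidestep the recursion limit is to walk the lines once and keep an explicit stack of the currently open entries. The following is a minimal sketch, not tested against the real MARC file. It assumes indentation uses spaces only, that a line belongs to the closest preceding line with smaller indentation, and that sibling entries have distinct text (a repeated key would overwrite the earlier one). The method name parse_indented and the sample lines are made up for illustration. Because it only compares indentation levels relative to each other, irregular steps such as 0, 2, 3, 4 need no indentation map.

```ruby
require 'json'

# Hypothetical helper: turns indented lines into nested Hashes without recursion.
# Leaf entries map to an empty Hash.
def parse_indented(lines)
  root  = {}
  stack = [[-1, root]]  # [indentation, children]; -1 is shallower than any line

  lines.each do |line|
    next if line.strip.empty? || line.start_with?('--')  # skip blank and "--" lines
    indent = line[/\A */].size
    key    = line.strip

    # Close every open entry indented as deep as or deeper than this line
    stack.pop while stack.last[0] >= indent

    children = {}
    stack.last[1][key] = children  # attach to the nearest shallower entry
    stack.push([indent, children])
  end

  root
end

sample = ["parent", "  child", "     grandchild", "  sibling", "next parent"]
puts JSON.pretty_generate(parse_indented(sample))
```

For the real file, the lines from File.readlines could be passed in directly, since the method strips each line itself.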
I've been attempting a coding exercise to mask all but the last four digits or characters of any input.
I think my solution works but it seems a bit clumsy. Does anyone have ideas about how to refactor it?
Here's my code:
def mask(string)
z = string.to_s.length
if z <= 4
return string
elsif z > 4
array = []
string1 = string.to_s.chars
string1[0..((z-1)-4)].each do |s|
array << "#"
end
array << string1[(z-4)..(z-1)]
puts array.join(", ").delete(", ").inspect
end
end
positive lookahead
A positive lookahead makes it pretty easy. If any character is followed by at least 4 characters, it gets replaced:
"654321".gsub(/.(?=.{4})/,'#')
# "##4321"
Here's a description of the regex:
r = /
. # Just one character
(?= # which must be followed by
.{4} # 4 characters
) #
/x # free-spacing mode, allows comments inside regex
Note that the regex only matches one character at a time, even though it needs to check up to 5 characters for each match:
"654321".scan(r)
# => ["6", "5"]
/(.)..../ wouldn't work, because it would consume 5 characters for each match:
"654321".scan(/(.)..../)
# => [["6"]]
"abcdefghij".scan(/(.)..../)
# => [["a"], ["f"]]
If you want to parametrize the length of the unmasked string, you can use variable interpolation:
all_but = 4
/.(?=.{#{all_but}})/
# => /.(?=.{4})/
Code
Packed into a method, it becomes:
def mask(string, all_but = 4, char = '#')
string.gsub(/.(?=.{#{all_but}})/, char)
end
p mask('testabcdef')
# '######cdef'
p mask('1234')
# '1234'
p mask('123')
# '123'
p mask('x')
# 'x'
You could also adapt it for sentences:
def mask(string, all_but = 4, char = '#')
string.gsub(/\w(?=\w{#{all_but}})/, char)
end
p mask('It even works for multiple words')
# "It even #orks for ####iple #ords"
Some notes about your code
string.to_s
Naming things is very important in programming, especially in dynamic languages.
string.to_s
If string is indeed a string, there shouldn't be any reason to call to_s.
If string isn't a string, you should indeed call to_s before gsub, but you should also rename the variable to something more descriptive:
object.to_s
array.to_s
whatever.to_s
join
puts array.join(", ").delete(", ").inspect
What exactly do you want to do? You could probably just use join:
[1,2,[3,4]].join(", ").delete(", ")
# "1234"
[1,2,[3,4]].join
# "1234"
delete
Note that .delete(", ") deletes every comma and every space character, in any order. It doesn't only delete ", " substrings:
",a b,,, cc".delete(', ')
# "abcc"
["1,2", "3,4"].join(', ').delete(', ')
# "1234"
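To remove only the exact ", " substring, a pattern-based method such as gsub is needed, since delete treats its argument as a set of characters. A short comparison on the same input as above:

```ruby
s = ",a b,,, cc"

# delete removes every character in the set {',', ' '}; their order is irrelevant
p s.delete(", ")    # prints "abcc"
p s.delete(" ,")    # prints "abcc"

# gsub removes only the literal two-character sequence ", "
p s.gsub(", ", "")  # prints ",a b,,cc"
```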
Ruby makes this sort of thing pretty trivial:
class String
def asteriskify(tail = 4, char = '#')
if (length <= tail)
self
else
char * (length - tail) + self[-tail, tail]
end
end
end
Then you can apply it like this:
"moo".asteriskify
# => "moo"
"testing".asteriskify
# => "###ting"
"password".asteriskify(5, '*')
# => "***sword"
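Try this one:
def mask(string)
return string if string.length <= 4 # '#' * a negative count would raise ArgumentError
masked = string.dup # don't modify the caller's string
masked[0..-5] = '#' * (masked.length - 4)
masked
end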
mask("12345678")
=> "####5678"
I will add my solution to this topic too :)
def mask(str)
str.match(/(.*)(.{4})/)
'#' * ($1 || '').size + ($2 || str)
end
mask('abcdef') # => "##cdef"
mask('x') # => "x"
I offer this solution mainly to remind readers that String#gsub with a pattern but no block returns an enumerator.
def mask(str, nbr_unmasked, mask_char)
str.gsub(/./).with_index { |s,i| i < str.size-nbr_unmasked ? mask_char : s }
end
mask("abcdef", 4, '#')
#=> "##cdef"
mask("abcdef", 99, '#')
#=> "abcdef"
Try using tap:
def mask_string(str)
str.tap { |p| p[0...-4] = '#' * (p[0...-4].length) } if str.length > 4
str
end
mask_string('ABCDEF') # => "##CDEF"
mask_string('AA') # => "AA"
mask_string('S') # => "S"
Given a search string and a result string (which is guaranteed to contain all letters of the search string, case-insensitive, in order), how can I most efficiently get an array of ranges representing the indices in the result string corresponding to the letters in the search string?
Desired output:
substrings( "word", "Microsoft Office Word 2007" )
#=> [ 17..20 ]
substrings( "word", "Network Setup Wizard" )
#=> [ 3..5, 19..19 ]
#=> [ 3..4, 18..19 ] # Alternative, acceptable, less-desirable output
substrings( "word", "Watch Network Daemon" )
#=> [ 0..0, 10..11, 14..14 ]
This is for an autocomplete search box. Here's a screenshot from a tool similar to Quicksilver that underlines letters the way I'm looking to do. Note that, unlike my ideal output above, this screenshot does not prefer longer single matches.
Benchmark Results
Benchmarking the currently working answers shows that tokland's regex-based answer is basically as fast as the StringScanner-based solutions I put forth, with less code:
user system total real
phrogz1 0.889000 0.062000 0.951000 ( 0.944000)
phrogz2 0.920000 0.047000 0.967000 ( 0.977000)
tokland 1.030000 0.000000 1.030000 ( 1.035000)
Here is the benchmark test:
a=["Microsoft Office Word 2007","Network Setup Wizard","Watch Network Daemon"]
b=["FooBar","Foo Bar","For the Love of Big Cars"]
test = { a=>%w[ w wo wor word ], b=>%w[ f fo foo foobar fb fbr ] }
require 'benchmark'
Benchmark.bmbm do |x|
%w[ phrogz1 phrogz2 tokland ].each{ |method|
x.report(method){ test.each{ |words,terms|
words.each{ |master| terms.each{ |term|
2000.times{ send(method,term,master) }
} }
} }
}
end
As a starting point, how about this?
>> s = "word"
>> re = /#{s.chars.map{|c| "(#{Regexp.escape(c)})" }.join(".*?")}/i # /(w).*?(o).*?(r).*?(d)/i
>> match = "Watch Network Daemon".match(re)
=> #<MatchData "Watch Network D" 1:"W" 2:"o" 3:"r" 4:"D">
>> 1.upto(s.length).map { |idx| match.begin(idx) }
=> [0, 10, 11, 14]
And now you only have to build the ranges (if you really need them; I guess the individual indexes are also OK).
Ruby's Abbrev module is a good starting point. It breaks a list of words down into a hash whose keys are the unique abbreviations that identify each full word:
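To finish the job, the indexes from the match can be grouped into runs of consecutive numbers. A sketch assuming Ruby 2.2+ for Enumerable#slice_when; the method name substrings follows the question, and Regexp.escape is an addition so that characters such as . in the search string are matched literally. Like the regex above, the lazy .*? takes the earliest possible position for each letter, so it does not prefer longer contiguous runs.

```ruby
def substrings(search, result)
  # Builds e.g. /(w).*?(o).*?(r).*?(d)/i, one capture group per letter
  re = /#{search.chars.map { |c| "(#{Regexp.escape(c)})" }.join('.*?')}/i
  match = result.match(re)
  return nil unless match

  # Start offset of each captured letter
  indexes = 1.upto(search.length).map { |i| match.begin(i) }

  # Cut wherever neighbouring indexes are not consecutive, then turn each run into a range
  indexes.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

p substrings("word", "Watch Network Daemon")  # prints [0..0, 10..11, 14..14]
```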
require 'abbrev'
require 'pp'
abbr = Abbrev::abbrev(['ruby'])
>> {"rub"=>"ruby", "ru"=>"ruby", "r"=>"ruby", "ruby"=>"ruby"}
For every keypress you can do a lookup and see if there's a match. I'd filter out all keys shorter than a certain length, to reduce the size of the hash.
The keys will also give you a quick set of words to look up the subword matches in your original string.
For fast lookups to see if there's a substring match:
regexps = Regexp.union(
abbr.keys.sort.reverse.map{ |k|
Regexp.new(
Regexp.escape(k),
Regexp::IGNORECASE
)
}
)
Note that the patterns are escaped, so characters a user types, such as ?, * or ., are treated as literals rather than as regex metacharacters.
The result looks like:
/(?i-mx:ruby)|(?i-mx:rub)|(?i-mx:ru)|(?i-mx:r)/
Regexp#match returns a MatchData object describing what was found.
Because the union ORs the patterns and the alternatives are tried left to right, an ascending sort would always match the shortest key first (e.g. "r" instead of "ruby"). Reversing the sort makes the longest key win.
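A quick illustration of the two points above, using the keys Abbrev produces for 'ruby': the order in which alternatives are tried, and what Regexp.escape does to metacharacters.

```ruby
keys = %w[r ru rub ruby]

# Same construction as above: escaped, case-insensitive patterns joined with |
build = ->(list) { Regexp.union(list.map { |k| Regexp.new(Regexp.escape(k), Regexp::IGNORECASE) }) }

shortest_first = build.call(keys.sort)          # tries "r" before "ruby"
longest_first  = build.call(keys.sort.reverse)  # tries "ruby" before "r"

p shortest_first.match("Ruby rocks")[0]  # prints "R"
p longest_first.match("Ruby rocks")[0]   # prints "Ruby"

# Escaping turns regex metacharacters into literals
p Regexp.escape("r.b?")                  # prints "r\\.b\\?"
```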
That should give you a good start on what you want to do.
EDIT: Here's some code that answers the question directly. We've been busy at work, so it's taken a couple of days to get back to this:
require 'abbrev'
require 'pp'
abbr = Abbrev::abbrev(['ruby'])
regexps = Regexp.union( abbr.keys.sort.reverse.map{ |k| Regexp.new( Regexp.escape(k), Regexp::IGNORECASE ) } )
target_str ='Ruby rocks, rub-a-dub-dub, RU there?'
str_offset = 0
offsets = []
loop do
match_results = regexps.match(target_str, str_offset)
break if (match_results.nil?)
s, e = match_results.offset(0)
offsets << [s, e - s]
str_offset = 1 + s
end
pp offsets
>> [[0, 4], [5, 1], [12, 3], [27, 2], [33, 1]]
If you want inclusive ranges like the ones in the question, replace offsets << [s, e - s] with offsets << (s..(e - 1)), since the end position returned by offset is one past the match. That returns:
>> [0..3, 5..5, 12..14, 27..28, 33..33]
Here's a late entrant that's making a move as it nears the finish line.
code
def substrings( search_str, result_str )
search_chars = search_str.downcase.chars
next_char = search_chars.shift
result_str.downcase.each_char.with_index.take_while.with_object([]) do |(c,i),a|
if next_char == c
(a.empty? || i != a.last.last+1) ? a << (i..i) : a[-1]=(a.last.first..i)
next_char = search_chars.shift
end
next_char
end
end
demo
substrings( "word", "Microsoft Office Word 2007" ) #=> [17..20]
substrings( "word", "Network Setup Wizard" ) #=> [3..5, 19..19]
substrings( "word", "Watch Network Daemon" ) #=> [0..0, 10..11, 14..14]
benchmark
user system total real
phrogz1 1.120000 0.000000 1.120000 ( 1.123083)
cary 0.550000 0.000000 0.550000 ( 0.550728)
I don't think there are any built-in methods that will really help with this. The best way is probably to go through each letter in the word you're searching for and build up the ranges manually. Your next best option would probably be to build a regex like in tokland's answer.
Here's my implementation:
require 'strscan'
def substrings( search, master )
[].tap do |ranges|
scan = StringScanner.new(master)
init = nil
last = nil
prev = nil
search.each_char do |c|
return nil unless scan.scan_until(/#{Regexp.escape(c)}/i)
last = scan.pos-1
if !init || (last-prev) > 1
ranges << (init..prev) if init
init = last
end
prev = last
end
ranges << (init..last)
end
end
And here's a shorter version using another utility method (also needed by tokland's answer):
require 'strscan'
def substrings( search, master )
s = StringScanner.new(master)
search.chars.map do |c|
return nil unless s.scan_until(/#{Regexp.escape(c)}/i)
s.pos - 1
end.to_ranges
end
class Array
def to_ranges
return [] if empty?
[].tap do |ranges|
init,last = first
each do |o|
if last && o != last.succ
ranges << (init..last)
init = o
end
last = o
end
ranges << (init..last)
end
end
end
I am trying to return the indexes of all occurrences of a specific character in a string using Ruby. An example string is "a#asg#sdfg#d##", and the expected return is [1,5,10,12,13] when searching for # characters. The following code does the job, but is there a simpler way of doing this?
def occurances (line)
index = 0
all_index = []
line.each_byte do |x|
if x == '#'.ord # each_byte yields Integers; '#'[0] is an Integer only on Ruby 1.8
all_index << index
end
index += 1
end
all_index
end
s = "a#asg#sdfg#d##"
a = (0 ... s.length).find_all { |i| s[i,1] == '#' }
require 'enumerator' # Needed in 1.8.6 only
"1#3#a#".enum_for(:scan,/#/).map { Regexp.last_match.begin(0) }
#=> [1, 3, 5]
ETA: This works by creating an Enumerator that uses scan(/#/) as its each method.
scan yields each occurrence of the specified pattern (in this case /#/), and inside the block you can call Regexp.last_match to access the MatchData object for the match.
MatchData#begin(0) returns the index where the match begins and since we used map on the enumerator, we get an array of those indices back.
Here's a less-fancy way:
s = "a#asg#sdfg#d##"
i = -1
all = []
while i = s.index('#', i + 1)
all << i
end
all
In a quick speed test this was about 3.3x faster than FM's find_all method, and about 2.5x faster than sepp2k's enum_for method.
Here's a long method chain:
"a#asg#sdfg#d##".
each_char.
each_with_index.
inject([]) do |indices, (char, idx)|
indices << idx if char == "#"
indices
end
# => [1, 5, 10, 12, 13]
Requires Ruby 1.8.7+.
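On newer Rubies the same chain can be written more compactly. A sketch assuming Ruby 2.7+, where Enumerable#filter_map exists:

```ruby
s = "a#asg#sdfg#d##"

# each_char.with_index pairs every character with its position;
# filter_map keeps the block's value only when it is truthy (non-nil here)
indices = s.each_char.with_index.filter_map { |char, i| i if char == "#" }
p indices  # prints [1, 5, 10, 12, 13]
```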
Another solution derived from FMc's answer:
s = "a#asg#sdfg#d##"
q = []
s.length.times {|i| q << i if s[i,1] == '#'}
I love that Ruby never has only one way of doing something!
Here's a solution for massive strings. I'm doing text finds on 4.5 MB text strings, and the other solutions grind to a halt. This takes advantage of the fact that Ruby's split is very efficient compared to string comparisons.
def indices_of_matches(str, target)
cuts = (str + (target.hash.to_s.gsub(target,''))).split(target)[0..-2]
indices = []
loc = 0
cuts.each do |cut|
loc = loc + cut.size
indices << loc
loc = loc + target.size
end
return indices
end
It's basically using the horsepower behind the .split method, then using the separate parts and the length of the searched string to work out locations. I've gone from 30 seconds using various methods to instantaneous on extremely large strings.
I'm sure there's a better way to do it, but this part:
(str + (target.hash.to_s.gsub(target,'')))
adds something to the end of the string in case the target is at the very end (split drops trailing empty fields), while making sure that the "random" addition doesn't contain the target itself.
indices_of_matches("a#asg#sdfg#d##","#")
=> [1, 5, 10, 12, 13]
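The sentinel suffix can probably be avoided: when split is given a negative limit, trailing empty fields are kept, so a target at the very end of the string still yields a piece. A sketch of that variant (its performance has not been measured here). It splits on an escaped Regexp because a plain " " separator makes split use awk-style whitespace splitting. Like the original, it finds non-overlapping matches and assumes a non-empty target.

```ruby
def indices_of_matches(str, target)
  # Escaped Regexp: avoids split's special handling of a " " separator
  separator = Regexp.new(Regexp.escape(target))
  # Negative limit: keep trailing empty fields, so no sentinel suffix is needed
  pieces = str.split(separator, -1)

  loc = 0
  # Every piece except the last is followed by one match of target
  pieces[0..-2].map do |piece|
    loc += piece.size   # skip the text before this match
    match_at = loc
    loc += target.size  # skip the match itself
    match_at
  end
end

p indices_of_matches("a#asg#sdfg#d##", "#")  # prints [1, 5, 10, 12, 13]
```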