Ruby - Abbreviating a string containing a name to first name last initial - ruby

Fairly simple question: I need to take a string containing, for example, "Bob Smith" and return "Bob S.", or "Javier de Luca" and return "Javier de L.". In other words, abbreviate the last word in a string to just its first initial and add a period.
Here's what I have - it works, but it seems clumsy.
str = str.split(' ')
str[str.length - 1] = "#{str.last[0]}."
str = str.join(' ')
Surely, there's a more elegant way.

>> "Bob Smith".sub(/(.+\b.).+\z/, '\1.')
=> "Bob S."
>> "Javier de Luca".sub(/(.+\b.).+\z/, '\1.')
=> "Javier de L."
This regular expression captures the entire string up to and including the first character of the last word. It then replaces the whole match with the capture plus a period (the \1. replacement).
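As a rough sketch, the same substitution can be wrapped in a method. The name abbreviate_last_word is made up for illustration and is not from the thread; the single-word case is included only to show that the string comes back unchanged when the pattern does not match.

```ruby
# Hypothetical helper (the name is an assumption): abbreviates the last word
# of a name using the regex from the answer above.
def abbreviate_last_word(name)
  # (.+\b.) captures everything up to the first character of the last word;
  # the trailing .+\z consumes the rest of that word.
  name.sub(/(.+\b.).+\z/, '\1.')
end

puts abbreviate_last_word("Bob Smith")       # Bob S.
puts abbreviate_last_word("Javier de Luca")  # Javier de L.
puts abbreviate_last_word("Cher")            # Cher (no match, so unchanged)
```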

What about this:
name = 'Javier de Luca'
name.sub!(/(\w)\w+$/, '\1.')

You could use tap in 1.9:
str = str.split(/\s+/).tap { |a| a[-1].sub!(/(.).+/) { "#{$1}." } }.join(' ')
Using a[-1].sub! will modify the last element in place, so the tap block modifies a as well as passing it through to the join call. Also, the .+ takes care of leaving strange names like Joe B alone; if you want that to become Joe B., then use .* instead of .+.
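To make the difference between .+ and .* concrete, here is a small sketch that takes the pattern as a parameter. The method name abbreviate_with is hypothetical, and the sketch assumes a non-empty input string (an empty string would leave a[-1] as nil).

```ruby
# Hypothetical wrapper around the tap-based one-liner; the pattern is passed
# in so the two variants can be compared side by side.
def abbreviate_with(str, pattern)
  str.split(/\s+/).tap { |a| a[-1].sub!(pattern) { "#{$1}." } }.join(' ')
end

puts abbreviate_with("Bob Smith", /(.).+/)  # Bob S.
puts abbreviate_with("Joe B", /(.).+/)      # Joe B   (.+ needs a second character)
puts abbreviate_with("Joe B", /(.).*/)      # Joe B.  (.* also matches a lone initial)
```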

Related

How to replace every 4th character of a string using .gsub in Ruby?

Beginner here, obviously. I need to concatenate a sum and a string, and in the result I have to replace every 4th character with an underscore. The end product should look something like this: 160_bws_np8_1a
I think .gsub is the way, but I can't find a way to write the pattern in .gsub so that it specifies every 4th character.
total = (1..num).sum
final_output = "#{total.to_s}" + "06bwsmnp851a"
return final_output.gsub(//, "_")
This would work:
s = '12345678901234'
s.gsub(/(...)./, '\1_')
#=> "123_567_901_34"
The regex matches 3 characters (...) that are captured (parentheses) followed by another character (.). Each match is replaced by the first capture (\1) and a literal underscore (_).
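Put together with the code from the question, this might look like the sketch below. The method name and the default suffix argument are assumptions for illustration; (1..num).sum needs Ruby 2.4 or later.

```ruby
# Hypothetical method: builds the string as in the question, then replaces
# every 4th character with an underscore.
def underscore_every_fourth(num, suffix = "06bwsmnp851a")
  total = (1..num).sum
  # (...) keeps three characters, the following . is swapped for "_"
  "#{total}#{suffix}".gsub(/(...)./, '\1_')
end

puts underscore_every_fourth(5)  # 150_bws_np8_1a
```

With num = 5 the sum is 15, so the string before substitution is "1506bwsmnp851a".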
s = "12345678901234"
Here are two ways to do that. Both return
"123_567_901_34"
Match every four-character substring and replace the match with the first three characters of the match followed by an underscore
s.gsub(/.{4}/) { |s| s[0,3] << '_' }
Chain the enumerator s.gsub(/./) to Enumerator#with_index and replace every fourth character with an underscore
s.gsub(/./).with_index { |c,i| i%4 == 3 ? '_' : c }
See the form of String#gsub that takes a single argument and no block.
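The index-based idea generalizes to any interval. The following is only a sketch with a made-up method name; it uses each_char rather than gsub(/./), so unlike the regex version it also counts newline characters.

```ruby
# Hypothetical generalization: replace every nth character of str.
def replace_every_nth(str, n, replacement = '_')
  # with_index(1) starts counting at 1, so i % n is zero on every nth character
  str.each_char.with_index(1).map { |c, i| (i % n).zero? ? replacement : c }.join
end

puts replace_every_nth('12345678901234', 4)  # 123_567_901_34
```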

Regex to obfuscate substring of a repeating substring

Given a string like:
abc_1234 xyz def_123aa4a56
I want to replace parts of it so the output is:
abc_*******z def_*******56
The rules are:
abc_ and def_ are a kind of delimiter, so anything between the two is part of the preceding delimiter's string.
The string following abc_ or def_, up to the next delimiter, should be replaced by *, except for the last 2 characters of that substring. In the above example, abc_1234 xyz (note trailing space), got turned into abc_*******z
prefixes = %w|abc_ def_|
input = "Hello abc_111def_frg def_333World abc_444"
input.gsub(/(#{Regexp.union(prefixes)})../, "\\1**")
#⇒ "Hello abc_**1def_**g def_**3World abc_**4"
Is this what you are looking for?
str = "Hello abc_111def_frg def_333World abc_444"
str.scan(/(?<=abc_|def_)(?:[[:alpha:]]+|[[:digit:]]+)/)
# => ["111", "frg", "333", "444"]
I've assumed the string following "abc_" or "def_" is either all digits or all letters. It won't work if, for example, you wished to extract "a1b" from "abc_a1b cat". You need to better define the rules for what terminates the strings you want.
The regular expression reads, "Following the string "abc_" or "def_" (a positive lookbehind that is not part of the match), match a string of digits or a string of letters".
Given:
> s
=> "abc_1234 xyz def_123aa4a56"
You can do:
> s.gsub(/(?<=abc_|def_)(.*?)(..)(?=(?:abc_|def_|$))/) { |m| "*" * $1.length<<$2 }
=> "abc_*******z def_*******56"
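A parameterized variant of the same idea, as a sketch only: the method name obfuscate and the keep parameter are assumptions. It captures the prefix instead of using a lookbehind, and like the one-liner above it does not treat segments shorter than keep characters specially.

```ruby
# Hypothetical helper: masks everything after a prefix except the last
# `keep` characters before the next prefix (or the end of the string).
def obfuscate(str, prefixes, keep = 2)
  delim = Regexp.union(prefixes)
  str.gsub(/(#{delim})(.*?)(.{#{keep}})(?=#{delim}|\z)/m) do
    # $1 = prefix, $2 = part to mask, $3 = trailing characters to keep
    "#{$1}#{'*' * $2.length}#{$3}"
  end
end

puts obfuscate("abc_1234 xyz def_123aa4a56", %w[abc_ def_])
# abc_*******z def_*******56
```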

How to convert this 'hash-like' string to a key value pair

I'm using Ruby 2.2 and have a string that looks like this:
myvar = '{"myval1"=>"value1","mayval2"=>"value2"}'
How can I get this into a key-value pair and/or hash of some sort? When I do myvar['myval1'] I get back 'myval1', which isn't quite what I'm after. The answer's probably staring right at me but nothing's worked so far.
As I've seen times and times again - simply mentioning eval makes people instantly upset, even if it was a proper use case (which this is not).
So I'm going to go with another hate magnet - parsing nested structures with regexes.
Iteration (1) - a naive approach:
JSON.parse(myvar.gsub(/=>/, ':'))
Problem - will mess up your data if the string key/values contain =>.
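A quick sketch of that failure mode (the method name naive_parse is made up for illustration): a value that itself contains => gets rewritten along with the separators.

```ruby
require 'json'

# Hypothetical wrapper around the naive approach: every "=>" becomes ":",
# including the ones inside string values.
def naive_parse(str)
  JSON.parse(str.gsub('=>', ':'))
end

puts naive_parse('{"myval1"=>"value1"}')["myval1"]  # value1
puts naive_parse('{"arrow"=>"a=>b"}')["arrow"]      # a:b (the data was changed)
```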
Iteration (2) - even number of "s remaining mean you are not inside a string:
JSON.parse(myvar.gsub(/=>(?=(?:[^"]*"){2}*[^"]*$)/, ':'))
Problem - there might be a " inside a string that is escaped with a backslash.
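For comparison, here is a sketch of iteration (2) applied to a value containing =>. The constant and method names are assumptions, and the pattern is written with an explicit group and \z instead of {2}* and $; it still does not handle escaped quotes.

```ruby
require 'json'

# Matches "=>" only when an even number of double quotes remains up to the
# end of the string, i.e. when the "=>" is outside a string literal.
OUTSIDE_STRING_ARROW = /=>(?=(?:(?:[^"]*"){2})*[^"]*\z)/

# Hypothetical wrapper for iteration (2).
def parse_outside_arrows(str)
  JSON.parse(str.gsub(OUTSIDE_STRING_ARROW, ':'))
end

puts parse_outside_arrows('{"arrow"=>"a=>b"}')["arrow"]  # a=>b
```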
Iteration (3) - like iteration (2), but count only the " that are not escaped. A " is escaped when it is preceded by an odd number of backslashes:
eq_gt_finder = /(?<non_quote>
    (?:
      [^"\\]|
      \\{2}*\\.
    )*
  ){0}
  =>(?=
    (?:
      \g<non_quote>
      "
      \g<non_quote>
    ){2}*
    $
  )/x
JSON.parse(myvar.gsub(eq_gt_finder, ':'))
Q: Are you an infallible divine creature that is absolutely certain this will work 100% of the time?
A: Nope.
Q: Isn't this slow and unreadable as shit?
Q: Ok?
A: Yep.
You can change that string to valid JSON easily and use JSON.parse then:
require 'json'
myvar = '{"myval1"=>"value1","mayval2"=>"value2"}'
hash = JSON.parse(myvar.gsub(/=>/, ': '))
#=> { "myval1" => "value1", "mayval2" => "value2" }
hash['myval1']
#=> "value1"

Remove words from string which are present in some set

I want to remove words from a string which are present in some set. One way is to iterate over this set and remove each word using str.gsub("subString", ""). Does this kind of function already exist?
Example string :
"Hotel Silver Stone Resorts"
Strings in set:
["Hotel" , "Resorts"]
Output should be:
" Silver Stone "
You can build a union of several patterns with Regexp::union:
words = ["Hotel" , "Resorts"]
re = Regexp.union(words)
#=> /Hotel|Resorts/
"Hotel Silver Stone Resorts".gsub(re, "")
#=> " Silver Stone "
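Note that Regexp.union escapes plain strings for you, but the resulting pattern matches anywhere in the string, so "Hotel" is also removed from inside a longer word such as "Hotels".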
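If only whole words should be removed, one option is to wrap the union in word boundaries. This is a sketch with a hypothetical method name; it also collapses the leftover spaces, which the original example does not do.

```ruby
# Hypothetical helper: removes whole words only and tidies the whitespace.
def remove_words(str, words)
  # \b on both sides keeps "Hotel" from matching inside "Hotels"
  pattern = /\b#{Regexp.union(words)}\b/
  str.gsub(pattern, '').squeeze(' ').strip
end

puts remove_words("Hotel Silver Stone Resorts", ["Hotel", "Resorts"])  # Silver Stone
puts remove_words("Hotels near Hotel Silver", ["Hotel", "Resorts"])    # Hotels near Silver
```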
You can subtract one array from another in Ruby. The result is the first array with every element that also appears in the second array removed.
Split the string on whitespace, remove all extra words in one swift move, rejoin the sentence.
s = "Hotel Silver Stone Resorts"
junk_words = ['Hotel', 'Resorts']
def strip_junk(original, junk)
  (original.split - junk).join(' ')
end
strip_junk(s, junk_words) # => "Silver Stone"
It certainly looks better (to my eye). Not sure about performance characteristics (too lazy to benchmark it)
I am not sure what you wanted but as I understood
sentence = 'Hotel Silver Stone Resorts'
remove_words = ["Hotel" , "Resorts"] # you can add words to this array which you wanted to remove
sentence.split.delete_if{|x| remove_words.include?(x)}.join(' ')
=> "Silver Stone"
OR
if you have an array of strings, it's easier:
sentence = 'Hotel Silver Stone Resorts'
remove_words = ["Hotel" , "Resorts"]
(sentence.split - remove_words).join(' ')
=> "Silver Stone"
You could try something different , but I don't know if it will be faster or not (depends on the length of your strings and set)
require 'set'
str = "Hotel Silver Stone Resorts"
setStr = Set.new(str.split)
setToRemove = Set.new( ["Hotel", "Resorts"])
modifiedStr = (setStr.subtract setToRemove).to_a.join " "
Output
"Silver Stone"
It uses the Set class, which is faster for looking up a single element (it is built on Hash).
But again, because of the underlying to_a conversion, it may not improve speed if your strings or sets are very big.
It also implicitly removes duplicates from your string and your set (when you create the sets).
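The duplicate-removal side effect is easy to miss, so here is a small sketch comparing the two approaches on an input with a repeated word (the sample sentence and variable names are made up for illustration).

```ruby
require 'set'

words = "Stone Hotel Stone Resorts".split
junk  = ["Hotel", "Resorts"]

# Array subtraction keeps repeated words that are not in junk.
via_array = (words - junk).join(' ')

# Converting to a Set drops the repeated "Stone" before subtracting.
via_set = (Set.new(words) - Set.new(junk)).to_a.join(' ')

puts via_array  # Stone Stone
puts via_set    # Stone
```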

Lookbehind and lookahead regex

I have strings like this:
journals/cl/SantoNR90:::Michele Di Santo::Libero Nigro::Wilma Russo:::Programmer-Defined Control Abstractions in Modula-2
I need to capture Michele Di Santo, Libero Nigro, Wilma Russo but not the last one.
This regex matches almost what I need:
/(?<=::).*?(?=::)/
But it has a problem: it also captures the third colon.
str.scan(/(?<=::).*?(?=::)/) #=> [":Michele Di Santo", ...]
As you can see, the first match has a colon at the beginning.
How to fix this regex to avoid this third colon?
Don't use regex for this. All you need to do is split the input string on :::, take the second string from the resulting array, and split that on ::. Faster to code, faster to run, and easier to read than a regex version.
Edit: The code:
str.split(':::')[1].split('::')
Running on CodePad: http://codepad.org/1BNNwoh6
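A slightly more defensive sketch of the same split-based idea, with a hypothetical method name. It limits the first split to three parts and returns an empty array when the record has no ::: separator.

```ruby
# Hypothetical helper: extracts the author names from one record.
def authors(record)
  # The record is assumed to have the shape key:::authors:::title
  _key, author_field, _title = record.split(':::', 3)
  author_field.to_s.split('::')
end

str = 'journals/cl/SantoNR90:::Michele Di Santo::Libero Nigro::Wilma Russo:::Programmer-Defined Control Abstractions in Modula-2'
puts authors(str).inspect
# ["Michele Di Santo", "Libero Nigro", "Wilma Russo"]
```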
An expression to do that could be:
(?<=::)[^:].*?(?=::)
Although if the string to be searched is always in the form of "xxx:::A::B::C:::xxx" and you only care about A, B and C, consider using something more specific, and using the capture groups to get A, B and C:
:::(.+?)::(.+?)::(.+?):::
$1, $2 and $3 will contain the group matches.
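In Ruby the same captures can be read from the MatchData object instead of the globals. This is a sketch only (variable names are assumptions), and the pattern works only for records with exactly three authors.

```ruby
str = 'journals/cl/SantoNR90:::Michele Di Santo::Libero Nigro::Wilma Russo:::Programmer-Defined Control Abstractions in Modula-2'

# match returns nil when the record does not have the expected shape
m = str.match(/:::(.+?)::(.+?)::(.+?):::/)
names = m ? m.captures : []

puts names.inspect
# ["Michele Di Santo", "Libero Nigro", "Wilma Russo"]
```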
I'd use a simple split because the string is basically a CSV with colons instead of commas:
str = 'journals/cl/SantoNR90:::Michele Di Santo::Libero Nigro::Wilma Russo:::Programmer-Defined Control Abstractions in Modula-2'
items = str.split(':')
str1, str2, str3 = items[3], items[5], items[7]
=> [
[0] "Michele Di Santo",
[1] "Libero Nigro",
[2] "Wilma Russo"
]
You could also use:
str1, str2, str3 = str.split(':').select{ |s| s > '' }[1, 3]
If it's possible to have quoted colons, use the CSV module and set your field delimiter to ':'.

Resources