Use middle of a string variable in Chef - ruby

I have code like this in Chef:
{
'home/user1/folder/file.erb'=>'/home/user1/folder/file',
'home/user2/folder/file.erb'=>'/home/user2/folder/file',
'home/user3/folder/file.erb'=>'/home/user3/folder/file',
'home/user4/folder/file.erb'=>'/home/user4/folder/file',
}.each do |s,d|
template d do
source s
owner user
group user
mode '600'
end
end
How do I set owner and group to user1, user2, user3, ... taken from the variable d?
Thanks!

Split Your Hash Values on /
There are certainly other ways to do this, but given your example an easy trick is simply to grab the user's directory from each hash value into a block-local variable at the top of each loop, which you can then reuse as needed. For example:
{
'home/user1/folder/file.erb' => '/home/user1/folder/file',
'home/user2/folder/file.erb' => '/home/user2/folder/file',
'home/user3/folder/file.erb' => '/home/user3/folder/file',
'home/user4/folder/file.erb' => '/home/user4/folder/file',
}.each do |src, dst|
# capture username for use as owner & group
usr = dst.split(?/)[2]
template dst do
source src
owner usr
group usr
mode '600'
end
end
Using String#split works by breaking the string into an Array of elements using / as a separator. Indexing into the array with [2] gives you the third element, which is the username, which you are apparently also using for the group.
The fact that it's the third element rather than the second isn't intuitive. However, when you use #split on your sample code, you get results like this:
'/home/user4/folder/file'.split ?/
#=> ["", "home", "user4", "folder", "file"]
Because of the way #split works, the leading / in each destination path yields an empty string as the first element of the resulting array. Since Ruby arrays are zero-indexed, the element you want is the third one (i.e. [2]) in each of your sample values.
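If you want to try the indexing outside of a Chef run, here is a minimal plain-Ruby sketch. The helper names owner_from_path and owner_from_path_regex are hypothetical (they are not part of Chef), and the regex variant assumes every destination starts with /home/<user>/.

```ruby
# Hypothetical helper, not part of Chef: extract the user name from a
# destination path shaped like /home/<user>/...
def owner_from_path(path)
  # "/home/user1/folder/file".split('/') => ["", "home", "user1", "folder", "file"]
  path.split('/')[2]
end

# Alternative that does not depend on counting array positions.
# Returns nil when the path does not start with /home/<user>/.
def owner_from_path_regex(path)
  path[%r{\A/home/([^/]+)/}, 1]
end

destinations = ['/home/user1/folder/file', '/home/user2/folder/file']
owners = destinations.map { |dst| owner_from_path(dst) }
p owners # prints ["user1", "user2"]
```

The regex version is a bit more defensive: a path that does not match gives you nil instead of whatever happens to sit at index 2.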
There are certainly other ways to do this, but this is a simple way to do what you want without making significant changes to your code. It often helps to remember that Chef (and Puppet!) are really just DSLs built on top of Ruby, so you can often use standard Ruby methods to get the job done.

Related

How do I remove a common substring using Ruby?

I have read How do I remove substring after a certain character in a string using Ruby?. This is close, but different.
I have these emails with a mask:
email1 = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'
email3 = 'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
I want to remove the substrings that are after .br, .com and .net. The return must be:
email1 = 'giovanna.macedo#lojas100.com.br'
email2 = 'alvaro-neves#stockshop.com'
email3 = 'filiallojas123#filiallojas.net'
You can do that with the method String#[] with an argument that is a regular expression.
r = /.*?\.(?:rb|com|net|br)(?!\.br)/
'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'[r]
#=> "giovanna.macedo#lojas100.com.br"
'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'[r]
#=> "alvaro-neves#stockshop.com"
'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'[r]
#=> "filiallojas123#filiallojas.net"
The regular expression reads as follows: "Match zero or more characters non-greedily (?), followed by a period, followed by 'rb' or 'com' or 'net' or 'br', which is not followed by .br". (?!\.br) is a negative lookahead.
Alternatively the regular expression can be written in free-spacing mode to make it self-documenting:
r = /
.*? # match zero or more characters non-greedily
\. # match '.'
(?: # begin a non-capture group
rb # match 'rb'
| # or
com # match 'com'
| # or
net # match 'net'
| # or
br # match 'br'
) # end non-capture group
(?! # begin a negative lookahead
\.br # match '.br'
) # end negative lookahead
/x # invoke free-spacing regex definition mode
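As a rough sketch of how you might apply the pattern to a whole list, note that String#[] returns nil when nothing matches, so you may want a fallback. The third input below is a made-up value added only to show that case.

```ruby
r = /.*?\.(?:rb|com|net|br)(?!\.br)/

masked = [
  'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br',
  'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br',
  'no-domain-here' # hypothetical input that the pattern cannot match
]

# String#[] returns nil on no match, so fall back to the original string.
cleaned = masked.map { |email| email[r] || email }
p cleaned
```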
This should work for your scenario:
expr = /^(.+\.(?:br|com|net))-[^']+(')$/
str = "email = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'"
str.gsub(expr, '\1\2')
Use the String#delete_suffix Method
This was tested with Ruby 3.0.2. Your mileage may vary with older versions: String#delete_suffix and its related bang method require Ruby 2.5 or later, and the _1 numbered block parameter used below requires Ruby 2.7 or later. Since you're trying to remove the exact same suffix from all your emails, you can simply invoke #delete_suffix! on each of your strings. For example:
common_suffix = "-215000695716b.ct.domain.com.br".freeze
emails = [email1, email2, email3]
emails.each { _1.delete_suffix! common_suffix }
You can then validate your results with:
emails
#=> ["giovanna.macedo#lojas100.com.br", "alvaro-neves#stockshop.com", "filiallojas123#filiallojas.net"]
email1
#=> "giovanna.macedo#lojas100.com.br"
email2
#=> "alvaro-neves#stockshop.com"
email3
#=> "filiallojas123#filiallojas.net"
You can see that each element of the array has been updated, or you can inspect each of the original variables individually if you want to check that the strings have actually been modified in place.
String Methods are Usually Faster, But Your Mileage May Vary
Since you're dealing with String objects instead of regular expressions, this solution is likely to be faster at scale, although I didn't bother to benchmark all solutions to compare. If you care about performance, you can measure larger samples using IRB's new measure command. It took only 0.000062s to process the strings this way on my system, and String methods generally work faster than regular expressions at large scales. You'll need to do more extensive benchmarking if performance is a core concern, though.
Making the Call Shorter
You can even make the call shorter if you want. I left it a bit verbose above so you could see what the intent was at each step, but you can trim this to a single one-liner with the following block:
# one method chain, just wrapped to prevent scrolling
[email1, email2, email3].
map { _1.delete_suffix! "-215000695716b.ct.domain.com.br" }
Caveats
You Need Fixed-String Suffixes
The main caveat here is that this solution will only work when you know the suffix (or set of suffixes) you want to remove. If you can't rely on the suffixes to be fixed, then you'll likely need to pursue a regex solution in one way or another, even if it's just to collect a set of suffixes.
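If you do know a small set of possible suffixes, one way to handle it is sketched below. The helper strip_known_suffix and the second suffix in the list are assumptions made up for illustration; only the first suffix comes from the question.

```ruby
# Hypothetical helper: remove the first known suffix that the string ends with.
def strip_known_suffix(email, suffixes)
  suffix = suffixes.find { |s| email.end_with?(s) }
  suffix ? email.delete_suffix(suffix) : email
end

known_suffixes = [
  '-215000695716b.ct.domain.com.br', # from the question
  '-42.ct.example.org'               # made-up second suffix
]

p strip_known_suffix('alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br', known_suffixes)
p strip_known_suffix('someone#shop.net-42.ct.example.org', known_suffixes)
p strip_known_suffix('already#clean.com', known_suffixes)
```

Strings that end with none of the known suffixes are returned unchanged.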
Dealing with Frozen Strings
Another caveat is that if you've created your code with frozen string literals, you'll need to adjust your code to avoid attempting in-place changes to frozen strings. There's more than one way to do this, but a simple destructuring assignment is probably the easiest to follow given your small code sample. Consider the following:
# assume that the strings in email1 etc. are frozen, but the array
# itself is not; you can't change the strings in-place, but you can
# re-assign new strings to the same variables or the same array
emails = [email1, email2, email3]
email1, email2, email3 =
emails.map { _1.delete_suffix "-215000695716b.ct.domain.com.br" }
There are certainly other ways to work around frozen strings, but the point is that while the now-common use of the # frozen_string_literal: true magic comment can improve VM performance or memory usage in large programs, it isn't always the best option for string-mangling code. Just keep that in mind, as tools like RuboCop love to enforce frozen strings, and not everyone stops to consider the consequences of such generic advice to the given problem domain.
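To make the difference concrete, here is a small sketch that freezes a string explicitly (rather than relying on the magic comment) and shows that the bang method raises while the non-bang method returns a new string. The variable names are illustrative only.

```ruby
suffix = '-215000695716b.ct.domain.com.br'
email  = ('alvaro-neves#stockshop.com' + suffix).freeze

error = nil
begin
  email.delete_suffix!(suffix) # in-place change on a frozen string
rescue FrozenError => e
  error = e.class
end

clean = email.delete_suffix(suffix) # returns a new, unfrozen string

p error # prints FrozenError
p clean # prints "alvaro-neves#stockshop.com"
```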
I would just use the chomp(string) method like so:
mask = "-215000695716b.ct.domain.com.br"
email1.chomp(mask)
#=> "giovanna.macedo#lojas100.com.br"
email2.chomp(mask)
#=> "alvaro-neves#stockshop.com"
email3.chomp(mask)
#=> "filiallojas123#filiallojas.net"

Use ruby to remove a part of a string on each entry in an array where it exists

I have a list of file paths, for example
[
'Useful',
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
(I mention that they're file paths in case there is something that makes this task easier because they're files but they can be considered simply a set of strings some of which may start with a particular string)
and I need to make this into a set of pairs so that I have the original list but also
[
'Useful',
'Interesting',
'Also/Interesting'
]
I expected I'd be able to do this
'../Some.Root.Directory/Path/Interesting'.gsub!('../Some.Root.Directory/Path/', '')
or
'../Some.Root.Directory/Path/Interesting'.gsub!('\.\.\/Some\.Root\.Directory\/Path\/', '')
but neither of those replaces the provided string/pattern with an empty string...
So in irb
puts '../Some.Root.Directory/Path/Interesting'.gsub('\.\.\/Some\.Root\.Directory\/Path\/', '')
outputs
../Some.Root.Directory/Path/Interesting
and the desired output is
Interesting
How can I do this?
NB the path will be passed in so really I have
file_path.gsub!(removal_path, '')
If you are positive that strings start with removal_path you can do:
string[removal_path.size..-1]
to get the remaining part.
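If you are not positive, a guarded version might look like the sketch below. It uses the list from the question and assumes removal_path includes the trailing slash. On Ruby 2.5 or later, String#delete_prefix does the same check and slice in one call.

```ruby
removal_path = '../Some.Root.Directory/Path/'
paths = [
  'Useful',
  '../Some.Root.Directory/Path/Interesting',
  '../Some.Root.Directory/Path/Also/Interesting'
]

# Slice only when the string really starts with removal_path.
shortened = paths.map do |path|
  path.start_with?(removal_path) ? path[removal_path.size..-1] : path
end

# Equivalent on Ruby 2.5+: delete_prefix leaves non-matching strings alone.
with_delete_prefix = paths.map { |path| path.delete_prefix(removal_path) }

p shortened # prints ["Useful", "Interesting", "Also/Interesting"]
```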
If you want to get pairs of the original paths and the shortened ones, you can use sub in combination with map:
a = [
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
b = a.map do |v|
[v, v.sub('../Some.Root.Directory/Path/', '')]
end
p b
This will return an Array of arrays - each sub-array contains the original path plus the shortened one. As noted by @sawa - you can simply use sub instead of gsub, since you want to replace only a single occurrence.
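As for why the escaped version in the question changes nothing: when the first argument to sub or gsub is a String, it is matched literally, so the backslashes are searched for as backslashes. A short sketch of the three cases follows. Using Regexp.escape for a path that is passed in is my suggestion, not something from the question.

```ruby
path         = '../Some.Root.Directory/Path/Interesting'
removal_path = '../Some.Root.Directory/Path/'

# A String pattern is taken literally, so no escaping is needed.
literal = path.sub(removal_path, '')

# If you need a Regexp (for example to anchor at the start), escape the input.
anchored = path.sub(/\A#{Regexp.escape(removal_path)}/, '')

# The backslashes here are literal characters that do not occur in path.
unchanged = path.sub('\.\.\/Some\.Root\.Directory\/Path\/', '')

p literal   # prints "Interesting"
p anchored  # prints "Interesting"
p unchanged # prints "../Some.Root.Directory/Path/Interesting"
```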

Capturing groups don't work as expected with Ruby scan method

I need to get an array of floats (both positive and negative) from the multiline string. E.g.: -45.124, 1124.325 etc
Here's what I do:
text.scan(/(\+|\-)?\d+(\.\d+)?/)
Although it works fine on regex101 (capturing group 0 matches everything I need), it doesn't work in Ruby code.
Any ideas why it's happening and how I can improve that?
See scan documentation:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.
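To see the behaviour the documentation describes, here is a small sketch comparing the pattern from the question with a non-capturing version of it. The sample string is an assumption based on the numbers mentioned in the question.

```ruby
text = ' -45.124, 1124.325'

# With capturing groups, each result is an array of the group contents.
with_groups = text.scan(/(\+|\-)?\d+(\.\d+)?/)
p with_groups # prints [["-", ".124"], [nil, ".325"]]

# With non-capturing groups, each result is the whole match.
without_groups = text.scan(/(?:\+|\-)?\d+(?:\.\d+)?/)
p without_groups # prints ["-45.124", "1124.325"]
```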
In this scenario, the capturing group is used to quantify a pattern sequence, thus all you need to do is convert the capturing group into a non-capturing one by replacing all unescaped ( with (?: (there is only one occurrence here):
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)
See demo, output:
-45.124
1124.325
Well, if you need to also match floats like .04 you can use [+-]?\d*\.?\d+. See another demo
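Since the question asks for an array of floats rather than strings, a possible last step is to convert each match with to_f. This is only a sketch, and the .04 in the sample text is added by me to exercise the second pattern.

```ruby
text = ' -45.124, 1124.325, .04'

# scan returns strings; to_f turns each one into a Float.
floats = text.scan(/[+-]?\d*\.?\d+/).map(&:to_f)
p floats # prints [-45.124, 1124.325, 0.04]
```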
There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to a capturing group. In that case, you may a) declare a variable to store all matches and collect them inside a scan block, b) enclose the whole pattern in another capturing group and map the results to get the first item from each match, or c) call gsub with just a regex as its single argument to get an Enumerator, then call .to_a on it to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first) # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a # => ["11", "666666"]
See this Ruby demo.
([+-]?\d+\.\d+)
assumes there is a leading digit before the decimal point
see demo at Rubular
If you need capture groups for a complex pattern match, but want the entire expression returned by .scan, this can work for you.
Suppose you want to get the image urls in this string perhaps from a markdown text with html image tags:
str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip
You may have a regular expression defined to match just the urls, and maybe used a Rubular example like this to build/test your Regexp
image_regex =
/https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b/
Now, if you don't need each sub-capture group but just the entire expression from your .scan, you can wrap the whole pattern inside a capture group and use it like this:
image_regex =
/(https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b)/
str.scan(image_regex).map(&:first)
=> ["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71",
"https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png"]
How does this actually work?
Since you have 3 capture groups, .scan alone will return an Array of arrays, with one entry for each capture group:
str.scan(image_regex)
=> [["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71", nil, "zenhubusercontent"],
["https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png", "user-", "githubusercontent"]]
Since we only want the 1st (outer) capture group, we can just call .map(&:first)
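If the inner groups are only there for grouping, another option is to make them non-capturing, in which case .scan returns the full matches directly. The sketch below is a variation on the pattern above, not the original regex: I have also escaped the literal dots, and the heredoc simply reproduces the sample string.

```ruby
str = <<~TEXT
  Before
  <img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
  After
  <img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">
TEXT

# (?:...) groups without capturing, so scan yields plain strings.
image_regex = %r{https://(?:user-)?images\.(?:githubusercontent|zenhubusercontent)\.com.*\b}

urls = str.scan(image_regex)
puts urls
```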

Ruby - best way to extract regex capture groups?

I was reading a regex group matching question and I see that there are two ways to reference capture groups from a regex expression, namely,
1) Match string method e.g. string.match(/(^.*)(:)(.*)/i).captures
2) Perl-esque capture group variables such as $1, $2, etc obtained from if match =~ /(^.*)(:)(.*)/i
Update: As mentioned by 0xCAFEBABE there is a third option too - the last_match method
Which is better? With 1), for safety, you would have to use an if statement to guard against nils, so why not just extract the information right there instead of adding a second step that calls the captures method? So option 2) looks more convenient to me.
Since v2.4, Ruby has had named_captures, which can be used like this. Just add the ?<some_name> syntax inside a capture group.
/(\w)(\w)/.match("ab").captures # => ["a", "b"]
/(\w)(\w)/.match("ab").named_captures # => {}
/(?<some_name>\w)(\w)/.match("ab").captures # => ["a"]
/(?<some_name>\w)(\w)/.match("ab").named_captures # => {"some_name"=>"a"}
Even more relevant, you can reference a named capture by name!
result = /(?<some_name>\w)(\w)/.match("ab")
result["some_name"] # => "a"
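On the nil concern raised in the question, here is a minimal sketch of guarding a match while still using named captures. The method split_pair and the key/value group names are hypothetical, chosen to resemble the colon-separated pattern in the question.

```ruby
# Hypothetical helper: split "key:value" and return nil when there is no colon.
def split_pair(line)
  m = line.match(/\A(?<key>[^:]*):(?<value>.*)\z/)
  return nil unless m

  [m[:key], m[:value]]
end

p split_pair('name:Dan') # prints ["name", "Dan"]
p split_pair('no colon') # prints nil

# The safe navigation operator is a shorter guard when nil is acceptable.
p 'abc'.match(/(\d)/)&.captures # prints nil
```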
For simple tasks, directly accessing the pseudo variables $1, etc. may be shorter and easier, but when things get complicated, accessing things via MatchData instances is (nearly) the only way to go.
For example, suppose you are doing nested gsub:
string1.gsub(regex1) do |string2|
string2.gsub(regex2) do
... # Impossible/difficult to refer to match data of outer loop
end
end
Within the inner loop, suppose you wanted to refer to a captured group of the outer gsub. Calling $1, $2, etc. would not give the right result because the last match data has been changed by the inner gsub loop. This will be a source of bugs.
It is necessary to refer to captured groups via match data:
string1.gsub(regex1) do |string2|
m1 = $~
string2.gsub(regex2) do
m2 = $~
... # match data of the outer loop can be accessed via `m1`.
# match data of the inner loop can be accessed via `m2`.
end
end
In short, if you want to do short hackish things for simple tasks, you can use the pseudo variables. If you want to keep your code more structured and expandable, you should access data through match data.
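Here is a small runnable version of the nested case, as a sketch. The input string and both patterns are made up for illustration. Regexp.last_match is the same object as $~.

```ruby
text = 'a=1,2;b=3,4'

result = text.gsub(/(\w)=([\d,]+)/) do
  outer = Regexp.last_match # keep the outer match before the inner gsub runs
  outer[2].gsub(/\d/) do
    inner = Regexp.last_match
    # outer[1] is still the letter, even though $1 no longer refers to it here
    "#{outer[1]}#{inner[0]}"
  end
end

p result # prints "a1,a2;b3,b4"
```

Each digit is prefixed with the letter captured by the outer pattern, which only works because the outer match data was saved in a variable first.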

How should I parse a fixed length record file in Ruby?

I was wondering if anyone had any advice on parsing a file with fixed length records in Ruby. The file has several sections, each section has a header, n data elements and a footer. For example (This is total nonsense - but has roughly similar content)
1923 000-230SomeHeader 0303030
209231-231992395 MoreData
293894-329899834 SomeData
298342-323423409 OtherData
3 3423942Footer record 9832422
Headers, Footers and Data rows each begin with a specific number (1,2 & 3) in this example.
I have looked at http://rubyforge.org/projects/file-formatter/ and it looks good - except that the documentation is light and I can't see how to have n data elements.
Cheers,
Dan
There are a number of ways to do this. The unpack method of string could be used to define a pattern of fields as follows :-
"209231-231992395    MoreData".unpack('aa5A1A9a4Z*')
This returns an array as follows :-
["2", "09231", "-", "231992395", "    ", "MoreData"]
See the documentation for a description of the pack/unpack format.
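One way to make the unpacked array easier to work with is to pair it with field names. In the sketch below the field names are invented for illustration, and the record is built with four explicit spaces before MoreData so that it lines up with the a4 directive.

```ruby
# Hypothetical field names, one per directive in the format string.
field_names = %i[record_type id separator account filler label]

record = '209231-231992395' + ' ' * 4 + 'MoreData'
values = record.unpack('aa5A1A9a4Z*')

# Pair each name with its value.
row = field_names.zip(values).to_h

p row[:id]    # prints "09231"
p row[:label] # prints "MoreData"
```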
Several options exist as usual.
If you want to do it manually I would suggest something like this:
very pseudo-code:
Read file
while lines in file
handle_line(line)
end
def handle_line
type=first_char
parse_line(type)
end
def parse_line
split into elements and do_whatever_to_them
end
Splitting the line into elements of fixed width can be done with, for instance, unpack()
irb(main):001:0> line="1923  000-230SomeHeader     0303030"
=> "1923  000-230SomeHeader     0303030"
irb(main):002:0* list=line.unpack("A1A5A7a15A10")
=> ["1", "923", "000-230", "SomeHeader     ", "0303030"]
irb(main):003:0>
The pattern used for unpack() will vary with field lengths on the different kinds of records, and the code will depend on whether you want trailing spaces and such. See unpack reference for details.
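Putting the pseudo-code and unpack() together, a runnable sketch could look like this. The formats, field widths and method names are all assumptions for illustration (a real file would need widths taken from its specification), and StringIO stands in for the actual file.

```ruby
require 'stringio'

# Hypothetical unpack formats keyed by the first character of each record.
def record_formats
  {
    '1' => 'A1A5A7A15A10', # header
    '2' => 'A1A5A1A9A4A*', # data
    '3' => 'A1A5A7A15A10'  # footer
  }
end

# Pick the format from the record type and split the line into fields.
def parse_line(line, formats = record_formats)
  type = line[0]
  format = formats.fetch(type) { raise ArgumentError, "unknown record type #{type.inspect}" }
  line.chomp.unpack(format)
end

# Read every line from an IO-like object and parse it.
def parse_records(io)
  io.each_line.map { |line| parse_line(line) }
end

# Sample input built with explicit padding so the widths are unambiguous.
sample = [
  '1' + '923'.ljust(5) + '000-230' + 'SomeHeader'.ljust(15) + '0303030',
  '2' + '09231' + '-' + '231992395' + ' ' * 4 + 'MoreData',
  '3' + ' ' * 5 + '3423942' + 'Footer record'.ljust(15) + '9832422'
].join("\n")

records = parse_records(StringIO.new(sample))
records.each { |fields| p fields }
```

With a real file you would pass File.open(path) instead of the StringIO. The A directive strips trailing spaces, so padding-only fields come back as empty strings.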

Resources