Ruby select filename from array of filenames

I have an array of filenames in Ruby.
I want to select the filename that contains a specific string.
For example:
array = ["/some/place/once.txt", "/some/place/two.txt", "/some/place/three.txt"]
I want to select only the filename that has the word "two" in it,
so I tried filename = array.select { |e| e.include? "two" }
but for some reason filename contains everything that is in array.
How can I make it work?

Given this data:
array = ["/some/place/once.txt", "/some/place/two.txt","/some/place/three.txt"]
You can always find all matching entries with grep and just take the first:
array.grep(/two/).first
# => "/some/place/two.txt"
Or you can always scan using find:
array.find { |s| s.include?('two') }
# => "/some/place/two.txt"
Using select produces an array of all matches, but otherwise behaves identically. The behaviour you describe cannot be reproduced.
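A quick side-by-side check of the three methods on the same data (a sketch; the point is the return type: select and grep give an Array, find gives a single String or nil):

```ruby
array = ["/some/place/once.txt", "/some/place/two.txt", "/some/place/three.txt"]

# select keeps every element for which the block is truthy -> Array
matching = array.select { |f| f.include?("two") }

# find stops at the first element for which the block is truthy -> String (or nil)
first_match = array.find { |f| f.include?("two") }

# grep filters with === (here a Regexp) -> Array
grepped = array.grep(/two/)

p matching     # => ["/some/place/two.txt"]
p first_match  # => "/some/place/two.txt"
p grepped      # => ["/some/place/two.txt"]
```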

Related

Check if one of the values from both lists are present in the string in ruby

I have two arrays with logins and file extensions:
logins = ['bob', 'mark', 'joe']
extensions = ['.doc', '.xls']
I need to check whether one of the values from each list is present in the string (the string looks like str = "aaa bob test.txt test text"), and if so, do some work.
How do I correctly perform this check in Ruby?
Currently I'm doing this with several loops and if statements.
[logins, extensions].all? do |list|
  list.any? { |match| str.include? match }
end
You have two lists, logins and extensions. You want to make sure that all? of them do something and that 'something' is that the string includes any? of their elements.
The regex-based answer may perform better, though it is a little less simple to write.
You could also use Regexp.union:
str = 'aaa bob test.xls test text'
logins = Regexp.union(['bob', 'mark', 'joe'])
extensions = Regexp.union(['.doc', '.xls'])
str =~ logins && str =~ extensions
# => 12
It returns nil if either pattern didn't match, or an integer (the index of the extension match) if both matched.
As an alternative, with Ruby 2.4+:
str.match?(logins) && str.match?(extensions)
which would return a boolean.
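A small sketch of the Regexp.union approach wrapped in a helper (the method name login_and_extension? is hypothetical). It also shows why Regexp.union is convenient here: it escapes each string, so the dot in ".xls" is a literal dot rather than "any character".

```ruby
logins     = Regexp.union(['bob', 'mark', 'joe'])
extensions = Regexp.union(['.doc', '.xls'])

# Regexp.union escapes the strings it is given
puts extensions.source  # prints \.doc|\.xls

# Hypothetical helper: true only if the string contains a login AND an extension.
# match? (Ruby 2.4+) returns true/false and does not set $~.
def login_and_extension?(str, logins, extensions)
  str.match?(logins) && str.match?(extensions)
end

p login_and_extension?('aaa bob test.xls test text', logins, extensions)  # => true
p login_and_extension?('aaa bob test.txt test text', logins, extensions)  # => false
p login_and_extension?('aaa bobxls', logins, extensions)                  # => false (no literal ".xls")
```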
You can simply concatenate the arrays and then iterate to check whether the string contains any of the values:
str = 'aaa bob test.txt test text'
(logins + extensions).any? { |word| str.include?(word) }
This returns true or false. Note that it is true when the string contains a login or an extension, not necessarily one value from each list.
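To make the difference between the two checks concrete, here is a sketch on a string that contains a login but none of the listed extensions:

```ruby
logins     = ['bob', 'mark', 'joe']
extensions = ['.doc', '.xls']
str        = 'aaa bob test.txt test text'  # a login, but no matching extension

# Concatenation: true if ANY login OR ANY extension appears
either = (logins + extensions).any? { |word| str.include?(word) }

# all?/any?: true only if at least one element of EACH list appears
both = [logins, extensions].all? { |list| list.any? { |word| str.include?(word) } }

p either  # => true
p both    # => false
```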

Search array element in a file?

I have an array from a file.txt.
I want to loop through the list and check if each element exists in myfile.txt. If the element exists, go to the next element. If it does not exist I want to add it to the not-found array.
I tried using this code:
names = ["baba", "lily", "joe", "tsaki"]
names_not_found = []
for i in 0..names.length
while line = file.gets
puts if File.open('myfile.txt').lines.any?{|line| line.include?('names') << names_not_found}
end
end
puts names_not_found
end
I'm not too sure if I'm on the right track.
I am a bit confused by the other 2 answers, as I thought you wanted to find the elements of your names Array that are not found in myfile.txt. My answer will find those names. The other solutions find lines of myfile.txt that are not equal to any of your names elements. There certainly is some misunderstanding, so my apologies if this is not what you want.
You can read the whole file into a String once, and simply use .include? (which you already use) to see which names are mentioned in it. Note this simply checks for substrings, so if the file contains a "joey" it will "find" "joe" because it's part of it. So you might want to use regular expressions with word boundaries, but I suppose that's beyond the scope somewhat.
names = ["baba", "lily", "joe", "tsaki"]
contents = File.read('myfile.txt')
names_not_found = names.reject { |name| contents.include? name }
# => ["baba"]
# contents of myfile.txt:
# hello lily
# joe
# tsaki!!
# panda
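The word-boundary idea mentioned above can be sketched like this. The helper name names_missing_as_words is hypothetical, and it takes the file contents as a string so it can be tried without a file; in the real script the string would come from File.read('myfile.txt').

```ruby
# Hypothetical helper: returns the names that do not occur as whole words in contents
def names_missing_as_words(names, contents)
  names.reject do |name|
    # \b = word boundary; Regexp.escape guards against regex metacharacters in a name
    contents.match?(/\b#{Regexp.escape(name)}\b/)
  end
end

contents = "hello lily\njoey\ntsaki!!\npanda"
p names_missing_as_words(["baba", "lily", "joe", "tsaki"], contents)
# => ["baba", "joe"]   ("joey" no longer counts as "joe")
```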
I would do as below:
css_class_names = File.readlines("css_file")
js_file_string = File.read('js_file')
# "class" is a reserved word in Ruby, so the block parameter needs another name
names_not_found = css_class_names.reject { |klass| js_file_string.include?(klass.chomp) }
puts names_not_found

Match string that doesn't contain a specific word

I'm working in Ruby with the match method, and I want to use a regular expression to match a URL that doesn't contain a certain string.
For example:
http://website1.com/url_with_some_words.html
http://website2.com/url_with_some_other_words.html
http://website3.com/url_with_the_word_dog.html
I want to match the URLs that don't contain the word dog, so the 1st and the 2nd ones should be matched.
Just use a negative lookahead: ^(?!.*dog).*$
Explanation
^ : match beginning of line
(?!.*dog) : negative lookahead, check if the word dog doesn't exist
.* : match everything (except newlines in this case)
$ : match end of line
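Applied to the three URLs from the question, the same pattern can be used with grep (a sketch). In Ruby, ^ and $ match at line boundaries, so for strings that may contain newlines, \A and \z are the whole-string equivalents.

```ruby
urls = [
  'http://website1.com/url_with_some_words.html',
  'http://website2.com/url_with_some_other_words.html',
  'http://website3.com/url_with_the_word_dog.html'
]

# The lookahead rejects any line in which "dog" appears after the start
no_dog = /^(?!.*dog).*$/

result = urls.grep(no_dog)
p result
# => ["http://website1.com/url_with_some_words.html",
#     "http://website2.com/url_with_some_other_words.html"]
```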
Just use
string !~ /dog/
to select the strings you need.
There's actually an incredibly simple way to do this, using select.
array_of_urls.select { |url| !url.match(/dog/) }
This returns an array of the URLs that don't contain the word 'dog' anywhere in them.
Another thing you can use is:
!url['dog']
With your example:
array = []
array << 'http://website1.com/url_with_some_words.html'
array << 'http://website2.com/url_with_some_other_words.html'
array << 'http://website3.com/url_with_the_word_dog.html'
array.select { |url| !url['dog'] }
You could also reject the urls that do contain 'dog':
array.reject { |url| url['dog'] }
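On Ruby 2.3 and later, Enumerable#grep_v does the same filtering in one call (a sketch; grep_v is the inverse of grep and keeps the elements the pattern does not match):

```ruby
array = [
  'http://website1.com/url_with_some_words.html',
  'http://website2.com/url_with_some_other_words.html',
  'http://website3.com/url_with_the_word_dog.html'
]

# Keep elements for which /dog/ === element is false
without_dog = array.grep_v(/dog/)
p without_dog  # same result as array.reject { |url| url['dog'] }
```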

Regex with named capture groups getting all matches in Ruby

I have a string:
s="123--abc,123--abc,123--abc"
I tried using Ruby 1.9's new "named groups" feature to fetch all named group info:
/(?<number>\d*)--(?<chars>\s*)/
Is there an API like Python's findall that returns a collection of MatchData objects? In this case I need to get three matches, because 123--abc repeats three times. Each MatchData should contain the details of each named capture, so I can use m['number'] to get the matched value.
Named captures with match only give you one matching result.
Ruby's analogue of findall is String#scan. You can either use scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
Chiming in super-late, but here's a simple way of replicating String#scan while getting the MatchData objects instead:
matches = []
foo.scan(regex){ matches << $~ }
matches now contains the MatchData objects that correspond to scanning the string.
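Here is that idea run on the question's string, with the named groups adjusted to \d+ and [a-z]+ (an assumption about what was intended). Because each element is a real MatchData, named access works, and offsets are relative to the original string.

```ruby
s     = "123--abc,123--abc,123--abc"
regex = /(?<number>\d+)--(?<chars>[a-z]+)/

matches = []
# Inside the scan block, $~ holds the MatchData for the current match
s.scan(regex) { matches << $~ }

p matches.size                   # => 3
p matches.map { |m| m[:number] } # => ["123", "123", "123"]
p matches.first[:chars]          # => "abc"
p matches.last.begin(0)          # => 18 (offset in the original string)
```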
You can get the capture group names from the regexp with the names method. So I used the regular scan method to get the matches, then zipped the names with each match to create a Hash.
class String
  def scan2(regexp)
    names = regexp.names
    scan(regexp).collect do |match|
      Hash[names.zip(match)]
    end
  end
end
Usage:
>> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
=> [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
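On Ruby 2.4 and later, a similar array of hashes can be built without monkey-patching String, by combining the $~ trick with MatchData#named_captures (a sketch, not the answer's own code):

```ruby
text  = "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd"
regex = %r{(?<url>(?<protocol>https?)://\S+)}

results = []
# named_captures builds the name => value Hash for each match
text.scan(regex) { results << $~.named_captures }

p results
# => [{"url"=>"http://www.google.com.tr", "protocol"=>"http"},
#     {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
```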
@Nakilon is correct to show scan with a regex; however, you don't even need to venture into regex land if you don't want to:
s = "123--abc,123--abc,123--abc"
s.split(',')
#=> ["123--abc", "123--abc", "123--abc"]
s.split(',').inject([]) { |a,s| a << s.split('--'); a }
#=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
This returns an array of arrays, which is convenient if you have multiple occurrences and need to see/process them all.
s.split(',').inject({}) { |h,s| n,v = s.split('--'); h[n] = v; h }
#=> {"123"=>"abc"}
This returns a hash which, because the elements share the same key, holds only one value for that key. This is good when you have a bunch of duplicate keys but want only the unique ones. Its downside is that later values overwrite earlier ones, so if you need all the values associated with a key, that appears to be a different question.
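The same split-based transformations can be written with map and Array#to_h, and group_by keeps every value per key if the overwriting above is a problem (a sketch; transform_values needs Ruby 2.4+):

```ruby
s = "123--abc,123--abc,123--abc"

# map expresses the transformation without an accumulator
pairs = s.split(',').map { |pair| pair.split('--') }
p pairs    # => [["123", "abc"], ["123", "abc"], ["123", "abc"]]

# to_h: later pairs overwrite earlier ones with the same key
p pairs.to_h  # => {"123"=>"abc"}

# Keep every value per key instead of just the last one
grouped = pairs.group_by(&:first).transform_values { |v| v.map(&:last) }
p grouped  # => {"123"=>["abc", "abc", "abc"]}
```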
If using ruby >=1.9 and the named captures, you could:
class String
  def scan2(regexp2_str, placeholders = {})
    return regexp2_str.to_re(placeholders).match(self)
  end

  def to_re(placeholders = {})
    re2 = self.dup
    separator = placeholders.delete(:SEPARATOR) || '' #Returns and removes separator if :SEPARATOR is set.
    #Search for the pattern placeholders and replace them with the regex
    placeholders.each do |placeholder, regex|
      re2.sub!(separator + placeholder.to_s + separator, "(?<#{placeholder}>#{regex})")
    end
    return Regexp.new(re2, Regexp::MULTILINE) #Returns regex using named captures.
  end
end
Usage (ruby >=1.9):
> "1234:Kalle".scan2("num4:name", num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
or
> re="num4:name".to_re(num4:'\d{4}', name:'\w+')
=> /(?<num4>\d{4}):(?<name>\w+)/m
> m=re.match("1234:Kalle")
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
> m[:num4]
=> "1234"
> m[:name]
=> "Kalle"
Using the separator option:
> "1234:Kalle".scan2("#num4#:#name#", SEPARATOR:'#', num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.
class String
  # This method will return an array of MatchData's rather than the
  # array of strings returned by the vanilla `scan`.
  def match_all(regex)
    match_str = self
    match_datas = []
    while match_str.length > 0 do
      md = match_str.match(regex)
      break unless md
      # A zero-length match would leave post_match unchanged and loop forever
      break if md[0].empty?
      match_datas << md
      match_str = md.post_match
    end
    return match_datas
  end
end
Running your sample data in the REPL results in the following:
> "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
=> [#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">]
You may also find my test code useful:
describe String do
  describe :match_all do
    it "works like scan, but uses MatchData objects instead of arrays and strings" do
      mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
      mds[0][:word].should == "ABC"
      mds[0][:number].should == "123"
      mds[1][:word].should == "DEF"
      mds[1][:number].should == "456"
      mds[2][:word].should == "GHI"
      mds[2][:number].should == "098"
    end
  end
end
I really liked @Umut-Utkan's solution, but it didn't quite do what I wanted, so I rewrote it a bit (note: the code below might not be beautiful, but it seems to work):
class String
  def scan2(regexp)
    names = regexp.names
    # Default block: each new key starts with its own empty array
    captures = Hash.new { |hash, key| hash[key] = [] }
    scan(regexp).each do |match|
      names.zip(match).each do |name, value|
        captures[name.to_sym] << value
      end
    end
    return captures
  end
end
Now, if you do
p '12f3g4g5h5h6j7j7j'.scan2(/(?<alpha>[a-zA-Z])(?<digit>[0-9])/)
You get
{:alpha=>["f", "g", "g", "h", "h", "j", "j"], :digit=>["3", "4", "5", "5", "6", "7", "7"]}
(i.e., all the alpha characters found in one array, and all the digits found in another). Depending on your purpose for scanning, this might be useful. Anyway, I love seeing how easy it is to rewrite or extend core Ruby functionality with just a few lines!
A year ago I wanted regular expressions that were easier to read and had named captures, so I made the following addition to String (it maybe should not be there, but it was convenient at the time):
scan2.rb:
class String
  #Works as scan but stores the result in a hash indexed by variable/constant names (regexp PLACEHOLDERS) within parentheses.
  #Example: Given the (constant) strings BTF, RCVR and SNDR and the regexp /#BTF# (#RCVR#) (#SNDR#)/
  #the matches will be returned in a hash like: match[:RCVR] = <the match> and match[:SNDR] = <the match>
  #Note: The #STRING_VARIABLE_OR_CONST# syntax has to be used. All occurrences of #STRING# will work as #{STRING}
  #but the syntax is needed for the method to see the names to be used as indices.
  def scan2(regexp2_str, mark='#')
    regexp = regexp2_str.to_re(mark) #Evaluates the strings. Note: Must be reachable from here!
    hash_indices_array = regexp2_str.scan(/\(#{mark}(.*?)#{mark}\)/).flatten #Look for string variable names within (#VAR#) or # replaced by <mark>
    match_array = self.scan(regexp)
    #Save matches in hash indexed by string variable names:
    match_hash = Hash.new
    match_array.flatten.each_with_index do |m, i|
      match_hash[hash_indices_array[i].to_sym] = m
    end
    return match_hash
  end

  def to_re(mark='#')
    re = /#{mark}(.*?)#{mark}/
    return Regexp.new(self.gsub(re){eval $1}, Regexp::MULTILINE) #Evaluates the strings, creates RE. Note: Variables must be reachable from here!
  end
end
Example usage (irb1.9):
> load 'scan2.rb'
> AREA = '\d+'
> PHONE = '\d+'
> NAME = '\w+'
> "1234-567890 Glenn".scan2('(#AREA#)-(#PHONE#) (#NAME#)')
=> {:AREA=>"1234", :PHONE=>"567890", :NAME=>"Glenn"}
Notes:
Of course it would have been more elegant to put the patterns (e.g. AREA, PHONE...) in a hash and add this hash with patterns to the arguments of scan2.
Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:
class ::Regexp
  def match_all(str)
    matches = []
    str.scan(self) { matches << $~ }
    matches
  end
end
which can be used as /(?<letter>\w)/.match_all('word'), and returns:
[#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]
This relies on, as others have said, the use of $~ in the scan block for the match data.
I like the match_all given by John, but I think it has an error.
The line:
match_datas << md
works if there are no captures () in the regex.
This code gives the whole line up to and including the pattern matched/captured by the regex. (The [0] part of MatchData) If the regex has capture (), then this result is probably not what the user (me) wants in the eventual output.
I think in the case where there are captures () in regex, the correct code should be:
match_datas << md[1]
The eventual output of match_datas will be an array of pattern capture matches starting from match_datas[0]. This is not quite what may be expected if a normal MatchData is wanted, which includes a match_datas[0] value that is the whole matched substring, followed by match_datas[1], match_datas[2], ... which are the captures (if any) in the regex pattern.
Things are complex - which may be why match_all was not included in native MatchData.
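For reference, here is what the different parts of a single MatchData return, checked on a made-up string (a sketch; the string and group names are illustrative only):

```ruby
md = "id: 123--abc".match(/(?<number>\d+)--(?<chars>[a-z]+)/)

p md[0]         # => "123--abc"  the matched substring
p md.pre_match  # => "id: "      text before the match
p md[1]         # => "123"       first capture group
p md[:chars]    # => "abc"       named access
p md.captures   # => ["123", "abc"]
```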

Get id from string with Ruby

I have strings like this:
"/detail/205193-foo-var-bar-foo.html"
"/detail/183863-parse-foo.html"
"/detail/1003-bar-foo-bar.html"
How to get ids (205193, 183863, 1003) from it with Ruby?
Just say s[/\d+/]
[
"/detail/205193-foo-var-bar-foo.html",
"/detail/183863-parse-foo.html",
"/detail/1003-bar-foo-bar.html"
].each { |s| puts s[/\d+/] }
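If you want the ids as integers rather than printed strings, map works the same way (a sketch). The second variant uses String#[] with a capture index so that only digits right after /detail/ are taken, which matters if other digits could appear earlier in the path.

```ruby
paths = [
  "/detail/205193-foo-var-bar-foo.html",
  "/detail/183863-parse-foo.html",
  "/detail/1003-bar-foo-bar.html"
]

# s[/\d+/] returns the first run of digits (or nil, and nil.to_i is 0)
ids = paths.map { |s| s[/\d+/].to_i }
p ids  # => [205193, 183863, 1003]

# s[regexp, 1] returns capture group 1 only
ids_anchored = paths.map { |s| s[%r{/detail/(\d+)}, 1].to_i }
p ids_anchored  # => [205193, 183863, 1003]
```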
You could also do something like this:
"/detail/205193-foo-var-bar-foo.html".gsub(/\/detail\//,'').to_i
=> 205193
regex = /\/detail\/(\d+)-/
s = "/detail/205193-foo-var-bar-foo.html"
id = regex.match s # => #<MatchData "/detail/205193-" 1:"205193">
id[1] # => "205193"
$1 # => "205193"
The MatchData object stores the entire matched portion of the string in the first element, and any matched subgroups starting from the second element (depending on how many subgroups there are).
Ruby also provides a shortcut to the first subgroup of the most recent match with $1.
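A named capture group avoids counting groups or relying on $1 (a sketch; the group names id and num are illustrative). When a regexp literal without interpolation is on the left of =~, Ruby also assigns each named group to a local variable of the same name.

```ruby
s = "/detail/205193-foo-var-bar-foo.html"

# 1. MatchData with a named group; id is nil if there was no match
m  = s.match(/\/detail\/(?<id>\d+)-/)
id = m && m[:id].to_i

# 2. Regexp literal on the LEFT of =~ creates the local variable num
if /\/detail\/(?<num>\d+)-/ =~ s
  num_from_literal = num.to_i
end

p id                # => 205193
p num_from_literal  # => 205193
```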
One easy way to do it would be to strip out the /detail/ part of your string, and then just call to_i on what's left over:
"/detail/1003-bar-foo-bar.html".gsub('/detail/','').to_i # => 1003
s = "/detail/205193-foo-var-bar-foo.html"
num = (s =~ /detail\/(\d+)-/) ? Integer($1, 10) : nil # base 10, so a leading zero isn't read as octal
