Regex without order [closed] - ruby

Closed 9 years ago.
Suppose I have a list of characters [a,b,c] and I want to write a regular expression such that
any string is accepted if it contains every element of the character list at least once; the characters can appear in any order in the string.
Examples of accepted strings:
abc, aabbbc, bbaac, cab
Examples of strings not accepted:
aaabb, bab, caa, aacd, deeff

Sets are much better suited to this purpose than regular expressions. What you're really trying to do is find out whether {a, b, c} is a subset of the characters in each of your strings. Here's an example of how to do that in Ruby:
> require "set"
=> true
> reference = Set.new("abc".split(""))
=> #<Set: {"a", "b", "c"}>
> test1 = Set.new("aabbbc".split(""))
=> #<Set: {"a", "b", "c"}>
> test2 = Set.new("caa".split(""))
=> #<Set: {"c", "a"}>
> reference.subset? test1
=> true
> reference.subset? test2
=> false
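The same check can be wrapped in a small method. This is only a sketch: the name contains_all? is made up for illustration, and it uses String#chars in place of split("").

```ruby
require "set"

# Hypothetical helper (not part of the answer above): true when every
# character in `required` occurs at least once in `str`.
def contains_all?(str, required)
  Set.new(required.chars).subset?(Set.new(str.chars))
end

%w[abc aabbbc bbaac cab aaabb bab caa aacd deeff].each do |s|
  puts "#{s}: #{contains_all?(s, 'abc')}"
end
```

Note that this check also accepts abcd, because extra characters do not stop {a, b, c} from being a subset.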

Consider this before reading on: regexes are not always the best way to solve a problem. If you are considering a regex but it's not obvious or easy to proceed, you may want to stop and consider if there is an easy non-regex solution handy.
I don't know what your specific situation is or why you think you need regex, so I'll assume you already know the above and answer your question as-is.
Based on the documentation, I believe that Ruby supports positive lookaheads (a kind of zero-width assertion). Being primarily a .NET programmer, I don't know Ruby well enough to say whether or not it supports non-fixed-length lookaheads (not all regex flavors do), but if it does then you can easily apply three different lookaheads at the beginning of your expression to find each of the patterns or characters you need:
^(?=.*a)(?=.*b)(?=.*c).*
This will fail if any one of the lookaheads does not pass. This approach is potentially extremely powerful because you can have complex sub expressions in your lookahead. For example:
^(?=.*a[bc]{2})(?=.*-\d)(?=.*#.{3}%).*
will test that the input contains an a followed by two characters which are each either a b or a c, a - followed by any digit, and a # followed by any three characters followed by a %, in any order. So the following strings would pass:
#acb%-9
#-22%abb
This kind of complex pattern matching is difficult to succinctly duplicate.
To address this comment:
No there cannot be... so abcd is not accepted
You can use a negative lookahead to ensure that characters other than the desired characters are not present in the input:
^(?=.*a)(?=.*b)(?=.*c)(?!.*[^abc]).*
(As noted by Gene, the .* at the end is not necessary... I probably should have mentioned that. It's just there in case you actually want to select the text)
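For what it's worth, here is a rough Ruby sketch of the two patterns side by side. The method names are invented for illustration, the trailing .* is dropped because only a yes/no answer is needed, and \A is swapped in for ^ on the assumption that the input is a single line (in Ruby, ^ matches at the start of every line).

```ruby
# Hypothetical helper: every one of a, b, c occurs somewhere in the string.
def all_present?(s)
  s.match?(/\A(?=.*a)(?=.*b)(?=.*c)/)
end

# Hypothetical helper: as above, and no character other than a, b, c occurs.
def only_abc?(s)
  s.match?(/\A(?=.*a)(?=.*b)(?=.*c)(?!.*[^abc])/)
end

%w[abc aabbbc cab caa abcd].each do |s|
  puts format("%-7s all_present=%-5s only_abc=%s", s, all_present?(s), only_abc?(s))
end
```

With these definitions abcd passes the first check but fails the second, which is the behaviour the comment asked for.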

def acceptable? s
  s =~ /(?=.*a)(?=.*b)(?=.*c)/
end
acceptable? 'abc' # => 0
acceptable? 'aabbbc' # => 0
acceptable? 'bbaac' # => 0
acceptable? 'cab' # => 0
acceptable? 'aaabb' # => nil
acceptable? 'bab' # => nil
acceptable? 'caa' # => nil
acceptable? 'aacd' # => nil
acceptable? 'deeff' # => nil
acceptable? 'abcd' # => 0

A regex that matches only the defined characters could be this (anchor it, for example with ^ and $, if the whole string has to consist of those characters):
(?=[bc]*a)(?=[ac]*b)(?=[ab]*c)[abc]*
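A quick sketch of how anchoring changes the result for this pattern. The method name is hypothetical, and \A and \z are used as whole-string anchors.

```ruby
# Unanchored, the pattern can match a substring of a longer string.
p "abcd"[/(?=[bc]*a)(?=[ac]*b)(?=[ab]*c)[abc]*/]  # => "abc"

# Hypothetical helper: the whole string must consist of a, b and c,
# with each of them occurring at least once.
def only_abc_all_present?(s)
  s.match?(/\A(?=[bc]*a)(?=[ac]*b)(?=[ab]*c)[abc]*\z/)
end

%w[abc aabbbc caa abcd].each do |s|
  puts "#{s}: #{only_abc_all_present?(s)}"
end
```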

Related

Ruby - replace a letter for another Explanation for my code [closed]

Closed 4 months ago.
I managed to solve the problem I was working on, but I am not sure why my solution worked; I just kept adding methods.
So could anyone explain why this works:
def replace(initial_string, letter_a, letter_b)
  replacements = {letter_a => letter_b}
  # this is the part I am not sure why is working:
  initial_string.split('').map{|i| replacements[i] || i}.join
end
First, I recommend using the built-in methods String#gsub or String#tr:
string.gsub(replaceable_letter, replacing_letter)
"abcdef".gsub(/a/, "b") # => "bbcdef"
string.tr(replaceable_letter, replacing_letter)
"abcdef".tr("a", "b") # => "bbcdef"
Instead of initial_string.split('').map you can use initial_string.each_char.map
Explanation of your code:
replacements = {letter_a => letter_b}
is a hash where the letter to replace is the key and the replacement letter is the value.
For example { "a" => "b" }
Then you split your string into an array of characters.
After that you map over this array.
For every character you look it up in the hash, for example:
replacements["a"] # => "b"
replacements["c"] # => nil
If the hash has such a key, you take the replacement letter; if not, you keep the original letter. Compare, and read about the || operator:
nil || "f" # => "f"
"b" || "a" # => "b"
And finally you join the new array back into a string.
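The steps above can be traced with intermediate variables. This is just the question's method spelled out; the variable names chars and mapped are added for illustration.

```ruby
def replace(initial_string, letter_a, letter_b)
  replacements = { letter_a => letter_b }
  chars  = initial_string.split('')                   # "banana" -> ["b", "a", "n", "a", "n", "a"]
  mapped = chars.map { |ch| replacements[ch] || ch }  # hash hit -> replacement, miss (nil) -> original
  mapped.join                                         # glue the characters back into one string
end

puts replace("banana", "a", "b")  # prints bbnbnb
```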

How do I split a string in ruby? [closed]

Closed 5 years ago.
In my test step, I need to split letters.
find(:xpath,"//*[]").text
gives a string like "No#xyz1", where No# is a static part.
I need xyz1. How do I get that part?
String#[] with positive lookbehind comes to the rescue:
"No#xyz1"[/(?<=No#).*/]
#⇒ "xyz1"
So in your matcher you can use:
find(:xpath,"//*[]").text[/(?<=No#).*/] == "xyz1"
There are many ways to achieve that. You can, for example, use a regular expression and the scan method:
[1] pry(main)> "No#xyz1".scan(/No#(.+)/).first.first
=> "xyz1"
or a "dummy" split on string too:
[4] pry(main)> "No#xyz1".split("No#")
=> ["", "xyz1"]
[5] pry(main)> "No#xyz1".split("No#").last
=> "xyz1"
I would recommend the first one, though.
String#partition works fine here.
Searches sep or pattern (regexp) in the string and returns the part
before it, the match, and the part after it. If it is not found,
returns str followed by two empty strings.
"No#xyz1".partition("No#")
# => ["", "No#", "xyz1"]
"No#xyz1".partition("NotHere")
# => ["No#xyz1", "", ""]
So you can use:
"No#xyz1".partition("No#").last
# => "xyz1"
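One thing to watch: when the separator is missing, partition(...).last is an empty string, which is hard to tell apart from an empty remainder. A possible guard, with a made-up method name, might look like this:

```ruby
# Hypothetical helper: returns the text after `prefix`,
# or nil when `prefix` does not occur in `text`.
def after_prefix(text, prefix)
  _head, match, tail = text.partition(prefix)
  match.empty? ? nil : tail
end

p after_prefix("No#xyz1", "No#")  # => "xyz1"
p after_prefix("xyz1", "No#")     # => nil
```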

Ruby transliteration using hash [closed]

Closed 8 years ago.
I am trying to transliterate Cyrillic to Latin using a hash. I use # encoding: utf-8 and Ruby 1.9.3. I want this code to change the value of file_name. Why does it leave file_name unchanged?
abc = Hash.new
abc = {"a" => "a", "b" => "б", "v" => "в", 'g' => "г", 'd'=> "д", 'jo' => "ё", 'zh' => "ж", 'th' => "з", 'i' => "и", 'l' => "л", 'm' => "м", 'n' => "н",'p' => "п", 'r' => "р", 's' => "с", 't' => "т", 'u' => "у", 'f' => "ф", 'h' => "х", 'c' => "ц", 'ch' => "ч", 'sh' => "ш", 'sch' => "щ", 'y' => "ы",'u' => "ю", 'ja' => "я"}
file_name.each_char do |c|
  abc.each {|key, value| if c == value then c = key end }
end
The problem with .each_char is that the block variable (c in your question) is not a reference back into the string, so assigning to it does not alter the string in place. There are ways to make per-character mapping work (using .map followed by .join, for instance), but they are inefficient compared to .tr! or .gsub! for your purpose, because breaking the string into an array of characters and rebuilding it creates many Ruby objects.
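A small sketch of the difference, using upcase as a stand-in for the real character mapping (both method names are invented for illustration):

```ruby
# Assigning to the block variable only rebinds the local name `c`;
# the string being iterated over is left untouched.
def upcase_via_each_char(str)
  copy = str.dup
  copy.each_char { |c| c = c.upcase }
  copy
end

# The result of map is what gets kept, so this builds a new string.
def upcase_via_map(str)
  str.each_char.map { |c| c.upcase }.join
end

p upcase_via_each_char("abc")  # => "abc"
p upcase_via_map("abc")        # => "ABC"
```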
I think you need to do something like
file_name.tr!( 'aбвгдилмнпрстуфхцыю', 'abvgdilmnprstufhcyu' )
which covers the single letter conversions very efficiently. You then have some multi-letter conversions. I would use gsub! for that, and an inverted copy of your hash
latin_of = {"ё"=>"jo", "ж"=>"zh", "з"=>"th", "ч"=>"ch",
"ш"=>"sh", "щ"=>"sch", "я"=>"ja"}
file_name.gsub!( /[ёжзчшщя]/ ) { |cyrillic| latin_of[ cyrillic ] }
Note, unlike each_char, the return value of the block in .gsub! is used to replace whatever you matched in the original string. The above code uses an inversion of your original hash to quickly find the correct Latin replacement for the matched Cyrillic character.
You don't strictly need tr!. If you prefer, put every mapping into the inverted hash and do the whole conversion in one pass with the gsub! block form. The cost of calling two methods probably means you don't gain much from .tr! anyway. Still, String#tr! is worth knowing about; it can be very handy.
Edit: As suggested in comments, .gsub! can do a lot more for you here. Assuming latin_of was the complete hash with Cyrillic keys and the Latin values, you could do this:
file_name.gsub!( Regexp.union(latin_of.keys), latin_of )
Two things to note:
Regexp.union(latin_of.keys) takes the array of keys you want to convert and builds a single regexp that matches any of them, so gsub! will find each one in the String
gsub! accepts a hash as the second parameter, and converts each match by looking it up as a key and replacing it with the associated value - exactly the behaviour you are looking for.
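Put together, the one-pass version might look like the sketch below. The hash is a deliberately tiny, made-up subset of the full mapping, the file name is invented, and the non-destructive gsub is used so the method returns the new string.

```ruby
# Hypothetical helper with an illustrative subset of Cyrillic => Latin pairs.
def transliterate(text)
  latin_of = { "б" => "b", "в" => "v", "ж" => "zh", "ш" => "sh", "я" => "ja" }
  # Regexp.union builds one alternation from the keys; the hash supplies
  # the replacement for each match. Unmatched characters pass through.
  text.gsub(Regexp.union(latin_of.keys), latin_of)
end

puts transliterate("жбв-яш.txt")  # prints zhbv-jash.txt
```

Keep in mind that gsub! returns nil when nothing was replaced, so its return value should not be relied on as the resulting string.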

How do I write a regular expression that will match characters in any order?

I'm trying to write a regular expression that will match a set of characters without regard to order. For example:
str = "act"
str.scan(/Insert expression here/)
would match:
cat
act
tca
atc
tac
cta
but would not match ca, ac or cata.
I read through a lot of similar questions and answers here on StackOverflow, but have not found one that matches my objectives exactly.
To clarify a bit, I'm using ruby and do not want to allow repeat characters.
Here is your solution
^(?:([act])(?!.*\1)){3}$
See it here on Regexr
^ # matches the start of the string
(?: # opens a non-capturing group
([act]) # one of the allowed characters, captured in group 1
(?!.*\1) # lookahead assertion: the captured character must not occur again later in the string
){3} # repeats the group once per character (3 times)
$ # matches the end of the string
The only special thing is the lookahead assertion, which ensures that no character is repeated.
^ and $ are anchors to match the start and the end of the string.
[act]{3} or ^[act]{3}$ will do it in most regular expression dialects. If you can narrow down the system you're using, that will help you get a more specific answer.
Edit: as mentioned by @georgydyer in the comments below, it's unclear from your question whether or not repeated characters are allowed. If not, you can adapt the answer from this question and get:
^(?=[act]{3}$)(?!.*(.).*\1).*$
That is, a positive lookahead to check a match, and then a negative lookahead with a backreference to exclude repeated characters.
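To compare the two no-repeat patterns, here is a short Ruby check. The method names are invented; the patterns are copied unchanged from the answers above, and aat is an extra test word with a repeated character.

```ruby
# Pattern from the first answer: each captured character must not recur later.
def backreference_match?(s)
  s.match?(/^(?:([act])(?!.*\1)){3}$/)
end

# Pattern from the second answer: three allowed characters, no character twice.
def lookahead_match?(s)
  s.match?(/^(?=[act]{3}$)(?!.*(.).*\1).*$/)
end

%w[cat act tca atc tac cta ca ac cata aat].each do |w|
  puts format("%-5s %-5s %s", w, backreference_match?(w), lookahead_match?(w))
end
```

Both methods should agree on every word in the list: true for the six permutations, false for the rest.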
Here's how I'd go about it:
regex = /\b(?:#{ Regexp.union(str.split('').permutation.map{ |a| a.join }).source })\b/
# => /\b(?:act|atc|cat|cta|tac|tca)\b/
%w[
  cat act tca atc tac cta
  ca ac cata
].each do |w|
  puts '"%s" %s' % [w, w[regex] ? 'matches' : "doesn't match"]
end
That outputs:
"cat" matches
"act" matches
"tca" matches
"atc" matches
"tac" matches
"cta" matches
"ca" doesn't match
"ac" doesn't match
"cata" doesn't match
I use the technique of passing an array into Regexp.union for a lot of things; it works especially well with the keys of a hash, and passing the hash into gsub for rapid search/replace on text templates. This is the example from the gsub documentation:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
Regexp.union creates a regex, and it's important to use source instead of to_s when extracting the actual pattern being generated:
puts regex.to_s
=> (?-mix:\b(?:act|atc|cat|cta|tac|tca)\b)
puts regex.source
=> \b(?:act|atc|cat|cta|tac|tca)\b
Notice how to_s embeds the pattern's flags inside the string. If you don't expect them you can accidentally embed that pattern into another, which won't behave as you expect. Been there, done that and have the dented helmet as proof.
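A small sketch of how that can bite. The method names are made up; the point is only that the flags carried by to_s override the flags of the outer pattern for the embedded part.

```ruby
# Embeds "(?-mix:cat)", which switches case-insensitivity OFF for the
# embedded part, even though the outer regex asks for /i.
def embed_with_to_s(inner)
  /#{inner.to_s}/i
end

# Embeds just "cat", so the outer /i flag applies to it.
def embed_with_source(inner)
  /#{inner.source}/i
end

p embed_with_to_s(/cat/).match?("CAT")    # => false
p embed_with_source(/cat/).match?("CAT")  # => true
```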
If you really want to have fun, look into the Perl Regexp::Assemble module available on CPAN. Using that, plus List::Permutor, lets us generate more complex patterns. On a simple string like this it won't save much space, but on long strings or large arrays of desired hits it can make a huge difference. Unfortunately, Ruby has nothing like this, but it is possible to write a simple Perl script with the word or array of words, and have it generate the regex and pass it back:
use List::Permutor;
use Regexp::Assemble;
my $regex_assembler = Regexp::Assemble->new;
my $perm = new List::Permutor split('', 'act');
while (my @set = $perm->next) {
    $regex_assembler->add(join('', @set));
}
print $regex_assembler->re, "\n";
(?-xism:(?:a(?:ct|tc)|c(?:at|ta)|t(?:ac|ca)))
See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for more information about using Regexp::Assemble with Ruby.
I will assume several things here:
- You are looking for permutations of given characters
- You are using ruby
str = "act"
permutations = str.split(//).permutation.map{|p| p.join("")}
# and for the actual test
permutations.include?("cat")
It is no regex though.
No doubt - the regex that uses positive/negative lookaheads and backreferences is slick, but if you're only dealing with three characters, I'd err on the side of verbosity by explicitly enumerating the character permutations like @scones suggested.
"act".split('').permutation.map(&:join)
=> ["act", "atc", "cat", "cta", "tac", "tca"]
And if you really need a regex out of it for scanning a larger string, you can always:
/\b(#{Regexp.union("act".split('').permutation.map(&:join)).source})\b/
=> /\b(act|atc|cat|cta|tac|tca)\b/
Obviously, this strategy doesn't scale if your search string grows, but it's much easier to observe the intent of code like this in my opinion.
EDIT: Added word boundaries to avoid a false positive on cata, based on @theTinMan's feedback.
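If only whole words need testing (rather than scanning a larger string), one way around the factorial growth is to skip permutations entirely and compare sorted characters. This is a different technique from the ones above, sketched with a made-up method name:

```ruby
# Hypothetical helper: true when `candidate` uses exactly the same
# characters as `letters`, each the same number of times, in any order.
def same_letters?(candidate, letters)
  candidate.chars.sort == letters.chars.sort
end

%w[cat act tca ca ac cata].each do |w|
  puts "#{w}: #{same_letters?(w, 'act')}"
end
```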

Ruby split on numbers vs letters

I'd like to split the following string on letters:
1234B
There are always only ever 4 digits and one letter. I just want to split those out.
Here is my attempt. I think I have the right method, and the regex matches the number, but I don't think my syntax or my regex fits the problem I'm attempting to solve.
"1234A".split(/^\d{4}/)
What you want is not clear, but a general solution to this kind of situation is:
"1234A".scan(/\d+|\D+/)
# => ["1234", "A"]
If there are always 4 digits and 1 letter, there's no need to use regular expressions to split the string. Just do this:
str = "1234A"
digits, letter = str[0..3], str[4]
Looking at it purely from the perspective of splitting any string into groups of up to 4 characters:
"1234A".scan(/.{1,4}/)
# => ["1234", "A"]
Another no-regex version:
str = "1234A"
str.chars.to_a.last # => "A"
str.chop # => "1234"
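If the input might not always be four digits plus one letter, a stricter variant could validate the shape while splitting. This is only a sketch: the method name is invented, and it assumes the letter is ASCII.

```ruby
# Hypothetical helper: returns [digits, letter] when the string is exactly
# four digits followed by one ASCII letter, otherwise nil.
def split_code(str)
  m = str.match(/\A(\d{4})([A-Za-z])\z/)
  m && m.captures
end

p split_code("1234A")  # => ["1234", "A"]
p split_code("12A")    # => nil
```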
