how to use Regular Expression match tld in url? - tld

How do I use a regular expression to match the TLD in a URL?
I need to match the TLD, covering almost all countries and organizations. A solution without a regular expression is fine, but the match has to be efficient.

Do you need to use a regex? Often using a regular expression is overkill. A few lines of code will be faster, and more maintainable than a big regular expression.
If your language has a split method just use that on a "." and the tld will be the last item in the array. If you're stuck in C++ or something just search backward from the end of the string to the first ., then the rest of the string from that point is the tld.
arr = url.split(".")
tld = arr[arr.length - 1]
or
std::size_t period = url.rfind('.');
std::string tld = url.substr(period + 1);
(In C++, std::string::rfind returns std::string::npos when the string contains no '.', so check for that before calling substr.)
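One caveat with splitting the whole URL is that a path or query string can contain dots too. A minimal Ruby sketch, assuming the input is a full URL with a scheme, pulls out the host first with the standard library's URI and only then splits. The helper name naive_tld is made up for illustration, and it only returns the last label, so it knows nothing about multi-part suffixes such as co.uk.

```ruby
require "uri"

# Hypothetical helper (not from the answer above): returns the last
# dot-separated label of the URL's host, or nil when no host is found.
def naive_tld(url)
  host = URI.parse(url).host   # nil for inputs without a scheme, e.g. "example.org"
  return nil if host.nil?
  host.split(".").last
end

puts naive_tld("http://www.example.com/index.html?v=1.2")
```

URI.parse raises URI::InvalidURIError on malformed input, so real code would probably want to rescue that.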

Related

What does /anystring/ mean in ruby?

I came across this: /sera/ === coursera. What does /sera/ mean? Please tell me. I do not understand the meaning of the expression above.
It's a regular expression. A more explicit version of the same check is:
coursera.match(/sera/)
Or:
/sera/.match(coursera)
These two are functionally similar: you can ask whether a string matches a regular expression, or test a regular expression for matches against a string.
Spelled out, your original code asks: can the characters sera be found in the variable coursera?
If you do this:
"coursera".match(/sera/)
# => #<MatchData "sera">
You get a MatchData result which means it matched. For more complicated expressions you can capture parts of the string using arbitrary patterns and so on. The general rule here is regular expressions in Ruby look like /.../ or vaguely like %r[...] in form.
You may also see the =~ operator used which is something Ruby inherited from Perl. It also means match.
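To see the three forms side by side, here is a small sketch. The value assigned to coursera is an assumption, since the question never shows what the variable holds.

```ruby
coursera = "coursera"  # assumed value; not given in the question

matched  = /sera/ === coursera     # Regexp#=== answers true or false
data     = coursera.match(/sera/)  # MatchData on success, nil otherwise
position = coursera =~ /sera/      # index where the match starts, or nil

puts matched
puts data[0]          # the matched text
puts data.pre_match   # everything before the match
puts position
```

The === form is also what a case/when statement uses behind the scenes when a when clause holds a regular expression.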

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The match matches "abb" but leaves the c off. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
Quantifiers are greedy by default: each part of the pattern matches as much as it can, and the engine backtracks only when that is needed for the pattern as a whole to match. Since you made (bc) optional, (ab*) can take both b characters (the + after it has nothing more to do), the pattern has already succeeded at that point, and so nothing forces the engine to try other alternatives.
If you want the whole string to be matched (which will force some backtracking in this case) make sure you anchor both ends of the string:
regex = /^(ab*)+(bc)?$/
The regex has two parenthesized groups, and you are expecting both of them to match part of your string.
The first one matches abb because (ab*) means an a followed by zero or more b. You have two b, so the match is abb. After that only c is left in your string, so the second group, which needs bc, has nothing to match and is skipped because it is optional.
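A quick way to check both explanations in IRB, using the two patterns exactly as given above:

```ruby
mystring = "abbc"

loose  = mystring.match(/(ab*)+(bc)?/)    # unanchored: greedy ab* takes both b's
strict = mystring.match(/^(ab*)+(bc)?$/)  # anchored: the engine must backtrack

puts loose[0]           # whole match of the unanchored pattern
puts loose[2].inspect   # second group did not participate
puts strict[0]          # whole match of the anchored pattern
puts strict[1]          # first group after giving one b back
puts strict[2]          # second group
```

In the anchored version the $ cannot match while c is still unconsumed, so ab* gives back one b and (bc)? picks it up.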

Tokenize (lex? parse?) a regular expression

Using Ruby I'd like to take a Regexp object (or a String representing a valid regex; your choice) and tokenize it so that I may manipulate certain parts.
Specifically, I'd like to take a regex/string like this:
regex = /var (\w+) = '([^']+)';/
parts = ["foo","bar"]
and create a replacement string that replaces each capture with a literal from the array:
"var foo = 'bar';"
A naïve regex-based approach to parsing the regex, such as:
i = -1
result = regex.source.gsub(/\([^)]+\)/){ parts[i+=1] }
…would fail for things like nested capture groups, or non-capturing groups, or a regex that had a parenthesis inside a character class. Hence my desire to properly break the regex into semantically-valid pieces.
Is there an existing Regex parser available for Ruby? Is there a (horror of horrors) known regex that cleanly matches regexes? Is there a gem I've not found?
The motivation for this question is a desire to find a clean and simple answer to this question.
I have a JavaScript project on GitHub that you may want to look at, called: Dynamic (?:Regex Highlighting)++ with Javascript! It parses PCRE-compatible regular expressions written in both free-spacing and non-free-spacing modes. Since its regexes are written in the less feature-rich JavaScript syntax, they could easily be converted to Ruby.
Note that regular expressions may contain arbitrarily nested parenthesis structures and JavaScript has no recursive regex features, so the code must parse the tree of nested parens from the inside out. It's a bit tricky but works quite well. Be sure to try it out on the highlighter demo page, where you can input and dynamically highlight any regex. The JavaScript regular expressions used to parse regular expressions are documented here.
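For the narrower goal in the question, a character-by-character scan in Ruby can already cope with the cases that trip up the gsub approach. The following is only a rough sketch and not the method used by the JavaScript project: the name fill_captures is hypothetical, it treats every group starting with (? as non-capturing (so named groups are not replaced), it swallows captures nested inside another capture, and it does not handle nested character classes or a ] placed first in a class. The result is still regex source, so escapes and quantifiers outside the captures are copied through unchanged.

```ruby
# Hypothetical helper: replace each outermost capture group in a regex
# source string with the next entry from parts.
def fill_captures(source, parts)
  out = +""
  queue = parts.dup
  depth = 0          # current parenthesis nesting level
  skip_depth = nil   # depth of the capture group currently being replaced
  in_class = false   # true while inside [...]
  i = 0
  while i < source.length
    ch = source[i]
    if ch == "\\"                       # escape: treat both characters as one token
      out << source[i, 2] if skip_depth.nil?
      i += 2
      next
    end
    if in_class
      in_class = false if ch == "]"     # parentheses inside a class are literal
    elsif ch == "["
      in_class = true
    elsif ch == "("
      depth += 1
      if skip_depth.nil? && source[i + 1] != "?"
        skip_depth = depth              # start of an outermost capture group
        out << queue.shift.to_s
      end
    elsif ch == ")"
      if skip_depth == depth            # end of the group being replaced
        skip_depth = nil
        depth -= 1
        i += 1
        next
      end
      depth -= 1
    end
    out << ch if skip_depth.nil?
    i += 1
  end
  out
end

puts fill_captures(/var (\w+) = '([^']+)';/.source, ["foo", "bar"])
```

A full tokenizer would still be needed for anything beyond this, such as quantifiers applied to a capture group or lookaround assertions.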

How to match anything EXCEPT this string?

How can I match a string that is NOT partners?
Here is what I have that matches partners:
/^partners$/i
I've tried the following to NOT match partners, but it doesn't seem to work:
/^(?!partners)$/i
Your regex
/^(?!partners)$/i
only matches empty lines because you didn't include the end-of-line anchor in your lookahead assertion. Lookaheads do just that - they "look ahead" without actually matching any characters, so only lines that match the regex ^$ will succeed.
This would work:
/^(?!partners$)/i
This reports a match for any string (or, since we're in Ruby here, any line in a multi-line string) that's different from partners. Note that it only matches the empty string at the start of the line. That is enough for validation purposes, but the match result will be "" (instead of the nil you'd get if the match failed entirely).
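Both patterns can be compared directly. This is a small sketch; the constant names BROKEN and FIXED are just labels for the two regexes quoted above.

```ruby
BROKEN = /^(?!partners)$/i   # the attempt from the question
FIXED  = /^(?!partners$)/i   # end-of-line anchor moved inside the lookahead

["partners", "Partners", "partnership", "other", ""].each do |s|
  puts "#{s.inspect}: broken=#{BROKEN.match?(s)} fixed=#{FIXED.match?(s)}"
end

puts "other"[FIXED].inspect   # the match itself is the empty string
```

Because ^ and $ are line anchors in Ruby, a multi-line string passes as soon as any one of its lines differs from partners.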
Not easily, but it can be done with the lookahead operator.
Here is the Ruby regex:
^((?!partners).)*$
Cheers
If you only want to get a true value when string is not partners then there is no need to use regex and you can just use a string comparison (which ignores case).
If you for some reason need a positive regex match for any string which does not contain partners (if it's a part of a larger regex for example) you could use several different constructs, like:
^(?:(?!partners).)*$
or
^(?:[^p]+|p(?!artners))*$
For the plain string comparison, for example in Java:
!"partners".equalsIgnoreCase(aString)
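Since the question is about Ruby, a rough equivalent of that comparison might look like the sketch below. It assumes Ruby 2.4 or later for String#casecmp?, and the names not_partners? and NO_PARTNERS_INSIDE are made up for illustration. It also shows that the "does not contain" patterns answer a different question than "is not equal to".

```ruby
# Hypothetical helper: true unless the string equals "partners", ignoring case.
def not_partners?(string)
  !string.casecmp?("partners")
end

# Matches only lines that do not contain "partners" anywhere.
NO_PARTNERS_INSIDE = /^(?:(?!partners).)*$/i

puts not_partners?("Partners")                  # equal, ignoring case
puts not_partners?("partnership")               # a different string
puts NO_PARTNERS_INSIDE.match?("partnership")   # contains "partners"
puts NO_PARTNERS_INSIDE.match?("partner")       # does not contain it
```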

Ruby String: how to match a Regexp from a defined position

I want to match a regexp from a ruby string only from a defined position. Matches before that position do not interest me. Moreover, I'd like \A to match this position.
I found this solution:
code[index..-1][/\A[a-z_][a-zA-Z0-9_]*/]
This matches the regexp at position index in the string code. If the match is not exactly at position index, it returns nil.
Is there a more elegant way to do this (I want to avoid creating the temporary string with the first slice)?
Thanks
You could use ^.{#{index}} inside the regular expression. Don't know if that's what you want, because I don't understand your question completely. Can you maybe add an example with the tested String? And have you heard of Rubular? Great way to test your regular expressions.
This is how you could do it if I understand your question correctly:
code.match(/^.{#{index}}your_regex_here/)
The index variable is interpolated into your regular expression. When index = 4, the pattern first requires 4 characters from the beginning of the line (note that . does not match a newline by default). Then it tries your own regular expression, and the whole thing returns a match only if yours matches right there as well. I hope it helps. Good luck.
EDIT
And if you want to get the matched value for your regular expression:
code.scan(/^.{#{index}}([a-z_][a-zA-Z0-9_]*)/).join
It puts the matched result (the part inside the parentheses) in an Array and joins it into a String.
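Two other options avoid both the temporary slice and the interpolated prefix. This is a sketch rather than a definitive answer: StringScanner comes from the standard library's strscan and only matches at its current position, while the \G anchor combined with the position argument of String#match anchors the pattern where the search begins. The method names ident_at_scanner and ident_at_g are hypothetical.

```ruby
require "strscan"

IDENT = /[a-z_][a-zA-Z0-9_]*/

# Anchored match at index using StringScanner (pos is a byte offset).
def ident_at_scanner(code, index)
  scanner = StringScanner.new(code)
  scanner.pos = index
  scanner.scan(IDENT)            # matched text, or nil if no match at pos
end

# Anchored match at index using \G and String#match with a start position.
def ident_at_g(code, index)
  md = code.match(/\G[a-z_][a-zA-Z0-9_]*/, index)
  md && md[0]
end

code = "x = foo_bar + 1"
puts ident_at_scanner(code, 4)
puts ident_at_g(code, 4)
puts ident_at_g(code, 3).inspect   # index 3 is a space, so no match there
```

For strings with multi-byte characters the two differ: String#match counts characters, whereas StringScanner#pos counts bytes.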

Resources