I'm currently building a code generator that aims to generate boilerplate for me, in whatever language I have to work with, once I write the templates and/or translations.
I have a problem with a regex in Ruby. The regex aims to select whatever is between {{{ and }}}, so I can generate functions according to my needs.
My regex is currently:
/\{\{\{(([a-zA-Z]|\s)+)\}\}\}/m
My test data set is:
{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}
The results are:
[["Demande aaa", "a"], ["tagadatsouintsouin tutu", "u"]]
Each time, the regex also returns the last character as a second capture. That's not exactly what I want. I need something more like this:
/\{\{\{((\w|\W)+)\}\}\}/m
But this has a flaw too, the results are:
[["Demande aaa}}} => {{{tagadatsouintsouin tutu", "u"]]
Whereas, I wish to get:
[["Demande aaa"],["tagadatsouintsouin tutu"]]
How do I correct these regexes? I could use two sets of delimiters, but that wouldn't teach me anything.
Edit:
All of your regexes work on my data sample, so you all have a point.
Regexes may be, and probably are, overkill for my purpose. So I have two questions.
First, do the regexes keep the exact same indentation? This should be able to handle whole functions.
Second, is there something better suited to this task?
A detailed explanation of the purpose of this tool: I'm tired of writing boilerplate code in PHP (Symfony), so I want to generate it from templates.
My intent is to build some views, some controllers, and even parts of the model this way.
Practical example: in my model, I want to generate some functions according to the type of an object's attribute. For example, I have functions that display money correctly. So I need to build the correct function according to my attribute, and then put it inside my output file.
So some translations themselves need translations.
So, a fictional example:
{{{euro}}} => {{{ function getMyAttributeEuro()
{
return formating($this->get[[MyAttribute]]);
} }}}
To store my translations, should I use regexes, like
I want to build something a bit clever, so it can generate most of the basic code without bugs and I can work on the interesting code.
You have one set of capturing parentheses too many.
/\{\{\{([a-zA-Z\s]+)\}\}\}/
Also, you don't need the /m modifier because there is no dot (.) in your regex whose behaviour would be affected by it.
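To see where the extra element in the original results comes from: a group repeated by + only remembers its last iteration, so the inner group ([a-zA-Z]|\s) ends up holding the final character of each match. A minimal comparison on the question's own data:

```ruby
data = '{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}'

# Original pattern: the inner group is repeated by +, so it keeps only its
# last iteration, which is reported as the extra "a" / "u".
original = /\{\{\{(([a-zA-Z]|\s)+)\}\}\}/m
# Simplified pattern: one character class, one capture group.
simplified = /\{\{\{([a-zA-Z\s]+)\}\}\}/

p data.scan(original)   # [["Demande aaa", "a"], ["tagadatsouintsouin tutu", "u"]]
p data.scan(simplified) # [["Demande aaa"], ["tagadatsouintsouin tutu"]]
```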
I'm partial to:
data = '{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}'
data.scan(/\{{3}(.+?)}{3}/).flatten.map{ |r| r.squeeze(' ') }
=> ["Demande aaa", "tagadatsouintsouin tutu"]
or:
data.scan(/\{{3}(.+?)}{3}/).flatten.map{ |r| [ r.squeeze(' ') ] }
=> [["Demande aaa"], ["tagadatsouintsouin tutu"]]
or:
data.scan(/\{{3}(.+?)}{3}/).map{ |r| [ r[0].squeeze(' ') ] }
=> [["Demande aaa"], ["tagadatsouintsouin tutu"]]
if you need the sub-arrays.
I'm not big on trying to do everything possible inside the regex. I prefer to keep it short and sweet, then polish the output once I've found what I was looking for. It's a maintenance issue: regexes make my head hurt, and I stopped thinking of them as a macho thing years ago. Regexes are a very useful tool, but too often they are seen as the answer to every problem, which they're not.
Some people, when confronted with a problem, think “I know,
I'll use regular expressions.” Now they have two problems.
-- Jamie Zawinski
You want non-capturing groups (?:...), but here is another way.
/\{\{\{(.*?)\}\}\}/m
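Both suggestions side by side, plus the greedy form for contrast, on the question's data. The greedy .* runs on to the last }}} in the string, which is exactly the second failure described in the question; the lazy .*? stops at the first one.

```ruby
data = '{{{Demande aaa}}} => {{{tagadatsouintsouin tutu}}}'

# Same alternation as the original, but (?:...) makes the inner group
# non-capturing, so scan reports only the outer group.
non_capturing = /\{\{\{((?:[a-zA-Z]|\s)+)\}\}\}/
# Greedy: .* consumes as much as possible, then backtracks to the LAST }}}.
greedy = /\{\{\{(.*)\}\}\}/m
# Lazy: .*? consumes as little as possible, stopping at the FIRST }}}.
lazy = /\{\{\{(.*?)\}\}\}/m

p data.scan(non_capturing) # [["Demande aaa"], ["tagadatsouintsouin tutu"]]
p data.scan(greedy)        # [["Demande aaa}}} => {{{tagadatsouintsouin tutu"]]
p data.scan(lazy)          # [["Demande aaa"], ["tagadatsouintsouin tutu"]]
```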
Just a shot
/\{\{\{([\w\W]+?)\}\}\}/
Added non-greediness to your regex.
This seems to work.
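On the follow-up questions in the edit: a capture is an exact substring of the input, so line breaks and indentation inside a captured body come back unchanged. A minimal sketch, assuming a hypothetical translation file in the question's {{{key}}} => {{{body}}} format: it parses key/body pairs with a lazy, /m regex and fills the [[MyAttribute]] placeholder with plain String#gsub. The names SOURCE, PAIR, parse_translations and render are made up for illustration. One limitation of the lazy match: a body cannot itself contain }}}, because each capture ends at the first one.

```ruby
# Hypothetical translation source in the question's format; the body keeps
# its own line breaks and indentation.
SOURCE = <<~'TPL'
  {{{euro}}} => {{{ function getMyAttributeEuro()
  {
      return formating($this->get[[MyAttribute]]);
  } }}}
TPL

# key => body pairs; /m lets .*? cross newlines, and lazy matching ends each
# capture at the first }}} it reaches.
PAIR = /\{\{\{(.*?)\}\}\}\s*=>\s*\{\{\{(.*?)\}\}\}/m

def parse_translations(source)
  source.scan(PAIR).map { |key, body| [key.strip, body] }.to_h
end

# Fill the [[MyAttribute]] placeholder with a concrete attribute name.
def render(translations, key, attribute)
  translations.fetch(key).gsub('[[MyAttribute]]', attribute)
end

translations = parse_translations(SOURCE)
puts render(translations, 'euro', 'Price')
```

A second pass (for the "translations which themselves need translations") could run render on the output again with other placeholders; whether that beats a real template engine such as ERB depends on how complex the templates get.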
I would like to move the name property to the top of each block, like this.
There are many resource blocks in a file.
Before
resource "datadog_monitor" "A" {
enable_logs_sample = "true"
name = "name"
tags = ["env:dev"]
}
resource "datadog_monitor" "B" {
enable_logs_sample = "true"
name = "name"
tags = ["env:dev"]
}
After
resource "datadog_monitor" "A" {
name = "name"
enable_logs_sample = "true"
tags = ["env:dev"]
}
resource "datadog_monitor" "B" {
name = "name"
enable_logs_sample = "true"
tags = ["env:dev"]
}
OK, I think :help :global and the range/address mechanism are among Vim's best and most underrated features, so they might deserve a detailed rundown.
The core editing pattern is the same as in the commands I suggested in my previous answer to a similar question of yours:
on each line matching a specific regular expression pattern,
do something.
Note that it is a "pattern", not a one-off trick. You are not supposed to "learn" this answer by heart, or commit it to "muscle memory", or bookmark it for the next time you have the exact same problem. Instead, you are supposed to grok the logic behind it in a way that allows you to:
recognize a situation where it might come in handy,
apply it without too much thinking.
So, in order to implement the editing pattern described above, we use the :global command, which works like this:
:[range]global/<pattern>/[range]<command>
where:
in range (optional, default is %),
mark each line matching <pattern>,
then execute <command> on range (optional, default is .).
Like lots of things in Vim, :global is conceptually cool but not necessarily useful on its own. The more familiar you are with ranges/addresses and the more Ex commands you know, the more useful it is.
In this specific case, ranges don't matter much but addresses and Ex commands do… and their sum makes problems like these solvable in, IMO, a pretty intuitive way.
Now, let's go back to our problem:
move every "name" line to the top of the block
and express it in terms that match our editing pattern:
mark every line matching name,
then move it below the closest line above matching resource.
Which is a simple:
:g/name/m?resource?
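To make the two steps explicit outside Vim, the same "mark matching lines, then move each below the nearest anchor above it" logic can be sketched in Ruby. This is only an emulation under assumptions, not how Vim implements :global: the method name move_below_anchor is made up, lines are plain strings, and unlike Vim's ?resource? search (which wraps around the buffer when 'wrapscan' is set), a line with no anchor above it is simply left in place.

```ruby
# Emulates :g/<target>/m?<anchor>? on an array of lines.
def move_below_anchor(lines, target:, anchor:)
  result = lines.dup
  (0...result.size).each do |i|
    # :g step: only lines matching the target pattern are acted on.
    next unless result[i].match?(target)

    # ?anchor? step: closest line above this one that matches the anchor.
    anchor_idx = result[0...i].rindex { |line| line.match?(anchor) }
    next if anchor_idx.nil?

    # :m step: move the line just below the anchor. Lines only move upward,
    # so everything after index i keeps its position and the loop stays valid.
    result.insert(anchor_idx + 1, result.delete_at(i))
  end
  result
end

before = <<~'HCL'.lines(chomp: true)
  resource "datadog_monitor" "A" {
  enable_logs_sample = "true"
  name = "name"
  tags = ["env:dev"]
  }
  resource "datadog_monitor" "B" {
  enable_logs_sample = "true"
  name = "name"
  tags = ["env:dev"]
  }
HCL

puts move_below_anchor(before, target: /name/, anchor: /resource/)
```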
Of course, the exact regular expression patterns to use are context-dependent.
The trick is to internalize the patterns so that you already know how to use any new building block you might come upon.
There is really nothing even remotely god-like here.
(It's been a while since I've been here.)
I've been using the first version of PHRets (v1) for years and understood it well enough to get by, but now I'm trying to understand the advantages of v2.6.2. I've got it all installed and the basics are working fine. My issues are mostly with comprehending the fine points of the query syntax that goes into the $rets->Search() statement (I'm much more familiar with SQL statements). Specifically, I'd like to have a query return a list of properties, EXCLUDING those that already have the status "Sold".
Here's where I am stuck: If I start with this
`$results = $rets->Search('Property', 'A','*',['Select' => 'LIST_8,LIST_105,LIST_15,LIST_19,listing_office_shortid']);`
That works well enough. BUT I'd like to fit in a filter like:
"LIST_15 != Sold", or "NOT LIST_15=Sold"...something like that. I don't get how to fit/type that into a PHRets Search().
I like PHRets but it is so hard to find well-organized/complete documentation about specific things like this. Thanks in advance.
As mentioned in my comment above, I've figured out that the filter goes in the third argument position (where '*' is in the original question). The tricky part was finding the specific "sold" code for each class of properties and placing it in that position, like so: '(LIST_15=~B4ZIT1Y75TZ)' (notice the =~ combination of characters, which means "does not equal" in this context). I've found the code strings for each of the property types (it's not clear WHY they need to be unique for each type of property: "Sold" is Sold for any type, after all), but the correct query for a single-family residential property (type 'A'), at least for the MLS I have to search, is:
$results = $rets->Search('Property', 'A','(LIST_15=~B4ZIT1Y75TZ)',['Select' => 'LIST_8,LIST_105,LIST_15,LIST_19,listing_office_shortid']);
(again, the code to go with LIST_15 will be different for the different types of properties.) I think there is a better answer that involves more naturalistic language, but this works and I guess I will have to be satisfied with it for now. I hope this is of some use to anyone else struggling with this stuff.
I don't know if this can be called an algorithm, but I think it's close.
I will be pulling data from an API, and the titles will contain certain words, e.g.:
Great Software 2.0 Download Now
Buy Great Software for just $10
Great Software Torrent Download
So I want to do different things based on the presence of certain words such as Download, Buy, etc. For example, if the title contains the word 'buy', I would like to extract that word and the amount present in the title and show them in another div, so in this case it would be "Buy for $10" or "Buy $10", etc. I could use if/else, but I don't want to, because there could be more such conditions in the future. So what I am thinking about is using the send method, e.g.:
def buy(string)
  'Buy for just ' + string.scan(/\$\d+/).first
end
def whichkeyword(title)
  send(title.scan(/(download|buy)/i)[0][0].downcase.to_sym, title)
end
whichkeyword('Buy this software for $10 now')
Is there a better way to do this? Or is this even a good way to do it? Any help would be appreciated.
First of all, use send if and only if you need to call a private method; use public_send otherwise.
In this particular case, metaprogramming is overkill. It requires too much redundant code, and the code has to change for every new item. I would go with building a hash like this:
@hash = { 'buy' => { text: 'Buy for just %{placeholder}', re: /\$\d+/ } }
This hash might be placed somewhere outside of the code; e.g., it might be stored in a YAML file next to the code and loaded in advance. That way you can change the behaviour without modifying the code, which is handy, for instance, in a gem.
With the hash defined/loaded, I would write the method like this:
def format string
  key = string[/#{Regexp.union(@hash.keys).source}/i].downcase
  puts @hash[key][:text] % { placeholder: string[@hash[key][:re]] }
end
Yielding:
▶ format("Buy this software for $10 now")
#⇒ Buy for just $10
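A sketch of the "keep the hash outside the code" idea, assuming a hypothetical keywords.yml (inlined as a heredoc so the example runs on its own). YAML has no regex type, so patterns are stored as strings and compiled with Regexp.new after loading. The "download" rule, the names RULES and describe, and the nil return for titles without a keyword are illustrative assumptions; describe is used instead of format to avoid shadowing Kernel#format.

```ruby
require 'yaml'

# Stand-in for a hypothetical keywords.yml file; the "download" rule is made up.
RULES_YAML = <<~'YAML'
  buy:
    text: 'Buy for just %{placeholder}'
    re: '\$\d+'
  download:
    text: 'Download version %{placeholder}'
    re: '\d+(?:\.\d+)+'
YAML

# Compile the pattern strings once, after loading.
RULES = YAML.safe_load(RULES_YAML).transform_values do |rule|
  { text: rule['text'], re: Regexp.new(rule['re']) }
end

# /buy|download/i, built from whatever keys the file defines.
KEY_PATTERN = /#{Regexp.union(RULES.keys).source}/i

# Returns the formatted string, or nil when no keyword or no value is found.
def describe(title)
  key = title[KEY_PATTERN]
  return nil unless key

  rule = RULES[key.downcase]
  value = title[rule[:re]]
  value && format(rule[:text], placeholder: value)
end

puts describe('Buy this software for $10 now')
```

Adding a keyword then means adding an entry to the YAML file, without touching describe.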
There are many advantages over declaring methods: e.g., matches can now contain spaces, you can easily add/remove matchers, etc.
First of all, your algorithm can work, but it has some problems, e.g. what happens if no keyword is present?
I have two solutions for you:
NLP
If you want to make it much more dynamic, you can use NLP (natural language processing). NLP will find the main words in your sentence, and then you can choose the right action for each.
A good gem for that is Treat, which you can use with stanford-core-nlp. After processing the data, you can find the verbs and even synonyms in the sentence and figure out what to do.
sentence('Buy this software for $10 now').verbs # ['buy']
Simple Hash
This solution is less dynamic, but much simpler. As you did with scan, just use a constant to manage your keywords and their output (I would do it with lambdas). You can also add a default to the hash:
KEYWORDS = Hash.new(-> { 'Default Title' }).merge(
  buy: -> { },
  download: -> { }
)
# keys are symbols, so convert the match with to_sym; to_s turns "no match" (nil) into ""
KEYWORDS[sentence[/(#{KEYWORDS.keys.join('|')})/i].to_s.downcase.to_sym].call
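A runnable variant of the hash-of-lambdas idea, assuming each lambda receives the title and the default is itself a lambda, so every lookup can be called the same way. The handler bodies and the name keyword_action are illustrative, not part of the answer.

```ruby
# Keyword => handler; each lambda receives the full title.
# The default handler covers titles with no known keyword.
KEYWORDS = Hash.new(->(_title) { 'Default Title' }).merge(
  buy: ->(title) { "Buy for just #{title[/\$\d+/]}" },
  download: ->(_title) { 'Download now' }
)

def keyword_action(sentence)
  # /(buy|download)/i built from the symbol keys; to_s maps "no match" to "".
  keyword = sentence[/(#{KEYWORDS.keys.join('|')})/i].to_s.downcase.to_sym
  KEYWORDS[keyword].call(sentence)
end

puts keyword_action('Buy this software for $10 now')
```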
I think this solution is good enough.
The only thing that looks strange is scan(/(download|buy)/i)[0][0].
Personally, I don't much like using the [] syntax in Ruby.
I think using scan here is not necessary.
What about
def whichkeyword(title)
title =~ /(download|buy)/i
send $1.downcase.to_sym, title unless $1.nil?
end
UPDATE
def whichkeyword(title)
action = title[/(download|buy)/i]
public_send action.downcase.to_sym, title if action
end
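One caveat on the UPDATE: methods defined with a top-level def become private methods of Object, so public_send would raise NoMethodError if buy and download were defined at the top level, as in the question. A minimal sketch, assuming the handlers live as public methods on a hypothetical TitleActions class (its method bodies are also illustrative):

```ruby
# Handlers are public instance methods here, so public_send can reach them.
class TitleActions
  KEYWORD = /(download|buy)/i

  def buy(title)
    "Buy for just #{title[/\$\d+/]}"
  end

  def download(_title)
    'Download now'
  end

  # Returns nil when the title contains no known keyword.
  def dispatch(title)
    action = title[KEYWORD]
    public_send(action.downcase, title) if action
  end
end

puts TitleActions.new.dispatch('Buy this software for $10 now')
```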
I'm scanning names and descriptions of different items in order to see if there are any keyword matches.
In the code below it will return things like 'googler' or 'applecobbler', when what I'm trying to do is get exact matches only:
[name, description].join(" ").downcase.scan(/apple|microsoft|google/)
How should I do this?
My regex skills are pretty weak, but I think you need to use a word boundary:
[name, description].join(" ").downcase.scan(/\b(apple|microsoft|google)\b/)
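A quick check of the word-boundary version on made-up sample text. Note that because the pattern has a capturing group, scan returns one array per match; a non-capturing group (?:...) returns the matched words as plain strings.

```ruby
text = 'applecobbler googler apple Google microsoft'.downcase

# Capturing group: one array of groups per match.
p text.scan(/\b(apple|microsoft|google)\b/)   # [["apple"], ["google"], ["microsoft"]]
# Non-capturing group: the matched strings themselves.
p text.scan(/\b(?:apple|microsoft|google)\b/) # ["apple", "google", "microsoft"]
```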
Rubular example
It depends on what information you want, but if you just want exact matches, you don't need a regex for the comparison; just compare the relevant strings.
splitted_strings = [name, description].join(" ").downcase.split(/\b/)
splitted_strings & %w[apple microsoft google]
# => the matching words, in order of first appearance (Array#& drops duplicates)
Add proper word-boundary anchors (\b) to your regexp. Instead of joining, you can also use the #grep method:
array.grep(your_regexp)
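Note that grep answers a slightly different question: it returns the elements that match, not the matched words. A small sketch with hypothetical name/description values:

```ruby
regexp = /\b(?:apple|microsoft|google)\b/i

# Hypothetical sample values.
name = 'Applecobbler Deluxe'
description = 'Runs on Microsoft Windows'

# grep uses Regexp#=== on each element and keeps the ones that match.
p [name, description].grep(regexp) # ["Runs on Microsoft Windows"]
```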
Looking at the question and thinking about when I'd want to do this: in an actual program, where I had lists of sources and their associated texts and wanted to know the hits, I'd probably write something like this:
require 'pp'
names = ['From: Apple', 'From: Microsoft', 'From: Google.com']
descriptions = [
'"an apple a day..."',
'Microsoft Excel flight simulator... according to Microsoft',
'Searches of Google revealed multiple hits for "google"'
]
targets = %w[apple microsoft google]
regex = /\b(?:#{ Regexp.union(targets).source })\b/i
names.zip(descriptions) do |n,d|
name_hits, description_hits = [n, d].map{ |s| s.scan(regex) }
pp [name_hits, description_hits]
end
Which outputs:
[["Apple"], ["apple"]]
[["Microsoft"], ["Microsoft", "Microsoft"]]
[["Google"], ["Google", "google"]]
This would let me know the letter-case of the words, so I could try to differentiate the apple fruit from Apple the company, and get word counts, helping to show relevance of the text.
The regex looks like:
/\b(?:apple|microsoft|google)\b/i
It's case-insensitive, but scan returns the words in their original case.
names, descriptions and targets could all come from a database or separate files, helping to separate the data from the code and the need to modify the code as the targets change.
I'd use a list of target words and use Regexp.union to quickly build the pattern.
As part of a chat app I'm writing, I need to use regular expressions to match asterisks and underscores in chat messages and turn them into <strong> and <em> tags. Since I'm terrible with regex, I'm really stuck here. Ideally, we would have it set up such that:
One to three words, but not more, can be marked for strong/em.
Patterns such as "un*believ*able" would be matched.
Only one or the other (strong OR em) works within one line.
The above parameters are in order of importance, with only #1 being utterly necessary - the others are just prettiness. The closest I came to anything that worked was:
text = text.sub(/\*([(0-9a-zA-Z).*])\*/,'<b>\1<\/b>')
text = text.sub(/_([(0-9a-zA-Z).*])_/,'<i>\1<\/i>')
But it obviously doesn't work with any of our params.
It's odd that there isn't an example of something similar already out there, given the popularity of using asterisks for bold and whatnot. If there is, I couldn't find one outside of plugins/gems (which won't work in this case, as I really only need it in one place in my model). Any help would be appreciated.
This should help you finish what you are doing:
text.sub(/\*(.+?)\*/, '<b>\1</b>')
text.sub(/_(.+?)_/, '<i>\1</i>')
Firstly, your criteria are a little strange, but, okay...
It seems that a possible algorithm would be to find the matches in a message, count them to see if there are fewer than 4, and then perform one set of substitutions.
STRONG_REGEXP = /\*([^*]+)\*/
EM_REGEXP = /_([^_]+)_/
# constants (not local variables) so they are visible inside the method
def process(input)
  # scan counts the marked spans; MatchData#size would count capture groups instead
  if input.scan(STRONG_REGEXP).size.between?(1, 3)
    input.gsub(STRONG_REGEXP, '<b>\1</b>')
  elsif input.scan(EM_REGEXP).size.between?(1, 3)
    input.gsub(EM_REGEXP, '<i>\1</i>')
  else
    input
  end
end
Your specifications aren't entirely clear, but if you understand this, you can tweak it yourself.
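A sketch that tries all three of the question's criteria, under stated assumptions: "one to three words" is checked per marked span by splitting on whitespace; "un*believ*able" works because a span only has to be free of its own marker character; and each line gets either strong or em, with strong winning when both appear. The names STRONG, EM, short_enough? and markup_line are hypothetical, the input is assumed to be HTML-escaped already (this code inserts raw tags), and identifiers like my_var_name would also trigger em.

```ruby
STRONG = /\*([^*\n]+)\*/
EM = /_([^_\n]+)_/

# Criterion 1: a marked span may hold one to three words.
def short_enough?(text)
  text.split.size.between?(1, 3)
end

# Criterion 3: only the first marker kind found (strong, then em) is converted.
def markup_line(line)
  [[STRONG, 'strong'], [EM, 'em']].each do |regexp, tag|
    next unless line.match?(regexp)

    return line.gsub(regexp) do |whole|
      inner = Regexp.last_match(1)
      # Criterion 2 comes from [^*]: "un*believ*able" has a valid span mid-word.
      short_enough?(inner) ? "<#{tag}>#{inner}</#{tag}>" : whole
    end
  end
  line
end

puts markup_line('this is *really* cool')
```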