Regex to capture a stored procedure definition - ruby

I have a file with stored procedures defined that look like:
CREATE PROCEDURE [XXXX].[procedure_name_here]
My regex so far is:
\.\[.*\]+.*
reference: http://rubular.com/r/Z0FiI78bqF
I need some help as it doesn't seem to be 100% correct.
Note: the [XXXX]. part is optional and may or may not be present, but the word CREATE has to be there. Without it, the line would be one stored procedure calling another, and I'm only looking for the actual definition.

If you can rely on the procedure name being between square brackets immediately after a dot, then you can write
\.\[(.*?)\]
The parentheses will capture the procedure name string for you.
If the dot is optional, or you need more verification that the line is a procedure definition, then use
CREATE\s+PROCEDURE\s+(?:\[.*?\]\.)?\[(.*?)\]
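As a rough sketch of how that second pattern could be used from Ruby: the method name procedure_names and the sample SQL are made up for illustration, and the /i flag is my own assumption (SQL keywords are usually case-insensitive), not part of the pattern above.

```ruby
# Pattern from the answer above; the /i flag is an added assumption.
PROC_DEF = /CREATE\s+PROCEDURE\s+(?:\[.*?\]\.)?\[(.*?)\]/i

# Hypothetical helper: returns the captured name of every definition found.
def procedure_names(sql)
  # scan returns one array per match (one entry per capture group),
  # so flatten gives a plain list of names.
  sql.scan(PROC_DEF).flatten
end

sample = <<~SQL
  CREATE PROCEDURE [dbo].[get_users]
  AS
  EXEC [dbo].[other_proc]
  GO
  CREATE PROCEDURE [cleanup]
SQL

names = procedure_names(sample)
# names => ["get_users", "cleanup"]
```

The EXEC line is skipped because it has no CREATE PROCEDURE in front of it, and the second definition is still found even though it has no schema prefix.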

SQL is not a regular language. That means it cannot correctly be parsed with a regular expression. It is certainly possible to write a correct parser for SQL, and it is certainly possible to write a correct parser for the subset of SQL that you care about. But it is not possible to do so with only a regular expression.
You can use regular expressions as a guess. But, depending on the choice of regular expression, you will get both false positives, where the regular expression says some text is valid SQL when it's not, and false negatives, where the regular expression says some text is invalid as SQL when it is actually perfectly valid.
So you are left with: either write a correct parser for SQL, or write a regular expression that makes a best guess - and make sure all your SQL files are written to pass that regular expression.
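To make the false positive / false negative point concrete, here is a small sketch using the bracket-based pattern from the other answer. The two sample lines are invented for illustration: one is a commented-out definition, the other is a definition written without square brackets, which T-SQL allows.

```ruby
PATTERN = /CREATE\s+PROCEDURE\s+(?:\[.*?\]\.)?\[(.*?)\]/

# A definition that has been commented out: not a live definition,
# but the regex cannot tell.
commented_out = "-- CREATE PROCEDURE [dbo].[old_proc]"

# A real definition written without brackets: the regex misses it.
unbracketed = "CREATE PROCEDURE dbo.new_proc"

false_positive = commented_out.match?(PATTERN)  # => true
false_negative = !unbracketed.match?(PATTERN)   # => true
```

Either case can be patched by extending the pattern, but each patch tends to open up new edge cases, which is the trade-off described above.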

Related

How to get first character that is causing reg expression not to match

We have a quite complex regular expression that checks a string's structure.
I wonder if there is an easy way to find out which character in the string is causing the regular expression not to match.
For example,
string.match(reg_exp).get_position_which_fails
Basically, the idea is to get the "position" of the state machine when it gave up.
Here is an example of regular expression:
%q^[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?$
The short answer is: No.
The long answer is that a regular expression is a complicated finite state machine that may be in a state trying to match several different possible paths simultaneously. There's no way of getting a partial match out of a regular expression without constructing a regular expression that allows partial matches.
If you want to allow partial matches, either re-engineer your expression to support them, or write a parser that steps through the string using a more manual method.
You could try generating one of these automatically with Ragel if you have a particularly difficult expression to solve.
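For the example expression in the question, the "more manual method" could look roughly like the sketch below. It is not a general solution: it hand-translates that one pattern into per-character checks, assumes the whole string is meant to match (the original uses ^ and $, which are line anchors in Ruby), and the names EDGE_CHAR, INNER_CHAR, MAX_LENGTH and first_failing_index are all made up here.

```ruby
# First and last character: no control characters, no separators.
EDGE_CHAR  = /\A[^\p{Cc}\p{Z}]\z/
# Characters in between: no control characters, no line/paragraph separators.
INNER_CHAR = /\A[^\p{Cc}\p{Zl}\p{Zp}]\z/
# 1 leading + up to 253 inner + 1 trailing character.
MAX_LENGTH = 255

# Returns the index of the first offending character, or nil if the
# string satisfies the rules. An empty string fails at index 0.
def first_failing_index(str)
  chars = str.chars
  return 0 if chars.empty?

  chars.each_with_index do |ch, i|
    return i if i >= MAX_LENGTH                  # string is too long
    edge = i.zero? || i == chars.length - 1
    return i unless ch.match?(edge ? EDGE_CHAR : INNER_CHAR)
  end
  nil
end

first_failing_index("a b")   # => nil (a space is allowed in the middle)
first_failing_index("a\tb")  # => 1   (tab is a control character)
first_failing_index("ab ")   # => 2   (trailing space is a separator)
```

The price of this approach is that the rules now live in two places, so the regex and the manual checker have to be kept in sync by hand.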

SQLAlchemy: Force column alias quoting

I want SQLAlchemy to generate the following SQL code:
SELECT t171 AS "3Harm" FROM production
I've been playing around with something similar to this SQLAlchemy ORM snippet:
session.query(Production.t171.label('3harm'))
The problem here is that this doesn't properly quote "3harm" in the generated SQL. Instead of "3harm" this generates the unquoted 3harm, which is invalid because it starts with a numerical character and therefore raises the following Oracle exception:
ORA-00923: FROM keyword not found where expected
I can get this to work by capitalizing any character in the alias name:
session.query(Production.t171.label('3Harm'))
But I would still prefer to use all lowercase column names since the rest of my program is standardized for all lowercase. Any idea how to force quote the lowercase version?
Found the solution while looking for something else.
Any column can be forced to use quotes with column.quote = True.
So for the original example:
column = Production.t171.label('3harm')
column.quote = True
session.query(column)
Success!
The SQL you want to generate isn't valid; rather than this:
SELECT t171 AS '3Harm' FROM production
... you need the identifier to be enclosed in double quotes, not single quotes:
SELECT t171 AS "3Harm" FROM production
So it looks like you should be able to do this:
session.query(Production.t171.label('"3harm"'))
or maybe:
session.query(Production.t171.label("3harm"))
But I don't use SQLAlchemy and I don't have any way to check if either is valid; you might need to escape the double quotes in the first one, for instance, though from this perhaps the second is more likely to work... and I don't understand why 3Harm would work unquoted.

Content Inside Parenthesis Regular Expression Ruby

I'm trying to take out the content inside the parentheses. For example, if the string is "(blah blah) This is stack(over)flow", I want to just take out "(blah blah)" but leave "(over)" alone. I'm trying
/\A\(.*\)/
but it returns "(blah blah) This is stack(over)", and I'm not sure why it's returning that.
Easiest fix:
/\A\(.*?\)/
Normally, * will try to match as much as it possibly can, so it'll match all the way to the last ) in the line. This is called "greedy" matching. Putting ? after +/*/? makes them non-greedy, and they'll match the shortest possible string.
But note that this won't work for nested parentheses. That's rather more complicated. Given your example, I assume this is for a pretty simple ad-hoc format where nesting isn't a concern.
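Here is a quick side-by-side of the two patterns on the string from the question, as a small sketch (the variable names are just for illustration):

```ruby
text = "(blah blah) This is stack(over)flow"

# Greedy: .* runs to the last ")" on the line.
greedy = text[/\A\(.*\)/]
# greedy => "(blah blah) This is stack(over)"

# Non-greedy: .*? stops at the first ")".
lazy = text[/\A\(.*?\)/]
# lazy => "(blah blah)"

# Removing only the leading parenthesised part:
remaining = text.sub(/\A\(.*?\)/, "")
# remaining => " This is stack(over)flow"
```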

Extract function names from function calls in C files

Is it possible to extract function calls in C source files, e.g.,
...
myfunc(1);
...
or
...
myfunc(anotherfunc(1, 2));
....
using just a Ruby regular expression? If not, would a parser generator such as ANTLR be useful?
This is not a foolproof pattern for finding method calls, but it should cover the patterns you are interested in.
[a-zA-Z\s]*\([a-zA-Z0-9]*(\([a-zA-Z0-9\s]*[\s,]*[\sa-zA-Z0-9]*\))?\);
This regex will match the following method call patterns.
1. myfunc(another(one,two));
2. myfunc();
3. myfunc(another());
4. myfunc(oneArg);
You can also reuse the regular expressions that Emacs tools such as imenu, etags, ecb, and c-mode already use for this.
In the purest sense you can't, because the possibility to nest function calls recursively makes it a non-regular language. That is, you cannot write a regular expression that matches an arbitrary function call and extracts all of the contained function names.
But of course you could search incrementally for sequences of characters allowed in function names (i.e., starting with a letter or underscore, followed by letters, underscores, digits, etc.) followed by a left parenthesis, or something along those lines.
Keep in mind, however, that any such approach is prone to errors: what if a function is referenced in a comment? What if it appears inside a string constant? Really, to catch all the special cases you would have to (almost) properly parse the full C file.
Most modern regular expression engines have features to parse more than regular languages e.g. by means of back-references to subexpressions. But you shouldn't go down that road. With a proper parser such as ANTLR that can parse context-free languages you'll make your own life a lot easier.
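If a best-effort guess is good enough, the "identifier followed by a left parenthesis" idea could be sketched in Ruby like this. The names CALL_PATTERN, C_KEYWORDS and candidate_function_names are invented for the example, the keyword list is deliberately incomplete, and none of the caveats above (comments, string constants, macros) are handled.

```ruby
# An identifier (letter or underscore, then letters, digits, underscores)
# followed by optional whitespace and a left parenthesis.
CALL_PATTERN = /\b([A-Za-z_][A-Za-z0-9_]*)\s*\(/

# Keywords that look like calls but are not; an incomplete, assumed list.
C_KEYWORDS = %w[if for while switch return sizeof].freeze

# Hypothetical helper: returns candidate function names in order of
# first appearance, without duplicates.
def candidate_function_names(source)
  source.scan(CALL_PATTERN).flatten.reject { |name| C_KEYWORDS.include?(name) }.uniq
end

code = <<~C
  if (ready) {
    myfunc(1);
    myfunc(anotherfunc(1, 2));
  }
C

names = candidate_function_names(code)
# names => ["myfunc", "anotherfunc"]
```

Because the scan restarts right after each left parenthesis, nested calls like anotherfunc are picked up without the regex having to understand the nesting itself. It also picks up function definitions and declarations, since those look the same to this pattern.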

Using parentheses in strings of text makes XPath fail

My question is this: is it possible to write an XPath expression in which parentheses are interpreted as part of a string?
My Selenium script keeps failing as soon as I use parentheses in a contains function.
For example:
//li/div/span[contains(text(),"Komkommer (BONUS)")]
I have the same problem. It has to do with encoding. Your parenthesis string may have different encoding than the comparison base.
