I have a very basic question about parsing a fragment that contains a comment.
First we import my favorite language, Pico:
import lang::pico::\syntax::Main;
Then we execute the following:
parse(#Id,"a");
gives, as expected:
Id: (Id) `a`
However,
parse(#Id,"a\n%% some comment\n");
gives a parse error.
What am I doing wrong here?
There are multiple problems.
Id is a lexical, which means layout (including comments) is never inserted inside it.
Layout is only inserted between the elements of a production, and the Id lexical consists of a single character class, so there is no place to insert layout.
Even if Id were a syntax nonterminal with multiple elements, comments would be parsed between those elements, not before or after them.
For more on the difference between syntax, lexical, and layout see: Rascal Syntax Definitions.
If you want to parse comments around a nonterminal, there is the start modifier for the nonterminal. Normally, layout is only inserted between the elements of a production; with start it is also inserted before and after it.
For example, take this grammar:
layout L = [\t\ ]* !>> [\t\ ];
lexical AB = "A" "B"+;
syntax CD = "C" "D"+;
start syntax EF = "E" "F"+;
This will be transformed into this grammar:
AB = "A" "B"+;
CD' = "C" L "D"+;
EF' = L "E" L "F"+ L;
"B"+ = "B"+ "B" | "B";
"D"+ = "D"+ L "D" | "D";
"F"+ = "F"+ L "F" | "F";
So, in particular, if you want to parse a string with layout around it, you could write this:
lexical Id = [a-z]+;
start syntax P = Id i;
layout L = [\ \n\t]*;
parse(#start[P], "\naap\n").top // parses and returns the P node
parse(#start[P], "\naap\n").top.i // parses and returns the Id node
parse(#P, "\naap"); // parse error at 0 because the start wrapper is not around P
Here's the Deluge script that should capitalize the first letter of the name and lowercase the remaining letters, but it isn't working:
a = zoho.crm.getRecordById("Contacts",input.ID);
d = a.get("First_Name");
firstChar = d.subString(0,1);
otherChars = d.removeFirstOccurence(firstChar);
Name = firstChar.toUppercase() + otherChars.toLowerCase();
mp = map();
mp.put("First_Name",d);
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
info Name;
info b;
I tried capitalizing the first letter and making the other letters lowercase, but it isn't working as expected.
Try using concat:
Name = firstChar.toUppercase().concat( otherChars.toLowerCase() );
Try removing the double quotes from the Name value in the following statement. The reason is that Name is a variable holding the case-adjusted name, but "Name" is the literal string "Name".
From:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
To:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":Name});
Is there an efficient way to find duplicate substrings? Here, duplicate means that two identical substrings appear directly next to each other without overlapping. For example, the source string is:
ABCDDEFGHFGH
'D' and 'FGH' are duplicated. 'F' appears twice in the sequence, but the two occurrences are not adjacent, so it does not count as a duplicate. So the algorithm should return ['D', 'FGH']. Is there an elegant algorithm for this, rather than the brute-force method?
This is related to the longest repeated substring problem, which can be solved by building a suffix tree, in Θ(n) time and space.
Not very efficient (a suffix tree or suffix array is better for very large strings), but here is a very short regular-expression solution (C#):
using System.Linq;
using System.Text.RegularExpressions;

string source = @"ABCDDEFGHFGH";
string[] result = Regex
    .Matches(source, @"(.+)\1")
    .OfType<Match>()
    .Select(match => match.Groups[1].Value)
    .ToArray();
Explanation
(.+) - group of any (at least 1) characters
\1 - the same group (group #1) repeated
Test
Console.Write(string.Join(", ", result));
Outcome
D, FGH
In case of ambiguity, e.g. "AAAA", where both "AA" and "A" are valid answers, the solution is greedy and thus returns "AA".
Without using a regex, which might turn out to be very slow, I think it's best to use two cursors running hand in hand. The algorithm should be clear from the JS code below.
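For comparison, the same backreference pattern can be tried in Ruby. This is only a sketch and not part of the C# answer above; the method name adjacent_repeats and its greedy keyword are made up for illustration. It also shows how a lazy quantifier changes the result in the ambiguous "AAAA" case.

```ruby
# Hypothetical helper: returns each substring that is immediately
# followed by a copy of itself.
def adjacent_repeats(source, greedy: true)
  # (.+) captures a run of characters, \1 requires the same run again.
  # The lazy form (.+?) prefers the shortest possible run instead.
  pattern = greedy ? /(.+)\1/ : /(.+?)\1/
  # scan yields one array of capture groups per match; flatten to strings
  source.scan(pattern).flatten
end

p adjacent_repeats("ABCDDEFGHFGH")         # ["D", "FGH"]
p adjacent_repeats("AAAA")                 # ["AA"]
p adjacent_repeats("AAAA", greedy: false)  # ["A", "A"]
```

Like the C# version, this relies on backtracking, so it is a convenience for short inputs rather than a replacement for a suffix tree or suffix array.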
function getNborDupes(s){
  var cl = 0,   // cursor left
      cr = 0,   // cursor right
      ts = "",  // test string
      res = []; // result array
  while (cl < s.length){
    cr = cl;
    while (++cr < s.length){
      ts = s.slice(cl,cr); // ts runs from cl to cr (char at cr excluded)
      // compare ts with the substring from cr to cr + ts.length (char at cr + ts.length excluded);
      // on a match, push it to the result, advance both cursors to cr + ts.length and continue
      ts === s.substr(cr,ts.length) && (res.push(ts), cl = cr += ts.length);
    }
    cl++;
  }
  return res;
}
var str = "ABCDDEFGHFGH";
console.log(getNborDupes(str));
Throughout the whole process, ts takes the following values:
A
AB
ABC
ABCD
ABCDD
ABCDDE
ABCDDEF
ABCDDEFG
ABCDDEFGH
ABCDDEFGHF
ABCDDEFGHFG
B
BC
BCD
BCDD
BCDDE
BCDDEF
BCDDEFG
BCDDEFGH
BCDDEFGHF
BCDDEFGHFG
C
CD
CDD
CDDE
CDDEF
CDDEFG
CDDEFGH
CDDEFGHF
CDDEFGHFG
D
E
EF
EFG
EFGH
EFGHF
EFGHFG
F
FG
FGH
The cl = cr += ts.length part decides whether the search restarts before or after the matching substring. With the code as it stands, the input "ABABABAB" returns ["AB", "AB"], but if you change it to cr = cl += ts.length, you should expect the result to be ["AB", "AB", "AB"].
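To make the difference between the two cursor updates easier to try out, here is a rough Ruby port of the JS function. The method name neighbor_dupes and the restart_inside keyword are hypothetical; restart_inside: false mirrors cl = cr += ts.length, and restart_inside: true mirrors cr = cl += ts.length.

```ruby
# Hypothetical Ruby port of the two-cursor scan.
def neighbor_dupes(s, restart_inside: false)
  res = []
  cl = 0                        # cursor left
  while cl < s.length
    cr = cl + 1                 # cursor right
    while cr < s.length
      ts = s[cl...cr]           # test string: cl up to, but excluding, cr
      if ts == s[cr, ts.length] # same text directly after it?
        res << ts
        if restart_inside
          cl += ts.length       # resume at the second copy (cr = cl += ts.length)
          cr = cl
        else
          cr += ts.length       # resume after the second copy (cl = cr += ts.length)
          cl = cr
        end
      end
      cr += 1
    end
    cl += 1
  end
  res
end

p neighbor_dupes("ABCDDEFGHFGH")                    # ["D", "FGH"]
p neighbor_dupes("ABABABAB")                        # ["AB", "AB"]
p neighbor_dupes("ABABABAB", restart_inside: true)  # ["AB", "AB", "AB"]
```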
I have a set of .txt files written in a custom language, which I want to modify systematically using a Ruby script. The syntax of that language is as follows:
(I will use [some text] as metavariables for expressions, like [atom 1] to indicate an arbitrary atom, and [atom 2] to indicate an arbitrary atom different from the former)
Atoms: alphanumerical strings, possibly surrounded by double quotes. Examples:
same_realm
"Ok"
Statements: either
[atom_1] = [atom_2]
or
[atom_1] = { [atom or statement 1] ... [atom or statement n] }
Comments: in any line of the text, any character after a # is ignored. Example:
[atom_1] = [atom_2] #This is a comment and will be ignored
If a statement is of the form [atom 1] = {[atom or statement 1] ... [atom or statement n]}, we call [atom 1] the head of the statement and [atom or statement 1] ... [atom or statement n] the body of the statement.
Before and after =, { and } there can be an arbitrary number (possibly 0) of space characters.
Between two consecutive atoms there must be at least one space character, but there can be more.
So, the two expressions any_realm_lord = {...} and any_province_lord = {...} in the example below are valid, the only syntactical difference between them being the use of any_realm_lord/any_province_lord as the head of each statement.
#Example file
#previous text
any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
} any_province_lord={any_character = { #some comment
limit = {#some other comment
same_realm = ROOT} set_character_flag =
my_flag
}
}#more text
Once that's explained, this is what I want to do with Ruby (I will use the example file above to illustrate it):
1) Open a file and locate the statements which are not within the body of other statements
(in the example, I'd want it to locate the any_realm_lord = {...} and the any_province_lord = {...} statements)
2) Iterate over the statements located in 1) and select the ones whose head matches a certain string. If the match is on a line that also contains other atoms or statements, separate them. From now on, I will refer to the statement whose head matches the string as "the target statement".
(Say the string to match is "any_province_lord". After this step the file will look like this:
#Example file
#previous text
any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
}
any_province_lord={any_character = { #some comment
limit = {#some other comment
same_realm = ROOT} set_character_flag =
my_flag
}
}#more text
)
3) Create a blank line above the line where the head of the target statement is, and cut and paste there any comments from the lines that the target statement spans
(
#Example file
#previous text
any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
}
#some comment#some other comment
any_province_lord={any_character = {
limit = {
same_realm = ROOT} set_character_flag =
my_flag
}
}#more text
)
4) If the closing bracket of the target statement is on the same line as other atoms or statements, add a \n after the closing bracket
(
#Example file
#previous text
any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
}
#some comment#some other comment
any_province_lord={any_character = {
limit = {
same_realm = ROOT} set_character_flag =
my_flag
}
}
#more text
)
5) Erase the body of the target statement (but not the brackets), and insert between the brackets new content that I have already defined, with nice spacing.
(
#Example file
#previous text
any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
}
#some comment#some other comment
any_province_lord={
#my predefined content will be here
}
#more text
)
What would be the best way to do this in terms of efficiency? I need my program to do this for over a thousand files (each averaging about 500 KB). I'm fairly new to Ruby, so I am still figuring out whether read, readlines, or readline is best for this kind of task in terms of efficiency.
What do you think?
I hope I've been clear in explaining what I need, and not unnecessarily verbose.
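As a starting point for step 1 only, here is a minimal sketch of one possible approach: strip the comments, split the text into tokens, and track the brace depth to find the heads of statements that are not nested in another statement. The helper name top_level_heads is made up, and the sketch assumes that atoms never contain #, =, { or }, and that the whole file is read into one string (for example with File.read) rather than line by line.

```ruby
# Hypothetical helper: lists the heads of "head = { ... }" statements
# that are not inside the body of another statement.
def top_level_heads(text)
  code = text.gsub(/#.*$/, "")                    # drop everything after '#' on each line
  tokens = code.scan(/"[^"]*"|[{}=]|[^\s{}="]+/)  # quoted atoms, punctuation, bare atoms
  heads = []
  depth = 0                                       # current brace nesting level
  tokens.each_with_index do |tok, i|
    if tok == "{"
      depth += 1
    elsif tok == "}"
      depth -= 1
    elsif tok != "=" && depth.zero? && tokens[i + 1] == "=" && tokens[i + 2] == "{"
      heads << tok                                # atom at depth 0 followed by "= {"
    end
  end
  heads
end

example = <<~'TEXT'
  #Example file
  #previous text
  any_realm_lord={any_character={limit={same_realm=ROOT}set_character_flag=my_flag}
  } any_province_lord={any_character = { #some comment
  limit = {#some other comment
  same_realm = ROOT} set_character_flag =
  my_flag
  }
  }#more text
TEXT

p top_level_heads(example)  # ["any_realm_lord", "any_province_lord"]
```

Steps 2 to 5 need the positions of the tokens as well, so a fuller version would have to keep offsets (for instance with StringScanner) instead of discarding them as this sketch does.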
I have to extract the price from strings like these:
s1 = "somefing $ 100"
s2 = "$ 19081 words $"
s3 = "30$"
s4 = "hi $90"
s5 = "wow 150"
Output should be:
s1 = "100"
s2 = "19081"
s3 = "30"
s4 = "90"
s5 = nil
I use the following regex:
price = str[/\$\s*(\d+)|(\d+)\s*\$/, 1]
But it doesn't work for all types of strings.
Your code always returns the result of the first capture group, whereas in the failing case it is the second capture group that you are interested in. I don't think the [] method has a good way of dealing with this (when using numbered capture groups). You could write this like so:
price = str =~ /\$\s*(\d+)|(\d+)\s*\$/ && ($1 || $2)
This isn't very legible, though. If you use a named capture group instead, you can write:
price = str[/\$\s*(?<amount>\d+)|(?<amount>\d+)\s*\$/, 'amount']
Duplicate named capture groups won't always do what you want but when they are in separate alternation branches (as they are here) then it should work.
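As a quick check, the named-group version can be run against the five sample strings from the question. This is a small sketch; the constant PRICE_PATTERN and the method extract_price are names made up for the example.

```ruby
# The same pattern as above: "amount" is declared once per alternation branch.
PRICE_PATTERN = /\$\s*(?<amount>\d+)|(?<amount>\d+)\s*\$/

# Hypothetical helper: returns the digits next to a "$", or nil if there are none.
def extract_price(str)
  # str[regexp, name] returns the text captured by the named group that matched
  str[PRICE_PATTERN, "amount"]
end

samples = ["somefing $ 100", "$ 19081 words $", "30$", "hi $90", "wow 150"]
samples.each { |s| p extract_price(s) }
# prints "100", "19081", "30", "90" and nil
```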
The problem is that you always take the value of the first regex group and never check the second. So you're not handling the case after the |, the one where the digits come before the $ sign.
If you look at a graphical representation of your regex, passing 1 as the second argument in the square brackets covers only the upper path (the first alternative); the lower path (the second alternative) is never checked.
Basically, try:
price = str[/\$\s*(\d+)|(\d+)\s*\$/, 1] || str[/\$\s*(\d+)|(\d+)\s*\$/, 2]
(Use || rather than or here: or binds more loosely than =, so with or the second lookup would never be assigned to price.)
P.S. I'm not that experienced in Ruby, so there might be a better way to write this, but this should do the trick.
Try this. It's much simpler, but it may not be the most efficient. Note that it still reads only capture group 1, so it returns nil for a string like "30$".
p1 = s1.gsub(' ','')[/\$(\d+)|(\d+)\$/,1]
I'm working on two cases.
Assume I have these variables:
a = "hello"
b = "hello-SP"
c = "not_hello"
Any partial matches
I want to accept any string that has the variable a inside, so b and c would match.
Patterned match
I want to match a string that has a inside, followed by '-', so b would match, c does not.
I am having a problem because I have always used the /expression/ syntax to define a Regexp. How can I define a Regexp dynamically in Ruby?
You can use the same syntax and interpolate variables into the regex, so:
reg1 = /#{a}/
would match on anything that contains the value of the a variable (at the time the expression is created!) and
reg2 = /#{a}-/
would do the same, plus a hyphen, so hello- in your example.
Edit: As Wayne Conrad points out, if a contains "any characters that would have special meaning in a regular expression," you need to escape them. Example:
a = ".com"
b = Regexp.new(Regexp.escape(a))
"blah.com" =~ b
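Both cases from the question can be put together in a short sketch. The method names contains? and followed_by_hyphen? are hypothetical; the point is only that the variable is escaped before it is interpolated.

```ruby
# Hypothetical helper: true if candidate contains needle anywhere.
def contains?(candidate, needle)
  candidate.match?(/#{Regexp.escape(needle)}/)
end

# Hypothetical helper: true if candidate contains needle followed by "-".
def followed_by_hyphen?(candidate, needle)
  candidate.match?(/#{Regexp.escape(needle)}-/)
end

a = "hello"
p contains?("hello-SP", a)             # true
p contains?("not_hello", a)            # true
p followed_by_hyphen?("hello-SP", a)   # true
p followed_by_hyphen?("not_hello", a)  # false

# Why escaping matters: an unescaped "." matches any character.
dot_com = ".com"
p "blahxcom".match?(/#{dot_com}/)                 # true
p "blahxcom".match?(/#{Regexp.escape(dot_com)}/)  # false
```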
Late to comment, but I wasn't able to find what I was looking for, and the answers above didn't help me. I hope this helps someone new to Ruby who just wants a quick fix.
Ruby Code:
# put the string you want to search into a variable
st = "BJ's Restaurant & Brewery"
m = (/BJ\'s/i).match(st) # general form: /your regular expression/.match(string)
# m holds the match: #<MatchData "BJ's">
m.to_s
# this will display the match
=> "BJ's"