My current lex file looks like this:
%{
#include "foo.h"
void rem_as(char* string);
%}
DIGIT [0-9]
LITTERAL [a-zA-Z]
SEP [_-]|["."]|["\\"][ ]
FILE_NAME ({DIGIT}|{LITTERAL}|{SEP})*
PATH ({FILE_NAME}"/"{FILE_NAME})*|({FILE_NAME})
%%
"move" {return MOVE;}
"mv" {return MOVE;}
">" {return R_STDOUT;}
"2>" {return R_STDERR;}
"<" {return R_STDIN;}
"|" {return PIPE;}
"&" {return AND;}
"=" {return EQUAL_SIGN;}
"-"?{DIGIT}+ {yylval.integer = atoi(yytext); return NUM;}
{PATH} {rem_as(yytext); sscanf(yytext,"%[^\n]",yylval.string); return FILENAME;}
\n {return LINEBREAK;}
. ;
%%
That works quite well.
For example, thanks to this grammar
Move: MOVE FILENAME FILENAME { move($2, $3); }
;
I can do stuff like move a b.
Now my problem:
After adding this to my lex file
VAR_NAME [a-zA-Z][a-zA-Z0-9_-]*
...
{VAR_NAME} {return VAR_NAME;} // declared before the "=" rule
My previous rules break, especially FILENAME, which now must necessarily contain a '/'.
For example, with this grammar:
VarDecl: VAR_NAME EQUAL_SIGN FILENAME { puts("foo"); }
;
a=b/ works while a=b throws a syntax error.
Any idea about the cause of the problem?
Thanks.
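Lex always picks the rule that matches the longest piece of input, and when several rules match the same length, the one declared first wins. b matches both VAR_NAME and PATH with the same length, and VAR_NAME is declared first, so a VAR_NAME token is emitted. b/ can only be matched in full by PATH, which is why a=b/ still works. For a=b the parser therefore sees VAR_NAME EQUAL_SIGN VAR_NAME, which matches no rule.
The easy solution is to make PATH a rule in your grammar instead of a pattern in your lexer:
PATH: VAR_NAME | FILE_NAME | VAR_NAME SLASH PATH | FILE_NAME SLASH PATH
and to add just "/" as a token (SLASH) in your lex file.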
How can I get the values inside depends in a bash script?
manifest.py
# Commented lines
{
'category': 'Sales/Subscription',
'depends': [
'sale_subscription',
'sale_timesheet',
],
'auto_install': True,
}
Expected response:
sale_subscription sale_timesheet
The main problem is the line breaks: I have already tried | grep depends, but I cannot get the sale_timesheet value that way.
I'm trying to add these values, coming from files, to a variable, like:
DOWNLOADED_DEPS=($(ls -A $DOWNLOADED_APPS | while read -r file; do cat $DOWNLOADED_APPS/$file/__manifest__.py | [get depends value])
Example updated.
If this is your JSON file:
{
"category": "Sales/Subscription",
"depends": [
"sale_subscription",
"sale_timesheet"
],
"auto_install": true
}
You can get the desired result using jq like this:
jq -r '.depends | join(" ")' YOURFILE.json
This uses .depends to extract the value from the depends field, pipes it to join(" ") to join the array with a single space in between, and uses -r for raw (unquoted) output.
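If it is not a JSON file but plain text, you can use the regex below to find the values. If it is a JSON file, you can use other methods, like the one Thomas suggested.
^'depends':\s*(?:\[\s*)(.*?)(?:\])$
Plain egrep cannot apply this as is: it matches line by line, and (?:...) and .*? are PCRE syntax. With a GNU grep built with PCRE support, a close equivalent (anchors dropped, since the key may be indented) is:
% grep -Pzo "(?s)'depends':\s*\[.*?\]" pathTo/jsonFile.txt
Here -P selects PCRE, -z makes grep treat the whole file as one record so the match can span lines, -o prints only the match, and (?s) lets . match newlines. The output is the whole bracketed list, which still needs its quotes and commas stripped.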
As @Thomas has pointed out in a comment, the OP's input data is not in JSON format:
$ cat manifest.py
# Commented lines // comments not allowed in JSON
{
'category': 'Sales/Subscription', // single quotes should be replaced by double quotes
'depends': [
'sale_subscription',
'sale_timesheet', // trailing comma at end of section not allowed
],
'auto_install': True, // trailing comma issue; should be lower case "true"
}
And while the title of the question mentions regex, there is no sign of a regex in the question. I'll leave a regex based solution for someone else to come up with and instead ...
One (quite verbose) awk solution based on the input looking exactly like what's in the question:
$ awk -F"'" ' # use single quote as field separator
/depends/ { printme=1 ; next } # if we see the string "depends" then set printme=1
printme && /]/ { printme=0 ; next} # if printme=1 and line contains a right bracket then set printme=0
printme { printf pfx $2; pfx=" " } # if printme=1 then print a prefix + field #2;
# first time around pfx is undefined;
# subsequent passes will find pfx set to a space;
# since using "printf" with no "\n" in sight, all output will stay on a single line
END { print "" } # add a linefeed on the end of our output
' manifest.py
This generates:
sale_subscription sale_timesheet
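To collect the values into a variable, as the DOWNLOADED_DEPS line in the question tries to do, the awk program above can be wrapped in a shell function. This is only a sketch: it assumes every manifest has exactly the layout shown in the question (one quoted entry per line, "depends" appearing nowhere else), and manifest_depends is a made-up helper name.

```shell
# manifest_depends FILE: print the entries of the depends list on one line.
# Hypothetical helper; assumes one quoted entry per line, as in the question.
manifest_depends() {
    awk -F"'" '
        /depends/       { inlist = 1; next }   # the entries start on the next line
        inlist && /\]/  { inlist = 0; next }   # the closing bracket ends the list
        inlist          { printf "%s%s", sep, $2; sep = " " }
        END             { print "" }
    ' "$1"
}

# Usage sketch (DOWNLOADED_APPS is the directory variable from the question):
# DOWNLOADED_DEPS=$(for f in "$DOWNLOADED_APPS"/*/__manifest__.py; do manifest_depends "$f"; done)
```

With several manifests, each one contributes one line of space-separated names to DOWNLOADED_DEPS.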
I'm trying to match a hyphen and a hash sign in a while loop in awk. My current setup is:
awk 'BEGIN { while ($1==# && $2==-) { #do stuff} }'
This obviously results in a syntax error on the hash sign. I've tried escaping it in all kinds of ways, but this either results in a syntax error, or a "backslash not last character on line" error.
So: How can I match the hash sign and hyphen in an awk expression?
A while loop in BEGIN is probably not what you want: BEGIN runs before any input has been read, so $1 and $2 are empty there unless #do stuff reads a line itself with getline (next is not allowed in a BEGIN block). To answer your specific question, I'll assume you want to check each line of input. I am using echo -e 'foo bar skip\n# - printme' to provide two lines of input, foo bar skip and # - printme, and print $3 in place of #do stuff.
echo -e 'foo bar skip\n# - printme' | awk '($1=="#" && $2=="-") { print $3 }'
#                                               ^ ^        ^ ^ double quotes
prints printme, as it should. You can also do that with regular expressions:
echo -e 'foo bar skip\n# - printme' | awk '($1~/^#$/ && $2~/^-$/) { print $3} '
#                                             ^^   ^      ^^   ^ regex match
The ~ is the regex match operator, and the // delimit the regex. Edit: the ^ and $ anchor the regex so that it has to match the whole field and doesn't succeed if, e.g., $2 merely contains a hyphen.
Tested on gawk 4.1.3 on cygwin.
I have a shell script with one parameter.
./script.sh 121-0/2/3
I want to print only the part after the "-":
Output :
0/2/3
How can I do this in shell?
Look up the ${variable#pattern} parameter expansion:
If the pattern matches the beginning of the variable's value, the shortest matching part is deleted and the rest is returned.
In your case:
var=$1 # command-line argument
res=${var # *-} # wrong: no spaces allowed inside the braces
res=${var#*-} # gives 0/2/3
It is described under "Shell Parameter Expansion" in the bash manual.
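A small sketch of the difference between the shortest-match (#) and longest-match (##) forms. The variable names are chosen for illustration, and arg stands in for "$1" in script.sh.

```shell
arg="121-0/2/3"            # stands in for "$1" in script.sh

after_first=${arg#*-}      # shortest prefix matching *- removed
after_last=${arg##*-}      # longest prefix matching *- removed

multi="a-b-c"              # a value with more than one "-"

echo "$after_first"        # prints 0/2/3
echo "$after_last"         # prints 0/2/3 (there is only one "-")
echo "${multi#*-}"         # prints b-c
echo "${multi##*-}"        # prints c
```

Both forms are POSIX shell, so they work in sh as well as in bash.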
I want to perform a hierarchical set of (non-recursive) substitutions in a text file.
I want to define the rules in an ASCII file "table.txt", which contains one whitespace-separated pair of strings per line:
aaa 3
aa 2
a 1
I have tried to solve it with an awk script "substitute.awk":
BEGIN { while (getline < file) { subs[$1]=$2; } }
{ line=$0; for(i in subs)
{ gsub(i,subs[i],line); }
print line;
}
When I call the script giving it the string "aaa":
echo aaa | awk -v file="table.txt" -f substitute.awk
I get
21
instead of the desired "3". Permuting the lines in "table.txt" doesn't help. Can anyone explain what the problem is here, and how to get around it? (This is a simplified version of my actual task, where I have a large file containing ASCII-encoded phonetic symbols which I want to convert into LaTeX code. The ASCII encoding of the symbols contains {$,&,-,%,[a-z],[0-9],...}.)
Any comments and suggestions are welcome!
PS:
Of course in this application for a substitution table.txt:
aa ab
a 1
the original string "aa" should be converted into "ab" and not into "1b". That means a string produced by applying one rule must not be touched by later rules.
How to account for that?
The order of the loop for (i in subs) is undefined by default.
In recent versions of GNU awk (gawk) you can use PROCINFO["sorted_in"] to control the traversal order. See section 12.2.1 Controlling Array Traversal and (the linked) section 8.1.6 Using Predefined Array Scanning Orders of the gawk manual for details.
Alternatively, if you can't or don't want to do that you could store the replacements in numerically indexed entries in subs and walk the array in order manually.
To do that you will need to store both the pattern and the replacement in the value of the array and that will require some care to combine. You can consider using SUBSEP or any other character that cannot be in the pattern or replacement and then split the value to get the pattern and replacement in the loop.
Also note the caveats with getline listed on http://awk.info/?tip/getline, and consider not calling getline yourself but using NR==FNR{...; next} and listing table.txt as the first file argument to awk.
Edit: Actually, for the manual loop version you could also just keep two arrays one mapping input file line number to the patterns to match and another mapping patterns to replacements. Then looping over the line number array will get you the pattern and the pattern can be used in the second array to get the replacement (for gsub).
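The numerically indexed variant with SUBSEP can be sketched as follows. Assumptions: the table is passed as the first file argument, the rules are listed in the order they should be applied (longest target first), and the first column is still used as a regular expression by gsub, as in the question. apply_rules, rule and nrules are names chosen for this sketch.

```shell
# apply_rules TABLE: apply the substitutions from TABLE, in file order, to stdin.
apply_rules() {
    awk '
        NR == FNR {                            # first file: the table
            if (NF == 2) rule[++nrules] = $1 SUBSEP $2
            next
        }
        {                                      # remaining input: the text
            for (i = 1; i <= nrules; i++) {
                split(rule[i], part, SUBSEP)   # part[1] = pattern, part[2] = replacement
                gsub(part[1], part[2])
            }
            print
        }
    ' "$1" -
}
```

With the table from the question, echo aaa | apply_rules table.txt prints 3. Because gsub is used, regex metacharacters in the first column and & in the second column keep their special meaning.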
Instead of storing the replacements in an associative array, put them in two arrays indexed by integer (one array for the strings to replace, one for the replacements) and iterate over the arrays in order:
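BEGIN { n = 0
  # "> 0" ends the loop at end of file and also if the table cannot be read
  while ((getline < file) > 0) { subs[n] = $1; repl[n++] = $2 }
}
{ # the rules are applied in table order, so list longer targets first
  for (i = 0; i < n; i++) { gsub(subs[i], repl[i]) }
  print
}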
It seems like perl's zero-width word boundary is what you want. It's a pretty straightforward conversion from the awk:
#!/usr/bin/env perl
use strict;
use warnings;
my %subs;
BEGIN{
open my $f, '<', 'table.txt' or die "table.txt:$!";
while(<$f>) {
my ($k,$v) = split;
$subs{$k}=$v;
}
}
while(<>) {
while(my($k, $v) = each %subs) {
s/\b$k\b/$v/g;
}
print;
}
Here's an answer pulled from another StackExchange site, from a fairly similar question: Replace multiple strings in a single pass.
It's slightly different in that it does the replacements in inverse order by length of target string (i.e. longest target first), but that is the only sensible order for targets which are literal strings, as appears to be the case in this question as well.
If you have tcc installed, you can use the following shell function, which processes the file of substitutions into a lex-generated scanner and then compiles and runs it using tcc's compile-and-run option.
# Call this as: substitute replacements.txt < text_to_be_substituted.txt
# Requires GNU sed because I was too lazy to write a BRE
substitute () {
tcc -run <(
{
printf %s\\n "%option 8bit noyywrap nounput" "%%"
sed -r 's/((\\\\)*)(\\?)$/\1\3\3/;
s/((\\\\)*)\\?"/\1\\"/g;
s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
"$1"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex -t)
}
With gcc or clang, you can use something similar to compile a substitution program from the replacement list, and then execute that program on the given text. Posix-standard c99 does not allow input from stdin, but gcc and clang are happy to do so provided you tell them explicitly that it is a C program (-x c). In order to avoid excess compilations, we use make (which needs to be gmake, GNU make).
The following requires that the list of replacements be in a file with a .txt extension; the cached compiled executable will have the same name with a .exe extension. If the makefile were in the current directory with the name Makefile, you could invoke it as make repl (where repl is the name of the replacement file without the .txt extension), but since that's unlikely to be the case, we'll use a shell function to actually invoke make.
Note that in the following file, the whitespace at the beginning of each line starts with a tab character:
substitute.mak
.SECONDARY:
%: %.exe
	@$(<D)/$(<F)
%.exe: %.txt
	@{ printf %s\\n "%option 8bit noyywrap nounput" "%%"; \
	sed -r \
	's/((\\\\)*)(\\?)$$/\1\3\3/; #\
	s/((\\\\)*)\\?"/\1\\"/g; #\
	s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
	"$<"; \
	printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"; \
	} | lex -t | c99 -D_POSIX_C_SOURCE=200809L -O2 -x c -o "$@" -
Shell function to invoke the above:
substitute() {
gmake -f/path/to/substitute.mak "${1%.txt}"
}
You can invoke the above command with:
substitute file
where file is the name of the replacements file. (The filename must end with .txt but you don't have to type the file extension.)
The format of the input file is a series of lines consisting of a target string and a replacement string. The two strings are separated by whitespace. You can use any valid C escape sequence in the strings; you can also \-escape a space character to include it in the target. If you want to include a literal \, you'll need to double it.
If you don't want C escape sequences and would prefer to have backslashes not be metacharacters, you can replace the sed program with a much simpler one:
sed -r 's/([\\"])/\\\1/g' "$<"; \
(The ; \ is necessary because of the way make works.)
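For comparison, the same single-pass, longest-target-first behaviour can be sketched in plain awk without generating a scanner. This is an illustration, not the code from the linked answer: substitute_once is a made-up name, targets are treated as literal strings with no escape sequences, and neither targets nor replacements may contain whitespace.

```shell
# substitute_once TABLE: single-pass literal replacement of stdin.
substitute_once() {
    awk '
        NR == FNR {                          # first file: target/replacement pairs
            if (NF == 2) {
                repl[$1] = $2
                if (length($1) > maxlen) maxlen = length($1)
            }
            next
        }
        {
            out = ""
            pos = 1
            n = length($0)
            while (pos <= n) {
                matched = 0
                # try the longest candidate at this position first
                for (len = maxlen; len >= 1; len--) {
                    cand = substr($0, pos, len)
                    if (length(cand) == len && (cand in repl)) {
                        out = out repl[cand]
                        pos += len           # skip the consumed input
                        matched = 1
                        break
                    }
                }
                if (!matched) { out = out substr($0, pos, 1); pos++ }
            }
            print out
        }
    ' "$1" -
}
```

Because replaced text goes straight to the output and is never rescanned, a table containing "aa ab" and "a 1" turns aa into ab rather than 1b.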
a) Don't use getline unless you have a very specific need and fully understand all the caveats, see http://awk.info/?tip/getline
b) Don't use regexps when you want strings (yes, this means you cannot use sed).
c) The while loop needs to constantly move beyond the part of the line you've already changed or you could end up in an infinite loop.
You need something like this:
$ cat substitute.awk
NR==FNR {
if (NF==2) {
strings[++numStrings] = $1
old2new[$1] = $2
}
next
}
{
for (stringNr=1; stringNr<=numStrings; stringNr++) {
old = strings[stringNr]
new = old2new[old]
slength = length(old)
tail = $0
$0 = ""
while ( sstart = index(tail,old) ) {
$0 = $0 substr(tail,1,sstart-1) new
tail = substr(tail,sstart+slength)
}
$0 = $0 tail
}
print
}
$ echo aaa | awk -f substitute.awk table.txt -
3
$ echo aaaa | awk -f substitute.awk table.txt -
31
and adding some RE metacharacters to table.txt to show they are treated just like every other character and showing how to run it when the target text is stored in a file instead of being piped:
$ cat table.txt
aaa 3
aa 2
a 1
. 7
\ 4
* 9
$ cat foo
a.a\aa*a
$ awk -f substitute.awk table.txt foo
1714291
Your new requirement requires a solution like this:
$ cat substitute.awk
NR==FNR {
if (NF==2) {
strings[++numStrings] = $1
old2new[$1] = $2
}
next
}
{
delete news
for (stringNr=1; stringNr<=numStrings; stringNr++) {
old = strings[stringNr]
new = old2new[old]
slength = length(old)
tail = $0
$0 = ""
charPos = 0
while ( sstart = index(tail,old) ) {
charPos += sstart
news[charPos] = new
$0 = $0 substr(tail,1,sstart-1) RS
tail = substr(tail,sstart+slength)
}
$0 = $0 tail
}
numChars = split($0, olds, "")
$0 = ""
for (charPos=1; charPos <= numChars; charPos++) {
$0 = $0 (charPos in news ? news[charPos] : olds[charPos])
}
print
}
$ cat table.txt
1 a
2 b
$ echo "121212" | awk -f substitute.awk table.txt -
ababab
I have a string like this:
a1="a,b,c,(d,e),(f,g)";
How to get the array like
arr=["a","b","c","d,e","f,g"];
I want to replace the commas between parentheses with some other character and restore them after converting the string into an array,
but I do not know how to replace only the commas between parentheses.
How can this be done?
With GNU sed (this handles single-character items inside the parentheses, as in the example):
sed 's/,/\",\"/g;s/(\(.\)\"/\1/g;s/\"\(.\))/\1/g;s/^\w\+=\"/arr=[\"/;s/;/];/'
Try the following bash script, which parses the string with a regular expression. It's awkward, but it seems to work:
#!/usr/bin/env bash
unset arr
a1="a,b,c,xxx(d,e),sdf(f,g)"
## The regular expression does an alternation between
## a pair of parens followed by an optional comma "\([^\)]+\)(,?)"
## or any characters followed by a comma or end of line "[^,]+(,|$)"
## After that I save all the rest of the string to match it in
## following iterations.
while [[ $a1 =~ ([^\(,]*\([^\)]+\)(,?)|[^,]+(,|$))(.*) ]]; do
## BASH_REMATCH keeps grouped expressions. The first one
## has the data extracted between commas. This removes the
## trailing one.
elem="${BASH_REMATCH[1]%,}"
## Remove opening paren, if exists one.
elem="${elem/\(/}"
## Remove trailing paren, if exists one.
elem="${elem%)}"
## Add element to an array.
arr+=("$elem")
## Use the string left (fourth grouped expression in
## the regex) to continue matching elements.
a1="${BASH_REMATCH[4]}"
done
printf "%s\n" "${arr[@]}"
Running it like:
bash script.sh
It yields:
a
b
c
xxxd,e
sdff,g
Write a parser! :D
I have no idea how to do this in bash, but I can show you how to do it in PHP (should be transferable to other languages).
$str = "a,b,c,(d,e),(f,g)";
$out = array();
$current_token = "";
$open_brackets = 0;
$length = strlen($str);
for ($i = 0; $i < $length; $i += 1) {
$chr = $str[$i];
if ($chr === "(") {
$open_brackets += 1;
} else if ($chr === ")") {
$open_brackets -= 1;
} else if ($open_brackets === 0 && $chr === ",") {
$out[] = $current_token; // push token value to out
$current_token = "";
} else {
$current_token .= $chr;
}
}
if (strlen($current_token) > 0) {
$out[] = $current_token; // dont forget the last one
}
var_dump($out); // ["a","b","c","d,e","f,g"]
Untested, but this is the outline: keep track of the number of open brackets, and treat , as a delimiter only when all brackets are closed.
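The same bracket-counting outline can be sketched for the shell by letting awk walk the string character by character. split_top_level is a made-up name; the sketch assumes balanced parentheses and prints one element per line.

```shell
# split_top_level STRING: print one element per line, splitting on commas
# that are not inside parentheses; the parentheses themselves are dropped.
split_top_level() {
    printf '%s\n' "$1" | awk '
        {
            depth = 0; token = ""
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (c == "(")                    depth++
                else if (c == ")")               depth--
                else if (c == "," && depth == 0) { print token; token = "" }
                else                             token = token c
            }
            if (token != "") print token     # the last element has no trailing comma
        }
    '
}

split_top_level "a,b,c,(d,e),(f,g)"
```

In bash 4 or later, the lines can be read into an array with mapfile -t arr < <(split_top_level "$a1").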