Bash - replace string inside all files in directory - bash

I have 31 .ctl files in a directory. They look like this:
load data CHARACTERSET AL32UTF8
infile '../dane/kontakty_Biura_wyborcze.csv' "str '\n'"
append
into table ODI_PUW_OSOBY2
fields terminated by ';'
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( LP CHAR(4000),
WOJEWODZTWO CHAR(4000),
POWIAT CHAR(4000),
GMINA CHAR(4000),
NAZWA_INSTYTUCJI CHAR(4000),
KOD CHAR(4000),
MIEJSCOWOSC CHAR(4000),
ADRES CHAR(4000),
NAZWISKO_I_IMIE CHAR(4000),
FUNKCJA CHAR(4000),
TEL_SLUZB_STACJON_1 CHAR(4000),
TEL_SLUZB_STACJON_2 CHAR(4000),
TEL_SLUZB_STACJON_3 CHAR(4000),
TEL_SLUZB_KOM_1 CHAR(4000),
TEL_SLUZB_KOM_2 CHAR(4000),
FAX_SLUZB_1 CHAR(4000),
FAX_SLUZB_2 CHAR(4000),
EMAIL_SLUZB_1 CHAR(4000),
EMAIL_SLUZB_2 CHAR(4000),
WWW CHAR(4000),
TYP CONSTANT "Biura wyborcze.",
ODI_SESJA_ID CONSTANT "20130717144702"
ODI_STATUS CONSTANT "0",
IMIE EXPRESSION "pg_odi_utils.zwroc_imiona(pg_odi_utils.usun_przyrostki(:NAZWISKO_I_IMIE),0)",
NAZWISKO EXPRESSION "pg_odi_utils.zwroc_nazwisko(pg_odi_utils.usun_przyrostki(:NAZWISKO_I_IMIE),0)"
)
There are 31 files like this. I need to replace the value in this line:
ODI_SESJA_ID CONSTANT '20130717144702'
with a new timestamp, the same for all files. The value currently in each file is not known in advance (in this case it happens to be '20130717144702').
So, for each file found in the directory, I need to:
find the line starting with ODI_SESJA_ID
replace the value after 'ODI_SESJA_ID CONSTANT ' with the new one
leave the remaining lines in the file untouched
What is the best way to do this using bash? Should I use sed or a similar tool? How?

Something like:
sed 's/\(^[ \t]\+ODI_SESJA_ID\ CONSTANT\).*/\1 \"newtimestamp\"/' tmp
should work.
Group the string that will be retained, adding the placeholder (\1) in the replacement string. Replace newtimestamp with whatever value you prefer, of course.

I would do this using sed like so:
sed -i "/^[ \t]*ODI_SESJA_ID CONSTANT/s/'[^']\+'/'REPLACEMENT'/" *.ctl
The -i flag to sed means it modifies the files in place, so I usually try it on a single file first with the -e flag instead of the -i flag and confirm that sed's output is what I was looking for.
Explanation:
The double-quotes protect my regex from the shell.
/^[ \t]*ODI_SESJA_ID CONSTANT/ matches only the lines that start with optional whitespace followed by 'ODI_SESJA_ID CONSTANT'.
s/'[^']\+'/'REPLACEMENT'/ substitutes 'REPLACEMENT' (quoted) for the first quoted portion of the text on matching lines.
The document at http://www.catonmat.net/blog/wp-content/uploads/2008/09/sed1line.txt (top Google hit for 'sed one liners') is pretty helpful for quickly dispatching this sort of task.
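Before running any of these commands with -i on all 31 files, it can help to see exactly which lines would change. A minimal sketch, assuming a POSIX shell with sed and diff available; the function name preview_change and the demo file are made up for illustration:

```shell
# Hypothetical helper: show what a sed script would change, without editing.
# $1 = sed script, remaining arguments = files to check
preview_change() {
    script=$1
    shift
    for f in "$@"; do
        printf '== %s\n' "$f"
        # diff exits with status 1 when the inputs differ; that is expected here
        sed "$script" "$f" | diff "$f" - || true
    done
}

# Demo on a throwaway file (contents modelled on the .ctl file in the question)
demo=$(mktemp)
printf '%s\n' 'TYP CONSTANT "Biura wyborcze.",' \
              'ODI_SESJA_ID CONSTANT "20130717144702",' > "$demo"
preview_change 's/^\(ODI_SESJA_ID CONSTANT \)"[^"]*"/\1"NEWVALUE"/' "$demo"
rm -f "$demo"
```

Lines prefixed with < are the current contents and lines prefixed with > are what sed would write; the files themselves are not touched.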

I found a simpler solution that seems to work:
sed -i 's/.*ODI_SESJA_ID.*/ ODI_SESJA_ID CONSTANT "'$(date +%s)'",/' *.ctl
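It replaces every line that contains ODI_SESJA_ID with a new line. It is not very elegant, because it rewrites the entire line instead of only the value that needs to change. Note that date +%s prints seconds since the epoch; date +%Y%m%d%H%M%S would keep the format of the original value.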
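To change only the quoted value and leave the rest of the line (including the trailing comma) alone, something along these lines may work. It is a sketch, assuming the value is in double quotes as in the file shown in the question, and that the new timestamp contains no /, & or backslash. The name update_session_id is hypothetical. It writes through a temporary file rather than relying on sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: replace only the double-quoted value that follows
# ODI_SESJA_ID CONSTANT; every other line is copied through unchanged.
# $1 = new timestamp, remaining arguments = files to rewrite
update_session_id() {
    ts=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        sed "s/^\([[:space:]]*ODI_SESJA_ID[[:space:]]\{1,\}CONSTANT[[:space:]]\{1,\}\)\"[^\"]*\"/\1\"$ts\"/" "$f" > "$tmp" &&
            cat "$tmp" > "$f"    # overwrite the contents, keeping the file's permissions
        rm -f "$tmp"
    done
}

# Demo on a throwaway file
demo=$(mktemp)
printf '%s\n' 'ODI_SESJA_ID CONSTANT "20130717144702",' 'ODI_STATUS CONSTANT "0",' > "$demo"
update_session_id "$(date +%Y%m%d%H%M%S)" "$demo"
cat "$demo"
rm -f "$demo"
```

In the question's directory it would be called as update_session_id "$(date +%Y%m%d%H%M%S)" *.ctl, ideally on a copy of the files first.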

Related

Filter text file with very long lines and special symbols using bash

I have the text file with very long lines and special symbols inside. Here is an example:
{"keyword1":["A123","D356"],"keyword2":"ENXXXXXXXXXXXXXX","keyword3":[{"name1":["R3123","L2356"],"keyword4":"text here","keyword5":"4LJ"},{"app":,"keyword6":"XX-XX-XX-XXX-XXX-Axy - Important text here","keyword7":"FBG","{[ ** text here.........}
Text in keyword2 always starts with EN followed by 14 digits.
Text in keyword6 always starts in the alphanumeric format XX-XX-XX-XXX-XXX-Axx, where X is 0 to 9, A is the letter A, and xx is 0 to 9 but may or may not be present. "Important text here" can contain any symbol, including &, /, \ and *.
Keywords may not be unique, but duplicates can appear in the text only after keyword7.
What I want to achieve is to take the data from keywords 2 and 6 and make a new file with three columns separated by semicolons:
ENXXXXXXXXXXXXXX;XX-XX-XX-XXX-XXX-Axy;Important text here
Tried awk and sed, but with questionable success due to so many special symbols around.
echo '{"keyword1":["A123","D356"],"keyword2":"ENXXXXXXXXXXXXXX","keyword3":[{"name1":["R3123","L2356"],"keyword4":"text here","keyword5":"4LJ"},{"app":,"keyword6":"XX-XX-XX-XXX-XXX-Axy - Important text here","keyword7":"FBG","{[ ** text here.........}' |
{m,g,n}awk NF=NF OFS=' )\n ( ' \
FS='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)| - '
)
( ENXXXXXXXXXXXXXX )
( XX-XX-XX-XXX-XXX-Axy )
( Important text here )
(
It should be trivial from here.
gawk 'gsub("^\n+|\n+$",_, $!(NF = NF))^_' OFS='\n' \
FS='^.+"keyword2":"|","keyword(3".+"keyword6":"|7".+$)| - '
ENXXXXXXXXXXXXXX
XX-XX-XX-XXX-XXX-Axy
Important text here
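For comparison, a more explicit sed sketch that produces the semicolon-separated line directly. It assumes that neither value contains a double quote, that the code part of keyword6 contains no spaces, and that keyword7 immediately follows keyword6 as in the sample. The name extract_fields is made up for illustration.

```shell
# Hypothetical helper: print keyword2;code;text for each matching input line.
# Lines that do not match the expected layout are silently skipped (-n ... p).
extract_fields() {
    sed -n 's/.*"keyword2":"\(EN[^"]*\)".*"keyword6":"\([^ "]*\) - \([^"]*\)","keyword7".*/\1;\2;\3/p' "$@"
}

# Demo with the sample line from the question
printf '%s\n' '{"keyword1":["A123","D356"],"keyword2":"ENXXXXXXXXXXXXXX","keyword3":[{"name1":["R3123","L2356"],"keyword4":"text here","keyword5":"4LJ"},{"app":,"keyword6":"XX-XX-XX-XXX-XXX-Axy - Important text here","keyword7":"FBG","{[ ** text here.........}' |
    extract_fields
# prints: ENXXXXXXXXXXXXXX;XX-XX-XX-XXX-XXX-Axy;Important text here
```

If the free text can itself contain double quotes, the [^"]* part would need to be loosened, at the cost of a less precise match.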

Regex for printing pattern from string

I have a file with the content below. I need to split the content into two files.
o/p1 should have everything within the first parentheses (), with ` removed and only columns 1 and 2 printed.
o/p2 should have the location with its value.
$ cat dt.txt
CREATE EXTERNAL TABLE `rte.fteff_ft`(
`dt` date,
`wk_id` int,
`yq_id` int(10,00),
`te_ind` string,
`yw_dt` date,
`em_dt` date comment dfdsf sdfsdf)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0007'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://dfdf/data/ffff/ODE/TdddfT/'
TBLPROPERTIES (
'last_modified_by'='asdas',
'last_modified_time'='1639551681',
'numFiles'='1',
'totalSize'='2848434',
'transient_lastDdlTime'='1639551681')
I need the output from the above in two files.
o/p1: a.txt
dt date,
wk_id int,
yq_id int(10,00),
te_ind string,
yw_dt date,
em_dt date
o/p2: b.txt
LOCATION
'hdfs://dfdf/data/ffff/ODE/TdddfT/'
First, use sed to run a couple of commands on the range of lines between 'CREATE EXTERNAL' and 'ROW FORMAT DELIMITED', not including those two lines. Within that range, remove the grave accents (backticks), then keep only the first 2 words.
sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt
EDIT: To remove the commas at the end of the line, add another command of s/,$// . Make sure to anchor the comma to the end of the line else you'll get the comma in the int declaration.
sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g;s/,$//; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt
Second, use the -A option to grep to match the word 'LOCATION' on a line by itself plus the following 1 line.
grep -A 1 '^LOCATION$' dt.txt > b.txt
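The two steps can also be combined into a single pass. The following is a sketch only, assuming (as in the sample) that CREATE EXTERNAL, ROW FORMAT DELIMITED and LOCATION each start their line; the function name split_ddl is hypothetical. Unlike the second sed command above, it keeps the trailing commas, matching the output shown in the question.

```shell
# Hypothetical helper: one awk pass that writes both output files.
# $1 = DDL file, $2 = file for the column list, $3 = file for LOCATION
split_ddl() {
    awk -v cols="$2" -v loc="$3" '
        /^CREATE EXTERNAL/      { in_cols = 1; next }   # column list starts on the next line
        /^ROW FORMAT DELIMITED/ { in_cols = 0 }         # column list has ended
        in_cols {
            gsub(/`/, "")                # drop the backticks
            print $1, $2 > cols          # keep only the first two words
        }
        /^LOCATION$/ {
            print > loc                  # the LOCATION keyword itself
            if ((getline nextline) > 0)  # and the line that follows it
                print nextline > loc
        }
    ' "$1"
}

# Usage, with the file names from the question:
# split_ddl dt.txt a.txt b.txt
```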

Trim whitespaces from esql string

I want to trim the whitespace from a string that I am getting from an XML file using ESQL.
I am using the TRIM function, but it doesn't seem to work when trimming spaces, whereas trimming any other character with TRIM() seems to work fine.
example
Trim(' ' From ' Nitin ');
Result
Nitin
Trim('i' From 'Nitin');
Result
Ntn
DECLARE whiteSpace CONSTANT CHARACTER CAST( X'090D0A20' AS CHAR CCSID 1208);
-- tab, cr, lf, space
DECLARE input2 CHARACTER 'smith';
SET input2 = whiteSpace || input2 || whiteSpace;
SET OutputRoot.XMLNSC.Top.Out2 = TRIM( whiteSpace FROM input2);
output:
<Top><Out2>smith</Out2></Top>

replace multiple lines identifying end character

I have the below code
CREATE TABLE Table1(
column1 double NOT NULL,
column2 varchar(60) NULL,
column3 varchar(60) NULL,
column4 double NOT NULL,
CONSTRAINT Index1 PRIMARY KEY CLUSTERED
(
column2 ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
) ON PRIMARY
GO
GO
and I want to replace
CONSTRAINT Index1 PRIMARY KEY CLUSTERED
(
column2 ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
) ON PRIMARY
GO
with
)
You can't assume GO is the end of the file. After GO there can be another table script.
How can I do that with a single sed or awk command?
Update:
You can use the following sed command to replace even the last , before the CONSTRAINT block:
sed -r '/,/{N;/CONSTRAINT/{:a;N;/GO/!ba;s/([^,]+).*/\1\n)/};/CONSTRAINT/!n}' input.sql
Let me explain it as a multiline script:
# Search for a comma
/,/ {
# If a comma was found slurp in the next line
# and append it to the current line in pattern buffer
N
# If the pattern buffer does not contain the word CONSTRAINT
# print the pattern buffer and go on with the next line of input
# meaning start searching for a comma
/CONSTRAINT/! n
# If the pattern CONSTRAINT was found we loop until we find the
# word GO
/CONSTRAINT/ {
# Define a start label for the loop
:a
# Append the next line of input to the pattern buffer
N
# If GO is still not found in the pattern buffer
# step to the start label of the loop
/GO/! ba
# The loop was exited meaning the pattern GO was found.
# We keep the first line of the pattern buffer - without
# the comma at the end and replace everything else by a )
s/([^,]+).*/\1\n)/
}
}
You can save the above multiline script in a file and execute it using
sed -rf script.sed input.sql
You can use the following sed command:
sed '/CONSTRAINT/{:a;N;/GO/!ba;s/.*/)/}' input.sql
The pattern searches for a line containing /CONSTRAINT/. If the pattern is found, a block of commands wrapped between { } is started. In the block we first define a label a through :a. Then we get the next line of input through N and append it to the pattern buffer. As long as the buffer does not match /GO/ (that is the /GO/! part), we continue at label a using the branch command b. Once the pattern /GO/ is found, we simply replace the buffer by a ).
An alternative is to use a range, as FredPhil suggested:
sed '/CONSTRAINT/,/GO/{s/GO/)/;te;d;:e}'
This may look scary but it is not difficult to grasp with a bit of explanation:
SED_DELIM=$(echo -en "\001")
START=' CONSTRAINT Index1 PRIMARY KEY CLUSTERED'
END='GO'
sed -n $'\x5c'"${SED_DELIM}${START}${SED_DELIM},"$'\x5c'"${SED_DELIM}${END}${SED_DELIM}{s${SED_DELIM}GO${SED_DELIM})${SED_DELIM};t a;d;:a;};p" test2.txt
The sed has the following form you may be more familiar with:
sed /regex1/,/regex2/{commands}
First it uses the SOH non-printable as the delimiter \001
Sets the START and END tags for sed multiline match
Then performs the sed command:
-n do not print by default
$'\x5c' is a Bash string literal that corresponds to backslash \
The backslashes are necessary to escape the non-printable delimiter on the multiline range match.
{s${SED_DELIM}GO${SED_DELIM})${SED_DELIM};t a;d;:a;};p:
s${SED_DELIM}GO${SED_DELIM})${SED_DELIM} replace the line that matches GO with )
t a; if there is a successful substitution in the prior statement then branch to the :a label
d if there is no substitution then delete the line
p print whatever the result is after the commands
I didn't see their answers prior to posting this - this answer is the same as FredPhil/hek2mgl - except in this manner you have a mechanism to be more dynamic on the LHS since you can change the delimiter to a character that is much less likely to appear in the dataset.
With GNU awk for multi-char RS and assuming you want to get rid of the comma before the "CONSTRAINT":
$ cat tst.awk
BEGIN{ RS="^$"; ORS="" }
{
gsub(/\<GO\>/,"\034")
gsub(/,\s*CONSTRAINT[^\034]+\034/,")")
gsub(/\034/,"GO")
print
}
$ gawk -f tst.awk file
CREATE TABLE Table1(
column1 double NOT NULL,
column2 varchar(60) NULL,
column3 varchar(60) NULL,
column4 double NOT NULL)
GO
The above works by replacing every stand-alone "GO" with a control char that's unlikely to appear in your input (in this case I used the same value as the default SUBSEP) so we can use that char in a negated character list in the middle gsub() to create a regexp that ends with the first "GO" after "CONSTRAINT". This is one way to do "non-greedy" matching in awk.
If there is no char that you KNOW cannot appear in your input, you can create one like this:
$ cat tst.awk
BEGIN{ RS="^$"; ORS="" }
{
gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/\<GO\>/,"b")
gsub(/,\s*CONSTRAINT[^b]+b/,")")
gsub(/b/,"GO"); gsub(/aB/,"b"); gsub(/aA/,"a")
print
}
$
$ gawk -f tst.awk file
CREATE TABLE Table1(
column1 double NOT NULL,
column2 varchar(60) NULL,
column3 varchar(60) NULL,
column4 double NOT NULL)
GO
The above initially converts all "a"s to "aA" and "b"s to "aB" so that
there are no longer any "b"s in the record, and
since all original "a"s now have an "A" after them, the only occurrences of "aB" represent where "b"s were originally located
and that means that we can now convert all "GO"s to "b"s just like we converted them to "\034" in the first script above. Then we do the main gsub() and then unroll our initial gsub()s.
This idea of gsub()ing to create chars that cannot previously exist, using those chars, then unrolling the initial gsub()s is an extremely useful idiom to learn and remember, e.g. see https://stackoverflow.com/a/13062682/1745001 for another application.
To see it working one step at a time:
$ cat file
foo bar Hello World World able bodies
$ awk '{gsub(/a/,"aA")}1' file
foo baAr Hello World World aAble bodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB")}1' file
foo aBaAr Hello World World aAaBle aBodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b")}1' file
foo aBaAr Hello b b aAaBle aBodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The")}1' file
foo aBaAr We Are The b aAaBle aBodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World")}1' file
foo aBaAr We Are The World aAaBle aBodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World"); gsub(/aB/,"b")}1' file
foo baAr We Are The World aAble bodies
$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World"); gsub(/aB/,"b"); ; gsub(/aA/,"a")}1' file
foo bar We Are The World able bodies
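Where GNU awk is not available, the CONSTRAINT-to-GO replacement from the question can also be approached line by line. The following is a rough sketch for any POSIX awk, not a drop-in replacement for the scripts above: it assumes the block ends at the first line consisting only of GO, and that the line before CONSTRAINT carries the comma to be removed. The names drop_constraint_block and emit_prev are made up for illustration.

```shell
# Hypothetical helper: replace each block from a CONSTRAINT line through the
# next stand-alone GO line with ")", and strip the comma that preceded it.
drop_constraint_block() {
    awk '
        # Print the buffered line, if there is one
        function emit_prev() { if (have_prev) { print prev; have_prev = 0 } }

        skipping {                                # inside a CONSTRAINT block
            if ($0 ~ /^[ \t]*GO[ \t]*$/) { skipping = 0; print ")" }
            next
        }
        /CONSTRAINT/ {
            sub(/,[ \t]*$/, "", prev)             # remove the trailing comma
            emit_prev()
            skipping = 1
            next
        }
        { emit_prev(); prev = $0; have_prev = 1 } # hold each line back by one
        END { emit_prev() }
    ' "$@"
}

# Usage: drop_constraint_block input.sql > output.sql
```

Holding each line back by one is what makes it possible to edit the line before CONSTRAINT without reading the whole file into memory.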

Using sed to search large number of files for specific string and replace it

What I am trying to do is search a large number of source files for a particular pattern and put another expression in front of this pattern. The files I am looking in all have the same extension, *.F90.
My first step is to use grep and find all lines of those files containing allocate but not allocated, so I have:
grep -I " allocate *(" *.F90 | grep -v allocated
The first problem that I have is that the bracket might be preceded by one or more spaces. I can have
allocate(
or allocate (
or allocate    (
This is why I need the "*" in the grep command.
The general rule, however (besides the spaces), says that allocate is followed by "(" and then comes the thing that is being allocated. So I have:
allocate ( array_name ( ....
again the spaces are optional
So what I would like to do is find this string, and put in front of it the following:
If( allocated(array_name) ) deallocate(array_name)
and immediately after this, on the next line, I would like to have the original string allocate(array( … .
Please note that array_name is an alphanumeric string which, after the substitution, appears in more than one place. It is the name of the array being allocated.
I would be very grateful if someone could give me a hint on how to do this. I am stuck and have no idea how to do it.
I assume you mean you want to replace allocate ( array_name ) with If( allocated(array_name) ) deallocate(array_name) allocate ( array_name ).
In GNU or BSD sed you can do the following:
sed -i.bk -e '/allocated/b' \
-e 's/allocate *( *\([A-Za-z0-9_]*\) *)/If( allocated(\1) ) deallocate(\1) &/' \
*.F90
This will search and replace matching lines in *.F90 and skip lines containing allocated. The original files are kept as *.F90.bk.
As @Anders Johansson mentioned, there can be other cases where the argument to allocate is not purely alphanumeric or underscore. You can search for those before you run the search and replace:
for i in *.F90; do
echo "$i"
sed -n '/.*allocate *( *\([^ )]*\) *).*/{h; s//\1/; /^[A-Za-z0-9_]*$/t
x; p;}' "$i"
done
(Note the newline after t: BSD sed interprets everything after t as a label.) Press ctrl+v ctrl+j in bash to input a newline on the command line.
/a\(b\)c/ find line with matching string
h *h*old the match abc into hold space
s//\1/ *s*ubstitute last match abc with first group b
/^[a-z]*$/t if b matches ^[a-z]*$, then branch to end of script
x e*x*change hold space abc and pattern space b
p *p*rint pattern space b
cat old_file.txt | sed 's/allocate *( *\([a-zA-Z0-9_]*\)/If( allocated(\1) ) deallocate(\1)\
allocate(\1/' > new_file.txt
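A variant that keeps the rest of the original statement and its indentation, and puts the guard on its own line, might look like the sketch below. It assumes that allocate is written in lower case and is the first thing on its line, so lines containing allocated( or deallocate( are left alone. The name guard_allocate is hypothetical.

```shell
# Hypothetical helper: put an "If( allocated(x) ) deallocate(x)" line in
# front of every line that starts with allocate(x ..., keeping indentation.
#   \1 = leading whitespace, \2 = the whole allocate statement, \3 = array name
guard_allocate() {
    sed 's/^\([[:space:]]*\)\(allocate *( *\([A-Za-z0-9_]\{1,\}\).*\)$/\1If( allocated(\3) ) deallocate(\3)\
\1\2/' "$@"
}

# Demo
printf '%s\n' '  allocate ( arr ( n ) )' 'if (allocated(x)) print *, 1' | guard_allocate
```

The backslash followed by a newline inside the replacement is the portable way to make sed emit a line break.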
