How to use sed to insert a line before each line in a file with the original line's content surrounding by a string? - bash

I am trying to use sed (GNU sed version 4.2.1) to insert a line before each line in a file with that line's content surrounding by a string.
Input:
truncate table ALPHA;
truncate table BETA;
delete from TABLE_CHARLIE where ID=1;
Expected Result:
SELECT 'truncate table ALPHA;' from dual;
truncate table ALPHA;
SELECT 'truncate table BETA;' from dual;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;' from dual;
delete from TABLE_CHARLIE where ID=1;
I have tried to make use of the ampersand (&) special character, but this does not seem to work. If I put anything after the ampersand on the replacement string, the output is not correct.
Attempt 1:
sed -e "s/\(.*\)/SELECT '&\n&/g" input.txt
output:
SELECT 'truncate table ALPHA;
truncate table ALPHA;
SELECT 'truncate table BETA;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;
delete from TABLE_CHARLIE where ID=1;
With the preceding code, I get the SELECT ' as expected, but once I attempt to add ' from dual; to the right side of string, things get out of whack.
Attempt 2:
sed -e "s/\(.*\)/SELECT '&' from dual;\n&/g" input.txt
output:
' from dual;cate table ALPHA;
truncate table ALPHA;
' from dual;cate table BETA;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;' from dual;

You can take advantage of the hold space to temporarily store the original line.
sed "h;s/.*/'SELECT '&' from dual;/;p;g" input.txt
or more readably:
sed "
h
s/.*/'SELECT '&' from dual;/
p
g" input.txt
Here's a breakdown of the command.
First, each line of the input is placed in the pattern space.
The h command copies the contents of the pattern space to the hold space.
The s command performs a substitution on the pattern space. The & represents whatever was matched. This command leaves the hold space unaffected.
The p command outputs the contents of the pattern space to standard output.
The g command copies the contents of the hold space to the pattern space.
By default, the contents of the pattern space are written to standard output before reading the next input line.
As Glenn Jackman points out, you can replace p;g with G. This builds up a two-line value in the pattern space that is then printed, rather than print two separate pattern spaces.
sed "h;s/.*/'SELECT '&' from dual;/;G" input.txt
Also, you can add comments to the sed command so that you can understand what the line noise does later :), if this is in a script.
sed "
# The input line is first copied to the pattern space
h # Copy the pattern space to the hold space
s/.*/'SELECT '&' from dual;/ # Modify the pattern space
p # Print the (modified) pattern space
g # Copy the hold space to the pattern space
# The output of the pattern space (the original input line) is now printed
" input.txt

If you're looking for an alternative to sed, these work:
awk '{printf "SELECT '\''%s'\'' from dual;\n%s\n", $0, $0}' file
perl -lpe "print qq{SELECT '\$_' from dual;}" file

Your second attempt works on both 4.2.1 and 4.2.2 versions of sed. I received same invalid input when I tried to save your input file with windows line endings (line feed and carriage return).
Use this command on your input file before running your sed command:
tr -d '\15\32' < winfile.txt > unixfile.txt
Or as you suggest, simply by using the dos2unix utility.

Here's how to do it with awk:
awk -v PRE="SELECT '" -v SU="' from dual;" '{print PRE$0SU; print}'`

Related

Regex for printing pattern from string

i have a file with below content. i need to separate the content into 2 files
o/p1 should have content everything within first braces () and ` removed and only 1&2 columns printed.
o/p2 should have location with its value
$ cat dt.txt
CREATE EXTERNAL TABLE `rte.fteff_ft`(
`dt` date,
`wk_id` int,
`yq_id` int(10,00),
`te_ind` string,
`yw_dt` date,
`em_dt` date comment dfdsf sdfsdf)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0007'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://dfdf/data/ffff/ODE/TdddfT/'
TBLPROPERTIES (
'last_modified_by'='asdas',
'last_modified_time'='1639551681',
'numFiles'='1',
'totalSize'='2848434',
'transient_lastDdlTime'='1639551681')
i need output from the above in two files.
o/p1: a.txt
dt date,
wk_id int,
yq_id int(10,00),
te_ind string,
yw_dt date,
em_dt date
o/p2: b.txt
LOCATION
'hdfs://dfdf/data/ffff/ODE/TdddfT/'
First, use sed to run a couple of commands, to operate on the range of lines between 'CREATE EXTERNAL' and 'ROW DELIMITED FORMAT' where they occur at the start of the line, not including those lines. Then replace grave accent marks with nothing, then keep only the first 2 words.
sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt
EDIT: To remove the commas at the end of the line, add another command of s/,$// . Make sure to anchor the comma to the end of the line else you'll get the comma in the int declaration.
sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g;s/,$//; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt
Second, use the -A option to grep to match the word 'LOCATION' on a line by itself plus the following 1 line.
grep -A 1 '^LOCATION$' dt.txt > b.txt

Unix sed command - global replacement is not working

I have scenario where we want to replace multiple double quotes to single quotes between the data, but as the input data is separated with "comma" delimiter and all column data is enclosed with double quotes "" got an issue and the same explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
tried below approach to get the resolution, but only first occurrence is working, not sure what is the issue??
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
replacing all ---> from ,"", to ,"NULL",
replacing all multiple occurrences of ---> from """ or "" or """" to " (single occurrence)
replacing 1 step changes back to original ---> from ,"NULL", to ,"",
But, only first occurrence is getting changed and remaining looks same as below:
If input is :
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakup:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
In awk with your shown samples please try following awk code. Written and tested in GNU awk, should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: Simple explanation would be, setting field separator and output field separator as , for all the lines of Input_file. Then traversing through each field of line, if a field is NOT NULL then Globally replacing all 1 or more occurrences of " with single occurrence of ". Then printing the line.
With sed you could repeat 1 or more times sets of "" using a group followed by matching a single "
Then in the replacement use a single "
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big csv record all on one line, and you want to edit a specific field, then if you know the field number, you can convert all the commas to carriage returns, and use the field number as a line number to either substitute, append after it, or insert before it with Ed; and then re-convert back to csv.

How to replace text in file between known start and stop positions with a command line utility like sed or awk?

I have been tinkering with this for a while but can't quite figure it out. A sample line within the file looks like this:
"...~236 characters of data...Y YYY. Y...many more characters of data"
How would I use sed or awk to replace spaces with a B character only between positions 236 and 246? In that example string it starts at character 29 and ends at character 39 within the string. I would want to preserve all the text preceding and following the target chunk of data within the line.
For clarification based on the comments, it should be applied to all lines in the file and expected output would be:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
With GNU awk:
$ awk -v FIELDWIDTHS='29 10 *' -v OFS= '{gsub(/ /, "B", $2)} 1' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
FIELDWIDTHS='29 10 *' means 29 characters for first field, next 10 characters for second field and the rest for third field. OFS is set to empty, otherwise you'll get space added between the fields.
With perl:
$ perl -pe 's/^.{29}\K.{10}/$&=~tr| |B|r/e' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
^.{29}\K match and ignore first 29 characters
.{10} match 10 characters
e flag to allow Perl code instead of string in replacement section
$&=~tr| |B|r convert space to B for the matched portion
Use this Perl one-liner with substr and tr. Note that this uses the fact that you can assign to substr, which changes the original string:
perl -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file > out_file
To change the file in-place, use:
perl -i.bak -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
I would use GNU AWK following way, for simplicity sake say we have file.txt content
S o m e s t r i n g
and want to change spaces from 5 (inclusive) to 10 (inclusive) position then
awk 'BEGIN{FPAT=".";OFS=""}{for(i=5;i<=10;i+=1)$i=($i==" "?"B":$i);print}' file.txt
output is
S o mBeBsBt r i n g
Explanation: I set field pattern (FPAT) to any single character and output field seperator (OFS) to empty string, thus every field is populated by single characters and I do not get superfluous space when print-ing. I use for loop to access desired fields and for every one I check if it is space, if it is I assign B here otherwise I assign original value, finally I print whole changed line.
Using GNU awk:
awk -v strt=29 -v end=39 '{ ram=substr($0,strt,(end-strt));gsub(" ","B",ram);print substr($0,1,(strt-1)) ram substr($0,(end)) }' file
Explanation:
awk -v strt=29 -v end=39 '{ # Pass the start and end character positions as strt and end respectively
ram=substr($0,strt,(end-strt)); # Extract the 29th to the 39th characters of the line and read into variable ram
gsub(" ","B",ram); # Replace spaces with B in ram
print substr($0,1,(strt-1)) ram substr($0,(end)) # Rebuild the line incorporating raw and printing the result
}'file
This is certainly a suitable task for perl, and saddens me that my perl has become so rusty that this is the best I can come up with at the moment:
perl -e 'local $/=\1;while(<>) { s/ /B/ if $. >= 236 && $. <= 246; print }' input;
Another awk but using FS="":
$ awk 'BEGIN{FS=OFS=""}{for(i=29;i<=39;i++)sub(/ /,"B",$i)}1' file
Output:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
Explained:
$ awk ' # yes awk yes
BEGIN {
FS=OFS="" # set empty field delimiters
}
{
for(i=29;i<=39;i++) # between desired indexes
sub(/ /,"B",$i) # replace space with B
# if($i==" ") # couldve taken this route, too
# $i="B"
}1' file # implicit output
With sed :
sed '
H
s/\(.\{236\}\)\(.\{11\}\).*/\2/
s/ /B/g
H
g
s/\n//g
s/\(.\{236\}\)\(.\{11\}\)\(.*\)\(.\{11\}\)/\1\4\3/
x
s/.*//
x' infile
When you have an input string without \r, you can use:
sed -r 's/(.{236})(.{10})(.*)/\1\r\2\r\3/;:a;s/(\r.*) (.*\r)/\1B\2/;ta;s/\r//g' input
Explanation:
First put \r around the area that you want to change.
Next introduce a label to jump back to.
Next replace a space between 2 markers.
Repeat until all spaces are replaced.
Remove the markers.
In your case, where the length doesn't change, you can do without the markers.
Replace a space after 236..245 characters and try again when it succeeds.
sed -r ':a; s/^(.{236})([^ ]{0,9}) /\1\2B/;ta' input
This might work for you (GNU sed):
sed -E 's/./&\n/245;s//\n&/236/;h;y/ /B/;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Divide the problem into 2 lines, one with spaces and one with B's where there were spaces.
Then using pattern matching make a composite line from the two lines.
N.B. The newline can be used as a delimiter as it is guaranteed not to be in seds pattern space.

bash: separate blocks of lines between pattern x and y

I have a similar question to this one Sed/Awk - pull lines between pattern x and y, however, in my case I want to output each block-of-lines to individual files (named after the first pattern).
Input example:
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
I want the bash script to generate 2 files: query1.sql and query2.sql with the following content:
query1.sql:
-- sql comments goes here or else where
select * from table1
where id=123;
query2.sql:
insert into table1
(id, date) values (1, sysdate);
Thank you
awk '/-- filename/{if(f)close(f); f=$3;next} !/eof/&&/./{print $0 >> f}' input
Brief explanation,
-- filename{if(f)close(f); f=$3;next}: locate the record contains filename, and assign it to f
!/eof/&&/./{print $0 >> f}: if following lines don't contain 'eof' neither empty, save it to the corresponding file.
This might work for you (GNU sed):
sed -r '/-- filename: (\S+)/!d;s##/&/,/-- eof/{//d;w \1#p;s/.*/}/p;d' file |
sed -nf - file
Create a sed script from the input file and run it against the input file
N.B. Two lines are needed for each query as the program for the query must be surrounded by braces and the w command must end in a newline.
Using GNU awk to handle multiple open files for you:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{out=$3; f=1}' file
or with any awk:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{close(out); out=$3; f=1}' file

check for newline character in a csv file

currently in my code below line used to fix the newline break in a csv :
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' MY_FILE.csv > MY_FILE.csv.tmp
I want to do a pre check like if there is a new line break present in the file then only script will run the above command to fix that file, how do I add a pre check for this ?
my csv file looks as below and having 1 millions records in it :
20160711,"M","N1","F","S","A","good data with.....some special character and space (new line)
space ..
....","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
new line can appear anywhere in the file .
#sabya Perhaps count the double quotes on a line? If odd, then there is a return somewhere:
gawk '{if (and(1,gsub(/"/, "\"")) HasReturn = 1; exit} END {exit HasReturn}'
I would respectfully suggest you load the data as given and not alter it in order to maintain data integrity by constructing the control file to preserve the newline between the double-quotes.
Construct the control file like this using the "str" clause on the infile option line to set the end of record character. It tells sqlldr that hex 0D (carriage return, or ^M) is the record separator (this way it will ignore the linefeeds inside the double-quotes):
LOAD DATA
infile "test.dat" "str x'0D'"
TRUNCATE
INTO TABLE test
replace
fields terminated by ","
optionally enclosed by '"'
(
cola char,
colb char,
colc char
)
More info in this post: https://stackoverflow.com/a/37216660/2543416

Resources