sed bash substitution only if variable has a value - bash

I'm trying to find a way using variables and sed to do a specific text substitution using a changing input file, but only if there is a value given to replace the existing string with. No value= do nothing (rather than remove the existing string).
Example
Substitute.csv contains 5 lines=
this-has-text
this-has-text
this-has-text
this-has-text
and file.text has one sentence=
"When trying this I want to be sure that text-this-has is left alone."
If I run the following command in a shell script
Text='text-this-has'
Change=`sed -n '3p' substitute.csv`
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
I end up with
"When trying this I want to be sure that is left alone."
But I'd like it to remain as
"When trying this I want to be sure that text-this-has is left alone."
Any way to tell sed "If I give you nothing new, do nothing"?
I apologize for the overthinking, bad habit. Essentially what I'd like to accomplish is if line 3 of the csv file has a value - replace $Text with $Change inline. If the line is empty, leave $Text as $Text.

Text='text-this-has'
Change=$(sed -n '3p' substitute.csv)
if [[ -n $Change ]]; then
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
fi

Just keep it simple and use awk:
awk -v t="$Text" -v c="$Change" 'c!=""{sub(t,c)} {print}' file
If you need inplace editing just use GNU awk with -i inplace.
Given your clarified requirement, this is probably what you actually want:
awk -v t="$Text" 'NR==FNR{if (NR==3) c=$0; next} c!=""{sub(t,c)} {print}' Substitute.csv file.txt

Testing whether $Change has a value before launching into the grep and sed is undoubtedly the most efficient bash solution, although I'm a bit skeptical about the duplication of grep and sed; it saves a temporary file in the case of files which don't contain the target string, but at the cost of an extra scan up to the match in the case of files which do contain it.
If you're looking for typing efficiency, though, the following might be interesting:
find . -name '*.txt' -exec sed -i "s|$Text|${Change:-&}|" {} \;
Which will recursively find all files whose names end with the extension .txt and execute the sed command on each one. ${Change:-&} means "the value of $Change if it exists and is non-empty, and otherwise an &"; & in the replacement of a sed s command means "the matched text", so s|foo|&| replaces every occurrence of foo with itself. That's an expensive no-op but if your time matters more than your cpu time, it might have been worth it.

Related

bash scripting: Can I get sed to output the original line, then a space, then the modified line?

I'm new to Unix in all its forms, so please go easy on me!
I have a bash script that will pipe an ls command with arbitrary filenames into sed, which will use an arbitrary replacement pattern on the files, and then this will be piped into awk for some processing. The catch is, awk needs to know both the original file name and the new one.
I've managed everything except getting the original file names into awk. For instance, let's say my files are test.* and my replacement pattern is 's:es:ar;', which would change every occurrence of "test" to "tart". For testing purposes I'm just using awk to print what it's receiving:
ls "$#" | sed "$pattern" | awk '{printf "0: %s\n1: %s\n2: %s\n", $0,$1,$2}'
where test.* is in $# and the pattern is stored in $pattern.
Clearly, this doesn't get me to where I want to be. The output is obviously
0: tart.c
1: tart.c
2:
If I could get sed to output "test.c tart.c", then I'd have two parameters for awk. I've played around with the pattern to no avail, even hardcoding "test.c" into the replacement. But of course that just gave me amateur results like "ttest.c art.c". Is it possible for sed to remember the input, then work it into the beginning of the output? Do I even have the right ideas? Thanks in advance!
Two ways to change the first t in a b in the duplicated field.
Duplicate (& replays the matched part), change first word and swap (remember 2 strings with a space in between):
echo test.c | sed -r 's/.*/& &/;s/t/b/;s/([^ ]*) (.*)/\2 \1/'
or with more magic (copy original value to buffer, make the change, insert value from buffer as the first line and replace eond of line with a space)
echo test.c | sed 'h;s/t/b/;x;G;s/\n/ /'
Use Perl instead of sed:
echo test.c | perl -lne 'print "$_ ", s/es/ar/r'
-l removes the newline from input and adds it after each print. The /r modifier to the substitution returns the modified string instead of changing the variable (Perl 5.14+ needed).
Old answer, not working for s/t/b/2 or s/.*/replaced/2:
You can duplicate the contents of the line with s/.*/& &/, then just tell sed that it should only apply the second substitution (this works at least in GNU sed):
echo test.c | sed 's/.*/& &/; s/es/ar/2'
$ echo 'foo' | awk '{old=$0; gsub(/o/,"e"); print old, $0}'
foo fee

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor.BashFAQ
Unless you enjoy unportable code, I/O overhead and some other bad side effects. So basically some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" $file
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.
You can also use this:
grep -v 'pattern' filename
Here -v will print only other than your pattern (that means invert match).
To get a inplace like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C, it does not seem change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete lines from all files that match the match
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many sql files withe this sed command.
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text info an existing file
cat filename | sed '/text to remove/d' >> newfile
to treat already treated text, in this case remove more lines of what has been removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to replace a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accomodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution is for doing the same operation on multiple file.
for file in *.txt; do grep -v "Matching Text" $file > temp_file.txt; mv temp_file.txt $file; done
I found most of the answers not useful for me, If you use vim I found this very easy and straightforward:
:g/<pattern>/d
Source

Deleting first n rows and column x from multiple files using Bash script

I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My current problem is that I'm writing my first bash script, and am having trouble making that script work the way I want it to.
file0001.csv (there are several hundred files like these in one folder)
Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
Desired output
1,1027.84
2,1027.92
3,1028
4,1028.81
I am able to use sed and cut individually but for some reason the following bash script doesn't take cut into account. It also gives me an error "sed: can't read ls: No such file or directory", yet sed is successful and the output is saved to the original files.
sem2csv.sh
for files in 'ls *.csv' #list of all .csv files
do
sed '1,2d' -i $files | cut -f '1-2' -d ','
done
Actual output:
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
I know there may be awk one-liners but I would really like to understand why this particular bash script isn't running as intended. What am I missing?
The -i option of sed modifies the file in place. Your pipeline to cut receives no input because sed -i produces no output. Without this option, sed would write the results to standard output, instead of back to the file, and then your pipeline would work; but then you would have to take care of writing the results back to the original file yourself.
Moreover, single quotes inhibit expansion -- you are "looping" over the single literal string ls *.csv. The fact that you are not quoting it properly then causes the string to be subject to wildcard expansion inside the loop. So after variable interpolation, your sed command expands to
sed -i 1,2d ls *.csv
and then the shell expands *.csv because it is not quoted. (You should have been receiving a warning that there is no file named ls in the current directory, too.) You probably attempted to copy an example which used backticks (ASCII 96) instead of single quotes (ASCII 39) -- the difference is quite significant.
Anyway, the ls is useless -- the proper idiom is
for files in *.csv; do
sed '1,2d' "$files" ... # the double quotes here are important
done
Mixing sed and cut is usually not a good idea because you can express anything cut can do in terms of a simple sed script. So your entire script could be
for f in *.csv; do
sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
done
which says to remove the last comma and everything after it. (If your sed does not like multiple -e options, try with a semicolon separator: sed -i '1,2d;s/,[^,]*$//' "$f")
You may use awk,
$ awk 'NR>2{sub(/,[^,]*$/,"",$0);print}' file
1,1027.84
2,1027.92
3,1028
4,1028.81
or
sed -i '1,2d;s/,[^,]*$//' file
1,2d; for deleting the first two lines.
s/,[^,]*$// removes the last comma part in remaining lines.

Delete everything before last / in a file path

I have many file paths in a file that look like so:
/home/rtz11/files/testfiles/547/prob547455_01
I want to use a bash script that will print all the filenames to the screen, basically whatever comes after the last /. I don't want to assume that it would always be the same length because it might not be.
Would there be a way to delete everything before the last /? Maybe a sed command?
Using sed for this is vast overkill -- bash has extensive string manipulation built in, and using this built-in support is far more efficient when operating on only a single line.
s=/home/rtz11/files/testfiles/547/prob547455_01
basename="${s##*/}"
echo "$basename"
This will remove everything from the beginning of the string greedily matching */. See the bash-hackers wiki entry for parameter expansion.
If you only want to remove everything prior to the last /, but not including it (a literal reading of your question, but also a generally less useful operation), you might instead want if [[ $s = */* ]]; then echo "/${s##*/}"; else echo "$s"; fi.
awk '{print $NF}' FS=/ input-file
The 'print $NF' directs awk to print the last field of each line, and assigning FS=/ makes forward slash the field delimeter. In sed, you could do:
sed 's#.*/##' input-file
which simply deletes everything up to and including the last /.
Meandering but simply because I can remember the syntax I use:
cat file | rev | cut -d/ -f1 | rev
Many ways to skin a 'cat'. Ouch.
One more way:
Use the basename executable (command):
basename /path/with/many/slashes/and/file.extension
>file.extension
basename /path/with/many/slashes/and/file.extension .extension
OR
basename -s .extension /path/with/many/slashes/and/file.extension
> file

bash grep newline

[Editorial insertion: Possible duplicate of the same poster's earlier question?]
Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question abstract "bash grep newline", implies that you would want to match on the second\nthird sequence of characters - i.e. something containing newline within it.
Since the grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
head -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW -E tell grep that the regexp is extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
this will print a string with match and one string before and after. Since "third" is in the last string you get last two strings.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources