remove lines based on file input pattern using sed - bash

I have been trying to solve a simple sed line deletion problem.
Looked here and there. It didn't solve my problem.
My problem could be solved with sed -i '/^1\|^2\|^3/d' infile.txt, which deletes the lines beginning with 1, 2 and 3 from infile.txt.
But what I want instead is to read the starting patterns from a file rather than typing them into the stream editor by hand.
E.g: deletePattern
1
3
2
infile.txt
1 Line here
2 Line here
3 Line here
4 Line here
Desired output
4 Line here
Thank you in advance,

This grep should work:
grep -Fvf deletePattern infile.txt
4 Line here
But this will also drop a line if a pattern from deletePattern occurs anywhere in that line of the second file, not just at the start.
More accurate results can be achieved by using this awk command:
awk 'FILENAME == ARGV[1] && FNR==NR{a[$1];next} !($1 in a)' deletePattern infile.txt
4 Line here
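To make the difference between the two commands concrete, here is a small sketch. The sample files and the extra line "4 Line 1 here" are made up for illustration and are not part of the question.

```shell
# Hypothetical sample data, created in a throwaway directory
tmp=$(mktemp -d)
printf '%s\n' 1 3 2 > "$tmp/deletePattern"
printf '%s\n' '1 Line here' '2 Line here' '3 Line here' '4 Line 1 here' > "$tmp/infile.txt"

# grep -F looks for the patterns anywhere in a line, so the last line
# (which contains a "1" in the middle) is dropped as well.
# grep exits non-zero when it prints nothing, hence the "|| true".
grep_out=$(grep -Fvf "$tmp/deletePattern" "$tmp/infile.txt" || true)

# awk only compares the first field of each line with the stored patterns
awk_out=$(awk 'NR == FNR { a[$1]; next } !($1 in a)' "$tmp/deletePattern" "$tmp/infile.txt")

printf 'grep kept: [%s]\n' "$grep_out"
printf 'awk kept:  [%s]\n' "$awk_out"
rm -rf "$tmp"
```

With this input grep keeps nothing, while awk keeps the fourth line because its first field is 4.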

Putting together a quick command substitution combined with a character class will allow a relatively short oneliner:
$ sed -e "/^[$( while read -r ch; do a+=$ch; done <deletePattern; echo "$a" )]/d" infile.txt
4 Line here
Of course, change the -e to -i for actual in-place substitution.

With GNU sed (for -f -):
sed 's!^[0-9][0-9]*$!/^&[^0-9]/d!' deletePattern | sed -f - infile.txt
The first sed transforms deletePattern into a sed script, then the second sed applies this script.
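As a rough sketch of the same two-step idea without relying on GNU sed's -f -, the generated script can be written to a temporary file first. The file names under $tmp and the extra "10 Line here" line are assumptions added for illustration.

```shell
# Hypothetical sample data in a throwaway directory
tmp=$(mktemp -d)
printf '%s\n' 1 3 2 > "$tmp/deletePattern"
printf '%s\n' '1 Line here' '2 Line here' '3 Line here' '4 Line here' '10 Line here' > "$tmp/infile.txt"

# Step 1: turn every number N in deletePattern into the sed command /^N[^0-9]/d
sed 's!^[0-9][0-9]*$!/^&[^0-9]/d!' "$tmp/deletePattern" > "$tmp/script.sed"
generated=$(cat "$tmp/script.sed")

# Step 2: apply the generated script to the data file
result=$(sed -f "$tmp/script.sed" "$tmp/infile.txt")

printf '%s\n' "$generated" "---" "$result"
rm -rf "$tmp"
```

The [^0-9] after the number is what keeps "10 Line here" from being deleted by the pattern 1.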

Try this:
sed '/^[123]/ d' infile.txt

Related

bash / sed : editing of the file

I use sed to remove all lines starting from "HETATM" from the input file, and cat to combine another file with the output received from sed:
sed -i '/^HETATM/ d' file1.pdb
cat file2.pdb file1.pdb > file3.pdb
Is there a way to do it in one line, e.g. using only sed?
If you want to consider awk then it can be done in a single command:
awk 'FNR == NR {print; next} !/^HETATM/' file2.pdb file1.pdb > file3.pdb
With a cat + grep combination, please try the following code. cat concatenates the contents of all the files passed to it, and grep -v removes every line starting with HETATM from file1.pdb before that output is handed to cat, whose combined output is written to the new file file3.pdb.
cat file2.pdb <(grep -v '^HETATM' file1.pdb) > file3.pdb
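If process substitution is not available (it needs Bash or a similar shell), a command group gives the same result in one line. This is only a sketch; the file contents below are invented sample data.

```shell
# Hypothetical sample files in a throwaway directory
tmp=$(mktemp -d)
printf '%s\n' 'HEADER x' 'ATOM 1' > "$tmp/file2.pdb"
printf '%s\n' 'ATOM 2' 'HETATM 3' 'ATOM 4' > "$tmp/file1.pdb"

# Print file2.pdb unchanged, then file1.pdb without its HETATM lines,
# and send the combined output of the group to file3.pdb
{ cat "$tmp/file2.pdb"; sed '/^HETATM/d' "$tmp/file1.pdb"; } > "$tmp/file3.pdb"

result=$(cat "$tmp/file3.pdb")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike sed -i, this leaves file1.pdb untouched.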
I'm not sure what you mean by "remove all lines starting from 'HETATM'", but if you mean that any line that appears in the file after a line that starts with "HETATM" will not be outputted, then your sed expression won't do it - it will just remove all lines starting with the pattern while leaving all following lines that do not start with the pattern.
There are ways to get the effect I believe you wanted, possibly even with sed - but I don't know sed all that well. In perl I'd use the range operator with a guaranteed non-matching end expression (not sure what will be guaranteed for your input, I used "XXX" in this example):
perl -ne 'unless (/^HETATM/../XXX/) { print; }' file1.pdb
mawk '(FNR == NR) < NF' FS='^HETATM' f1 f2

Delete the 3 last line of my txt with bash? [duplicate]

I want to remove some n lines from the end of a file. Can this be done using sed?
For example, to remove lines from 2 to 4, I can use
$ sed '2,4d' file
But I don't know the line numbers. I can delete the last line using
$ sed '$d' file
but I want to know the way to remove n lines from the end. Please let me know how to do that using sed or some other method.
I don't know about sed, but it can be done with head:
head -n -2 myfile.txt
If hardcoding n is an option, you can use sequential calls to sed. For instance, to delete the last three lines, delete the last one line thrice:
sed '$d' file | sed '$d' | sed '$d'
From the sed one-liners:
# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
Seems to be what you are looking for.
A funny & simple sed and tac solution :
n=4
tac file.txt | sed "1,$n{d}" | tac
NOTE
Double quotes " are needed so that the shell expands the $n variable in the sed command. In single quotes, no interpolation is performed.
tac is cat reversed, see man 1 tac.
The {} in sed are there to separate $n and d (otherwise the shell would try to expand a non-existent $nd variable).
Use sed, but let the shell do the math, with the goal being to use the d command by giving a range (to remove the last 23 lines):
sed -i "$(($(wc -l < file)-22)),\$d" file
To see how this removes the last 23 lines, work from the inside out:
$(wc -l < file)
Gives the number of lines of the file: say 2196
We want to remove the last 23 lines, so for the left side of the range:
$((2196-22))
Gives: 2174
Thus the original sed after shell interpretation is:
sed -i '2174,$d' file
With -i doing inplace edit, file is now 2173 lines!
If you want to save the result into a new file instead, drop -i so that sed writes to standard output:
sed '2174,$d' file > outputfile
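The range arithmetic above can be wrapped in a small function. This is a minimal sketch: drop_last is a hypothetical helper name, and the ten-line sample file is made up.

```shell
# drop_last FILE N: print FILE without its last N lines (hypothetical helper)
drop_last() {
    total=$(wc -l < "$1")
    first=$((total - $2 + 1))        # first line of the range to delete
    if [ "$first" -lt 1 ]; then      # N larger than the file: delete everything
        first=1
    fi
    sed "${first},\$d" "$1"
}

# Hypothetical sample: a file holding the numbers 1 to 10, one per line
tmp=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$tmp"
result=$(drop_last "$tmp" 3)
printf '%s\n' "$result"
rm -f "$tmp"
```

Here the range becomes 8,$d, so lines 1 to 7 are printed.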
You could use head for this.
Use
$ head --lines=-N file > new_file
where N is the number of lines you want to remove from the file.
The contents of the original file minus the last N lines are now in new_file
Just for completeness I would like to add my solution.
I ended up doing this with the standard ed:
ed -s sometextfile <<< $'-2,$d\nwq'
This deletes the last 3 lines (the range -2,$ runs from two lines before the last line through the last line) using in-place editing (although it does use a temporary file in /tmp !!)
To truncate very large files truly in-place we have truncate command.
It doesn't know about lines, but tail + wc can convert lines to bytes:
file=bigone.log
lines=3
truncate -s -$(tail -$lines $file | wc -c) $file
There is an obvious race condition if the file is written at the same time.
In this case it may be better to use head - it counts bytes from the beginning of file (mind disk IO), so we will always truncate on line boundary (possibly more lines than expected if file is actively written):
truncate -s $(head -n -$lines $file | wc -c) $file
Handy one-liner if you fail login attempt putting password in place of username:
truncate -s $(head -n -5 /var/log/secure | wc -c) /var/log/secure
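The tail + wc conversion from lines to bytes can be tried on a small file. This sketch assumes the truncate command from GNU coreutils is installed; the five-line file is sample data, not a real log.

```shell
# Hypothetical five-line sample file
tmp=$(mktemp)
printf '%s\n' a b c d e > "$tmp"
lines=2

# Number of bytes occupied by the last $lines lines (here "d\ne\n" = 4 bytes)
bytes=$(tail -n "$lines" "$tmp" | wc -c)

# A size with a leading minus sign shrinks the file by that many bytes
truncate -s "-$bytes" "$tmp"

remaining=$(cat "$tmp")
printf '%s\n' "$remaining"
rm -f "$tmp"
```

No copy of the file is written; only its length changes.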
This might work for you (GNU sed):
sed ':a;$!N;1,4ba;P;$d;D' file
Most of the above answers seem to require GNU commands/extensions:
$ head -n -2 myfile.txt
-2: Badly formed number
For a slightly more portable solution:
perl -ne 'push(@fifo,$_);print shift(@fifo) if @fifo > 10;'
OR
perl -ne 'push(@buf,$_);END{print @buf[0 ... $#buf-10]}'
OR
awk '{buf[NR-1]=$0;}END{ for ( i=0; i < (NR-10); i++){ print buf[i];} }'
Where "10" is "n".
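The awk version above keeps the whole file in memory. A variant of the same idea that only remembers the last n lines could look like the sketch below; drop_last_awk is a hypothetical helper name and the sample file is made up.

```shell
# drop_last_awk FILE N: print FILE without its last N lines,
# keeping only N lines in memory at a time (hypothetical helper)
drop_last_awk() {
    awk -v n="$2" '
        n == 0 { print; next }           # nothing to drop
        NR > n { print buf[NR % n] }     # the line read n lines ago is safe to print
               { buf[NR % n] = $0 }      # remember the current line in its slot
    ' "$1"
}

# Hypothetical sample file
tmp=$(mktemp)
printf '%s\n' one two three four five > "$tmp"
result=$(drop_last_awk "$tmp" 2)
printf '%s\n' "$result"
rm -f "$tmp"
```

A line is only printed once n more lines have been read after it, so the final n lines never leave the buffer.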
With the answers here you will already have learnt that sed is not the best tool for this job.
However, I do think there is a way to do this using sed; the idea is to append N lines to the hold space until you are able to read without hitting EOF. When EOF is hit, print the contents of the hold space and quit.
sed -e '$!{N;N;N;N;N;N;H;}' -e x
The sed command above will omit last 5 lines.
It can be done in 3 steps:
a) Count the number of lines in the file you want to edit:
n=`cat myfile |wc -l`
b) Subtract the number of lines to delete (3 here) from that number and add 1 to get the first line to delete:
x=$((n-3+1))
c) Tell sed to delete from that line number ($x) to the end:
sed "$x,\$d" myfile
You can get the total count of lines with wc -l <file> and use
head -n <total lines - lines to remove> <file>
Try the following command, where n is the number of lines to remove (tail -r is available on BSD and macOS):
tail -r file_name | sed "1,${n}d" | tail -r
This will remove the last 3 lines from file:
for i in $(seq 1 3); do sed -i '$d' file; done;
I prefer this solution;
head -$(gcalctool -s $(cat file | wc -l)-N) file
where N is the number of lines to remove.
sed -n ':pre
1,4 {N;b pre
}
:cycle
$!{P;N;D;b cycle
}' YourFile
posix version
To delete last 4 lines:
$ nl -b a file | sort -k1,1nr | sed '1, 4 d' | sort -k1,1n | sed 's/^ *[0-9]*\t//'
I came up with this, where n is the number of lines you want to delete:
count=`wc -l < file`
lines=`expr "$count" - "$n"`
head -n "$lines" file > temp.txt
mv temp.txt file
rm -f temp.txt
It's a little roundabout, but I think it's easy to follow.
Count up the number of lines in the main file
Subtract the number of lines you want to remove from the count
Print out the number of lines you want to keep and store in a temp file
Replace the main file with the temp file
Remove the temp file
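The five steps can be sketched as one function. trim_file is a hypothetical name; the sketch uses mktemp for the temporary file and copies the result back with cat instead of mv, which is a design choice of this example so that the original file keeps its permissions.

```shell
# trim_file FILE N: remove the last N lines of FILE in place (hypothetical helper)
trim_file() {
    count=$(wc -l < "$1")                    # 1. count the lines
    keep=$((count - $2))                     # 2. subtract the lines to remove
    tmpfile=$(mktemp) || return 1
    if [ "$keep" -gt 0 ]; then
        head -n "$keep" "$1" > "$tmpfile"    # 3. store the lines to keep
    fi
    cat "$tmpfile" > "$1"                    # 4. write them back to the main file
    rm -f "$tmpfile"                         # 5. remove the temp file
}

# Hypothetical sample file
sample=$(mktemp)
printf '%s\n' a b c d e > "$sample"
trim_file "$sample" 2
result=$(cat "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

If N is at least the number of lines in the file, the file simply ends up empty.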
For deleting the last N lines of a file, you can use the same concept of
$ sed '2,4d' file
You can use a combo with tail command to reverse the file: if N is 5
$ tail -r file | sed '1,5d' | tail -r > newfile
And this way runs also where head -n -5 file command doesn't run (like on a mac!).
#!/bin/sh
echo 'Enter the file name : '
read filename
echo 'Enter the number of lines from the end that needs to be deleted :'
read n
#Subtracting from the line number to get the nth line
m=`expr $n - 1`
# Calculate length of the file
len=`cat $filename|wc -l`
#Calculate the lines that must remain
lennew=`expr $len - $m`
sed "$lennew,$ d" $filename
A solution similar to https://stackoverflow.com/a/24298204/1221137 but with editing in place and not hardcoded number of lines:
n=4
seq $n | xargs -i sed -i -e '$d' my_file
In docker, this worked for me:
head --lines=-N file_path > file_path.tmp && mv file_path.tmp file_path
Say you have several lines:
$ cat <<EOF > 20lines.txt
> 1
> 2
> 3
[snip]
> 18
> 19
> 20
> EOF
Then you can grab:
# leave last 15 out
$ head -n5 20lines.txt
1
2
3
4
5
# skip first 14
$ tail -n +15 20lines.txt
15
16
17
18
19
20
POSIX compliant solution using ex / vi, in the vein of @Michel's solution above.
@Michel's ed example uses "not-POSIX" Here-Strings.
Increment the $-1 to remove n lines to the EOF ($), or just feed the lines you want to (d)elete. You could use ex to count line numbers or do any other Unix stuff.
Given the file:
cat > sometextfile <<EOF
one
two
three
four
five
EOF
Executing:
ex -s sometextfile <<'EOF'
$-1,$d
%p
wq!
EOF
Returns:
one
two
three
This uses POSIX Here-Docs so it is really easy to modify - especially using set -o vi with a POSIX /bin/sh.
While on the subject, the "ex personality" of "vim" should be fine, but YMMV.
This will remove the last 10 lines
sed -n -e :a -e '1,10!{P;N;D;};N;ba'

How do you use grep to find a pattern in a file, EDIT IT with awk (or some other thing), and then save it?

I need to edit specific lines in a text file. I have a pattern here, pattern.txt:
1
3
6
17
etc...
and a file with text, file.txt:
1 text
2 text
3 text
4 text
5 text
etc...
I want to add the words _PUT FLAG HERE to the end of each line of file.txt on lines that have match indicated by the pattern.txt.
I have
grep -F -f pattern.txt file.txt | awk '{print $0 "_PUT FLAG HERE" }'
But I can't seem to figure out a way to shove those changes back into the original file so it looks like this:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 teeskjtkljeltsj _PUT FLAG HERE
etc...
It's a lot like trying to use tr, but much more convoluted. There should be a logical way to string AWK and grep together; I just can't seem to conceive of a way to put the pieces into one pipe that would do this, and I can't find the answer anywhere. (If you explain a sed way to do this, please explain the regex.)
Assume your awk has been taken hostage.
A GNU sed/grep solution! To generate a sed script that does what you want, we get the lines to change from the input file:
$ grep -wFf pattern.txt file.txt
1 text
3 text
6 text
17 text
This matches complete words (-w) so 1 text is matched, but 11 text is not; -F is for fixed strings (no regex, should be faster) and -f pattern.txt reads the patterns to look for from a file.
Now we pipe this to sed to generate a script:
$ grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#'
/^1 text$/s/$/_PUT FLAG HERE/
/^3 text$/s/$/_PUT FLAG HERE/
/^6 text$/s/$/_PUT FLAG HERE/
/^17 text$/s/$/_PUT FLAG HERE/
The sed command in the pipe matches the complete line (.*) and assembles an address plus substitution command (& stands for the whole previously matched line).
Now we take all that and use it as input for sed by means of process substitution (requires Bash):
$ sed -f <(grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#') file.txt
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 text_PUT FLAG HERE
7 text
8 text
9 text
10 text
11 text
12 text
13 text
14 text
15 text
16 text
17 text_PUT FLAG HERE
Done!
Yes, yes, awk is shorter[1], faster and more beautiful.
[1] Actually not, but still.
Another remark: the grep step is not actually required, see answers by potong and Walter A.
Try this:
pattern.txt:
1
3
6
17
file.txt:
1 text
2 text
3 text
4 text
5 text
Use awk:
$ awk 'NR == FNR{seen[$1];next} $1 in seen{printf("%s_PUT FLAG HERE\n",$0);next}1' pattern.txt file.txt
Output:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
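The question also asks how to get the changes back into the original file. One possible sketch, assuming a temporary file next to the original is acceptable, is shown below; the paths under $tmp are invented for the example.

```shell
# Hypothetical copies of pattern.txt and file.txt in a throwaway directory
tmp=$(mktemp -d)
printf '%s\n' 1 3 6 17 > "$tmp/pattern.txt"
printf '%s\n' '1 text' '2 text' '3 text' '4 text' '5 text' > "$tmp/file.txt"

# Write the flagged version to a new file, then replace the original
# only if awk succeeded
awk 'NR == FNR { seen[$1]; next }
     { print $0 (($1 in seen) ? "_PUT FLAG HERE" : "") }' \
    "$tmp/pattern.txt" "$tmp/file.txt" > "$tmp/file.new" &&
    mv "$tmp/file.new" "$tmp/file.txt"

result=$(cat "$tmp/file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that the NR == FNR idiom assumes pattern.txt is not empty.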
awk to the rescue!
you don't need other tools with the full power of awk at your disposal
$ awk -v tag='_PUT FLAG HERE' 'NR==FNR{a[$1];next}
{print $0 ($1 in a?tag:"")}' pattern file
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
just as an exercise, do the same with join/sort
$ sort <(join pattern file --nocheck-order |
sed 's/$/_PUT_FLAG_HERE/') <(join -v2 pattern file --nocheck-order)
1 text_PUT_FLAG_HERE
2 text
3 text_PUT_FLAG_HERE
4 text
5 text
perhaps defining function for DRY
$ f() { join $1 pattern file --nocheck-order; }; sort <(f "" |
sed 's/$/_PUT_FLAG_HERE/') <(f -v2)
The solution of @Benjamin can be simplified to
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
Explanation
# Read sed commands from a file
sed -f sedcommands.txt file.txt
# Read sed commands from another command
sed -f <(other_command) file.txt
# Append string to every line by replacing end-of-line character $
sed 's/$/_PUT FLAG HERE/'
# Only append string on lines matching something
sed '/something/s/$/_PUT FLAG HERE/'
# Only append string on lines matching something at the beginning of the line followed by a space
sed '/^something /s/$/_PUT FLAG HERE/'
# Get the word something in above command selecting the whole line with .* and putting it in the new sed command with &.
# The slashes are used for the inner sed command, so use # here
sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt
# Now all together:
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
This might work for you (GNU sed):
sed 's#.*#/&/s/$/_PUT FLAG HERE/#' pattern.txt | sed -f - file
This turns the pattern file into a sed script which is then invoked against the text file.
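The generated address matters here: /&/ matches the pattern anywhere in a line, while /^& / only matches it as the first word. A small sketch with made-up data (a single pattern 1 and two lines) shows the difference; the script files are written to a temporary directory so that no GNU-only -f - is needed.

```shell
# Hypothetical sample data in a throwaway directory
tmp=$(mktemp -d)
printf '%s\n' 1 > "$tmp/pattern.txt"
printf '%s\n' '1 text' '17 text' > "$tmp/file.txt"

# Anchored address: the pattern must start the line and be followed by a space
sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' "$tmp/pattern.txt" > "$tmp/anchored.sed"
anchored=$(sed -f "$tmp/anchored.sed" "$tmp/file.txt")

# Unanchored address: the pattern may occur anywhere in the line
sed 's#.*#/&/s/$/_PUT FLAG HERE/#' "$tmp/pattern.txt" > "$tmp/loose.sed"
loose=$(sed -f "$tmp/loose.sed" "$tmp/file.txt")

printf 'anchored:\n%s\nloose:\n%s\n' "$anchored" "$loose"
rm -rf "$tmp"
```

With the unanchored form, "17 text" is flagged too because it contains a 1.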
This solution uses only Bash (4.0+) features:
# Set up associative array 'patterns' whose keys are patterns
declare -A patterns
for pat in $(< pattern.txt) ; do patterns[$pat]=1 ; done
# Slurp all the lines of 'file.txt' into the 'lines' array
readarray -t lines < file.txt
# Write each old line in the file, possibly with a suffix, back to the file
for line in "${lines[@]}" ; do
read -r label text <<< "$line"
printf '%s%s\n' "$line" "${patterns[$label]+_PUT FLAG HERE}"
done > file.txt
NOTES:
The changes are written back to 'file.txt', as the question seems to specify.
Bash 4.0 or later is required for associative arrays and readarray.
Bash is very slow, so this solution may not be practical if either of the files is large (more than 10 thousand lines).

Finding lines without letters or numbers, only with commas, BASH

Good day to all,
I was wondering how to find the line number of a line containing only commas. The only catch is that I don't know how many commas each line has:
Input:
...
Total,Total,,,
,,,,
,,,,
Alemania,,1.00,,
...
Thanks in advance for any clue
You can do this with a single command:
egrep -n '^[,]+$' file
Line numbers will be prefixed.
Result with your provided four test lines:
2:,,,,
3:,,,,
Now, if you only want the line numbers, you can cut them easily:
egrep -n '^[,]+$' file | cut -d: -f1
sed
sed -n '/^,\+$/=' file
awk
awk '/^,+$/&&$0=NR' file
With GNU sed:
sed -nr '/^,+$/=' file
Output:
2
3
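For reference, a self-contained sketch that runs two of these approaches on the four sample lines from the question. It uses grep -E as the current spelling of egrep, and writes the awk version with an explicit print NR, which is an equivalent but more verbose form of the one-liner above.

```shell
# The four sample lines from the question
sample=$(printf '%s\n' 'Total,Total,,,' ',,,,' ',,,,' 'Alemania,,1.00,,')

# grep prints "number:line" for lines made only of commas; cut keeps the number
grep_numbers=$(printf '%s\n' "$sample" | grep -n -E '^,+$' | cut -d: -f1)

# awk prints the record number of every matching line
awk_numbers=$(printf '%s\n' "$sample" | awk '/^,+$/ { print NR }')

printf '%s\n' "$grep_numbers"
```

Both variables end up holding the line numbers 2 and 3.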

sed: Argument list too long

I have created a script in Unix environment. In the script, I used the sed command as shown below to delete some lines from the file. I want to delete a specified set of lines, not necessarily a simple range, from the file, specified by line numbers.
sed -i "101d; 102d; ... 4930d;" <file_name>
When I execute this it shows the following error:
sed: Arg is too long
Can you please help to resolve this problem?
If you want to delete a contiguous range of lines, you can specify a range of line numbers:
sed -i '101,4930d' file
If you want to delete some arbitrary set of lines that can't easily be expressed as a range, you can put the commands in a file rather than on the command line, and use sed -f.
For example, if foo.sed contains:
2d
4d
6d
8d
10d
then this:
sed -i -f foo.sed file
will delete lines 2, 4, 6, 8, and 10 from file. Putting the commands in a file rather than on the command line avoids limits on command line length.
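When the list of line numbers is long, the script file itself can be generated. The sketch below is only an illustration: the 5000-line input and the choice of "every odd line from 101 to 4929" are made-up stand-ins for the real list of lines.

```shell
tmp=$(mktemp -d)

# Hypothetical input: 5000 lines, each containing its own line number
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' > "$tmp/data.txt"

# Generate one "Nd" command per line to delete
awk 'BEGIN { for (i = 101; i <= 4930; i += 2) print i "d" }' > "$tmp/delete.sed"
commands=$(awk 'END { print NR }' "$tmp/delete.sed")

# The commands are read from the file, so the command line stays short
sed -f "$tmp/delete.sed" "$tmp/data.txt" > "$tmp/out.txt"
remaining=$(awk 'END { print NR }' "$tmp/out.txt")

printf 'commands: %s, remaining lines: %s\n' "$commands" "$remaining"
rm -rf "$tmp"
```

The script holds 2415 delete commands, none of which ever appears as a command-line argument.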
If there's some pattern to the lines you want to delete, you might consider using a more sophisticated tool such as Awk or Perl.
I had this exact same problem.
I originally put the giant sed command sed -i "101d; 102d; ... 4930d;" <file_name> in a file and tried to execute as a bash script.
To fix - put only the deletion commands in a file and run that file as a sed script. I was able to execute 18,193 deletion commands that had failed to run before.
sed -i -f to_delete.sed input_file
to_delete.sed:
101d;102d;...4930d
With awk:
awk ' NR < 101 || NR > 4930 { print } ' input_file
This might work for you (GNU sed and awk):
cat <<\! >/tmp/a
> 2
> 4
> 6
> 8
> !
seq 10 >/tmp/b
sed 's/$/d/' /tmp/a | sed -f - /tmp/b
1
3
5
7
9
10
awk 'NR==FNR{a[$0];next};FNR in a{next};1' /tmp/{a,b}
1
3
5
7
9
10
