Deleting lines matching a string in a file - bash

I have a file with multiple lines. Some lines start with the pattern below:
0 8234 <Enter_newLine>
0 12 <Enter_newLine>
1 2 <Enter_newLine>
I want to delete the lines which start with 0, as shown above. Can someone please help me with this?

This is very simple to do in awk:
awk '!/^0/' file
Any line starting with a 0 will not be printed.
To overwrite the input file, you can use the standard trick:
awk '!/^0/' file > tmp && mv tmp file
You could also use grep:
grep -v '^0' file
The -v switch means that only lines that don't match the pattern are printed.
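For a quick sanity check, here is a minimal run against sample data like that in the question (the file name is only an example):
$ printf '%s\n' '0 8234' '0 12' '1 2' > file
$ grep -v '^0' file
1 2
As with the awk version, redirect to a temporary file and move it back over the original if the file itself needs to be updated.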

If you want to edit the file, you can use ed, the standard editor:
ed -s file < <(printf '%s\n' g/^0/d w q)
This uses the g/re/d construct: g applies the command globally (to every matching line in the file), /re/ is the regex to work with, here ^0 to match lines starting with 0, and d deletes those lines. We then send the commands w (write) and q (quit).
The same without bashisms:
printf '%s\n' g/^0/d w q | ed -s file

You can also try sed:
sed -i '/^0[[:blank:]]\+/d' file.txt
This assumes the initial 0 is followed by one or more spaces or tabs and nothing else.
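Note that -i without a suffix and \+ are GNU sed features; on BSD/macOS sed the in-place flag needs a (possibly empty) backup-suffix argument and \+ should be written \{1,\} in a basic regular expression, so a rough equivalent would be:
sed -i '' '/^0[[:blank:]]\{1,\}/d' file.txt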

This awk should do:
awk '$1!="0"' file
1 2 <Enter_newLine>
This removes lines where the first field is exactly 0.

Related

Grep a line from a file and replace a substring and append the line to the original file in bash?

This is what I want to do.
For example, my file contains many lines, say:
ABC,2,4
DEF,5,6
GHI,8,9
I want to copy the second line, replace the substring EF (all occurrences) with XY, and append the result to the file so that it looks like this:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
How can I achieve this in bash?
EDIT: I want to do this in general and not necessarily for the second line. I want to grep for EF and do the substitution in whatever line is returned.
Here's a simple Awk script.
awk -F, -v pat="EF" -v rep="XY" 'BEGIN { OFS=FS }
$1 ~ pat { x = $1; sub(pat, rep, x); y = $0; sub($1, x, y); a[++n] = y }
1
END { for(i=1; i<=n; i++) print a[i] }' file
The -F , says to use comma as the input field separator (internal variable FS) and in the BEGIN block, we also set that as the output field separator (OFS).
If the first field matches the pattern, we copy the first field into x, substitute pat with rep in that copy, then replace the original first field in a copy y of the whole line $0 with the new value, and append y to the array a.
1 is a shorthand to say "print the current input line".
Finally, in the END block, we output the values we have collected into a.
This could be somewhat simplified by hardcoding the pattern and the replacement, but I figured it's more useful to make it modular so that you can plug in whatever values you need.
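As a quick check, with the sample lines from the question saved in file, the script above should print:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
i.e. every original line followed by the collected replacement. Redirect to a temporary file and move it back over file if the file itself should be updated.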
While this all could be done in native Bash, it tends to get a bit tortured; spending the 30 minutes or so that it takes to get a basic understanding of Awk will be well worth your time. Perhaps tangentially see also while read loop extremely slow compared to cat, why? which explains part of the rationale for preferring to use an external tool like Awk over a pure Bash solution.
You can use the sed command:
sed '
/EF/H # copy all matching lines
${ # on the last line
p # print it
g # paste the copied lines
s/EF/XY/g # replace all occurrences
s/^\n// # get rid of the extra newline
}'
As a one-liner:
sed '/EF/H;${p;g;s/EF/XY/g;s/^\n//}' file.csv
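With the three sample lines from the question, this should produce:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
Add -i (GNU sed) or redirect to a temporary file if file.csv itself is to be modified.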
If ed is available/acceptable, something like:
#!/bin/sh
ed -s file.txt <<-'EOF'
$kx
g/^.*EF.*,.*/t'x
'x+;$s/EF/XY/
,p
Q
EOF
Or in one-line.
printf '%s\n' '$kx' "g/^.*EF.*,.*/t'x" "'x+;\$s/EF/XY/" ,p Q | ed -s file.txt
Change Q to w if in-place editing is needed.
Remove the ,p to silence the output.
Using BASH:
#!/bin/bash
src="${1:-f.dat}"
rep="${2:-XY}"
declare -a new_lines
while read -r line ; do
if [[ "$line" == *EF* ]] ; then
new_lines+=("${line/EF/${rep}}")
fi
done <"$src"
printf "%s\n" "${new_lines[#]}" >> "$src"
Contents of f.dat before:
ABC,2,4
DEF,5,6
GHI,8,9
Contents of f.dat after:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
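One note on the Bash version above: "${line/EF/${rep}}" replaces only the first EF on each line; since the edit to the question asks for all occurrences, a small (untested) tweak would be the double-slash form:
new_lines+=("${line//EF/${rep}}")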
Following on from the great answer by @tripleee, you can create a variation that uses a single call to sub() by outputting all records before the substitution is made and then adding the updated record to the array to be output by the END rule, e.g.
awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
Example Use/Output
Here is an expanded input, based on your reply to my comment below the question, confirming that all occurrences of EF should be replaced with XY in all records, e.g.
$ cat file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
Use and output:
$ awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
DXY,5,6
XYZ,3,7
Let me know if you have questions.

Replace every 4th occurence of char "_" with "#" in multiple files

I am trying to replace every 4th occurrence of "_" with "#" in multiple files with bash.
E.g.
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo..
would become
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo...
#perl -pe 's{_}{++$n % 4 ? $& : "#"}ge' *.txt
I have tried perl, but the problem is that it replaces every 4th _ carrying the count on from the last file. So, for example, in some files the first _ gets replaced because the count does not restart at 0 for each new file; it carries over from the previous file.
I have tried:
#awk '{for(i=1; i<=NF; i++) if($i=="_") if(++count%4==0) $i="#"}1' *.txt
but this also does not work.
Using sed, I cannot find a way to keep replacing every 4th occurrence, as there is a different number of _ in each file. Some files have 20 _, some have 200 _. Therefore, I can't specify a range.
I am really lost as to what to do; can anybody help?
You just need to reset the counter in the perl one using eof to tell when it's done reading each file:
perl -pe 's{_}{++$n % 4 ? "_" : "#"}ge; $n = 0 if eof' *.txt
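A quick way to see the per-file reset (the file names are only examples):
$ printf 'a_b_c_d_e_f\n' > one.txt
$ cp one.txt two.txt
$ perl -pe 's{_}{++$n % 4 ? "_" : "#"}ge; $n = 0 if eof' one.txt two.txt
a_b_c_d#e_f
a_b_c_d#e_f
Without the $n = 0 if eof, the replacements in the second file would be offset because the count carries over from the first.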
This MAY be what you want, using GNU awk for RT:
$ awk -v RS='_' '{ORS=(FNR%4 ? RT : "#")} 1' file
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo..
It only reads one _-separated string into memory at a time, so it should work no matter how large your input file is, assuming there are _s in it.
It assumes you want to replace every 4th _ across the whole file as opposed to within individual lines.
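Since the goal is to modify multiple files, this could presumably be combined with GNU awk's in-place extension (gawk 4.1+), along the lines of:
awk -i inplace -v RS='_' '{ORS=(FNR%4 ? RT : "#")} 1' *.txt
(a sketch only; keep backups before rewriting files in place).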
A simple sed would handle this:
s='foo_foo_foo_foo_foo_foo_foo_foo_foo_foo'
sed -E 's/(([^_]+_){3}[^_]+)_/\1#/g' <<< "$s"
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
Explanation:
(: Start capture group #1
([^_]+_){3}: Match 1+ non-_ characters followed by a _. Repeat this group 3 times to match 3 such words separated by _
[^_]+: Match 1+ of non-_ characters
): End capture group #1
_: Match a _
Replacement is \1# to replace 4th _ with a #
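To apply the same substitution to the files themselves rather than a here-string (assuming GNU sed for -i, and noting that the count restarts on every line, which matters if a file's underscores span several lines):
sed -i -E 's/(([^_]+_){3}[^_]+)_/\1#/g' *.txt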
With GNU sed:
sed -nsE ':a;${s/(([^_]*_){3}[^_]*)_/\1#/g;p};N;ba' *.txt
-n suppresses the automatic printing, -s processes each file separately, -E uses extended regular expressions.
The script is a loop between label a (:a) and the branch-to-label-a command (ba). Each iteration appends the next line of input to the pattern space (N). This way, after the last line has been read, the pattern space contains the whole file(*). During the last iteration, when the last line has been read ($), a substitute command (s) replaces every 4th _ in the pattern space by a # (s/(([^_]*_){3}[^_]*)_/\1#/g) and prints (p) the result.
When you will be satisfied with the result you can change the options:
sed -i -nE ':a;${s/(([^_]*_){3}[^_]*)_/\1#/g;p};N;ba' *.txt
to modify the files in-place, or:
sed -i.bkp -nE ':a;${s/(([^_]*_){3}[^_]*)_/\1#/g;p};N;ba' *.txt
to modify the files in-place, but keep a *.txt.bkp backup of each file.
(*) Note that if you have very large files this could cause memory overflows.
With your shown samples, please try the following awk program. It uses an awk variable named fieldNum, set to 4 since the OP needs a # after every 4th _; you can change it as per your need.
awk -v fieldNum="4" '
BEGIN{ FS=OFS="_" }
{
val=""
for(i=1;i<=NF;i++){
val=(val?val:"") $i (i%fieldNum==0?"#":(i<NF?OFS:""))
}
print val
}
' Input_file
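A quick check with Input_file containing the single sample line should print:
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo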
With GNU awk
$ cat ip.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
123_45678_90
_
$ awk -v RS='(_[^_]+){3}_' -v ORS= '{sub(/_$/, "#", RT); print $0 RT}' ip.txt
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
123_45678_90
#
-v RS='(_[^_]+){3}_' set input record separator to cover sequence of four _ (text matched by this separator will be available via RT)
-v ORS= empty output record separator
sub(/_$/, "#", RT) change last _ to #
Use -i inplace for inplace editing.
If the count should reset for each line:
perl -pe's/(?:_[^_]*){3}\K_/\#/g'
$ cat a.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
$ perl -pe's/(?:_[^_]*){3}\K_/\#/g' a.txt a.txt
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
If the count shouldn't reset for each line, but should reset for each file:
perl -0777pe's/(?:_[^_]*){3}\K_/\#/g'
The -0777 causes the whole file to be treated as one line, so the count works properly across lines.
But since a fresh match is started for each file, the count is effectively reset between files.
$ cat a.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
$ perl -0777pe's/(?:_[^_]*){3}\K_/\#/g' a.txt a.txt
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
foo_foo_foo#foo_foo_foo_foo#foo_foo_foo
foo_foo_foo_foo#foo_foo_foo_foo#foo_foo
foo_foo_foo#foo_foo_foo_foo#foo_foo_foo
To avoid reading the entire file at once, you could keep the original counter-based approach, but with the following added:
$n = 0 if eof;
Note that eof is not the same thing as eof()! See eof.

Delete values in line based on column index using shell script

I want to be able to delete the values to the RIGHT of a given column index in test.txt, starting at that column, for a given length N.
Column index refers to the character position you see when you open the file in the VIM editor on Linux.
If my test.txt contains 1234 5678 and I call my delete_var function with column number 2 (where to start deleting) and length N of 2 (how much to delete), test.txt should then contain 14 5678, since the characters in columns 2 and 3 were deleted (length 2 starting at column 2).
I have the following code as of now but I am unable to understand what I would put in the sed command.
delete_var() {
sed -i -r 's/not sure what goes here' test.txt
}
clmn_index=$1
_N=$2
delete_var "$clmn_index" "$_N" # call the method with the column index and length to delete
#sample test.txt (before call to fn)
1234 5678
#sample test.txt (after call to fn)
14 5678
Can someone guide me?
You should avoid using regex for this task. It is easier to get this done in awk with simple substr function calls:
awk -v i=2 -v n=2 'i>0{$0 = substr($0, 1, i-1) substr($0, i+n)} 1' file
14 5678
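Since the question wraps this in a delete_var function and wants test.txt modified, a possible sketch (awk does not edit in place, so a temporary file is used; the function name and arguments follow the question):
delete_var() {
    awk -v i="$1" -v n="$2" 'i>0{$0 = substr($0, 1, i-1) substr($0, i+n)} 1' test.txt > test.txt.tmp &&
    mv test.txt.tmp test.txt
}
delete_var 2 2   # remove 2 characters starting at column 2: 1234 5678 -> 14 5678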
Assuming the OP must use sed (otherwise other options could include cut and awk, but those would require some extra file I/O to replace the original file with the modified results) ...
Starting with the sed command to remove the 2 characters starting in column 2:
$ echo '1234 5678' > test.txt
$ sed -i -r "s/(.{1}).{2}(.*$)/\1\2/g" test.txt
$ cat test.txt
14 5678
Where:
(.{1}) - match first character in line and store in buffer #1
.{2} - match next 2 characters but don't store in buffer
(.*$) - match rest of line and store in buffer #2
\1\2 - output contents of buffers #1 and #2
Now, how to get variables for start and length into the sed command?
Assume we have the following variables:
$ s=2 # start
$ n=2 # length
To map these variables into our sed command, we can break the sed search-replace pattern into parts, replacing the first 1 and 2 with our variables like so:
replace {1} with {$((s-1))}
replace {2} with {${n}}
Bringing this all together gives us:
$ s=2
$ n=2
$ echo '1234 5678' > test.txt
$ set -x # echo what sed sees to verify the correct mappings:
$ sed -i -r "s/(.{"$((s-1))"}).{${n}}(.*$)/\1\2/g" test.txt
+ sed -i -r 's/(.{1}).{2}(.*$)/\1\2/g' test.txt
$ set +x
$ cat test.txt
14 5678
Alternatively, do the subtraction (s-1) before the sed call and just pass in the new variable, eg:
$ x=$((s-1))
$ sed -i -r "s/(.{${x}}).{${n}}(.*$)/\1\2/g" test.txt
$ cat test.txt
14 5678
One idea using cut, keeping in mind that storing the results back into the original file will require an intermediate file (eg, tmp.txt) ...
Assume our variables:
$ s=2 # start position
$ n=2 # length of string to remove
$ x=$((s-1)) # last column to keep before the deleted characters (1 in this case)
$ y=$((s+n)) # start of first column to keep after the deleted characters (4 in this case)
At this point we can use cut -c to designate the columns to keep:
$ echo '1234 5678' > test.txt
$ set -x # display the cut command with variables expanded
$ cut -c1-${x},${y}- test.txt
+ cut -c1-1,4- test.txt
14 5678
Where:
1-${x} - keep range of characters from position 1 to position ${x} (1-1 in this case)
${y}- - keep range of characters from position ${y} to end of line (4-EOL in this case)
NOTE: You could also use cut's ability to work with the complement (ie, explicitly tell what characters to remove ... as opposed to above which says what characters to keep). See KamilCuk's answer for an example.
Obviously (?) the above does not overwrite test.txt so you'd need an extra step, eg:
$ echo '1234 5678' > test.txt
$ cut -c1-${x},${y}- test.txt > tmp.txt # store result in intermediate file
$ cat tmp.txt > test.txt # copy intermediate file over original file
$ cat test.txt
14 5678
Looks like:
cut --complement -c $1-$(($1 + $2 - 1))
This should just work: it deletes $2 columns starting at column $1.
Please provide code showing how to change test.txt.
cut can't modify in place. So either pipe to a temporary file or use sponge.
tmp=$(mktemp)
cut --complement -c $1-$(($1 + $2 - 1)) test.txt > "$tmp"
mv "$tmp" test.txt
The command below eliminates the 2nd character of each line. You could use it in a loop:
sed s/.//2 test.txt
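A rough sketch of that loop (assuming GNU sed for -i, and reusing the start column s and length n from the other answers): repeating the single-character deletion n times at the same position removes n characters starting at column s, because each deletion shifts the rest of the line left.
s=2; n=2
for ((k=0; k<n; k++)); do
    sed -i "s/.//$s" test.txt
done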

Delete a range of lines when the line number is known or unknown, in Unix, using head and tail?

This is my sample file.
I want to do this.
I have a fixed requirement to delete the 2nd and 3rd lines while keeping the 1st line.
From the bottom, I want to delete the 2 lines above the last line (excluding the last line itself); I wouldn't know the last line number, as it depends on the file.
Once I delete the 2nd and 3rd lines, the 4th line should ideally become the 2nd, and so on; the same applies at the bottom after deletion.
I want to use the head/tail commands and modify the existing file only, i.e. write the changes back to the same file.
Sample file text format.
Input File
> This is First Line
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> ..
> ..
> ..
> ..
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> This is Last Line, should not be deleted. It could come at any line number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted. It could come at any line number (variable)
Edit: Because of compatibility issues (using HP-UX with the ksh shell), I want to implement this using head/tail/awk, not sed.
Adding a solution as per the OP's request to make it a generic one.
Approach: the OP can provide line numbers counted from the start and from the end of any Input_file, and those lines will be skipped.
What the code does: it generates an awk command according to the given lines to be skipped and then runs it.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" -v lines=$(wc -l <Input_file) '
BEGIN{
num_start=split(start, a,",");
num_end=split(end, b,",");
for(i=1;i<=num_start;i++){
val=val?val OFS "FNR=="a[i]:"FNR=="a[i]};
for(j=1;j<=num_end;j++){
b[j]=b[j]>1?len-(b[j]-1):b[j];
val=val?val OFS "FNR=="b[j]:"FNR=="b[j]};
print "awk " s1 val "{next} 1" s1" Input_file"}
' | sh
Change Input_file name to your actual file name and let me know how it goes then.
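To make the mechanics concrete: with start_line="2,3", end_line="2,3" and a 10-line Input_file, the generated and executed command should come out as (reconstructed from the code above, not a captured run):
awk 'FNR==2||FNR==3||FNR==9||FNR==8{next} 1' Input_file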
The following awk may help you with the same (since I don't have an HP system, I didn't test it).
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: Adding non-one liner form of solution too now.
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
wc + sed solution:
len=$(wc -l inpfile | cut -d' ' -f1)
sed "$(echo "$((len-2)),$((len-1))")d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
> This is First Line
> ..
> ..
> ..
> ..
> This is Last Line, should not be deleted. It could come at any line number (variable)
Perl suggestion... read the whole file into array @L and get the index of the last line. Delete the 2nd-last, 3rd-last, 3rd and 2nd lines. Print what's left.
perl -e '@L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print @L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '@L=<>; splice @L,1,2; splice @L,$#L-2,2; print @L' file.txt
If you wish to have some flexibility a ksh script approach may work, though little expensive in terms of resources :
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l "$1" | cut -d' ' -f1 )
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes second and third lines.
Plus no number of lines from last excluding last as read from user.
Note: My ksh version is 93u+ 2012-08-01
awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D' file
The awk here serves to prefix each line with its line number and pass the output on to sed, which then performs the required deletions.
%d : Used to print the numbers. You can also use '%i'
'\t' : used to place a tab between the number and string
%s : to print the string of charaters
'\n' : To create a new line
NR : to print lines numbers starting from 1
For sed
N: Read/append the next line of input into the pattern space.
$! : applies the following command to every line except the last (here it keeps the last line from being dropped)
D : If the pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete the text in the pattern space up to the first newline and restart the cycle with the resulting pattern space, without reading a new line of input.
P : Print up to the first embedded newline of the current pattern space. This prints the lines after the targeted lines have been removed.
I enjoyed this task and wrote an awk script for the more scalable case (huge files).
It reads/scans the input file once (no need to know the line count) and does not store the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10

how to delete a large number of lines from a file

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the line numbers in a text file, but I don't know if I can use that as input to sed; maybe perl?
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
%del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
$f = file("data_file")->openr();
while (<$f>) {
print unless $del{$.};
}
'
perl -ne'
BEGIN{ local @ARGV = pop; @h{<>} = () }
exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line).
$ cat lines
1
34
45
678
Convert this file to sed format.
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now use this sed file and give it as input to sed command.
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from file lines.txt (just in case).
There's something quite special to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, etc. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to the numbers.
Then we issue the w (write) and q (quit) commands.
Note that this overwrites the original file!
