how to delete a large number of lines from a file - bash

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the line numbers in a text file, but I don't know if I can use it as input to sed; maybe perl?
Thanks

A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
    %del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
    $f = file("data_file")->openr();
    while (<$f>) {
        print unless $del{$.};
    }
'
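The awk option above is perhaps the easiest to follow: while NR==FNR (i.e. while reading the first file) it stores each line number as a key of the del array, and for the second file it prints only those lines whose number is not among the keys. A toy run, with hypothetical file names and contents:
$ printf '%s\n' 2 4 > lines_file
$ printf '%s\n' a b c d e > data_file
$ awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
a
c
e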

perl -ne'
    BEGIN{ local @ARGV = pop; @h{<>} = () }
    exists $h{"$.\n"} or print;
' myfile.txt lines

You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line).
$ cat lines
1
34
45
678
Convert this file to sed script format.
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now pass this script file to the sed command with the -f option.
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
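As a quick sanity check (file names as in the example above), the edited file should be shorter than the -i.bak backup by exactly the number of entries in lines.sed:
$ wc -l < file_with_70k_lines.bak   # original line count
$ wc -l < lines.sed                 # number of deleted lines
$ wc -l < file_with_70k_lines       # should be the difference of the two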

If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile

You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
The first sed selects only the lines of lines.txt that consist entirely of digits (just in case).
There's something special to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines starting from the end: first 678, then 45, and so on; that's why we use sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to each number.
Then we issue the w (write) and q (quit) commands.
Note that this overwrites the original file!
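To see why the order matters, here is a small sketch on a hypothetical five-line file, using Q so that neither run writes anything back:
$ printf '%s\n' a b c d e > demo.txt
$ printf '%s\n' 2d 4d ,p Q | ed -s demo.txt    # ascending: after 2d, old line 4 has become line 3
a
c
d
$ printf '%s\n' 4d 2d ,p Q | ed -s demo.txt    # descending: the intended lines (b and d) are removed
a
c
e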

Related

Multiplying all values in a txt file by another value

My aim is to multiply all values in a text file by a number, in my case 1000.
Original text in file:
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
I want the output to look like:
(so, changing the contents of the file to...)
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
Or, preferably:
4.9
43.8
149.7
443.1
882.0
975.7
995.7
1000
I am using bash on macOS in the terminal.
If you have dc:
cat infile | dc -f - -e '1k1000sa[la*Sdz0!=Z]sZzsclZx[Ld1/psblcd1-sc1<Y]sYlYx'
Using Perl
perl -lpe ' $_=$_*1000 '
With the input file and in-place replacement:
$ cat andy.txt
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
$ perl -i -lpe ' $_=$_*1000 ' andy.txt
$ cat andy.txt
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
$
One decimal place
perl -lpe ' $_=sprintf("%0.1f",$_*1000 ) '
Zero decimal places, rounding off
perl -lpe ' $_=sprintf("%0.0f",$_*1000 ) '
Zero decimal places, truncating
perl -lpe ' $_=sprintf("%0.0f",int($_*1000) ) '
awk to the rescue!
$ awk '{printf "%.1f\n", $1*1000}' file > tmp && mv tmp file
Using num-utils. For answers to 8 decimal places:
numprocess '/*1000/' n.txt
For rounded answers to 1 decimal place:
numprocess '/*1000/' n.txt | numround -n '.1'
Use sed to prefix each line with 1000*, then process the resulting mathematical expressions with bc. To show only the first digit after the decimal point you can use sed again.
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/'
This will print the latter of your expected outputs. Just as you wanted, decimals are cut rather than rounded (1.36 is converted to 1.3).
To remove all decimal digits either replace the last … | sed … with sed -E 's/\..*//' or use the following command
sed 's:^.*$:1000*&/1:' yourFile | bc
With these commands overwriting the file directly is not possible. You have to write to a temporary file (append > tmp && mv tmp yourFile) or use the sponge command from the package moreutils (append | sponge yourFile).
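For example, with the first pipeline above, the two variants look like this (sponge only if moreutils is installed):
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/' > tmp && mv tmp yourFile
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/' | sponge yourFile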
However, if you want to remove all decimal digits after the multiplication, there is a trick: instead of actually multiplying by 1000, we can syntactically shift the decimal point. This can be done in one single sed command, and sed has the -i option to overwrite input files.
sed -i.bak -E 's/\..*/&000/;s/^[^.]*$/&.000/;s/\.(...).*/\1/;s/^(-?)0*(.)/\1\2/' yourFile
The command changes yourFile's content to
4
43
149
443
882
975
995
1000
A backup yourFile.bak of the original is created.
The single sed command should work with every input number format too (even for things like -.1 → -100).
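A quick way to check those edge cases without touching any file is to feed a few hypothetical values through the same expression on standard input (dropping -i.bak):
$ printf '%s\n' -.1 0.00493293814 1 | sed -E 's/\..*/&000/;s/^[^.]*$/&.000/;s/\.(...).*/\1/;s/^(-?)0*(.)/\1\2/'
-100
4
1000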

Add the first line of a file to the beginning of each line with shell

I have a lot of files whose first line is an identifier. The subsequent lines are products belonging to that identifier. Here is an example of the file:
0G000001:
Product_2221
Product_2222
Product_2122
...
I want to put the identifier at the beginning of every line of the file. The final output would be like this:
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
....
I want to make a loop for all the files that I have. I've been trying with:
for i in $(echo `head -n1 file.$i.txt`);
do
cat - file.$i.txt > file_id.$i.txt;
done
But I only duplicate the first line of the file. I know that sed can add specific text at the beginning of a file, but I can't figure out how to specify that the text should be the file's own first line, and how to do this in a loop.
No explicit loop necessary:
awk '
FNR==1 { close(out); out=FILENAME; sub(/\./,"_id&",out); hdr=$0; next }
{ print hdr, $0 > out }
' file.*.txt
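The sub() call is what derives each output name from the input name: it inserts _id before the first dot, so a hypothetical file.001.txt is written to file_id.001.txt. The renaming logic can be checked on its own:
$ awk 'BEGIN{ out="file.001.txt"; sub(/\./,"_id&",out); print out }'
file_id.001.txt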
With awk:
awk 'NR==1 { prod = $0 } NR>1 { print prod, $0 }' infile
Output:
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
A sed command to do what you want could look like this:
$ sed '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
This does the following:
1 { # On the first line
h # Copy the pattern space to the hold space
d # Delete the line, move to next line
}
G # Append the hold space to the pattern space
s/\(.*\)\n\(.*\)/\2 \1/ # Swap the lines in the pattern space
Some seds might complain about {h;d} and require an extra semicolon, {h;d;}.
To do this in-place for a file, you can use
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
for GNU sed, or
sed -i '' '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
for macOS sed. Or, if your sed doesn't support -i at all:
sed '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile > tmpfile && mv tmpfile infile
To do it in a loop over all files in a directory:
for f in /path/to/dir/*; do
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' "$f"
done
or even directly with a glob:
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' /path/to/dir/*
The latter works for sure with GNU sed; not sure about other seds.
sed + head solution:
for f in *.txt; do sed -i '1d; s/^/'"$(head -n1 "$f")"' /' "$f"; done
-i - modify the file in-place
1d; - delete the 1st line
$(head -n1 "$f") - extract the 1st line from the file (the identifier)
s/^/<identifier> / - prepend the identifier to each remaining line
This might work for you (GNU sed):
sed -ri '1h;1d;G;s/(.*)\n(.*)/\2 \1/' file ...
Save the first line in the hold space (HS) and then delete it from the pattern space (PS). For every line (other than the first), append the HS to the PS and then swap the lines and replace the newline with a space.

SED: copy lines from a file to a specific line in another file

I can do this using the following example. The 1st command will output lines 16-80 of file1 to patch, while the 2nd will insert the contents of patch after line 18 of file2:
sed -n 16,80p file1>patch
sed -i 18rpatch file2
However, I would like to copy directly from one file to another without using a temporary file in-between, in one command using sed (not awk, etc.). I'm pretty sure this is possible, just don't know how.
Doing this with sed requires some additional shell trickery. Assuming bash, you could use
sed -i 18r<(sed '16,80!d' file1) file2
Where <(sed '16,80!d' file1) is substituted with the name of a pipe from which the output of sed '16,80!d' file1 can be read.
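If you are curious what sed actually receives there, you can echo the process substitution on its own; bash replaces it with the path of a file descriptor (the exact number varies):
$ echo <(sed '16,80!d' file1)
/dev/fd/63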
Generally, I feel that it is nicer to do this with awk (if a little longer), because awk is better equipped to handle multiple input files. For example:
awk 'NR == FNR { if(FNR >= 16 && FNR <= 80) { patch = patch $0 ORS }; next } FNR == 18 { $0 = patch $0 } 1' file1 file2
This works as follows:
NR == FNR { # While processing the first file
if(FNR >= 16 && FNR <= 80) { # remember the patch lines
patch = patch $0 ORS
}
next # and do nothing else
}
FNR == 18 { # after that, while processing the second file:
$0 = patch $0 # prepend the patch to line 18
}
1 # and print regardless of whether the current
# line was patched.
However, this approach does not lend itself to in-place editing of files. This is not usually a problem; I'd simply use
cp file2 file2~
awk ... file1 file2~ > file2
with the added advantage of having a backup in case things go pear-shaped, but in the end it's up to you.
I have done something similar using:
head -80 file | tail -16 > patch
Check the documentation for your local versions of head and tail, and change the two integers to suit your requirements.
sed -i '1,15 d
34 r patch
81,$ d' YourFile
# oneliner version
sed -i -e '1,15 d' -e '34 r patch' -e '81,$ d' YourFile
The order of the lines is not important.
You can adapt it a bit, or parameterize it with variables like this:
sed -i "1,16 d
$(( 16 + 18 )) r patch
81,$ d" YourFile
but add some sanity checks on the line counts in that case.
If the file read by r contains more than one line, the following lines are still counted from their original positions, and the final file ends up bigger than 80 - 16 lines.
I haven't tested exactly which lines are taken, excluded or modified (e.g. whether 34 really is the 18th line of the cropped file), but the principle is the same.
Explanation of the line index references used in this sample:
1,15 are the heading lines to remove, so the file keeps lines from 16 onward in this case
34 is the line where the content is inserted; it is the 18th line AFTER the first kept line (line 16 in our case), so 16 + 18 = 34
81,$ are the trailing lines to remove; $ means the last line, and 81 is the first unwanted trailing line (after line 80, which is kept).
I had this problem and did it in 2 steps (1: tail, 2: head). For example, for a text file with 20 lines (test.txt), to copy lines 13 to 17 to another file (final.txt):
tail -8 test.txt > temp.txt
head -5 temp.txt > final.txt
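The same extraction can be written without knowing the total length of the file, using tail's -n +N form (start at line N); for the 13-17 example that is:
tail -n +13 test.txt | head -n 5 > final.txt    # start at line 13, keep 17-13+1 = 5 lines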

Deleting lines matching a string in a file

I have multiple lines in a file. Some lines start with the pattern below:
0 8234 <Enter_newLine>
0 12 <Enter_newLine>
1 2 <Enter_newLine>
I want to delete the lines which start with 0, as shown above. Can someone please help me with this?
This is very simple to do in awk:
awk '!/^0/' file
Any line starting with a 0 will not be printed.
To overwrite the input file, you can use the standard trick:
awk '!/^0/' file > tmp && mv tmp file
You could also use grep:
grep -v '^0' file
The -v switch means that only lines that don't match the pattern are printed.
If you want to edit the file, you can use ed, the standard editor:
ed -s file < <(printf '%s\n' g/^0/d w q)
This uses the g/re/d construct: g to use the whole file, /re/ is the regex to work with, here ^0 to match lines starting with 0 and d to delete those lines. We then send the commands w (write) and q (quit).
The same without bashisms:
printf '%s\n' g/^0/d w q | ed -s file
You can also try sed:
sed -i '/^0[[:blank:]]\+/d' file.txt
This assumes there can be one or more spaces or tabs after the initial 0, and no other character.
This awk should do:
awk '$1!="0"' file
1 2 <Enter_newLine>
This removes lines whose first field is exactly 0.
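The difference from the regex answers above only matters for a hypothetical line like 01 2, whose first field starts with 0 but is not exactly 0; the field test keeps such a line, while /^0/ (or grep -v '^0') drops it:
$ printf '%s\n' '0 8234' '01 2' '1 2' | awk '$1!="0"'
01 2
1 2
$ printf '%s\n' '0 8234' '01 2' '1 2' | awk '!/^0/'
1 2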

Merge two text files at a specific location, sed or awk

I have two text files, and I want to place the text of one in the middle of the other. I did some research and found information about adding single strings:
I have a comment in the second text file called STUFFGOESHERE, so I tried:
sed '/^STUFFGOESHERE/a file1.txt' file2.txt
sed: 1: "/^STUFFGOESHERE/a long.txt": command a expects \ followed by text
So I tried something different, trying to place the contents of the text based on a given line, but no luck.
Any ideas?
This should do it:
sed '/STUFFGOESHERE/ r file1.txt' file2.txt
If you want to remove the STUFFGOESHERE line:
sed -e '/STUFFGOESHERE/ r file1.txt' -e '/STUFFGOESHERE/d' file2.txt
If you want to modify file2 in place:
sed -i -e...
(or maybe sed -i '' -e..., I'm using GNU sed 4.1.5.)
If you can use ex or ed, try
cat <<EOF | ex -e - file2.txt
/^STUFFGOESHERE/
.r file1.txt
w
q
EOF
The same script works for ed:
cat <<EOF | ed file2.txt
/^STUFFGOESHERE/
.r file1.txt
w
q
EOF
awk '/STUFFGOESHERE/{while((getline line<"file1")>0){ print line};next}1' file2
From a Unix shell (bash, csh, zsh, whatever):
: | perl -e '@c = join("", map {<>} 0..eof); print $c[0] =~ /STUFFGOESHERE/ ? $` . $c[1] . $'"'"' : $c[0]' file2.txt file1.txt > newfile2.txt
