Deleting lines from a file at the indexes specified in another file in bash [duplicate]

I want to delete one or more specific line numbers from a file. How would I do this using sed?

If you want to delete lines 5 through 10 and line 12:
sed -e '5,10d;12d' file
This will print the results to the screen. If you want to save the results to the same file:
sed -i.bak -e '5,10d;12d' file
This will store the unmodified file as file.bak, and delete the given lines.
Note: Line numbers start at 1. The first line of the file is 1, not 0.

You can delete a single line by its line number with
sed -i '33d' file
This deletes line 33 and saves the updated file.

And with awk as well:
awk 'NR!~/^(5|10|25)$/' file
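If the regex form looks opaque, a sketch with plain comparisons deletes the same lines:
awk 'NR != 5 && NR != 10 && NR != 25' file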

$ cat foo
1
2
3
4
5
$ sed -e '2d;4d' foo
1
3
5
$

This is very often a symptom of an antipattern. The tool which produced the line numbers may well be replaced with one which deletes the lines right away. For example,
grep -nh error logfile | cut -d: -f1 | deletelines logfile
(where deletelines is the utility you are imagining you need) is the same as
grep -v error logfile
Having said that, if you are in a situation where you genuinely need to perform this task, you can generate a simple sed script from the file of line numbers. Humorously (but perhaps slightly confusingly) you can do this with sed.
sed 's%$%d%' linenumbers
This accepts a file of line numbers, one per line, and produces, on standard output, the same line numbers with d appended after each. This is a valid sed script, which we can save to a file, or (on some platforms) pipe to another sed instance:
sed 's%$%d%' linenumbers | sed -f - logfile
On some platforms, sed -f does not understand the option argument - to mean standard input, so you have to redirect the script to a temporary file, and clean it up when you are done, or maybe replace the lone dash with /dev/stdin or /proc/self/fd/0 if your OS (or shell) has that.
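For instance, a minimal sketch of the temporary-file fallback (assuming a POSIX shell with mktemp available):
script=$(mktemp)
sed 's%$%d%' linenumbers > "$script"
sed -f "$script" logfile
rm -f "$script"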
As always, you can add -i before the -f option to have sed edit the target file in place, instead of producing the result on standard output. On *BSDish platforms (including OSX) you need to supply an explicit argument to -i as well; a common idiom is to supply an empty argument: -i ''.

The shortest, deleting the first line in sed
sed -i '1d' file
As Brian states here, sed commands take the form address command; here the address is 1 and the command is d.

I would like to propose a generalization with awk. When the file is made of fixed-size blocks and the lines to delete repeat in each block, awk works well like this:
awk '{nl=((NR-1)%2000)+1; if ((nl<714) || ((nl>1025) && (nl<1029))) print $0}' \
    OriginFile.dat > MyOutputCuttedFile.dat
In this example the block size is 2000 and I want to print the lines [1..713] and [1026..1028].
NR is the variable used by awk to store the current line number.
% gives the remainder (or modulus) of the division of two integers;
nl=((NR-1)%BLOCKSIZE)+1 stores in the variable nl the line number inside the current block (see below).
|| and && are the logical OR and AND operators.
print $0 writes the full line
Why ((NR-1)%BLOCKSIZE)+1:
(NR-1): we shift by one because 1%3=1 and 2%3=2, but 3%3=0.
+1: we add 1 back to restore the desired 1-based numbering.
+-----+------+----------+------------+
| NR | NR%3 | (NR-1)%3 | (NR-1)%3+1 |
+-----+------+----------+------------+
| 1 | 1 | 0 | 1 |
| 2 | 2 | 1 | 2 |
| 3 | 0 | 2 | 3 |
| 4 | 1 | 0 | 1 |
+-----+------+----------+------------+
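You can check the cycle directly, e.g. with a block size of 3:
$ seq 6 | awk '{print ((NR-1)%3)+1}'
1
2
3
1
2
3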

cat -b /etc/passwd | sed -E 's/^( )+(<line_number>)(\t)(.*)/--removed---/g;s/^( )+([0-9]+)(\t)//g'
cat -b -> print the lines with numbers (non-blank lines only)
s/^( )+(<line_number>)(\t)(.*)/--removed---/g -> replace the line carrying the given number with a --removed--- marker
s/^( )+([0-9]+)(\t)//g -> strip the numbers that cat printed from the remaining lines

Related

Delete the last 3 lines of my txt with bash? [duplicate]

I want to remove some n lines from the end of a file. Can this be done using sed?
For example, to remove lines from 2 to 4, I can use
$ sed '2,4d' file
But I don't know the line numbers. I can delete the last line using
$ sed '$d' file
but I want to know the way to remove n lines from the end. Please let me know how to do that using sed or some other method.
I don't know about sed, but it can be done with head:
head -n -2 myfile.txt
If hardcoding n is an option, you can use sequential calls to sed. For instance, to delete the last three lines, delete the last line three times:
sed '$d' file | sed '$d' | sed '$d'
From the sed one-liners:
# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
Seems to be what you are looking for.
A funny & simple sed and tac solution:
n=4
tac file.txt | sed "1,$n{d}" | tac
NOTE
Double quotes " are needed for the shell to evaluate the $n variable in the sed command; in single quotes, no interpolation is performed.
tac is cat reversed; see man 1 tac.
the {} in the sed script separate $n and d (without them, the shell would try to interpolate a nonexistent $nd variable)
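Equivalently, shell braces around the variable name separate $n from d without sed's block syntax:
n=4
tac file.txt | sed "1,${n}d" | tac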
Use sed, but let the shell do the math, with the goal being to use the d command by giving a range (to remove the last 23 lines):
sed -i "$(($(wc -l < file)-22)),\$d" file
To see how it removes the last 23 lines, read from the inside out:
$(wc -l < file)
Gives the number of lines of the file: say 2196
We want to remove the last 23 lines, so for the left side of the range:
$((2196-22))
Gives: 2174
Thus the original sed after shell interpretation is:
sed -i '2174,$d' file
With -i doing the in-place edit, file is now 2173 lines!
If you want to save the result into a new file instead, drop -i and redirect:
sed '2174,$d' file > outputfile
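A generalized sketch of the same arithmetic, with the count in a variable (assuming GNU sed and a file longer than n lines):
n=23
sed -i "$(( $(wc -l < file) - n + 1 )),\$d" file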
You could use head for this.
Use
$ head --lines=-N file > new_file
where N is the number of lines you want to remove from the file.
The contents of the original file minus the last N lines are now in new_file
Just for completeness I would like to add my solution.
I ended up doing this with the standard ed:
ed -s sometextfile <<< $'-2,$d\nwq'
This deletes the last three lines (ed starts at the last line, so -2,$ addresses the third-to-last line through the last) using in-place editing (although it does use a temporary file in /tmp!).
To truncate very large files truly in place, we have the truncate command.
It doesn't know about lines, but tail + wc can convert lines to bytes:
file=bigone.log
lines=3
truncate -s -$(tail -$lines $file | wc -c) $file
There is an obvious race condition if the file is written at the same time.
In this case it may be better to use head - it counts bytes from the beginning of the file (mind the disk IO), so we will always truncate on a line boundary (possibly keeping more lines than expected if the file is actively written):
truncate -s $(head -n -$lines $file | wc -c) $file
A handy one-liner if you failed a login attempt by typing your password in place of the username:
truncate -s $(head -n -5 /var/log/secure | wc -c) /var/log/secure
This might work for you (GNU sed):
sed ':a;$!N;1,4ba;P;$d;D' file
Most of the above answers seem to require GNU commands/extensions:
$ head -n -2 myfile.txt
-2: Badly formed number
For a slightly more portable solution:
perl -ne 'push(@fifo,$_); print shift(@fifo) if @fifo > 10;'
OR
perl -ne 'push(@buf,$_); END{print @buf[0 .. $#buf-10]}'
OR
awk '{buf[NR-1]=$0;}END{ for ( i=0; i < (NR-10); i++){ print buf[i];} }'
Where "10" is "n".
With the answers here you'd have already learnt that sed is not the best tool for this application.
However, I do think there is a way to do this using sed: the idea is to keep a window of N+1 lines in the pattern space, print the oldest line of the window on each cycle, and delete whatever remains once the last line has been read.
sed -e ':a' -e '$d;N;2,5ba' -e 'P;D'
The sed command above will omit the last 5 lines.
It can be done in 3 steps:
a) Count the number of lines in the file you want to edit:
n=$(wc -l < myfile)
b) Subtract from that number the number of lines to delete, adding 1 back because the x,$ range is inclusive:
x=$((n-3+1))
c) Tell sed to delete from that line number ($x) to the end:
sed "$x,\$d" myfile
You can get the total count of lines with wc -l <file> and use
head -n <total lines - lines to remove> <file>
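Spelled out as a sketch with POSIX tools only (N is the number of lines to drop):
N=3
total=$(wc -l < file)
head -n "$((total - N))" file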
Try the following command:
where n is the number of lines to remove from the end; e.g. for n=3:
tail -r file_name | sed '1,3d' | tail -r
(tail -r is BSD/macOS; on GNU systems use tac instead.)
This will remove the last 3 lines from file:
for i in $(seq 1 3); do sed -i '$d' file; done;
I prefer this solution:
head -$(gcalctool -s $(cat file | wc -l)-N) file
where N is the number of lines to remove.
sed -n ':pre
1,4 {N;b pre
}
:cycle
$!{P;N;D;b cycle
}' YourFile
This is a POSIX version.
To delete the last 4 lines:
$ nl -b a file | sort -k1,1nr | sed '1, 4 d' | sort -k1,1n | sed 's/^ *[0-9]*\t//'
I came up with this, where n is the number of lines you want to delete:
n=3   # example: delete the last 3 lines
count=$(wc -l < file)
lines=$((count - n))
head -n "$lines" file > temp.txt
mv temp.txt file
It's a little roundabout, but I think it's easy to follow.
Count up the number of lines in the main file
Subtract the number of lines you want to remove from the count
Print out the number of lines you want to keep and store in a temp file
Replace the main file with the temp file (the mv consumes the temp file, so no separate cleanup is needed)
For deleting the last N lines of a file, you can use the same concept of
$ sed '2,4d' file
You can use a combo with the tail command to reverse the file; if N is 5:
$ tail -r file | sed '1,5d' | tail -r > newfile
(Write to a new file: redirecting back onto the input file would truncate it before it is read.) And this way also works where the head -n -5 file command doesn't (like on a Mac!).
#!/bin/sh
echo 'Enter the file name : '
read filename
echo 'Enter the number of lines from the end that needs to be deleted :'
read n
# The x,$ range is inclusive, so offset by one
m=$((n - 1))
# Count the lines in the file
len=$(wc -l < "$filename")
# First line that must be deleted
lennew=$((len - m))
sed "$lennew,\$d" "$filename"
A solution similar to https://stackoverflow.com/a/24298204/1221137 but with in-place editing and without a hardcoded number of lines:
n=4
seq $n | xargs -i sed -i -e '$d' my_file
In Docker, this worked for me:
head --lines=-N file_path > tmp_file && mv tmp_file file_path
(Redirecting straight back onto file_path would truncate it before head reads it, so go through a temp file.)
Say you have several lines:
$ cat <<EOF > 20lines.txt
> 1
> 2
> 3
[snip]
> 18
> 19
> 20
> EOF
Then you can grab:
# leave last 15 out
$ head -n5 20lines.txt
1
2
3
4
5
# skip first 14
$ tail -n +15 20lines.txt
15
16
17
18
19
20
POSIX-compliant solution using ex / vi, in the vein of @Michel's solution above.
@Michel's ed example uses non-POSIX here-strings.
Increase the offset in $-1 to remove n lines up to the EOF ($), or just feed the address range of the lines you want to (d)elete. You could use ex to count line numbers or do any other Unix stuff.
Given the file:
cat > sometextfile <<EOF
one
two
three
four
five
EOF
Executing:
ex -s sometextfile <<'EOF'
$-1,$d
%p
wq!
EOF
Returns:
one
two
three
This uses POSIX Here-Docs so it is really easy to modify - especially using set -o vi with a POSIX /bin/sh.
While on the subject, the "ex personality" of "vim" should be fine, but YMMV.
This will remove the last 10 lines:
sed -n -e :a -e '1,10!{P;N;D;};N;ba'

File Name comparison in Bash

I have two files containing lists of files. I need to check which files are missing from the list in the second file. The problem is that I do not have to match the full name, but only the last 19 characters of the file names.
E.g.
MyFile12343220150510230000.xlsx
and
MyFile99999620150510230000.xlsx
are same files.
This is a unique problem and I don't know how to start. Kindly help.
awk based solution:
$ awk '
{start=length($0) - 18;}                #start of the last 19 characters
NR==FNR{a[substr($0, start)]++; next;}  #save the last 19 characters of every line in file2.list
{if(!a[substr($0, start)]) print $0;}   #print lines of file.list whose suffix was not seen above
' file2.list file.list
First you can use comm to match the exact file names and obtain a list of files not matching. Then you can use agrep. I've never used it, but you might find it useful.
Or, as last option, you can do a brute force and for every line in the first file search into the second:
#!/bin/bash
# Iterate through the first file
while read LINE; do
# Find the section of the filename that has to match in the other file
CHECK_SECTION="$(echo "$LINE" | sed -nre 's/^.*([0-9]{14})\.(.*)$/\1.\2/p')"
# Create a regex to match the filenames in the second file
SEARCH_REGEX="^.*$CHECK_SECTION$"
# Search...
egrep "$SEARCH_REGEX" inputFile_2.txt
done < inputFile_1.txt
Here I assumed the filenames end with 14 digits that must match in the other file and a file extension that can be different from file to file but that has to match too:
MyFile12343220150510230000.xlsx
| variable | 14digits |.ext
So, if the first file is FILE1 and the second file is FILE2 then if the intention is only to identify the files in FILE2 that don't exist in FILE1, the following should do:
tmp1=$(mktemp)
tmp2=$(mktemp)
cat $FILE1 | rev | cut -c -19 | sort | uniq > ${tmp1}
cat $FILE2 | rev | cut -c -19 | sort | uniq > ${tmp2}
diff ${tmp1} ${tmp2} | rev
rm ${tmp1} ${tmp2}
In a nutshell, this reverses the characters on each line, and extracts the part you're interested in, saving to a temporary file, for each list of files. The reversal of characters is done since you haven't said whether or not the length of filenames is guaranteed to be constant---the only thing we can rely on here is that the last 19 characters are of a fixed format (in this case, although the format is easily inferred, it isn't really relevant). The sort is important in order for the diff to show you what's not in the second file that is in the first.
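As a quick sanity check of the rev | cut pairing on a single name:
$ echo 'MyFile12343220150510230000.xlsx' | rev | cut -c -19 | rev
20150510230000.xlsx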
If you're certain that there will only ever be files missing from FILE2 and not the other way around (that is, files in FILE2 that don't exist in FILE1), then you can clean things up by removing the cruft introduced by diff, so the last line becomes:
diff ${tmp1} ${tmp2} | rev | grep -i xlsx | sed 's/[[:space:]]\+.*//'
The grep limits the output to those lines with xlsx filenames, and the sed removes everything on a line from the first space encountered onwards.
Of course, technically this only tells you which timestamp-based groups of files exist in FILE1 but not FILE2--as I understand it, this is what you're looking for (my understanding of your problem description is that MyFile12343220150510230000.xlsx and MyFile99999620150510230000.xlsx would have identical content). If the file names are always the same length (as you subsequently affirmed), then there's no need for the rev's and the cut commands can just be amended to refer to fixed character positions.
In any case, to get the final list of files, you'll have to use the "cleaned up" output to filter the content of FILE1; so, modifying the script above so that it includes the "cleanup" command, we can filter the files that you need using a grep--the whole script then becomes:
tmp1=$(mktemp)
tmp2=$(mktemp)
missing=$(mktemp)
cat $FILE1 | rev | cut -c -19 | sort | uniq > ${tmp1}
cat $FILE2 | rev | cut -c -19 | sort | uniq > ${tmp2}
diff ${tmp1} ${tmp2} | rev | grep -i xlsx | sed 's/[[:space:]]\+.*//' > ${missing}
grep -E "("`echo $(<${missing}) | sed 's/[[:space:]]/|/g'`")" ${tmp1}
rm ${tmp1} ${tmp2} ${missing}
The extended grep command (-E) just builds up an "or" regular expression for each timestamp-plus-extension and applies it to the first file. Of course, this is all assuming that there will never be timestamp-groups that exist in FILE2 and not in FILE1--if this is the case, then the "diff output processing" bit needs to be a little more clever.
Or you could use your standard coreutil tools:
for i in $(cat file1 file2 | sort | uniq -u); do
    grep -q "$i" file1 && \
        echo "file2 missing '$i'" || \
        echo "file1 missing '$i'"
done
It will identify which non-common entries are missing from which file. You can also manipulate the non-common filenames in any way you like, e.g. parameter expansion/substring extraction, substring removal, or character indexes.
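For example, bash substring extraction alone recovers the 19-character key (the space before -19 keeps it from being parsed as a default-value expansion):
$ f=MyFile12343220150510230000.xlsx
$ echo "${f: -19}"
20150510230000.xlsx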

remove n lines from STDOUT on bash

Do you have any bash solution to remove N lines from stdout?
like the head command: print all lines except the last N
A simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I need to repeat the sed command N times
Any better solution?
except awk, python, etc.
Use head with a negative number. In my example it will print all lines but the last 3:
head -n -3 infile
if head -n -3 filename doesn't work on your system (like mine), you could also try the following approach (and maybe alias it or create a function in your .bashrc)
head -`echo "$(wc -l filename)" | awk '{ print $1 - 3; }'` filename
Where filename and 3 above are your file and number of lines respectively.
The tail command can address lines relative to either end of a file on Mac OS / BSD. tail accepts a +/- prefix on the count: with the + prefix, output starts at that line number, so the expression below prints from line 3 onwards, skipping the first 2 lines.
tail -n +3 filename.ext
With the - prefix (the default), the count is taken from the end of the file instead, so the following prints the last 3 lines:
tail -n -3 filename.ext
Note that neither form drops the last N lines; for that, see the head -n -N answers above. See also a similar answer to a different question here: Print a file skipping first X lines in Bash
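For instance, with a 5-line input:
$ seq 5 | tail -n +3
3
4
5
$ seq 5 | tail -n 2
4
5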

How to copy text from a given line and column number and paste it in another file at a given line and column number

I have a file address.txt containing
`0x0003FFB0`
at line 1 and column 2
I want to paste it into another file 'linker.txt' at line 20 and column 58.
How can I do that using bash scripting?
Note that the content of the input file can be random; it doesn't have to be the same each time.
But the length of the word to be copied will always be the same.
You can use sed:
sed -i.old '20s/^.\{58\}/&0x0003FFB0/' file
It will produce file.old with the original content, and file will be updated with this address. A quick explanation:
sed '20command' file # do command in line 20
sed '20s/RE/&xxx/' file # search for regular expression, replace by the original text (&) + xxx
To read the address and splice it into this sed command, cut makes it possible:
ADDRESS=$(head -1 address.txt | cut -f2)
sed -i.old "20s/^.\{58\}/&${ADDRESS}/" file
You can use a combination of head and tail to get any line number. To get line 2, get the last line (using tail) of the first two lines (using head):
ADDRESS=$(head -2 address.txt | tail -1 | cut -f2)
For the third line:
ADDRESS=$(head -3 address.txt | tail -1 | cut -f2)
And so on.
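The pattern generalizes to a small helper; get_line is a hypothetical name here, and cut -f2 assumes tab-separated columns as in the snippets above:
get_line() { head -"$1" "$2" | tail -1; }   # line $1 of file $2
ADDRESS=$(get_line 3 address.txt | cut -f2)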

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes:
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print the two lines after the match.
-B 2: print the two lines before the match.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
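As a sketch, the general form for an inclusive, 1-based range start..end is:
start=5001; end=5020
head -n "$end" file.txt | tail -n "$((end - start + 1))" | grep x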
grep -A 10 <Pattern>
This grabs each match of the pattern plus the 10 lines that follow it. It works well only for a known pattern; if you don't have a known pattern, use the "head" suggestions.
grep -m6 "string" cov.txt
This stops searching after the first 6 matches of string, not after the first 6 lines, so it is only equivalent when every line matches.
