sed: delete lines from a logfile matching line numbers in another file - bash

I have a logfile that is starting to grow in size, and I need to remove certain lines that match a given pattern from it. I used grep -nr for extracting the target lines and copied them into a temp file, but I can't figure out how to tell sed to delete those lines from the log file.
I have found something similar here: Delete line from text file with line numbers from another file, but that doesn't actually delete the lines; it only prints the wanted output.
Can anyone give me a hint?
Thank you!

I think what you really need is sed -i '/pattern/d' filename.
But to answer your question:
How to delete lines matching the line numbers from another file:
(Assuming that there are no special characters in the line_numbers file, just numbers one per line...)
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log
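For instance, a minimal sketch (cleaned.log is just an illustrative name): if line_numbers contains the two lines 2 and 4, then
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log > cleaned.log
writes every line of input.log except lines 2 and 4 to cleaned.log. NR==FNR is only true while awk is reading the first file, so the array a ends up holding exactly the line numbers to drop.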

If you already have a way of printing what you want to standard output, there's no reason why you can't just overwrite the original file. For example, to only print lines that don't match a pattern, you could use:
grep -v 'pattern' original > tmp && mv tmp original
This redirects the output of the grep command to a temporary file, then overwrites the original file. Any other solution that does this "in-place" is only pretending to do so, after all.
There are numerous other ways to do this, using sed as suggested in the comments, or awk:
awk '!/pattern/' original > tmp && mv tmp original
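For completeness, the sed equivalent follows the same temp-file pattern (a sketch, using the same placeholder names as above):
sed '/pattern/d' original > tmp && mv tmp original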

If you want to use sed and your file is growing continuously, then you will have to execute sed -i '/REGEX/d' FILENAME repeatedly.
Instead, you can make use of syslog-ng. You just have to edit /etc/syslog-ng/syslog-ng.conf, create or edit an appropriate filter (something like: filter f_example { not match(REGEX); }; ), save the file, restart the service and you're done.
Messages containing that particular pattern will no longer be written to the log file. This way your file not only stops growing, you also no longer need to process it periodically with sed or grep.
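As a rough sketch of what that might look like (the source, destination and file names here are placeholders; reuse whatever your existing syslog-ng.conf already defines, and note that newer syslog-ng versions expect a value() option inside match()):
filter f_example { not match("REGEX" value("MESSAGE")); };
destination d_logfile { file("/var/log/example.log"); };
log { source(s_src); filter(f_example); destination(d_logfile); };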

To remove a line with sed, you can do:
sed "${line}d" <originalLogF >tmpF
If you want to remove several lines, you can pass a sed script. Here I delete the first and second lines:
sed '1d;2d' <originalLogF >tmpF
If your log file is big, you will probably need two passes: the first to generate the sed script in a file, and the second to apply it (see the sketch below). But it will be more efficient to use only one pass if you are able to recognize the pattern directly (and not use "${line}d" at all). See Tom Fenech's or anishsane's answers; I think that is what you really need.
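A sketch of that two-pass approach, assuming the line numbers come from grep -n (script.sed is just an illustrative name):
grep -n 'pattern' originalLogF | cut -d: -f1 | sed 's/$/d/' > script.sed
sed -f script.sed originalLogF > tmpF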
By the way, you have to preserve the inode (not only the file name) because most loggers keep the file open. So the final command (if you don't use sed -i) should be:
cat tmpF >originalLogF
By the way, the "-i" option (sed) is NOT magic, sed will create a temporary buffer, so if we have concurrent append to the log file, you can loose some lines.


Is there a way to add text to the end of a string in a conf file in linux?

I have a problem: I have a file that I would like to edit from the command line, if only I knew how. I would like to locate the line to edit by its content.
I am in CyberPatriot, and my team is second in my state. I know one of the people on the first-place team, and it kills me, so I want to make a list of commands I can go off of to make things faster and more efficient.
Imagine I had this file:
example
oof
goo
random
yes
and I wanted to change it to this:
example
oof
goo
random 'added text'
yes
How do I do so?
I know I can use the echo command to add text to the end of a file, but I don't know how to add text to the end of a specific line.
Thanks, Owen
You can use sed for this purpose.
sed 's/random/& Hello World/' file
to append text to the matched string.
You can use ^random$ to make sure the entire line is matched, before appending.
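Applied to the example above, that would be something like:
sed "s/^random$/& 'added text'/" file
which turns the line random into random 'added text' while leaving every other line alone.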
If you need to modify the file directly, you can use the -i flag, which facilitates in-place editing. Further, using -i.bak creates a backup of the original file before modifying it, as in
sed -i.bak 's/random/& Hello World/' file
The original copy of the file can be found in file.bak
More about sed: https://www.gnu.org/software/sed/manual/sed.html
Use something like the below:
sed '4!d' file | xargs -I{} sed -i "4s/{}/{} \'added text\'/" file
Basically, in the above command we are getting the 4th line of the file using sed '4!d' file, and then using that line to replace it with the same text plus some new text ('added text').
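Note that the xargs round trip isn't strictly needed; if you already know the line number, a single sed command can append to it directly, as a sketch:
sed -i "4s/$/ 'added text'/" file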

How to detect some pattern with grep -f on a file in terminal, and extract those lines without the pattern

I'm on mac terminal.
I have a txt file, allofthem.txt, with one column of 9 IDs, where every ID starts with "rs":
rs382216
rs11168036
rs9296559
rs9349407
rs10948363
rs9271192
rs11771145
rs11767557
rs11
Also, I have another txt file, useful.txt, with those IDs that were useful in an analysis I did. It looks the same, one column with several rows of IDs, but with fewer IDs, only 4:
rs9349407
rs10948363
rs9271192
rs11
Problem: I want to generate a new txt file with the non-useful IDs (the ones that appear in allofthem.txt but not in useful.txt).
I want to do the inverse of:
grep -f useful.txt allofthem.txt
I want some systematic way of deleting all the IDs in useful.txt and obtaining a file with the remaining ones. Maybe with awk or sed, but I can't see it. Can you help me, please? Thanks in advance!
Desired output:
rs382216
rs11168036
rs9296559
rs11771145
rs11767557
The -v option does the inverse for you:
grep -vxf useful.txt allofthem.txt > remaining.txt
The -x option matches the whole line in allofthem.txt, not parts of it.
As @hek2mgl rightly pointed out, you need -F if you want to treat the content of useful.txt as fixed strings and not patterns:
grep -vxFf useful.txt allofthem.txt > remaining.txt
Make sure your files have no leading or trailing white spaces - they could affect the results.
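Since you mention being on a Mac (BSD sed), a quick cleanup sketch for trailing whitespace or stray carriage returns would be:
sed -i '' 's/[[:space:]]*$//' useful.txt allofthem.txt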
I recommend using awk:
awk 'FNR==NR{patterns[$0];next} !($0 in patterns)' useful.txt allofthem.txt
Explanation:
FNR==NR is true as long as we are reading useful.txt. We create an index in patterns for every line of useful.txt; next then stops further processing of that line.
!($0 in patterns) runs, because of the previous next statement, only on the lines of allofthem.txt. It checks for every line of that file whether it is a key in patterns. If it is not, awk prints that line.
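If you don't mind sorting the files, comm is another option; this sketch prints the lines unique to allofthem.txt into remaining.txt:
comm -23 <(sort allofthem.txt) <(sort useful.txt) > remaining.txt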

Remove a header from a file during parsing

My script gets every .csv file in a dir and writes them into a new file together. It also edits the files so that certain information is written into every row for all of a file's entries. For instance, this file called "trap10c_7C000000395C1641_160110.csv":
"",1/10/2016
"Timezone",-6
"Serial No.","7C000000395C1641"
"Location:","LS_trap_10c"
"High temperature limit (�C)",20.04
"Low temperature limit (�C)",-0.02
"Date - Time","Temperature (�C)"
"8/10/2015 16:00",30.0
"8/10/2015 18:00",26.0
"8/10/2015 20:00",24.5
"8/10/2015 22:00",24.0
Is converted into this format
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,Location:,LS_trap_10c
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,High,temperature,limit,(°C),20.04
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,Low,temperature,limit,(°C),-0.02
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,Date,-,Time,Temperature,(°C)
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,8/10/2015,16:00,30.0
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,8/10/2015,18:00,26.0
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,8/10/2015,20:00,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_160110.csv,8/10/2015,22:00,24.0
I use this script to do this:
dos2unix *.csv
gawk '{print FILENAME, $0}' *.csv>>all_master.erin
sed -i 's/Serial No./SerialNo./g' all_master.erin
sed -i 's/ /,/g' all_master.erin
gawk -F, '/"SerialNo."/ {sn = $3}
/"Location:"/ {loc = $3}
/"([0-9]{1,2}\/){2}[0-9]{4} [0-9]{2}:[0-9]{2}"/ {lin = $0}
{$0 =loc FS sn FS $0}1' all_master.erin > formatted_log.csv
sed -i 's/\"//g' formatted_log.csv
sed -i '/^,/ d' formatted_log.csv
rm all_master.erin
printf "\nDone\n"
I want to remove the messy header from the formatted_log.csv file. I've tried and failed to use sed, as it seems to remove things that I don't want to remove. Is sed the best way to approach this problem? The current sed calls fix some problems with the header, but I want the header gone entirely. Any lines that say "serial no." or "location" are important and require information; the other lines can be removed entirely.
I suppose you edited your script before posting; as it stands, it will not produce the posted output (all_master.erin should be $(<all_master.erin) except in the first occurrence).
You don’t specify many vital details of the format of your input files, so we must guess them. Here are my guesses:
You ignore the first two lines and the subsequent empty third line.
The 4th and 5th lines are useful, since they provide the serial number and location you want to use in all lines of that file.
The 6th, 7th and 8th lines are useless.
For each file, you want to discard the first four lines of the posted output.
With these assumptions, this is how I would modify your script:
#!/bin/bash
dos2unix *.csv
awk -vFS=, -vOFS=, \
'{gsub("\"","")}
FNR==4{s=$2}
FNR==5{l=$2}
FNR>8{gsub(" ",OFS);print l,s,FILENAME,$0}' \
*.csv > formatted_log.CSV
printf "\nDone\n"
Explanation of the awk script:
First we delete all double quotes with gsub("\"",""). Then, if the line number is 4, we set the variable s to the second field, which is the serial number. If the line number is 5, we set the variable l to the second field, which is the location. If the line number is greater than 8, we do two things. First, we execute gsub(" ",OFS) to replace all spaces with the value of the output field separator: this is needed because the intended output makes two separate fields of date and time, which were only one field in the input. Second, we print the line preceded by the values of l, s and FILENAME as requested.
Note that I’m using the (questionable) Unix trick of naming the output file with an all-caps extension .CSV to avoid it being wrongly matched by a subsequent *.csv. A better solution would be to put it in another directory, but I don’t know anything about your directory tree so I suggest you modify the output file name yourself.
You could use awk to remove anything with fewer than 3 columns in your final file:
awk 'NF>=3' file

How to insert text at the beginning of each line only if the given pattern matches that line

I need to insert text at the beginning of each line, but only if the given pattern matches that line.
For example,
sed -n '/pattern/p' /etc/inittab
So, if the pattern matches any of the lines in the inittab file, I need to insert '#' at the beginning of those lines, in the same file itself.
Kindly suggest how I can do this.
Using sed:
sed '/pattern/s/^/#/' file
This will look for lines matching the pattern and, once it finds one, place # in front of it. This will not modify the file. In order to do so, you need to use the -i option to make in-place changes. You can add an extension like -i.bak to make an optional backup if you'd like.
Using awk:
awk '/pattern/{$0="#"$0}1' file
awk is made up of pattern-action statements. For the matching pattern, the action we take is to modify the line by placing # in front of it. The 1 at the end prints the lines for us. GNU awk v4.1 or later has in-place editing just like sed. If you are using an older version, you can redirect the output to another file and mv it back over the original by saying:
awk '/pattern/{$0="#"$0}1' file > tmp && mv tmp file
The in-place change is nothing special. It does the same job as redirecting to a temp file and then moving it back; it just does all the dirty work for you behind the scenes.
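With a sufficiently new GNU awk, the in-place variant would look like this (a sketch):
gawk -i inplace '/pattern/{$0="#"$0}1' file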
This is achieved with the following sed invocation:
% sed -i.orig -e '/pattern/s/^/#/' inittab
The -i.orig option tells sed to operate in place on the file, first saving the original as inittab.orig. In the editing command,
/pattern/ selects lines matching pattern
s/^/#/ substitutes the empty string at the beginning of each such line with #

Delete line containing one of multiple strings

I have a text file and I want to remove all lines containing the words: facebook, youtube, google, amazon, dropbox, etc.
I know to delete lines containing a string with sed:
sed '/facebook/d' myfile.txt
I don't want to run this command five separate times, once for each string; is there a way to combine all the strings into one command?
Try this:
sed '/facebook\|youtube\|google\|amazon\|dropbox/d' myfile.txt
From GNU's sed manual:
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
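If you would rather not rely on the GNU \| extension, extended regular expressions give you alternation that also works with BSD sed; as a sketch:
sed -E '/facebook|youtube|google|amazon|dropbox/d' myfile.txt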
grep -vf wordsToExcludeFile myfile.txt
"wordsToExcludeFile" should contain the words you don't want, one per line.
If you need to save the result back to the same file, then add this to the command:
> myfile.new && mv myfile.new myfile.txt
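So the full pipeline would be something like:
grep -vf wordsToExcludeFile myfile.txt > myfile.new && mv myfile.new myfile.txt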
With awk:
awk '!/facebook|youtube|google|amazon|dropbox/' myfile.txt > filtered.txt
