Delete line containing one of multiple strings - bash

I have a text file and I want to remove all lines containing the words: facebook, youtube, google, amazon, dropbox, etc.
I know to delete lines containing a string with sed:
sed '/facebook/d' myfile.txt
I don't want to run this command five different times, once for each string. Is there a way to combine all the strings into one command?

Try this:
sed '/facebook\|youtube\|google\|amazon\|dropbox/d' myfile.txt
From GNU's sed manual:
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use
complex alternative regular expressions. The matching process tries
each alternative in turn, from left to right, and the first one that
succeeds is used. It is a GNU extension.
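Note that \| alternation is a GNU extension, so it may not work with BSD/macOS sed. A more portable sketch is to switch to extended regular expressions with -E (supported by both GNU and BSD sed), where a plain | is the alternation operator:
sed -E '/facebook|youtube|google|amazon|dropbox/d' myfile.txt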

grep -vf wordsToExcludeFile myfile.txt
"wordsToExcludeFile" should contain the words you don't want, one per line.
If you need to save the result back to the same file, then add this to the command:
> myfile.new && mv myfile.new myfile.txt
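Put together with the same file names, that would be:
grep -vf wordsToExcludeFile myfile.txt > myfile.new && mv myfile.new myfile.txt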

With awk
awk '!/facebook|youtube|google|amazon|dropbox/' myfile.txt > filtered.txt

Related

Regex to match characters between two specific characters in shell script

I want to clean my file before/after saving, so I have to delete the unnecessary characters it contains. Sadly, even though my regex works in Regex101, it does not work in the shell script I wrote.
I am getting my list from Kubernetes via
kubectl get pods -n $1 -o jsonpath='{range .items[*]}{#.spec.containers[*].image}{","}{#.status.containerStatuses[*].imageID}{"\n"}{end}'
Then I save it to a temp file and use sed to clean it up - the regex should match, and sed should delete, every character between , and # (and also delete the # itself). I am escaping them since they are special characters.
sed -i 's/(?<=\,)(.*?)(?<=\#)//g' temp
The problem is that this regex works fine (for example in Regex101) but does not work with the sed command. I even tried awk but got the same output.
awk '!/(?<=\,)(.*?)(?<=\#)/' temp
Am I missing something or is the regex acting differently somehow in Unix/shell?
Thanks for any input.
Example content of the file (for test):
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,docker-pullable://docker.elastic.co/elasticsearch/elasticsearch#sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,docker-pullable://bitnami/mongodb#sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,docker-pullable://bitnami/postgresql#sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,docker-pullable://bitnami/zookeeper#sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
Expected output:
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
You are trying to use Perl regex extensions (lookbehind assertions and non-greedy quantifiers) which are not supported by more traditional regex tools like sed and Awk.
Perhaps see also Why are there so many different regular expression dialects? and the Stack Overflow regex tag info page.
If I can guess what you are trying to do, you want simply
sed -i 's/,[^#]*#/,/g' temp
The /g flag is unnecessary if you only expect one match per line.
Neither , nor # is a regex metacharacter; they do not require escaping.
Usually you would want to avoid using a temporary file or sed -i; perhaps simply
kubectl blah blah | sed 's/,[^#]*#/,/' > temp
to create the file, or remove the redirection if you want to pipe the results further.
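If you prefer awk for this, a rough equivalent (a sketch, assuming exactly one comma-separated pair per line as in the sample) is to strip everything up to and including the # in the second field:
awk -F, -v OFS=, '{ sub(/.*#/, "", $2); print }' temp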

How to detect some pattern with grep -f on a file in terminal, and extract those lines without the pattern

I'm on mac terminal.
I have a txt file with one column of 9 IDs, allofthem.txt, where every ID starts with "rs":
rs382216
rs11168036
rs9296559
rs9349407
rs10948363
rs9271192
rs11771145
rs11767557
rs11
Also, I have another txt file, useful.txt, with those IDs that were useful in an analysis I did. It looks the same, one column with several rows of IDs, but with fewer IDs, only 4:
rs9349407
rs10948363
rs9271192
rs11
Problem:I want to generate a new txt file with the non-useful ones (the ones that appear in allofthem.txt but not in useful.txt).
I want to do the inverse of:
grep -f useful.txt allofthem.txt
I want to use some systematic way of deleting all the IDs in useful.txt and obtain a file with the remaining ones. Maybe with awk or sed, but I can't see it. Can you help me, please? Thanks in advance!
Desired output:
rs382216
rs11168036
rs9296559
rs11771145
rs11767557
-v option does the inverse for you:
grep -vxf useful.txt allofthem.txt > remaining.txt
-x option matches the whole line in allofthem.txt, not parts.
As @hek2mgl rightly pointed out, you need -F if you want to treat the content of useful.txt as strings and not patterns:
grep -vxFf useful.txt allofthem.txt > remaining.txt
Make sure your files have no leading or trailing white spaces - they could affect the results.
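If you're not sure, a quick sketch for stripping such whitespace in place (the .bak backups are optional) might be:
sed -i.bak 's/^[[:space:]]*//; s/[[:space:]]*$//' useful.txt allofthem.txt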
I recommend using awk:
awk 'FNR==NR{patterns[$0]; next} !($0 in patterns)' useful.txt allofthem.txt
Explanation:
FNR==NR is true as long as we are reading useful.txt. We create an index in patterns for every line of useful.txt. next stops further processing.
!($0 in patterns) runs, because of the previous next statement, only on the lines of allofthem.txt. It checks for every line of that file whether it is a key in patterns. If the line is not such a key, the condition is true and awk prints that line.
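For the sample files above, this prints the five remaining IDs; redirect the output if you want them in a file, e.g.:
awk 'FNR==NR{patterns[$0]; next} !($0 in patterns)' useful.txt allofthem.txt > remaining.txt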

sed: delete lines from a logfile that match line numbers in another file

I have a logfile that is starting to grow in size, and I need to remove certain lines that match a given pattern from it. I used grep -nr to extract the target lines and copied them to a temp file, but I can't figure out how to tell sed to delete those lines from the log file.
I have found something similar here: Delete line from text file with line numbers from another file but this doesn't actually delete the lines, it only prints the wanted output.
Can anyone give me a hint?
Thank you!
I think what you really need is sed -i '/pattern/d' filename.
But to answer your question:
How to delete lines matching the line numbers from another file:
(Assuming that there are no special characters in the line_numbers file, just numbers one per line...)
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log
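If you produced the temp file with grep -n, one way (a sketch; 'pattern' is a placeholder for whatever you matched) to turn it into such a numbers-only file is to keep just the line-number prefix:
grep -n 'pattern' input.log | cut -d: -f1 > line_numbers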
If you already have a way of printing what you want to standard output, there's no reason why you can't just overwrite the original file. For example, to only print lines that don't match a pattern, you could use:
grep -v 'pattern' original > tmp && mv tmp original
This redirects the output of the grep command to a temporary file, then overwrites the original file. Any other solution that does this "in-place" is only pretending to do so, after all.
There are numerous other ways to do this, using sed as suggested in the comments, or awk:
awk '!/pattern/' original > tmp && mv tmp original
If you want to use sed and your file is growing continuously, then you will have to execute sed -i '/REGEX/d' FILENAME more frequently.
Instead, you can make use of syslog-ng. You just have to edit /etc/syslog-ng/syslog-ng.conf, where you need to create/edit an appropriate filter (somewhat like: f_example { not match(REGEX); }; ), save the file, restart the service and you're done.
The messages containing that particular pattern will not be written to the log file. This way, your file not only stops growing with those messages, but you also no longer need to process it periodically with sed or grep.
To remove a line with sed, you can do:
sed "${line}d" <originalLogF >tmpF
If you want remove several lines, you can pass a sed script. Here I delete the first and the second lines:
sed '1d;2d' <originalLogF >tmpF
If your log file is big, you will probably need two passes: the first to generate the sed script in a file, and the second to apply that sed script (a sketch follows below). But it would be more efficient to have only one pass, if you are able to recognize the pattern directly (and not use "${line}d" at all). See Tom Fenech's or anishsane's answers; I think that is what you really need.
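A rough sketch of the two-pass idea ('pattern' and deletions.sed are placeholders):
grep -n 'pattern' originalLogF | cut -d: -f1 | sed 's/$/d/' > deletions.sed
sed -f deletions.sed originalLogF > tmpF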
By the way, you have to preserve the inode (not only the file name) because most loggers keep the file open. So the final command (if you don't use sed -i) should be:
cat tmpF >originalLogF
By the way, the "-i" option of sed is NOT magic: sed will create a temporary file behind the scenes, so if there are concurrent appends to the log file, you can lose some lines.

Using sed to replace the first instance of an entire line beginning with string

I am attempting to write a bash script that will use sed to replace an entire line in a text file beginning with a given string, and I only want it to perform this replacement for the first match.
For example, in my text file I may have:
hair=brown
age=25
eyes=blue
age=35
weight=177
And I may want to simply replace the first occurrence of a line beginning with "age" with a different number without affecting the 2nd instance of age:
hair=brown
age=55
eyes=blue
age=35
weight=177
So far, I've come up with
sed -i "0,/^PATTERN/s/^PATTERN/PATTERN=XY/" test.txt
but this will only replace the string "age" itself rather than the entire line. I've been trying to throw a "\c" in there somewhere to change the entire line but nothing is working so far. Does anyone have any ideas as to how this can be resolved? Thanks.
Like @ruakh suggests, you can use
sed -i "0,/^PATTERN/ s/^PATTERN=.*$/PATTERN=XY/" test.txt
A shorter and less repetitive way of doing the same would be
sed -i '0,/^\(PATTERN=\).*/s//\1XY/' test.txt
which takes advantage of backreferences and the fact that not specifying a pattern in an s-expression will use the previously matched pattern.
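Applied to the question's example (so age and 55 stand in for the placeholders), that would look something like:
sed -i '0,/^\(age=\).*/s//\155/' test.txt
which changes only the first age= line to age=55 and leaves age=35 untouched.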
0,... ranges only work in GNU sed. An alternative might be to use shell redirection with sed:
{ sed '/^\(PATTERN\).*/!n; s//\1VAL/;q'; cat ;} < file
or use awk:
awk '$1=="LABEL" && !n++ {$2="VALUE"}1' FS='=' OFS='=' file
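For the question's data, that would be invoked roughly like this (it writes to standard output rather than editing in place):
awk '$1=="age" && !n++ {$2="55"}1' FS='=' OFS='=' test.txt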

How do I alter the n-th line in multiple files using SED?

I have a series of text files that I want to convert to markdown. I want to remove any leading spaces and add a hash sign to the first line in every file. If I run this:
sed -i.bak '1s/ *\(.*\)/\#\1/g' *.md
It alters the first line of the first file and, although it processes all of them, leaves the rest of the files unchanged.
What am I missing that will search and replace something on the n-th line of multiple files?
Using bash on OSX 10.7
The problem is that sed by default treats any number of files as a single stream, and thus line-number offsets are relative to the start of the first file.
For GNU sed, you can use the -s (--separate) flag to modify this behavior:
sed -s -i.bak '1s/^ */#/' *.md
...or, with non-GNU sed (including the one on Mac OS X), you can loop over the files and invoke sed once for each:
for f in *.md; do sed -i.bak '1s/^ */#/' "$f"; done
Note that the regex is a bit simplified here -- no need to match parts of the line that you aren't going to change.
xargs will do the trick for you:
http://en.wikipedia.org/wiki/Xargs
Remove the *.md from the end of your sed command, then use xargs to gather your files one at a time and send each of them to your sed command on its own. Sorry, I don't have time to work it out for you, but the Wikipedia article should show you what you need to know.
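Roughly, the idea is something like this (a sketch; -n 1 makes xargs run sed once per file, so the line-1 address applies to each file separately):
printf '%s\0' *.md | xargs -0 -n 1 sed -i.bak '1s/^ */#/'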
sed -rsi.bak '1s/^/#/;s/^[ \t]+//' *.md
You don't need g (global) at the end of the command(s), because you want to replace something at the beginning of the line, not multiple times per line.
You use two commands, one to modify line 1 (1s...), separated by a semicolon from the second command, which removes the leading blanks (and tabs, i.e. \t). Because the # is inserted before the blanks are stripped, the first line keeps its leading blanks after the #; to remove them as well, switch the order:
sed -rsi.bak 's/^[ \t]+//;1s/^/#/' *.md
Remove the \t if you don't need it; then you don't need the bracket expression either:
sed -rsi.bak 's/^ +//;1s/^/#/' *.md
-r is a flag that enables extended regular expressions. With it you don't need to escape the plus in that case.
