Extracting all lines from a file that are not commented out in a shell script - shell

I'm trying to extract lines from certain files that do not begin with # (commented out). How would I run through a file, ignore everything with a # in front of it, but copy each line that does not start with a # into a different file.
Thanks

Simpler: grep -v '^[[:space:]]*#' input.txt > output.txt

This assumes that you're using Unix/Linux shell and the available Unix toolkit of commands AND that you want to keep a copy of the original file.
cp file file.orig
mv file file.fix
sed '/^[ ]*#/d' file.fix > file
rm file.fix
Or if you've got a nice shiny new GNU sed that all be summarized as
cp file file.orig
sed -i '/^[ ]*#/d' file
In both cases, the regexp in the sed command is meant to be [spaceCharTabChar]
So you saying, delete any line that begins with an (optional space or tab chars) #, but print everything else.
I hope this helps.

grep -v ^\# file > newfile
grep -v ^\# file | grep -v ^$ > newfile
Not fancy regex, but I provide this method to Jr. Admins as it helps with understanding of pipes and redirection.

Related

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor.BashFAQ
Unless you enjoy unportable code, I/O overhead and some other bad side effects. So basically some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" $file
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.
You can also use this:
grep -v 'pattern' filename
Here -v will print only other than your pattern (that means invert match).
To get a inplace like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C, it does not seem change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete lines from all files that match the match
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many sql files withe this sed command.
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text info an existing file
cat filename | sed '/text to remove/d' >> newfile
to treat already treated text, in this case remove more lines of what has been removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to replace a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accomodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution is for doing the same operation on multiple file.
for file in *.txt; do grep -v "Matching Text" $file > temp_file.txt; mv temp_file.txt $file; done
I found most of the answers not useful for me, If you use vim I found this very easy and straightforward:
:g/<pattern>/d
Source

Bash script delete a line in the file

I have a file, which has multiple lines.
For example:
a
ab#
ad.
a12fs
b
c
...
I want to use sed or awk delete the line, if the line include symbols or numbers. (For example, I want to delete: ab#, ad., a12fs.... lines)
or in another words, I just want to keep the line which include [a-z][A-Z] .
I know how to delete number line,
sed '/[0-9]/d' file.txt
but I do not know how to delete symbols lines.
Or there has any easy way to do that?
To keep blank lines:
grep '^[[:alpha:]]*$' file
sed '/[^[:alpha:]]/d' file
awk '/^[[:alpha:]]*$/' file
To remove blank lines:
grep '^[[:alpha:]]+$' file
sed -E -n '/^[[:alpha:]]+$/p' file
awk '/^[[:alpha:]]+$/' file
grep works well too and is even simpler: just do the reverse: keep the lines that interest you, which are way easier to define
grep -i '^[a-z]*$' file.txt
(match lines containing only letters and empty lines, and -i option makes grep case-insensitive)
to remove empty lines as well:
grep -i '^[a-z]+$' file.txt
caution when using Windows text files, as there's a carriage return at the end of the line, so nothing would match depending on grep versions (tested on windows here and it works)
but just in case:
grep -iP '^[a-z]*\r?$'
(note the P option to enable perl expressions or \r is not recognized)
You can use this sed:
sed '/^[A-Za-z0-9]\+$/!d' file
(OR)
sed '/[^A-Za-z0-9]/d' file
$ awk '!/[^[:alpha:]]/' file.txt
a
b
c

how to remove <Technology> and </Technology> words from a file using shell script?

My text file contains 100 lines and the text file surely contains Technology and /Technology words .In which ,I want to remove Technology and /Technology words present in the file using shell scripting.
sed -i.bak -e 's#/Technology##g' -e 's#Technology##g' my_text_file
This is delete the words and also make a backup of the original file just in case
sed -i -e 's#/Technology##g' -e 's#Technology##g' my_text_file
This will not make a backup but just modify the original file
You can try this one.
sed -r 's/<[\/]*technology>//g' a
Here is an awk
cat file
This Technology
More here
Mine /Technology must go
awk '{gsub(/\/*Technology/,"")}1' file
This
More here
Mine must go
By adding an extra space in the regex, it will not leave an extra space in the output.
awk '{gsub(/\/*Technology /,"")}1' file
This Technology
More here
Mine must go
To write back to original file
awk '{gsub(/\/*Technology /,"")}1' file > tmp && mv tmp file
If you have gnu awk 4.1+ you can do
awk -i '{gsub(/\/*Technology /,"")}1' file

Remove Lines in Multiple Text Files that Begin with a Certain Word

I have hundreds of text files in one directory. For all files, I want to delete all the lines that begin with HETATM. I would need a csh or bash code.
I would think you would use grep, but I'm not sure.
Use sed like this:
sed -i -e '/^HETATM/d' *.txt
to process all files in place.
-i means "in place".
-e means to execute the command that follows.
/^HETATM/ means "find lines starting with HETATM", and the following d means "delete".
Make a backup first!
If you really want to do it with grep, you could do this:
#!/bin/bash
for f in *.txt
do
grep -v "^HETATM" "%f" > $$.tmp && mv $$.tmp "$f"
done
It makes a temporary file of the output from grep (in file $$.tmp) and only overwrites your original file if the command executes successfully.
Using the -v option of grep to get all the lines that do not match:
grep -v '^HETATM' input.txt > output.txt

sed command creates randomly named files

I recently wrote a script that does a sed command, to replace all the occurrences of "string1" with "string2" in a file named "test.txt".
It looks like this:
sed -i 's/string1/string2/g' test.txt
The catch is, "string1" does not necessarily exist in test.txt.
I notice after executing a bunch of these sed commands, I get a number of empty files, left behind in the directory, with names that look like this:
"sed4l4DpD"
Does anyone know why this might be, and how I can correct it?
-i is the suffix given to the new/output file. Also, you need -e for the command.
Here's how you use it:
sed -i '2' -e 's/string1/string2/g' test.txt
This will create a file called test.txt2 that is the backup of test.txt
To replace the file (instead of creating a new copy - called an "in-place" substitution), change the -i value to '' (ie blank):
sed -i '' -e 's/string1/string2/g' test.txt
EDIT II
Here's actual command line output from a Mac (Snow Leopard) that show that my modified answer (removed space from between the -i and the suffix) is correct.
NOTE: On a linux server, there must be no space between it -i and the suffix.
> echo "this is a test" > test.txt
> cat test.txt
this is a test
> sed -i '2' -e 's/a/a good/' test.txt
> ls test*
test.txt test.txt2
> cat test.txt
this is a good test
> cat test.txt2
this is a test
> sed -i '' -e 's/a/a really/' test.txt
> ls test*
test.txt test.txt2
> cat test.txt
this is a really good test
I wasn't able to reproduce this with a quick test (using GNU sed 4.2.1) -- but strace did show sed creating a file called sedJd9Cuy and then renaming it to tmp (the file named on the command line).
It looks like something is going wrong after sed creates the temporary file and before it's able to rename it.
My best guess is that you've run out of room in the filesystem; you're able to create a new empty file, but unable to write to it.
What does df . say?
EDIT:
I still don't know what's causing the problem, but it shouldn't be too difficult to work around it.
Rather than
sed -i 's/string1/string2/g' test.txt
try something like this:
sed 's/string1/string2/g' test.txt > test.txt.$$ && mv -f test.txt.$$ test.txt
Something is going wrong with the way sed creates and then renames a text file to replace your original file. The above command uses sed as a simple input-output filter and creates and renames the temporary file separately.
So after much testing last night, it turns out that sed was creating these files when trying to operate on an empty string. The way i was getting the array of "$string1" arguments was through a grep command, which seems to be malformed. What I wanted from the grep was all lines containing something of the type "Text here '.'".
For example the string, "Text here 'ABC.DEF'" in a file, should have been caught by grep, then the ABC.DEF portion of the string, would be substituted by ABC_DEF. Unfortunately the grep I was using would catch lines of the type "Text here ''" (that is, nothing between the ''). When later on, the script attempted to perform a sed replacement using this empty string, the random file was created (probably because sed died).
Thanks for all your help in understanding how sed works.
Its better if you do it in this way:
cat large_file | sed 's/string1/string2/g' > file_filtred

Resources