use sed with for loop to delete lines from text file - bash

I am essentially trying to use sed to remove a few lines within a text document. To clean it up. But I'm not getting it right at all. Missing something and I have no idea what...
#!/bin/bash
items[0]='X-Received:'
items[1]='Path:'
items[2]='NNTP-Posting-Date:'
items[3]='Organization:'
items[4]='MIME-Version:'
items[5]='References:'
items[6]='In-Reply-To:'
items[7]='Message-ID:'
items[8]='Lines:'
items[9]='X-Trace:'
items[10]='X-Complaints-To:'
items[11]='X-DMCA-Complaints-To:'
items[12]='X-Abuse-and-DMCA-Info:'
items[13]='X-Postfilter:'
items[14]='Bytes:'
items[15]='X-Original-Bytes:'
items[16]='Content-Type:'
items[17]='Content-Transfer-Encoding:'
items[18]='Xref:'
for f in "${items[#]}"; do
sed '/${f}/d' "$1"
done
What I am thinking, incorrectly it seems, is that I can setup a for loop to check each item in the array that I want removed from the text file. But it's simply not working. Any idea. Sure this is basic and simple and yet I can't figure it out.
Thanks,
Marek

Much better to create a single sed script, rather than generate 19 small scripts in sequence.
Fortunately, generating a script by joining the array elements is moderately easy in Bash:
regex=$(printf '\|%s' "${items[#]}")
regex=${regex#'\|'}
sed "/^$regex/d" "$1"
(Notice also the addition of ^ to the final regex -- I assume you only want to match at beginning of line.)
Properly, you should not delete any lines from the message body, so the script should leave anything after the first empty line alone:
sed "1,/^\$/!b;/$regex/d" "$1"
Add -i if you want in-place editing of the target file.

Related

Substitution of substring doesn't work in bash (tried sed, ${a/b/c/})

Before to write, of course I read many other similar cases. Example I used #!/bin/bash instead of #!/bin/sh
I have a very simple script that reads lines from a template file and wants to replace some keywords with real data. Example the string <NAME> will be replaced with a real name. In the example I want to replace it with the word Giuseppe. I tried 2 solutions but they don't work.
#!/bin/bash
#read the template and change variable information
while read LINE
do
sed 'LINE/<NAME>/Giuseppe' #error: sed: -e expression #1, char 2: extra characters after command
${LINE/<NAME>/Giuseppe} #error: WORD(*) command not found
done < template_mail.txt
(*) WORD is the first word found in the line
I am sorry if the question is too basic, but I cannot see the error and the error message is not helping.
EDIT1:
The input file should not be changed, i want to use it for every mail. Every time i read it, i will change with a different name according to the receiver.
EDIT2:
Thanks your answers i am closer to the solution. My example was a simplified case, but i want to change also other data. I want to do multiple substitutions to the same string, but BASH allows me only to make one substitution. In all programming languages i used, i was able to substitute from a string, but BASH makes this very difficult for me. The following lines don't work:
CUSTOM_MAIL=$(sed 's/<NAME>/Giuseppe/' template_mail.txt) # from file it's ok
CUSTOM_MAIL=$(sed 's/<VALUE>/30/' CUSTOM_MAIL) # from variable doesn't work
I want to modify CUSTOM_MAIL a few times in order to include a few real informations.
CUSTOM_MAIL=$(sed 's/<VALUE1>/value1/' template_mail.txt)
${CUSTOM_MAIL/'<VALUE2>'/'value2'}
${CUSTOM_MAIL/'<VALUE3>'/'value3'}
${CUSTOM_MAIL/'<VALUE4>'/'value4'}
What's the way?
No need to do the loop manually. sed command itself runs the expression on each line of provided file:
sed 's/<NAME>/Giuseppe/' template_mail.txt > output_file.txt
You might need g modifier if there are more appearances of the <NAME> string on one line: s/<NAME>/Giuseppe/g

Running sed ON a variable in bash script

Apologies for a seemingly inane question. But I have spent the whole day trying to figure it out and it drives me up the walls. I'm trying to write a seemingly simple bash script that would take a list of files in the directory from ls, replace part of the file names using sed, get unique names from the list and pass them onto some command. Like so:
inputs=`ls *.ext`
echo $inputs
test1_R1.ext test1_R2.ext test2_R1.ext test2_R2.ext
Now I would like to put it through sed to replace 1.ext and 2.ext with * to get test1_R* etc. Then I'd like to remove resulting duplicates by running sort -u to arrive to the following $outputs variable:
echo $outputs
test1_R* test2_R*
And pass this onto a command, like so
cat $outputs
I can do something like this in a command line:
ls *.ext | sed s/..ext/\*/g | sort -u
But if I try to assign the above to a variable in the script it just returns the output from the ls. I have tried several ways to do it: including the whole pipe in the script. Running each command separately and assigning it to a variable, then passing that variable to the next command and writing the outputs to files then passing the file to the next command. But so far none of this managed to achieve what I aimed to. I think my problem lies in (except general cluelessness aroung bash scripting) inability to run seq on a variable within script. There seems to be a lot of advice around in how to pass variables to pattern or replacement string in sed, but they all seem to take files as input. But I understand that it might not be the proper way of doing it anyway. Therefore I would really appreciate if someone could suggest an elegant way to achieve, what I'm trying to.
Many thanks!
Update 2/06/2014
Hi Barmar, thanks for your answer. Can't say it solved the problem, but it helped pin-pointing it. Seems like the problem is in me using the asterisk. I have to say, I'm very puzzled. The actual file names I've got are:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
If I'm using the code you suggested, which seems to me the right way do to it:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/\*/g' | sort -u)
Sed doesn't seem to do anything and I'm getting the output of ls:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
Now if I replace that backslash with anything else, the sed works, but it also returns whatever character I'm putting in front (or after) the asteriks:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/"*/g' | sort -u)
test1_R"* test2_R"*
That's odd enough, but surely I can just put an "R" in front of the asteriks and then replace R in the search pattern string, right? Wrong! If I do that whichever way: 's/R..fastq.gz/R*/g' 's/...fastq.gz/R*/g' 's/[A-Z]..fastq.gz/R*/g' I'm back to the original names! And even if I end up with something like test1_RR* test2_RR* and try to run it through sed again and replace "_R" for "_" or "RR" for "R", I'm having no luck and I'm back to the original names. And yet I can replace the rest of the file name no problem, just not to get me test1_R* I need.
I have a feeling I should be escaping that * in some very clever way, but nothing I've tried seems to work. Thanks again for your help!
This is how you capture the result of the whole pipeline in a variable:
var=$(ls *.ext | sed s/..ext/\*/g | sort -u)

Remove block-comments from a file with a bash script

There is a way to remove from a file all rows wrapped between /* and */ using a bash script?
I use percona to generate a sql script to syncronize two databases, a development one to a production one. Percona generates a well formatted SQL script but full of comments which make increase file size. So, just to make easier upload operation I'd prefer to remove all the unnecessary.
EDIT ON January 10th
I solved with this code:
sed -r ':a; s%(.*)/\*.*\*/%\1%; ta; /\/\*/ !b; N; ba' <FILE_TO_CLEAN>
thanks all
Using sed:
sed '/\/\*.*\*\// d; /\/\*/,/\*\// d' file
The command d tells sed to delete patterns matching the preceeding expression. The first expression /\/\*.*\*\// matches one-line comments, the second one /\/\*/,/\*\// comments that range multiple lines (this is implied by the ,).
I don't know if this works 100%, but as far as I tried, it did the job.
-Try this script- it should help removing the comments, since are the same as C++ Here you can see another sed example to remove HTML comments

Replace last line of XML file

Looking for help creating a script that will replace the last line of an XML file with a tag. I have a few hundred files so I'm looking for something that will process them in a loop. I've managed to rename the files sequentially like this:
posts1.xml
posts2.xml
posts3.xml
etc...
to make it easier to loop through. But I have no idea how to write a script to do this. I'm open to using either Linux or Windows (but i would guess that Linux is better for this kind of task).
So if you want to append a line to every file:
sed -i '$a<YOUR_SHINY_NEW_TAG>' *xml
To replace the last line:
sed -i '$s/.*/<YOUR_SHINY_NEW_TAG>/' *xml
But do note, sed is not the ideal tool to modify xml.
XMLStarlet is a command-line toolkit for performing XML parsing and manipulations. Note that as an XML-aware toolkit, it'll respect XML structure, character encoding and entity substitution.
Check out the ed command to see how to modify documents. You can wrap this in a standard bash loop.
e.g. in a doc consisting of a chain of <elem>s, you can add a following <added>5</added>:
mkdir new
for x in *.xml; do
xmlstarlet ed -a "//elem[count(//elem)]" -t elem -n added -v 5 $x > new/$x
done
Linux way using sed:
To edit the last line of the file in place, you can use sed:
sed -i '$s_pattern_replacement_' filename
To change the whole line to "replacement" use $s_.*_replacement_. Be sure to escape any _'s in replacement with a \.
To loop over files, just use for:
for f in /path/posts*.xml; do sed -i '$s_.*_replacement_' $f; done
This, however, is a dirty way as it's not aware of the XML structure, whereas the XML structure is not affected by newlines. You have to be sure the last line of the files contains exactly what you expect it to.
It makes little to no difference whether you're on Linux, Windows or MacOS
The question is what language do you want to use?
The following is an example in c# (not optimized, but read it as speudocode):
string rootDirectory = #"c:\myfiles";
var files = Directory.GetFiles(rootDirectory, "*.xml");
foreach (var file in files)
{
var lines = File.ReadAllLines(file);
lines[lines.Length - 1] = "whatever you want here";
File.WriteAllLines(file, lines);
}
You can compile this and run it on Windows, Linux, etc..
Or you could do the same in Python.
Of course this method does not actually parse the XML,
but you just wanted to replace the last line right?

bash templating

i have a template, with a var LINK
and a data file, links.txt, with one url per line
how in bash i can substitute LINK with the content of links.txt?
if i do
#!/bin/bash
LINKS=$(cat links.txt)
sed "s/LINKS/$LINK/g" template.xml
two problem:
$LINKS has the content of links.txt without newline
sed: 1: "s/LINKS/http://test ...": bad flag in substitute command: '/'
sed is not escaping the // in the links.txt file
thanks
Use some better language instead. I'd write a solution for bash + awk... but that's simply too much effort to go into. (See http://www.gnu.org/manual/gawk/gawk.html#Getline_002fVariable_002fFile if you really want to do that)
Just use any language where you don't have to mix control and content text. For example in python:
#!/usr/bin/env python
links = open('links.txt').read()
template = open('template.xml').read()
print template.replace('LINKS', links)
Watch out if you're trying to force sed solution with some other separator - you'll get into the same problems unless you find something disallowed in urls (but are you verifying that?) If you don't, you already have another problem - links can contain < and > and break your xml.
You can do this using ed:
ed template.xml <<EOF
/LINKS/d
.r links.txt
w output.txt
EOF
The first command will go to the line
containing LINKS and delete it.
The second line will insert the
contents of links.txt on the current
line.
The third command will write the file
to output.txt (if you omit output.txt
the edits will be saved to
template.xml).
Try running sed twice. On the first run, replace / with \/. The second run will be the same as what you currently have.
The character following the 's' in the sed command ends up the separator, so you'll want to use a character that is not present in the value of $LINK. For example, you could try a comma:
sed "s,LINKS,${LINK}\n,g" template.xml
Note that I also added a \n to add an additional newline.
Another option is to escape the forward slashes in $LINK, possibly using sed. If you don't have guarantees about the characters in $LINK, this may be safer.

Resources