Looking for help creating a script that will replace the last line of an XML file with a tag. I have a few hundred files so I'm looking for something that will process them in a loop. I've managed to rename the files sequentially like this:
posts1.xml
posts2.xml
posts3.xml
etc...
to make it easier to loop through. But I have no idea how to write a script to do this. I'm open to using either Linux or Windows (though I would guess that Linux is better suited to this kind of task).
So if you want to append a line to every file:
sed -i '$a<YOUR_SHINY_NEW_TAG>' *xml
To replace the last line:
sed -i '$s/.*/<YOUR_SHINY_NEW_TAG>/' *xml
But do note, sed is not the ideal tool to modify xml.
XMLStarlet is a command-line toolkit for performing XML parsing and manipulations. Note that as an XML-aware toolkit, it'll respect XML structure, character encoding and entity substitution.
See its ed (edit) subcommand, xmlstarlet ed, for modifying documents; it is unrelated to the ed line editor. You can wrap the call in a standard bash loop.
e.g. in a doc consisting of a chain of <elem>s, you can add a following <added>5</added>:
mkdir new
for x in *.xml; do
xmlstarlet ed -a "//elem[count(//elem)]" -t elem -n added -v 5 "$x" > "new/$x"
done
Linux way using sed:
To edit the last line of the file in place, you can use sed:
sed -i '$s_pattern_replacement_' filename
To change the whole line to "replacement" use $s_.*_replacement_. Be sure to escape any _'s in replacement with a \.
To loop over files, just use for:
for f in /path/posts*.xml; do sed -i '$s_.*_replacement_' "$f"; done
This, however, is a crude approach: sed works line by line and knows nothing about XML structure, while XML itself does not care where the line breaks fall. You have to be sure the last line of every file contains exactly what you expect.
It makes little to no difference whether you're on Linux, Windows or macOS.
The question is which language you want to use.
The following is an example in C# (not optimized, but read it as pseudocode):
using System.IO;  // Directory and File live here

string rootDirectory = @"c:\myfiles";
var files = Directory.GetFiles(rootDirectory, "*.xml");
foreach (var file in files)
{
var lines = File.ReadAllLines(file);
lines[lines.Length - 1] = "whatever you want here";
File.WriteAllLines(file, lines);
}
You can compile this and run it on Windows, Linux, etc.
Or you could do the same in Python.
Of course, this method does not actually parse the XML, but you just wanted to replace the last line, right?
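For the "same in Python" suggestion, here is a minimal sketch. The function names and the posts*.xml default pattern are my own choices for illustration, not something from the answers above, and like the C# version it treats the file as plain text rather than parsing the XML.

```python
from pathlib import Path


def replace_last_line(path, new_line):
    """Replace the final line of a text file with new_line.

    Line endings are normalized to "\n" on write.
    """
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        return  # leave empty files untouched
    lines[-1] = new_line
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def replace_last_line_in_all(directory, new_line, pattern="posts*.xml"):
    """Apply replace_last_line to every file in directory matching pattern."""
    for path in sorted(Path(directory).glob(pattern)):
        replace_last_line(path, new_line)
```

Something like replace_last_line_in_all("/path", "</posts>") would then process every matching file; the tag here is only a placeholder. Work on a copy of the files first, since the edit is in place.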
Related
I have the following Linux command which I am using to extract data from one very large log file.
sed -n "/<trade>/,/<\/trade>/p" Large.log > output.xml
However, the output is generated in a single file output.xml. My intention is to create a new file every time the "/<trade>/,/<\/trade>/p" is matched. Every new file will be named after the <id> tag which is inside the <trade> </trade> tags.
Something like this...
sed -n "/<trade>/,/<\/trade>/p" Large.log > "/<id>/,/<\/id>/p".xml
However, that, of course, does not work and I am not sure how to apply a regex as a naming rule.
P.S At this point, I am also not sure if I should use sed or maybe I should try achieving this with awk
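One possible direction, sketched in Python rather than sed or awk: read the log line by line, collect each <trade> ... </trade> block, and write it to a file named after its <id>. This assumes each block contains one <id> element and that the opening and closing trade tags sit on different lines; the function name split_trades and the filename sanitizing are illustrative assumptions.

```python
import re
from pathlib import Path

ID_RE = re.compile(r"<id>\s*(.*?)\s*</id>", re.DOTALL)


def split_trades(lines, out_dir):
    """Write each <trade>...</trade> block to out_dir/<id>.xml.

    lines is any iterable of lines (e.g. an open file), so a very large
    log is never held in memory as a whole.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    block = None  # None while we are outside a <trade> block
    for line in lines:
        if block is None and "<trade>" in line:
            block = []
        if block is not None:
            block.append(line)
            if "</trade>" in line:
                text = "".join(block)
                block = None
                match = ID_RE.search(text)
                if match is None:
                    continue  # no <id> found, skip this block
                # Keep only filename-safe characters from the id.
                name = re.sub(r"[^A-Za-z0-9._-]", "_", match.group(1))
                target = out_dir / f"{name}.xml"
                target.write_text(text, encoding="utf-8")
                written.append(target)
    return written
```

Usage would be along the lines of opening Large.log and calling split_trades(log_file, "trades"). Two blocks with the same id would overwrite each other, so check that ids are unique.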
I have a pretty large .txt file with data (8MB) and the data lines are separated with the character F.
To analyze this data I need to replace the letter F with a line break (Return).
This is how my file looks:
-0.27, -0.21, 9.56, 78.86, 47.79, 0.02F0.07, -0.35, 9.47, 78.73, 47.74, 0.05F-0.20, -0.43, 10.60, 79.00, 47.79, 0.07F-0.49, -0.14, 10.44, 76.84, 47.70, 0.10.. and so on
This is how it should look:
-0.27, -0.21, 9.56, 78.86, 47.79, 0.02
0.07, -0.35, 9.47, 78.73, 47.74, 0.05
-0.20, -0.43, 10.60, 79.00, 47.79, 0.07
-0.49, -0.14, 10.44, 76.84, 47.70, 0.10
... and so on
I have macOS and Windows available. I already tried it with Excel, but the file seems to be too large and Excel just crashes. Any advice?
Try EditPad Lite on Windows. It's a Notepad-style editor that can handle big files.
You have to enable regular expressions (Search -> Search Options) for this to work. After that you can open search and replace, and replace F with \r\n (a Windows line break).
You can use TextEdit on a Mac with its find and replace option. It was very fast in my test: a 5 MB file took a few seconds. See the earlier Ask Different question 'How to use find and replace to replace a character with new line' for how to enter a newline character in the find and replace dialog.
On macOS, give this a try.
Using the tr (translate characters) command:
tr F '\n' < input.txt > output.txt
The result is stored in a separate file. If you don't need a new file, remove > output.txt from the command and the result is printed to the console instead.
Using the sed (stream editor) command:
sed -i '' $'s/F/\\\n/g' test.txt
The sed command performs the same operation using a regex, but it replaces the contents of the original file. To keep a backup of the file, pass an extension as the argument to -i (e.g. -i '.backup' creates the backup file test.txt.backup).
For more info, run man tr and man sed in your Mac terminal.
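If neither editor nor shell is convenient, the same replacement can be sketched in Python, which behaves the same on macOS and Windows. The function name split_records and the chunk size are my own assumptions; reading in chunks is safe here only because the separator is a single character.

```python
def split_records(src, dst, sep="F", chunk_size=1 << 20):
    """Copy src to dst, turning every occurrence of sep into a newline.

    The file is processed in chunks, so size is not a concern. This only
    works as written for a single-character separator, which can never be
    split across two chunks.
    """
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk.replace(sep, "\n"))
```

Calling split_records("input.txt", "output.txt") mirrors the tr command above. Like tr, it replaces every F, so it relies on F never appearing inside the data itself.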
I want to find and replace the VALUE in an XML file:
<test name="NAME" value="VALUE"/>
I have to filter by name (because there are a lot of lines like that).
Is it possible?
Thanks for your help.
Since you tagged the question "bash", I assume that you're not trying to use an XML library (although I think an XML expert might be able to give you something like an XSLT processor command that solves this question very robustly), but that you're simply interested in doing search & replace from the commandline.
I am using perl for this:
perl -pi -e 's#VALUE#replacement#g' *.xml
See the perlrun man page. Put very briefly: -p makes perl loop over the input line by line and print each line after processing, -i stands for "in-place", and -e lets you specify an expression to apply to every line of input.
Also note (if you are not too familiar with that already) that you may use other characters than # (common ones are %, a comma, etc.) that don't clash with your search & replacement strings.
There is one small caveat: perl will read & write all files given on the commandline, even those that did not change. Thus, the files' modification times will be updated even if they did not change. (I usually work around that with some more shell magic, e.g. using grep -l or grin -l to select files for perl to work on.)
EDIT: If I understand your comments correctly, you also need help with the regular expression to apply. Let me briefly suggest something like this then:
perl -pi -e 's,(name="NAME" value=)"[^"]*",$1"NEWVALUE",g' *.xml
Related: bash XHTML parsing using xpath
You can use sed:
sed 's/\(<test name="NAME"\) value="VALUE"/\1 value="YourValue"/' test.xml
where test.xml is the XML document containing the given node. This is very fragile, and you can work to make it more flexible if you need to do this substitution multiple times. For instance, the expression is case sensitive, so it won't substitute the value on a node with name="name"; with GNU sed you can add the case-insensitivity flag I to the end of the expression, like so:
sed 's/\(<test name="NAME"\) value="VALUE"/\1 value="YourValue"/I' test.xml
Another option would be to use XSLT, but it would require you to download an external library. It's pretty versatile, and could be a viable option for more complex modifications to an XML document.
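A middle ground between regex substitution and XSLT is a small XML-aware script. The sketch below uses Python's standard xml.etree.ElementTree; the function name set_test_value is hypothetical, and I am assuming the elements are called test with name and value attributes as in the question.

```python
import xml.etree.ElementTree as ET


def set_test_value(xml_path, name, new_value, out_path=None):
    """Set value=new_value on every <test> element whose name matches.

    Returns the number of elements changed. Writes back to xml_path
    unless out_path is given.
    """
    tree = ET.parse(xml_path)
    changed = 0
    for elem in tree.iter("test"):
        if elem.get("name") == name:
            elem.set("value", new_value)
            changed += 1
    tree.write(out_path or xml_path, encoding="utf-8", xml_declaration=True)
    return changed
```

Be aware that ElementTree rewrites the whole document: comments are dropped by the default parser and the formatting may change, so this is only a fit when a normalized output file is acceptable.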
I am essentially trying to use sed to remove a few lines from a text document, to clean it up. But I'm not getting it right at all. I'm missing something and I have no idea what...
#!/bin/bash
items[0]='X-Received:'
items[1]='Path:'
items[2]='NNTP-Posting-Date:'
items[3]='Organization:'
items[4]='MIME-Version:'
items[5]='References:'
items[6]='In-Reply-To:'
items[7]='Message-ID:'
items[8]='Lines:'
items[9]='X-Trace:'
items[10]='X-Complaints-To:'
items[11]='X-DMCA-Complaints-To:'
items[12]='X-Abuse-and-DMCA-Info:'
items[13]='X-Postfilter:'
items[14]='Bytes:'
items[15]='X-Original-Bytes:'
items[16]='Content-Type:'
items[17]='Content-Transfer-Encoding:'
items[18]='Xref:'
for f in "${items[@]}"; do
sed '/${f}/d' "$1"
done
What I am thinking, incorrectly it seems, is that I can set up a for loop to check each item in the array that I want removed from the text file. But it's simply not working. Any ideas? I'm sure this is basic and simple, and yet I can't figure it out.
Thanks,
Marek
Much better to create a single sed script, rather than generate 19 small scripts in sequence.
Fortunately, generating a script by joining the array elements is moderately easy in Bash:
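regex=$(printf '\|%s' "${items[@]}")
regex=${regex#'\|'}
sed "/^\($regex\)/d" "$1"
(Notice also the addition of ^ to the final regex -- I assume you only want to match at beginning of line. The \( \) grouping makes the anchor apply to every alternative rather than only the first; \| is a GNU sed extension.)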
Properly, you should not delete any lines from the message body, so the script should leave anything after the first empty line alone:
sed "1,/^\$/!b;/^\($regex\)/d" "$1"
Add -i if you want in-place editing of the target file.
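The same idea (drop unwanted headers, but only above the first empty line) can be sketched in Python for comparison. The function name strip_headers is my own, and the handling of folded continuation lines is an addition that the sed version above does not do.

```python
def strip_headers(lines, unwanted):
    """Yield lines, skipping header lines that start with an unwanted prefix.

    Only the header section (everything before the first empty line) is
    filtered. Folded continuation lines (starting with a space or tab)
    that belong to a dropped header are dropped as well.
    """
    unwanted = tuple(unwanted)
    in_headers = True
    dropping = False
    for line in lines:
        if in_headers:
            if line.rstrip("\r\n") == "":
                in_headers = False  # empty line: the body starts here
            elif line[0] in " \t":
                if dropping:
                    continue  # continuation of a dropped header
            else:
                dropping = line.startswith(unwanted)
                if dropping:
                    continue
        yield line
```

With the items array from the question turned into a Python list, something like sys.stdout.writelines(strip_headers(open(path), items)) would print the cleaned message.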
I have a template containing a placeholder, LINKS,
and a data file, links.txt, with one URL per line.
How can I substitute LINKS with the content of links.txt in bash?
If I do
#!/bin/bash
LINKS=$(cat links.txt)
sed "s/LINKS/$LINKS/g" template.xml
I get two problems:
$LINKS has the content of links.txt without newlines
sed: 1: "s/LINKS/http://test ...": bad flag in substitute command: '/'
sed is not escaping the // from the links.txt file
Thanks
Use some better language instead. I'd write a solution for bash + awk... but that's simply too much effort to go into. (See http://www.gnu.org/manual/gawk/gawk.html#Getline_002fVariable_002fFile if you really want to do that)
Just use any language where you don't have to mix control and content text. For example in python:
#!/usr/bin/env python3
# Read both files fully, then substitute the placeholder as plain text.
with open('links.txt') as f:
    links = f.read()
with open('template.xml') as f:
    template = f.read()
print(template.replace('LINKS', links))
Watch out if you're trying to force a sed solution with some other separator: you'll run into the same problem unless you pick a character that is disallowed in URLs (but are you verifying that?). If you don't verify, you already have another problem: links can contain < and > and break your XML.
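To address the concern about characters that would break the XML, here is a hedged variant that escapes each link before substituting it. The function name fill_template is hypothetical; escape comes from Python's standard xml.sax.saxutils and handles &, < and >.

```python
from xml.sax.saxutils import escape


def fill_template(template_text, links, placeholder="LINKS"):
    """Replace placeholder with the XML-escaped links, one per line.

    links is any iterable of strings (e.g. an open links.txt); blank
    lines are skipped and surrounding whitespace is stripped.
    """
    escaped = "\n".join(
        escape(link.strip()) for link in links if link.strip()
    )
    return template_text.replace(placeholder, escaped)
```

Because the replacement is plain string substitution, the slashes in the URLs need no special treatment. This assumes the placeholder sits in element content; inside an attribute value, quotes would need escaping as well.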
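You can do this using ed:
ed -s template.xml <<EOF
/LINKS/r links.txt
?LINKS?d
w output.txt
q
EOF
The first command finds the line containing LINKS and inserts the contents of links.txt after it.
The second command searches backwards for the LINKS line and deletes it (this assumes links.txt itself contains no line with LINKS). Deleting first and then running .r links.txt would insert the links one line too late, because after d the current line is the line that followed the deleted one.
The third command writes the result to output.txt (if you omit output.txt, the edits are saved to template.xml), and q quits.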
Try running sed twice. On the first run, replace / with \/ in the links. The second run is then the same as what you currently have.
The character following the s in the sed command becomes the separator, so you'll want to use a character that does not occur in the value of $LINKS. For example, you could try a comma:
sed "s,LINKS,${LINKS}\n,g" template.xml
Note that I also added a \n to append an extra newline.
Another option is to escape the forward slashes in $LINKS, possibly using sed. If you don't have guarantees about the characters in $LINKS, this may be safer.