Need to replace integers with floating points in lines containing a specific string in an xml formatted file - bash

For reasons related to app functionality, we need to massage certain data coming into a system by replacing an integer value with a fixed-length decimal value.
Example:
Before
<smile:ordinary code:type="Fields" code:value="25">
After
<smile:ordinary code:type="Fields" code:value="25.000000000">
I tried to use an in-place sed command that replaces with a regex group, such as the one below:
sed -i 's/\(ordinary.*"[0-9]\+\)/\1.000000000/'
This works fine, but there's a file watcher that triggers when the file is modified, and if it receives a file that is already formatted, it ends up adding an extra set of 0s:
<smile:ordinary code:type="Fields" code:value="25.000000000.000000000">
I've also struggled to get this working with awk and printf, but ideally I'd replace the integer strictly with a decimal. I've considered using an XSL transform as well, but I'm not as well versed in XSLT as in shell commands. I'm open to all suggestions, including possibly writing a shell script to loop through each line.

Very easily done in XSLT. It just needs a stylesheet with two rules: the standard identity template, which copies everything unchanged by default, plus a rule like this:
<xsl:template match="smile:ordinary/@code:value">
<xsl:attribute name="code:value">
<xsl:value-of select="format-number(., '#.000000000')"/>
</xsl:attribute>
</xsl:template>
Plus the required namespace declarations, of course.
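If XSLT isn't an option, note that the double decimal in the question happens because the sed pattern matches any run of digits after a quote, even when a decimal part already follows. A minimal Python sketch, assuming the attribute always appears literally as code:value="..." inside a <smile:ordinary ...> start tag on a single line, only rewrites values that are all digits up to the closing quote, so a second pass over an already-converted file changes nothing. The names ORDINARY_INT_VALUE and add_decimals are hypothetical.

```python
import re

# Hypothetical pattern: a <smile:ordinary ...> start tag whose code:value
# attribute is a plain integer. Requiring the closing quote right after the
# digits means "25.000000000" no longer matches, so reruns are no-ops.
ORDINARY_INT_VALUE = re.compile(r'(<smile:ordinary\b[^>]*\bcode:value=")(\d+)(")')


def add_decimals(text: str, places: int = 9) -> str:
    """Append a fixed-length zero fraction to integer code:value attributes."""
    def repl(m: re.Match) -> str:
        return f"{m.group(1)}{m.group(2)}.{'0' * places}{m.group(3)}"

    return ORDINARY_INT_VALUE.sub(repl, text)


if __name__ == "__main__":
    before = '<smile:ordinary code:type="Fields" code:value="25">'
    once = add_decimals(before)
    print(once)
    print(add_decimals(once) == once)  # a second pass leaves the text unchanged
```

The same idea carries over to the sed command from the question: include the closing quote in the pattern, for example sed -i 's/\(ordinary.*value="[0-9]\+\)"/\1.000000000"/' file.xml (GNU sed syntax; file.xml is a placeholder).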

Related

How to remove multi-line blocks of text of varying sizes from a file given the first and last lines and a substring?

I have an xml file listing several games and their metadata, like so:
<?xml version="1.0"?>
<gameList>
<game>
<path>./Besiege.desktop</path>
<name>Besiege</name>
<desc>Long description of game</desc>
<releasedate>20150128T000000</releasedate>
<developer>Spiderling Studios</developer>
<publisher>Spiderling Studios</publisher>
<genre>Strategy</genre>
<players>1</players>
</game>
<A bunch of other entries>
<game>
<path>./67000.The Polynomial.txt</path>
<name>The Polynomial - Space of the music</name>
<desc>Long description of game</desc>
<releasedate>20101015T000000</releasedate>
<developer>Dmytry Lavrov</developer>
<publisher>Dmitriy Uvarov</publisher>
<genre>Shooter, Music</genre>
<players>1</players>
<favorite>true</favorite>
</game>
<Another bunch of entries>
</gameList>
I want to remove every entry that contains the substring ".desktop" and leave all the rest. But just removing the line which contains this string isn't enough, I want to remove the whole block from <game> to </game>.
I know that in Linux, with bash, there are several ways to remove a fixed number of lines before or after a given string. But by comparing the two entries above, you can see that they don't always have the same number of fields. The descriptions inside the "<desc>" tags also vary from one to four paragraphs separated by empty lines. I have not found any solutions that deal with a variable number of lines around a target substring.
I thought there would be an easy way to split the text into blocks from the opening <game> tag to the closing </game> tag, so that I could operate on them the way one normally does with lines. Then a simple while loop that tested for the presence of the substring and deleted the block if it was found, or something similar, would solve my problem. Well, I've been banging my head against grep, sed, and awk, and I've tried setting IFS so that lines would end only at "</game>". I'm growing increasingly frustrated, because I'm almost at the point where doing this manually would have been faster. But then I'd remain ignorant.
I'm only just beginning to learn Bash, so there is so much that I don't know, and I feel like this is the sort of thing that someone more knowledgeable could do with a one-liner, but I'm completely stumped. So thank you for your time, and please point me in the right direction.
Do not use line tools to edit XML files. Do not use Bash to edit XML files. Use XML tools to edit XML files. Write a program in Python, Perl, or another capable programming language with an XML library to edit XML.
The following with xmlstarlet is quite simple:
$ xmlstarlet ed -d '/gameList/game[ contains(path, ".desktop") ]' input.xml
<?xml version="1.0"?>
<gameList>
<game>
<path>./67000.The Polynomial.txt</path>
<name>The Polynomial - Space of the music</name>
<desc>Long description of game</desc>
<releasedate>20101015T000000</releasedate>
<developer>Dmytry Lavrov</developer>
<publisher>Dmitriy Uvarov</publisher>
<genre>Shooter, Music</genre>
<players>1</players>
<favorite>true</favorite>
</game>
</gameList>
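Following the advice to use an XML library, here is a minimal Python sketch with the standard library's xml.etree.ElementTree that does the same thing as the xmlstarlet command. It assumes the structure shown in the question (<game> elements directly under <gameList>, each with a <path> child); the function name is hypothetical, and ElementTree does not write the XML declaration back out unless asked to.

```python
import xml.etree.ElementTree as ET


def remove_desktop_games(xml_text: str) -> str:
    """Drop every <game> under the root whose <path> contains ".desktop"."""
    root = ET.fromstring(xml_text)
    # findall() returns a new list, so removing children from root while
    # looping over it is safe.
    for game in root.findall("game"):
        if ".desktop" in game.findtext("path", default=""):
            root.remove(game)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """<?xml version="1.0"?>
<gameList>
<game><path>./Besiege.desktop</path><name>Besiege</name></game>
<game><path>./67000.The Polynomial.txt</path><name>The Polynomial - Space of the music</name></game>
</gameList>"""
    print(remove_desktop_games(sample))
```

Because the whole <game> element is removed, it doesn't matter how many child fields or description paragraphs it contains.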

Use bash to extract data between two regular expressions while keeping the formatting

I have a question about a small piece of code using the awk command; I have not found an answer or solution anywhere.
I am trying to parse an output file and extract all data from the first expression, ATOMIC (inclusive), up to the second expression, Bond (exclusive). This data is to be sent to a new file, $1_geom. So far I have the following:
`awk '/ATOMIC/{flag=1;next}/Bond lengths in Bohr/{flag=0}flag' $1` >> $1_geom
This script will extract the correct data for me, but there are 2 problems:
The line containing ATOMIC is not extracted with the data.
The data is extracted and appended as a single line. I want the data to retain the formatting of the parsed file (5 columns, a variable number of lines); please see the attached visual example. Is there a different way to append data (other than >>) so that I can keep the formatting?
Any help is appreciated, thank you.
The next statement causes the first matching line (the ATOMIC line) to be skipped; take it out if you don't want that.
The backticks by themselves are a shell syntax error (unless your Awk script happens to produce valid shell commands). I'm guessing you have a useless echo or something like that in your actual script which disarms the error, but instead produces the symptoms you describe.
This was part of a code in a csh script and I did have an "echo" in front of this line. Removing the "echo" makes it work perfectly and addresses the 2 questions that I had.
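For comparison, here is a small Python sketch of what the corrected Awk program does: start collecting at a line containing ATOMIC (inclusive), stop at a line containing "Bond lengths in Bohr" (exclusive), and keep every line exactly as read, so the column layout survives. The function name and defaults are illustrative, not from the original script.

```python
from typing import Iterable, List


def extract_block(lines: Iterable[str], start: str = "ATOMIC",
                  stop: str = "Bond lengths in Bohr") -> List[str]:
    """Return the lines from each `start` match up to, but not including, `stop`."""
    out = []
    inside = False
    for line in lines:
        # Same order as the awk program: set the flag, clear it, then print if set.
        if start in line:
            inside = True
        if stop in line:
            inside = False
        if inside:
            out.append(line)  # stored unchanged, so spacing and columns are kept
    return out
```

Used on an open file, sys.stdout.writelines(extract_block(open(path))) writes the block back out with its original line breaks. The shell equivalent, with next removed and no echo or backticks, is awk '/ATOMIC/{flag=1} /Bond lengths in Bohr/{flag=0} flag' "$1" >> "$1_geom" in a POSIX shell.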

XSLT sort Alphabetically then Numerically

By default, XSLT sorts numerically, then alphabetically.
What if I want to sort alphabetically first, then numerically?
<root>
<item>B3</item>
<item>A1</item>
<item>C2</item>
<item>3B</item>
<item>2C</item>
<item>1A</item>
</root>
I'd want:
<root>
<item>A1</item>
<item>B3</item>
<item>C2</item>
<item>1A</item>
<item>2C</item>
<item>3B</item>
</root>
The thing is, I don't know how long the names are or how many letters or numbers they contain. It could be 1054-FS or C104-G. Also, C20-H should come before C101-H.
Is that something easy to achieve without knowing what will be pushed through?
Thanks.
I think that what you are saying is that you want a collation sequence in which letters precede digits.
If you want C20 to precede C101 then you are also looking for a collation that groups consecutive digits and sorts them as numbers.
It's incorrect to say that XSLT (always) sorts digits before letters. The default collation depends on the XSLT processor you are using, it's not defined in the language specification.
In XSLT 3.0, and in Saxon 9.6, you can use the Unicode Collation Algorithm to achieve this effect. You would write
<xsl:sort select="..."
collation="http://www.w3.org/2013/collation/UCA?numeric=yes;reorder=Latn,digit"/>
(I haven't tested this specific example).
If you're using some other processor, you'll have to check its documentation to see what collation options it provides. You can sometimes play tricks for example by using <xsl:sort select="translate(....)"/> to compute a sort key that works the way you want.
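To make the requested order concrete, here is a Python sketch of a sort key with that behaviour: each value is split into runs of digits and non-digits, letter runs sort before digit runs at the same position, and digit runs compare as numbers. This only illustrates the collation being described; it is not an XSLT feature, the function name is hypothetical, and upper- versus lower-case letters are left to plain string comparison.

```python
import re
from typing import List, Tuple, Union


def letters_first_key(value: str) -> List[Tuple[int, Union[str, int]]]:
    """Sort key: non-digit runs before digit runs, digit runs compared numerically."""
    key: List[Tuple[int, Union[str, int]]] = []
    # Each match fills exactly one group: (digits, "") or ("", text).
    for digits, text in re.findall(r"(\d+)|(\D+)", value):
        # The leading 0/1 puts letter runs first and keeps str and int apart.
        key.append((1, int(digits)) if digits else (0, text))
    return key


if __name__ == "__main__":
    items = ["B3", "A1", "C2", "3B", "2C", "1A"]
    print(sorted(items, key=letters_first_key))
    # ['A1', 'B3', 'C2', '1A', '2C', '3B']
    print(sorted(["C101-H", "C20-H", "1054-FS", "C104-G"], key=letters_first_key))
    # ['C20-H', 'C101-H', 'C104-G', '1054-FS']
```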

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
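The same replace-or-append logic can be followed step by step in Python. This is a hedged sketch that, like the sed command's s/)$/.../, assumes each line ends with the closing parenthesis of the option list; the function name is hypothetical.

```python
import re


def force_anon_ids(line: str, value: str = "-1") -> str:
    """Ensure anonuid and anongid are both present and set to `value`."""
    for key in ("anonuid", "anongid"):
        existing = re.compile(key + r"=[-0-9]*")
        if existing.search(line):
            # Option already present: overwrite whatever value it has.
            line = existing.sub(f"{key}={value}", line)
        else:
            # Option missing: append it just before the closing parenthesis.
            line = re.sub(r"\)$", f",{key}={value})", line)
    return line
```

As in the sed version, an option that already has the right value is simply rewritten with the same text, which keeps the logic short without changing the result.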
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes; -n suppresses sed's default output so that only those lines are printed. I negated the expression by using the !. This way, you're not matching lines that already have both anonuid and anongid in them.
Now, you can work on the non-matching lines and editing those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appears in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This looks for the no_root_squash string and replaces everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting no_root_squash,anonuid=-1,anongid=-1) with the same no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It captures that portion of the matching regular expression and then lets you reuse it via \1. I'm capturing from the word sync through a group of characters not including a comma or a closing parenthesis. Then, I'm appending the capture group, a comma, and your anon uid and gid.
Will that work?
Maybe I am oversimplifying:
sed 's/,anonuid=[-0-9]*//g;s/,anongid=[-0-9]*//g;s/)$/,anonuid=-1,anongid=-1)/' test.txt > test3.txt
This just drops any current instance of anonuid or anongid (together with its leading comma) and adds the string ",anonuid=-1,anongid=-1" inside the parentheses.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^, but not replace it.
I can then find the first characters on a line, but I can't replace them in a way that keeps them intact.
Unfortunately, I don't have access to a tool like cut since I am on a Windows machine, so is there any way to do what I want with just regular expressions?
Use Notepad++. It offers a way to record a sequence of actions, which can then be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
using System;
using System.Text.RegularExpressions;
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.
