Multiline grep with specific text - bash

There is an XML file with a lot of <A_tag>s in it.
I need to see those A tags (including their children, i.e. each tag's whole content) that contain at least one <C_tag>.
So this block should match (and therefore be contained in the result):
<A_tag>
...
...
<C_tag attr1="" ... attrn="" />
...
</A_tag>
I tried using pcregrep, but I don't know how to express a block ending that is longer than one character (</A_tag> is longer than that; something like a [^>] regexp would be easy for me, though).
I also tried awk, but couldn't reach the goal with it either.
If someone experienced could help me, please also make your command separate the found blocks with an empty line; that way I could learn more.

Following up on the xmllint comment:
xmllint --xpath '(//A_tag/C_tag/..)' x.xml
This will look for C_tag under A_tag, and then display the parent A_tag.
Output:
<A_tag>
<C_tag attr1="" attrn=""/>
</A_tag>

Yeah, well in my case, this was the solution:
xmllint --shell x.xml <<< 'xpath //A_tag//C_tag/ancestor::A_tag'
That's because my xmllint version doesn't support the --xpath option.
Also, C_tag could be any descendant of A_tag, not just a direct child (which I didn't clarify in the question).
However, the answer of dash-o seems to be correct.
My only problem is that the XML file I'm working with contains 4.5 million lines, and xmllint turned out to be slow on it, as it actually parses the file.
If you have a more general solution that works with awk or pcregrep, please share it; they would be a good fit here as they just work on patterns.
Otherwise I'll accept the original answer tomorrow.

If the file is pretty-printed (or follows similar rules), it is possible to write a small awk script that only acts on the A_tag and C_tag lines:
awk '
/<A_tag>/   { in_a=$0 ; c="" ; next }     # start collecting a new block
in_a        { in_a = in_a RS $0 }         # append the current line to the block
/<C_tag/    { c=$0 ; next }               # remember that a C_tag was seen
/<\/A_tag>/ { if (in_a && c) { print in_a ; print "" ; in_a="" ; c="" } }   # print the block, plus the requested empty separator line
' x.xml
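For reference, pcregrep's -M option lets a pattern span multiple lines, so something along these lines might also work (only a sketch; being regex-based it assumes the A_tag blocks are never nested, and it does not add the empty separator line):
pcregrep -M '<A_tag>(?s)(?:(?!</A_tag>).)*?<C_tag.*?</A_tag>' x.xml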

Related

remove line in csv file if string found (from another text file) in bash

Due to a power failure issue, I am having to clean up jobs which are run based on text files. So the problem is, I have a text file with strings like so (they are uuids):
out_file.txt (~300k entries)
<some_uuidX>
<some_uuidY>
<some_uuidZ>
...
and a csv like so:
in_file.csv (~500k entries)
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location3/,<some_uuidX>.json.<some_string3>
/path/to/some/location4/,<some_uuidY>.json.<some_string4>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
/path/to/some/location6/,<some_uuidZ>.json.<some_string6>
...
I would like to remove those lines from in_file whose uuid appears in out_file.
The end result:
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
...
Since the file sizes are fairly large, I was wondering if there is an efficient way to do it in bash.
Any tips would be great.
Here is a potential grep solution:
grep -vFwf out_file.txt in_file.csv
And a potential awk solution (likely faster):
awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv
NB there are caveats to each of these approaches. Although they both appear to be suitable for your intended purpose (as indicated by your comment "the numbers add up correctly"), posting a minimal, reproducible example in future questions is the best way to help us help you.
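For readability, the awk one-liner can be spelled out with comments (just a sketch; it relies on the uuids containing neither commas nor dots, so splitting on [,.] isolates them, and filtered.csv is only an illustrative output name):
awk -F'[,.]' '
FNR == NR { seen[$1]; next }   # first file (out_file.txt): record each uuid as a key
!($2 in seen)                  # second file (in_file.csv): print lines whose uuid field is not recorded
' out_file.txt in_file.csv > filtered.csv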

Counting char in word with different delimiter

I am writing a shell script in which I get the location of java via which java. As a response I get (for example)
/usr/pi/java7_32/jre/bin/java
I need the path to be cut so it ends with /jre/, more specifically
/usr/pi/java7_32/jre/
because the program this information is passed to cannot handle the longer path.
I have used cut with / as the delimiter, and as I thought the directory of the Java installation is always the same, a
cut -d'/' -f1-5
worked just fine to get this result:
/usr/pi/java7_32/jre/
But as Java could be installed somewhere else as well, for example at
/usr/java8_64/jre/
the statement would not work correctly.
I have tried sed, awk, cut and different combinations of them, but found no answer I liked.
As the title says, I would count the number of appearances of the character / until the substring jre/ is found, under the premise that the shell counts from left to right.
The resulting count would then be the last field I want to keep when cutting with that delimiter.
path=$(which java)   # example: /usr/pi/java7_32/jre/bin/java
i=1
# walk through the /-separated fields until the "jre" component is found
# (assumes the path really contains a jre component)
while [ "$(echo "$path" | cut -d'/' -f"$i")" != "jre" ]; do
    i=$((i + 1))
done
# cut the path up to and including that field
path=$(echo "$path" | cut -d'/' -f1-"$i")
# result: /usr/pi/java7_32/jre
The problem is the possible variation in the path before and after the jre part, e.g. /java7_64/jre/ or, more generally, */java*/jre/.
I am open to any ideas and solutions, thanks a lot!
Greets
Jan
You can use the shell's built-in parameter expansion to get what you need. (This avoids creating extra processes just to extract the information.)
jpath="$(which java)"
# jpath now /usr/pi/java7_32/jre/bin/java
echo ${jpath%jre*}jre
produces
/usr/pi/java7_32/jre
The same works for
jpath=/usr/java8_64/jre/
The % operator removes the shortest match of the shell glob pattern from the end of the string. Then we just put jre back to get your required path.
You can overwrite the value obtained from which java:
jpath=${jpath%jre*}jre
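As a quick sanity check, the same expansion works for both example paths (just a sketch run in bash):
for jpath in /usr/pi/java7_32/jre/bin/java /usr/java8_64/jre/; do
    echo "${jpath%jre*}jre"
done
# prints /usr/pi/java7_32/jre and /usr/java8_64/jre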
IHTH
You can get the results with grep:
path=$(echo "$path" | grep -o '.*/jre/')

Issue with bash script using SED/AWK for substitution

I have been working on this little script at work to free up my own time and am currently stuck on part of it. The script is supposed to pull some content from a JSON, modify the content, and then re-upload it. The modification part is the portion that doesn't work.
An example of what the content looks like after being extracted from the JSON is:
<p>App1_v1.0_20160911_release.apk</p<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
The modification function is supposed to update the list with the newer app filenames in the same location. I've tried using both SED and AWK to get this to work but I haven't gotten anywhere fast.
Here are examples of both commands and the parameters for the substitution I am trying to run on the example file:
old_name=App1_.*_release.apk
new_name=App1_v1.0_20160920_1152_release.apk
sed "s/$old_name/$new_name/" body > upload
awk -v oldname="$old_name" -v newname="$new_name" '{sub(oldname, newname)}1' body > upload
What ends up happening is the substitution will change the correct part of the list, but then nuke everything between that point and the end of the list.
Thank you for any and all help.
PS: If I didn't explain something correctly or you feel some information is missing, please comment and let me know so I can better explain the problem.
There are SO many possible values of oldname, newname, and your input data that could cause either of the commands you wrote to fail. Don't use that "replace a regexp with a backreference-enabled string" approach in any command; use string operations instead (which means you can't use sed, since sed doesn't support string operations).
This modifies your sample input as you say you want:
$ awk -v new='App1_v1.0_20160920_1152_release.apk' 'BEGIN{RS="</p>\n?"; FS=OFS="<p>"} NR==1{$2=new} {printf "%s%s", $0, RT}' file
<p>App1_v1.0_20160920_1152_release.apk<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
If that's not adequate then edit your question to better explain your requirements and provide more truly representative sample input/output.
The above uses GNU awk for multi-char RS and RT.
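To illustrate the string-operation approach, here is a hedged sketch that uses awk's index() and substr() so that neither value is ever treated as a regexp (it assumes you know the full old filename rather than a pattern, and it only replaces the first occurrence on each line):
old='App1_v1.0_20160911_release.apk'
new='App1_v1.0_20160920_1152_release.apk'
awk -v old="$old" -v new="$new" '
{ i = index($0, old)                                   # literal search, no regexp
  if (i) $0 = substr($0, 1, i-1) new substr($0, i+length(old)) }
1                                                      # print every line
' body > upload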

Update var in ini file using bash

I am attempting to write a bash script to configure various aspects of a server. The context here is replacing a value of a variable in a conf file (ini format) with another value.
The relevant section is
[ssh-iptables]
enabled = false
And I simply need to change false to true.
Typically I'd just do this with a simple bit of sed
sed -i 's/^enabled = false/enabled = true/g' /etc/fail2ban/jail.conf
But enabled = false exists in multiple places.
I've tried using awk with no success
awk -F ":| " -v v1="true" -v opt="enabled" '$1 == "[ssh-iptables]" && !f {f=1}f && $1 == opt{sub("=.*","= "v1);f=0}1' /etc/fail2ban/jail.conf
The above was sourced from a forum thread, but I don't really have enough understanding of how to use it in scripts to make it work. All it seems to do is the equivalent of cat /etc/fail2ban/jail.conf.
I have found a few other scripts which are considerably longer, which isn't ideal as this will happen to loads of ini files, so I'm hoping someone can help me correct the above code or point me in the right direction.
Apologies if this belongs on ServerFault, but as it's scripting rather than the intricacies of server configuration itself I figured here might be more apt.
Assuming your format has no square-bracket lines (like [ssh-iptables]) within sections, I would use your sed approach above, but restrict the substitution to that block, like so:
sed -i '/^\[ssh-iptables\]$/,/^\[/ s/^enabled = false/enabled = true/' /etc/fail2ban/jail.conf
The extra part at the beginning tells the following substitution statement to only run between the line that is [ssh-iptables] and the next one that starts with a [. It uses two regular expressions separated by a comma which indicate the bounds.
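Since this will be applied to loads of ini files, the same idea can be wrapped in a small function for reuse (just a sketch; the function name and arguments are illustrative, and it assumes the "key = value" spacing shown above):
set_ini_value() {   # usage: set_ini_value FILE SECTION KEY VALUE
    local file=$1 section=$2 key=$3 value=$4
    sed -i "/^\[$section\]\$/,/^\[/ s/^$key = .*/$key = $value/" "$file"
}
set_ini_value /etc/fail2ban/jail.conf ssh-iptables enabled true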
If you are open to using external applications, you might be interested in crudini.
Example:
[oauth2provider]
module = SippoServiceOAuth2Provider
backend[] = none
wiface = public
; [calldirection]
; module = SippoServiceCallDirection
; backend[] = none
; wiface = internal
A standard grep will not filter out the commented-out entries.
With crudini, getting, setting and modifying values is easier:
$ crudini --get /myproject/config/main.ini oauth2provider wiface
public
$ crudini --get /myproject/config/main.ini calldirection wiface
Section not found: calldirection
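For the original question, the change would then be a one-liner (a sketch; it assumes crudini is installed and the section really is named [ssh-iptables]):
crudini --set /etc/fail2ban/jail.conf ssh-iptables enabled true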
I was on a bash-only app and moved to this approach. Just a suggestion.
Regards,
You might consider using m4 instead of sed in this case. This uses variable replacement and I think keeps the file looking readable. Your m4 template might look like this:
[ssh-iptables]
enabled=SSH_IPTABLES_ENABLED
Now, you call m4 with the following parameters (which can be called from a bash script):
m4 -DSSH_IPTABLES_ENABLED=true input.m4 > output.ini
or:
m4 -DSSH_IPTABLES_ENABLED=false input.m4 > output.ini
This is an overly simple way of using m4; if you read about it you'll find you can do some really nifty things (this is the infrastructure upon which autoconf/automake was initially designed).
awk '/^\[ssh-iptables\]/ { ok=1 }
     ok == 1 && $0 == "enabled = false" { print "enabled = true"; ok=0; next }
     { print }' infile > tmp
mv tmp infile

Replace last line of XML file

Looking for help creating a script that will replace the last line of an XML file with a tag. I have a few hundred files so I'm looking for something that will process them in a loop. I've managed to rename the files sequentially like this:
posts1.xml
posts2.xml
posts3.xml
etc...
to make it easier to loop through. But I have no idea how to write a script to do this. I'm open to using either Linux or Windows (but I would guess that Linux is better for this kind of task).
So if you want to append a line to every file:
sed -i '$a<YOUR_SHINY_NEW_TAG>' *xml
To replace the last line:
sed -i '$s/.*/<YOUR_SHINY_NEW_TAG>/' *xml
But do note, sed is not the ideal tool to modify xml.
XMLStarlet is a command-line toolkit for performing XML parsing and manipulations. Note that as an XML-aware toolkit, it'll respect XML structure, character encoding and entity substitution.
Check out the ed command to see how to modify documents. You can wrap this in a standard bash loop.
e.g. in a doc consisting of a chain of <elem>s, you can add a following <added>5</added>:
mkdir new
for x in *.xml; do
xmlstarlet ed -a "//elem[count(//elem)]" -t elem -n added -v 5 $x > new/$x
done
Linux way using sed:
To edit the last line of the file in place, you can use sed:
sed -i '$s_pattern_replacement_' filename
To change the whole line to "replacement" use $s_.*_replacement_. Be sure to escape any _'s in replacement with a \.
To loop over files, just use for:
for f in /path/posts*.xml; do sed -i '$s_.*_replacement_' $f; done
This, however, is a dirty way of doing it, as it is not aware of the XML structure, and XML itself does not care about newlines. You have to be sure the last line of the files contains exactly what you expect it to.
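One way to guard against surprises is to check the last line before rewriting it (a sketch; "</oldtag>" is an illustrative value for whatever the files currently end with):
for f in /path/posts*.xml; do
    # only rewrite the last line if it is exactly what we expect
    if [ "$(tail -n 1 "$f")" = "</oldtag>" ]; then
        sed -i '$s_.*_<YOUR_SHINY_NEW_TAG>_' "$f"
    fi
done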
It makes little to no difference whether you're on Linux, Windows or macOS.
The question is what language you want to use.
The following is an example in C# (not optimized, but read it as pseudocode):
string rootDirectory = @"c:\myfiles";
var files = Directory.GetFiles(rootDirectory, "*.xml");
foreach (var file in files)
{
var lines = File.ReadAllLines(file);
lines[lines.Length - 1] = "whatever you want here";
File.WriteAllLines(file, lines);
}
You can compile this and run it on Windows, Linux, etc.
Or you could do the same in Python.
Of course this method does not actually parse the XML, but you just wanted to replace the last line, right?
