I have hundreds of XML files to process. Some have a particular desired tag, some don't. If I just add the tag to all files, then some files get two tags (no surprises there!). How do I do it in xmlstarlet without a clumsy grep to select the files to work on? For example:
I have this in some files:
...
<parent_tag>
<another_tag/> <-- but not in all files
</parent_tag>
I want this (but some files already have it):
...
<parent_tag>
<good_tag>foobar</good_tag>
<another_tag/>
</parent_tag>
For example, this works, but I wish I could do it entirely in xmlstarlet without the grep:
grep -L good_tag */config.xml | while IFS= read -r i; do
xmlstarlet ed -P -S -s //parent_tag -t elem -n good_tag -v "" "$i" > tmp || break
\cp tmp "$i"
done
I got myself tangled up in some XPATH exoticism like:
xmlstarlet sel --text --template --match //parent_tag --match "//parent_tag/node()[not(self::good_tag)]" -f --nl */config.xml
... but it's not doing what I had hoped ...
Just select only <parent_tag/> elements which do not contain a <good_tag/> for inserting:
xmlstarlet ed -P -S -s '//parent_tag[not(good_tag)]' -t elem -n good_tag -v ""
If you also want to test for the right contents of the tag:
xmlstarlet ed -P -S -s '//parent_tag[not(good_tag[.="foobar"])]' -t elem -n good_tag -v ""
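Because the predicate turns the edit into a no-op for files that already have the tag, both the grep and the temporary file can go. A minimal sketch, assuming an xmlstarlet build whose ed subcommand supports -L (--inplace); the function name add_good_tag is made up for illustration, and the value foobar is taken from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add <good_tag>foobar</good_tag> to every file given
# as an argument, but only where <parent_tag> does not already contain one.
add_good_tag() {
    local f
    for f in "$@"; do
        # -L edits the file in place; files that already have the tag
        # match nothing and are written back unchanged.
        xmlstarlet ed -L -P -S \
            -s '//parent_tag[not(good_tag)]' -t elem -n good_tag -v 'foobar' \
            "$f" || return 1
    done
}
```

It would be called as add_good_tag */config.xml. Running it twice is harmless, since the second run finds no parent_tag without a good_tag.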
I'm trying to use xmlstarlet to parse this file.xml:
<resultset>
<row>
<field name="b">2</field>
<field name="c"></field>
</row>
</resultset>
With the first field everything is OK:
$ xmlstarlet sel -t -v "resultset/row[1]/field[1]" file.xml
2
$ echo $?
0
But with the second field xmlstarlet returns 1:
$ xmlstarlet sel -t -v "resultset/row[1]/field[2]" file.xml
$ echo $?
1
In my case empty fields are normal. I want to parse them without an xmlstarlet error.
UPDATE:
The behavior is the same with field[@name="b"]:
$ xmlstarlet sel -t -v 'resultset/row/field[@name="b"]' file.xml
2
$ echo $?
0
and
$ xmlstarlet sel -t -v 'resultset/row/field[@name="c"]' file.xml
$ echo $?
1
I want to distinguish the second case from a real error.
UPDATE 2: The MAIN PROBLEM is:
If I try to select [@name="c"] and [@name="not_exist"], xmlstarlet returns the SAME exit code 1.
But file.xml has a field named 'c', and does not have a field named 'not_exist'.
I want xmlstarlet to print an empty string and return exit code 0 when file.xml contains an empty field with the given name,
and to return a non-zero exit code when file.xml does not contain a field with the given name at all.
Try changing your XPath expressions to
resultset/row/field[@name="b"]
and
resultset/row/field[@name="c"]
and see if it works.
Edit:
I'm not sure what exactly you are echoing in your question, but if we assume your xml is in a file called file.xml, these expressions work with no errors:
xmlstarlet sel -T -t -m "resultset/row/field[@name='b']" -v . --nl file.xml
and
xmlstarlet sel -T -t -m "resultset/row/field[@name='c']" -v . --nl file.xml
I think -n (or --nl) is what makes xmlstarlet exit with 0 when the match is empty.
Combined with -m ... -v . it seems to do what the OP wants.
# Empty match
$ echo "<field name=\"c\"></field>" | xmlstarlet sel -t -m "field[@name='c']" -v . --nl
$ echo "${PIPESTATUS[@]}"
0 0
# No match
$ echo "<field name=\"c\"></field>" | xmlstarlet sel -t -m "field[@name='non_existent']" -v . --nl
$ echo "${PIPESTATUS[@]}"
0 1
# Non-empty match
$ echo "<field name=\"c\">xyz</field>" | xmlstarlet sel -t -m "field[@name='c']" -v . --nl
xyz
$ echo "${PIPESTATUS[@]}"
0 0
Notes
With --nl the output is never completely empty: even an empty match prints a newline. For me that is not a problem, as I filter the results later anyway.
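The difference between an empty field and a missing one can also be made explicit by counting the matching nodes first, which does not depend on how a given xmlstarlet version sets its exit status. A sketch under assumptions: the function name field_value and its return codes (0 found, 1 missing, 2 unreadable input) are invented here, and the field name is assumed to contain no quote characters because it is pasted straight into the XPath:

```shell
#!/usr/bin/env bash
# Hypothetical helper: field_value NAME FILE
# Prints the text of /resultset/row/field[@name=NAME] and returns
#   0 if the field exists (even when it is empty),
#   1 if the document has no such field,
#   2 if xmlstarlet produced no count at all (e.g. malformed input).
field_value() {
    local name=$1 file=$2 n
    local xpath="/resultset/row/field[@name='$name']"

    # count() always yields a number for a well-formed document.
    n=$(xmlstarlet sel -t -v "count($xpath)" "$file" 2>/dev/null) || true
    case $n in
        '') return 2 ;;
        0)  return 1 ;;
    esac

    # The field exists, so print its text (possibly empty) and succeed.
    xmlstarlet sel -t -m "$xpath" -v . -n "$file"
    return 0
}
```

For the file.xml from the question, field_value b file.xml prints 2, field_value c file.xml prints an empty line and succeeds, and field_value not_exist file.xml fails with status 1.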
Suppose I have an xml file:
<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
<string name="a"></string>
</map>
And I want to set the value of the string whose name attribute is "a" to something big:
$ xmlstarlet ed -u '/map/string[@name="a"]' -v "$(for ((i=0;i<200000;i++)); do echo -n a; done)" example.xml > o.xml
This results in the bash error "Argument list too long". I was unable to find an xmlstarlet option that reads the value from a file. So how would I set the value of an XML tag to 200 KB of data or more?
Solution
After trying to feed chunks into xmlstarlet with the -a (append) argument, I realized that I was running into additional difficulties, such as escaping special characters and the order in which xmlstarlet accepts these chunks.
Eventually I reverted to simpler tools: xml2, sed and 2xml. I am posting the code as a separate answer below.
This is a workaround for your own example, which bombs because of the ARG_MAX limit:
#!/bin/bash
# (remove 'echo' commands and quotes around '>' characters when it looks good)
echo xmlstarlet ed -u '/map/string[@name="a"]' -v '' example.xml '>' o.xml
for ((i = 0; i < 100; i++))
do
# the inner loop uses j so that it does not clobber the outer counter i
echo xmlstarlet ed -u '/map/string[@name="a"]' -a -v $(for ((j=0;j<2000;j++)); do echo -n a; done) example.xml '>>' o.xml
done
SOLUTION
I am not proud of it, but at least it works.
a.xml - what was proposed as an example in the starting post
source.txt - what has to be inserted into a.xml as xml tag
b.xml - output
#!/usr/bin/env bash
ixml="a.xml"
oxml="b.xml"
s="source.txt"
echo "$ixml --> $oxml"
t="$ixml.xml2"
t2="$ixml.xml2.edited"
t3="$ixml.2xml"
# Convert xml into simple string representation
cat "$ixml" | xml2 > "$t"
# Get the line number of the xml tag of interest, add one, and delete everything from that line to the end
# For this to work, the tag of interest should be at the very end of xml file
cat "$t" | grep -n -E 'string.*name=.*a' | cut -f1 -d: | xargs -I{} echo "{}+1" | bc | xargs -I{} sed '{},$d' "$t" > "$t2"
# Rebuild the deleted end of the xml2-file with the escaped content of s-file and convert everything back to xml
# * The apostrophe escape is necessary for apk xml files
sed "s:':\\\':g" "$s" | sed -e 's:^:/map/string=:' >> "$t2"
cat "$t2" | 2xml > "$t3"
# Make xml more readable
xmllint --pretty 1 --encode utf-8 "$t3" > "$oxml"
# Delete temporary files
rm -f "$t"
rm -f "$t2"
rm -f "$t3"
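A different way around the argument-size limit, sketched here as an alternative rather than as what either answer above does: let xmlstarlet write a short placeholder, then let awk stream the real payload from a file in its place. The function name set_value_from_file and the placeholder token are assumptions, and the token is assumed not to occur anywhere else in the document:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_value_from_file XPATH PAYLOAD_FILE XML_FILE
# Prints the edited document on stdout.
set_value_from_file() {
    local xpath=$1 payload=$2 xml=$3
    local token='__PAYLOAD_PLACEHOLDER__'   # assumed unique in the document

    # Step 1: xmlstarlet locates the node and stores only the short token.
    # Step 2: awk replaces the token with the XML-escaped file contents,
    #         so the payload never appears on a command line.
    xmlstarlet ed -u "$xpath" -v "$token" "$xml" |
    awk -v token="$token" -v payload="$payload" '
        function emit_payload(   line, sep) {
            sep = ""
            while ((getline line < payload) > 0) {
                gsub(/&/, "\\&amp;", line)   # escape & before < and >
                gsub(/</, "\\&lt;", line)
                gsub(/>/, "\\&gt;", line)
                printf "%s%s", sep, line
                sep = "\n"
            }
            close(payload)
        }
        {
            i = index($0, token)
            if (i == 0) { print; next }
            printf "%s", substr($0, 1, i - 1)
            emit_payload()
            print substr($0, i + length(token))
        }
    '
}
```

With the example from the question this would be run as set_value_from_file '/map/string[@name="a"]' source.txt example.xml > o.xml. Only the text escaping for element content is handled; attribute values would need quotes escaped as well.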
I have a command that returns 108 week/enumeration pairs:
Command:
impala-shell -B -f query.sql
Results:
20180203 1
20180127 2
20180120 3
...
I parsed the results and read the week and enumeration as two variables. However, I have to use a variable wk to store intermediate results first:
wk="$(impala-shell -B -f query.sql)"
echo "$wk" | while read -r a b; do echo $a--$b; done
I tried to avoid using additional variable wk:
"$(impala-shell -B -f query.sql)" | while read -r a b; do echo $a--$b; done
But it returned:
...
20160213 104
20160206 105
20160130 106
20160123 107
20160116 108: command not found
I understand you can use wk="$(impala-shell -B -f query.sql)" && echo "$wk" | while read -r a b; do echo $a--$b; done but that doesn't avoid using a variable in the middle. How to compose a one-liner without using the variable wk?
Or, awk to the rescue!
$ impala-shell -B -f query.sql | awk '{print $1"--"$2}'
Backticks (``) perform command substitution: the command inside runs first and its output is substituted inline, just like $(...).
Try this (untested, as i neither have your shell, nor that script):
`impala-shell -B -f query.sql` | while read -r a b; do echo $a--$b; done
The most elegant answer goes to choroba in the question comments! You just need to remove the quotes and the $( ) around the command, and pipe it directly:
impala-shell -B -f query.sql | while read -r a b ; do echo $a--$b; done
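The "command not found" error came from the shell treating the whole captured output as the name of a command to run. Piping the command itself avoids both that and the intermediate variable. A runnable sketch, using a stand-in function fake_query (hypothetical, with three sample rows from the question) in place of impala-shell:

```shell
#!/usr/bin/env bash
# Stand-in for `impala-shell -B -f query.sql` (hypothetical sample rows).
fake_query() {
    printf '%s\t%s\n' 20180203 1 20180127 2 20180120 3
}

# Reads "week enumeration" pairs from stdin and prints them joined by "--".
format_weeks() {
    local week n
    while read -r week n; do
        printf '%s--%s\n' "$week" "$n"
    done
}

# The producer's stdout feeds the loop directly; nothing is stored.
fake_query | format_weeks
```

Replacing fake_query with the real impala-shell call gives the one-liner from the answer above.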
I am trying to create a script that automatically opens any files containing a particular pattern.
This is what I achieved so far:
xargs -d " " vim < "$(grep --color -r test * | cut -d ':' -f 1 | uniq | sed ':a;N;$!ba;s/\n/ /g')"
The problem is that vim does not treat the result as a list of separate files, but as one whole filename instead:
zsh: file name too long: ..............
Is there an easy way to achieve it? What am I missing?
The usual way to call xargs is just to pass the arguments with newlines via a pipe:
grep -Rl test * | xargs vim
Note that I'm also passing the -l argument to grep to list the files that contain my pattern.
Use this:
vim -- `grep -rIl test *`
-I skip matching in binary files
-l print file name at first match
Try to omit xargs, because it leads to incorrect behaviour of vim:
Vim: Warning: Input is not from a terminal
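Both concerns, keeping vim attached to the terminal and not splitting filenames on spaces, can be handled by collecting the names into an array first. A sketch, assuming bash 4.4 or later (for mapfile -d '') and a grep that supports --null; the function name edit_matches is invented for illustration, and the editor is taken from $EDITOR with vim as the fallback:

```shell
#!/usr/bin/env bash
# Hypothetical helper: edit_matches PATTERN [DIR]
# Opens every text file under DIR (default: .) that contains PATTERN.
edit_matches() {
    local pattern=$1 dir=${2:-.} files
    # -r recurse, -I skip binary files, -l print names only,
    # --null terminate each name with a NUL byte instead of a newline
    mapfile -d '' files < <(grep -rIl --null -- "$pattern" "$dir")
    if (( ${#files[@]} == 0 )); then
        echo "no files match: $pattern" >&2
        return 1
    fi
    # Only mapfile read from the pipe, so the editor keeps the terminal.
    "${EDITOR:-vim}" -- "${files[@]}"
}
```

Called as edit_matches test, it behaves like the backtick version above but also copes with filenames that contain spaces.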
What I usually do is append the following line to a list of files:
> ~/.files.txt && vim $(cat ~/.files.txt | tr "\n" " ")
For example (note the -l, so that grep writes file names rather than matching lines):
grep -rl test * > ~/.files.txt && vim $(cat ~/.files.txt | tr "\n" " ")
I have the following in my .bashrc to bind VV (twice V in uppercase) to insert that automatically :
insertinreadline() {
READLINE_LINE=${READLINE_LINE:0:$READLINE_POINT}$1${READLINE_LINE:$READLINE_POINT}
READLINE_POINT=`expr $READLINE_POINT + ${#1}`
}
bind -x '"VV": insertinreadline " > ~/.files.txt && vim \$(cat ~/.files.txt | tr \"\\n\" \" \")"'
I would like to use this Wikipedia page - http://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives
It contains several links to .jpg images, and I would like to download all of the images into a folder. I am on Mac.
I have tried using wget but so far have been unable.
EDIT: To clarify, I would like for a script to click on every link on the page, then download the page. This is because I need the page to be redirected first.
You can use xmlstarlet for this purpose:
xmlstarlet sel --net --html -t -m "//img" -v "@src" -n 'http://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives'
will give you all the src fields of the img tags in the page at http://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives.
You'll notice that the output lines are missing a leading http:, so we'll have to add this.
Then:
while IFS= read -r line; do
[[ $line = //* ]] && line="http:$line"
wget "$line"
done < <(
xmlstarlet sel --net --html -t -m "//img" -v "@src" -n 'http://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives'
)
should retrieve the image files.
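The prefixing step generalizes a little: links scraped from a page can be protocol-relative (//host/...), site-relative (/wiki/...), or already absolute. A small sketch of a helper that covers all three cases; the name absolutize and its optional scheme and host arguments are assumptions for illustration, with defaults matching the URLs used in this answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: absolutize LINK [SCHEME] [HOST]
# Turns a scraped link into an absolute URL.
absolutize() {
    local link=$1 scheme=${2:-http} host=${3:-en.wikipedia.org}
    case $link in
        //*) printf '%s:%s\n' "$scheme" "$link" ;;           # protocol-relative
        /*)  printf '%s://%s%s\n' "$scheme" "$host" "$link" ;; # site-relative
        *)   printf '%s\n' "$link" ;;                         # already absolute
    esac
}
```

Inside the loop above, line=$(absolutize "$line") would replace the [[ $line = //* ]] test.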
From your comment I now understand your requirement: you want to get all the href fields of the a nodes that contain an img node. An xpath that fulfills this requirement is:
//a[img]
Hence,
xmlstarlet sel --net --html -t -m "//a[img]" -v "@href" -n 'http://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives'
will get you these hrefs.
Now, the URL that is retrieved is not directly the image you want to download; instead it's another HTML page that contains links to the images you want. I've selected the image in these pages with the following xpath:
//div[@class='fullImageLink']/a
that is, the a nodes inside a div node with class="fullImageLink". This seems ok, heuristically.
Then, this should do:
#!/bin/bash
base="http://en.wikipedia.org"
get_image() {
local url=$base$1
printf "*** %s: " "$url"
IFS= read -r imglink < <(xmlstarlet sel --net --html -t -m "//div[@class='fullImageLink']/a" -v "@href" -n "$url")
if [[ -z $imglink ]]; then
echo " ERROR ***"
return 1
fi
imglink="http:$imglink"
echo " Downloading"
wget -q "$imglink" &
}
while IFS= read -r url; do
[[ $url = /wiki/File:* ]] || continue
get_image "$url"
done < <(
xmlstarlet sel --net --html -t -m "//a[img]" -v "@href" -n "$base/wiki/Current_members_of_the_United_States_House_of_Representatives"
)
You'll get a little bit more than what you want, but it's a good basis :).