xmlstarlet: query a value if another value is present - xpath

Here is my XML file where I am trying to query and print the list of all the ids where activeByDefault is set to true.
For this I am using xmlstarlet sel with the following options:
$ xmlstarlet sel -N x=http://maven.apache.org/POM/4.0.0 -t -m '/x:project/x:profiles/x:profile/x:activation[x:activeByDefault="true"]' -v /x:project/x:profiles/x:profile/x:id pom.xml | sort -u
aaa
alto
bgpcep
bier
coe
controller
daexim
distribution
dlux
dluxapps
eman
faas
genius
groupbasedpolicy
honeycombvbd
infrautils
jsonrpc
l2switch
lispflowmapping
nemo
netconf
netvirt
neutron
nic
ocpplugin
odlparent
ofconfig
openflowplugin
ovsdb
p4plugin
packetcable
sfc
snmp
snmp4sdn
sxp
tsdr
unimgr
usc
vtn
vtnaaa
There are two issues here: first, it prints all the ids, even those whose activeByDefault is set to false; second, it concatenates ids, printing the first and last id joined together (e.g. vtnaaa on the last line of the output, which is incorrect). What am I missing in my query?
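For reference, a minimal sketch of the profile structure being matched (assuming the standard Maven POM layout):
<profiles>
    <profile>
        <id>aaa</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
    </profile>
</profiles>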

Try changing this part of the command:
-m '/x:project/x:profiles/x:profile/x:activation[x:activeByDefault="true"]' -v /x:project/x:profiles/x:profile/x:id
to this:
-m '/x:project/x:profiles/x:profile[x:activation/x:activeByDefault="true"]/x:id' -v .
To separate the values, try using either -n for newlines, or change the -v to -v "concat(.,' ')" for a space-separated list.
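For instance, the concat variant would look like this (quoting the XPath so the shell passes it through intact):
$ xmlstarlet sel -N x="http://maven.apache.org/POM/4.0.0" -t -m "/x:project/x:profiles/x:profile[x:activation/x:activeByDefault='true']/x:id" -v "concat(.,' ')" pom.xml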
A full example using -n:
$ xmlstarlet sel -N x="http://maven.apache.org/POM/4.0.0" -t -m "/x:project/x:profiles/x:profile[x:activation/x:activeByDefault='true']/x:id" -v . -n pom.xml
aaa
alto
bgpcep
bier
coe
controller
daexim
distribution
dlux
dluxapps
genius
groupbasedpolicy
honeycombvbd
infrautils
jsonrpc
l2switch
lispflowmapping
nemo
netconf
netvirt
neutron
odlparent
ofconfig
openflowplugin
ovsdb
p4plugin
packetcable
sfc
snmp
snmp4sdn
sxp
tsdr
usc
vtn

Related

Adding values in a YAML file via loop

As a part of the Kubernetes Resource Definition, I want to whitelist certain IPs. The list of IPs can be found by
$ kubectl get nodes -o wide --no-headers | awk '{print $7}'
#This prints something like
51.52.215.214
18.170.74.10
.....
Now,
In the Kubernetes deployment file (say deployment.yaml) I want to loop over these values and whitelist them.
I know that we can whitelist by adding under loadBalancerSourceRanges like
#part of the deployment.yaml
loadBalancerSourceRanges:
- 51.52.112.111
- 18.159.75.11
I want to update the above loadBalancerSourceRanges to include the output of
$ kubectl get nodes -o wide --no-headers | awk '{print $7}'
How do I go about it? Instead of hardcoding the host IPs, I would like to include them programmatically via bash or Ansible or any other cleaner way possible.
Thanks in advance,
JE
loadBalancerSourceRanges should be part of a Service, not a Deployment.
You can use the following one-liner to patch your service dynamically:
kubectl patch service YOUR_SERVICE_NAME -p "{\"spec\":{\"loadBalancerSourceRanges\": [$(kubectl get nodes -o jsonpath='{range .items[*].status.addresses[?(@.type=="InternalIP")]}"{.address}/32",{end}' | sed 's/,*$//g')]}}"
where you should replace YOUR_SERVICE_NAME with the actual service name.
To explain what's going on here:
We are using kubectl patch to patch an existing resource, in our case spec.loadBalancerSourceRanges.
We are putting our subshell inside [$(..)], since loadBalancerSourceRanges requires an array of strings.
kubectl get nodes -o jsonpath='{range .items[*].status.addresses[?(@.type=="InternalIP")]}"{.address}/32",{end}' - gets the InternalIPs from your nodes, adds /32 to each of them (since loadBalancerSourceRanges requires ranges), encloses each range in " and then places a comma between the values.
sed 's/,*$//g' - removes the trailing comma.
Using jsonpath is better than awk/cut because we are not dependent on kubectl column ordering and retrieve only the information we need from the API.
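For clarity, the rendered patch body (with hypothetical node IPs) ends up looking like this:
{"spec":{"loadBalancerSourceRanges": ["10.0.1.5/32","10.0.2.7/32"]}}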
I agree with @Kaffe Myers that you should try using kustomize or helm or other templating engines, since they should be better suited for this job.
You can use yq
# empty array if necessary
yq -i '.loadBalancerSourceRanges = []' file.yaml
# In my env (AWS EKS) the IP is field 6 (change if needed)
for host in $(kubectl get nodes -o wide --no-headers | awk '{print $6}')
do
yq -i '.loadBalancerSourceRanges += ["'${host}'"]' file.yaml
done
The -i parameter applies the change in place to the file (like sed -i).
If "loadBalancerSourceRanges" is nested under "config", you can use ".config.loadBalancerSourceRanges" instead.
This is very use-case specific, and you might be better off researching kustomize. That being said, you could make a temporary file which you alter before deploying.
cp deployment.yaml temp.yaml
kubectl get nodes -o wide --no-headers |
awk '{print $7}' |
xargs -I{} sed -Ei "s/^(\s+)(loadBalancerSourceRanges:)/\1\2\n\1 - {}/" temp.yaml
kubectl apply -f temp.yaml
It looks for the loadBalancerSourceRanges: part of the YAML, which in the "template" shouldn't have any values, then populates it with whatever kubectl get nodes -o wide --no-headers | awk '{print $7}' feeds it.
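For instance, assuming the template contains the bare key, each pass of the pipeline inserts one IP directly beneath it (so the last IP processed ends up first):
# template (before)
  loadBalancerSourceRanges:
# after running the pipeline (IPs from the question)
  loadBalancerSourceRanges:
   - 18.170.74.10
   - 51.52.215.214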

How to get always latest link to download tomcat server using shell

I have written a shell script to download and install the Tomcat server v8.5.31: wget http://www.us.apache.org/dist/tomcat/tomcat-8/v8.5.31/bin/apache-tomcat-8.5.31.tar.gz It was working fine, but as soon as the version changed to 9.0.10, it started failing with a 404 Not Found error.
So what should I do to always get the latest version?
TL;DR
TOMCAT_VER=`curl --silent http://mirror.vorboss.net/apache/tomcat/tomcat-8/ | grep v8 | awk '{split($5,c,">v") ; split(c[2],d,"/") ; print d[1]}'`
wget -N http://mirror.vorboss.net/apache/tomcat/tomcat-8/v${TOMCAT_VER}/bin/apache-tomcat-${TOMCAT_VER}.tar.gz
I encountered the same challenge.
However, for my solution I require the latest 8.5.x Tomcat version, which keeps changing.
Since the URL to download Tomcat remains the same, with only the version changing, I found the following solution works for me:
TOMCAT_VER=`curl --silent http://mirror.vorboss.net/apache/tomcat/tomcat-8/ | grep v8 | awk '{split($5,c,">v") ; split(c[2],d,"/") ; print d[1]}'`
echo Tomcat version: $TOMCAT_VER
Tomcat version: 8.5.40
grep v8 - returns the line with the desired version:
<img src="/icons/folder.gif" alt="[DIR]"> v8.5.40/ 2019-04-12 13:16 -
awk '{split($5,c,">v") ; split(c[2],d,"/") ; print d[1]}' - Extracts the version we want:
8.5.40
I then proceed to download Tomcat using the extracted version:
wget -N http://mirror.vorboss.net/apache/tomcat/tomcat-8/v${TOMCAT_VER}/bin/apache-tomcat-${TOMCAT_VER}.tar.gz
This is the complete curl response from which the version is extracted using curl, grep and awk:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /apache/tomcat/tomcat-8</title>
</head>
<body>
<h1>Index of /apache/tomcat/tomcat-8</h1>
<pre><img src="/icons/blank.gif" alt="Icon "> Name Last modified Size Description<hr><img src="/icons/back.gif" alt="[PARENTDIR]"> Parent Directory -
<img src="/icons/folder.gif" alt="[DIR]"> v8.5.40/ 2019-04-12 13:16 -
<hr></pre>
<address>Apache/2.4.25 (Debian) Server at mirror.vorboss.net Port 80</address>
</body></html>
I've found a way using the official github mirror.
Basically, one has to query the github api for all available tags.
Afterwards, for each tag, the date has to be determined.
Finally, the tag with the latest date is the latest tag!
Try this script - let's call it latest-tag. It's dependent on jq. It takes a short while to execute, but should print the URL of the tarball of the latest tag (currently: https://api.github.com/repos/apache/tomcat/tarball/TOMCAT_9_0_10).
#!/bin/bash
# Prints the url to the latest tag of given github repo
# $1: repo (e.g.: apache/tomcat )
# $2: optional github credentials. Credentials are needed if running into the api rate limit (e.g.: <user>|<user>:<authkey>)
repo=${1:?Missing parameter: repo (e.g.: apache/tomcat )}
[ -n "$2" ] && credentials="-u $2"
declare -a commits
declare -a tarball_urls
while IFS=, read commit_url tarball_url
do
date=$(curl $credentials --silent "$commit_url" | jq -r ".commit.author.date")
if [[ "$date" > ${latest_date:- } ]]
then
latest_date=$date
latest_tarball_url=$tarball_url
fi
done < <( curl $credentials --silent "https://api.github.com/repos/$repo/tags" | jq -r ".[] | [.commit.url, .tarball_url] | @csv" | tr -d \")
echo $latest_tarball_url
Usage:
./latest-tag apache/tomcat
You might get hindered by the rate limit of the github api.
Therefore, you might want to supply github credentials to the script:
./latest-tag apache/tomcat <username>
This will ask you for your GitHub password. In order to run it non-interactively, you can supply the script with a personal GitHub API token:
./latest-tag apache/tomcat <username>:<api token>
Disclaimer - this solution uses screen scraping
Find and download latest version of Apache Tomcat 9 for Linux or Windows-x64.
Uses Python 3.7.3
import os
import urllib.request

url_ends_with = ".tar.gz\""  # Use line for Non-Windows
url_ends_with = "windows-x64.zip\""  # Use line for Windows-x64
url_starts_with = "\"http"
dir_to_contain_download = "tmp/"
tomcat_apache_org_frontpage_html = "tomcat.apache.org.frontpage.html"
download_page = "https://tomcat.apache.org/download-90.cgi"

try:
    if not os.path.exists(dir_to_contain_download):
        os.makedirs(dir_to_contain_download, exist_ok=True)
    # Save the download page locally, then scan it for the first matching download URL
    htmlfile = urllib.request.urlretrieve(download_page, dir_to_contain_download + tomcat_apache_org_frontpage_html)
    fp = open(dir_to_contain_download + tomcat_apache_org_frontpage_html)
    line = fp.readline()
    cnt = 1
    while line:
        line = fp.readline()
        cnt += 1
        if url_ends_with in line and url_starts_with in line:
            tomcat_url_index = line.find(url_ends_with)
            tomcat_url = line[line.find(url_starts_with) + 1 : tomcat_url_index + len(url_ends_with) - 1]
            print("Downloading: " + tomcat_url)
            print("To file: " + dir_to_contain_download + tomcat_url[tomcat_url.rfind("/") + 1:])
            zipfile = urllib.request.urlretrieve(tomcat_url, dir_to_contain_download + tomcat_url[tomcat_url.rfind("/") + 1:])
            break
finally:
    fp.close()
    os.remove(dir_to_contain_download + "/" + tomcat_apache_org_frontpage_html)
As I don't have enough reputation to reply to Jonathan or edit his post, here is my solution (tested with versions 8 to 10):
#!/bin/bash
wantedVer=9
TOMCAT_VER=`curl --silent http://mirror.vorboss.net/apache/tomcat/tomcat-${wantedVer}/|grep -oP "(?<=\"v)${wantedVer}(?:\.\d+){2}\b"|sort -V|tail -n 1`
wget -N https://mirror.vorboss.net/apache/tomcat/tomcat-${TOMCAT_VER%.*.*}/v${TOMCAT_VER}/bin/apache-tomcat-${TOMCAT_VER}.tar.gz
# Apache download link: wget -N https://dlcdn.apache.org/tomcat/tomcat-${TOMCAT_VER%.*.*}/v${TOMCAT_VER}/bin/apache-tomcat-${TOMCAT_VER}.tar.gz
I had trouble with Jonathan's code, because there were different versions downloadable at the same time, which broke the composed download link. In this solution, only the newest one is considered.
Altering the first line (wantedVer) is enough to switch between major versions.
Code Explained:
curl grabs the Apache directory listing for the wanted Tomcat version.
grep then extracts all the different versions with a positive lookbehind, using a Perl regex pattern (-P) and keeping only the matching part (-o). The result is one line per version number, in no particular order.
These lines are sorted by version (sort -V), and only the last line (tail -n 1), which holds the greatest version, is assigned to the variable TOMCAT_VER.
Finally, the download link is composed from the gathered version information and downloaded via wget, but only if it is newer than what is already present (-N).
I wrote this code:
TOMCAT_URL=$(curl -sS https://tomcat.apache.org/download-90.cgi | grep \
'>tar.gz</a>' | head -1 | grep -E -o 'https://[a-z0-9:./-]+.tar.gz')
TOMCAT_NAME=$(echo $TOMCAT_URL | grep -E -o 'apache-tomcat-[0-9.]+[0-9]')
It's not the most efficient way possible, but it's very easy to understand how it works, and it does work. Update the download-XX.cgi link (e.g. to 10) if you want that version.
Then you can do:
curl -sS $TOMCAT_URL | tar xfz -
ln -s $TOMCAT_NAME apache-tomcat
and you will have the current version of Tomcat at apache-tomcat. When a new version comes out, you can use this to do an easy update while keeping the old version there.
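A hedged sketch of that update step, reusing the two variables above (ln -sfn replaces the old link: -f forces it, -n treats the existing symlink itself as the target instead of following it):
curl -sS $TOMCAT_URL | tar xfz -
ln -sfn $TOMCAT_NAME apache-tomcat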

Convert Redis Mass Insertion Protocol Format independent of OS

I am trying to perform Redis mass insertion using the command cat data.txt | redis-cli --pipe as mentioned in https://redis.io/topics/mass-insert.
The data format on macOS has to be converted so that mass insertion could be performed with cat ${FILE} | perl -i -p -e 's|[\r\n]+|\r\n|g' | redis-cli --pipe.
However, the above command does not work on a Linux environment (or a docker environment with the container built from an alpine based image). Instead, the following command has to performed cat ${FILE} | sed 's/\r*$/\r/' | redis-cli --pipe.
Is there a command that would work in both environments?
EDIT: Attached the following:
Redis Mass Insertion script on Alpine Linux: https://gist.github.com/francjohny/f2b13b4cfc147e07e52824ec88ba3781
Redis Mass Insertion script on Mac OS: https://gist.github.com/francjohny/b57756a1e0124dd562959ca5ece2a32b
Redis Protocol Format data file: https://gist.github.com/francjohny/0c21f32d9902809b215f4e92f5e6a9f1
head output.rpf | xxd on macOS: https://gist.github.com/francjohny/e1a646ab44e7edd7374d28e9ca400711
head output.rpf | xxd on Alpine Linux: https://gist.github.com/francjohny/252904928ded4c045448d12b205228df
Updated Answer
From the data you have added, it seems you just have linefeeds separating your lines, whereas Redis requires carriage return followed by linefeed. So basically, you want the equivalent of the unix2dos program, which is not included in macOS. However, macOS does include Perl, so you should be able to use:
perl -pe 's/\n/\r\n/' data.rpf | redis-cli --pipe
It works fine on my Mac.
Original Answer
You appear to have mixed line endings in your various environments. I would imagine this Perl would replace any number of carriage returns and line feeds, in any mixture, with the single carriage return and linefeed that Redis requires:
perl -pe 's|[\r\n]+|\r\n|' data.txt | redis-cli ...
If not, please answer my question in the comments.
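To verify the result in either environment, you can inspect the line endings with xxd (as in the gists above); every line should now end with 0d 0a:
perl -pe 's/\n/\r\n/' data.rpf | head -2 | xxd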

How to get the highest numbered link from curl result?

I have created a small program consisting of a couple of shell scripts that work together. It is almost finished and everything seems to work fine, except for one thing which I'm not really sure how to do, and which I need in order to finish this project.
There seem to be many routes that can be taken, but I just can't get there.
I have some curl results with lots of unused data, including different links, and among all that data there is a bunch of similar links.
I only need to get (into a variable) the link with the highest number (without the always-same text).
The links are all similar, and have this structure:
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
I was thinking about something like:
content="$(curl -s "$url/$param")"
linksArray= get from $content all links that are in the href section of the links
that contain "always same text"
declare highestnumber;
for file in $linksArray
do
href=${1##*/}
fullname=${href%.html}
OIFS="$IFS"
IFS='_'
read -a nameparts <<< "${fullname}"
IFS="$OIFS"
if ${nameparts[1]} > $highestnumber;
then
highestnumber=${nameparts[1]}
fi
done
echo ${nameparts[1]}_${highestnumber}.html
result:
https://always/same/link/unique-name_19.html
This was just my guess; any working code that can be run from a bash script is OK.
Thanks.
Update:
I found this nice program (Xidel); it is easily installed by:
# 64bit version
wget -O xidel/xidel_0.9-1_amd64.deb https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9/xidel_0.9-1_amd64.deb/download
apt-get -y install libopenssl
apt-get -y install libssl-dev
apt-get -y install libcrypto++9
dpkg -i xidel/xidel_0.9-1_amd64.deb
It looks awesome, but I'm not really sure how to tweak it to my needs.
Based on that link and the answer below, I guess a possible solution would be:
use Xidel, or use sed -n 's/.*href="\([^"]*\)".*/\1/p' file as suggested in this link, but tweaked to get the links with HTML tags like:
<a href="https://always/same/link/same-name_17.html">always same text</a>
then filter out everything that doesn't end with ">always same text</a>",
and then use the grep/sort approach mentioned below.
Continuing from the comment, you can use grep, sort and tail to isolate the highest number of your list of similar links without too much trouble. For example, if you list of links is as you have described (I've saved them in a file dat/links.txt for the purpose of the example), you can easily isolate the highest number in a variable:
Example List
$ cat dat/links.txt
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
Parsing the Highest Numbered Link
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort | tail -n1); \
echo "myvar : '$myvar'"
myvar : 'https://always/same/link/same-name_19.html'
(note: the command above is all one line, separated by the line-continuation '\')
Applying Directly to Results of curl
Whether your list is in a file, or returned by curl -s, you can apply the same approach to isolate the highest number link in the returned list. You can use process substitution with the curl command alone, or you can pipe the results to grep. E.g. as noted in my original comment,
$ myvar=$(grep -o 'https:.*[.]html' < <(curl -s "$url/$param") | sort | tail -n1); \
echo "myvar : '$myvar'"
or pipe the result of curl to grep,
$ myvar=$(curl -s "$url/$param" | grep -o 'https:.*[.]html' | sort | tail -n1); \
echo "myvar : '$myvar'"
(same line continuation note.)
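One caveat (not an issue with the three example links): plain sort compares strings lexicographically, so same-name_9.html would be treated as greater than same-name_10.html. If the numbers can have different digit counts, GNU sort's version sort avoids the problem:
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort -V | tail -n1)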
Why not use Xidel with xquery to sort the links and return the last?
xidel -q links.txt --xquery '(for $i in //@href order by $i return $i)[last()]' --input-format xml
(The XQuery is in single quotes so the shell doesn't expand $i.) The input-format parameter makes sure you don't need any HTML tags at the start and end of your txt file.
If I'm not mistaken, in the latest Xidel the -q (quiet) param is replaced by -s (silent).

Choosing a part of a string

Basically I have the following script:
#!/bin/bash
echo "What shall we set into managed mode? (e.g. wlan0)"
read thisend
sudo ifconfig $thisend down
sudo iwconfig $thisend mode managed
sudo ifconfig $thisend up
var=$(iwconfig wlan0)
What the script does (as you can see) is set the wireless card into managed mode, but I would like it to double-check at the end of the script that it actually is in managed mode. I'll write some comparison system for that, but for now I just want to know if it's possible to strip out everything else from the output of iwconfig wlan0 except Mode:Managed, and write the remaining output into a new variable.
var=$(iwconfig wlan0 | grep -o 'Mode:[[:space:]]*Managed')
(Note there must be no spaces around = in a bash assignment.) From the grep man page:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line.
Use grep or sed to extract just the part you want.
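A minimal sketch of the follow-up check the question mentions, building on the grep -o line above:
var=$(iwconfig wlan0 | grep -o 'Mode:[[:space:]]*Managed')
if [ -n "$var" ]; then
    echo "wlan0 is in managed mode"
fi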
