How to convert xml response into json response with bash

I run a curl command and the response is in XML format. Is there a way to convert the response to JSON? I need to extract the value of one of the tags, i.e. tag
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE SiteConfidenceApi PUBLIC "SiteConfidence Api Version current" "url">
<SiteConfidenceApi Version="current">
<Request></Request>
<Response Status="Ok" Code="200" Message="Success.">
<ApiKey Lifetime="3600">value</ApiKey>
</Response>
</SiteConfidenceApi>

I assume you want to extract "value" from element "ApiKey" - this can be done with recent versions of xmllint (depending on your linux distribution, it is usually from libxml2-utils or just libxml2).
XML=$(curl "$URL")
echo "$XML" | xmllint --xpath '//ApiKey/text()' -
Instead of //ApiKey/text() you could use a more specific XPath expression such as /SiteConfidenceApi/Response/ApiKey/text(). Note that XPath element names are case-sensitive.
If you have a file from your curl output you could replace "-" (to read from stdin) with the filename as argument to xmllint.
--xpath was added to libxml2 with version 2.7.7 in 2010.
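As a minimal sketch of the approach above, the following reads the element text and the Lifetime attribute into shell variables. It assumes xmllint is installed; the inline sample is a trimmed copy of the response from the question, and the variable names (xml, api_key, lifetime) are made up for illustration. Wrapping the path in the XPath string() function returns plain text for both elements and attributes.

```shell
# Trimmed copy of the response from the question (DOCTYPE omitted for brevity).
xml='<?xml version="1.0" encoding="utf-8"?>
<SiteConfidenceApi Version="current">
<Request></Request>
<Response Status="Ok" Code="200" Message="Success.">
<ApiKey Lifetime="3600">value</ApiKey>
</Response>
</SiteConfidenceApi>'

# Only run the extraction when xmllint is actually available.
if command -v xmllint >/dev/null 2>&1; then
  # string() yields the text content of the first matching node.
  api_key=$(printf '%s\n' "$xml" | xmllint --xpath 'string(/SiteConfidenceApi/Response/ApiKey)' -)
  # Attributes are addressed with @ in XPath.
  lifetime=$(printf '%s\n' "$xml" | xmllint --xpath 'string(/SiteConfidenceApi/Response/ApiKey/@Lifetime)' -)
  printf 'key=%s lifetime=%s\n' "$api_key" "$lifetime"
else
  echo "xmllint not found; install libxml2-utils or libxml2" >&2
fi
```

In a real script the xml variable would come from the curl call, for example xml=$(curl -s "$URL").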

Related

How to extract values from xml in body using xpath in camel

I have an http response that is set in my exchange body. I have to extract some values from that xml. I found that the best way could be using camel-xpath. I have to extract value from root tag level. For example in the xml below, the value i want to extract would be attribute1.
<rootTag attribute1="value1">
<child1/>
</rootTag>
I saw some examples that use namespaces, but I don't think namespaces apply here. If they do, how would I use them? Can I not extract the value directly from the body of the exchange?
You can extract your attribute to the message header:
.setHeader("MyHeader").xpath("/rootTag/@attribute1", String.class)
or put the attribute into the body:
.setBody().xpath("/rootTag/@attribute1", String.class)
You do not need namespaces here.
And @Gilles Quenot is certainly right about the xpath expression.
This is the XPath you need:
string(/rootTag/@attribute1)
Tests in a shell
xmlstarlet:
$ xmlstarlet sel -t -v '/rootTag/@attribute1' file
xmllint:
$ xmllint --xpath 'string(/rootTag/@attribute1)' file
xidel:
$ xidel -se '/rootTag/@attribute1' file
Output :
value1

xmllint to parse a html file

I was trying to parse out text between specific tags on a mac in various html files. I was looking for the first <H1> heading in the body. Example:
<BODY>
<H1>Dublin</H1>
Using regular expressions for this I believe is an anti pattern so I used xmllint and xpath instead.
xmllint --nowarning --xpath '/HTML/BODY/H1[0]'
Problem is some of the HTML files contain badly formed tags. So I get errors on the lines of
parser error : Opening and ending tag mismatch: UL line 261 and LI
</LI>
The problem is that I can't just use 2>/dev/null, as then I lose those files altogether. Is there any way to use an XPath expression here and tell xmllint to relax if the XML isn't perfect, and just give me the value between the first H1 tags?
Try the --html option. Otherwise, xmllint parses your document as XML which is a lot stricter than HTML. Also note that XPath indices are 1-based and that HTML tags are converted to lowercase when parsing. The command
xmllint --html --xpath '/html/body/h1[1]' - <<EOF
<BODY>
<H1>Dublin</H1>
EOF
prints
<h1>Dublin</h1>

Bash Using Sed on a Variable

I have an xml file which will always have the following block inside of it:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
<NameValue>
<FirstName>name2</FirstName>
<Surname>sur2</Surname>
</NameValue>
</Name>
There may be just one "NameValue" node inside this block or there may be many. My issue is that I only ever want to keep the first one and discard the rest. The example above would hence read:
<Name NameType="Also Known As">
<NameValue>
<FirstName>name1</FirstName>
<Surname>sur1</Surname>
</NameValue>
</Name>
I'd like to use sed for this purpose. I have written the above block to a variable and tried to manipulate that as follows:
var=$(sed -n '/<Name NameType="Also Known As">*/,/<\/Name>/p' $file)
sed -i 's/<\/NameValue>.*<\/Name>/<\/NameValue><\/Name>/g' "$var"
I get the following in return:
sed: can't read <Name NameType="Also Known As">...
I thought the above would have replaced everything between the first closing NameValue tag and the closing Name tag with nothing. I used the -i option as I want to save the changes to this file. Perhaps it's syntax related, or maybe my understanding of using the sed command on a variable is way off. A point to note is that these tags occur inside other nodes in the file, so doing one blanket update using sed would change blocks which I don't want to change. This is the reason I passed this block to a variable. Any pointers in the right direction would be much appreciated.
As the previous comments point out, sed expects one or several input files (which may be something like stdin, of course). In your example code, you read the content of a file into a variable in a roundabout way, and then use that variable as the name of a file. That is bound to fail. While I'm sure you could achieve your objectives using sed, I do not advise it. Branching in sed is horrible. Use a more powerful tool, preferably xsltproc or something else that is designed to work with XML.
You could use a stylesheet similar to this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Name">
<Name>
<xsl:copy-of select="@*" />
<xsl:copy-of select="*[1]" />
</Name>
</xsl:template>
</xsl:stylesheet>
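
On the original error: sed treats its trailing argument as a file name, which is why passing "$var" produced "can't read". If the goal is only to run sed over text held in a variable, the content has to arrive on stdin. A minimal sketch of that mechanism, using a made-up one-line value rather than the full block, might look like this (it does not solve the multi-line NameValue problem, for which an XML tool remains the better choice):

```shell
# Hypothetical sample content standing in for the extracted block.
var='<Surname>sur1</Surname>'

# Passing "$var" as an argument would make sed look for a file with that name.
# Piping the content in makes sed read it from stdin instead.
result=$(printf '%s\n' "$var" | sed 's/sur1/sur2/')

printf '%s\n' "$result"
```

Note that -i has no meaning in this form, because there is no file to edit in place; the result is captured in a new variable instead.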

Showing special characters in bash echo

I am writing a bash script that reads the results of an sql query in which the results are output as HTML (using the -H option) to a file (using the -o option) and then sends those results in an email. When the results are output to the file, they come out as:
'<IDLE>'
But when I parse them from the output file they show up in the email as:
&lt;IDLE&gt;
Can anyone help me format these so I get the actual characters and not the entity representation?
EDIT: The way I am sending the text now is:
echo -e $EMAIL_TXT | mail -s $SUBJECT $RECIPIENT
And the way I am extracting the text from the html file ($OUT_FILE) is:
QRY_LINE=$(sed "${QRY_LNUM}q;d" $OUT_FILE)
I ended up just using string replace to replace all &lt;s and &gt;s with < and > respectively.
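The answer does not show the replacement itself. One way it could be done, as a sketch rather than the poster's actual code, is a small sed filter; the function name decode_entities and the sample line are made up for illustration. It handles only the three entities listed, and decodes &amp;amp; last so that already-escaped text is not decoded twice.

```shell
# Decode a few common HTML entities read from stdin.
decode_entities() {
  # In a sed replacement, a bare & means "the matched text", so it is escaped.
  sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g'
}

# Hypothetical line as it might appear in the HTML output file.
line="'&lt;IDLE&gt;'"
decoded=$(printf '%s\n' "$line" | decode_entities)

printf '%s\n' "$decoded"
```

Quoting the variables when mailing the result (for example "$SUBJECT" and "$RECIPIENT") also keeps the shell from splitting or globbing the decoded text.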

Batch to read partial xml file

I'm trying to make a batch file to pull data out of a file and set it as a variable.
The tricky part is I need to read a XML file, and I only need the data between the quotes of the following line...
narrative="I only need this text here"
The text in that line can also contain spaces, brackets, slashes, dashes and colons.
SAMPLE XML FILE :
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<cadcall>
<call callnumber="123456" jurisdiction="abcd" department="dept 1" complaint="cost" priority="M" calltencode="" callername="Persons Name" phonenumber="Cell Number HERE" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Fairly long narrative here describing issue of dispatch, but sometimes these can be short." alarmtype="" ocanumber="0000000000" disposition1="TRAN" />
</cadcall>
The proper tool for this is xmllint from libxml2. Please provide a more complete XML example and I will show you how to use an XPath query on your XML.
EDIT:
Here is a solution using XPath (with a little hack: contains):
xmllint --xpath '/cadcall/call/@narrative[contains(.,'.')]' file.xml
Without seeing the complete input, and going only by your example line, grep works for you:
kent$ echo 'narrative="I only need this text here"'|grep -Po '(?<=narrative=")[^"]*'
I only need this text here
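The -P option used above is specific to GNU grep. Where it is not available, a plain sed substitution can do the same job; the sketch below assumes the attribute appears once per line and its value contains no double quotes, and the sample line is a shortened, made-up version of the one in the question.

```shell
# Shortened sample of the <call> element from the question.
line='<call callnumber="123456" narrative="[10/02/2012 14:56:27 : pos9 : PERSON] Short narrative." alarmtype="" />'

# Capture everything between narrative=" and the next double quote.
narrative=$(printf '%s\n' "$line" | sed -n 's/.*narrative="\([^"]*\)".*/\1/p')

printf '%s\n' "$narrative"
```

Like the grep version, this is a text match rather than XML parsing, so it will break if the attribute is split across lines or uses single quotes.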
