Identify all xml files with specific format - shell

I am trying to find XML files that do not contain valid data so that I can skip processing them. For example, below is a correct XML file with all the data available, which needs to be processed
(the loop XPath used to read the data from the XML files is Invoicing/Invoice/Serials/SerialNumber):
<?xml version='1.0' encoding='UTF-8'?>
<Invoicing>
<Invoice>
<VendorName>Contec</VendorName>
<InvoicePeriod>May</InvoicePeriod>
<InvoiceDt>2019-05-11</InvoiceDt>
<InvoiceNo>20190511</InvoiceNo>
<Serials>
<SerialNumber>
<TestLoc>HNMA01</TestLoc>
<EISSerial>PKQPLPXJC</EISSerial>
<ComcastModel>PX022ANC</ComcastModel>
<RMANo />
<ReceiptDt>05/09/2019</ReceiptDt>
<RepairDt>05/11/2019</RepairDt>
<Parts>
<Part>
<PartType>Cosmetic</PartType>
<PartId>SERVICEBUFFING</PartId>
<PartDescr>BUFF SERVC</PartDescr>
<ActionCd>RA003</ActionCd>
<FSC>FS005</FSC>
</Part>
</Parts>
</SerialNumber>
</Serials>
</Invoice>
</Invoicing>
I also receive XML files in the format below:
<?xml version="1.0" encoding="UTF-8"?>
<Invoicing>
<Invoice>
<VendorName>Contec</VendorName>
<InvoicePeriod>May</InvoicePeriod>
<InvoiceDt>2017-05-01</InvoiceDt>
<InvoiceNo>20170501</InvoiceNo>
<Serials></Serials>
</Invoice>
</Invoicing>
The above XML, although valid, is not correct. I want to identify the XML files that are in the second format, without the complete data, and move them into an error folder.
Thanks,
Kavin

The solution to this question is as follows:
I was able to achieve what I wanted with the solution provided by #EdMorton.
grep -L '<SerialNumber>' *.xml
But I wanted to do this using an XML parser:
count=$(xmllint --xpath "count(//SerialNumber)" "$xml")
When the count is zero, my handling logic for incomplete files runs.
Thanks for all the help.
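For readers who would rather keep the check and the move in one script, the same XPath test can be sketched in Ruby. This is a minimal sketch, not the poster's implementation: it assumes Ruby with the REXML library available and well-formed input files, and the helper name `move_incomplete_invoices` and the error folder path are hypothetical.

```ruby
require 'rexml/document'
require 'fileutils'

# Hypothetical helper: moves every *.xml file in `dir` that has no
# Invoicing/Invoice/Serials/SerialNumber element into `error_dir`.
# Returns the base names of the files that were moved.
def move_incomplete_invoices(dir, error_dir)
  FileUtils.mkdir_p(error_dir)
  moved = []
  Dir.glob(File.join(dir, '*.xml')).sort.each do |path|
    doc = REXML::Document.new(File.read(path))
    # Same loop XPath as in the question; nil means no SerialNumber exists.
    next if REXML::XPath.first(doc, '/Invoicing/Invoice/Serials/SerialNumber')
    FileUtils.mv(path, error_dir)
    moved << File.basename(path)
  end
  moved
end
```

Calling `move_incomplete_invoices('.', 'error')` would process the current directory. A file that is not well-formed XML raises a parse exception here rather than being moved, so that case would need its own handling.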

Related

How to find out which attribute has an error when validating XML against an XSD with the Validate Mediator in WSO2

I am trying to validate an XML message in WSO2 using the Validate Mediator. The XSD is saved in the Governance registry. When one attribute has an error, I am able to print it to the log. But if the XML has 2-3 erroneous attributes, how can I display them? That is, how can I find out which attributes contain errors?
My code is below:
<?xml version="1.0" encoding="UTF-8"?>
<sequence name="Seq4" trace="disable" xmlns="http://ws.apache.org/ns/synapse">
<validate>
<schema key="gov:/as_uil/MT564_XSD.xsd"/>
<on-fail>
<sequence key="Error_Seq"/>
</on-fail>
</validate>
</sequence>
If there is an error in the source field, I get the output below:
[2017-09-08 16:21:37,120] [] ERROR - LogMediator Error_Message = *******************Invalid Source***************

Can I modify an xml file only from command line?

There is an auto-generated xml file.
<?xml version="1.0" encoding="UTF-8"?>
<widget id="some ID" modes="max">
<icon src="icon.png"/>
<name>StackOverflow</name>
<feature name="some feature"/>
I would like to achieve the following:
substitute modes with "full"
substitute name with ${name}
remove <feature>
add <content src="index.html"/> between <icon> and <name>
Can this be done?
[adding more background]
It is a website built mostly with Node.js. This shell command will be run from server-side JavaScript using child_process.exec().
Since awk is a standard tool, I think it might meet your requirements.
Here is a very straightforward script you can adapt.
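awk '
# set the modes attribute to "full"
/modes/{gsub(/modes="[^"]*"/,"modes=\"full\"") }
# append the <content> element on a new line after <icon>
/<icon/{$0=$0"\n <content src=\"index.html\"/>"}
# replace the text of <name> with the literal placeholder ${name}
/<name>/{gsub(/>.*</,">${name}<")}
# print every line except the one containing feature
!/feature/{print}
' Your_XML_file
Output:
<?xml version="1.0" encoding="UTF-8"?>
<widget id="some ID" modes="full">
<icon src="icon.png"/>
<content src="index.html"/>
<name>${name}</name>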
And here's a brief explanation:
find the line containing modes and set its value to "full"
find the line containing "<icon" and append the <content> element below it
replace the text inside <name>
skip the line containing feature and print every other line
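If a real XML parser is available on the server, the same four edits can be made structurally instead of line by line. The following is a rough Ruby sketch, not part of the original answer: it assumes Ruby with REXML is installed and that the input is well-formed (including the closing </widget> tag that the excerpt in the question omits). The helper name `rewrite_widget` is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: applies the four requested edits to the widget XML
# and returns the modified document as a String.
def rewrite_widget(xml, name_placeholder = '${name}')
  doc = REXML::Document.new(xml)
  widget = doc.root
  widget.attributes['modes'] = 'full'              # substitute modes
  widget.elements['name'].text = name_placeholder  # substitute name
  widget.delete_element('feature')                 # remove <feature>
  content = REXML::Element.new('content')
  content.add_attribute('src', 'index.html')
  # Place <content> directly after <icon>, which puts it before <name>.
  widget.insert_after(widget.elements['icon'], content)
  doc.to_s
end
```

Unlike the awk version, this does not depend on each element sitting on its own line, at the cost of requiring a Ruby runtime on the server.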

How to add type metadata element in the OPF file?

I added the following to the OPF file:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" xml:lang="en" unique-identifier="pub-id">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:type>ePub</dc:type>
</metadata>
but I get this error from the EPUB validator (http://validator.idpf.org/):
element "dc:type" not allowed here; expected the element end-tag or element "dc:contributor", "dc:coverage", "dc:creator", "dc:description", "dc:format", "dc:identifier", "dc:language", "dc:publisher", "dc:relation", "dc:rights", "dc:subject", "dc:title", "link" or "meta"
How to add type metadata element in the OPF file?
Package Document (.opf)
The Package Document contains information about the book, including the metadata, manifest, and
spine. It also declares the version, which must be 3.0.
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="3.0" prefix="rendition: http://www.idpf.org/vocab/rendition/#">
The Metadata
At a minimum, you must include the following items:
Title
ID
Language
Type
Modified-date
For Example:
<dc:title>XXXXXX</dc:title>
<dc:creator>YYYYYY</dc:creator>
<dc:source>0000000</dc:source>
<dc:identifier id="p0000000">URN:ISBN:0000000</dc:identifier>
<dc:publisher>ZZZZZZ</dc:publisher>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:format>100 pages</dc:format>

Ruby Regexp matches the data from the previous xml tag

My log file looks like this:
2015-04-10 19:10:39,688 INFO [abc] Reqt [283183]: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Data>..<Name>EVENT_1</Name>..</Data>
2015-04-10 19:10:39,688 INFO [abc] Req [283184]: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Data>..<Name>MY_EVENT</Name>..</Data>
The regex I have written is:
pFile = File.read("C:/logs/pdata.log")
Regex = /<Data>(.*?)MY_EVENT(.*?)<\/Data>/m
pData = pFile.match(Regex).to_s
"MY_EVENT" might appear in the first XML block, the second, or even the last, depending on the scenario.
If it appears in the first block, the regex works fine. If it appears in the second block, the match starts at the first <Data> and my output looks like this:
<Data>..<Name>EVENT_1</Name>..</Data>
2015-04-10 19:10:39,688 INFO [abc] Req [283184]: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Data>..<Name>MY_EVENT</Name>..</Data>
I need to extract only the one XML block that contains MY_EVENT.
Please help me out with this! Thanks in advance.
Try this.
pData.match(/<Data>((?!<Data>).)*?MY_EVENT((?!<Data>).)*?<\/Data>/m)
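The negative lookahead (?!<Data>) stops the match from running across the start of another <Data> block. This assumes that no <Data> element contains another <Data> as a child element.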
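To see the lookahead pattern working on data shaped like the question's log, here is a small self-contained sketch. The helper name `extract_event_block` and the shortened sample log are illustrative assumptions, and non-capturing groups are used in place of the capturing ones purely as a stylistic choice.

```ruby
# Hypothetical helper: returns the single <Data>...</Data> block that
# contains `event`, or nil when no block contains it.
def extract_event_block(text, event = 'MY_EVENT')
  # (?!<Data>). consumes one character only when a new <Data> tag does not
  # start at that position, so a match can never span two <Data> blocks.
  pattern = /<Data>(?:(?!<Data>).)*?#{Regexp.escape(event)}(?:(?!<Data>).)*?<\/Data>/m
  text[pattern]
end

# Sample log modeled on the question (XML declarations shortened).
log = <<~LOG
  2015-04-10 19:10:39,688 INFO [abc] Req [283183]: <?xml version="1.0"?>
  <Data>..<Name>EVENT_1</Name>..</Data>
  2015-04-10 19:10:39,688 INFO [abc] Req [283184]: <?xml version="1.0"?>
  <Data>..<Name>MY_EVENT</Name>..</Data>
LOG

puts extract_event_block(log)
# prints: <Data>..<Name>MY_EVENT</Name>..</Data>
```

Starting from the first <Data>, the engine cannot step over the second <Data> tag, so that attempt fails and the match begins at the second block instead.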

Can I write inside a xml node using Ruby/Builder?

Here's my code:
require 'builder'
def initXML(builder)
  builder.instruct!
  builder.results(:result => 'result'){}
end
def writeXML(builder,name,hello)
  builder.test(:name => name){
    builder.hello hello
  }
end
builder = Builder::XmlMarkup.new(:target=> STDOUT, :indent=>4)
initXML(builder)
writeXML(builder,'name1','hello1')
writeXML(builder,'name2','hello2')
Executing that, I get this XML:
<?xml version="1.0" encoding="UTF-8"?>
<results result="result">
</results>
<test name="name1">
<hello>hello1</hello>
</test>
<test name="name2">
<hello>hello2</hello>
</test>
But I want the </results> end tag at the end of the file. Is there a way to write inside the <results> node, or to move </results> to the end of the file? Would it be better to use Nokogiri, or to generate my XML manually? I'm trying to use this with Watir unit testing; is there something I can use to write my results to an XML file?
(Update) This is the XML I want:
<?xml version="1.0" encoding="UTF-8"?>
<results result="result">
<test name="name1">
<hello>hello1</hello>
</test>
<test name="name2">
<hello>hello2</hello>
</test>
</results>
Thanks.
It's writing precisely what you're telling it to.
Pass a block into initXML (which would be init_xml in idiomatic Ruby, IMO) and execute it. But I wouldn't call it init_xml, because what you're actually doing is wrapping XML content.
I'm not sure what you're trying to accomplish; this is easily done using normal builder semantics. If you want to return XML nodes from a method consider passing the current node to a method, perhaps. Without knowing precisely what you're trying to do, it's difficult to advise.
