NiFi XML to JSON - apache-nifi

I have a NiFi flow that, among other things, transforms XML into JSON. This is done to prep the data for insertion into MongoDB. I'm using the TransformXML processor and an XSL stylesheet to do the transform. Is this the correct method? Ordinarily, I would say that XSLT is not the best way to transform XML to JSON, but I wasn't able to find another way in NiFi.

If your XML has a fixed (not dynamic) structure, you can use the ConvertRecord processor.
Choose XMLReader to read the XML. For this, you must define an Avro schema.
Choose JsonRecordSetWriter to write the converted result. If you don't want to change the structure, you don't have to change anything on JsonRecordSetWriter.
For more information, I suggest you look at the link below.
https://pierrevillard.com/2018/06/28/nifi-1-7-xml-reader-writer-and-forkrecord-processor/

Well, there are two preferable approaches to converting XML data with Apache NiFi:
A. Using the TransformXML processor with an XSLT file
There are many examples that show how to transform any XML into a JSON document using XSLT, and it's very easy to use. Depending on your requirements, though, you might need specific features.
E.g. https://community.hortonworks.com/articles/29474/nifi-converting-xml-to-json.html
https://gist.github.com/speby/9561961e06dc1b38822764b26ddc2159
https://community.hortonworks.com/questions/91784/could-transformxml-work-with-several-xslts.html
B. Using a Java processor with the JSONObject library
With this approach you need to write your own custom processor.
Note: org.json is NOT Apache friendly in terms of licensing.
A very good example of this:
https://gist.github.com/pvillard31/408c6ba3a9b53880c751a35cffa9ccea
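To give a feel for the conversion step that such a custom processor would wrap, here is a minimal sketch. It deliberately does not use org.json or any NiFi API: it swaps in the JDK's own DOM parser and builds the JSON string by hand. The class and method names (XmlToJsonSketch, convert) are made up for illustration, and the sketch assumes simple XML: attributes and mixed content are ignored, every leaf becomes a JSON string, and only the most common characters are escaped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of NiFi or org.json.
final class XmlToJsonSketch {

    // Parses the XML text and returns a JSON object keyed by the root element name.
    static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return "{" + quote(root.getTagName()) + ":" + toJson(root) + "}";
    }

    // Leaf elements become strings; repeated child names become arrays.
    static String toJson(Element element) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                Element child = (Element) node;
                children.computeIfAbsent(child.getTagName(), k -> new ArrayList<>())
                        .add(toJson(child));
            }
        }
        if (children.isEmpty()) {
            return quote(element.getTextContent().trim());
        }
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, List<String>> entry : children.entrySet()) {
            List<String> values = entry.getValue();
            String value = values.size() == 1
                    ? values.get(0)
                    : "[" + String.join(",", values) + "]";
            object.add(quote(entry.getKey()) + ":" + value);
        }
        return object.toString();
    }

    // Simplified escaping: backslash, double quote, and common whitespace escapes only.
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t") + "\"";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<car><make>ford</make><model>mondeo</model></car>"));
    }
}
```

In a real processor this logic would sit inside the onTrigger callback and read from and write to the flow file content; a proper JSON library is the safer choice once the XML gets more complex.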

Related

How do I get CsvDozerBeanWriter to pull column headers from Dozer XML mapping files

I'm writing a feature to produce CSV snapshots of screen data.
I need this to be data-driven. Thus I need to avoid hard-coding each snapshot in Java, but rather load it from a data source such as an XML file or a database. The data is contained in Java beans.
I'm using SuperCSV with the Dozer extension both at 2.1.0.
This combination seems perfect since I can code the mappings from the beans to the columns in Dozer XML mapping files.
This works well for the data, but I have not found a way to specify the strings to use for the CSV's column headers other than to hard-code them in Java as is done in all of the examples and test cases I've looked at. That is not data-driven.
Is there a way for me to code the column headers in the mapper file? Or even to extract them from the mapper file, construct a List, and pass them to the writeHeader() method?
I think it would be OK to just use the bean property names as the headers, although ideally I would be able to add some meta-data notation to the XML's <field> tag that specifies the header.
I'd have posted this on SourceForge, but I'm getting a 500 error there.
I'm a Super CSV developer. You're the first person I've heard of who's using CsvDozerBeanWriter with their own DozerBeanMapper - great to hear that feature is useful :)
So what's the goal of being 'data driven'? It sounds like you want your code to be really generic, so you can alter the CSV just by changing the XML. Is that right? Of course, you can't configure the cell processors dynamically...or are you trying to do that too!!??
I'd take a look at the MappingMetadata API of Dozer, which you can access by calling getMappingMetadata() on the DozerBeanMapper. I've never used it, but it looks like you could derive the column names this way (though you'd probably be limited to the field names).
Otherwise, you'll have to parse the XML file yourself (I'd probably use XPath). You'd have to do it this way if you want to use some other metadata in the XML for the column name.
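As a rough sketch of the XPath route, the snippet below pulls the text of every <a> element inside a <field> out of a mapping file and returns it as a String[] that could be handed to writeHeader(). It uses only the JDK's XML APIs. The class and method names (MappingHeaders, headersFrom) are invented for illustration, and it assumes that the <a> side of each field mapping holds the bean property and that document order matches column order; adjust the expression if your mappings are laid out differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Super CSV or Dozer.
final class MappingHeaders {

    // Returns the text of each <field>/<a> element, in document order.
    static String[] headersFrom(String mappingXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(mappingXml)));

        // local-name() sidesteps the default namespace declared by Dozer mapping files.
        String expression = "//*[local-name()='field']/*[local-name()='a']";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);

        List<String> headers = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            headers.add(nodes.item(i).getTextContent().trim());
        }
        return headers.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Sample mapping invented for this sketch.
        String xml = "<mappings xmlns=\"http://dozer.sourceforge.net\"><mapping>"
                + "<field><a>firstName</a><b>columns[0]</b></field>"
                + "<field><a>age</a><b>columns[1]</b></field>"
                + "</mapping></mappings>";
        System.out.println(String.join(",", headersFrom(xml)));
    }
}
```

The same approach would work for any other piece of the mapping file you decide to treat as the header source; only the XPath expression changes.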

Is there XML binding library for Ruby (like JAXB)?

Is there any tool for Ruby which can transform XML (SOAP) to objects and vice versa? And, if possible, generate all the objects (models) from an XML schema (XSD)? I have worked several times with the JAXB tool (in Java) and I need something similar:
generate models from XML schema
easily create component for serializing and deserializing them
easily create component for storing the objects to database
if possible, generate database tables according to that schema
Do you know any tool for this? What approach would you recommend to complete such task?
Thanks for your answers.
Savon should cover the SOAP part of it.
I haven't used it but there is a library called HappyMapper: http://happymapper.rubyforge.org/

Create Value class for Sequence Files at runtime

I have some types of data that I have to upload on HDFS as Sequence Files.
Initially, I had thought of creating a .jr file at runtime depending on the type of schema and using Hadoop's rcc DDL tool to generate these classes.
But looking at rcc documentation, I see that it has been deprecated. I was trying to see what other options I have to create these value classes per type of data.
This is a problem because I only learn the metadata of the data to be loaded at runtime, along with the data stream. So I have no choice but to create the Value class at runtime, use it for writing (key, value) pairs to SequenceFile.Writer, and finally save the result on HDFS.
Is there any solution for this problem?
You can try looking at other serialization frameworks, like Protocol Buffers, Thrift, or Avro. You might want to look at Avro first, since it doesn't require static code generation, which might make it more suitable for you.
Or, if you want something really quick and dirty, each record in the SequenceFile can be a HashMap whose keys are the field names and whose values are the field values.

What is the opposite of JAXB? i.e. generating XML FROM classes?

I am currently designing a solution to a problem I have. I need to dynamically generate an XML file on the fly from Java objects, in the same way JAXB generates Java classes from XML files, but in the opposite direction. Is there something out there already like this?
Alternatively, is there a way in which one could 'save' the state of Java objects?
The goal I am working towards is a dynamically changing GUI, where a user can redesign their GUI in the same way you can with iGoogle.
You already have the answer. It's JAXB! You can annotate your classes and then have JAXB marshal them to XML (and back) without the need to create an XML schema first.
Look at https://jaxb.dev.java.net/tutorial/section_6_1-JAXB-Annotations.html#JAXB%20Annotations to get started.
I don't know if this is exactly what you're looking for, but there's the java.beans.XMLEncoder:
XMLEncoder enc = new XMLEncoder(new FileOutputStream(file));
enc.writeObject(obj);
enc.close();
The result can then be loaded by XMLDecoder:
XMLDecoder dec = new XMLDecoder(new FileInputStream(file));
Object obj = dec.readObject();
dec.close();
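For a version that can be run as-is, here is a small in-memory round trip of the same idea. The class name XmlRoundTrip and its two helper methods are made up for this sketch; it encodes to a byte array instead of a file and uses an ArrayList as the object, since XMLEncoder handles standard collections out of the box. Your own classes would need to follow JavaBeans conventions (public class, public no-argument constructor, getters and setters).

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper wrapping XMLEncoder/XMLDecoder for an in-memory round trip.
final class XmlRoundTrip {

    // Encodes the object to XML; closing the encoder flushes the document.
    static byte[] toXml(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(value);
        }
        return out.toByteArray();
    }

    // Decodes the first object stored in the XML document.
    static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<String> widgets = new ArrayList<>();
        widgets.add("weather");
        widgets.add("news");

        byte[] xml = toXml(widgets);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(fromXml(xml));
    }
}
```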
"generate xml from java objects:"
Try XStream.
Here's what is said on the tin:
No mappings required. Most objects can be serialized without need for specifying mappings.
Requires no modifications to objects.
Full object graph support
For saving Java object state:
Serialization is the way to do this in Java.

xsd - validating values from external dictionary file

I would like to define a schema for a document like:
...
<car>
<make>ford</make>
<model>mondeo</model>
</car>
...
The problem is that I would like to constrain the possible values (so ford/mondeo or audi/a4 would be valid values for make/model, but audi/mondeo would not) using an external data dictionary. When new car models need to be added, only the external data file would change, and the XSD schema would remain the same.
Is this possible at all? I have looked at the key/keyref constraints and I see I can use them within a single document, but this is not what I'm looking for. I don't want to repeat the full data dictionary in every document instance; I would prefer the data file to be part of the schema.
That is not possible in XML Schema 1.0.
XML Schema 1.1 will add some support that will allow expressing this kind of constraints (although AFAIK not in external files) - but that is not yet a W3C recommendation.
It is possible to implement this now with Schematron, possibly embedded in XML Schema.
However, there was already work in this area with usable results. See OASIS Code Lists
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=codelist
More details can be found here:
http://www.genericode.org/
This is used in the OASIS Universal Business Language (UBL)
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl
Best Regards,
George

Resources