I am completely new to NiFi, but I understand from people that it is good.
However, I am going to be sent a JSON document containing an embedded array whose values can be hex, byte, or ASCII characters. These values will need converting to strings before being inserted into Oracle.
Searching the internet, I have found no proper examples that convert JSON to SQL and convert data from hex to string, etc. Are there any examples to follow? Has anyone done something similar who can advise?
There are two ways that I know of to convert JSON to SQL:
The first is to use a Jolt transformation, which is comparatively inefficient with large data.
The second, which I prefer, is to use a series of processors to convert JSON to SQL: EvaluateJsonPath --> AttributesToJSON --> ConvertJSONToSQL --> PutSQL.
There is also a processor known as EncodeContent or EncodeAttribute for converting hex to and from different formats.
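If it is easier, the hex-to-string step can also be pushed down to Oracle itself in the statement that PutSQL executes, rather than handled in a NiFi processor. A minimal sketch, with made-up table and column names:

-- Sketch only (table and column names are invented). HEXTORAW turns the hex text
-- into RAW bytes, and UTL_RAW.CAST_TO_VARCHAR2 renders those bytes as a character
-- string, e.g. '48656C6C6F' -> 'Hello'.
INSERT INTO payload_values (id, decoded_value)
VALUES (1, UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('48656C6C6F')));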
How can we validate input from XML and JSON? We need to create those elements in DB2.
This page will help you with validating XML: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.xml.doc/doc/c0050643.html
For JSON, as of Db2 11.1.3.3 you can use SYSTOOLS.JSON2BSON() to validate that a string is valid JSON. Validating that it conforms to some schema is less easy. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.swg.im.dbclient.json.doc/doc/r0070290.html
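As a quick sketch (the JSON literals here are made up), the conversion call doubles as a validity check because it raises an error for malformed input:

-- Db2 for LUW: succeeds for well-formed JSON, raises an error otherwise.
VALUES SYSTOOLS.JSON2BSON('{"id": 1, "name": "widget"}');
VALUES SYSTOOLS.JSON2BSON('{"id": 1, "name": }');   -- malformed, so this call fails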
If you store your JSON data as BSON you know it will be valid JSON. Store your XML as XML datatype for validation and lots of other advantages.
XMLTABLE() is one way to extract elements from XML into other (non-XML) Db2 columns. JSON_TABLE() can do something similar for JSON.
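For example, a minimal XMLTABLE() sketch (table, column, and element names are all invented) that shreds an XML column into relational columns:

-- Produce one relational row per /order/item element of the XML column.
SELECT x.item_id, x.item_name
FROM orders o,
     XMLTABLE('$d/order/item' PASSING o.order_xml AS "d"
              COLUMNS item_id   INTEGER      PATH '@id',
                      item_name VARCHAR(100) PATH 'name') AS x;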
For general info on Db2’s XML capability, google for “PureXML” as well as using information from the Knowledge Center manual.
(all above assumes you are using Db2 for Linux, Unix or Windows)
I am really new to Hive, so I apologize if there are any misconceptions in my question.
I need to read a Hadoop sequence file into a Hive table. The sequence file contains Thrift binary data, which can be deserialized using the SerDe2 that comes with Hive.
The problem is that one column in the file is encoded with Google Protobuf, so when the Thrift SerDe processes the sequence file, it does not handle the Protobuf-encoded column properly.
I wonder if there is a way in Hive to deal with this kind of Protobuf-encoded column nested inside a Thrift sequence file, so that each column can be parsed properly.
Thank you so much for any possible help!
I believe you should use another SerDe to deserialize the Protobuf format.
Maybe you can refer to this:
https://github.com/twitter/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
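To give a feel for the mechanism, swapping in a different SerDe is only a change to the table DDL. The sketch below uses the class names as I recall them from that wiki, so verify the exact names there; the Protobuf message class and location are invented examples:

-- Sketch only: check the Elephant Bird wiki for the exact SerDe and input format
-- class names; 'com.example.proto.EventMessage' stands in for your generated
-- Protobuf class.
CREATE EXTERNAL TABLE events
  ROW FORMAT SERDE 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
  WITH SERDEPROPERTIES ('serialization.class' = 'com.example.proto.EventMessage')
  STORED AS
    INPUTFORMAT 'com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION '/path/to/sequence/files';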
Just wondering how anyone has dealt with handling extended ASCII in Hive, for example characters like §.
I see that character in the raw data stored as a string in Hive, but once I query or export the data it does not show up properly. Is there any way to retain the §?
I am using Amazon EMR Hadoop Hive for big data processing. The current data in my log files is in CSV format. To build a table from the log files, I wrote a regular expression to parse the data and store it into the different columns of an external table. I know that a SerDe can be used to read data in JSON format, which means each log file line could be a JSON object. Are there any Hadoop performance advantages if my log files are in JSON format compared to CSV format?
If you can process the output of the table (that you created with the regex), why do more processing? Try to avoid unnecessary steps.
I think the main issue here is which format is faster to read. I believe CSV will be faster than JSON, but don't take my word for it. Hadoop really doesn't care; it's all byte arrays to it once it's in memory.
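Either way, the difference is only in the table's SerDe, not in how Hadoop stores or moves the bytes. A sketch with an invented table layout and locations (the JSON variant may need the hive-hcatalog-core jar on the classpath):

-- CSV-style logs: a delimited (or regex-based, as in the question) SerDe parses each line on read.
CREATE EXTERNAL TABLE logs_csv (ts STRING, level STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/logs-csv/';

-- JSON logs: one JSON object per line, parsed by the HCatalog JsonSerDe.
CREATE EXTERNAL TABLE logs_json (ts STRING, level STRING, message STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://my-bucket/logs-json/';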
Is the following workflow possible with Informatica Powercenter?
AS400 -> Xml(in memory) -> Oracle 10g stored procedure (pass xml as param)
Specifically, I need to take a result set, e.g. 100 rows, convert those rows into a single XML document as a string in memory, and then pass that as a parameter to an Oracle stored procedure that is called only once. I understood that a workflow runs row by row and that this kind of 'batching' is not possible.
Yes, this scenario should be possible.
You can connect to AS/400 sources with native Informatica connector(s), although this might require (expensive) licenses. Another option is to extract the data from AS/400 source into a text file, and use that as a normal file source.
To convert multiple rows into one row, you would use an Aggregator transformation. You may need to create a dummy column (with same value for all rows) using an Expression, and use that column as the grouping key of the Aggregator, to squeeze the input into one single row. Row values would be concatenated together (separated by some special character) and then you would use another Expression to split and parse the data into as many ports (fields) as you need.
Next, with an XML Generator transformation you can create the XML. This transformation can have multiple input ports (fields) and its result will be directed into a single output port.
Finally, you would load the generated XML value into your Oracle target, possibly using a Stored Procedure transformation.
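For reference, the Oracle side of such a flow might look like the sketch below; the procedure, table, and element names are all invented. It takes the whole generated XML document as one CLOB parameter and shreds it into rows:

-- Hypothetical Oracle 10g target procedure: receives the XML built by the
-- XML Generator transformation as a single CLOB and inserts one row per <row> element.
CREATE OR REPLACE PROCEDURE load_as400_batch (p_xml IN CLOB) AS
  v_doc XMLTYPE := XMLTYPE(p_xml);
BEGIN
  INSERT INTO staging_rows (id, name)
  SELECT EXTRACTVALUE(VALUE(r), '/row/id'),
         EXTRACTVALUE(VALUE(r), '/row/name')
  FROM   TABLE(XMLSEQUENCE(EXTRACT(v_doc, '/rows/row'))) r;
END load_as400_batch;
/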