Can Hive deserialize avro bytes to the schema provided? - hadoop

I have an Avro file to be loaded into Hive, but my file is in binary. What deserializer should be used to get the binary Avro into Hive? I don't want the binary data in Hive, but the decoded data.
This is how I create my table:
CREATE TABLE kst7
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='pathtoavsc.avsc');
When I use the above command the table gets created and the data gets loaded, but when I do a select * from the table I get the error below:
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found bytes, expecting union
avsc file:
{
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "EnrichmentData",
"fields": [
{"name": "rowKey", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "ownerGuid", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "autotagsEnrichment", "type": ["bytes", "null", {
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "AutotagEnrichment",
"fields": [
{"name": "version", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "autotags", "type": ["null", {"type": "array", "items": {
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "Autotag",
"fields": [
{"name": "tag", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "score", "type": ["null", "double"], "default": null}
]
}}], "default": null}
]
}], "default": null},
{"name": "colorEnrichment", "type": ["bytes","null", {
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "ColorEnrichment",
"fields": [
{"name": "version", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "color", "type": ["null", {"type": "array", "items": {
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "Color",
"fields": [
{"name": "color", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
{"name": "score", "type": ["null", "double"], "default": null}
]
}}], "default": null}
]
}], "default": null}
]
}

I think you are looking for SERDEPROPERTIES rather than TBLPROPERTIES:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='pathtoschema.avsc')
Otherwise, try selecting individual fields until you find the one that's causing the error, then inspect which Hive type(s) those AVSC fields are being mapped to in the table.
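Putting the two together, a sketch of the question's DDL rewritten to use WITH SERDEPROPERTIES (same SerDe class and schema path as above) would be:
CREATE TABLE kst7
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.url'='pathtoavsc.avsc')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';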

Related

How to insert Oracle blob using Json in nifi

I have a table in an Oracle DB with one column holding BLOB data.
I convert the BLOB field into a binary array; see the "Json Data" example below.
I get an error in PutDatabaseRecord saying it can't insert into the table with a different data type (LONG).
How can I do this?
Thank you.
Json Data:
[
{
"GCB01": "DOC-30981-20220712-110419",
"GCB02": "DOC",
"GCB03": "01",
"GCB04": "001",
"GCB05": "test",
"GCB06": "test",
"GCB07": "1.pdf",
"GCB08": "D",
"GCB09": [37,80,68,70,45,49,46,55,13,10,37,-75,-75,-75,-75,13,10,49,32,48,32,111,98,106,13,10,60,60,47,84,121,112,101,47,67,97,116,97,108,111,103,47,80,97,103,101,115,32,50,32,48,32,82,47,76,97,110,103,40,122,104,45,84,87,41,32,47,83,116,114,117,99,116,84,114,101,101,82,111,111,116,32,49,53,32,48,32,82,47,77,97,114,107,73,110,102,111,60,60,47,77,97,114,107,101,100,32,116,114,117,101,62,62,47,77,101,116,97,100,97,116,97,32,50,57,32,48,32,82,47,86,105,101,119,101,114,80,114,101,102,101,114,101,110,99,101,115,32,51,48,32,48,32,82,62,62,13,10,101,110,100,111,98,106,13,10,50,32,48,32,111,98,106,13,10,60,60,47,84,121,112,101,47,80,97,103,101,115,47,67,111,117,110,116,32,49,47,75,105,100,115,91,32,51,32,48,32,82,93,32,62,62,13,10,101,110,100,111,98,106,13,10,51,32,48,32,111,98,106,13,10,60,60,47,84,121,112,101,47,80,97,103,101,47,80,97,114,101,110,116,32,50,32,48,32,82,47,82,101,115,111,117,114,99,101,115,60,60,47,70,111,110,116,60,60,47,70,49,32,53,32,48,32,82,47,70,50,32,49,50,32,48,32,82,62,62,47,69,120,116,71,83,116,97,116,101,60,60,47,71,83,49,48,32,49,48,32,48,32,82,47,71,83,49,49,32,49,49,32,48,32,82,62,62,47,80,114,111,99,83,101,116,91,47,80,68,70,47,84,101,120,116,47,73,109,97,103,101,66,47,73,109,97,103,101,67,47,73,109,97,103,101,73,93,32,62,62,47,77,101,100,105,97,66,111,120,91,32,48,32,48,32,53,57,53,46,51,50,32,56,52,49,46,57,50,93,32,47,67,111,110,116,101,110,116,115,32,52,32,48,32,82,47,71,114,111,117,112,60,60,47,84,121,112,101,47,71,114,111,117,112,47,83,47,84,114,97,110,115,112,97,114,101,110,99,121,47,67,83,47,68,101,118,105,99,101,82,71,66,62,62,47,84,97,98,115,47,83,47,83,116,114,117,99,116,80,97,114,101,110,116,115,32,48,62,62,13,10,101,110,100,111,98,106,13,10,52,32,48,32,111,98,106,13,10,60,60,47,70,105,108,116,101,114,47,70,108,97,116,101,68,101,99,111,100,101,47,76,101,110,103,116,104,32,49,55,56,62,62,13,10,115,116,114,101,97,109,13,10,120,-100,-83,-50,-69,14,-126,64,16,5,-48,126,-109,-3,-121,91,-126,-119,-69,59,-53,107,73,8,5,15,-119,70,11,3,-58,-62,88,80,32,-107,-60,-57,-1,39,46,98,67,104,-99,-18,102,-18,76,14,100,-3,104,7,36,-119,60,-28,-37,2,74,-18,-37,-95,-121,-45,13,-21,83,-19,-90,41,-78,34,-57,-109,51,37,-44,56,-58,68,4,-123,32,14,-124,-89,97,124,18,-79,-58,-85,-29,-20,-68,-62,-64,89,-42,112,38,55,4,-46,104,110,-100,-115,85,5,66,108,-81,-75,-113,40,8,-123,-79,-101,-69,45,85,53,41,-12,111,-5,24,-3,20,-23,23,43,-50,46,-119,-14,-54,34,-67,-94,-39,113,86,-38,-97,71,-50,80,30,114,96,-114,-91,63,96,-11,2,27,10,-14,103,-40,47,113,-126,57,112,-105,-86,15,-126,-93,66,90,13,10,101,110,100,115,116,114,101,97,109,13,10,101,110,100,111,98,106,13,10,53,32,48,32,111,98,106,13,10,60,60,47,84,121,112,101,47,70,111,110,116,47,83,117,98,116,121,112,101,47,84,121,112,101,48,47,66,97,115,101,70,111,110,116,47,66,67,68,69,69,69,43,67,97,108,105,98,114,105,47,69,110,99,111,100,105,110,103,47,73,100,101,110,116,105,116,121,45,72,47,68,101,115,99,101,110,100,97,110,116,70,111,110,116,115,32,54,32,48,32,82,47,84,111,85,110,105,99,111,100,101,32,50,53,32,48,32,82,62,62,13,10,101,110,100,111,98,106,13,10,54,32,48,32,111,98,106,13,10,91,32,55,32,48,32,82,93,32,13,10,101,110,100,111,98,106,13,10,55,32,48,32,111,98,106,13,10,60,60,47,66,97,115,101,70,111,110,116,47,66,67,68,69,69,69,43,67,97,108,105,98,114,105,47,83,117,98,116,121,112,101,47,67,73,68,70,111,110,116,84,121,112,101,50,47,84,121,112,101,47,70,111,110,116,47,67,73,68,84,111,71,73,68,77,97,112,47,73,100,101,110,116,105,116,121,47,68,87,32,49,48,48,48,47,67,73,68,83,121,115,116,101,109,73,110,102,111,32,56,32,48,32,82,47,70,111,110,
116,68,101,115,99,114,105,112,116,11],
"GCB10": null,
"GCB11": "O",
"GCB12": "U",
"GCB13": "Z0078",
"GCB14": "1500",
"GCB15": "1657555200000",
"GCB16": null,
"GCB17": null,
"GCB18": null
}
]
Schema definition:
{
"type": "record",
"name": "gcb_file",
"fields" : [
{"name": "GCB01", "type": ["null", "string"]},
{"name": "GCB02", "type": ["null", "string"]},
{"name": "GCB03", "type": ["null", "string"]},
{"name": "GCB04", "type": ["null", "string"]},
{"name": "GCB05", "type": ["null", "string"]},
{"name": "GCB06", "type": ["null", "string"]},
{"name": "GCB07", "type": ["null", "string"]},
{"name": "GCB08", "type": ["null", "string"]},
{"name": "GCB09", "type": ["null", "bytes"]},
{"name": "GCB10", "type": ["null", "string"]},
{"name": "GCB11", "type": ["null", "string"]},
{"name": "GCB12", "type": ["null", "string"]},
{"name": "GCB13", "type": ["null", "string"]},
{"name": "GCB14", "type": ["null", "string"]},
{"name": "GCB15", "type": ["null", "string"]},
{"name": "GCB16", "type": ["null", "string"]},
{"name": "GCB17", "type": ["null", "string"]},
{"name": "GCB18", "type": ["null", "string"]}
]
}

Cannot read properties of undefined (reading 'EIP712Domain')

I'm signing messages, but today I've faced an issue where I was not able to sign, and I receive this error. Any idea?
Cannot read properties of undefined (reading 'EIP712Domain')
And this is how the message that is passed from OpenSea to me looks:
{"primaryType": "MetaTransaction", "types": {"EIP712Domain": [{"name": "name", "type": "string"}, {"name": "version", "type": "string"}, {"name": "verifyingContract", "type": "address"}, {"name": "salt", "type": "bytes32"}], "MetaTransaction": [{"name": "nonce", "type": "uint256"}, {"name": "from", "type": "address"}, {"name": "functionSignature", "type": "bytes"}]}, "domain": {"name": "Wrapped Ether", "version": "1", "verifyingContract": "0x7ceb23fd6bc0add59e62ac25578270cff1b9f619", "salt": "0x0000000000000000000000000000000000000000000000000000000000000089"}, "message": {"nonce": 0, "from": "0xe6329913560ff9ca1b3b44ebaef08f0304b36f07", "functionSignature": "0x095ea7b3000000000000000000000000411b0bcf1b6ea88cb7229558c89994a2449c302cffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"}}
https://github.com/MetaMask/eth-sig-util/issues/251

Unable to parse double values in csv

Source csv:
ReportDay;SKU;ProductName;Price;Type;ShowsCondition;Impressions;Clicks;CTR;CPM;Budget;Orders;Revenue;ModelOrders;RevenueByModel
15.04.2021;1362254;Product1;71,00;Search;Shelf;1;0;0,00;35,00;0,04;0;0,00;0;0,00
15.04.2021;145406;Product2;129,00;Promo;Shelf;1;0;0,00;35,00;0,04;0;0,00;0;0,00
My custom schema to parse the CSV:
{
"type": "record",
"name": "SKU",
"namespace": "test",
"fields": [
{"name": "ReportDay", "type": "string"},
{"name": "SKU", "type": ["long","null"]},
{"name": "ProductName", "type": ["string","null"]},
{"name": "Price", "type": ["double","null"]},
{"name": "Type", "type": ["string","null"]},
{"name": "ShowsCondition", "type": ["string","null"]},
{"name": "Impressions", "type": ["long","null"]},
{"name": "Clicks", "type": ["long","null"]},
{"name": "CTR", "type": ["double","null"]},
{"name": "CPM", "type": ["double","null"]},
{"name": "Budget", "type": ["double","null"]},
{"name": "Orders", "type": ["long","null"]},
{"name": "Revenue", "type": ["double","null"]},
{"name": "ModelOrders", "type": ["long","null"]},
{"name": "RevenueByModel", "type": ["double","null"]}
]
}
Also, my reader settings: (screenshot not included)
While trying to read the .csv with this reader in an UpdateRecord processor, I get an error:
Error while getting next record. Root cause: java.lang.NumberFormatException: For input string: "0,00"
If I change all the double types to string, it parses with no problem. But I store these values as numeric in my DB, so I need to read these values as double, not string. What did I do wrong?

How to get logs using rest api in apache nifi

I went through several guides and couldn't find a way to get the logs with related information, like the data size of the flowfile (shown in the image), using the REST API (or another way if the REST API is not possible). Even though NiFi writes these logs to the app log, the other related details cannot be found there. How can I do that?
EDIT
According to a comment from daggett, I have the REST API http://localhost:8080/nifi-api/flow/bulletin-board, which solved half of my question. Now I need to know how I can get the details of the flowfile that caused the bulletin.
There are a few reporting tasks provided by NiFi which give in-depth information about the status of NiFi as well as about flowfiles. One of them is SiteToSiteProvenanceReportingTask, which you can use to derive information about the failed file.
These reporting tasks send information about each flowfile as JSON data, which can itself be queried or processed as a flowfile in NiFi.
Here is the schema of the JSON data that the above reporting task sends:
{
"type" : "record",
"name" : "provenance",
"namespace" : "provenance",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventOrdinal", "type": "long" },
{ "name": "eventType", "type": "string" },
{ "name": "timestampMillis", "type": "long" },
{ "name": "durationMillis", "type": "long" },
{ "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
{ "name": "details", "type": ["null", "string"] },
{ "name": "componentId", "type": ["null", "string"] },
{ "name": "componentType", "type": ["null", "string"] },
{ "name": "componentName", "type": ["null", "string"] },
{ "name": "processGroupId", "type": ["null", "string"] },
{ "name": "processGroupName", "type": ["null", "string"] },
{ "name": "entityId", "type": ["null", "string"] },
{ "name": "entityType", "type": ["null", "string"] },
{ "name": "entitySize", "type": ["null", "long"] },
{ "name": "previousEntitySize", "type": ["null", "long"] },
{ "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "actorHostname", "type": ["null", "string"] },
{ "name": "contentURI", "type": ["null", "string"] },
{ "name": "previousContentURI", "type": ["null", "string"] },
{ "name": "parentIds", "type": { "type": "array", "items": "string" } },
{ "name": "childIds", "type": { "type": "array", "items": "string" } },
{ "name": "platform", "type": "string" },
{ "name": "application", "type": "string" },
{ "name": "remoteIdentifier", "type": ["null", "string"] },
{ "name": "alternateIdentifier", "type": ["null", "string"] },
{ "name": "transitUri", "type": ["null", "string"] }
]
}
entityId and entitySize are what you may be looking for.
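For example, once that JSON arrives in your flow, a QueryRecord processor can pull out those fields. A minimal sketch (the dynamic property name provenance_events is my own choice; the field names are taken from the schema above):
-- value of a dynamic property named "provenance_events" on QueryRecord
SELECT entityId, entitySize, componentName, eventType
FROM FLOWFILE
WHERE entityId IS NOT NULL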

Nifi : Nested JSON records schema validation

I'm trying to split a JSON file containing nested records using the SplitRecord processor.
As a result, I always get a null value instead of the expected array of records:
{"userid":"xxx","bookmarks":null}
Below is the sample JSON:
{
"userid": "Ib6gZ8ZPwRBbAL0KRSSKS",
"bookmarks": [
{
"id": "10000XXXXXXW0007760",
"creator": "player",
"position": 42.96
},
{
"id": "41ANSMARIEEW0075484",
"creator": "player",
"position": 51.87
},
{
"id": "ALBATORCORSW0088197",
"creator": "player",
"position": 93.47
},
{
"id": "ALIGXXXXXXXW0007944",
"creator": "player",
"position": 95.06
}
]
}
And here is my Avro schema:
{
"namespace": "nifi",
"name": "bookmark",
"type": "record",
"fields": [
{ "name": "userid", "type": "string" },
{ "name": "bookmarks", "type": {
"type": "record",
"name": "bookmarks",
"fields": [
{ "name": "id", "type": "string" },
{ "name": "creator", "type": "string" },
{ "name": "position", "type": "float" }
]
}
}
]
}
Any help would be greatly appreciated!
I had to implement a custom Groovy processor to overcome the limitations of NiFi, which took me a lot of time. The handling of Avro schemas is limited to the simplest cases and does not work for more advanced processing.
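For reference, in the sample JSON above, bookmarks is an array of records, while the schema declares it as a single record. A sketch of the bookmarks field declared as an array instead (my reading of the intended structure, not taken from the original question):
{ "name": "bookmarks", "type": {
"type": "array",
"items": {
"type": "record",
"name": "bookmarks",
"fields": [
{ "name": "id", "type": "string" },
{ "name": "creator", "type": "string" },
{ "name": "position", "type": "float" }
]
}
}}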
