NiFi: Nested JSON records schema validation - apache-nifi

I'm trying to split a JSON file containing nested records using the SplitRecord processor.
As a result, I always get a null value instead of the expected array of records:
{"userid":"xxx","bookmarks":null}
Below is a sample JSON:
{
"userid": "Ib6gZ8ZPwRBbAL0KRSSKS",
"bookmarks": [
{
"id": "10000XXXXXXW0007760",
"creator": "player",
"position": 42.96
},
{
"id": "41ANSMARIEEW0075484",
"creator": "player",
"position": 51.87
},
{
"id": "ALBATORCORSW0088197",
"creator": "player",
"position": 93.47
},
{
"id": "ALIGXXXXXXXW0007944",
"creator": "player",
"position": 95.06
}
]
}
And here is my Avro schema:
{
"namespace": "nifi",
"name": "bookmark",
"type": "record",
"fields": [
{ "name": "userid", "type": "string" },
{ "name": "bookmarks", "type": {
"type": "record",
"name": "bookmarks",
"fields": [
{ "name": "id", "type": "string" },
{ "name": "creator", "type": "string" },
{ "name": "position", "type": "float" }
]
}
}
]
}
Any help would be greatly appreciated!

I had to implement a custom Groovy processor to work around NiFi's limitations, which took me a lot of time. In my experience, its handling of Avro schemas covers only the simplest cases and does not work for more advanced processing.
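
Independently of the workaround above, one thing worth checking is that the sample data holds an array under bookmarks, while the schema declares a single record there; a record reader that cannot match the declared type may well produce null. Below is a minimal sketch of the schema with bookmarks declared as an Avro array of records, written as a JavaScript object. The inner record name bookmark_entry is a hypothetical choice, and whether this change alone fixes SplitRecord in a given NiFi version is an assumption.

```javascript
// Sketch: the question's schema with "bookmarks" declared as an array of records.
// "bookmark_entry" is a hypothetical name chosen for the inner record.
const bookmarkSchema = {
  namespace: "nifi",
  name: "bookmark",
  type: "record",
  fields: [
    { name: "userid", type: "string" },
    {
      name: "bookmarks",
      type: {
        type: "array",
        items: {
          type: "record",
          name: "bookmark_entry",
          fields: [
            { name: "id", type: "string" },
            { name: "creator", type: "string" },
            { name: "position", type: "float" }
          ]
        }
      }
    }
  ]
};

// Schema text that could be pasted into the reader/writer schema property.
const bookmarkSchemaText = JSON.stringify(bookmarkSchema, null, 2);
console.log(bookmarkSchemaText);
```

The only structural change from the original schema is the extra "type": "array" / "items" wrapper around the inner record.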

Related

AWS OpenSearch/Elasticsearch wildcard search on full index

For example, below is one sample JSON document in which multiple fields have the value '1001'. I have many JSON documents like this. I want to search for a particular keyword such as '1001' across any field (including nested JSON fields). I have gone through several documents, and they all suggest specifying the field name to search. Is there a way to achieve this without knowing which field contains the search text?
URL: https://linuxhint.com/wildcard-query-elasticsearch/
{
"id": "1001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "1001" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "1001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
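
One commonly used option is a query_string query without a field list: when no fields are given, it falls back to the index.query.default_field setting, which defaults to all eligible fields. A minimal sketch of the request body follows. The helper name buildAllFieldsQuery is hypothetical, and the sketch assumes the arrays of objects are mapped as plain objects; fields mapped with the nested type would generally need a nested query instead.

```javascript
// Hypothetical helper: builds a search body that matches a term in any field.
function buildAllFieldsQuery(term) {
  // Quote the term so characters such as ':' or '/' are not read as query syntax;
  // embedded quotes and backslashes are escaped first.
  const quoted = '"' + String(term).replace(/(["\\])/g, "\\$1") + '"';
  return {
    query: {
      query_string: {
        // No "fields" or "default_field" here, so the index default applies.
        query: quoted
      }
    }
  };
}

// Body to send with POST /<index>/_search
console.log(JSON.stringify(buildAllFieldsQuery("1001"), null, 2));
```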

How can I transform a field of a JSON file with NiFi?

Good morning,
I am new to NiFi and I want to modify a field in a JSON file (I am using NiFi v1.12.0) and save the result to another path.
This is an example of my JSON file:
{
"Id": "2b2ef24a-f3ce-4249-ad92-db9a565b5b66",
"Tipo": "AuditEvent",
"SubTipo": "Plataforma",
"Accion": "Audit.Middleware.EventData.HttpResponseSentEvent",
"IDCorrelacion": "7af48a20-587d-4e60-9c3b-02cc6a074662",
"TiempoEvento": "2020-07-30 11:45:08.315",
"Resultado": "No informado",
"ResultadoDesc": "No informado",
"Origen": {
"IDOrigen": "132403308182038429",
"Tipo": "Backend",
"Aplicacion": "fabric:/Omnicanalidad.Canales.Muro_v1",
"Servicio": "fabric:/Omnicanalidad.Canales.Muro_v1/Muro",
"Maquina": "ibsfabbe02pru",
"IP": "ibsfabbe02pru"
},
"OrigenInterno": "Audit.Middleware.AuditMiddleware",
"Agente": {
"Rol": "Sin rol asignado",
"IDUsuario": "1428",
"AltIDUsuario": "20141115",
"Localizador": "197.183.27.17",
"PropositoUso": "No informado",
"IDSession": "",
"XForwardedPort": "443",
"XForwardedFor": "162.37.0.100:30279, 162.37.0.5:10158, 172.37.0.5",
"XForwardedHost": "ebeprate.es",
"XForwardedProto": "https",
"XOriginalURL": "/test/v1/Relation/ObtieneGestor?IdUser=4355625&NiciTitular=43485326",
"XOriginalHost": "ebeprate.es",
"Referer": null,
"AuthenticationType": "AuthenticationTypes.Federation",
"UserAgent": "HttpApplicationGateway",
"Claims": "Hello World",
"AcceptedLanguage": null
},
"DatosEvento": {
"Headers": ["Content-Length: 0", "Request-Context: appId=cid-v1:d8b40be1-4838-4a94-a4f8-3ec374989b27"],
"StatusCode": 204,
"Body": ""
}
}
I want to convert the field TiempoEvento from a date to a timestamp.
In this case, 2020-07-30 11:45:08.315 should be converted to 1596109508.
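
As a quick check of the target value outside NiFi, the sketch below converts the sample string to epoch seconds. It assumes the timestamp is in UTC, which is what the expected result 1596109508 implies; the function name toEpochSeconds is hypothetical.

```javascript
// Converts "yyyy-MM-dd HH:mm:ss.SSS" (assumed to be UTC) to whole seconds since the epoch.
function toEpochSeconds(text) {
  // Rewrite to ISO 8601 with an explicit "Z" so parsing does not depend on the local time zone.
  const millis = Date.parse(text.replace(" ", "T") + "Z");
  if (Number.isNaN(millis)) {
    throw new Error("Unparseable timestamp: " + text);
  }
  // Drop the fractional seconds.
  return Math.floor(millis / 1000);
}

console.log(toEpochSeconds("2020-07-30 11:45:08.315")); // 1596109508
```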
So I use this procedure:
1.- I used the GetFile processor to pick up the file. I configured its properties without any problems and everything is OK.
2.- I used the UpdateRecord processor to modify the field. (This is where the problems appear.)
In its properties I have 3 properties:
I read that I need to configure a schema registry if I want to work with any data in NiFi (I don't know if that is totally true). Since I am working with a JSON file, I assumed I needed one, so I set it up.
In the controller services I configured JsonPathReader, JsonRecordSetWriter and AvroSchemaRegistry.
I started with AvroSchemaRegistry.
SETTING
Name: Test
PROPERTIES
Validate Field Names -> true
test-schema ->
{
"name": "MyFirstNiFiTest",
"type": "record",
"namespace": "test.example",
"fields": [
{
"name": "Id",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "SubTipo",
"type": "string"
},
{
"name": "Accion",
"type": "string"
},
{
"name": "IDCorrelacion",
"type": "string"
},
{
"name": "TiempoEvento",
"type": "string"
},
{
"name": "Resultado",
"type": "string"
},
{
"name": "ResultadoDesc",
"type": "string"
},
{
"name": "Origen",
"type": {
"name": "Origen",
"type": "record",
"fields": [
{
"name": "IDOrigen",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "Aplicacion",
"type": "string"
},
{
"name": "Servicio",
"type": "string"
},
{
"name": "Maquina",
"type": "string"
},
{
"name": "IP",
"type": "string"
}
]
}
},
{
"name": "OrigenInterno",
"type": "string"
},
{
"name": "Agente",
"type": {
"name": "Agente",
"type": "record",
"fields": [
{
"name": "Rol",
"type": "string"
},
{
"name": "IDUsuario",
"type": "string"
},
{
"name": "AltIDUsuario",
"type": "string"
},
{
"name": "Localizador",
"type": "string"
},
{
"name": "PropositoUso",
"type": "string"
},
{
"name": "IDSession",
"type": "string"
},
{
"name": "XForwardedPort",
"type": "string"
},
{
"name": "XForwardedFor",
"type": "string"
},
{
"name": "XForwardedHost",
"type": "string"
},
{
"name": "XForwardedProto",
"type": "string"
},
{
"name": "XOriginalURL",
"type": "string"
},
{
"name": "XOriginalHost",
"type": "string"
},
{
"name": "Referer",
"type": [
"string",
"null"
]
},
{
"name": "AuthenticationType",
"type": [
"string",
"null"
]
},
{
"name": "UserAgent",
"type": "string"
},
{
"name": "Claims",
"type": "string"
},
{
"name": "AcceptedLanguage",
"type": [
"string",
"null"
]
}
]
}
},
{
"name": "DatosEvento",
"type": {
"name": "DatosEvento",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "Category",
"type": "string"
},
{
"name": "EventType",
"type": "int"
},
{
"name": "Id",
"type": "int"
},
{
"name": "ApiName",
"type": "string"
},
{
"name": "Token",
"type": "string"
},
{
"name": "ApiScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "TokenScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "Message",
"type": "string"
},
{
"name": "ActivityId",
"type": "string"
},
{
"name": "TimeStamp",
"type": "int",
"logicalType": "date"
},
{
"name": "ProcessId",
"type": "int"
},
{
"name": "LocalIpAddress",
"type": "string"
},
{
"name": "RemoteIpAddress",
"type": "string"
}
]
}
}
]
}
I converted the JSON file to an Avro schema.
I enabled it and everything is OK.
Then I configured the JsonRecordSetWriter:
SETTING
Name: TestRecordSetWriter
PROPERTIES
I enabled it and everything is OK.
Then I configured the JsonPathReader:
SETTING
Name: TestPathReader
PROPERTIES
At this point I get an alert that says:
'JSON paths' is invalid because No JSON Paths were specified
and I can't enable this controller service. What am I missing?
I don't know if there is an easier way to do this, or if I am totally wrong, so I need some help.
Thank you
I found the answer. I had a bad configuration in JsonPathReader: I had not configured the record fields of the schema in its properties.
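
To illustrate that fix: JsonPathReader builds each record from user-defined properties, where the property name is the record field and the value is a JSONPath expression. The sketch below only generates such name/path pairs for the top-level fields of the schema above. The paths are the straightforward $.FieldName guesses, not the author's actual configuration, and the function name jsonPathProperties is hypothetical.

```javascript
// Top-level field names taken from the test-schema in the question.
const topLevelFields = [
  "Id", "Tipo", "SubTipo", "Accion", "IDCorrelacion", "TiempoEvento",
  "Resultado", "ResultadoDesc", "Origen", "OrigenInterno", "Agente", "DatosEvento"
];

// One user-defined property per field: property name = record field, value = JSONPath.
function jsonPathProperties(fieldNames) {
  return Object.fromEntries(fieldNames.map((name) => [name, "$." + name]));
}

const readerProperties = jsonPathProperties(topLevelFields);
for (const [name, path] of Object.entries(readerProperties)) {
  console.log(name + " -> " + path);
}
```

Each printed pair corresponds to one property that would be added to the reader by hand in the NiFi UI.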

Problem with schema validation using Postman

Body of my req:
[
{
"postId": 1,
"id": 1,
"name": "name abc",
"email": "Eliseo#gardner.biz",
"body": "something"
},
...
]
I am trying to validate it like below:
var schema = {
"type": "array",
"properties": {
"postId": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"email": {
"type": "string",
"pattern": "^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}$"
},
"body": {
"type": "string"
}
},
"required": [
"postId",
"id",
"name",
"email",
"body"
]
};
pm.test('Schemat jest poprawny', function() {
pm.expect(tv4.validate(jsonData, schema)).to.be.true;
});
The test passes even if I change, for example, the id type to string or the email pattern to an invalid one.
What is wrong with that code?
I would recommend moving away from tv4 for schema validation and using the built-in jsonSchema function instead, as this uses AJV.
Apart from that, your schema didn't look right: properties and required were declared on the array itself, where they have no effect, instead of on the objects inside it (under items).
This might help you out:
let schema = {
"type": "array",
"items": {
"type": "object",
"required": [
"postId",
"id",
"name",
"email",
"body"
],
"properties": {
"postId": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"email": {
"type": "string",
"pattern": "^[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
},
"body": {
"type": "string"
}
}
}
}
pm.test("Schemat jest poprawny", () => {
pm.response.to.have.jsonSchema(schema)
})
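
A note on the email pattern as written in the question: JSON Schema patterns have no case-insensitive flag, so its uppercase-only A-Z classes reject lowercase addresses such as the sample Eliseo#gardner.biz, and inside a JavaScript string literal \. collapses to a bare dot unless the backslash is doubled. The plain-JavaScript sketch below (no Postman objects involved) shows the difference; the adjusted pattern is one possible fix, not the only one.

```javascript
const sample = "Eliseo#gardner.biz";

// Pattern as written in the question: uppercase-only classes, and "\." inside a
// string literal is just "." (any character).
const original = new RegExp("^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}$");

// Adjusted sketch: both cases allowed, backslash doubled so the dot stays literal.
const adjusted = new RegExp("^[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$");

console.log(original.test(sample)); // false
console.log(adjusted.test(sample)); // true
```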

How to get logs using the REST API in Apache NiFi

I went through several guides and couldn't find a way to get the logs together with related information, such as the data size of the flowfile (shown in the image), using the REST API (or some other way, if the REST API can't do it). Even though NiFi writes these logs to the app logs, the other related details cannot be found there. How can I do that?
EDIT
Following a comment from daggett, I found the REST API endpoint http://localhost:8080/nifi-api/flow/bulletin-board, which answers half of my question. Now I need to know how I can get the details of the flowfile that caused the bulletin.
NiFi provides a few reporting tasks that give in-depth information about the status of NiFi as well as about flowfiles. One of them is SiteToSiteProvenanceReportingTask,
which you can use to derive information about the failed file.
These reporting tasks basically send information about flowfiles as JSON data, which can be queried or processed as a flowfile in NiFi.
Here is the schema of the JSON data that the above reporting task sends:
{
"type" : "record",
"name" : "provenance",
"namespace" : "provenance",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventOrdinal", "type": "long" },
{ "name": "eventType", "type": "string" },
{ "name": "timestampMillis", "type": "long" },
{ "name": "durationMillis", "type": "long" },
{ "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
{ "name": "details", "type": ["null", "string"] },
{ "name": "componentId", "type": ["null", "string"] },
{ "name": "componentType", "type": ["null", "string"] },
{ "name": "componentName", "type": ["null", "string"] },
{ "name": "processGroupId", "type": ["null", "string"] },
{ "name": "processGroupName", "type": ["null", "string"] },
{ "name": "entityId", "type": ["null", "string"] },
{ "name": "entityType", "type": ["null", "string"] },
{ "name": "entitySize", "type": ["null", "long"] },
{ "name": "previousEntitySize", "type": ["null", "long"] },
{ "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "actorHostname", "type": ["null", "string"] },
{ "name": "contentURI", "type": ["null", "string"] },
{ "name": "previousContentURI", "type": ["null", "string"] },
{ "name": "parentIds", "type": { "type": "array", "items": "string" } },
{ "name": "childIds", "type": { "type": "array", "items": "string" } },
{ "name": "platform", "type": "string" },
{ "name": "application", "type": "string" },
{ "name": "remoteIdentifier", "type": ["null", "string"] },
{ "name": "alternateIdentifier", "type": ["null", "string"] },
{ "name": "transitUri", "type": ["null", "string"] }
]
}
entityId and entitySize are probably what you are looking for.
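
As a rough illustration of how records shaped like the schema above could be narrowed down to the flowfiles handled by one component, here is a minimal sketch. The sample records and the function name flowFilesForComponent are hypothetical, and it assumes the id of the component that raised the bulletin is already known.

```javascript
// Hypothetical sample records following the provenance schema above (most fields omitted).
const provenanceEvents = [
  { eventId: "e1", eventType: "ROUTE", componentId: "proc-1", entityId: "ff-1", entitySize: 2048 },
  { eventId: "e2", eventType: "DROP", componentId: "proc-2", entityId: "ff-2", entitySize: 512 },
  { eventId: "e3", eventType: "ROUTE", componentId: "proc-1", entityId: null, entitySize: null }
];

// Returns the id and size of every flowfile touched by the given component.
function flowFilesForComponent(records, componentId) {
  return records
    .filter((r) => r.componentId === componentId && r.entityId != null)
    .map((r) => ({ flowFileId: r.entityId, sizeBytes: r.entitySize, eventType: r.eventType }));
}

console.log(flowFilesForComponent(provenanceEvents, "proc-1"));
```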

Hive query on Avro uniontype

Similar to: Hive querying records for a specific uniontype
I have data on S3 in Avro format, and the following is the Avro structure:
{
"type": "record",
"name": "Event",
"namespace": "com.company.avro.event",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "Follow",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "UserFollowBrand",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserFollowBrand"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.Brand"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
},
{
"type": "record",
"name": "UserFollowUser",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserFollowUser"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.User"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
}
]
}
]
},
{
"type": "record",
"name": "Like",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "UserLikeListing",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserLikeListing"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.Listing"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"com.company.avro.type.WebScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
}
]
}
]
}
]
}
]
}
I am not sure how I can query for a specific field within the uniontype.
For example: select * from events where content.verb = "a" and content.actor.id = 34
Earlier, Hive did not support union types, but it now seems to support them: https://issues.apache.org/jira/browse/HIVE-2390
I am unable to figure out how to use the create_union function to query this.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypes
