How I can transform a field of the Jsonfile witn NiFi? - apache-nifi

Good morning
I am new in NiFi and I want modify a field in a JSON file (I am using NiFi v.1.12.0) and save it in other PATH.
This is and example of my JSON file:
"Id": "2b2ef24a-f3ce-4249-ad92-db9a565b5b66",
"Tipo": "AuditEvent",
"SubTipo": "Plataforma",
"Accion": "Audit.Middleware.EventData.HttpResponseSentEvent",
"IDCorrelacion": "7af48a20-587d-4e60-9c3b-02cc6a074662",
"TiempoEvento": "2020-07-30 11:45:08.315",
"Resultado": "No informado",
"ResultadoDesc": "No informado",
"Origen": {
"IDOrigen": "132403308182038429",
"Tipo": "Backend",
"Aplicacion": "fabric:/Omnicanalidad.Canales.Muro_v1",
"Servicio": "fabric:/Omnicanalidad.Canales.Muro_v1/Muro",
"Maquina": "ibsfabbe02pru",
"IP": "ibsfabbe02pru"
},
"OrigenInterno": "Audit.Middleware.AuditMiddleware",
"Agente": {
"Rol": "Sin rol asignado",
"IDUsuario": "1428",
"AltIDUsuario": "20141115",
"Localizador": "197.183.27.17",
"PropositoUso": "No informado",
"IDSession": "",
"XForwardedPort": "443",
"XForwardedFor": "162.37.0.100:30279, 162.37.0.5:10158, 172.37.0.5",
"XForwardedHost": "ebeprate.es",
"XForwardedProto": "https",
"XOriginalURL": "/test/v1/Relation/ObtieneGestor?IdUser=4355625&NiciTitular=43485326",
"XOriginalHost": "ebeprate.es",
"Referer": null,
"AuthenticationType": "AuthenticationTypes.Federation",
"UserAgent": "HttpApplicationGateway",
"Claims": "Hello World",
"AcceptedLanguage": null
},
"DatosEvento": {
"Headers": ["Content-Length: 0", "Request-Context: appId=cid-v1:d8b40be1-4838-4a94-a4f8-3ec374989b27"],
"StatusCode": 204,
"Body": ""
}
}
I want modify the field TiempoEvento from date to timestamp.
In this case 2020-07-30 11:45:08.315 convert to 1596109508
So I use this procedure:
1.- I used the GetFile Processor for take the file. I configure the properties (without any problems) and everything it is ok.
2.- I used UpdateRecord Processor to modify the field. (The problems appears)
In properties I have 3 properties:
I read that I need configure a schema-registry if I want to work with any data in NiFi (I don't know if it is totaly true). In this case how I am working with a JsonFile I supposed that I need it, so I did it.
In controller service I configure JsonPathReader, JsonRecordSetWriter and AvroSchemaRegistry.
I started with AvroSchemaRegistry.
SETTING
Name: Test
PROPERTIES
Validate Field Names -> true
test-schema ->
{
"name": "MyFirstNiFiTest",
"type": "record",
"namespace": "test.example",
"fields": [
{
"name": "Id",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "SubTipo",
"type": "string"
},
{
"name": "Accion",
"type": "string"
},
{
"name": "IDCorrelacion",
"type": "string"
},
{
"name": "TiempoEvento",
"type": "string"
},
{
"name": "Resultado",
"type": "string"
},
{
"name": "ResultadoDesc",
"type": "string"
},
{
"name": "Origen",
"type": {
"name": "Origen",
"type": "record",
"fields": [
{
"name": "IDOrigen",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "Aplicacion",
"type": "string"
},
{
"name": "Servicio",
"type": "string"
},
{
"name": "Maquina",
"type": "string"
},
{
"name": "IP",
"type": "string"
}
]
}
},
{
"name": "OrigenInterno",
"type": "string"
},
{
"name": "Agente",
"type": {
"name": "Agente",
"type": "record",
"fields": [
{
"name": "Rol",
"type": "string"
},
{
"name": "IDUsuario",
"type": "string"
},
{
"name": "AltIDUsuario",
"type": "string"
},
{
"name": "Localizador",
"type": "string"
},
{
"name": "PropositoUso",
"type": "string"
},
{
"name": "IDSession",
"type": "string"
},
{
"name": "XForwardedPort",
"type": "string"
},
{
"name": "XForwardedFor",
"type": "string"
},
{
"name": "XForwardedHost",
"type": "string"
},
{
"name": "XForwardedProto",
"type": "string"
},
{
"name": "XOriginalURL",
"type": "string"
},
{
"name": "XOriginalHost",
"type": "string"
},
{
"name": "Referer",
"type": [
"string",
"null"
]
},
{
"name": "AuthenticationType",
"type": [
"string",
"null"
]
},
{
"name": "UserAgent",
"type": "string"
},
{
"name": "Claims",
"type": "string"
},
{
"name": "AcceptedLanguage",
"type": [
"string",
"null"
]
}
]
}
},
{
"name": "DatosEvento",
"type": {
"name": "DatosEvento",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "Category",
"type": "string"
},
{
"name": "EventType",
"type": "int"
},
{
"name": "Id",
"type": "int"
},
{
"name": "ApiName",
"type": "string"
},
{
"name": "Token",
"type": "string"
},
{
"name": "ApiScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "TokenScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "Message",
"type": "string"
},
{
"name": "ActivityId",
"type": "string"
},
{
"name": "TimeStamp",
"type": "int",
"logicalType": "date"
},
{
"name": "ProcessId",
"type": "int"
},
{
"name": "LocalIpAddress",
"type": "string"
},
{
"name": "RemoteIpAddress",
"type": "string"
}
]
}
}
]
}
I converted the JSON file to avroSchema
I enable it and everything it is OK.
Then I configure the JsonRecordSetWrite:
SETTING
Name: TestRecordSetWriter
PROPERTIES
I enable it and everything it is OK.
and then I configue de JsonPathReader
SETTING
Name: TestPathReader
PROPERTIES
And in this point I have and alert that said:
'JSON paths' is invalid bacause No JSON Paths were specified
and I can't enable this controller services, and I don't know what am I missing?
I don't know if there are another way to do it easier. I don't know if I am totally wrong. So I need some help.
Thank you

I found the answer. I has a bad configuration in JsonPathreader, because I had not configured the records of the schema in the properties.

Related

Problem with schema validation using Postman

Body of my req:
[
{
"postId": 1,
"id": 1,
"name": "name abc",
"email": "Eliseo#gardner.biz",
"body": "something"
},
...
]
I am trying to validate it like below:
var schema = {
"type": "array",
"properties": {
"postId": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"email": {
"type": "string",
"pattern": "^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}$"
},
"body": {
"type": "string"
}
},
"required": [
"postId",
"id",
"name",
"email",
"body"
]
};
pm.test('Schemat jest poprawny', function() {
pm.expect(tv4.validate(jsonData, schema)).to.be.true;
});
The test is ok even if I change for example id type for string or email pattern for invalid one.
What is wrong with that code?
I would recommend moving away from tv4 for schema validations and use the built-in jsonSchema function, as this uses AJV.
Apart from that, your schema didn't look right and was missing the validation against the object, it looks like it was doing it against the array.
This might help you out:
let schema = {
"type": "array",
"items": {
"type": "object",
"required": [
"postId",
"id",
"name",
"email",
"body"
],
"properties": {
"postId": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"email": {
"type": "string",
"pattern": "^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}$"
},
"body": {
"type": "string"
}
}
}
}
pm.test("Schemat jest poprawny", () => {
pm.response.to.have.jsonSchema(schema)
})

How to get logs using rest api in apache nifi

I went through several guides and couldn't find a way to get the logs with related information like data size of the flowfile(shown in the image) using rest api (or otherway if rest api is not possible).Eventhough nifi writes these logs to app-logs, Other related details can not find from there. How can I do that?
EDIT
According to comment from daggett,I have the rest api - http://localhost:8080/nifi-api/flow/bulletin-board, which solved my half of the question. Now I need to know who I can get the flowfile details which caused to the bulletin.
There are few controller services provided by nifi which gives in-depth information about status of nifi as well as information about flowfiles. One of those services is SiteToSiteProvenanceReportingTask
which you can use to derive the information about the failed file.
These controller services basically send information about flowfile as json data which can be queried or processed as flowfile in nifi.
Here is json data that above controller service returns -
{
"type" : "record",
"name" : "provenance",
"namespace" : "provenance",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventOrdinal", "type": "long" },
{ "name": "eventType", "type": "string" },
{ "name": "timestampMillis", "type": "long" },
{ "name": "durationMillis", "type": "long" },
{ "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
{ "name": "details", "type": ["null", "string"] },
{ "name": "componentId", "type": ["null", "string"] },
{ "name": "componentType", "type": ["null", "string"] },
{ "name": "componentName", "type": ["null", "string"] },
{ "name": "processGroupId", "type": ["null", "string"] },
{ "name": "processGroupName", "type": ["null", "string"] },
{ "name": "entityId", "type": ["null", "string"] },
{ "name": "entityType", "type": ["null", "string"] },
{ "name": "entitySize", "type": ["null", "long"] },
{ "name": "previousEntitySize", "type": ["null", "long"] },
{ "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "actorHostname", "type": ["null", "string"] },
{ "name": "contentURI", "type": ["null", "string"] },
{ "name": "previousContentURI", "type": ["null", "string"] },
{ "name": "parentIds", "type": { "type": "array", "items": "string" } },
{ "name": "childIds", "type": { "type": "array", "items": "string" } },
{ "name": "platform", "type": "string" },
{ "name": "application", "type": "string" },
{ "name": "remoteIdentifier", "type": ["null", "string"] },
{ "name": "alternateIdentifier", "type": ["null", "string"] },
{ "name": "transitUri", "type": ["null", "string"] }
]
}
entityId ,entitySize is what you may be looking for.

(kibana) mapper_parsing_exception

I'm trying to setup a mapping for this kind of logs (the automatic mapping didn't work).
here's the log that I've to analyse thanks to kibana (found on the internet):
{"index":
{"_index":"logstash-2015.05.18","_type":"log"
}
}
{"#timestamp":"2015-05-18T09:03:25.877Z","ip":"185.124.182.126","extension":"gif","response":"404",
"geo":{
"coordinates":{
"lat":36.518375,"lon":-86.05828083
},
"src":"PH","dest":"MM","srcdest":"PH:MM"
},
"#tags":["success","info"],"utc_time":"2015-05-18T09:03:25.877Z","referer":"http://twitter.com/error/william-shepherd","agent":"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1","clientip":"185.124.182.126","bytes":804,"host":"motion-media.theacademyofperformingartsandscience.org","request":"/canhaz/gemini-7.gif","url":"https://motion-media.theacademyofperformingartsandscience.org/canhaz/gemini-7.gif","#message":"185.124.182.126 - - [2015-05-18T09:03:25.877Z] \"GET /canhaz/gemini-7.gif HTTP/1.1\" 404 804 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"","spaces":"this is a thing with lots of spaces wwwwoooooo","xss":"<script>console.log(\"xss\")</script>","headings":["<h3>f-i-j-nl-ng</h5>","http://facebook.com/success/lodewijk-van-den-berg"],"links":["daniel-tani#facebook.com","http://nytimes.com/security/kathryn-sullivan","www.nytimes.com"],
"relatedContent":[
{"url":"http://www.laweekly.com/news/cbs-crew-rat-fink-2368032","og:type":"article","og:title":"CBS Crew Rat Fink","og:description":"Near a couple of auto body shops (and a sharp new Space Invader mosaic that we'll post soon) near Temple and Westmoreland is a CBS wall with a nice Rat ...","og:url":"http://www.laweekly.com/news/cbs-crew-rat-fink-2368032","article:published_time":"2008-01-14T08:05:26-08:00","article:modified_time":"2014-10-28T14:59:52-07:00","article:section":"News","article:tag":"Mark Mauer","og:image":"http://IMAGES1.laweekly.com/imager/cbs-crew-rat-fink/u/original/2430299/img_2049.jpg","og:image:height":"360","og:image:width":"480","og:site_name":"LA Weekly","twitter:title":"CBS Crew Rat Fink","twitter:description":"Near a couple of auto body shops (and a sharp new Space Invader mosaic that we'll post soon) near Temple and Westmoreland is a CBS wall with a nice Rat ...","twitter:card":"summary","twitter:image":"http://IMAGES1.laweekly.com/imager/cbs-crew-rat-fink/u/original/2430299/img_2049.jpg","twitter:site":"#laweekly"
},
{"url":"http://www.laweekly.com/news/push-and-retna-in-koreatown-2368043","og:type":"article","og:title":"Push and Retna in Koreatown","og:description":"Yeah, I originally had this posted this morning as Push & Ayer - Sorry. It looked like a Retna piece, but I saw the Ayer in there and thought that must ...","og:url":"http://www.laweekly.com/news/push-and-retna-in-koreatown-2368043","article:published_time":"2008-01-29T07:28:32-08:00","article:modified_time":"2014-10-28T14:59:54-07:00","article:section":"News","article:tag":"Shelley Leopold","og:image":"http://IMAGES1.laweekly.com/imager/push-and-retna-in-koreatown/u/original/2430376/img_3671.jpg","og:image:height":"360","og:image:width":"480","og:site_name":"LA Weekly","twitter:title":"Push and Retna in Koreatown","twitter:description":"Yeah, I originally had this posted this morning as Push & Ayer - Sorry. It looked like a Retna piece, but I saw the Ayer in there and thought that must ...","twitter:card":"summary","twitter:image":"http://IMAGES1.laweekly.com/imager/push-and-retna-in-koreatown/u/original/2430376/img_3671.jpg","twitter:site":"#laweekly"
},
{"url":"http://www.laweekly.com/news/asylm-ruets-pdb-on-santa-monica-2368012","og:type":"article","og:title":"Asylm, Ruets, PDB on Santa Monica","og:description":"Not a new piece, but a well-hidden gem a little south of Santa Monica Blvd. in an alley off of Heliotrope or Edgemont. I've been sitting on this for a w...","og:url":"http://www.laweekly.com/news/asylm-ruets-pdb-on-santa-monica-2368012","article:published_time":"2008-04-22T15:11:15-07:00","article:modified_time":"2014-10-28T14:59:48-07:00","article:section":"News","article:tag":"Culture and Lifestyle","og:image":"http://images1.laweekly.com/imager/asylm-ruets-pdb-on-santa-monica/u/original/2430137/img_5027.jpg","og:image:height":"360","og:image:width":"480","og:site_name":"LA Weekly","twitter:title":"Asylm, Ruets, PDB on Santa Monica","twitter:description":"Not a new piece, but a well-hidden gem a little south of Santa Monica Blvd. in an alley off of Heliotrope or Edgemont. I've been sitting on this for a w...","twitter:card":"summary","twitter:image":"http://images1.laweekly.com/imager/asylm-ruets-pdb-on-santa-monica/u/original/2430137/img_5027.jpg","twitter:site":"#laweekly"
},
{"url":"http://www.laweekly.com/news/laurence-tribe-tangles-with-cbs-and-la-city-hall-2396867","og:type":"article","og:title":"Laurence Tribe Tangles with CBS and L.A. City Hall","og:description":"The United States Court of Appeals for the Ninth Circuit’s Courtroom 3 - a miniature auditorium with comfortable, smoked salmon-colored seats - wa...","og:url":"http://www.laweekly.com/news/laurence-tribe-tangles-with-cbs-and-la-city-hall-2396867","article:published_time":"2008-06-04T14:16:10-07:00","article:modified_time":"2014-11-26T14:43:59-08:00","article:section":"News","og:site_name":"LA Weekly","twitter:title":"Laurence Tribe Tangles with CBS and L.A. City Hall","twitter:description":"The United States Court of Appeals for the Ninth Circuit’s Courtroom 3 - a miniature auditorium with comfortable, smoked salmon-colored seats - wa...","twitter:card":"summary","twitter:site":"#laweekly"
}
],
"machine":{
"os":"win xp","ram":3221225472
},
"#version":"1"
}
and here's the mapping I've put in the Kibana's dev tool:
PUT logstash-2019.05.09
{
"mappings": {
"doc": {
"properties": {
"index": {
"_index": {
"type": "keyword"
},
"_type": {
"type": "text"
}
},
"#timestamp": {
"type": "date"
},
"ip": {
"type": "ip"
},
"extension": {
"type": "text"
},
"response": {
"type": "text"
},
"geo": {
"coordinates": {
"type": "geo_point"
},
"src": {
"type": "text"
},
"dest": {
"type": "text"
},
"srcdest": {
"type": "text"
}
},
"tags": {
"type": "text"
},
"utc_time": {
"type": "date"
},
"referer": {
"type": "text"
},
"agent": {
"type": "text"
},
"clientip": {
"type": "ip"
},
"bytes": {
"type": "integer"
},
"host": {
"type": "text"
},
"request": {
"type": "text"
},
"url": {
"type": "text"
},
"#message": {
"type": "text"
},
"spaces": {
"type": "text"
},
"xss": {
"type": "text"
},
"links": {
"type": "text"
},
"relatedContent": {
"url": {
"type": "text"
},
"og:type": {
"type": "text"
},
"og:title": {
"type": "text"
},
"og:description": {
"type": ""
},
"og:url": {
"type": ""
},
"article:published_time": {
"type": "date"
},
"article:modified_time": {
"type": "date"
},
"article:section": {
"type": "keyword"
},
"article:tag": {
"type": "text"
},
"og:image": {
"type": "text"
},
"og:image:height": {
"type": "integer"
},
"og:image:width": {
"type": "integer"
},
"og:site_name": {
"type": "text"
},
"twitter:title": {
"type": "text"
},
"twitter:description": {
"type": "text"
},
"twitter:card": {
"type": "keyword"
},
"twitter:image": {
"type": "text"
},
"twitter:site": {
"type": "keyword"
}
},
"machine": {
"os": {
"type": "text"
},
"ram": {
"type": "integer"
}
},
"#version": {
"type": "integer"
}
}
}
}
}
and here's the error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "No type specified for field [index]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [doc]: No type specified for field [index]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "No type specified for field [index]"
}
},
"status": 400
}
I've already search on internet to find some solutions but I didn't found anything that can help me.
You're missing the properties keyword for all your object fields. Use this mapping instead
PUT logstash-2019.05.09
{
"mappings": {
"doc": {
"properties": {
"#timestamp": {
"type": "date"
},
"ip": {
"type": "ip"
},
"extension": {
"type": "text"
},
"response": {
"type": "text"
},
"geo": {
"properties": {
"coordinates": {
"type": "geo_point"
},
"src": {
"type": "text"
},
"dest": {
"type": "text"
},
"srcdest": {
"type": "text"
}
}
},
"tags": {
"type": "text"
},
"utc_time": {
"type": "date"
},
"referer": {
"type": "text"
},
"agent": {
"type": "text"
},
"clientip": {
"type": "ip"
},
"bytes": {
"type": "integer"
},
"host": {
"type": "text"
},
"request": {
"type": "text"
},
"url": {
"type": "text"
},
"#message": {
"type": "text"
},
"spaces": {
"type": "text"
},
"xss": {
"type": "text"
},
"links": {
"type": "text"
},
"relatedContent": {
"properties": {
"url": {
"type": "text"
},
"og:type": {
"type": "text"
},
"og:title": {
"type": "text"
},
"og:description": {
"type": ""
},
"og:url": {
"type": ""
},
"article:published_time": {
"type": "date"
},
"article:modified_time": {
"type": "date"
},
"article:section": {
"type": "keyword"
},
"article:tag": {
"type": "text"
},
"og:image": {
"type": "text"
},
"og:image:height": {
"type": "integer"
},
"og:image:width": {
"type": "integer"
},
"og:site_name": {
"type": "text"
},
"twitter:title": {
"type": "text"
},
"twitter:description": {
"type": "text"
},
"twitter:card": {
"type": "keyword"
},
"twitter:image": {
"type": "text"
},
"twitter:site": {
"type": "keyword"
}
}
},
"machine": {
"properties": {
"os": {
"type": "text"
},
"ram": {
"type": "integer"
}
}
},
"#version": {
"type": "integer"
}
}
}
}
}

Elasticsearch aggregation on multiple fields across multiple indexes

I have two indexes - one for Application model and the other for Databases model (many-to-many relational).
Each document is denormalized to contain attributes from the other model
Application
|_ vendor_name
|_ databases
|_ db_1
|_ db_2
Database
|_ database_applications
|_ app_1
|_vendor_name
|_ app_2
|_ vendor_name
Executing a multi-index search for a vendor name - it seems I'm getting the proper results from both Indexes.
The challenge is properly aggregating on the vendor_name field
using the following aggregation seems to work when the result is only from Database. I also tried field: '*vendor_name' but doesn't seem to work.
What am I missing? Should the model be changed?
aggregation:
vendor_name: {
terms: {
field: "database_applications.vendor_name"
}
}
UPDATE 1:
As per #Andrie-Stefan - Here's a more accurate representation of both indexes mappings (abbreviated for shortness):
Database
{
"company-company_databases": {
"aliases": {},
"mappings": {
"company_database": {
"properties": {
"company_applications": {
"properties": {
"application_id": {
"type": "long"
},
"application_name": {
"type": "string"
},
"business_owner": {
"type": "string"
},
"company_system_applications": {
"properties": {
"allow_add_request": {
"type": "string"
},
"allow_remove_request": {
"type": "string"
},
"asset_type": {
"type": "string"
},
"company_application_id": {
"type": "long"
},
"company_application_name": {
"type": "string"
},
"company_business_owner": {
"type": "string"
},
"company_division_id": {
"type": "long"
},
"company_it_app_steward": {
"type": "string"
},
"company_notes": {
"type": "string"
},
"company_system_id": {
"type": "long"
},
"company_vendor": {
"type": "string"
},
"id": {
"type": "long"
},
"it_app_steward": {
"type": "string"
},
"it_owner": {
"type": "string"
},
"last_modified": {
"type": "string"
},
"last_modified_by": {
"type": "string"
},
"media_location": {
"type": "string"
},
"media_source": {
"type": "string"
},
"name": {
"type": "string"
},
"owned_by": {
"type": "string"
},
"status": {
"type": "string"
},
"status_id": {
"type": "long"
},
"system_application": {
"properties": {
"division": {
"type": "string"
},
"id": {
"type": "long"
},
"name": {
"type": "string"
},
"owner_id": {
"type": "string"
},
"status": {
"type": "string"
},
"steward_id": {
"type": "string"
},
"vendor_name": {
"type": "string"
},
"vendor_url_web_site": {
"type": "string"
},
"version": {
"type": "string"
}
}
},
"vendor_name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"version": {
"type": "string"
}
}
},
"division_id": {
"type": "long"
},
"it_app_steward": {
"type": "string"
},
"notes": {
"type": "string"
},
"software_inventory_id": {
"type": "long"
},
"vendor": {
"type": "string"
}
}
},
"company_instances": {
"properties": {
"business_environment_id": {
"type": "long"
},
"cgi_service_id": {
"type": "long"
},
"char_set": {
"type": "string"
},
"confirmed_license_purchase_dt": {
"type": "string"
},
"company_server": {
"properties": {
"business_environment_id": {
"type": "long"
},
"division_id": {
"type": "long"
},
"domain": {
"type": "string"
},
"hw_platform_id": {
"type": "long"
},
"ip_address": {
"type": "string"
},
"location_id": {
"type": "long"
},
"no_of_cpu": {
"type": "long"
},
"notes": {
"type": "string"
},
"os_platform_id": {
"type": "long"
},
"os_version": {
"type": "string"
},
"server_id": {
"type": "long"
},
"server_name": {
"type": "string"
}
}
},
"description": {
"type": "string"
},
"division_id": {
"type": "long"
},
"edition_id": {
"type": "long"
},
"instance_id": {
"type": "long"
},
"instance_name": {
"type": "string"
},
"itap_have_access": {
"type": "string"
},
"listener_port": {
"type": "long"
},
"notes": {
"type": "string"
},
"patch_number": {
"type": "string"
},
"rdbms_type_id": {
"type": "long"
},
"server_id": {
"type": "long"
},
"service_level_id": {
"type": "long"
},
"version": {
"type": "string"
}
}
},
"db_security_model_id": {
"type": "long"
},
"schema_or_db": {
"type": "string"
},
"schema_or_db_id": {
"type": "long"
},
"schema_or_db_type_id": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"creation_date": "1442976578465",
"uuid": "TxQZoNSpR5qa2Y2ERZzuYw",
"number_of_replicas": "1",
"number_of_shards": "5",
"version": {
"created": "1070299"
}
}
},
"warmers": {}
}
}
Application
{
"applications": {
"aliases": {},
"mappings": {
"application": {
"properties": {
"application_view": {
"properties": {
"app_name": {
"type": "string"
},
"app_status": {
"type": "string"
},
"app_steward_name": {
"type": "string"
},
"app_suite": {
"type": "string"
},
"app_vendor_name": {
"type": "string"
},
"app_version": {
"type": "string"
},
"assignment_group": {
"type": "string"
},
"business_domain_name": {
"type": "string"
},
"exception": {
"type": "string"
},
"id": {
"type": "long"
},
"it_owner_name": {
"type": "string"
},
"service_level": {
"type": "string"
}
}
},
"assignment_group": {
"type": "string"
},
"company_databases": {
"properties": {
"backup_history_info": {
"type": "string"
},
"company_applications": {
"properties": {
"alternate_name": {
"type": "string"
},
"application_id": {
"type": "long"
},
"application_name": {
"type": "string"
},
"business_owner": {
"type": "string"
},
"company_system_applications": {
"properties": {
"aka": {
"type": "string"
},
"allow_add_request": {
"type": "string"
},
"allow_remove_request": {
"type": "string"
},
"asset_type": {
"type": "string"
},
"contract_number": {
"type": "string"
},
"cost_level": {
"type": "string"
},
"company_alternate_name": {
"type": "string"
},
"company_application_id": {
"type": "long"
},
"company_application_name": {
"type": "string"
},
"company_business_owner": {
"type": "string"
},
"company_division_id": {
"type": "long"
},
"company_it_app_steward": {
"type": "string"
},
"company_notes": {
"type": "string"
},
"company_system_id": {
"type": "long"
},
"company_vendor": {
"type": "string"
},
"description": {
"type": "string"
},
"display_in_catalog": {
"type": "string"
},
"id": {
"type": "long"
},
"inform_of_removal": {
"type": "string"
},
"is_restricted": {
"type": "string"
},
"it_app_steward": {
"type": "string"
},
"it_owner": {
"type": "string"
},
"last_modified": {
"type": "string"
},
"last_modified_by": {
"type": "string"
},
"media_location": {
"type": "string"
},
"media_source": {
"type": "string"
},
"name": {
"type": "string"
},
"os_environment": {
"type": "string"
},
"owned_by": {
"type": "string"
},
"retirement_date": {
"type": "date",
"format": "dateOptionalTime"
},
"status": {
"type": "string"
},
"status_id": {
"type": "long"
},
"suite_name": {
"type": "string"
},
"system_application": {
"properties": {
"assignment_group": {
"type": "string"
},
"division": {
"type": "string"
},
"id": {
"type": "long"
},
"name": {
"type": "string"
},
"owner_id": {
"type": "string"
},
"status": {
"type": "string"
},
"steward_id": {
"type": "string"
},
"suite": {
"type": "string"
},
"vendor_name": {
"type": "string"
},
"vendor_url_web_site": {
"type": "string"
},
"version": {
"type": "string"
}
}
},
"vendor_name": {
"type": "string"
},
"version": {
"type": "string"
}
}
},
"division_id": {
"type": "long"
},
"it_app_steward": {
"type": "string"
},
"notes": {
"type": "string"
},
"software_inventory_id": {
"type": "long"
},
"vendor": {
"type": "string"
}
}
},
"company_instances": {
"properties": {
"business_environment_id": {
"type": "long"
},
"cgi_service_id": {
"type": "long"
},
"char_set": {
"type": "string"
},
"confirmed_license_purchase_dt": {
"type": "string"
},
"company_server": {
"properties": {
"business_environment_id": {
"type": "long"
},
"division_id": {
"type": "long"
},
"domain": {
"type": "string"
},
"hw_platform_id": {
"type": "long"
},
"ip_address": {
"type": "string"
},
"location_id": {
"type": "long"
},
"no_of_cpu": {
"type": "long"
},
"notes": {
"type": "string"
},
"os_platform_id": {
"type": "long"
},
"os_version": {
"type": "string"
},
"server_id": {
"type": "long"
},
"server_name": {
"type": "string"
}
}
},
"description": {
"type": "string"
},
"division_id": {
"type": "long"
},
"edition_id": {
"type": "long"
},
"instance_id": {
"type": "long"
},
"instance_name": {
"type": "string"
},
"itap_have_access": {
"type": "string"
},
"listener_port": {
"type": "long"
},
"location_id": {
"type": "long"
},
"notes": {
"type": "string"
},
"patch_number": {
"type": "string"
},
"rdbms_type_id": {
"type": "long"
},
"server_id": {
"type": "long"
},
"service_level_id": {
"type": "long"
},
"version": {
"type": "string"
}
}
},
"db_security_model_id": {
"type": "long"
},
"notes": {
"type": "string"
},
"schema_or_db": {
"type": "string"
},
"schema_or_db_id": {
"type": "long"
},
"schema_or_db_type_id": {
"type": "long"
}
}
},
"division": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"id": {
"type": "long"
},
"name": {
"type": "string"
},
"owner": {
"properties": {
"email_address": {
"type": "string"
},
"id": {
"type": "long"
},
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"search_type": {
"type": "string"
},
"user_id": {
"type": "string"
}
}
},
"owner_id": {
"type": "string"
},
"status": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"steward": {
"properties": {
"email_address": {
"type": "string"
},
"id": {
"type": "long"
},
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"search_type": {
"type": "string"
},
"user_id": {
"type": "string"
}
}
},
"steward_id": {
"type": "string"
},
"suite": {
"type": "string"
},
"vendor_name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"vendor_url_web_site": {
"type": "string"
},
"version": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1442970067540",
"uuid": "O7DTaCSESbqhjJpv62T0Wg",
"number_of_replicas": "1",
"number_of_shards": "5",
"version": {
"created": "1070299"
}
}
},
"warmers": {}
}
}
Fields can only be aggregated across indices if they are named alike. There is no wildcard syntax for aggregation fields.
Here is what your mapping currently defines:
INDEX: company-company_databases
TYPE: company_database
FIELD NAMES:
company_applications.company_system_applications.vendor_name
company_applications.company_system_applications.system_application.vendor_name
INDEX: applications
TYPE: application
FIELD NAMES:
company_databases.company_applications.company_system_applications.vendor_name
company_databases.company_applications.company_system_applications.system_application.vendor_name
As far as Elasticsearch is concerned, these fields have nothing in common (even though part of the path is vendor_name).
If your goal is to aggregate vendor_name across a query that spans the two indices, think about restructuring your indices/mappings to accomplish this.
Note that Elasticsearch doesn't model many-to-many relationships
If you can get away with duplicating Database info across applications, you might be able to re-formulate your relationships as a hierarchy, e.g.:
INDEX: applications
--
TYPE: application
FIELDS: vendor_name, etc...
--
TYPE: database_application
FIELDS: vendor_name, databases.<inner fields>, etc...
--
Then you'd be able to aggregate across types on the same field path vendor_name with the added bonus of querying a single applications index.

hive query avro uniontype

Similar to hive querying records for a specific uniontype
I have data on s3 in avro format and following is the avro structure:
{
"type": "record",
"name": "Event",
"namespace": "com.company.avro.event",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "Follow",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "UserFollowBrand",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserFollowBrand"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.Brand"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
},
{
"type": "record",
"name": "UserFollowUser",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserFollowUser"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.User"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
}
]
}
]
},
{
"type": "record",
"name": "Like",
"fields": [
{
"name": "content",
"type": [
{
"type": "record",
"name": "UserLikeListing",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "actor",
"type": "com.company.avro.entity.User"
},
{
"name": "verb",
"type": "string",
"default": "UserLikeListing"
},
{
"name": "direct_object",
"type": "com.company.avro.entity.Listing"
},
{
"name": "on",
"type": [
"com.company.avro.type.IoSScreen",
"com.company.avro.type.AndroidScreen",
"com.company.avro.type.WebScreen",
"null"
]
},
{
"name": "using",
"type": "com.company.avro.entity.App"
},
{
"name": "from",
"type": "string"
},
{
"name": "at",
"type": "long"
}
]
}
]
}
]
}
]
}
]
}
I am not sure how can I query for specific field within the uniontype.
For ex: select * from events where content.verb = "a" and content.actor.id = 34
Earlier hive did not support union types but now it seems it does support https://issues.apache.org/jira/browse/HIVE-2390
Unable to figure out how to use create_union function to query this.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypes

Resources