How to store a nested document as a String in Elasticsearch

Context:
1) We are building a CDC pipeline (using Kafka and the Kafka Connect framework)
2) We are using Debezium for capturing MySQL transaction logs
3) We are using the Elasticsearch connector to add documents to an ES index
Sample change event generated by Debezium:
{
  "before" : {
    "Id" : 97,
    "name" : "Northland",
    "code" : "NTL",
    "country_id" : 6,
    "is_business_mapped" : 0
  },
  "after" : {
    "Id" : 97,
    "name" : "Northland",
    "code" : "NTL",
    "country_id" : 6,
    "is_business_mapped" : 1
  },
  "source" : {
    "version" : "0.7.5",
    "name" : "__",
    "server_id" : 252639387,
    "ts_sec" : 1547805940,
    "gtid" : null,
    "file" : "mysql-bin-changelog.000570",
    "pos" : 236,
    "row" : 0,
    "snapshot" : false,
    "thread" : 614,
    "db" : "bazaarify",
    "table" : "state"
  },
  "op" : "u",
  "ts_ms" : 1547805939683
}
What we want:
We want to visualize only 3 columns in Kibana:
1) before - containing the nested JSON as a string
2) after - containing the nested JSON as a string
3) source - containing the nested JSON as a string
I can think of two possibilities here:
a) converting the nested JSON to a string, or
b) combining the column data in Elasticsearch
I am a newbie to Elasticsearch. Can someone please guide me on how to do that?
I tried defining a custom mapping as well, but it gives me an exception.

You can always view your document as raw JSON in Kibana, so you don't need to manipulate it before indexing into Elasticsearch.
As this is related to visualization, handle it in Kibana only: in the Discover view, add the columns you want to see to the results table.

I don't fully understand your use case, but if you would like to turn some JSON objects into their string representations, then you can use Logstash for that, or even Elasticsearch's ingest capabilities to convert an object (JSON) to a string.
From the documentation linked above, an example:
PUT _ingest/pipeline/my-pipeline-id
{
  "description": "converts the content of the source field to a string",
  "processors": [
    {
      "convert": {
        "field": "source",
        "type": "string"
      }
    }
  ]
}
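Once the pipeline exists, you reference it when indexing. A minimal sketch, where the index name cdc-events and the document id are placeholders, not from the original question:
# "cdc-events" is a hypothetical index name; the pipeline runs at ingest time
PUT cdc-events/_doc/1?pipeline=my-pipeline-id
{
  "before" : { "Id" : 97, "is_business_mapped" : 0 },
  "after" : { "Id" : 97, "is_business_mapped" : 1 },
  "source" : { "db" : "bazaarify", "table" : "state" }
}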

Related

NiFi Route on JSON Attribute

Trying to use NiFi to route on an attribute.
I am attempting to take a JSON file where two of the JSON records contain the following attributes (there are other JSON documents with different attributes in this file):
{
  "ts" : "2020-10-07T12:00:00.448392Z",
  "uid" : "CHh3F30dkfueLhnxSk",
  "id.orig_h" : "10.10.10.10",
  "id.orig_p" : 19726,
  "id.resp_h" : "172.10.10.20",
  "id.resp_p" : 443,
  "proto" : "tcp",
  "conn_state" : "SH",
  "local_orig" : false,
  "local_resp" : false,
  "missed_bytes" : 0,
  "history" : "F",
  "orig_pkts" : 1,
  "orig_ip_bytes" : 52,
  "resp_pkts" : 0,
  "resp_ip_bytes" : 0
}
{
  "ts" : "2020-10-10T12:00:00.461880Z",
  "uid" : "CdoiLnRscrxO1BSYb",
  "id.orig_h" : "10.10.17.777",
  "id.orig_p" : 40433,
  "id.resp_h" : "172.10.10.77",
  "id.resp_p" : 443,
  "version" : "TLSv12",
  "cipher" : "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
  "curve" : "secp777r1",
  "server_name" : "connect-stackoverflow.questions.com",
  "resumed" : false,
  "established" : true,
  "cert_chain_fuids" : [ "FR84qjkl2342SZLwV7", "Ffweqiof48b8j" ],
  "client_cert_chain_fuids" : [ ],
  "subject" : "CN=connect-ulastm.bentley.com",
  "issuer" : "CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US",
  "validation_status" : "ok"
}
I want to route specifically on the $.conn_state attribute, but it is not working. I have tried to match the expression with the EvaluateJsonPath processor and passed the result to RouteOnAttribute (screenshots of both processors' settings omitted). The EvaluateJsonPath processor does not match the JSON and forward the document.
I have also attempted to RouteOnAttribute directly from my JSON record, but it does not appear to pull out or identify the attribute for routing.
How would I do this?
The JSON sample contains more than one object, and EvaluateJsonPath expects a single record when matching $.conn_state. For multi-record work against JSON (or any data stream) you should use QueryRecord. Once it is configured with a record reader and a record writer, you click + inside that processor to create a route, whose value is a SELECT statement filtering on conn_state being non-null, as sketched below. Then you can drag that route to the next processor.
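For illustration, the route added with + could be a dynamic property like the following (the property name matched_conn_state is hypothetical; QueryRecord exposes the incoming flowfile as a table named FLOWFILE):
-- QueryRecord dynamic property "matched_conn_state" (hypothetical name)
SELECT * FROM FLOWFILE WHERE conn_state IS NOT NULL
Records matching the query are routed to a relationship named after the property, which you can then drag to the next processor.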

document field returns null when querying groups of Prismic Content-Relationship fields in GraphQL

Issue:
I am using Prismic to send data through to my website.
In Prismic I have a Type (testimonial_list) that consists of a group of content-relationship fields (of the Prismic Type testimonial).
To query the data on the inner Types I need to access them via the document field in GraphQL and use inline fragments.
I have followed as instructed here:
https://github.com/angeloashmore/gatsby-source-prismic#Query-Content-Relation-fields
Inside GraphQL I have managed to navigate to the testimonial data fields (on the document field), but the document field returns null; this is where I'm stuck. I can't work out why it would return null, as the content exists and the fields are clearly being found in GraphQL.
Info:
My project is built using Gatsby and I'm using the plugin gatsby-source-prismic v3.1.1
In GraphiQL I can access the correct field data and I am getting the right number of nodes returned, but document is empty (screenshot omitted).
This is the JSON for the testimonial_list Type on Prismic:
{
  "Main" : {
    "prismic_title" : {
      "type" : "StructuredText",
      "config" : {
        "single" : "heading6",
        "label" : "Title (only used to name entry in Prismic list)",
        "placeholder" : "Prismic list title (otherwise \"undefined\")"
      }
    },
    "page" : {
      "type" : "Select",
      "config" : {
        "options" : [ "Homepage", "Option 2", "Option 3" ],
        "label" : "Website page to appear on:"
      }
    },
    "testimonial_list" : {
      "type" : "Group",
      "config" : {
        "fields" : {
          "testimonial" : {
            "type" : "Link",
            "config" : {
              "select" : "document",
              "customtypes" : [ "testimonial" ],
              "label" : "testimonial"
            }
          }
        },
        "label" : "Testimonial List"
      }
    }
  }
}
Thank you for any help; if there is any more info I can supply to help deduce the issue, please let me know.

In the end, the issue turned out to be a typo in my gatsby-config where I was requiring the schema.
It was a daft mistake, but stare at something too long and these things happen, I guess.
In case anybody else has a similar issue: you must ensure the property names of your Prismic schemas inside your gatsby-config are exactly the same as in Prismic.
For example, if your Type in Prismic is called "my_type" then you must use that exact syntax - don't use "myType".
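For anyone wanting to check theirs, a minimal sketch of the relevant gatsby-config block (the repository name, token, and schema paths are placeholders, not from the original post):
// gatsby-config.js (gatsby-source-prismic v3)
{
  resolve: 'gatsby-source-prismic',
  options: {
    repositoryName: 'your-repo-name', // hypothetical
    accessToken: process.env.PRISMIC_TOKEN, // hypothetical
    schemas: {
      // Keys must match the Prismic API IDs exactly:
      testimonial_list: require('./src/schemas/testimonial_list.json'),
      testimonial: require('./src/schemas/testimonial.json'),
      // testimonialList: ... would not match, leaving document null
    },
  },
}
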
Hey, it might be something related to the gatsby-source-prismic plugin.
I would directly open an issue for it here if I were you: https://github.com/angeloashmore/gatsby-source-prismic/issues

Cosmos DB Collection not using _id index when querying by _id?

I have a Cosmos DB collection (MongoDB API) that I'm using purely as a key/value store for arbitrary data, where _id is the key for my collection.
When I run the query below:
globaldb:PRIMARY> db.FieldData.find({_id : new BinData(3, "xIAPpVWVkEaspHxRbLjaRA==")}).explain(true)
I get this result:
{
  "_t" : "ExplainResponse",
  "ok" : 1,
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "data.FieldData",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [ ]
    },
    "winningPlan" : { },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 106,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 3571,
    "executionStages" : { },
    "allPlansExecution" : [ ]
  },
  "serverInfo" : #REMOVED#
}
Notice that totalKeysExamined is 0, totalDocsExamined is 3571, and the query took over 106 ms. If I run it without .explain(), it does find the document.
I would have expected this query to be lightning quick, given that the _id field is automatically indexed as a unique primary key on the collection. As this collection grows in size, I only expect this problem to get worse.
I'm definitely not understanding something about the index and how it works here. Any help would be most appreciated.
Thanks!
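For reference, you can confirm from the Mongo shell that the default _id index is present on the collection; a minimal check against the same collection:
// Lists the indexes; the Cosmos DB Mongo API should report the default
// unique index on _id here.
db.FieldData.getIndexes()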

Elasticsearch Update by Query to update a complex document

I have an Elasticsearch use case where I need to update a doc.
My doc is something like this:
{
  "first_name" : "firstName",
  "last_name" : "lastName",
  "version" : 1234,
  "user_roles" : {
    "version" : 12345,
    "id" : 1234,
    "name" : "role1"
  },
  "groups" : {
    "version" : 123,
    "list" : [
      { "id" : 123, "name" : "ashd" },
      { "id" : 1234, "name" : "awshd" }
    ]
  }
}
Now, depending on some feed, I will either be updating the parent doc or the nested doc.
I am able to find how to update basic attributes like first_name and last_name, but I can't work out how to update the complex/nested ones.
I did something like this from the REST client:
"script": {
"inline": "ctx._source.user_roles = { "id" : 5678, "name" :"hcsdl"}
}
but its giving me exception-
Actual use case:
I will actually be getting a Map in Java.
A key can be a simple key like "first_name" or a complex key like "user_roles" or "groups".
I want to update the document using update by query, matching on version.
The code I wrote is something like this:
// document is the incoming Map; each entry becomes a "ctx._source.<key> = <json>;" statement
for (String key : document.keySet()) {
    String value = defaultObjectMapper.writeValueAsString(document.get(key));
    scriptBuilder.append("ctx._source.");
    scriptBuilder.append(key);
    scriptBuilder.append('=');
    scriptBuilder.append(value);
    scriptBuilder.append(";");
}
where document is the Map.
Now I might get simple fields to update or a complex object.
I tried giving keys like user_roles.id and user_roles.name, and also tried giving the complete user_roles object, but nothing is working.
Can someone help out?
Try this with Groovy maps instead of verbatim JSON inside your script:
"script": {
  "inline": "ctx._source.user_roles = [ 'id' : 5678, 'name' : 'hcsdl' ]"
}
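Putting that together as a full request, a minimal sketch (the index name my-index is hypothetical, and the query assumes the top-level version field identifies the documents to update):
# "my-index" is a placeholder; adjust the query to your matching criteria
POST my-index/_update_by_query
{
  "query": {
    "term": { "version": 1234 }
  },
  "script": {
    "inline": "ctx._source.user_roles = [ 'id' : 5678, 'name' : 'hcsdl' ]"
  }
}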

Elasticsearch filter by document count in nested document

I have this schema in Elasticsearch:
{
  "ID" : "1233",
  "Geomtries" : [
    {
      "doc1" : "F1",
      "doc2" : "F2"
    },
    {
      "doc2" : "F1",
      "doc3" : "F2"
    }
  ]
}
(the second object inside Geomtries is optional for some of the documents)
Geomtries is a nested element.
I want to get all of the documents that have exactly one object inside Geomtries.
Tried so far:
"script" : { "script" : "if (Geomtries.size < 2) return true" }
But I get an exception: no such property GEOMTRIES
If you have the field as type nested in the mapping, the typical doc[fieldkey].values.size() approach does not seem to work. I found the following script to work:
{
  "from" : 0,
  "size" : <SIZE>,
  "query" : {
    "filtered" : {
      "filter" : {
        "script" : {
          "script" : "_source.containsKey('Geomtries') && _source['Geomtries'].size() == 1"
        }
      }
    }
  }
}
NB: You must use _source instead of doc.
The problem is in the way you access fields in your script; use:
doc['Geomtries'].size()
or
_source.Geomtries.size()
By the way, for performance reasons I would denormalize and add a GeometryNumber field. You can use the transform mapping to compute the size at index time.
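A rough sketch of that denormalization via the transform mapping (transform mappings existed in Elasticsearch 1.x and were removed in 2.0, so treat this as illustrative; my-index and my-type are placeholder names):
# my-index / my-type are hypothetical; "transform" is the legacy 1.x feature
PUT my-index
{
  "mappings": {
    "my-type": {
      "transform": {
        "script": "ctx._source['GeometryNumber'] = ctx._source.containsKey('Geomtries') ? ctx._source['Geomtries'].size() : 0"
      },
      "properties": {
        "Geomtries": { "type": "nested" },
        "GeometryNumber": { "type": "integer" }
      }
    }
  }
}
With that in place, a plain term filter on GeometryNumber is much cheaper at search time than a script filter.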
