AWS OpenSearch/Elasticsearch wildcard search on full index

For example, I have the JSON document below, in which multiple fields contain the value '1001', and I have many JSON documents like it. I want to search for a particular keyword like '1001' across any field (which can be a nested JSON field as well). The documentation I have gone through (for example https://linuxhint.com/wildcard-query-elasticsearch/) suggests naming the particular field to search. Is there a way to achieve this without knowing which field contains the search text?
{
  "id": "1001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters": {
    "batter": [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "1001" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ]
  },
  "topping": [
    { "id": "1001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5005", "type": "Sugar" },
    { "id": "5007", "type": "Powdered Sugar" },
    { "id": "5006", "type": "Chocolate with Sprinkles" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
  ]
}
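One approach that may work without naming fields is a query_string query, which by default runs against all fields (index.query.default_field defaults to *). A minimal sketch, assuming a placeholder index name my-index and that the inner objects were dynamically mapped as plain objects rather than the nested type:

GET my-index/_search
{
  "query": {
    "query_string": {
      "query": "*1001*",
      "analyze_wildcard": true
    }
  }
}

Leading wildcards can be expensive on large indices, and any field explicitly mapped as nested would still need a nested query.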

Return document based on nested array matched field count in Elasticsearch

Using Elasticsearch version 7.15.1
{
  "mappings": {
    "properties": {
      "Activity": {
        "type": "nested",
        "properties": {
          "Data": {
            "type": "text"
          },
          "Type": {
            "type": "keyword"
          },
          "created_at": {
            "type": "date"
          },
          "updated_at": {
            "type": "date"
          }
        }
      },
      "FirstName": {
        "type": "text",
        "analyzer": "standard_autocomplete",
        "search_analyzer": "standard_autocomplete_search"
      }
    }
  }
}
Example Data
{
  "Activity": [
    {
      "Type": "type1",
      "Data": "data",
      "created_at": "2022-08-08T15:23:58.000000Z"
    },
    {
      "Type": "type1",
      "Data": "data",
      "created_at": "2022-08-08T15:25:45.000000Z"
    },
    {
      "Type": "type2",
      "Data": "data",
      "created_at": "2022-08-08T15:26:03.000000Z"
    }
  ],
  "FirstName": "Testtt"
}
I want this document to be returned only if "Activity.Type" is "type1" and the count of "type1" entries is greater than 1.
Also, how can I add a condition on created_at inside the nested array on top of the above constraint?
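One technique that may work here is to give every matching nested entry a constant score of 1, sum those scores across the nested matches, and then require a minimum total score of 2. A sketch, assuming the mapping above, a placeholder index name my-index, and a placeholder date for the created_at condition:

GET my-index/_search
{
  "min_score": 2,
  "query": {
    "nested": {
      "path": "Activity",
      "score_mode": "sum",
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "Activity.Type": "type1" } },
                { "range": { "Activity.created_at": { "gte": "2022-08-08T00:00:00Z" } } }
              ]
            }
          }
        }
      }
    }
  }
}

Each matching Activity entry contributes exactly 1 to the parent document's score, so min_score: 2 keeps only documents with more than one matching entry; drop the range clause if no date constraint is needed.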

How can I do nested field queries in Elasticsearch using Lucene query syntax

Here is the simple use case: I have a system that sends Lucene queries to my Elasticsearch cluster, and I have this mapping
{
  "mappings": {
    "properties": {
      "grocery_name": {
        "type": "text"
      },
      "items": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          },
          "stock": {
            "type": "integer"
          },
          "category": {
            "type": "text"
          }
        }
      }
    }
  }
}
and the data looks like
{
  "grocery_name": "Elastic Eats",
  "items": [
    {
      "name": "banana",
      "stock": "12",
      "category": "fruit"
    },
    {
      "name": "peach",
      "stock": "10",
      "category": "fruit"
    },
    {
      "name": "carrot",
      "stock": "9",
      "category": "vegetable"
    },
    {
      "name": "broccoli",
      "stock": "5",
      "category": "vegetable"
    }
  ]
}
How can I query to get all items where the item name is banana and stock > 10? In KQL I can write something like items:{ name:banana and stock > 10 }.
The Lucene query syntax doesn't support querying nested documents; that is the gap KQL fills.
KQL is currently the only way to query nested documents via the Kibana search bar.
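If the query can be sent to Elasticsearch as query DSL rather than Lucene syntax, the KQL expression above has a direct equivalent as a nested query. A minimal sketch, assuming the mapping above and a placeholder index name groceries:

GET groceries/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "match": { "items.name": "banana" } },
            { "range": { "items.stock": { "gt": 10 } } }
          ]
        }
      }
    }
  }
}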

How to get logs using the REST API in Apache NiFi

I went through several guides and couldn't find a way to get the logs with related information, such as the data size of the flowfile (shown in the image), using the REST API (or another way if the REST API is not possible). Even though NiFi writes these logs to the app log, the other related details cannot be found there. How can I do that?
EDIT
According to the comment from daggett, I have the REST endpoint http://localhost:8080/nifi-api/flow/bulletin-board, which solved half of my question. Now I need to know how I can get the details of the flowfile that caused the bulletin.
There are a few reporting tasks provided by NiFi that give in-depth information about the status of NiFi as well as about flowfiles. One of them is SiteToSiteProvenanceReportingTask, which you can use to derive information about the failed file.
These reporting tasks send information about each flowfile as JSON data, which can be queried or processed as a flowfile in NiFi.
Here is the schema of the records that the above reporting task sends:
{
  "type": "record",
  "name": "provenance",
  "namespace": "provenance",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventOrdinal", "type": "long" },
    { "name": "eventType", "type": "string" },
    { "name": "timestampMillis", "type": "long" },
    { "name": "durationMillis", "type": "long" },
    { "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "details", "type": ["null", "string"] },
    { "name": "componentId", "type": ["null", "string"] },
    { "name": "componentType", "type": ["null", "string"] },
    { "name": "componentName", "type": ["null", "string"] },
    { "name": "processGroupId", "type": ["null", "string"] },
    { "name": "processGroupName", "type": ["null", "string"] },
    { "name": "entityId", "type": ["null", "string"] },
    { "name": "entityType", "type": ["null", "string"] },
    { "name": "entitySize", "type": ["null", "long"] },
    { "name": "previousEntitySize", "type": ["null", "long"] },
    { "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
    { "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
    { "name": "actorHostname", "type": ["null", "string"] },
    { "name": "contentURI", "type": ["null", "string"] },
    { "name": "previousContentURI", "type": ["null", "string"] },
    { "name": "parentIds", "type": { "type": "array", "items": "string" } },
    { "name": "childIds", "type": { "type": "array", "items": "string" } },
    { "name": "platform", "type": "string" },
    { "name": "application", "type": "string" },
    { "name": "remoteIdentifier", "type": ["null", "string"] },
    { "name": "alternateIdentifier", "type": ["null", "string"] },
    { "name": "transitUri", "type": ["null", "string"] }
  ]
}
entityId and entitySize are probably what you are looking for.
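For illustration, a single record emitted under that schema might look like the following; every value here is a made-up placeholder and several fields are omitted for brevity:

{
  "eventId": "aa5a8b0a-1111-2222-3333-444455556666",
  "eventType": "DROP",
  "timestampMillis": 1660000000000,
  "componentName": "PutFile",
  "processGroupName": "my-flow",
  "entityId": "7c0f5a10-aaaa-bbbb-cccc-ddddeeeeffff",
  "entitySize": 2048,
  "updatedAttributes": { "filename": "example.json" },
  "details": "Auto-terminated by failure relationship"
}

Filtering these records on componentName or processGroupId is one way to correlate a bulletin with the flowfile (entityId) and its size (entitySize).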

Nifi : Nested JSON records schema validation

I'm trying to split a JSON file containing nested records using the SplitRecord processor, but I always get a null value instead of the expected array of records:
{"userid":"xxx","bookmarks":null}
Below is sample JSON
{
  "userid": "Ib6gZ8ZPwRBbAL0KRSSKS",
  "bookmarks": [
    {
      "id": "10000XXXXXXW0007760",
      "creator": "player",
      "position": 42.96
    },
    {
      "id": "41ANSMARIEEW0075484",
      "creator": "player",
      "position": 51.87
    },
    {
      "id": "ALBATORCORSW0088197",
      "creator": "player",
      "position": 93.47
    },
    {
      "id": "ALIGXXXXXXXW0007944",
      "creator": "player",
      "position": 95.06
    }
  ]
}
And here is my Avro schema:
{
  "namespace": "nifi",
  "name": "bookmark",
  "type": "record",
  "fields": [
    { "name": "userid", "type": "string" },
    { "name": "bookmarks", "type": {
        "type": "record",
        "name": "bookmarks",
        "fields": [
          { "name": "id", "type": "string" },
          { "name": "creator", "type": "string" },
          { "name": "position", "type": "float" }
        ]
      }
    }
  ]
}
Any help would be greatly appreciated!
I had to implement a custom Groovy processor to overcome the limitations of NiFi, which took me a lot of time. NiFi's handling of Avro schemas is limited to the simplest cases and does not work for more advanced processing.
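One thing that may be worth checking before reaching for a custom processor: in the sample data, bookmarks is an array of records, while the schema above declares it as a single record, which is a common reason a record reader resolves the field to null. A sketch of the schema with bookmarks declared as an Avro array; everything else is unchanged:

{
  "namespace": "nifi",
  "name": "bookmark",
  "type": "record",
  "fields": [
    { "name": "userid", "type": "string" },
    { "name": "bookmarks", "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "bookmarks",
          "fields": [
            { "name": "id", "type": "string" },
            { "name": "creator", "type": "string" },
            { "name": "position", "type": "float" }
          ]
        }
      }
    }
  ]
}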

Elasticsearch: how to exclude some fields from indexing

I have JSON data that Elasticsearch maps automatically when I index it. How can I exclude some fields from the mapping? I already tried defining the mapping manually, but when I do a bulk index, the other fields still get mapped automatically.
For example, my JSON data looks like this:
[
  {
    "id": "232",
    "name": "Lorem",
    "description": "Ipsum Dolor",
    "image": [
      { "key": "asadasd.jpg" },
      { "key": "asasd2d.jpg" }
    ],
    "is_active": true
  },
  ...
My mapping, when I define it manually:
PUT myindex
{
  "mappings": {
    "product": {
      "properties": {
        "id": { "type": "text" },
        "name": { "type": "text" },
        "description": { "type": "text" },
        "is_active": { "type": "boolean" }
      }
    }
  }
}
What I want to achieve is that the data still remains; I just want the image property to be excluded from indexing, so that when I query Elasticsearch I still get the data with image.
Is that possible?
Thank you guys, I'm new to Elasticsearch.
Yes, that's possible simply by adding dynamic: false to your mapping, like this:
PUT myindex
{
  "mappings": {
    "product": {
      "dynamic": false,    <-- add this line
      "properties": {
        "id": { "type": "text" },
        "name": { "type": "text" },
        "description": { "type": "text" },
        "is_active": { "type": "boolean" }
      }
    }
  }
}
The image array will still be in the source, but the mapping won't be modified.
The problem with the accepted answer is that you need to explicitly add mappings for all fields, which is not always wanted (e.g. for array types).
You could disable the field like this:
PUT myindex
{
  "mappings": {
    "product": {
      "properties": {
        "id": { "type": "text" },
        "name": { "type": "text" },
        "description": { "type": "text" },
        "is_active": { "type": "boolean" },
        "image": { "type": "object", "enabled": false }
      }
    }
  }
}
The image array is still going to be in the _source.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
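To see the effect of either mapping, a search on an indexed field still returns the document (with the full image array in _source), while a search on image.key matches nothing because that field is never indexed. A quick sketch against the myindex index from the examples above:

GET myindex/_search
{
  "query": { "match": { "description": "Ipsum" } }
}

GET myindex/_search
{
  "query": { "match": { "image.key": "asadasd.jpg" } }
}

The first request returns the sample document; the second returns no hits, even though the image data is still present in _source.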
