Indexing in Elasticsearch for Auditing

There is a microservice-based architecture wherein each service has a different type of entity. For example:
Service-1:
{
"entity_type": "SKU",
"sku": "123",
"ext_sku": "201",
"store": "1",
"product": "abc",
"timestamp": 1564484862000
}
Service-2:
{
"entity_type": "PRODUCT",
"product": "abc",
"parent": "xyz",
"description": "curd",
"unit_of_measure": "gm",
"quantity": "200",
"timestamp": 1564484863000
}
Service-3:
{
"entity_type": "PRICE",
"meta": {
"store": "1",
"sku": "123"
},
"price": "200",
"currency": "INR",
"timestamp": 1564484962000
}
Service-4:
{
"entity_type": "INVENTORY",
"meta": {
"store": "1",
"sku": "123"
},
"in_stock": true,
"inventory": 10,
"timestamp": 1564484864000
}
I want to write an audit service backed by Elasticsearch that ingests all of these entities and indexes them by entity_type, store, sku, and timestamp.
Is Elasticsearch a good choice here, and how would the indexing work? For example, a search for store=1 should return every entity, of any type, whose store is 1. I would also like to retrieve all entities between two timestamps.
Would Elasticsearch with Kibana (for visualization) be a good fit?

Yes. Your use case is pretty much exactly what is described in the docs under filter context:
In filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to published?
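A minimal sketch of what a filter-context query for this audit use case could look like, written as a plain Python dict so it can be inspected without a cluster. It assumes the store value sits under store for SKU documents and under meta.store for PRICE and INVENTORY documents (as in the samples above), that both are mapped as exact-value fields, and that timestamp holds epoch milliseconds. The function name build_audit_query is hypothetical.

```python
import json


def build_audit_query(store, start_ms, end_ms):
    """Build a bool query that filters by store and a timestamp range.

    Every clause sits in filter context, so no relevance scores are computed.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Match the store whether it is top-level or under "meta".
                    {
                        "bool": {
                            "should": [
                                {"term": {"store": store}},
                                {"term": {"meta.store": store}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                    # Epoch-millisecond bounds, both inclusive.
                    {"range": {"timestamp": {"gte": start_ms, "lte": end_ms}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    query = build_audit_query("1", 1564484862000, 1564484962000)
    print(json.dumps(query, indent=2))
```

Normalizing store and sku to the same path in every entity at ingest time would make the inner should clause unnecessary; the sketch only shows how to query the documents as they are.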

Related

StartsWith is not working on Azure Cosmos DB documents through LINQ

I used StartsWith in a LINQ query to filter the records fetched from Azure Cosmos DB, but the record count always comes back as zero.
The query used to fetch the data:
var locs = from data in locations
           where data.Properties.Street.ToLower().StartsWith("chest")
           select data;
Here's a sample from the document database:
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-0.141246,
51.56292
]
},
"properties": {
"city": "london",
"countryCode": "GB",
"parkingType": "permit holders only",
"approxNoOfSpaces": 4,
"timesOfOperation": "mon-fri 10:00-12:00",
"maximumStay": "N/A",
"tariff": "N/A",
"cashlessIdentifier": "N/A",
"nearestMachine": "N/A",
"street": "Chester Road",
"postcode": "N19 5DE",
"controlledParkingZone": "CA-U",
"validParkingPermits": "CA-U",
"parkingBayLengthMetres": 19,
"disclaimer": "The information provider and/ or licensor are not liable for any errors or omissions contained within this dataset and shall not be liable for any loss, injury or damage caused by its use.",
"easting": 528937,
"northing": 186530,
"EPSG:27700 Well Known Text Geometry": "LINESTRING (528943.9746311313 186522.8076569282,528944.9970187079 186524.33721909559,528929.9827624428 186537.32350181084,528928.7624933998 186535.9039081653)",
"EPSG:4326 Well Known Text Geometry": "LINESTRING (-0.14114833166779203 51.56285353559299,-0.14113302974759298 51.56286704785321,-0.14134475707247782 51.56298717904703,-0.14136287187179347 51.56297470018317)",
"externalFeatureId": 46044821,
"spatialAccuracy": "Defined By Custodian",
"lastUploaded": "07/08/2019 11:01:52 PM",
"location": "(51.56292, -0.141246)",
"organisationURI": "http://opendatacommunities.org/id/london-borough-council/camden",
":#computed_region_hxwp_gyfc": 10,
":#computed_region_6i9a_26nj": 66
},
"documentType": "location",
"id": "********************",
"_rid": "*******************",
"_self": "****************************",
"_etag": "*****************************",
"_attachments": "********************",
"_ts": 1565344913
}

ElasticSearch URI Search null field

I need to build a URI search query that returns all documents whose date falls between two dates, and also the documents where this date field is null.
For example:
Some objects have the field "creation_date", but I also want the results to include the objects that do not have this field.
I tried something like this:
http://localhost//elasticsearch/channels/channel/_search?q=channel.schedule.creation_date:[2018-06-19 TO 2018-12-22] OR channel.schedule.creation_date: NULL
The date comparison works. The problem is matching the NULL values.
Edited
Source sample:
"_source": {
"channel": {
"activated": false,
"approved": false,
"content": "Jvjv",
"creation_date": "2018-06-21T13:06:10.000Z",
"facebookLink": "J jv",
"id": "Kvjvjv",
"instagramId": "Jvjv",
"name": "Kbkbkvk",
"ownerId": "sZtxdhiNbNY9sr2DtiCzlgJfsqb2",
"plan": 0,
"purpose": "Jvjv",
"recurrence": 1,
"segment": "Jvjvjv",
"twitterId": "Jvjv",
"youtubeId": "Jvj"
}
}
}
You can do this with the NOT(_exists_:field_name) constraint. Can you try this?
http://localhost//elasticsearch/channels/channel/_search?q=channel.schedule.creation_date:[2018-06-19 TO 2018-12-22] OR NOT(_exists_:channel.schedule.creation_date)
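Because the query string contains spaces, brackets, and colons, it is safer to percent-encode it before sending. The following is a small sketch using only the Python standard library; the helper name build_uri_search is hypothetical, and the base URL and field are taken from the question.

```python
from urllib.parse import quote


def build_uri_search(base_url, field, start, end):
    """Return a URI search URL matching a date range OR a missing field."""
    lucene = f"{field}:[{start} TO {end}] OR NOT(_exists_:{field})"
    # Percent-encode spaces, brackets, colons and parentheses.
    return f"{base_url}?q={quote(lucene, safe='')}"


if __name__ == "__main__":
    print(build_uri_search(
        "http://localhost//elasticsearch/channels/channel/_search",
        "channel.schedule.creation_date",
        "2018-06-19",
        "2018-12-22",
    ))
```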

What are good ways to solve a strange data retrieval issue in elastic search?

I've got a strange issue with an Elasticsearch server.
The Elasticsearch version is 1.6. 'records' is the name of the type. The URL for the search is http://some.domain:9200/user/records/_search. The field mapping for 'un' is string.
The following query, which has been working for years, now sometimes fails depending on the value of {someId}: newer IDs fail, old ones work. The data is there; it's just not being found ...
{
"from": 0,
"size": 1,
"sort": {
"un": "desc",
"_score": "desc"
},
"query": {
"query_string": {
"query": "un:\"{someId}\"",
"fields": [
"id",
"un",
"e",
"fn",
"ln",
"bn",
"jt",
"sy",
"c",
"st",
"p",
"fbid",
"lnid"
]
}
}
}
After doing some diagnostics I discovered the following query always works whether or not {someId} is old or new ...
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "records.un",
"query": "{someId}"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 10,
"sort": [],
"aggs": {}
}
This is a sample document that matches with the second query and fails with the first.
{
"un": "xxxxxxx.xxxxxxx",
"e": "xxxxxxx",
"pswd": "xxxxxxx",
"fn": "xxxxxxx",
"ln": "xxxxxxx",
"bn": "xxxxxxx",
"jt": "",
"sy": "xxxxxxx",
"urole": "User",
"id": "xxxxxxx",
"status": "1",
"lld": "201704280016",
"cd": "201702100132",
"md": "201704280549",
"cc": "0",
"p": "",
"logo": "",
"mlogo": "",
"ad": "201702100132",
"com": "xxxxxxx",
"rr": "true",
"sid": "00000000-0000-0000-0000-000000000000",
"fbidp": "",
"lnidp": "",
"role": "Lots of data is in this one",
"dim": "",
"drm": "",
"drcm": "xxxxxxx",
"drcfbm": "xxxxxxx",
"drclnm": "xxxxxxx",
"as": "false",
"apr": "true",
"iuid": "xxxxxxx",
"vcount": "9",
"pplatform": "",
"pname": "",
"pid": "00000000-0000-0000-0000-000000000000",
"preciept": "",
"ms": "Free"
}
I'm thinking that reindexing the server might solve the issue. What are good ways to solve strange data retrieval issues in elastic search?
There is a significant difference between your first query ("query": "un:\"{someId}\"") and your second ("query": "{someId}"). In the first, someId is wrapped in quotes, so it is searched as an exact phrase: for xxx.yyy the whole ID, including the dot (.), has to match, which means it only matches IDs that don't contain a dot. In the second, someId is analyzed: xxx.yyy is tokenized into two terms (xxx and yyy), so IDs that contain a dot are matched as well.
You need to change the mapping of the un field. If you are not running any full-text queries on un, I'd suggest making it not_analyzed. Otherwise, use a different analyzer, such as whitespace, instead of the default standard analyzer. I'd really recommend the first option, since structured exact-value fields are more efficient than the second.
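As a rough sketch of the not_analyzed suggestion, the mapping and an exact-match lookup could look like the Python dicts below, using ES 1.x syntax. The constant and function names are hypothetical. Note that the index setting of an existing field cannot be changed in place, so this would mean creating a new index with the mapping and reindexing the documents into it.

```python
import json

# Assumed ES 1.x mapping for the "records" type: "un" is indexed as a single
# untokenized term, so lookups compare the whole value, dots included.
RECORDS_MAPPING = {
    "records": {
        "properties": {
            "un": {"type": "string", "index": "not_analyzed"}
        }
    }
}


def exact_un_query(some_id):
    """Build a term query that matches "un" exactly, with no analysis."""
    return {"from": 0, "size": 1, "query": {"term": {"un": some_id}}}


if __name__ == "__main__":
    print(json.dumps(RECORDS_MAPPING, indent=2))
    print(json.dumps(exact_un_query("xxxxxxx.xxxxxxx"), indent=2))
```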

Elastic Search. Search by sub-collection value

I need help with a specific ES query.
I have objects in an Elasticsearch index. Here is an example of one of them (a Participant):
{
"_id": null,
"ObjectID": 6008,
"EventID": null,
"IndexName": "crmws",
"version_id": 66244,
"ObjectData": {
"PARTICIPANTTYPE": "2",
"STATE": "ACTIVE",
"EXTERNALID": "01010111",
"CREATORID": 1006,
"partAttributeList":
[
{
"SYSNAME": "A",
"VALUE": "V1"
},
{
"SYSNAME": "B",
"VALUE": "V2"
},
{
"SYSNAME": "C",
"VALUE": "V2"
}
],
....
I need to find entities by the contents of partAttributeList: for example, every Participant that has SYSNAME=A and VALUE=V1 in the same partAttributeList element.
If I use the usual match queries:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course, this finds more objects than I need. An example of a redundant object that would also be returned:
...
{
"SYSNAME": "A",
"VALUE": "X"
},
{
"SYSNAME": "B",
"VALUE": "V1"
}..
As I understand it, you are trying to search multiple fields of the same object for exact matches of a piece of text, so please try this out:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
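For the "same element" requirement specifically, a different technique from the page linked above is commonly used: a nested query. The sketch below first imitates, in plain Python, why independent match clauses over an object array return the redundant document, then builds a nested query. It assumes partAttributeList is mapped with type nested, which the question does not confirm; all function names are hypothetical.

```python
def flat_match(doc, sysname, value):
    """Mimic object-array flattening: each field is checked independently."""
    attrs = doc["ObjectData"]["partAttributeList"]
    return (any(a["SYSNAME"] == sysname for a in attrs)
            and any(a["VALUE"] == value for a in attrs))


def same_element_match(doc, sysname, value):
    """Mimic nested semantics: both conditions must hold on one element."""
    attrs = doc["ObjectData"]["partAttributeList"]
    return any(a["SYSNAME"] == sysname and a["VALUE"] == value for a in attrs)


def nested_attribute_query(sysname, value):
    """Nested query; assumes partAttributeList is mapped with type "nested"."""
    path = "ObjectData.partAttributeList"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"match": {f"{path}.SYSNAME": sysname}},
                            {"match": {f"{path}.VALUE": value}},
                        ]
                    }
                },
            }
        }
    }
```

With the redundant document from the question (A/X and B/V1), flat_match reports a hit while same_element_match does not, which is the behavior the nested mapping is meant to provide on the server side.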

ElasticSearch _Source is always empty on the return

I am posting a query to http://localhost:9200/movie_db/movie/_search, but the _source attribute is always empty in the response. I enabled it, but that doesn't help.
Movie DB:
TRY DELETE /movie_db
PUT /movie_db {"mappings": {"movie": {"properties": {"title": {"type": "string", "analyzer": "snowball"}, "actors": {"type": "string", "position_offset_gap" : 100, "analyzer": "standard"}, "genre": {"type": "string", "index": "not_analyzed"}, "release_year": {"type": "integer", "index": "not_analyzed"}, "description": {"_source": true, "type": "string", "analyzer": "snowball"}}}}}
BULK INDEX movie_db/movie
{"_id": 1, "title": "Hackers", "release_year": 1995, "genre": ["Action", "Crime", "Drama"], "actors": ["Johnny Lee Miller", "Angelina Jolie"], "description": "High-school age computer expert Zero Cool and his hacker friends take on an evil corporation's computer virus with their hacking skills."}
{"_id": 2, "title": "Johnny Mnemonic", "release": 1995, "genre": ["Science Fiction", "Action"], "actors": ["Keanu Reeves", "Dolph Lundgren"], "description": "A guy with a chip in his head shouts incomprehensibly about room service in this dystopian vision of our future."}
{"_id": 3, "title": "Swordfish", "release_year": 2001, "genre": ["Action", "Crime"], "actors": ["John Travolta", "Hugh Jackman", "Halle Berry"], "description": "A cast of characters challenge society's commonly held view that computer experts are not the beautiful people. Somehow, the CIA is hacked in under 5 minutes."}
{"_id": 4, "title": "Tomb Raider", "release_year": 2001, "genre": ["Adventure", "Action", "Fantasy"], "actors": ["Angelina Jolie", "Jon Voigt"], "description": "The story of a girl and her quest for antiquities in the face of adversity. This epic is adapter from its traditional video-game format to the big screen"}
Query:
{
"query" :
{
"term" : { "genre" : "Crime" }
}
}
Results:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
"_index": "movie_db",
"_type": "movie",
"_id": "3",
"_score": 0.30685282,
"_source": {}
},
{
"_index": "movie_db",
"_type": "movie",
"_id": "1",
"_score": 0.30685282,
"_source": {}
}
]
}
}
I had the same problem: despite enabling _source in my query as well as in my mappings, _source would always be {}.
Your proposed solution of setting cluster.name in elasticsearch.yml gave me the hint that the problem must be some hidden setting in the old cluster.
I found out that I had an index template definition that came with a plugin I installed (in my case elasticsearch-transport-couchbase), which said
"_source" : {
"includes" : [ "meta.*" ]
},
thereby implicitly excluding all fields other than meta.* from _source.
Check your templates like this:
curl -XGET localhost:9200/_template/?pretty
I deleted the couchbase template like so
curl -XDELETE localhost:9200/_template/couchbase
and created a new, almost identical one but with source enabled.
Here is how:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
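To spot such a template without reading the whole output by eye, the JSON returned by the _template call can be scanned for _source restrictions. This is only a sketch: it assumes the response has the shape {template_name: {"mappings": {type_name: {"_source": {...}}}}}, and the function name is hypothetical.

```python
def templates_restricting_source(templates):
    """Return {template_name: [(type_name, _source settings), ...]}.

    `templates` is assumed to be the parsed JSON of GET /_template.
    """
    findings = {}
    for name, body in templates.items():
        for type_name, mapping in body.get("mappings", {}).items():
            source = mapping.get("_source", {})
            # Flag disabled _source as well as include/exclude filters.
            restricted = (
                source.get("enabled") is False
                or "includes" in source
                or "excludes" in source
            )
            if restricted:
                findings.setdefault(name, []).append((type_name, source))
    return findings
```

The input could be obtained by saving the curl output above to a file and loading it with json.load.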
Solution:
In the Elasticsearch config folder, open elasticsearch.yml, set cluster.name to a different value, then restart elasticsearch.bat.
I once accidentally passed a single field in the _source array, and that field didn't exist, for example "_source": ["bazinga"], and _source came back empty in the aggregation results.
So maybe you could simply pass a totally unrelated string in the _source array. This may be a better solution than making changes to the elasticsearch.yml file.
