Elastic Search. Search by sub-collection value


Need help with a specific ES query.
I have objects in an Elasticsearch index. Here is an example of one of them (Participant):
{
"_id": null,
"ObjectID": 6008,
"EventID": null,
"IndexName": "crmws",
"version_id": 66244,
"ObjectData": {
"PARTICIPANTTYPE": "2",
"STATE": "ACTIVE",
"EXTERNALID": "01010111",
"CREATORID": 1006,
"partAttributeList":
[
{
"SYSNAME": "A",
"VALUE": "V1"
},
{
"SYSNAME": "B",
"VALUE": "V2"
},
{
"SYSNAME": "C",
"VALUE": "V2"
}
],
....
I need to find only the entities whose partAttributeList contains an element matching both conditions at once. For example, the whole Participant entity where SYSNAME=A and VALUE=V1 in the same element of partAttributeList.
If I use the usual match queries:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course, I will find more objects than I actually need. Here is an example of an unwanted object that would also be found:
...
{
"SYSNAME": "A",
"VALUE": "X"
},
{
"SYSNAME": "B",
"VALUE": "V1"
}..
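Why do the two match clauses over-match? When an array of inner objects is mapped with the default object type, Elasticsearch flattens it. Each field becomes its own list of values per document, and the link between a SYSNAME and the VALUE from the same element is lost. The Elasticsearch documentation for the nested field type describes this flattening.

The Python sketch below emulates both matching rules using only plain dicts. The function names are hypothetical, and it compares exact strings rather than analyzed text, so treat it as an illustration, not as Elasticsearch's actual matching code.

```python
# Sketch: why two independent match clauses over-match on an array of objects.
# Assumption: partAttributeList uses the default (non-nested) object mapping,
# so each field is indexed as a flat list of values for the whole document.

def flatten(attrs):
    """Emulate object-array flattening: one list of values per field."""
    return {
        "SYSNAME": [a["SYSNAME"] for a in attrs],
        "VALUE": [a["VALUE"] for a in attrs],
    }

def flat_match(attrs, sysname, value):
    """Both clauses must match, but each may be satisfied by a different element."""
    flat = flatten(attrs)
    return sysname in flat["SYSNAME"] and value in flat["VALUE"]

def same_element_match(attrs, sysname, value):
    """What the question asks for: a single element satisfies both conditions."""
    return any(a["SYSNAME"] == sysname and a["VALUE"] == value for a in attrs)

wanted = [
    {"SYSNAME": "A", "VALUE": "V1"},
    {"SYSNAME": "B", "VALUE": "V2"},
    {"SYSNAME": "C", "VALUE": "V2"},
]
unwanted = [
    {"SYSNAME": "A", "VALUE": "X"},
    {"SYSNAME": "B", "VALUE": "V1"},
]

for name, attrs in [("wanted", wanted), ("unwanted", unwanted)]:
    print(name, flat_match(attrs, "A", "V1"), same_element_match(attrs, "A", "V1"))
```

Under flattening, the unwanted object matches because "A" and "V1" each occur somewhere in their respective lists. The per-element rule rejects it.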

As I understand it, you are trying to search multiple fields of the same object for exact matches of a piece of text, so please try this out:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html

Related

How to use JSONpath to extract specific values

I'm using JSONPath to find data in an array of JSON objects, but I'm struggling to get to the information I want. The array contains many objects similar to the one below, with RecID values throughout. If I use $..RecID I get all of them, when I only want the first Key.RecID of each object (with the value 1338438 in this example). Is there a way to extract only the top-level Key.RecID value?
BTW, I'm trying to do this in JMeter and I'm assuming JSONPath is the best way to do what I want, but if there is a better way I'd be happy to hear about it.
Thanks in advance
[{
"Key": {
"RecID": 1338438
},
"Users": [{
"FullName": "Miss Burns",
"Users": {
"Key": {
"Name": "Burns",
"RecID": 1317474
}
}
},
{
"FullName": "Mrs Fisher",
"Users": {
"Key": {
"Name": "Fisher",
"RecID": 1317904
}
}
}
],
"User": {
"FullName": "Mrs Fisher",
"Key": {
"Name": "Fisher",
"RecID": 1317904
}
},
"Organisation": {
"Key": {
"RecID": 1313881
}
}
}]
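In JSONPath, `..` is recursive descent, which is why `$..RecID` returns every RecID at any depth. A path anchored at the root should select only the top-level value of each array element. Examples are `$[*].Key.RecID`, or `$[0].Key.RecID` for just the first element. I have not verified these expressions against JMeter's JSON Extractor.

The standard-library Python sketch below emulates both behaviours on the question's sample data, without a JSONPath library. The helper `deep_scan` is hypothetical.

```python
import json

# The sample response from the question.
response = json.loads("""
[{
  "Key": {"RecID": 1338438},
  "Users": [
    {"FullName": "Miss Burns", "Users": {"Key": {"Name": "Burns", "RecID": 1317474}}},
    {"FullName": "Mrs Fisher", "Users": {"Key": {"Name": "Fisher", "RecID": 1317904}}}
  ],
  "User": {"FullName": "Mrs Fisher", "Key": {"Name": "Fisher", "RecID": 1317904}},
  "Organisation": {"Key": {"RecID": 1313881}}
}]
""")

def deep_scan(node, key):
    """Emulate $..key: collect the key's value at every depth, in document order."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(deep_scan(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(deep_scan(item, key))
    return found

# Emulates $..RecID: every RecID in the document.
all_ids = deep_scan(response, "RecID")
# Emulates $[*].Key.RecID: only the top-level Key.RecID of each array element.
top_level_ids = [item["Key"]["RecID"] for item in response]

print(all_ids)
print(top_level_ids)
```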

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices connected by a particular edge, using the following query (created returns vertices with the label person):
software {
created {
name
}
}
This resolves to the following Gremlin query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
"result": [
{
"#type": "d",
"#rid": "#24:0",
"#version": 6,
"#class": "person",
"in_knows": [
"#35:0"
],
"name": "josh",
"out_created": [
"#32:0",
"#33:0"
],
"age": 32,
"#fieldTypes": "in_knows=g,out_created=g"
}
],
"dbStats": {
...
}
}
I realize that this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using the DataLoader pattern. (I'm also hoping to do property selection, so I'm not asking the database to return too much info.)
So I'm trying to craft a query like so:
g.V().union(
__.hasLabel('software').has('name', 'ripple').
project('parent', 'child').by('id').
by(__.in('created').fold()),
__.hasLabel('software').has('name', 'lop').
project('parent', 'child').by('id').
by(__.in('created').fold())
)
But this results in the following, where the props are missing and only the IDs of the vertices I want are included:
{
"result": [
{
"parent": "ripple",
"child": [
"#24:0"
]
},
{
"parent": "lop",
"child": [
"#22:0",
"#23:0",
"#24:0"
]
}
],
"dbStats": {
...
}
}
My question is: how can I have the Gremlin query return all of the props for the found vertices and none of the other props? Should I even be doing batching this way?

For anyone else reading: the query I was trying to write wouldn't work because the TraversalSet created in .by(__.in('created').fold()) can't be cast from a List to an ElementMap, as the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working query duplicates the keys for each row and specifies the props needed. The query below is OK for Gremlin 3.3 as used in ODB (OrientDB). If you've got Gremlin 3.4.4 or later, where elementMap() is available, you can replace the last by step with by(elementMap('name', 'age')):
g.V().union(
__.hasLabel('software').has('name', 'ripple').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value())),
__.hasLabel('software').has('name', 'lop').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value()))
)
So that you get a result like this:
{"data": [
{
"parent": "ripple",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
},
{
"parent": "lop",
"child": {
"id": 5709,
"name": "peter",
"age": 35
}
},
{
"parent": "lop",
"child": {
"id": 5713,
"name": "marko",
"age": 29
}
},
{
"parent": "lop",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
}
]
}
This allows you to build a lookup that concatenates all results for "lop" and "ripple" into arrays.
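The lookup step can be sketched in Python as below. It assumes the flat rows have already been fetched and decoded. The JavaScript DataLoader library expects a batch function to return one value per key, in the same order as the keys, and `batch_result` mirrors that contract. Both function names are hypothetical.

```python
from collections import defaultdict

# Flat rows as returned by the union/select query above (one row per child).
rows = [
    {"parent": "ripple", "child": {"id": 5717, "name": "josh", "age": 32}},
    {"parent": "lop", "child": {"id": 5709, "name": "peter", "age": 35}},
    {"parent": "lop", "child": {"id": 5713, "name": "marko", "age": 29}},
    {"parent": "lop", "child": {"id": 5717, "name": "josh", "age": 32}},
]

def build_lookup(rows):
    """Concatenate the children of each parent into one array."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row["parent"]].append(row["child"])
    return lookup

def batch_result(keys, rows):
    """One array per requested key, in key order; empty if a key had no children."""
    lookup = build_lookup(rows)
    return [lookup.get(key, []) for key in keys]

result = batch_result(["ripple", "lop"], rows)
names = [[c["name"] for c in children] for children in result]
print(names)
```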

JMESPath current array index

In JMESPath with this query:
people[].{"index":#.index,"name":name, "state":state.name}
On this example data:
{
"people": [
{
"name": "a",
"state": {"name": "up"}
},
{
"name": "b",
"state": {"name": "down"}
},
{
"name": "c",
"state": {"name": "up"}
}
]
}
I get:
[
{
"index": null,
"name": "a",
"state": "up"
},
{
"index": null,
"name": "b",
"state": "down"
},
{
"index": null,
"name": "c",
"state": "up"
}
]
How do I get the index property to actually have the index of the array? I realize that #.index is not the correct syntax but have not been able to find a function that would return the index. Is there a way to include the current array index?

Use-case
Use JMESPath query syntax to extract the numeric index of the current element from a series of array elements.
Pitfalls
As of this writing (2019-03-22), this feature is not part of the standard JMESPath specification.
Workaround
The index can be added when running JMESPath from any of various programming languages, but this must be done outside of JMESPath.
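Here is a minimal Python sketch of that workaround. The projection `people[].{"name": name, "state": state.name}` is emulated with a list comprehension so the example needs no third-party package. With the `jmespath` package installed, the same list could instead come from `jmespath.search(...)`. The index is then attached with `enumerate`, in the host language.

```python
data = {
    "people": [
        {"name": "a", "state": {"name": "up"}},
        {"name": "b", "state": {"name": "down"}},
        {"name": "c", "state": {"name": "up"}},
    ]
}

# Stand-in for the JMESPath projection
#   people[].{"name": name, "state": state.name}
# (with the jmespath package: jmespath.search('people[].{"name": name, "state": state.name}', data))
projected = [{"name": p["name"], "state": p["state"]["name"]} for p in data["people"]]

# The index is added outside JMESPath, by the host language.
with_index = [{"index": i, **item} for i, item in enumerate(projected)]
print(with_index)
```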

This is not exactly the form you requested but I have a possible answer for you:
people[].{"name":name, "state":state.name} | merge({count: length(#)}, #[*])
This query gives the following result:
{
"0": {
"name": "a",
"state": "up"
},
"1": {
"name": "b",
"state": "down"
},
"2": {
"name": "c",
"state": "up"
},
"count": 3
}
So each attribute of this object is an index, except the last one, count, which just holds the number of indexed attributes. If you want to iterate over the object's attributes in a loop, for example, you can, because count tells you how many attributes there are to visit.
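The loop this answer describes can be sketched in Python. It assumes the result above has already been parsed into a dict, and uses count as the loop bound while the other keys are the string indexes "0" up to count - 1.

```python
# The object returned by the query above, parsed into a Python dict.
result = {
    "0": {"name": "a", "state": "up"},
    "1": {"name": "b", "state": "down"},
    "2": {"name": "c", "state": "up"},
    "count": 3,
}

# Use "count" as the loop bound; each index i is looked up as the string key str(i).
rows = []
for i in range(result["count"]):
    entry = result[str(i)]
    rows.append((i, entry["name"], entry["state"]))
print(rows)
```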

Best approach for an Elastic Search time-based feeds module?

I am new to Elasticsearch and am looking for the best solution for creating a feed module with time-based feeds, along with their groups and comments.
I learned a little and came up with the following:
PUT /group
{
"mappings": {
"groupDetail": {},
"content": {
"_parent": {
"type": "groupDetail"
}
},
"comment": {
"_parent": {
"type": "content"
}
}
}
}
so that each type will be stored as separate documents in the index.
But then I found a post saying that parent/child is a more costly operation for search than nested objects.
Something like the following: two groups (feeds) with their details, with content and comments as nested elements.
{
"_index": "group",
"_type": "groupDetail",
"_id": 6829,
"_score": 1,
"_source": {
"groupid": 6829,
"name": "Jignesh Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": true,
"tags": [
"spotrs",
"surat"
],
"content": [
{
"contentid": 1,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 1"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 2,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
{
"_index": "group",
"_type": "groupDetail",
"_id": 6849,
"_score": 1,
"_source": {
"groupid": 6849,
"name": "Xyz Group Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": false,
"tags": [
"spotrs",
"food"
],
"content": [
{
"contentid": 3,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 3"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 4,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
Now, if I go with nested objects, I am confused: if users add comments very frequently, will the cost of reindexing become a problem?
So the main thing I want to ask is: which approach lets me add comments frequently while keeping content search fast?

Performance
Parent/child stores related data in the same shard, as separate documents, which avoids network traffic;
Parent/child needs a join step when retrieving data;
Nested objects store the inner and outer objects together, as a single document;
So, we can infer:
Updating a nested object re-indexes the whole document, which can be very expensive if your document is large;
Updating the parent or a child alone does not affect the other;
Searching nested objects is a little faster, because it skips the join;
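The update-cost inference can be made concrete with a deliberately simple toy model. Its assumption is that cost equals the number of comment objects written to the index. Real Elasticsearch costs also depend on document size, refreshes and segment merges, so this only shows the growth trend. With nested comments, adding the k-th comment re-indexes a document holding k comments. With parent/child, each comment is indexed once.

```python
# Toy cost model (assumption: cost = number of comment objects written to the
# index; real costs also depend on document size, refreshes and merges).

def nested_cost(n_comments):
    """Nested: adding the k-th comment re-indexes the whole document,
    which by then contains k comments."""
    return sum(k for k in range(1, n_comments + 1))

def parent_child_cost(n_comments):
    """Parent/child: each comment is its own document, indexed once."""
    return n_comments

for n in (10, 100, 1000):
    print(n, nested_cost(n), parent_child_cost(n))
```

Under this model the nested total grows as n(n+1)/2 while parent/child grows linearly, which is the effect the suggestions below rely on.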
Suggestions
As far as I understand your problem, you should use parent/child.
As your group's comments grow, each new comment will still re-index the whole document, which can be very time-consuming;
On the other hand, searching a comment with parent/child needs just one more lookup after finding the child, which is relatively acceptable.
Furthermore, you should also take into account how often comments are searched compared with how often they are added:
If you search a lot but add few new comments, maybe you can choose nested objects;
Otherwise, choose parent/child;
By the way, you may combine both of them:
When a feed is active, use parent/child to store its content and comments;
When it is closed, i.e., no more comments can be added, move them to a new index with nested objects;
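The "move to a nested index" step is mostly a reshaping of documents. The Python sketch below assumes the group, content and comment documents of a closed feed have already been fetched as plain dicts. The `parent` field is a hypothetical stand-in for each child's `_parent` value. Fetching the children and bulk-indexing into the new index are left out.

```python
# Hypothetical parent/child records, loosely shaped like the question's data.
# "parent" stands in for each child document's _parent value.
groups = [{"groupid": 6829, "name": "Jignesh Public"}]
contents = [
    {"contentid": 1, "parent": 6829, "type": "1"},
    {"contentid": 2, "parent": 6829, "type": "2"},
]
comments = [
    {"parent": 1, "byuser": 5859, "comment": "Comment 1"},
    {"parent": 1, "byuser": 5860, "comment": "Comment 2"},
    {"parent": 2, "byuser": 5859, "comment": "Comment 1"},
]

def to_nested(groups, contents, comments):
    """Build one document per group: content[] with comments[] nested inside."""
    comments_by_content = {}
    for c in comments:
        body = {k: v for k, v in c.items() if k != "parent"}
        comments_by_content.setdefault(c["parent"], []).append(body)

    contents_by_group = {}
    for ct in contents:
        body = {k: v for k, v in ct.items() if k != "parent"}
        body["comments"] = comments_by_content.get(ct["contentid"], [])
        contents_by_group.setdefault(ct["parent"], []).append(body)

    return [dict(g, content=contents_by_group.get(g["groupid"], [])) for g in groups]

docs = to_nested(groups, contents, comments)
print(len(docs[0]["content"]), [len(ct["comments"]) for ct in docs[0]["content"]])
```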

If you do not give more detail than "very frequently", it is going to be hard to come up with a recommendation. You also have not described what your data looks like. Comments on a blog post might be rare, even in heated discussions. Comments or replies in a forum thread (which would result in a huge document) might be something very different. I'd personally start with nested and see how it goes, but I also do not know all the requirements, so this might be a very wrong answer.

What are good ways to solve a strange data retrieval issue in elastic search?

I've got a strange issue with an Elasticsearch server.
The Elasticsearch version is 1.6. 'records' is the name of the type. The URL for the search is http://some.domain:9200/user/records/_search. The field mapping for 'un' is string.
The following query, which has been working for years, sometimes fails depending on the value of {someId}: newer IDs fail, old ones work. The data is there; it's just not being found ...
{
"from": 0,
"size": 1,
"sort": {
"un": "desc",
"_score": "desc"
},
"query": {
"query_string": {
"query": "un:\"{someId}\"",
"fields": [
"id",
"un",
"e",
"fn",
"ln",
"bn",
"jt",
"sy",
"c",
"st",
"p",
"fbid",
"lnid"
]
}
}
}
After some diagnostics, I discovered that the following query always works, whether {someId} is old or new ...
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "records.un",
"query": "{someId}"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 10,
"sort": [],
"aggs": {}
}
This is a sample document that matches with the second query and fails with the first.
{
"un": "xxxxxxx.xxxxxxx",
"e": "xxxxxxx",
"pswd": "xxxxxxx",
"fn": "xxxxxxx",
"ln": "xxxxxxx",
"bn": "xxxxxxx",
"jt": "",
"sy": "xxxxxxx",
"urole": "User",
"id": "xxxxxxx",
"status": "1",
"lld": "201704280016",
"cd": "201702100132",
"md": "201704280549",
"cc": "0",
"p": "",
"logo": "",
"mlogo": "",
"ad": "201702100132",
"com": "xxxxxxx",
"rr": "true",
"sid": "00000000-0000-0000-0000-000000000000",
"fbidp": "",
"lnidp": "",
"role": "Lots of data is in this one",
"dim": "",
"drm": "",
"drcm": "xxxxxxx",
"drcfbm": "xxxxxxx",
"drclnm": "xxxxxxx",
"as": "false",
"apr": "true",
"iuid": "xxxxxxx",
"vcount": "9",
"pplatform": "",
"pname": "",
"pid": "00000000-0000-0000-0000-000000000000",
"preciept": "",
"ms": "Free"
}
I'm thinking that reindexing the server might solve the issue. What are good ways to solve strange data retrieval issues in Elasticsearch?

There is a significant difference between your first query ("query": "un:\"{someId}\"") and your second ("query": "{someId}"). In the first, because you wrap someId in quotes, it searches for the exact phrase. If you have xxx.yyy, it looks for the whole ID including the dot (.), so an ID will be matched only when it does not contain a dot. In the second, someId is analyzed, i.e. xxx.yyy is tokenized into two strings (xxx and yyy), so it matches even when the ID contains a dot.
You need to change the mapping of the un field. If you are not running any full-text queries on un, I'd suggest making it not_analyzed. Otherwise, you need to use a different analyzer, such as whitespace, instead of the default standard analyzer. I'd really suggest going with the former solution, as structured exact-value fields are more efficient than analyzed ones.
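The not_analyzed suggestion can be sketched as request bodies built in Python and serialized with `json`. This assumes Elasticsearch 1.x, where string fields accept `"index": "not_analyzed"`. The mapping of an existing field generally cannot be changed in place, so this body is meant for creating a new index, with the data then reindexed into it. A term query is not analyzed, so it compares against the stored value exactly. The function name `exact_un_query` is hypothetical, and the example does not send any requests.

```python
import json

# Assumed Elasticsearch 1.x create-index body for a NEW index:
# "not_analyzed" stores "un" as a single exact-value term.
create_index_body = {
    "mappings": {
        "records": {
            "properties": {
                "un": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

def exact_un_query(some_id):
    """Search body for an exact-value lookup; term queries are not analyzed."""
    return {"from": 0, "size": 1, "query": {"term": {"un": some_id}}}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(exact_un_query("xxxxxxx.xxxxxxx")))
```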
