Parsing source fields in a SearchResult - elasticsearch

{
"hits": {
"total": 4,
"max_score": 12.914036,
"hits": [
{
"_index": "cars",
"_type": "sports",
"_id": "359809062-169200612195",
"_score": 12.914036,
"_source": {
"uniqueId": "35980",
"productName": "Tesla",
"Year": "2008"
}
},
{
"_index": "cars",
"_type": "sports",
"_id": "359809061-169200612191",
"_score": 11.914036,
"_source": {
"uniqueId": "33980",
"productName": "Ferrari",
"Year": "2015"
}
}
]
}
}
How can I parse all the _source fields? I am trying to return a list that contains only the _source fields of the hits.
val searchHits = searchResult.getHits(classOf[Object]).toList
searchHits.map(hit => {
CarDetails(
hit.source.get("uniqueId").getAsString(),
hit.source.get("productName").getAsString(),
hit.source.get("Year").getAsString()
)
})
For this code, I get the error "value get is not a member of Object", which is kind of expected.
I am trying to parse the result without defining a model. Is that possible?
I'm using the Jest client with Scala.
Context:
CarDetails is a case class. Basically, what I'm trying to do is parse the hits (only the items inside _source) and return a list of CarDetails objects to the method that calls this function.
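No answer is recorded for this question. As a rough, language-neutral sketch of the traversal (written in Python rather than Scala, and not using the Jest API), the idea is to walk hits.hits and build one object per _source. The CarDetails dataclass and its field names below are hypothetical stand-ins for the asker's case class.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CarDetails:
    # Hypothetical mirror of the asker's case class; field names are assumed.
    unique_id: str
    product_name: str
    year: str


def parse_sources(response_json: str) -> List[CarDetails]:
    """Return one CarDetails per hit, built only from the hit's _source."""
    response = json.loads(response_json)
    hits = response.get("hits", {}).get("hits", [])
    return [
        CarDetails(
            unique_id=hit["_source"]["uniqueId"],
            product_name=hit["_source"]["productName"],
            year=hit["_source"]["Year"],
        )
        for hit in hits
    ]


# The search response from the question, abbreviated to valid JSON.
SAMPLE = """
{"hits": {"total": 4, "max_score": 12.914036, "hits": [
  {"_index": "cars", "_type": "sports", "_id": "359809062-169200612195",
   "_score": 12.914036,
   "_source": {"uniqueId": "35980", "productName": "Tesla", "Year": "2008"}},
  {"_index": "cars", "_type": "sports", "_id": "359809061-169200612191",
   "_score": 11.914036,
   "_source": {"uniqueId": "33980", "productName": "Ferrari", "Year": "2015"}}
]}}
"""

cars = parse_sources(SAMPLE)
for car in cars:
    print(car)
```

The same shape should carry over to Scala: ask the client for a JSON tree type instead of Object, so that field lookups such as get("uniqueId") are available on each hit's source.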

Related

elasticsearch query for finding id in fields in json file

I have a JSON file that I indexed in Elasticsearch, and I need a query to retrieve "_id_osm". Can you help me, please?
This is one entry of my JSON file:
{
"index": {
"_index": "pariss",
"_type": "sig",
"_id": 1
}
}
{
"fields": {
"_id_osm": 416747747,
"_categorie": "",
"_name": [
""
],
"_location": [
36.1941834,
5.3595221
]
}
}
I have updated the answer based on the comments.
If _id_osm is mapped with "store": true, you can fetch its value with the query below.
{
"stored_fields" : ["_id_osm"],
"query": {
"match": {
"_id": 1
}
}
}
The call above returns the response below. Note the fields section, which contains the field name and its value (wrapped in an array).
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"_id_osm": [
416747747
]
}
}
]
If the field is not stored (store defaults to false), use _source filtering to get the data instead.
{
"_source": [ "_id_osm" ],
"query": {
"match": {
"_id": 1
}
}
}
This returns the response below, where _source carries the data.
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"_id_osm": 416747747
}
}
]
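The two responses above differ in shape: stored_fields puts the value in an array under fields, while _source filtering returns it as indexed under _source. A minimal sketch of a helper that reads either shape, assuming the hit has already been parsed into a dict (the function name field_value is made up for illustration):

```python
from typing import Any, Optional


def field_value(hit: dict, name: str) -> Optional[Any]:
    """Read `name` from a hit, whichever way it was requested."""
    # stored_fields responses wrap every value in a list under "fields".
    stored = hit.get("fields", {})
    if name in stored:
        values = stored[name]
        return values[0] if values else None
    # _source filtering returns the value as it was indexed.
    return hit.get("_source", {}).get(name)


# The two hits shown in the responses above.
stored_hit = {"_index": "intqu", "_type": "_doc", "_id": "1", "_score": 1.0,
              "fields": {"_id_osm": [416747747]}}
source_hit = {"_index": "intqu", "_type": "_doc", "_id": "1", "_score": 1.0,
              "_source": {"_id_osm": 416747747}}

print(field_value(stored_hit, "_id_osm"))
print(field_value(source_hit, "_id_osm"))
```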

Elasticsearch: Transpose and aggregate the data

I am using ES 6.5. When I fetch the required messages, I have to transpose and aggregate them. See the example for more details.
Messages retrieved (2 messages shown as an example):
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.0851293,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "New",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 5.30
}
]
}
]
}
}
},
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.09512,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "Old",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 4.30
}
]
}
]
}
}
}
Now I am looking for a query to post to ES that will transpose the data and group it by category and sub-category.
If you check the messages, they have the same header.id (which is the main search criterion). Within this header.id, one message is for category New and the other for category Old (messageData.messageBreakDown is an array, and the category value sits inside it).
So ideally both messages, which belong to the same mId, would be combined into one output that has both the New price and the Old price.
How can I aggregate to get the desired result?
Can the final output message contain only the desired fields, e.g. date, mId, mDescription, New price and Old price (both in one output)?
UPDATE:
Below is the mapping,
{"index_name":{"mappings":{"data":{"properties":{"header":{"properties":{"id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"creationTimestamp":{"type":"date"}}},"messageData":{"properties":{"messageBreakDown":{"properties":{"category":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"messageDetails":{"properties":{"Amount":{"type":"float"}}},"subCategory":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"messageHeader":{"properties":{"mDescription":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"mId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"date":{"type":"date"}}}}}}}}}}
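No answer is recorded for this question. As a fallback, the transpose can be done client-side after fetching the hits. The sketch below is not an Elasticsearch aggregation: it groups parsed hits by header.id and subCategory and turns each category into a "<category> price" column. The helper names pivot_by_category and make_hit are hypothetical.

```python
from typing import Dict, List, Tuple


def pivot_by_category(hits: List[dict]) -> List[dict]:
    """Merge hits sharing header.id and subCategory into one row per group."""
    rows: Dict[Tuple[str, str], dict] = {}
    for hit in hits:
        source = hit["_source"]
        header = source["messageData"]["messageHeader"]
        for breakdown in source["messageData"]["messageBreakDown"]:
            key = (source["header"]["id"], breakdown["subCategory"])
            row = rows.setdefault(key, {
                "date": header["date"],
                "mId": header["mId"],
                "mDescription": header["mDescription"],
                "subCategory": breakdown["subCategory"],
            })
            # Sum the amounts so repeated categories accumulate.
            total = sum(d["Amount"] for d in breakdown["messageDetails"])
            column = f'{breakdown["category"]} price'
            row[column] = row.get(column, 0.0) + total
    return list(rows.values())


def make_hit(category: str, amount: float) -> dict:
    """Build a hit shaped like the two sample messages in the question."""
    return {"_source": {
        "header": {"id": "System_20190729152502239_57246_16667"},
        "messageData": {
            "messageHeader": {"date": "2019-06-03", "mId": "1000",
                              "mDescription": "TEST"},
            "messageBreakDown": [{
                "category": category,
                "subCategory": "Sub",
                "messageDetails": [{"Amount": amount}],
            }],
        },
    }}


rows = pivot_by_category([make_hit("New", 5.30), make_hit("Old", 4.30)])
print(rows)
```

Doing this in the application keeps the query itself a plain search on header.id, at the cost of pulling every matching message back to the client.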

Elasticsearch "match_phrase" query and "fuzzy" query - can both be used in conjunction

I need a query that uses match_phrase together with fuzzy matching. However, I am not able to find any documentation on how to construct such a query. Also, when I try combining the queries (one within another), it throws errors. Is it possible to construct such a query?
You would need to use span queries.
The query below performs a phrase match combined with fuzzy matching for champions league on a sample field, name, which is of type text.
If you want to search multiple fields, add another must clause.
Note that slop: 0 and in_order: true enforce an exact phrase match (adjacent terms, in order), while the fuzzy behaviour comes from the fuzzy queries wrapped in span_multi.
Sample Documents
POST span-index/mydocs/1
{
"name": "chmpions leage"
}
POST span-index/mydocs/2
{
"name": "champions league"
}
POST span-index/mydocs/3
{
"name": "chompions leugue"
}
Span Query:
POST span-index/_search
{
"query":{
"bool":{
"must":[
{
"span_near":{
"clauses":[
{
"span_multi":{
"match":{
"fuzzy":{
"name":"champions"
}
}
}
},
{
"span_multi":{
"match":{
"fuzzy":{
"name":"league"
}
}
}
}
],
"slop":0,
"in_order":true
}
}
]
}
}
}
Response:
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.5753642,
"hits": [
{
"_index": "span-index",
"_type": "mydocs",
"_id": "2",
"_score": 0.5753642,
"_source": {
"name": "champions league"
}
},
{
"_index": "span-index",
"_type": "mydocs",
"_id": "1",
"_score": 0.5753642,
"_source": {
"name": "chmpions leage"
}
},
{
"_index": "span-index",
"_type": "mydocs",
"_id": "3",
"_score": 0.5753642,
"_source": {
"name": "chompions leugue"
}
}
]
}
}
Let me know if this helps!
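For phrases of arbitrary length, the request body above can be generated instead of written by hand. A minimal sketch, assuming whitespace tokenisation and lowercasing are close enough to what the field's analyzer does (fuzzy is a term-level query, so the words are not analyzed for you); the function name fuzzy_phrase_query is made up:

```python
import json
from typing import List


def fuzzy_phrase_query(field: str, phrase: str) -> dict:
    """Build a span_near query with one fuzzy span_multi clause per word."""
    # Assumption: splitting on whitespace and lowercasing mimics the analyzer.
    clauses: List[dict] = [
        {"span_multi": {"match": {"fuzzy": {field: word}}}}
        for word in phrase.lower().split()
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "span_near": {
                            "clauses": clauses,
                            "slop": 0,         # no gaps allowed between terms
                            "in_order": True,  # terms must keep phrase order
                        }
                    }
                ]
            }
        }
    }


body = fuzzy_phrase_query("name", "champions league")
print(json.dumps(body, indent=2))
```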

Reindex multiple types from one index to single type in another index

I have two indexes:
twitter and reitwitter
twitter has multiple documents across different types like:
"hits": [
{
"_index": "twitter",
"_type": "tweet",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "tweet2",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch2"
}
},
{
"_index": "twitter",
"_type": "tweet1",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch1"
}
}
]
Now, when I reindex, I wanted to get rid of all the different types and just use one because essentially they have the same field mappings.
I tried several different combinations but I always only get one document instead of those three:
Approach 1:
POST _reindex/
{
"source": {
"index": "twitter"
}
,
"dest": {
"index": "reitwitter",
"type": "reitweet"
}
}
Response:
{
"took": 12,
"timed_out": false,
"total": 3,
"updated": 3,
"created": 0,
"deleted": 0,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
Note: it says updated 3, I guess because this was the second time I made the same call.
Second approach:
POST _reindex/
{
"source": {
"index": "twitter",
"query": {
"match_all": {
}
}
}
,
"dest": {
"index": "reitwitter",
"type": "reitweet"
}
}
Same response as first one.
In both cases when I make this GET call:
GET reitwitter/_search
{
"query": {
"match_all": {
}
}
}
I only get one document:
{
"_index": "reitwitter",
"_type": "reitweet",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch1"
}
}
Is this use case even supported by reindex? If not, do I have to write a script using scan and scroll to get all the documents from the source index and reindex them with the same doc type in the destination?
PS: I don't want to use "_source": ["tweet1", "tweet"] because I have around a million doc types, each with one document, that I want to map to the same doc type in the destination.
The problem is that all the documents have the same id (1), so they overwrite each other during the reindex process: once they share a single type in the destination, the id is the only thing that distinguishes them.
Try indexing your documents with different ids and you will see that it works.
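The overwrite can be reproduced without a cluster by modelling the destination type as a dictionary keyed by _id. This is only a simulation of the behaviour described above. The reindex_body at the end is a sketch resting on the assumption that a reindex script may rewrite ctx._id from the source type and id; check that against the reindex documentation for your version before relying on it.

```python
from typing import Callable, Dict, List

# The three source documents from the question: different types, same _id.
source_docs: List[dict] = [
    {"_type": "tweet", "_id": "1", "_source": {"message": "trying out Elasticsearch"}},
    {"_type": "tweet2", "_id": "1", "_source": {"message": "trying out Elasticsearch2"}},
    {"_type": "tweet1", "_id": "1", "_source": {"message": "trying out Elasticsearch1"}},
]


def reindex_into_single_type(docs: List[dict],
                             make_id: Callable[[dict], str]) -> Dict[str, dict]:
    """Model one destination type as a dict keyed by _id: last write wins."""
    dest: Dict[str, dict] = {}
    for doc in docs:
        dest[make_id(doc)] = doc["_source"]
    return dest


# Keeping the original ids: every document lands on key "1".
same_ids = reindex_into_single_type(source_docs, lambda d: d["_id"])
# Prefixing the id with the source type keeps the keys distinct.
unique_ids = reindex_into_single_type(
    source_docs, lambda d: f'{d["_type"]}-{d["_id"]}')

print(len(same_ids), len(unique_ids))

# Assumption, not taken from the thread: a reindex script that rewrites the id.
reindex_body = {
    "source": {"index": "twitter"},
    "dest": {"index": "reitwitter", "type": "reitweet"},
    "script": {"lang": "painless",
               "source": "ctx._id = ctx._type + '-' + ctx._id"},
}
```

In the simulation the surviving document under the shared id is the last one written, which matches the single "trying out Elasticsearch1" hit reported in the question.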

ElasticSearch query with conditions on multiple documents

I have data in this format in Elasticsearch, each item in a separate document:
{ 'pid': 1, 'nm' : 'tom'}, { 'pid': 1, 'nm' : 'dick'},{ 'pid': 1, 'nm' : 'harry'}, { 'pid': 2, 'nm' : 'tom'}, { 'pid': 2, 'nm' : 'harry'}, { 'pid': 3, 'nm' : 'dick'}, { 'pid': 3, 'nm' : 'harry'}, { 'pid': 4, 'nm' : 'harry'}
{
"took": 137,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": null,
"hits": [
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KS86AaDUbQTYUmwY",
"_score": null,
"_source": {
"pid": 1,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KJ9BAaDUbQTYUmwW",
"_score": null,
"_source": {
"pid": 1,
"nm": "Tom"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KRlbAaDUbQTYUmwX",
"_score": null,
"_source": {
"pid": 1,
"nm": "Dick"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KYnKAaDUbQTYUmwa",
"_score": null,
"_source": {
"pid": 2,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KXL5AaDUbQTYUmwZ",
"_score": null,
"_source": {
"pid": 2,
"nm": "Tom"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KbcpAaDUbQTYUmwb",
"_score": null,
"_source": {
"pid": 3,
"nm": "Dick"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9Kdy5AaDUbQTYUmwc",
"_score": null,
"_source": {
"pid": 3,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KetLAaDUbQTYUmwd",
"_score": null,
"_source": {
"pid": 4,
"nm": "Harry"
}
}
]
}
}
I need to find the pids which have 'harry' and do not have 'tom', which in the above example are 3 and 4. In other words: look for the groups of documents sharing a pid where none of them has nm with value 'tom' but at least one of them has nm with value 'harry'.
How do I query that?
EDIT: Using Elasticsearch version 5
What about a POST request body like the one below, using a bool query?
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "nm" : "harry" }
},
"must_not" : {
"term" : { "nm" : "tom" }
}
}
}
}
I am relatively new to Elasticsearch, so I might be wrong, but I have never seen such a query. Simple filters cannot be used here because they are applied to individual documents (not to aggregations), which is not what you want. What you want is a "group by" query with a "having" clause (in SQL terms). But group-by queries involve some aggregation (like avg, max or min of a field) which is then used in the "having" clause; basically, you use a reducer to post-process the aggregation results. For queries like this, the Bucket Selector Aggregation can be used. Read this
But your case is different. You do not want to apply the "having" clause to a metric aggregation; you want to check whether some value is present in a field (or column) of your "group by" data. In SQL terms, you want to do a "where" query inside a "group by". This is what I have never seen. You can also read this
However, at the application level you can easily do this by splitting the query in two. First find the unique pids where nm = harry using a terms aggregation. Then fetch the docs for those pids with the additional condition nm != tom.
P.S. I am very new to ES, and I will be very happy if anyone contradicts me and shows a way to do this in one query. I will learn from that too.
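The application-level approach can be sketched on the sample data from the question. This runs on in-memory dicts rather than against Elasticsearch. The helper pids_matching is a hypothetical stand-in for a terms aggregation on pid filtered by nm, and the second step is written here as a set difference between the 'harry' pids and the 'tom' pids.

```python
from typing import List, Set

# The eight documents from the question.
docs: List[dict] = [
    {"pid": 1, "nm": "tom"}, {"pid": 1, "nm": "dick"}, {"pid": 1, "nm": "harry"},
    {"pid": 2, "nm": "tom"}, {"pid": 2, "nm": "harry"},
    {"pid": 3, "nm": "dick"}, {"pid": 3, "nm": "harry"},
    {"pid": 4, "nm": "harry"},
]


def pids_matching(documents: List[dict], name: str) -> Set[int]:
    """Stand-in for a terms aggregation on pid, filtered by nm == name."""
    return {doc["pid"] for doc in documents if doc["nm"].lower() == name}


# Step 1: pids that have at least one 'harry' document.
harry_pids = pids_matching(docs, "harry")
# Step 2: pids that have at least one 'tom' document, to be excluded.
tom_pids = pids_matching(docs, "tom")

result = sorted(harry_pids - tom_pids)
print(result)  # [3, 4]
```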
