ElasticSearch query/search/match - elasticsearch

I have inserted 3 records in my ElasticSearch index as follows:
curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1' -d '
{ "cityNames" : [ { "language" : "ENG",
"name" : "w bridgewater",
"raw_name" : "W BRIDGEWATER"
},
{ "language" : "ENG",
"name" : "west bridgewater",
"raw_name" : "West Bridgewater"
}
],
"id" : 1,
"streetNames" : [ { "language" : "ENG",
"name" : "cram rd",
"raw_name" : "Cram Rd"
} ]
}'
curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1' -d '
{ "cityNames" : [ { "language" : "ENG",
"name" : "bridgewater corners",
"raw_name" : "BRIDGEWATER CORNERS"
},
{ "language" : "ENG",
"name" : "bridgewater center",
"raw_name" : "Bridgewater Center"
}
],
"id" : 2,
"streetNames" : [ { "language" : "ENG",
"name" : "valley view rd",
"raw_name" : "Valley View Rd"
} ]
}'
curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1' -d '
{ "cityNames" : [ { "language" : "ENG",
"name" : "bridgewater",
"raw_name" : "Bridgewater"
},
{ "language" : "ENG",
"name" : "windsor",
"raw_name" : "Windsor"
}
],
"id" : 3,
"streetNames" : [ { "language" : "ENG",
"name" : "valley view rd",
"raw_name" : "Valley View Rd"
} ]
}'
And I perform a search as follows:
curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1' -d '
{
"query" : {
"match" : { "cityNames.name" : "bridgewater" }
}
}'
I thought ElasticSearch would return the third record (id == 3) as the best match (record 3 is the only exact match to "bridgewater"), but instead it returns the record for id 1 (w bridgewater) as the best match. What am I doing wrong?

I imagine this is happening because you are using inner objects which basically collapse the objects under it, into one for search purposes. So when you're querying the search field for Object 1, for example, you're querying against ["w bridgewater", "west bridgewater"] and not discrete fields as you may imagine.
Since 'bridgewater' appears twice in object 1 and 2 (two name fields) vs once in object 3, those items rank higher in the search. Object 1 is ultimately picked, because the fields that 'bridgewater' appears in are shorter strings than in Object 2 ("w bridgewater" vs "bridgewater corners").
Instead of using inner objects like you're doing, use nested objects instead http://www.elasticsearch.org/guide/reference/mapping/nested-type/. setting the score mode to "max" will then make things match in a more intuitive manner for you.

Related

Mongoose + GraphQL (Apollo Server) Schema

We have db collection which is little complicated. Many of our keys are JSON objects where fields aren't fixed and change based on input given by user on UI. How should we write mongoose and GraphQL Schema for such complex type ?
{
"_id" : ObjectId("5ababb359b3f180012762684"),
"item_type" : "Sample",
"title" : "This is sample title",
"sub_title" : "Sample sub title",
"byline" : "5c6ed39d6ed6def938b71562",
"lede" : "Sample description",
"promoted" : "",
"slug" : [
"myurl"
],
"categories" : [
"Technology"
],
"components" : [
{
"type" : "Slide",
"props" : {
"description" : {
"type" : "",
"props" : {
"value" : "Sample value"
}
},
"subHeader" : {
"type" : "",
"props" : {
"value" : ""
}
},
"ButtonWorld" : {
"type" : "a-button",
"props" : {
"buttonType" : "product",
"urlType" : "Internal Link",
"isServices" : false,
"title" : "Hello World",
"authors" : [
{
"__dataID__" : "Qm9va0F1dGhvcjo1YWJhYjI0YjllNDIxNDAwMTAxMGNkZmY=",
"_id" : null,
"First_Name" : "John",
"Last_Name" : "Doe",
"Display_Name" : "John Doe",
"Slug" : "john-doe",
"Role" : 1
}
],
"isbns" : [
"9781497603424"
],
"image" : "978-cover.jpg",
"price" : "8.99",
"bisacs" : [],
"customCategories" : [],
},
"salePrice" : {
"type" : "",
"props" : {
"value" : ""
}
}
}
},
"tags" : [
{
"id" : "5abab58042e2c90011875801",
"name" : "Tag Test 1"
},
{
"id" : "5abab5831242260011c248f9",
"name" : "Tag Test 2"
},
{
"id" : "592450e0b1be5055278eb5c6",
"name" : "horror",
},
{
"id" : "59244a96b1be5055278e9b0e",
"name" : "Special Report",
"_id" : "59244a96b1be5055278e9b0e"
}
],
"created_at" : ISODate("2018-03-27T21:44:21.412Z"),
"created_by" : ObjectId("591345bda429e90011c1797e")
}
I believe Mongoose have Mixed type but how do i represent such complex type in Apollo GraphQL Server and Mongoose Schema. Also, currently my resolver is just models.product.find(). So if i have such complex type, need to understand what update needs to make to my resolver.
It will be great if i get complete solution for GraphQL Apollo schema, mongoose schema and resolver for my data.
Finally found solution for problem.
You can declare new type and reference it in typeDef for GraphQL Schema.
In mongoose model, you can reference it as {type: Array}

Number of records processed in logstash

We're using logstash to sync Elastic search and we've around 3 million documents. It takes 3 to 4 hours to sync. Currently all we get is, it is started and stopped. Is there any way to see how many records processed in logstash ?
If you're using Logstash 5 and higher, the Logstash Monitoring API can help you. You can see and monitor what's happening inside Logstash as it processes events. If you hit the Pipeline stats API you'll get the total number of processed events per stage and plugin (input/filter/output):
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
You'll get this type of response in which you can clearly see at any time how many events have been processed:
{
"pipelines" : {
"test" : {
"events" : {
"duration_in_millis" : 365495,
"in" : 216485,
"filtered" : 216485,
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"plugins" : {
"inputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
"events" : {
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"name" : "beats"
} ],
"filters" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
"events" : {
"duration_in_millis" : 55969,
"in" : 216485,
"out" : 216485
},
"failures" : 216485,
"patterns_per_field" : {
"message" : 1
},
"name" : "grok"
}, {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
"events" : {
"duration_in_millis" : 3326,
"in" : 216485,
"out" : 216485
},
"name" : "geoip"
} ],
"outputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
"events" : {
"duration_in_millis" : 278557,
"in" : 216485,
"out" : 216485
},
"name" : "elasticsearch"
} ]
},
"reloads" : {
"last_error" : null,
"successes" : 0,
"last_success_timestamp" : null,
"last_failure_timestamp" : null,
"failures" : 0
},
"queue" : {
"type" : "memory"
}
}
}

Partial update overwriting whole structure

I'm indexing a new document with the following content
{
"lastUpdate" : "20180114144020452",
"name" : "My Process",
"startDate" : "20180114162356585",
"endData" : "",
"tasks" : [
{
"1" : {
"lastUpdate" : "20180114144020452",
"taskId" : "123",
"subject" : "Terceira Atividade",
"status" : "Active",
"type" : "userTask",
"assign" : [
{
"date" : "20180114144020452",
"type" : "role",
"name" : "Time 3",
"id" : "Team3_345"
}
],
"receivedDate" : "",
"readDate" : "",
"finishDate" : ""
}
}
]
}
And then I'm trying to change task.1.status value with the following update content
{
"doc" : {
"tasks" : [
{
"1" : {
"status" : "Closed"
}
}
]
}
}
But it's overwriting the whole task.1 structure, deleting other values and letting only status value to closed instead of keep other values and change only status value.
How can I solve this? Thanks
You need to do it via a scripted partial updated like this
POST updates/update/1/_update
{
"script": {
"source": "ctx._source.tasks[0].1.status = 'Closed'"
}
}

Update object in array with new fields mongodb

ai have some mongodb document
horses is array with id, name, type
{
"_id" : 33333333333,
"horses" : [
{
"id" : 72029,
"name" : "Awol",
"type" : "flat",
},
{
"id" : 822881,
"name" : "Give Us A Reason",
"type" : "flat",
},
{
"id" : 826474,
"name" : "Arabian Revolution",
"type" : "flat",
}
}
I need to add new fields
I thought something like that, but I did not go to his head
horse = {
"place" : 1,
"body" : 11
}
Card.where({'_id' => 33333333333}).find_and_modify({'$set' => {'horses.' + index.to_s => horse}}, upsert:true)
But all existing fields are removed and inserted new how to do that would be new fields added to existing
Indeed, this command will overwrite the subdocument
'$set': {
'horses.0': {
"place" : 1,
"body" : 11
}
}
You need to set individual fields:
'$set': {
'horses.0.place': 1,
'horses.0.body': 11
}

ElasticSerach - Statistical facets on length of the list

I have the following sample mappipng:
{
"book" : {
"properties" : {
"author" : { "type" : "string" },
"title" : { "type" : "string" },
"reviews" : {
"properties" : {
"url" : { "type" : "string" },
"score" : { "type" : "integer" }
}
},
"chapters" : {
"include_in_root" : 1,
"type" : "nested",
"properties" : {
"name" : { "type" : "string" }
}
}
}
}
}
I would like to get a facet on number of reviews - i.e. length of the "reviews" array.
For instance, verbally spoken results I need are: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."
I'm trying the following statistical facet:
{
"query" : {
"match_all" : {}
},
"facets" : {
"stat1" : {
"statistical" : {"script" : "doc['reviews.score'].values.size()"}
}
}
}
but it keeps failing with:
{
"error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
^
[Line: 1, Column: 5]]; }]",
"status" : 500
}
How can I achieve my goal?
ElasticSearch version is 0.19.9.
Here is my sample data:
{
"author" : "Mark Twain",
"title" : "The Adventures of Tom Sawyer",
"reviews" : [
{
"url" : "amazon.com",
"score" : 10
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
{
"author" : "Jack London",
"title" : "The Call of the Wild",
"reviews" : [
{
"url" : "amazon.com",
"score" : 8
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
},
{
"url" : "www.books.com",
"score" : 5
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
It looks like you are using curl to execute your query and this curl statement looks like this:
curl localhost:9200/my-index/book -d '{....}'
The problem here is that because you are using apostrophes to wrap the body of the request, you need to escape all apostrophes that it contains. So, you script should become:
{"script" : "doc['\''reviews.score'\''].values.size()"}
or
{"script" : "doc[\"reviews.score"].values.size()"}
The second issue is that from your description it looks like your are looking for a histogram facet or a range facet but not for a statistical facet. So, I would suggest trying something like this:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"key_script" : "doc[\"reviews.score\"].values.size()",
"value_script" : "doc[\"reviews.score\"].values.size()",
"interval" : 1
}
}
}
}'
The third problem is that the script in the facet will be called for every single record in the result list and if you have a lot of results it might take really long time. So, I would suggest indexing an additional field called number_of_reviews that should be populated with the number of reviews by your client. Then your query would simply become:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"field" : "number_of_reviews"
"interval" : 1
}
}
}
}'

Resources