It seems I have an issue with sorting the results. Below is my config:
curl -X PUT localhost:9200/companies -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_edge_ngram_analyzer" : {
"tokenizer" : "my_edge_ngram_tokenizer"
}
},
"tokenizer" : {
"my_edge_ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "5",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
"company" : {
"properties" : {
"name" : { "type" : "string" },
"count" : {"type" : "long" },
"name_suggest" : {
"type" : "completion",
"index_analyzer": "my_edge_ngram_analyzer",
"search_analyzer": "my_edge_ngram_analyzer"
}
}
}
}
}'
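To see why a short prefix like "18" can match at all, here is a rough Python approximation of the tokens my_edge_ngram_analyzer emits. It is a simulation, not Elasticsearch's code. The letter/digit test from token_chars is approximated with Python's Unicode word characters minus underscore. There is no lowercase filter in this analyzer, so case is preserved.

```python
import re


def edge_ngrams(text, min_gram=1, max_gram=5):
    """Approximate the edgeNGram tokenizer configured above.

    Split on anything that is not a letter or digit (token_chars), then
    emit the prefixes of each word from min_gram up to max_gram characters.
    """
    tokens = []
    # [^\W_]+ matches runs of Unicode letters/digits (word chars minus "_")
    for word in re.findall(r"[^\W_]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngrams("1800 Flowers"))
# -> ['1', '18', '180', '1800', 'F', 'Fl', 'Flo', 'Flow', 'Flowe']
```

Inputs such as "1-800-FLOWERS.COM, INC" are split at the dashes, so in this model they never produce the gram "18".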
Here are example documents to index into ES:
curl -X PUT localhost:9200/companies/company/1 -d '
{
"name" : "1800flowers",
"count": 1000,
"name_suggest" : {
"input" : [
"1800 flowers.Com, Inc",
"1800 Flowers","1800-Flowers.com",
"1 800 Flowers",
"www.1800flowers.com",
"1800Flowers.com",
"Inc,1-800-FLOWERS.COM",
"1-800-FLOWERS.COM, INC",
"1800Flowers",
"1800Flowers Inc",
"1800Flowers.com (Consultant)",
"1-800-FLOWERS","1800Flowers.com",
"1800FLOWERS INTERNATIONAL",
"1-800 Flowers",
"1-800 FLOWERS.COM, INC",
"1-800-FLOWERS, Inc",
"1800 flowers.com",
"1-800Flowers.com",
"1-800 flowers.com",
"1800 Flowers Inc"
],
"output" : "1800 Flowers"
}
}'
curl -X PUT localhost:9200/companies/company/2 -d '
{
"name" : "1800 Ruby",
"count": 10000,
"name_suggest" : {
"input" : [
"1800"
],
"output" : "1800 Ruby"
}
}'
Clearly, if I do a text search on 1800, I should get both of these objects back, with the outputs "1800 Flowers" and "1800 Ruby".
What I would really like is for these results to be sorted by count, descending, so that I get "1800 Ruby", "1800 Flowers", but this isn't working:
curl -X POST localhost:9200/companies/_suggest -d '
{
"query":{
"companies" : {
"text" : "18",
"completion" : {
"field" : "name_suggest"
}
}
},
"sort": [ {"count": {"order": "desc"} } ]
}'
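The completion suggester does not sort on document fields such as "count". It ranks suggestions by the weight stored with each suggestion, so you have to add a weight to the "name_suggest" field of the docs. Also, you cannot use the "query" (or "sort") syntax in a _suggest request.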
So, using the same settings and mapping you have above, I can update the docs with suggester weights equal to the "count", as follows:
curl -XPUT "http://localhost:9200/companies/company/1" -d'
{
"name" : "1800flowers",
"count": 1000,
"name_suggest" : {
"input" : [
"1800 flowers.Com, Inc",
"1800 Flowers","1800-Flowers.com",
"1 800 Flowers",
"www.1800flowers.com",
"1800Flowers.com",
"Inc,1-800-FLOWERS.COM",
"1-800-FLOWERS.COM, INC",
"1800Flowers",
"1800Flowers Inc",
"1800Flowers.com (Consultant)",
"1-800-FLOWERS","1800Flowers.com",
"1800FLOWERS INTERNATIONAL",
"1-800 Flowers",
"1-800 FLOWERS.COM, INC",
"1-800-FLOWERS, Inc",
"1800 flowers.com",
"1-800Flowers.com",
"1-800 flowers.com",
"1800 Flowers Inc"
],
"output" : "1800 Flowers",
"weight" : 1000
}
}'
curl -XPUT "http://localhost:9200/companies/company/2" -d'
{
"name" : "1800 Ruby",
"count": 10000,
"name_suggest" : {
"input" : [
"1800"
],
"output" : "1800 Ruby",
"weight" : 10000
}
}'
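If you have many documents, copying "count" into the suggester weight by hand gets tedious. Below is a minimal Python sketch of that transformation. with_suggest_weight is a hypothetical helper: it only reshapes the JSON body, and sending the body to Elasticsearch (for example with curl -XPUT as above) is left to you. It assumes weights are integers, which is what the completion suggester expects.

```python
import copy
import json


def with_suggest_weight(doc):
    """Return a copy of a company doc whose name_suggest.weight equals its count."""
    updated = copy.deepcopy(doc)  # leave the caller's dict untouched
    # Completion suggester weights are integers, so coerce the count.
    updated["name_suggest"]["weight"] = int(doc["count"])
    return updated


ruby = {
    "name": "1800 Ruby",
    "count": 10000,
    "name_suggest": {"input": ["1800"], "output": "1800 Ruby"},
}
print(json.dumps(with_suggest_weight(ruby), indent=2))
```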
And then a normal suggest request:
curl -XPOST "http://localhost:9200/companies/_suggest" -d'
{
"companies": {
"text": "18",
"completion": {
"field": "name_suggest"
}
}
}'
yields the expected results:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"companies": [
{
"text": "18",
"offset": 0,
"length": 2,
"options": [
{
"text": "1800 Ruby",
"score": 10000
},
{
"text": "1800 Flowers",
"score": 1000
}
]
}
]
}
Here is a full runnable example:
http://sense.qbox.io/gist/21cac07480e4077e083b037c5ac00016b04503f2
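The ordering in that response can be reproduced with a toy Python model: keep the entries that have an input starting with the typed prefix, then order their outputs by weight, highest first. This is only an illustration of "score equals weight". It is not how the suggester's FST works, and the case-insensitive prefix test is a simplification.

```python
def suggest(entries, prefix):
    """Toy completion-suggester ranking: prefix match, then sort by weight desc."""
    hits = [
        e for e in entries
        if any(inp.lower().startswith(prefix.lower()) for inp in e["input"])
    ]
    hits.sort(key=lambda e: e["weight"], reverse=True)
    return [(e["output"], e["weight"]) for e in hits]


entries = [
    {"input": ["1800 Flowers", "1-800-FLOWERS"], "output": "1800 Flowers", "weight": 1000},
    {"input": ["1800"], "output": "1800 Ruby", "weight": 10000},
]
print(suggest(entries, "18"))
# -> [('1800 Ruby', 10000), ('1800 Flowers', 1000)]
```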
I have created an index following the example in the tutorial here:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/geo-point.html
Specifically, I ran the following:
curl -XPUT 'localhost:9200/my_index?pretty' -d '
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}'
I have also indexed two points:
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -d'
{
"text": "first geo-point",
"location": {
"lat": 41.12,
"lon": -71.34
}
}'
curl -XPUT 'localhost:9200/my_index/my_type/2?pretty' -d'
{
"text": "second geo-point",
"location": {
"lat": 41.13,
"lon": -71.35
}
}'
The example geo bounding box query on that page works:
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 42,
"lon": -72
},
"bottom_right": {
"lat": 40,
"lon": -74
}
}
}
}
}'
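In this example, top_left.lon (-72) lies east of bottom_right.lon (-74). As I understand it, Elasticsearch then treats the box as crossing the 180° meridian. That would explain why the sample points at lon -71.34 and -71.35 still match. Below is a rough Python sketch of that containment rule. It is an approximation under that assumption, not Elasticsearch's implementation.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Rough check of whether a point lies inside a geo bounding box.

    Assumption: a box whose left longitude is greater than its right
    longitude wraps across the 180th meridian.
    """
    if not (bottom_right["lat"] <= lat <= top_left["lat"]):
        return False
    left, right = top_left["lon"], bottom_right["lon"]
    if left <= right:
        return left <= lon <= right
    # Box crosses the antimeridian: east of left OR west of right.
    return lon >= left or lon <= right


top_left = {"lat": 42, "lon": -72}
bottom_right = {"lat": 40, "lon": -74}
points = {"first geo-point": (41.12, -71.34), "second geo-point": (41.13, -71.35)}
for name, (lat, lon) in points.items():
    print(name, in_bounding_box(lat, lon, top_left, bottom_right))
```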
But the example from this page (https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-geo-bounding-box-query.html) doesn't work.
What I have tried looks like the following:
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"my_type.location" : {
"top_left" : {
"lat" : 42,
"lon" : -72
},
"bottom_right" : {
"lat" : 40,
"lon" : -74
}
}
}
}
}
}'
The error I get is as follows:
"error" : {
"root_cause" : [ {
"type" : "search_parse_exception",
"reason" : "failed to parse search source. unknown search element [bool]",
"line" : 3,
"col" : 5
} ],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [ {
"shard" : 0,
"index" : "my_index",
"node" : "0qfvkynhTRyjHFRurBLJeQ",
"reason" : {
"type" : "search_parse_exception",
"reason" : "failed to parse search source. unknown search element [bool]",
"line" : 3,
"col" : 5
}
} ]
},
"status" : 400
}
I hope it's just a simple mistake. What am I doing wrong?
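You need to wrap the whole thing in a "query" element. Also, refer to the field by its mapped name, "location": your mapping has no "my_type.location" path, and as far as I know Elasticsearch 2.0 no longer accepts the type name as a field-name prefix.
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"location" : {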
"top_left" : {
"lat" : 42,
"lon" : -72
},
"bottom_right" : {
"lat" : 40,
"lon" : -74
}
}
}
}
}
}
}'
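The parse error comes from the top level of the request body. A search body is a set of named sections, and "bool" is a query clause, not one of those sections. The Python sketch below mimics that check. Its list of section names is hand-picked and deliberately partial, so treat it as an assumption rather than the full list Elasticsearch accepts.

```python
# Partial, hand-picked list of top-level search body sections (not exhaustive).
KNOWN_TOP_LEVEL = {
    "query", "post_filter", "from", "size", "sort", "_source", "fields",
    "aggs", "aggregations", "highlight", "explain", "version", "min_score",
    "timeout", "suggest", "script_fields", "fielddata_fields", "rescore",
}


def unknown_top_level(body):
    """Return top-level keys that are not in the (partial) known list."""
    return sorted(k for k in body if k not in KNOWN_TOP_LEVEL)


geo_filter = {"geo_bounding_box": {"location": {
    "top_left": {"lat": 42, "lon": -72},
    "bottom_right": {"lat": 40, "lon": -74},
}}}
bool_clause = {"bool": {"must": {"match_all": {}}, "filter": geo_filter}}

print(unknown_top_level(bool_clause))             # the failing body: ['bool']
print(unknown_top_level({"query": bool_clause}))  # wrapped in "query": []
```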
However, as far as I understand, pairing a match_all query with a geo filter is the old way of doing things. In previous versions, geo queries were treated as "filters", so you typically ran a match_all query to return all the documents and then filtered them with the geo bounding box. In Elasticsearch 2.0+, there is no separation between filters and queries; everything is a query. So you can run the geo query directly:
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 42,
"lon": -72
},
"bottom_right": {
"lat": 40,
"lon": -74
}
}
}
}
}'
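If you generate these requests from code, a small builder keeps the "query" wrapper from being forgotten. Below is a minimal Python sketch. geo_bounding_box_query is a hypothetical helper name, and it only produces the JSON body used in the curl call above.

```python
import json


def geo_bounding_box_query(field, top_left, bottom_right):
    """Build a search body holding a single geo_bounding_box query.

    top_left and bottom_right are (lat, lon) tuples.
    """
    return {
        "query": {
            "geo_bounding_box": {
                field: {
                    "top_left": {"lat": top_left[0], "lon": top_left[1]},
                    "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                }
            }
        }
    }


body = geo_bounding_box_query("location", (42, -72), (40, -74))
print(json.dumps(body, indent=2))  # use this as the -d payload of the request
```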
I have the following parent/child mapping:
curl -XPUT 'localhost:9200/my_index' -d '{
"mappings": {
"my_parent": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"source_id" : { "type": "integer" }
}
},
"my_child": {
"_parent": {"type": "my_parent" },
"properties" : {
"user_id" : { "type": "string" }
}}}}'
... these two parents with ids 10 and 11:
curl -X PUT 'localhost:9200/my_index/my_parent/10' -d '{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27"
}'
curl -X PUT 'localhost:9200/my_index/my_parent/11' -d '{
"title" : "Microsiervos - En el 69 llegamos a la luna",
"body" : "Se cumplen 3123 anos de la llegada a la luna",
"source_id" : "27"
}'
... and these two children:
curl -XPUT 'localhost:9200/my_index/my_child/1234_10?parent=10' -d '{
"user_id": "1234"
}'
curl -XPUT 'localhost:9200/my_index/my_child/1234_11?parent=11' -d '{
"user_id": "1234"
}'
With the following query, I want to get the _id of the parents that have a child with user_id = 1234.
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"_source" : "_id",
"query": {
"has_child": {
"type": "my_child",
"query" : {
"query_string" : {
"default_field" : "user_id",
"query" : "1234"
}}}}}'
This outputs the two ids, 10 and 11.
Now I want to search only the parents with those specific ids, something like this:
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"terms": {
"_id": ["10", "11"]
}},
{
"query_string": {
"default_field": "body",
"query": "mercado"
}}]}}}'
As you can see, the "_id": ["10", "11"] part is written by hand. I would like to know if there's a way to combine these two queries into one, so that the ids returned by the first query are automatically used in the second.
So the output should look like this:
...
"hits" : {
"total" : 1,
"max_score" : 0.69177496,
"hits" : [ {
"_index" : "my_index",
"_type" : "my_parent",
"_id" : "10",
"_score" : 0.69177496,
"_source":{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27"
}}]}}
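Use a bool query and put both conditions in its must clause. The has_child clause restricts the results to parents that have a matching child, so you no longer need to pass the ids yourself: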
curl -XGET "http://localhost:9200/my_index/my_parent/_search" -d'
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "body",
"query": "mercado"
}
},
{
"has_child": {
"type": "my_child",
"query": {
"query_string": {
"default_field": "user_id",
"query": "1234"
}
}
}
}
]
}
}
}'
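Conceptually, the bool query intersects the two conditions inside Elasticsearch, so the id list never has to round-trip through the client. Below is a toy in-memory Python model of that logic. It is not how Elasticsearch executes the query, and whitespace splitting plus lowercasing stands in for the standard analyzer.

```python
# Toy data mirroring the documents indexed in the question.
parents = {
    "10": {"body": "Empiezan a sacar DD de 30GB en el mercado"},
    "11": {"body": "Se cumplen 3123 anos de la llegada a la luna"},
}
children = [
    {"_id": "1234_10", "_parent": "10", "user_id": "1234"},
    {"_id": "1234_11", "_parent": "11", "user_id": "1234"},
]


def has_child(parent_id, user_id):
    """True if some child of parent_id has the given user_id."""
    return any(c["_parent"] == parent_id and c["user_id"] == user_id for c in children)


def search(word, user_id):
    """Both clauses in "must": body contains word AND a matching child exists."""
    return [
        pid for pid, doc in parents.items()
        if word in doc["body"].lower().split() and has_child(pid, user_id)
    ]


print(search("mercado", "1234"))  # -> ['10']
```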
I am using Elasticsearch as my search engine, and I am trying to create a custom analyzer that just lowercases the field value. Here is my code:
Create the index and mapping
Create an index with a custom analyzer named test_lowercase:
curl -XPUT 'localhost:9200/test/' -d '{
"settings": {
"analysis": {
"analyzer": {
"test_lowercase": {
"type": "pattern",
"pattern": "^.*$"
}
}
}
}
}'
Create a mapping that uses the test_lowercase analyzer for the address field:
curl -XPUT 'localhost:9200/test/_mapping/Users' -d '{
"Users": {
"properties": {
"name": {
"type": "string"
},
"address": {
"type": "string",
"analyzer": "test_lowercase"
}
}
}
}'
To verify that the test_lowercase analyzer works:
curl -XGET 'localhost:9200/test/_analyze?analyzer=test_lowercase&pretty' -d '
Beijing China
'
{
"tokens" : [ {
"token" : "\nbeijing china\n",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
} ]
}
As we can see, the string 'Beijing China' is indexed as a single lowercased term, 'beijing china', so the test_lowercase analyzer works fine.
To verify that the field 'address' is using the custom analyzer:
curl -XGET 'http://localhost:9200/test/_analyze?field=address&pretty' -d '
Beijing China
'
{
"tokens" : [ {
"token" : "\nbeijing china\n",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
} ]
}
curl -XGET 'http://localhost:9200/test/_analyze?field=name&pretty' -d '
Beijing China
'
{
"tokens" : [ {
"token" : "beijing",
"start_offset" : 1,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "china",
"start_offset" : 9,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 1
} ]
}
As we can see, for the same string 'Beijing China', analyzing with field=address produces a single term, 'beijing china', while field=name produces two terms, 'beijing' and 'china'. So the address field does seem to use my custom analyzer 'test_lowercase'.
Insert a document into the test index to see whether the analyzer works for documents:
curl -XPUT 'localhost:9200/test/Users/12345?pretty' -d '{"name": "Jinshui Tang", "address": "Beijing China"}'
The document was inserted successfully, but the address field was not analyzed correctly: I can't find the document with the following wildcard query:
curl -XGET 'http://localhost:9200/test/Users/_search?pretty' -d '
{
"query": {
"wildcard": {
"address": "*beijing ch*"
}
}
}'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
List all terms analyzed for the document:
So I ran the following command to see all the terms of the document, and found that 'Beijing China' is not in the term vector at all.
curl -XGET 'http://localhost:9200/test/Users/12345/_termvector?fields=*&pretty'
{
"_index" : "test",
"_type" : "Users",
"_id" : "12345",
"_version" : 3,
"found" : true,
"took" : 2,
"term_vectors" : {
"name" : {
"field_statistics" : {
"sum_doc_freq" : 2,
"doc_count" : 1,
"sum_ttf" : 2
},
"terms" : {
"jinshui" : {
"term_freq" : 1,
"tokens" : [ {
"position" : 0,
"start_offset" : 0,
"end_offset" : 7
} ]
},
"tang" : {
"term_freq" : 1,
"tokens" : [ {
"position" : 1,
"start_offset" : 8,
"end_offset" : 12
} ]
}
}
}
}
}
We can see that the name is correctly analyzed and it became two terms 'jinshui' and 'tang', but the address is lost.
Can anyone please help? Is there anything missing?
Thanks a lot!
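The pattern analyzer uses its regex as a token separator, not as a token matcher. For a value like "Beijing China", ^.*$ most likely matches the whole string, so the entire value is treated as a separator and no tokens are produced. That is why address is missing from the term vector. (Your _analyze test probably produced a token only because the leading and trailing newlines in the request body stop ^.*$ from matching.) To lowercase the text you don't need a pattern at all; use the keyword tokenizer with the lowercase filter instead: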
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"test_lowercase": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
}
}
PUT /test/_mapping/Users
{
"Users": {
"properties": {
"name": {
"type": "string"
},
"address": {
"type": "string",
"analyzer": "test_lowercase"
}
}
}
}
PUT /test/Users/12345
{"name": "Jinshui Tang", "address": "Beijing China"}
And to verify you did the right thing, use this:
GET /test/Users/_search
{
"fielddata_fields": ["name", "address"]
}
And you will see exactly how Elasticsearch is indexing your data:
"fields": {
"name": [
"jinshui",
"tang"
],
"address": [
"beijing china"
]
}
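The difference between the two analyzers can be sketched in Python. This is my rough model of the Lucene behavior, not its code. The pattern analyzer splits on its regex, drops empty pieces, and lowercases by default. The keyword tokenizer plus lowercase filter keeps the whole value as one lowercased token. Python's re treats ^, $ and . here the same way Java does without flags.

```python
import fnmatch
import re


def pattern_analyzer(text, pattern):
    """Approximate pattern analyzer: regex is a separator, empties dropped, lowercased."""
    return [piece.lower() for piece in re.split(pattern, text) if piece]


def keyword_lowercase(text):
    """keyword tokenizer + lowercase filter: the whole value is one token."""
    return [text.lower()] if text else []


print(pattern_analyzer("Beijing China", r"^.*$"))      # [] -> nothing indexed
print(pattern_analyzer("\nBeijing China\n", r"^.*$"))  # ['\nbeijing china\n']
print(keyword_lowercase("Beijing China"))              # ['beijing china']

# The wildcard from the question, checked against the single indexed term
print(fnmatch.fnmatchcase("beijing china", "*beijing ch*"))  # True
```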
I'm currently parsing text from internal résumés in my company. The goal is to index everything in elasticsearch to perform search on them.
For the moment, I have the following JSON document, with no mapping defined.
Each coworker has a list of projects, each with a client name:
{
"name": "Jean Wisser",
"position": "Junior Developer",
"projects": [
{
"client": "SutrixMedia",
"missions": [
"Responsible for the quality on time and within budget",
"Writing specs, testing,..."
],
"technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
},
{
"client": "Société Générale",
"missions": [
" Writing test cases and scenarios",
" UAT"
],
"technologies": "HP QTP/QC"
}
]
}
The two main questions we would like to answer are:
Which coworkers have already worked for this company?
Which clients use this technology?
The first question is really easy to answer, for example:
Projects.client="SutrixMedia" returns the right résumé.
But how can I answer the second one?
I would like to make a query like this: Projects.technologies="HP QTP/QC", where the answer would be only the client name ("Société Générale" in this case) and NOT the entire document.
Is it possible to get this answer by defining a mapping with nested type ?
Or should I go for a parent/child mapping ?
Yes, that's possible with ES 1.5.* if you map projects as a nested type and then retrieve nested inner_hits.
Here is the mapping for your sample document above, with "projects" declared as a nested type:
curl -XPUT localhost:9200/resumes -d '
{
"mappings": {
"resume": {
"properties": {
"name": {
"type": "string"
},
"position": {
"type": "string"
},
"projects": {
"type": "nested",
"properties": {
"client": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"missions": {
"type": "string"
},
"technologies": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}'
Then, you can index your sample document from above:
curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'
Finally, the following query retrieves only the nested inner_hits, that is, only the nested object that matches Projects.technologies="HP QTP/QC". The _source filter inside inner_hits keeps only its "client" field:
curl -XPOST localhost:9200/resumes/resume/_search -d '
{
"_source": false,
"query": {
"nested": {
"path": "projects",
"query": {
"term": {
"projects.technologies.raw": "HP QTP/QC"
}
},
"inner_hits": {
"_source": "client"
}
}
}
}'
which yields only the client name instead of the whole matching document:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.4054651,
"hits" : [ {
"_index" : "resumes",
"_type" : "resume",
"_id" : "1",
"_score" : 1.4054651,
"inner_hits" : {
"projects" : {
"hits" : {
"total" : 1,
"max_score" : 1.4054651,
"hits" : [ {
"_index" : "resumes",
"_type" : "resume",
"_id" : "1",
"_nested" : {
"field" : "projects",
"offset" : 1
},
"_score" : 1.4054651,
"_source":{"client":"Société Générale"} <--- here is the client name
} ]
}
}
}
} ]
}
}
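Why the nested type is needed: by default Elasticsearch flattens an array of objects into one value list per field path. A match on projects.technologies then cannot tell which client it belongs to. Below is a toy Python model of the two indexing styles. The function names are hypothetical, and this illustrates the idea rather than Elasticsearch internals.

```python
resume = {
    "name": "Jean Wisser",
    "projects": [
        {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
        {"client": "Société Générale", "technologies": "HP QTP/QC"},
    ],
}


def flatten(doc):
    """Plain object array: one value list per field path, so the pairing is lost."""
    flat = {}
    for project in doc["projects"]:
        for key, value in project.items():
            flat.setdefault("projects." + key, []).append(value)
    return flat


def nested_inner_hits(doc, technology):
    """Nested type: each project is matched as its own object, so only the
    clients of the matching projects are returned (like inner_hits)."""
    return [p["client"] for p in doc["projects"] if p["technologies"] == technology]


print(flatten(resume))
print(nested_inner_hits(resume, "HP QTP/QC"))  # -> ['Société Générale']
```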
I have the following index:
curl -XPUT "http://localhost:9200/test/" -d '
{
"mappings": {
"files": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"owners": {
"type": "nested",
"properties": {
"name": {
"type":"string",
"index":"not_analyzed"
},
"mail": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
}
'
With sample documents:
curl -XPUT "http://localhost:9200/test/files/1" -d '
{
"name": "first.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js#example.com"
},
{
"name": "Joe Smith",
"mail": "joes#example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/2" -d '
{
"name": "second.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js#example.com"
},
{
"name": "Ann Smith",
"mail": "as#example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/3" -d '
{
"name": "third.jpg",
"owners": [
{
"name": "Kate Foo",
"mail": "kf#example.com"
}
]
}
'
And I need to find all owners that match some query, let's say "mit":
curl -XGET "http://localhost:9200/test/files/_search" -d '
{
"facets": {
"owners": {
"terms": {
"field": "owners.name"
},
"facet_filter": {
"query": {
"query_string": {
"query": "*mit*",
"default_field": "owners.name"
}
}
},
"nested": "owners"
}
}
}
'
This gives me the following result:
{
"facets" : {
"owners" : {
"missing" : 0,
"_type" : "terms",
"other" : 0,
"total" : 4,
"terms" : [
{
"count" : 2,
"term" : "John Smith"
},
{
"count" : 1,
"term" : "Joe Smith"
},
{
"count" : 1,
"term" : "Ann Smith"
}
]
}
},
"timed_out" : false,
"hits" : {...}
}
That works.
But what I actually need is to get the owners together with their email addresses (for each facet entry I need an additional field in the results).
Is that achievable?
I don't think that's possible. Depending on your needs, I would:
Create a composite field containing both name and email and run the facet on that field, or
Run the query in addition to the facet and extract the emails from the query result, though this obviously doesn't scale, or
Use a two-step operation: get the facet, build the needed queries, and merge the results.
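Below is a toy Python model of the first option, the composite field. The name_mail field and the "|" separator are hypothetical. In Elasticsearch you would map such a field as not_analyzed inside owners, fill it at index time, and run the nested terms facet on it instead of owners.name. The facet_filter is approximated here as a substring test on the name.

```python
from collections import Counter

SEP = "|"  # hypothetical separator; must never appear in names or mails

files = [
    {"name": "first.jpg", "owners": [
        {"name": "John Smith", "mail": "js#example.com"},
        {"name": "Joe Smith", "mail": "joes#example.com"}]},
    {"name": "second.jpg", "owners": [
        {"name": "John Smith", "mail": "js#example.com"},
        {"name": "Ann Smith", "mail": "as#example.com"}]},
    {"name": "third.jpg", "owners": [
        {"name": "Kate Foo", "mail": "kf#example.com"}]},
]


def add_composite(owner):
    """Index-time step: store name and mail together in one field."""
    return dict(owner, name_mail=owner["name"] + SEP + owner["mail"])


def facet_owners(docs, fragment):
    """Count composite values of owners whose name contains fragment,
    then split each value back into (name, mail, count)."""
    counts = Counter(
        add_composite(o)["name_mail"]
        for doc in docs for o in doc["owners"] if fragment in o["name"]
    )
    return [tuple(key.split(SEP)) + (n,) for key, n in counts.most_common()]


print(facet_owners(files, "mit"))
```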