Exact match on a field in Elasticsearch

Each of the documents I am indexing has a field called "permalink" that I would like to match exactly.
An example document:
{
"entity_type": "company",
"entity_id": 1383221763,
"company_type": "developer",
"name": "Runewaker Entertainment",
"permalink": "runewaker-entertainment"
}
The mapping for these documents is:
{
"properties": {
"entity_id": {
"type": "integer",
"include_in_all": false
},
"name": {
"type": "string",
"include_in_all": true
},
"permalink": {
"type": "string",
"include_in_all": true,
"index": "not_analyzed"
},
"company_type": {
"type": "string",
"include_in_all": false,
"index": "not_analyzed"
}
}
}
When I run the following query, I don't get any hits:
POST /companies/company/_search HTTP/1.1
Host: localhost:8082
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": { "permalink": "runewaker-entertainment" }
}
}
}
}
but I do get a match with this query:
POST /companies/company/_search HTTP/1.1
Host: localhost:8082
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": { "permalink": "runewaker" }
}
}
}
}
It appears that any permalink containing a hyphen fails to match, but I was under the impression that if a property's mapping sets index to not_analyzed, Elasticsearch doesn't analyze the field at all.
What should the correct query be?
Thank you
UPDATE:
getMapping result for the companies index:
{
"companies" : {
"company" : {
"properties" : {
"company_type" : {
"type" : "string"
},
"entity_id" : {
"type" : "long"
},
"entity_type" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"node_id" : {
"type" : "long"
},
"permalink" : {
"type" : "string"
}
}
}
}
}
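The getMapping output above shows permalink as a plain "string" with no "index": "not_analyzed", which is consistent with the field having been indexed by the default analyzer. Below is a simplified Python model of why that breaks the term filter. The analyzer here only splits on non-alphanumeric characters and lowercases; it is an approximation, not Elasticsearch's real tokenizer. All helper names are hypothetical.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on anything that is not
    a letter or digit, then lowercase. The real tokenizer is more elaborate."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def term_filter_matches(indexed_terms, query_term):
    # A term filter is not analyzed: it must equal one indexed term exactly.
    return query_term in indexed_terms


def is_not_analyzed(mapping_response, index, doc_type, field):
    """True if a getMapping response marks the field as not_analyzed."""
    props = mapping_response[index][doc_type]["properties"][field]
    return props.get("index") == "not_analyzed"


value = "runewaker-entertainment"
analyzed_terms = approx_standard_analyze(value)  # what an analyzed string field stores
raw_terms = [value]                              # what a not_analyzed field stores

# Same shape as the getMapping output above, trimmed to permalink.
current = {"companies": {"company": {"properties": {"permalink": {"type": "string"}}}}}
```

With the analyzed field the index holds "runewaker" and "entertainment" but never "runewaker-entertainment", which matches the behavior described in the question.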

What you described is correct.
I tested it and it works as expected, so there is probably a problem with your index. Maybe you indexed the document before you set the mapping?
Try it again:
1) Delete your index, or create a new one.
2) Do a putMapping with your mapping.
3) Index the document.
The search should then work as expected.
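A sketch of those steps as HTTP requests, assuming the 1.x-era endpoints used in this question (PUT /{index}, PUT /{index}/{type}/_mapping, PUT /{index}/{type}/{id}) and the host/port from the question. rebuild_plan and send are hypothetical helpers, and the mapping is trimmed to permalink.

```python
import json
import urllib.request

# Assumption: host/port taken from the question; adjust for your cluster.
BASE_URL = "http://localhost:8082"

# Trimmed to the field that matters; add the rest of your mapping here.
COMPANY_MAPPING = {
    "company": {
        "properties": {
            "permalink": {"type": "string", "index": "not_analyzed"},
        }
    }
}


def rebuild_plan(index="companies", doc_type="company", doc_id="1", doc=None):
    """Hypothetical helper: the (method, path, body) requests, in order."""
    if doc is None:
        doc = {"entity_type": "company", "permalink": "runewaker-entertainment"}
    return [
        ("DELETE", f"/{index}", None),                              # 1) delete the old index
        ("PUT", f"/{index}", None),                                 #    ...and create it again
        ("PUT", f"/{index}/{doc_type}/_mapping", COMPANY_MAPPING),  # 2) put the mapping
        ("PUT", f"/{index}/{doc_type}/{doc_id}", doc),              # 3) index the document
    ]


def send(method, path, body):
    """Hypothetical helper: send one request and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Each tuple can be passed to send in order. Deleting an index that does not exist returns a 404, which urlopen raises as HTTPError, so you may want to catch that for the first step. A freshly indexed document only shows up in search after the next refresh (one second by default).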

Related

Include joined children with Elasticsearch GET request

I have an Elasticsearch index events that has a join field so that an event can have multiple instances (i.e. the same event can occur on different dates). In this simplified mapping, an event doc has fields for title and url while an instance doc has start/end date fields:
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"url": {
"type": "keyword"
},
"dt": {
"type": "date"
},
"end_dt": {
"type": "date"
},
"event_or_instance": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"event": "instance"
}
}
}
}
}
I know how to get an event and include all of its instances using has_child:
GET /events/_search
{
"query" : {
"bool": {
"filter": [
{
"term": {
"_id": {
"value": "c8871a79-1907-46c0-958c-9731c529b93e"
}
}
},
{
"has_child" : {
"type" : "instance",
"query" : { "match_all": {} },
"inner_hits" : {
"_source": true,
"sort": [{"dt": "asc"}]
}
}
}
]
}
},
"_source": true
}
This works fine, but is there a way to do this using the Get/Multi-get API instead of the Search API?
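The sketch below does not use Get/Multi-get; it stays within the Search API and generalizes the query above to several events in one request, which is the closest search counterpart to a multi-get. It swaps the term filter on _id for an ids query and sets the inner_hits size explicitly, because inner_hits returns only 3 hits by default (index.max_inner_result_window, 100 by default, caps it). events_with_instances_query and max_instances are hypothetical names.

```python
import json


def events_with_instances_query(event_ids, child_type="instance", sort_field="dt",
                                max_instances=100):
    """Hypothetical helper: one search returning each listed event, with its
    child instances as inner_hits sorted by sort_field."""
    return {
        "size": len(event_ids),  # default size is 10; return every requested event
        "query": {
            "bool": {
                "filter": [
                    {"ids": {"values": list(event_ids)}},
                    {
                        "has_child": {
                            "type": child_type,
                            "query": {"match_all": {}},
                            "inner_hits": {
                                "size": max_instances,  # inner_hits returns 3 by default
                                "_source": True,
                                "sort": [{sort_field: "asc"}],
                            },
                        }
                    },
                ]
            }
        },
        "_source": True,
    }


body = events_with_instances_query(["c8871a79-1907-46c0-958c-9731c529b93e"])
print(json.dumps(body, indent=2))
```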

Simple elasticsearch input - Rejecting mapping update final mapping would have more than 1 type: [_doc, doc]

I'm trying to send data to Elasticsearch but am running into an issue where my number field only comes up as a string. These are the steps I took.
Step 1. Add index & map
PUT http://123.com:5101/core_060619/
{
"mappings": {
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
Result:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "core_060619"
}
Step 2. Add data
PUT http://123.com:5101/core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
Result:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
}
],
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
},
"status": 400
}
You cannot have more than one mapping type per index in Elasticsearch 6.0.0+. If you set your document type to doc, you can then add documents simply with PUT http://123.com:5101/core_060619/doc/1, PUT http://123.com:5101/core_060619/doc/2, etc.
Elasticsearch 6.+
PUT core_060619/
{
"mappings": {
"doc": { //type of documents in index is 'doc'
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
}
Since the mapping declares doc as the document type, we can now add documents simply by using /doc/_id:
PUT core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
PUT core_060619/doc/2
{
"test" : [ {
"data" : "111120300",
"date" : "10:15 2019-06-02"
} ]
}
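In the question, the index was created with a typeless mapping, and the error reports that it already has the type _doc, so on 6.x a document must be written under that same type (/core_060619/_doc/1), or the index has to be recreated with a doc type as shown above. A small sketch of how the document path depends on the version; doc_path is a hypothetical helper.

```python
def doc_path(index, doc_id, es_major, type_name="doc"):
    """Hypothetical helper: path for indexing a document with a given id."""
    if es_major >= 7:
        # 7.x: the typeless endpoint is always _doc
        return f"/{index}/_doc/{doc_id}"
    # 6.x: an index holds a single type, and the path must use that type's name
    return f"/{index}/{type_name}/{doc_id}"
```

For the index in the question, doc_path("core_060619", 1, 6, "_doc") gives the path that agrees with its existing _doc type.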
Elasticsearch 7.+
Types are removed, but you can use a custom field (such as type below) to distinguish kinds of documents:
PUT twitter
{
"mappings": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}
PUT twitter/_doc/user-kimchy
{
"type": "user",
"name": "Shay Banon",
"user_name": "kimchy",
"email": "shay@kimchy.com"
}
PUT twitter/_doc/tweet-1
{
"type": "tweet",
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"user_name": "kimchy"
}
},
"filter": {
"match": {
"type": "tweet"
}
}
}
}
}
Reference: Removal of mapping types (Elasticsearch documentation)
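The custom-type-field pattern above can be wrapped in two small helpers: one prefixes the id with the document kind (as in user-kimchy and tweet-1) so ids of different kinds cannot collide, and stores the kind in the type field; the other builds the search body. Both helper names are hypothetical.

```python
def typed_doc(kind, key, fields):
    """Hypothetical helper: (document id, body) with the custom type field set."""
    return f"{kind}-{key}", {"type": kind, **fields}


def search_by_user(kind, user_name):
    """Hypothetical helper: documents of one kind written by one user."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"user_name": user_name}},
                "filter": {"match": {"type": kind}},
            }
        }
    }


doc_id, body = typed_doc("tweet", 1, {"user_name": "kimchy", "content": "Types are going away"})
```

The pair can then be sent as PUT twitter/_doc/{doc_id} with body as the JSON payload.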

Elasticsearch: run a query only on documents where a field exists

I want to run a query/filter based on whether a field exists. In our case, we store a field's value only if the user answers that question; otherwise we don't store the field at all. How can I run such a query?
Below is my mapping:
"mappings": {
"responses_10_57": {
"properties": {
"rid": {
"type": "long"
},
"end_time": {
"type": "date",
"format": "dateOptionalTime"
},
"start_time": {
"type": "date",
"format": "dateOptionalTime"
},
"qid_1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_2": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_3": {
"properties": {
"msg_text": {
"type": "string"
},
"msg_tags": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
}
}
}
}
}
}
qid_1 is the name field, qid_2 is the category field, qid_3 is the text message field.
However, qid_3 is not mandatory, so we don't store it if the user didn't enter a text message.
1) For each category, I want the count of users who answered the third question.
2) I need to find the names of users who answered the third question.
How can I write these two queries?
Both queries should have an exists filter to limit the response to only those documents where qid_3 exists (is not null). For your first query, you could try a terms aggregation. For your second query, you can filter the source to include only the names in the response, or store the field and use fields.
1)
{
"size": 0,
"query": {
"filtered": {
"filter": {
"exists" : { "field" : "qid_3" }
}
}
},
"aggs" : {
"group_by_category" : {
"terms" : { "field" : "qid_2" }
}
}
}
2)
{
"filter" : {
"exists" : { "field" : "qid_3" }
},
"_source": [ "qid_1"]
}
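A sketch of both request bodies in the 1.x filtered-query style that matches this mapping. The exists filter sits inside the query, because a top-level filter is applied as a post filter: it trims the hits but not the aggregation counts. Helper names and the default size of 100 are assumptions.

```python
def answered_q3(field="qid_3"):
    """Query part: only documents in which the optional field exists."""
    return {"filtered": {"query": {"match_all": {}}, "filter": {"exists": {"field": field}}}}


def count_per_category():
    """1) Per-category count of users who answered the third question."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": answered_q3(),
        "aggs": {"group_by_category": {"terms": {"field": "qid_2"}}},
    }


def names_who_answered(size=100):
    """2) Names of users who answered the third question."""
    return {"size": size, "query": answered_q3(), "_source": ["qid_1"]}
```

Because qid_2 is an analyzed string, the terms aggregation buckets individual tokens; a not_analyzed sub-field would give whole category values.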

Add an extra flag in ES query to check if a field exists

I am writing a query to get some records like this:
curl -X GET 'http://localhost:9200/posts/post/_search?from=0&size=30&pretty' -d '{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "content:(aid OR hiv)"
}
}
}
},
"fields": [
"content",
"entity_avatar_link",
"author_link",
"name"
],
"size": 30,
"from": 0
}'
This much is working fine and I am getting the results.
I am trying to add a script field (which acts as a flag) that returns, for every doc in the results, whether a particular field exists in that doc. I cannot return the field itself, because in most cases it is very large (an embedded object). So I also added this to the query:
"script_fields": {
"is_arranged_flag": {
"script": "!_source.arranged_retweets.empty"
}
}
So the whole query looks like this:
curl -X GET 'http://localhost:9200/posts/post/_search?from=0&size=30&pretty' -d '{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "content:(aid OR hiv)"
}
}
}
},
"fields": [
"content",
"entity_avatar_link",
"author_link",
"name"
],
"script_fields": {
"is_arranged_flag": {
"script": "!_source.arranged_retweets.empty"
}
},
"size": 30,
"from": 0
}'
But after adding the script_fields section, no results come back (the results array is empty [] for the same search query).
I have also tried:
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets'].empty"
}
}
What am I doing wrong?
Here is the mapping (from http://localhost:9200/posts/post/_mapping):
{
"post": {
"properties": {
"arranged_retweets": {
"properties": {
"author_gender": {
"type": "string"
},
"author_link": {
"type": "string"
}
}
},
"content": {
"type": "string",
"analyzer": "tweet_analyzer"
},
"name": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"author_link": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"entity_avatar_link": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
}
}
}
}
I think this is the valid script_fields segment.
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets'].empty"
}
}
Reference: scripting (Read the section on document fields)
I figured it out with the help of the discussion here (https://groups.google.com/forum/#!topic/elasticsearch/BJZdlFSJSRg). The field arranged_retweets is an object, so we need to go down to an inner field such as arranged_retweets.author_gender and check whether that is empty:
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets.author_gender'].empty"
}
}
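A simplified model of why the object path fails: object fields are flattened into dotted leaf fields, and doc[...] reads those leaf fields, so nothing is stored under arranged_retweets itself. leaf_fields and has_values are hypothetical helpers that only mimic this behavior (they are not Elasticsearch code), and the sample post is made up.

```python
def leaf_fields(source, prefix="", out=None):
    """Flatten a _source dict into {dotted.path: [values]}, like object fields
    are flattened at index time. Objects themselves get no entry."""
    out = {} if out is None else out
    for key, value in source.items():
        path = prefix + key
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                leaf_fields(item, path + ".", out)  # descend into the object
            else:
                out.setdefault(path, []).append(item)
    return out


def has_values(fields, name):
    """Mimics !doc[name].empty in this model."""
    return bool(fields.get(name))


post = {
    "content": "aid and hiv",
    "arranged_retweets": [{"author_gender": "female", "author_link": "http://example.com/u/1"}],
}
fields = leaf_fields(post)
```

Because the check looks at a single inner field, a retweet object that lacks author_gender would read as false; checking an inner field that is always present avoids that.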

How to filter fields in ElasticSearch using GET

I recently installed ElasticSearch with the Wikipedia river because I'm working on an autocomplete box with article titles. I have been trying to figure out the best way to query the dataset. The following works:
/wikipedia/_search?q=mysearch&fields=title,redirect&size=20
but I would like to add more constraints to the search:
disambiguation=false, redirect=false, stub=false, special=false
I'm new to ElasticSearch and the documentation hasn't gotten me far. From what I've read I need a filtered query; is there a way to do that from a GET request? That would make it much easier for my specific use case. If not, how would the POST request look? Thanks in advance.
For reference, the mapping is:
{
"wikipedia": {
"page": {
"properties": {
"category": {
"type": "string"
},
"disambiguation": {
"type": "boolean"
},
"link": {
"type": "string"
},
"redirect": {
"type": "boolean"
},
"special": {
"type": "boolean"
},
"stub": {
"type": "boolean"
},
"text": {
"type": "string"
},
"title": {
"type": "string"
}
}
}
}
}
To add more constraints, you can keep using the Lucene query syntax, for example:
/wikipedia/_search?q=mysearch AND disambiguation:false AND redirect:false AND stub:false AND special:false&fields=title,redirect&size=20
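The spaces and colons in that q value must be URL-encoded when the request is actually sent. A sketch using Python's urllib.parse.urlencode; wikipedia_search_url is a hypothetical helper and localhost:9200 is an assumed host.

```python
from urllib.parse import urlencode


def wikipedia_search_url(text, false_flags=("disambiguation", "redirect", "stub", "special"),
                         fields=("title", "redirect"), size=20,
                         base="http://localhost:9200"):
    """Hypothetical helper: GET URL for a Lucene-syntax search with flag constraints."""
    q = " AND ".join([text] + [f"{flag}:false" for flag in false_flags])
    params = urlencode({"q": q, "fields": ",".join(fields), "size": size})
    return f"{base}/wikipedia/_search?{params}"


url = wikipedia_search_url("mysearch")
```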
To improve performance, you can use filters in a JSON request body instead; the query would look like this:
curl -XGET 'http://localhost:9200/wikipedia/_search?pretty=true' -d '
{
"from" : 0,
"size" : 20,
"query":{
"filtered" : {
"query" : {
"text" : { "title" : "test" }
},
"filter" : {
"and": [
{
"term" : {
"stub" : false
}
},
{
"term" : {
"disambiguation" : false
}
},
{
"term" : {
"redirect" : false
}
},
{
"term" : {
"special" : false
}
}
]
}
}
}
}
'
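The same request body can be generated from the list of flags. This sketch keeps the filtered/and structure of the answer but uses match, the name the text query was later given; title_search_body is a hypothetical helper.

```python
import json


def title_search_body(title, false_flags=("stub", "disambiguation", "redirect", "special"),
                      size=20, offset=0):
    """Hypothetical helper: filtered query on title with one term filter per flag."""
    return {
        "from": offset,
        "size": size,
        "query": {
            "filtered": {
                "query": {"match": {"title": title}},
                "filter": {"and": [{"term": {flag: False}} for flag in false_flags]},
            }
        },
    }


payload = json.dumps(title_search_body("test"))  # Python False serializes as JSON false
```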
