elasticsearch run any query on field exists - elasticsearch

I want to run the any query/filter based on the field exists. In our case if user answers a particular field then only we will store that value, other wise will not store that field it self. How can I run the query?
Below is my mapping:
"mappings": {
"responses_10_57": {
"properties": {
"rid: {
"type": "long"
},
"end_time": {
"type": "date",
"format": "dateOptionalTime"
},
"start_time": {
"type": "date",
"format": "dateOptionalTime"
},
"qid_1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_2": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_3": {
"properties": {
"msg_text": {
"type": "string"
},
"msg_tags": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
}
}
}
}
}
}
qid_1 is the name field, qid_2 is the category field, qid_3 is the text message field.
But the qid_3 is not a mandatory field. So we will not insert the record if user doesn't entered any text message.
1) I want each category wide count those who responded the third question.
2) I have to search the names who answered the third question.
How can I write these two queries?

Both queries should have an exists filter to limit the response to only those documents where the qid_3 exists (is not null). For your first query you could try a terms aggregation. For your second query, you can filter the source to include only the names in the response or store the field and use fields.
1)
{
"size": 0,
"filter" : {
"exists" : { "field" : "quid_3" }
},
"aggs" : {
"group_by_category" : {
"terms" : { "field" : "qid_2" }
}
}
}
2)
{
"filter" : {
"exists" : { "field" : "quid_3" }
},
"_source": [ "qid_1"]
}

Related

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to make it clear: I know I will loose the clicks data, and it is not important). It means the final new index scheme would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each row has exactly one creativeStatistics. and therefore there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using painless scripts, but I don't know how can I do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create a Pipeline by creating a Script Processor as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if you have multiple nested objects, it would append the last object's creativeId. And it would only add creativeId if a source document has one in its creativeStatistics.
Below is how you can then use reindex query:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}

Elasticsearch: Merge result of aggregation by bucket key

I've indexed entities in Elasticsearch, which occur in my documents. The mapping for the entities looks like the following:
"Entities": {
"properties": {
"EntFrequency": {
"type": "long"
},
"EntId": {
"type": "long"
},
"EntType": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"Entname": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
},
[...]
Furthermore, I use this aggregation query to determine the most-occurring entities:
GET cable/document/_search
{
"size" :0,
"query": {
"match_all": {}
},
"aggs" : {
"entities_agg" : {
"terms" : {
"field" : "Entities.EntId"
}
}
}
}
}
Response
"buckets": [
{
"key": 323644,
"doc_count": 231038
},
[...]
However, some of those entity mentions refer to the same entity e.g. "USA" and "United States" and I do know their ids. How do I merge the buckets and the counts of these duplicates in ES?
I cannot use a client-side solution since there are too many entities and retrieving all of them and merging would be probably too slow for my application. The knowledge about duplicates is acquired through runtime. Thus, I cannot use this knowledge for the initial creation of my ES index.
Thanks for your help and comments!

Update and search in multi field properties in ElasticSearch

I'm trying to use multi field properties for multi language support. I created following mapping for this:
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
I created test record:
{
"prod-id": "1234567",
"prod-name": [
"Test product",
"Produit d'essai"
]
}
and tried to query using some language:
{
"query": {
"bool": {
"must": [
{"match": {
"prod-name.en": "Produit"
}}
]
}
}
}
As a result I got my document. But I expected that I will have empty result when I use French but choose English. It seems ElasticSearch ignores which field I specified in query. There is no difference in search result when I use "prod-name.en" or "prod-name.fr" or just "prod-name". Is this behaviour expected? Should I do some special things to have searching just in one language?
Another problem with updating multi field property. I can't update just one field.
{
"doc" : {
"prod-name.en": "Test"
}
}
I got following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
}
],
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
},
"status": 400
}
Is there any way to update just one field in multi field property?
In your mapping, the prod-name.en field will simply be analyzed using the english analyzer and the same for the french field. However, ES will not choose for you which value to put in which field.
Instead, you need to modify your mapping like this
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "object",
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
and input document to be like this and you'll get the results you expect.
{
"prod-id": "1234567",
"prod-name": {
"en": "Test product",
"fr": "Produit d'essai"
}
}
As for the updating part, your partial document should be like this instead.
{
"doc" : {
"prod-name": {
"en": "Test"
}
}
}

Elasticsearch: multiple languages in two fields when the query's language is unknown or mixed

I am new to Elasticsearch, and I am not sure how to proceed in my situation.
I have the following mapping:
{
"mappings": {
"book": {
"properties": {
"title": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
},
"keyword": {
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"ar": {
"type": "string",
"analyzer": "arabic"
}
}
}
}
}
}
}
A sample document may have two languages for the same field of the same book. Here are two example documents:
{
"title" : {
"en": "hello",
"ar": "مرحبا"
},
"keyword" : {
"en": "world",
"ar": "عالم"
}
}
{
"title" : {
"en": "Elasticsearch"
},
"keyword" : {
"en": "full-text index"
}
}
When I know what language is used in query, I am able to build query as follows (when English is used):
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en" ]
}
}
Based on my current document mapping, how can I build a query if
the query language is unknown or
is mixed with English and Arabic?
Thanks for any input!
Regards.
p.s. I am also open to any improvement to the above mapping.
the query language is unknown
You can use same multi match query but on all the fields.for eg,
Assuming you are using keyword analyzer
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en", "title.ar", "keyword.ar" ]
}
}
is mixed with English and Arabic
You need to change the analyzer to standard and then you can perform the same query.
Thanks

Exact match a field in Elasticsearch

In each of the documents I am indexing I have a field called "permalink" which I would like to exact match on.
An example document:
{
"entity_type": "company",
"entity_id": 1383221763,
"company_type": "developer",
"name": "Runewaker Entertainment",
"permalink": "runewaker-entertainment"
}
The mapping for these documents is:
{
"properties": {
"entity_id": {
"type": "integer",
"include_in_all": false
},
"name": {
"type": "string",
"include_in_all": true,
},
"permalink": {
"type": "string",
"include_in_all": true,
"index": "not_analyzed"
},
"company_type": {
"type": "string",
"include_in_all": false,
"index": "not_analyzed"
}
}
}
When I run the following query then I don't get any hits:
POST /companies/company/_search HTTP/1.1
Host: localhost:8082
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": { "permalink": "runewaker-entertainment" }
}
}
}
}
but I get match with this query:
POST /companies/company/_search HTTP/1.1
Host: localhost:8082
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": { "permalink": "runewaker" }
}
}
}
}
It appears any permalink with a hyphen in it results in a failed query but I was under the impression that if the mapping for a property has the index set to not_analyzed then ElasticSearch wouldn't analyze the field at all.
What should the correct query be?
Thank you
UPDATE:
getMapping result on the Companies index:
{
"companies" : {
"company" : {
"properties" : {
"company_type" : {
"type" : "string"
},
"entity_id" : {
"type" : "long"
},
"entity_type" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"node_id" : {
"type" : "long"
},
"permalink" : {
"type" : "string"
}
}
}
}
}
What you described is correct.
I tested and it works as expected. So you probably have some problem with your index. Maybe you indexed the document before you set the mapping?
Try to do it again -
delete your index or create a new one.
do a putMapping with your mapping.
index the document.
The search should work as expected.

Resources