Elasticsearch: Duplicate properties in a single record - elasticsearch

I have to find every document in Elasticsearch that has duplicate properties. My mapping looks something like this:
"type": {
"properties": {
"thisProperty": {
"properties" : {
"id":{
"type": "keyword"
},
"other_id":{
"type": "keyword"
}
}
}
The documents I have to find have a pattern like this:
"thisProperty": [
{
"other_id": "123",
"id": "456"
},
{
"other_id": "123",
"id": "456"
},
{
"other_id": "4545",
"id": "789"
}]
So, I need to find any document by type that has repeat property fields. Also I cannot search by term because I do not what the value of either Id field is. So far the API hasn't soon a clear way to do this via query and the programmatic approach is possible but cumbersome. Is it possible to get this result set in a elasticsearch query? If so, how?
(The version of Elasticsearch is 5.3)

Related

Merging fields in Elastic Search

I am pretty new to Elastic Search. I have a dataset with multiple fields like name, product_info, description etc., So while searching a document, the search term can come from any of these fields (let us call them as "search core fields").
If I start storing the data in elastic search, should I derive a field which is a concatenated term of all the "search core fields" ? and then index this field alone ?
I came across _all mapping concept and little confused. Does it do the same ?
no, you don't need to create any new field with concatenated terms.
You can just use _all with match query to search a text from any field.
About _all, yes, it searches the text from any field
The _all field has been removed in ES 7, so it would only work in ES 6 and previous versions. The main reason for this is that it used too much storage space.
However, you can define your own all field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"copy_to": "custom_all"
},
"product_info": {
"type": "text",
"copy_to": "custom_all"
},
"description": {
"type": "text",
"copy_to": "custom_all"
},
"custom_all": {
"type": "text"
}
}
}
}
PUT my-index/_doc/1
{
"name": "XYZ",
"product_info": "ABC product",
"description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
"query": {
"match": {
"custom_all": {
"query": "ABC",
"operator": "and"
}
}
}
}

ElasticSearch [term] query doesn't support multiple fields

I have the following data structure in ElasticSearch:
"assets": {
"type": "nested",
"properties": {
"assetId": {
"type": "keyword"
},
"assetSource": {
"type": "keyword"
},
}
Say I want to exclude the result that 'assetSource' has value 'Web'
I used Term(field='assets.assetSource', query='web') in the query.exclude, but since assets is multi-field, it complains [term] query doesn't support multiple fields
How do I work around this problem?
Stupid me, I should have used the Term filter like this:
Term(**{'assets.assetSource':'vault'})

Output field in autocomplete suggestion

when I want to index a document in elasticsearch this problem occurring:
message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]
I know that the output field removed from elasticsearch in version 5 but why? and what I have to do for getting single result for inputs?
output field removed from ElasticSearch in version 5, now _source filed returns with the suggestion. Example shown in below.
Mapping
{
"user": {
"properties": {
"name": {
"type": "string"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"search_analyzer": "simple"
}
}
}
}
Data
{
"id": "123",
"name": "Abc",
"suggest":
{
"input": "Abc::123"
},
"output": "Abc::123"
}
Query
POST - http://localhost:9200/user*/_suggest?pretty
{
"type-suggest": {
"text": "Abc",
"completion": {
"field": "suggest"
}
}
}
Elastic mentions the following
As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry’s text is always the un-analyzed value of the suggestion’s input (same as not specifying output while indexing suggestions in pre-5.0 indices).
Source
Update
I was able to get a single output from multiple inputs in ES 5.1.1. You can find the answere here

Mapping in elasticsearch

Good morning, In my code I can't search data which contain separate words. If I search on one word all good. I think problem in mapping. I use postman. When I put in URL http://192.168.1.153:9200/sport_scouts/video/_mapping and use method GET I get:
{
"sport_scouts": {
"mappings": {
"video": {
"properties": {
"hashtag": {
"type": "string"
},
"id": {
"type": "long"
},
"sharing_link": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string"
},
"type": {
"type": "string"
},
"user_id": {
"type": "long"
},
"video_preview": {
"type": "string"
}
}
}
}
}
}
All good title have type string but if I search on two or more words I get empty massive. My code in Trait:
public function search($data) {
$this->client();
$params['body']['query']['filtered']['filter']['or'][]['term']['title'] = $data;
$search = $this->client->search($params)['hits']['hits'];
dump($search);
}
Then I call it in my Controller. Can you help me with this problem?
The reason that your indexed data can't be found is caused by a mismatch of the analyzing during indexing and a strict term filter when querying the data.
With your mapping configuration, you are using the default analyzing which (besides many other operations) does a tokenizing. So every multi-word data you insert is split at punctuation or whitespaces. If you insert for example "some great sentence", elasticsearch maps the following terms to your document: "some", "great", "sentence", but not the term "great sentence". So if you do a term filter on "great sentence" or any other part of the original value containing a whitespace, you will not get any results.
Please see the elasticsearch docs on how to configure your mapping for indexing without analyzing (https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#_index_2) or consider doing a match query instead of a term filter on the existing mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html).
Please be aware that if you switch to not_analyzed you will be disabling many of the great fuzzy fulltext query functionality. Of course you can set up a mapping that does both, analyzed and not_analyzed in different fields. Then it's up on you to decide on which field you want to query on.

Elasticsearch indexing homogenous objects under dynamic keys

The kind of document we want to index and query contains variable keys but are grouped into a common root key as follows:
{
"articles": {
"0000000000000000000000000000000000000001": {
"crawled_at": "2016-05-18T19:26:47Z",
"language": "en",
"tags": [
"a",
"b",
"d"
]
},
"0000000000000000000000000000000000000002": {
"crawled_at": "2016-05-18T19:26:47Z",
"language": "en",
"tags": [
"b",
"c",
"d"
]
}
},
"articles_count": 2
}
We want to able to ask: what documents contains articles with tags "b" and "d", with language "en".
The reason why we don't use list for articles, is that elasticsearch can efficiently and automatically merge documents with partial updates. The challenge however is to index the objects inside under the variable keys. One possible way we tried is to use dynamic_templates as follows:
{
"sources": {
"dynamic": "strict",
"dynamic_templates": [
{
"article_template": {
"mapping": {
"fields": {
"crawled_at": {
"format": "dateOptionalTime",
"type": "date"
},
"language": {
"index": "not_analyzed",
"type": "string"
},
"tags": {
"index": "not_analyzed",
"type": "string"
}
}
},
"path_match": "articles.*"
}
}
],
"properties": {
"articles": {
"dynamic": false,
"type": "object"
},
"articles_count": {
"type": "integer"
}
}
}
}
However this dynamic template fails because when documents are inserted, the following can be found in the logs:
[2016-05-30 17:44:45,424][WARN ][index.codec] [node]
[main] no index mapper found for field:
[articles.0000000000000000000000000000000000000001.language] returning
default postings format
Same for the two other fields as well. When I try to query for the existence of a certain article, or even articles it doesn't return any document (no error but empty hits):
curl -LsS -XGET 'localhost:9200/main/sources/_search' -d '{"query":{"exists":{"field":"articles"}}}'
When I query for the existence of articles_count, it returns everything. Is there a minor error in what we are trying to achieve, for example in the schema: the definition of articles as a property and in the dynamic template? What about the types and dynamic false? The path seems correct. Maybe this is not possible to define templates for objects in variable-keys, but it should be according to the documentation.
Otherwise, what alternatives are possible without changing the document if possible?
Notes: we have other types in the same index main that also have these fields like language, I ignore if it could influence. The version of ES we are using is 1.7.5 (we cannot upgrade to 2.X for now).

Resources