Add reusable field type to Elasticsearch

Is it possible to define a custom field type and reuse that definition for multiple fields? I'm trying to do something like a template, but I don't want it to be defined dynamically.
For example, I have something in the system called "keywords", and keywords always have a specific mapping:
'keywords' => [
    'type' => 'object',
    'properties' => [
        'id' => [
            'type' => 'integer'
        ],
        'name' => [
            'type' => 'string',
            'position_offset_gap' => 100,
            'analyzer' => 'my_keyword',
        ]
    ]
]
I have these throughout the system (post, media, folder, etc.), and I have two kinds that are very similar, let's say keywords and categories. It's the same definition; I just keep them separate for business reasons.
Ideally, what I would like to do is define a "keyword" type, and then for a field I would just define
'keywords' => [
    'type' => 'keyword'
]
or something similar. Then, when I want to change that definition, I can do it in one place for all the fields using it.
Is this possible in Elasticsearch? I'd prefer not to use an index template because I like having my mappings explicit.

I would recommend using dynamic templates plus a naming convention for your keyword fields. In practice, for example:
(1) Define a dynamic template that maps any field whose name starts with k_ to your custom mapping:
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "keywords": {
            "match": "k_*",
            "mapping": {
              "type": "keyword",
              ...
            }
          }
        }
      ]
    }
  }
}
(2) Add the k_ prefix to the name of any field that should get your custom mapping (e.g., k_post, k_media, ...).
Of course, you can choose any other naming convention for your keyword fields (e.g., *_keywords, k*, ...).
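For example (a sketch; the index name, document, and field values here are hypothetical), indexing a document with a k_-prefixed field lets the template create the mapping on the fly, which you can then verify:

PUT my_index/_doc/1
{
  "title": "a post",
  "k_tags": "some keyword"
}

GET my_index/_mapping

The k_tags field shows up in the mapping with your custom definition, and since dynamic templates apply whenever a new field is first seen, changing the template later changes the definition in one place for all future fields.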

Related

Query unknown data structure in GraphQL

I just started working with GraphQL and I am setting up a server with webonyx/graphql-php at the moment. Since a GraphQL query already has to contain the resulting data structure, I am not quite sure how to get dynamic data. Assume that I query content consisting of different element types, and that my final structure should look like this:
{
  "data": {
    "dataset": {
      "uuid": "abc...",
      "insertDate": "2018-05-04T12:12:12Z",
      // other metadata
      "content": [
        {
          "type": "headline",
          "text": "I am a headline"
        },
        {
          "type": "image",
          "src": "http://...",
          "alt": "I am an image"
        },
        {
          "type": "review",
          "rating": 3,
          "comment": "I am a review"
        },
        {
          "type": "headline",
          "text": "I am another headline"
        }
        // other content elements
      ]
    }
  }
}
How could I write a query for this example?
{
  dataset {
    uuid
    insertDate
    content {
      ????
    }
  }
}
And what would a type definition for the content section look like? There is a defined set of element types (headline, image, review, and many more), but their order and number are unknown, and they have only one field, type, in common. While writing the query in my frontend, I don't know anything about the content structure. And what would the graphql-php type definition for the content section look like? I couldn't find any similar example online, so I am not sure it is even possible to use GraphQL for this use case. As extra information: I always want to query the whole content section, never a single element or field.
When you're returning an array of Object types, but each individual item could be one of any number of different Object types, you can use either an Interface or a Union. We can use an Interface here since all the implementing types share a field (type).
use GraphQL\Type\Definition\InterfaceType;
use GraphQL\Type\Definition\Type;

$content = new InterfaceType([
    'name' => 'Content',
    'description' => 'Available content',
    'fields' => [
        'type' => [
            'type' => Type::nonNull(Type::string()),
            'description' => 'The type of content',
        ]
    ],
    'resolveType' => function ($value) {
        if ($value->type === 'headline') {
            return MyTypes::headline();
        } elseif ($value->type === 'image') {
            return MyTypes::image();
        } # and so on
    }
]);
Types that implement the Interface need to do so explicitly in their definition:
$headline = new ObjectType([
    # other properties
    'interfaces' => [
        $content
    ]
]);
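One more graphql-php detail worth noting (a sketch; it assumes a $queryType and the concrete types from above exist): Object types that are only reachable through an Interface have to be listed explicitly in the Schema config via the types option, because the schema cannot discover them by traversing fields alone:

use GraphQL\Type\Schema;

$schema = new Schema([
    'query' => $queryType,
    // these implementations are never referenced directly in a field,
    // so graphql-php needs them spelled out here
    'types' => [MyTypes::headline(), MyTypes::image(), MyTypes::review()],
]);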
Now, if you change the type of the content field to a List of Content, you can query fields specific to each implementing type by using inline fragments:
query GetDataset {
  dataset {
    uuid
    insertDate
    content {
      type # this field is shared, so it doesn't need an inline fragment
      ... on Headline {
        text
      }
      ... on Image {
        src
        alt
      }
      # and so on
    }
  }
}
Please see the docs for more details.

Why doesn't Elasticsearch allow changes to indexed data?

I tried to convert some of the fields in previously indexed data from string to integer. But when I ran Logstash again, the fields didn't get converted (I only checked in Kibana). Why can't I make changes to already indexed data, and if I can't, how do I make the required changes to my index?
I've only been making changes in logstash. Here is a piece of logstash.conf:
input {
  file {
    type => "movie"
    path => "C:/TestLogs/Test5.txt"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "(?<Movie_Name>[\w.\-\']*)\s(?<Rating>[\d.]+)\s(?<No. of Downloads>\d+)\s(?<No. of views>\d+)" }
  }
  mutate {
    convert => {
      "Rating" => "float"
      "No. of Downloads" => "integer"
      "No. of views" => "integer"
    }
  }
}
Elasticsearch uses Lucene at its core for indexing and storing data. Lucene uses a read-only data structure to store data, and that's the reason why it is not possible to change the structure of data that is already stored in Elasticsearch. It is possible to update documents with new values, but not to change the structure of an entire index.
If you want to change the mapping, i.e. the data structure, you have to create a new index with the new mapping and store the data there.
This is of course not that easy if Elasticsearch is the master of the data. In that case you have to create a new index with the new mapping, read the data from the old index, and put it into the new one. You can do this using the scan and scroll approach.
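As an aside, newer Elasticsearch versions (2.3 and later) ship a _reindex API that performs this copy server-side, so you no longer need to scan and scroll yourself. A minimal sketch, using the index names from the alias example below:

POST /_reindex
{
  "source": { "index": "data_v1" },
  "dest": { "index": "data_v2" }
}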
If you want to make this transparent to the application reading from Elasticsearch, you can use an alias.
At first the index name is data_v1 and the alias is data:
data -> data_v1
Then you create a new index, data_v2, with the new mapping. Read all data from data_v1 and store it in data_v2. Having done this, change the alias to point to data_v2:
data -> data_v2
To change aliases you can use the 'remove' and 'add' actions:
POST /_aliases
{
  "actions": [
    { "remove": {
        "alias": "data",
        "index": "data_v1"
    }}
  ]
}
POST /_aliases
{
  "actions": [
    { "add": {
        "alias": "data",
        "index": "data_v2"
    }}
  ]
}
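Both actions can also be sent in a single request; in that case the swap happens atomically, so there is no moment where the alias points to no index at all:

POST /_aliases
{
  "actions": [
    { "remove": { "alias": "data", "index": "data_v1" } },
    { "add": { "alias": "data", "index": "data_v2" } }
  ]
}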

Multi-field Search with Array Using 'AND' Operator in elasticsearch

I want to query the values of a multi-value field as separate 'fields' in the same way I'm querying the other fields.
I have a data structure like so:
{
  name: 'foo one',
  alternate_name: 'bar two',
  lay_name: 'baz three',
  tags: ['stuff like', 'this that']
}
My query looks like this:
{
  query: {
    multi_match: {
      query: 'stuff',
      type: 'best_fields',
      fields: ['name', 'alternate_name', 'lay_name', 'tags'],
      operator: 'and'
    }
  }
}
The 'type' and 'operator' options work perfectly for the single-value fields, only matching when a value contains my entire query. For example, querying 'foo two' doesn't return a match.
I'd like the tags field to behave the same way. Right now, querying 'stuff that' returns a match when it shouldn't, because no field or individual tag value contains both words. Is there a way to achieve this?
EDIT
Val's assessment was spot on. I've updated my mapping to the following (using elasticsearch-rails/elasticsearch-model):
mapping dynamic: false, include_in_all: true do
  # ... other fields ...
  indexes :tags, type: 'nested' do
    indexes :tag, type: 'string', include_in_parent: true
  end
end
Please show your mapping type, but I suspect your tags field is a simple string field like this:
{
  "your_type": {
    "properties": {
      "tags": {
        "type": "string"
      }
    }
  }
}
In this case, ES will "flatten" all your tags under the hood into the tags field at indexing time, like this:
tags: "stuff", "like", "this", "that"
That is why you get results when querying "stuff that": the tags field contains both words.
The way forward would be to make tags a nested object type, like this:
{
  "your_type": {
    "properties": {
      "tags": {
        "type": "nested",
        "properties": {
          "tag": { "type": "string" }
        }
      }
    }
  }
}
You'll need to reindex your data, but at least querying for tags: "stuff that" will no longer return anything. Your tag tokens will be "kept together" as you expect. Give it a try.
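With the nested mapping in place, a query along these lines (a sketch; it assumes the tags.tag field from the mapping above) only matches when both words occur within the same tag value:

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match": {
          "tags.tag": {
            "query": "stuff that",
            "operator": "and"
          }
        }
      }
    }
  }
}

The and operator now applies within each nested tag object rather than across the flattened bag of tokens.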

How can I add heterogeneous data to Elasticsearch?

I am trying to add heterogeneous data (i.e. of different "types") to Elasticsearch. Each (top-level) object contains a user's settings for an application. A simplified example is:
{
  'name': 'test',
  'settings': [
    {
      'key': 'color',
      'value': 'blue'
    },
    {
      'key': 'isTestingMode',
      'value': true
    },
    {
      'key': 'visibleColumns',
      'value': [
        'column1',
        'column3',
        'column4'
      ]
    },
    ...
  ]
}
When I try to add this, the POST fails with a MapperParsingException. Searching around, it seems this is because the 'value' field holds different types across entries.
Is there any way to store arbitrary data like this?
This is not possible.
Mapping is per field, and mapping is not array aware. This means that you can keep settings.value as a string or as an array, but not both.
An easy tweak would be to define every value as an array:
{
  'name': 'test',
  'settings': [
    {
      'key': 'color',
      'value': [ 'blue' ]
    },
    {
      'key': 'isTestingMode',
      'value': [ true ]
    },
    {
      'key': 'visibleColumns',
      'value': [
        'column1',
        'column3',
        'column4'
      ]
    },
    ...
  ]
}
If that is not acceptable, another idea would be to apply a source transform, which would do this normalization on the settings.value field before it is indexed. This way, the source is kept as it is AND you get what you want.
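For reference, a source transform was declared in the mapping and looked roughly like this in Elasticsearch 1.x (a sketch only; the feature was removed in 2.0, and the index name, type name, and Groovy script here are illustrative, not tested):

PUT /my_index
{
  "mappings": {
    "user_settings": {
      "transform": {
        "lang": "groovy",
        "script": "ctx._source.settings.each { s -> if (!(s.value instanceof List)) { s.value = [s.value] } }"
      },
      "properties": {
        "name": { "type": "string" }
      }
    }
  }
}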

Selectively turn off stop words in Elasticsearch

So I would like to turn off stop word filtering on the username, title, and tags fields, but not on the description field.
As you can imagine, I do not want to filter out a result called "the best", but I do want to stop "the" from affecting the score when it appears in the description field (search for "the" on GitHub if you want an example).
Now @javanna says (in Is there a way to "escape" ElasticSearch stop words?):
In your case I would disable stopwords for that specific field rather than modifying the stopword list, but you could do the latter too if you wish to.
No example was provided, so I searched around and tried the common terms query (http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/), which didn't work for me either.
I then searched specifically for how to stop the filtering of stop words, but the closest I got was turning it off index-wide (Can I customize Elastic Search to use my own Stop Word list?) by attacking the analyzer directly; failing that, the documentation hints at making my own analyzer.
What is the best way to selectively disable stop words on certain fields?
I think you already know what to do, which would be to customize your analyzers for certain fields; from what I understand, you just did not manage to find a valid syntax example for that. This is what we used in a project. I hope this example points you in the right direction:
{
  :settings => {
    :analysis => {
      :analyzer => {
        :analyzer_umlauts => {
          :tokenizer => "standard",
          :char_filter => ["filter_umlaut_mapping"],
          :filter => ["standard", "lowercase"],
        }
      },
      :char_filter => {
        :filter_umlaut_mapping => {
          :type => 'mapping',
          :mappings_path => es_config_file("char_mapping")
        }
      }
    }
  },
  :mappings => {
    :company => {
      :properties => {
        [...]
        :postal_city => { :type => "string", :analyzer => "analyzer_umlauts", :omit_norms => true, :omit_term_freq_and_positions => true, :include_in_all => false },
      }
    }
  }
}
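Applied to your stop word case, a minimal sketch (the index settings below are an assumption; type and field names are taken from your question) would be two analyzers, one without a stop filter for username, title, and tags, and one with it for description:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "no_stopwords": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "with_stopwords": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "username": { "type": "string", "analyzer": "no_stopwords" },
        "title": { "type": "string", "analyzer": "no_stopwords" },
        "tags": { "type": "string", "analyzer": "no_stopwords" },
        "description": { "type": "string", "analyzer": "with_stopwords" }
      }
    }
  }
}

The stop filter defaults to the English stop word list, so "the" in description is dropped from scoring, while a result called "the best" in the other fields stays fully searchable.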
