Elasticsearch does not seem to be working as expected

I am using the elasticsearch and globalize gems for full-text search, and what I expect is to be able to search for a supplier name and a localised description using the Czech/English analyzers.
Example:
Supplier Name: "Bonami.cz"
Supplier Description_CZ: "Test description in czech."
It works when I search for "Bonami.cz", but it does not work (0 results) when I search for:
"Bonami" (part of the word)
"test" (description)
Based on the documentation, the methods below should work, but apparently I have missed something. I verified the indexes, and the data is in Elasticsearch.
Also, do I need to somehow install the Czech/English analyzers before using them in the model?
require 'elasticsearch/model'
require 'activerecord-import'

class Supplier < ActiveRecord::Base
  after_commit lambda { __elasticsearch__.index_document }, on: :create
  after_commit lambda { __elasticsearch__.update_document }, on: :update

  translates :description, :fallbacks_for_empty_translations => true
  accepts_nested_attributes_for :translations

  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  include Elasticsearch::Model::Globalize::MultipleFields

  mapping do
    indexes :id, type: 'integer'
    indexes :name, analyzer: 'czech'
    indexes :description_ma, analyzer: 'czech'
    indexes :description_cs, analyzer: 'czech'
    indexes :description_en, analyzer: 'english'
  end

  def as_indexed_json(options={})
    { id: id,
      name: name,
      description_ma: description_ma,
      description_cs: description_cs,
      description_en: description_en
    }
  end

  def self.search(query)
    __elasticsearch__.search(
      {
        query: {
          multi_match: {
            query: query,
            fields: ['name^10', 'description_ma', 'description_cs', 'description_en']
          }
        }
      })
  end
end
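As an aside on the analyzer question: czech and english are language analyzers bundled with Elasticsearch, so there is nothing extra to install. It is worth checking what the analyzer actually emits with the Analyze API; here is a sketch using the Ruby client that elasticsearch-model wraps (params-style call of the 1.x-era client; newer versions expect analyzer and text in the request body):

  client = Supplier.__elasticsearch__.client

  # The standard tokenizer does not split "Bonami.cz" on the dot, so the
  # plain czech analyzer most likely stores it as a single (stemmed) term,
  # which would explain why "Bonami.cz" matches but the prefix "Bonami" does not.
  tokens = client.indices.analyze(
    index: 'suppliers',          # default index name for the Supplier model
    analyzer: 'czech',
    text: 'Bonami.cz'
  )['tokens'].map { |t| t['token'] }
  puts tokens.inspect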
Any idea what is wrong?
Thanks, Miroslav
UPDATE 1
I took inspiration from the solution in Rails 4, elasticsearch-rails, but now when I try to search, I always get zero results for any word.
settings index: {
  number_of_shards: 1,
  analysis: {
    filter: {
      trigrams_filter: {
        type: 'ngram',
        min_gram: 2,
        max_gram: 10
      },
      content_filter: {
        type: 'ngram',
        min_gram: 4,
        max_gram: 20
      }
    },
    analyzer: {
      index_trigrams_analyzer: {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'trigrams_filter']
      },
      search_trigrams_analyzer: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase']
      },
      english: {
        tokenizer: 'standard',
        filter: ['standard', 'lowercase', 'content_filter']
      },
      czech: {
        tokenizer: 'standard',
        filter: ['standard', 'lowercase', 'content_filter']
      }
    }
  }
} do
  mappings dynamic: 'false' do
    indexes :name, index_analyzer: 'index_trigrams_analyzer', search_analyzer: 'search_trigrams_analyzer'
    indexes :description_en, index_analyzer: 'english', search_analyzer: 'english'
    indexes :description_ma, index_analyzer: 'czech', search_analyzer: 'czech'
    indexes :description_cs, index_analyzer: 'czech', search_analyzer: 'czech'
  end
end

def as_indexed_json(options={})
  { id: id,
    name: name,
    description_ma: description_ma,
    description_cs: description_cs,
    description_en: description_en
  }
end

def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'description_ma', 'description_cs', 'description_en']
        }
      },
      highlight: {
        pre_tags: ['<em>'],
        post_tags: ['</em>'],
        fields: {
          name: {},
          description_ma: {},
          description_cs: {},
          description_en: {}
        }
      }
    }
  )
end
This is what I see when I open the Elasticsearch URL for the given model:
{"suppliers":{"aliases":{},"mappings":{"supplier":
{"dynamic":"false","properties":{"description_cs":
{"type":"string","analyzer":"czech"},"description_en":
{"type":"string","analyzer":"english"},"description_ma":
{"type":"string","analyzer":"czech"},"name":
{"type":"string","index_analyzer":"index_trigrams_analyzer","search_analyzer":"search_trigrams_analyzer"}}}},"settings":{"index":
{"creation_date":"1445797508427","analysis":{"filter":
{"trigrams_filter":
{"type":"ngram","min_gram":"2","max_gram":"10"},"content_filter":
{"type":"ngram","min_gram":"4","max_gram":"20"}},"analyzer":{"english":
{"filter":["standard","lowercase","content_filter"],"tokenizer":"standard"},"index_trigrams_analyzer":{"type":"custom","filter":["lowercase","trigrams_filter"],"tokenizer":"standard"},"search_trigrams_analyzer":{"type":"custom","filter":["lowercase"],"tokenizer":"whitespace"},"czech":{"filter":["standard","lowercase","content_filter"],"tokenizer":"standard"}}},"number_of_shards":"1","number_of_replicas":"1","version":
{"created":"1060099"},"uuid":"wX9kf3OQSva24Iwk7sZ8AQ"}},"warmers":{}}}
UPDATE 2
Two steps were missing to get it working as expected:
1. Re-import the model data (see the sketch after the code below).
2. Fix a typo in the names of the description fields (instead of description_ma/en/cs, I had to use ma/cs/en_description).
settings index: {
  number_of_shards: 1,
  analysis: {
    filter: {
      trigrams_filter: {
        type: 'ngram',
        min_gram: 2,
        max_gram: 10
      },
      content_filter: {
        type: 'ngram',
        min_gram: 4,
        max_gram: 20
      }
    },
    analyzer: {
      index_trigrams_analyzer: {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'trigrams_filter']
      },
      search_trigrams_analyzer: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase']
      },
      english: {
        tokenizer: 'standard',
        filter: ['standard', 'lowercase', 'content_filter']
      },
      czech: {
        tokenizer: 'standard',
        filter: ['standard', 'lowercase', 'content_filter']
      }
    }
  }
} do
  mappings dynamic: 'false' do
    indexes :name, index_analyzer: 'index_trigrams_analyzer', search_analyzer: 'search_trigrams_analyzer'
    indexes :en_description, index_analyzer: 'english', search_analyzer: 'english'
    indexes :ma_description, index_analyzer: 'czech', search_analyzer: 'czech'
    indexes :cs_description, index_analyzer: 'czech', search_analyzer: 'czech'
  end
end

def as_indexed_json(options={})
  { id: id,
    name: name,
    ma_description: ma_description,
    cs_description: cs_description,
    en_description: en_description
  }
end

def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'ma_description', 'cs_description', 'en_description']
        }
      },
      highlight: {
        pre_tags: ['<em>'],
        post_tags: ['</em>'],
        fields: {
          name: {},
          ma_description: {},
          cs_description: {},
          en_description: {}
        }
      }
    }
  )
end
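For step 1, a minimal sketch of the re-import using elasticsearch-model's Importing module (activerecord-import is already required at the top of the model), run from a Rails console or a rake task:

  # force: true drops and recreates the index, so the new settings/mappings
  # take effect, and then bulk re-indexes every Supplier record; documents
  # indexed before the settings change keep their old analysis otherwise.
  Supplier.import(force: true)
  Supplier.__elasticsearch__.refresh_index!  # make the fresh documents searchable immediately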

In order to be able to perform the search you are trying to do, you'll need to use the ngram analyzer (as discussed in the comments).
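To see why this helps: the ngram filter indexes every 2-10 character slice of each token, while the search-side analyzer only lowercases the query, so a partial word like "Bonami" can match a stored slice of "Bonami.cz". What actually gets indexed can be confirmed with the Analyze API again (a sketch; same client-version caveat as the earlier example):

  client = Supplier.__elasticsearch__.client

  # Run the index-side analyzer against a sample name; expect slices such as
  # "bo", "bon", "bona", ..., "bonami", which is what the query "bonami" hits.
  tokens = client.indices.analyze(
    index: 'suppliers',
    analyzer: 'index_trigrams_analyzer',
    text: 'Bonami.cz'
  )['tokens'].map { |t| t['token'] }
  puts tokens.inspect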

Related

ElasticSearch: My analyzers aren't having an effect on sorting

I'm trying to use analyzers to return alphabetically sorted data, but my results are always returned in lexicographical order. I've tried multiple implementations from here and other sources, to no avail. Is the issue in my tokenizer? Or is my use of custom analyzers wrong? Thanks in advance.
await client.indices.create({
  index: esIndexReport,
  body: {
    settings: {
      analysis: {
        filter: {
          min_term_length: {
            type: 'length',
            min: 2,
          },
        },
        analyzer: {
          name_analyzer: {
            tokenizer: 'whitespace',
            filter: ['lowercase', 'min_term_length'],
          },
          min_term_analyzer: {
            tokenizer: 'standard',
            filter: ['lowercase', 'min_term_length'],
          },
        },
      },
    },
    mappings: {
      report: {
        properties: {
          reportId: { type: 'text', analyzer: 'min_term_analyzer' },
          reportName: { type: 'text', analyzer: 'name_analyzer' },
          description: { type: 'text', analyzer: 'name_analyzer' },
          author: { type: 'text', analyzer: 'min_term_analyzer' },
          icType: { type: 'text', analyzer: 'min_term_analyzer' },
          status: { type: 'text', analyzer: 'min_term_analyzer' },
          lastUpdatedAt: { type: 'text', analyzer: 'min_term_analyzer' },
          'sort.reportName': { type: 'text', fielddata: true },
          'sort.description': { type: 'text', fielddata: true },
          'sort.author': { type: 'text', fielddata: true },
          'sort.status': { type: 'text', fielddata: true },
          'sort.lastUpdatedAt': { type: 'text', fielddata: true },
        },
      },
    },
  },
});
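A note on the likely cause, which the question leaves open: sorting through fielddata on an analyzed text field sorts by the field's indexed tokens rather than by the original string, so multi-word values come back in surprising, effectively lexicographical order. The usual approach is to sort on an unanalyzed keyword subfield, lowercased via a normalizer for case-insensitive alphabetical order. A sketch of such a mapping, written with the Ruby client for consistency with the rest of this page (the index name and the ES 7+ typeless mapping are assumptions; normalizers need ES 5.2+):

  require 'elasticsearch'

  client = Elasticsearch::Client.new

  # reportName stays analyzed for search, but gains a lowercased keyword
  # subfield that is used only for sorting.
  client.indices.create(
    index: 'reports',  # hypothetical index name
    body: {
      settings: {
        analysis: {
          normalizer: {
            lowercase_normalizer: { type: 'custom', filter: ['lowercase'] }
          }
        }
      },
      mappings: {
        properties: {
          reportName: {
            type: 'text',
            fields: {
              sort: { type: 'keyword', normalizer: 'lowercase_normalizer' }
            }
          }
        }
      }
    }
  )

  # Sort on the keyword subfield instead of a fielddata-backed text field.
  client.search(index: 'reports', body: { sort: [{ 'reportName.sort' => 'asc' }] })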

How to represent required fields in an index mapping in Elasticsearch

mapping: {
  dynamic: 'strict',
  properties: {
    name: { type: 'text', index: false },
    email: { type: 'keyword' },
    phoneNumber: { type: 'text', index: false },
  },
},
How to add the required field for name, email and phone number?
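A note on what the mapping can and cannot express, since the question is otherwise unanswered here: Elasticsearch mappings have no concept of a required field. dynamic: 'strict' only rejects fields that are absent from the mapping; it does not force the listed fields to be present in every document. Presence has to be enforced outside the mapping, for example in application code before indexing (or in an ingest pipeline). A minimal Ruby sketch, with hypothetical names:

  require 'elasticsearch'

  REQUIRED_FIELDS = %w[name email phoneNumber].freeze

  # Reject the document before indexing if any required field is missing
  # or blank; the mapping itself cannot do this.
  def index_contact(client, doc)
    missing = REQUIRED_FIELDS.select { |f| doc[f].to_s.strip.empty? }
    raise ArgumentError, "missing required fields: #{missing.join(', ')}" unless missing.empty?

    client.index(index: 'contacts', body: doc)  # 'contacts' is a hypothetical index
  end

  index_contact(Elasticsearch::Client.new,
                'name' => 'Jane', 'email' => 'jane@example.com', 'phoneNumber' => '123')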

Nested type without referencing another schema in Mongoosastic

I want to give a nested field the Elasticsearch mapping type "nested" by using Mongoosastic. I also want to specify the es_type of the fields inside the nested field.
My schema looks like this:
const CarOwner = new Schema({
  cars: [{
    name: {
      type: String,
      es_indexed: true,
    },
    price: {
      type: Number,
      es_indexed: true,
      es_type: 'float',
    },
  }],
});
I want this ElasticSearch mapping:
{
  "mappings": {
    "carowner": {
      "properties": {
        "cars": {
          "type": "nested",
          "properties": {
            "name": { "type": "text" },
            "price": { "type": "float" }
          }
        }
      }
    }
  }
}
The only Mongoosastic examples I've found look like this:
var Car = new Schema({
  name: {
    type: String,
    es_indexed: true,
  },
  price: {
    type: Number,
    es_indexed: true,
    es_type: 'float',
  },
});

var CarOwner = new Schema({
  cars: {
    type: [Car],
    es_indexed: true,
    es_type: 'nested',
    es_include_in_parent: true,
  },
});
Do I have to create a subschema or is there any way I could use Mongoosastic to create the mapping I want?

Document contains at least one immense term in field=“errormsg.keyword” (whose UTF8 encoding is longer than the max length 32766

I get this error when Logstash tries to write to Elasticsearch. It creates the index, but there is no data available in Elasticsearch.
Document contains at least one immense term in field="errormsg.keyword" (whose UTF8 encoding is longer than the max length 32766
This is my pipeline.conf:
input {
  file {
    path => "c:/logstash.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => { "message" => "%{TIME:timestamp} %{LOGLEVEL:LEVEL} %{GREEDYDATA:errormsg}" }
  }
}
output {
  if "ERROR" in [LEVEL] {
    elasticsearch {
      hosts => "localhost:9200"
    }
  }
  stdout { codec => rubydebug }
}
Output of curl -XGET localhost:9200/logstash/_mapping:
{
logstash-2017.06.16: {
mappings: {
_default_: {
_all: {
enabled: true,
norms: false
},
dynamic_templates: [
{
message_field: {
path_match: "message",
match_mapping_type: "string",
mapping: {
norms: false,
type: "text"}}},
{
string_fields: {
match: "*",
match_mapping_type: "string",
mapping: {
fields: {
keyword: {
type: "keyword"}},
norms: false,
type: "text"}}}],
properties: {
@timestamp: {
type: "date",
include_in_all: false},
@version: {
type: "keyword",
include_in_all: false},
geoip: {
dynamic: "true",
properties: {
ip: {type: "ip"},
latitude: {
type: "half_float"},
location: {
type: "geo_point"},
longitude: {type: "half_float"}}}}},
logs: {
_all: {
enabled: true,
norms: false},
dynamic_templates: [
{message_field: {
path_match: "message",
match_mapping_type: "string",
mapping: {norms: false,type: "text"}}},
{string_fields: {
match: "*",match_mapping_type: "string",
mapping: {
fields: {
keyword: {
type: "keyword"}},
norms: false,
type: "text"}}}],
properties: {
@timestamp: {
type: "date",
include_in_all: false},
@version: {
type: "keyword",
include_in_all: false},
LEVEL: {
type: "text",
norms: false,
fields: {
keyword: {
type: "keyword"}}},
errormsg: {
type: "text",norms: false,
fields: {
keyword: {
type: "keyword"}}},
geoip: {dynamic: "true",
properties: {
ip: {type: "ip"},
latitude: {type: "half_float"},
location: {type: "geo_point"},
longitude: {type: "half_float"}}},
host: {type: "text",norms: false,
fields: {
keyword: {type: "keyword"}}},
message: {type: "text",norms: false},
path: {type: "text",norms: false,
fields: {
keyword: {
type: "keyword"}}},
tags: {type: "text",norms: false,
fields: {
keyword: {
type: "keyword"}}},
timestamp: {type: "text",norms: false,
fields: {
keyword: {
type: "keyword"}}}}}}}}
And this is, for example, the error to parse:
17:37:17,103 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web] Servlet.service()java.io.FileNotFoundException:
at org.thymeleaf.templateresource.ServletContextTemplateResource
at org.thymeleaf.templateparser.markup.AbstractMarkupTemplateParser.
17:37:17,104 ERROR.....
Thank you so much for your help, @xeraa.
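A note on the error itself: Lucene rejects any single term longer than 32766 bytes, and the default Logstash template gives every string field a keyword subfield, so a multiline stack trace captured into errormsg becomes one enormous keyword term. A common remedy is to cap the subfield with ignore_above, which skips indexing over-long values instead of failing the document. A sketch of a legacy index template via the Ruby client (the template name is hypothetical; the 'template' key and the 'logs' type match the ES 5.x-era mapping shown above):

  require 'elasticsearch'

  client = Elasticsearch::Client.new(host: 'localhost:9200')

  # Any string field gets a keyword subfield capped at 256 characters;
  # longer values remain searchable as analyzed text, but no longer
  # produce an "immense term" in the keyword index.
  client.indices.put_template(
    name: 'logstash-keyword-cap',  # hypothetical template name
    body: {
      template: 'logstash-*',
      mappings: {
        logs: {
          dynamic_templates: [
            { string_fields: {
                match: '*',
                match_mapping_type: 'string',
                mapping: {
                  type: 'text',
                  norms: false,
                  fields: { keyword: { type: 'keyword', ignore_above: 256 } }
                } } }
          ]
        }
      }
    }
  )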

Mongoid: unique index for embedded documents

I'm trying to create a unique field for embedded documents:
class Chapter
  include Mongoid::Document
  field :title
end

class Book
  include Mongoid::Document
  field :name
  embeds_many :chapters

  index({ 'name' => 1 }, { unique: true })
  index({ 'name' => 1, 'chapters.title' => 1 }, { unique: true, sparse: true })
  # index({ 'name' => 1, 'chapters.title' => 1 }, { unique: true })
end
I run the task: rake db:mongoid:create_indexes
I, [2017-02-22T08:56:47.087414 #94935] INFO -- : MONGOID: Created indexes on Book:
I, [2017-02-22T08:56:47.087582 #94935] INFO -- : MONGOID: Index: {:name=>1}, Options: {:unique=>true}
I, [2017-02-22T08:56:47.087633 #94935] INFO -- : MONGOID: Index: {:name=>1, :"chapters.title"=>1}, Options: {:unique=>true, :sparse=>true}
But it doesn't work as I would expect...
Book.new( name: 'A book', chapters: [ { title: 'title1' }, { title: 'title1' }, { title: 'title2' } ] ).save # no errors
Book.new( name: 'Another book', chapters: [ { title: 'title2' } ] ).save
b = Book.last
b.chapters.push( Chapter.new( { title: 'title2' } ) )
b.save # no errors
Any idea?
UPDATE: Ruby 2.4.0, Mongo 3.2.10, Mongoid 5.2.0 | 6.0.3 (trying both)
UPDATE 2: I am also adding the tests I made directly with the mongo client:
use books
db.books.ensureIndex({ title: 1 }, { unique: true })
db.books.ensureIndex({ "title": 1, "chapters.title": 1 }, { unique: true, sparse: true, drop_dups: true })
db.books.insert({ title: "Book1", chapters: [ { title: "Ch1" }, { title: "Ch1" } ] }) # allowed?!
db.books.insert({ title: "Book1", chapters: [ { title: "Ch1" } ] })
b = db.books.findOne( { title: 'Book1' } )
b.chapters.push( { "title": "Ch1" } )
db.books.save( b ) # allowed?!
db.books.findOne( { title: 'Book1' } )
db.books.insert({ title: "Book2", chapters: [ { title: "Ch1" } ] })
UPDATE 3: I made more tests but didn't succeed; this link helped, but the problem remains.
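For background on why every test above is allowed: a unique multikey index in MongoDB only prevents two different documents from sharing a value; index keys generated from the same document's array never conflict with one another, so duplicate titles inside one book's chapters always pass. Per-document uniqueness has to be enforced at the application level, for example with a Mongoid validation (a sketch):

  class Book
    include Mongoid::Document
    field :name
    embeds_many :chapters

    index({ 'name' => 1 }, { unique: true })

    # MongoDB cannot enforce uniqueness inside one document's array,
    # so check the embedded titles before saving.
    validate :chapter_titles_must_be_unique

    private

    def chapter_titles_must_be_unique
      titles = chapters.map(&:title)
      errors.add(:chapters, 'titles must be unique') if titles.uniq.length != titles.length
    end
  end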
You should use drop_dups:
class Category
  include Mongoid::Document
  field :title, type: String
  embeds_many :posts

  index({ "posts.title" => 1 }, { unique: true, drop_dups: true, name: 'unique_drop_dulp_idx' })
end

class Post
  include Mongoid::Document
  field :title, type: String
end
Rails console:
irb(main):032:0> Category.first.posts.create(title: 'Honda S2000')
=> #<Post _id: 58adb923cacaa6f778215a26, title: "Honda S2000">
irb(main):033:0> Category.first.posts.create(title: 'Honda S2000')
Mongo::Error::OperationFailure: E11000 duplicate key error collection: mo_development.posts index: title_1 dup key: { : "Honda S2000" } (11000)
