What is the difference between the new search_as_you_type datatype in Elasticsearch and the edge_ngram tokenizer type? Which one should be preferred when building a search-as-you-type search engine?
The Elasticsearch documentation covers both implementations:
search_as_you_type datatype: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html
tokenizer type edge_ngram: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html (Look at the example of how to set up a field for search-as-you-type.)
UPDATE
Elasticsearch version: 7.6.1
I indexed my data with the search_as_you_type datatype according to the latest Elasticsearch documentation, and I am trying to build a simple query via the Java API based on the example below:
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
The point that I struggle with is adding "type": "bool_prefix".
A) I tried with MultiMatchQueryBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MatchQuery.Type.BOOLEAN_PREFIX);
and got an exception on the second line of the code above:
org.elasticsearch.ElasticsearchParseException: failed to parse [multi_match] query type [boolean_prefix]. unknown type.
B) Then I tried with MatchBoolPrefixQueryBuilder
MatchBoolPrefixQueryBuilder matchBoolPrefixQueryBuilder = new MatchBoolPrefixQueryBuilder(value, fields);
and got an exception:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=parsing_exception, reason=[match_bool_prefix] unknown token [START_ARRAY] after [query]]
...
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/my_dictionary/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57}],"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57},"status":400}
at this line:
SearchResponse searchResponse=restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
What am I doing wrong? Which one should I use and how?
SOLUTION
I solved the issue just by changing the type to:
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type("bool_prefix");
But I don't understand why the type must be hardcoded as "bool_prefix" instead of using MatchQuery.Type.BOOLEAN_PREFIX, or why it is not possible to use MatchBoolPrefixQueryBuilder; there are not many implementation examples of this query.
The two are different things.
edge_ngram is a tokenizer, which means it kicks in at indexing time to tokenize your input data. There is also an edge_ngram token filter. Both are similar but work at different levels. See this thread to learn about the main differences.
search_as_you_type is a field type which contains a few sub-fields, one of which is called _index_prefix and which leverages the edge_ngram tokenizer.
So basically, what you see in the edge_ngram tokenizer documentation is exactly what was leveraged when the new search_as_you_type field type was added.
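For comparison, a manual setup in the spirit of the edge_ngram tokenizer documentation looks roughly like this (a sketch only; the analyzer/tokenizer names and the gram sizes are illustrative assumptions):
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete_tokenizer",
          "filter": [ "lowercase" ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
The important design point is the asymmetry: edge n-grams are emitted at index time only, while the search analyzer leaves the query text whole, so a query like "brown f" can match the indexed prefixes. search_as_you_type packages this pattern (plus the shingle sub-fields) for you.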
UPDATE
You actually need to use
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);
You can see here how that enumeration value is built.
I'm using pymongo to insert documents from MongoDB into Elasticsearch and am getting the error below. The problem looks like it is the '_id' field. How can I convert the "_id" to "id" before inserting into Elasticsearch?
Error:
elasticsearch.exceptions.RequestError: TransportError(400,
u'mapper_parsing_exception', u'Field [_id] is a metadata field and
cannot be added inside a document. Use the index API request
parameters.')
Code:
for doc in documents1.find():
    doc_sanitized = json.loads(json_util.dumps(doc))
    es.index(index='index-name', doc_type='intel', id=i, body=doc_sanitized)
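One way to fix it (a minimal sketch, assuming the same pymongo and elasticsearch-py objects as in the snippet above): pop the _id metadata field out of the document before serializing it, and pass its value through the id request parameter instead.
for doc in documents1.find():
    doc_id = str(doc.pop('_id'))  # remove the metadata field, keep its value as the Elasticsearch document id
    doc_sanitized = json.loads(json_util.dumps(doc))
    es.index(index='index-name', doc_type='intel', id=doc_id, body=doc_sanitized)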
I am trying to insert a geo_shape type entry into Elasticsearch from Apache NiFi, but the shape arrives as a stringified nested JSON field. For example, in Apache NiFi my FlowFile has this content for the geo_shape:
{
  "location": "{\"type\":\"polygon\",\"coordinates\":[[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]}"
}
In Elasticsearch the field is specified as follows:
"location": {
"type": "geo_shape"
}
When I execute PutElasticsearch 1.3, I get the following error:
MapperParsingException failed to parse [location]: nested - shape must
be an object consisting of type and coordinates
How can I parse this nested JSON string so that it can be saved to Elasticsearch from Apache NiFi?
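Whatever performs the transformation, the inner string has to become a real JSON object before the document reaches Elasticsearch, since geo_shape requires an object with type and coordinates. A minimal sketch of the unescaping step in plain Python (outside NiFi; in NiFi this could live in a scripting or transform processor ahead of PutElasticsearch):
import json

doc = {
    "location": "{\"type\":\"polygon\",\"coordinates\":[[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]}"
}

# The value is a stringified JSON document; parse it into an actual object
doc["location"] = json.loads(doc["location"])

# doc["location"] is now {"type": "polygon", "coordinates": [[[...]]]},
# which is what the geo_shape mapping expects
print(json.dumps(doc))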
I am new to the ELK stack and have it implemented with Elasticsearch 1.4.4, Logstash 1.4.2, and Kibana 4. I am able to pull a CSV file into Elasticsearch using Logstash and have it displayed in Kibana.
When displaying a date from the file, the values within the date are separated out as if the dash it contains were a separator (e.g. the field value 01-01-2015 is displayed in Kibana, regardless of display type, as three field entries: 01, 01, and 2015). Kibana gives a message that this is due to it being an analyzed field.
Kibana 4 has a feature to apply JSON directly from the dashboard builder (Visualization) to change this to a non-analyzed field, so that the entire string is used rather than being split.
I have tried multiple formats, but this is the one that seems like it should work, as Kibana recognizes it as valid syntax:
{ "index" : "not_analyzed" }
but when attempting to apply the change, the dashboard does not change its structure and Kibana generates the following exception:
Visualize: Request to Elasticsearch failed: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][0]: SearchParseException[[csvtest][0]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][0]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [2]: [index].]]; } [... the identical SearchParseException repeats for shards [1] through [4] ...]"}
Within the request it can be seen where index was changed from analyzed to not_analyzed; the analyze_wildcard: true setting was also changed to false within the advanced object configuration, with the same result.
Try defining an index mapping and setting the date field to not_analyzed.
For example:
"<index name>": {
"mappings": {
"<Mapping type>": {
"properties": {
"City": {
"type": "string",
"index": "not_analyzed"
},
"Date": {
"type": "string",
"index": "not_analyzed"
}
}
}
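For instance, against the index from the stack trace above (a sketch for ES 1.x; csvtest and the Conn Dt field come from the error, while the type name logs is a placeholder for whatever Logstash created). Note that an existing field generally cannot be switched from analyzed to not_analyzed in place; the index has to be recreated or reindexed with the new mapping:
PUT /csvtest/_mapping/logs
{
  "logs": {
    "properties": {
      "Conn Dt": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}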
I had a similar issue today with the following message:
Parse Failure [Unknown key for a VALUE_STRING in [logTime]: [offset].]]; }]
I was sending a date histogram aggregation request against Elasticsearch 1.4.5 with the following payload:
['logTime'].forEach(function (field) {
  body.aggregations[field] = {
    date_histogram: {
      field: field,
      interval: 'week',
      time_zone: '+00:00',
      offset: '15h',
      min_doc_count: 0,
      extended_bounds: {
        min: 1440946800000,
        max: 1441551599999
      }
    }
  };
});
Note the use of the offset parameter for the date_histogram. This parameter was only introduced in Elasticsearch 1.5.0, so my 1.4.5 ES was complaining that the offset key was unknown.
Replacing it with post_offset as follows solved the problem, though I had to adjust the value of the time_zone parameter as well. As a side note, post_offset was deprecated and replaced by offset in v1.5.
['logTime'].forEach(function (field) {
  body.aggregations[field] = {
    date_histogram: {
      field: field,
      interval: 'week',
      time_zone: '+09:00',
      post_offset: '-9h',
      min_doc_count: 0,
      extended_bounds: {
        min: 1440946800000,
        max: 1441551599999
      }
    }
  };
});
We're currently using a Couchbase plugin (transport-couchbase) to transport and index the data into Elasticsearch (http://docs.couchbase.com/couchbase-elastic-search/).
I've taken a look at ElasticSearch's mapping documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
My understanding is that if you rely on defaults for ElasticSearch, once a document gets indexed, ElasticSearch will create a dynamic mapping for that document type. This is what we've defaulted to.
We ran into issues where, after adding a specific document type, the transport plugin inserted an "invalid" document (a field's type had changed, from string to array), and Elasticsearch threw an exception that essentially broke the replication from Couchbase to Elasticsearch. The exception looks like this:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property
[xyz]
java.lang.RuntimeException: indexing error MapperParsingException[failed to parse
[doc.myfield]]; nested: ElasticsearchIllegalArgumentException[unknown property[xyz]]
Is there a way we can configure Elasticsearch so that "invalid" documents simply get filtered out without throwing an exception and breaking the replication?
Thanks.
The dynamic setting on a mapping controls what happens with fields that are not in the mapping: with "strict", a document introducing an unmapped field is rejected outright instead of failing mid-parse. For example:
{
  "tweet" : {
    "dynamic": "strict",
    "properties" : {
      "message" : { "type" : "string", "store" : true }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
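If the goal is to keep replication running rather than reject documents, "dynamic": false may be closer to what was asked: unmapped fields are then silently ignored instead of raising an exception (a sketch; note this only covers new fields, not type conflicts on fields that are already mapped):
{
  "tweet" : {
    "dynamic": false,
    "properties" : {
      "message" : { "type" : "string", "store" : true }
    }
  }
}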