By default a couchbaseCheckpoint type is created for documents in Elasticsearch when syncing Couchbase with Elasticsearch

I am new to Elasticsearch and Couchbase, and I am using both in my project. My requirement is to sync a Couchbase bucket with Elasticsearch indices using Couchbase XDCR.
In Couchbase, the bucket name is "Employee" and the structure of one of its documents is:
{
  "empName": "Stev Jobs",
  "dept": "IT",
  "company": "xxxx",
  "salary": "30000",
  "country": "USA"
}
I created an index in Elasticsearch named employee and also created a cluster reference in Couchbase pointing to the Elasticsearch cluster.
After setting all this up, I started replication between the employee bucket in Couchbase and the employee index in Elasticsearch. The index was created in Elasticsearch, but it contains more documents than the Couchbase bucket.
My Couchbase employee bucket has 182 records, but the Elasticsearch employee index shows 1025 docs.
Also, while syncing, Couchbase shows some errors like the ones below:
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 98. Please see logs for details.
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 697. Please see logs for details.
In Elasticsearch, the documents in my employee index look like this:
{
  "_index": "employee",
  "_type": "couchbaseCheckpoint",
  "_id": "vbucket921UUID",
  "_score": 1,
  "_source": {
    "doc": {
      "uuid": "ec88aeb16c00427698f079d8a3fa7097"
    }
  }
}
I wrote the following search query and ran it in http://127.0.0.1:9200/_plugin/head/ against http://127.0.0.1:9200/employee/_search/:
{
  "query": {
    "match": {
      "query": "ec88aeb16c00427698f079d8a3fa7097",
      "fields": ["uuid"]
    }
  }
}
It gives this error:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[J2CjiG2vQqqrG2h5jlsudg][couchrecords][0]: SearchParseException[[couchrecords][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][1]: SearchParseException[[couchrecords][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][2]: SearchParseException[[couchrecords][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][3]: SearchParseException[[couchrecords][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][4]: SearchParseException[[couchrecords][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }]",
"status": 400
}

The couchbaseCheckpoint document is used by the plugin to save state for each vBucket, so that it can emulate the XDCR protocol correctly. That's why there are 1025 of them - 1024 vBuckets plus one global state doc.
The fact that you have 1025 docs in Elasticsearch means that you ONLY got the state docs and none of your actual documents were replicated. Did you set up the mappings in Elasticsearch as described in the installation guide? This looks like an indexing problem, so the Elasticsearch log will have meaningful errors that tell you why it can't index any of your documents; the Couchbase log only tells you that it couldn't replicate something.
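As a side note, the search query in the question fails to parse because a match query in its short form expects the field name as the key and does not accept a fields list. A sketch of a corrected query, assuming the value is nested under doc so the full path is doc.uuid:
{
  "query": {
    "match": {
      "doc.uuid": "ec88aeb16c00427698f079d8a3fa7097"
    }
  }
}
To search several fields with one query string, the equivalent multi_match form would be:
{
  "query": {
    "multi_match": {
      "query": "ec88aeb16c00427698f079d8a3fa7097",
      "fields": ["doc.uuid"]
    }
  }
}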

Related

Elasticsearch: search_as_you_type datatype vs. tokenizer edge_ngram

What is the difference between the new search_as_you_type datatype in Elasticsearch and the edge_ngram tokenizer? Which one is preferable for building a search-as-you-type search engine?
The Elasticsearch documentation covers both implementations:
search_as_you_type datatype: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html
tokenizer type edge_ngram: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html (Look at the example of how to set up a field for search-as-you-type.)
UPDATE
Elasticsearch version: 7.6.1
I indexed my data with the search_as_you_type datatype according to the latest Elasticsearch documentation and am trying to build a simple query via the Java API, based on the example below:
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
The point that I struggle with is adding "type": "bool_prefix".
A) I tried with MultiMatchQueryBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MatchQuery.Type.BOOLEAN_PREFIX);
and got an exception at the second line of the above code:
org.elasticsearch.ElasticsearchParseException: failed to parse [multi_match] query type [boolean_prefix]. unknown type.
B) Then I tried with MatchBoolPrefixQueryBuilder
MatchBoolPrefixQueryBuilder matchBoolPrefixQueryBuilder = new MatchBoolPrefixQueryBuilder(value, fields);
and got this exception:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=parsing_exception, reason=[match_bool_prefix] unknown token [START_ARRAY] after [query]]
...
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/my_dictionary/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57}],"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57},"status":400}
at line
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
What am I doing wrong? Which one should I use and how?
SOLUTION
I solved the issue just by changing the type to:
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type("bool_prefix");
But I don't understand why the type must be hardcoded as "bool_prefix" instead of using MatchQuery.Type.BOOLEAN_PREFIX, or why it is not possible to use MatchBoolPrefixQueryBuilder; there are not many implementation examples of this query.
The two are different things.
edge_ngram is a tokenizer, which means it kicks in at indexing time to tokenize your input data. There is also an edge_ngram token filter. Both are similar but work at different levels. See this thread to learn about the main differences.
search_as_you_type is a field type which contains a few sub-fields, one of which is called _index_prefix and which leverages the edge_ngram tokenizer.
So basically, what you see in the edge_ngram tokenizer documentation has actually been leveraged when they decided to add the new search_as_you_type field type.
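For reference, a minimal mapping sketch (hypothetical index my_index and field my_field) showing how such a field is declared; the multi_match query in the question then targets this field together with its generated sub-fields:
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type"
      }
    }
  }
}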
UPDATE
You actually need to use
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);
You can see here how that enumeration value is built

From MongoDB into ElasticSearch: Converting meta field of _id

I'm using pymongo to insert documents from MongoDB into Elasticsearch and am getting the error below. The problem seems to be the '_id' field. How can I convert "_id" to "id" before inserting the document into Elasticsearch?
Error:
elasticsearch.exceptions.RequestError: TransportError(400,
u'mapper_parsing_exception', u'Field [_id] is a metadata field and
cannot be added inside a document. Use the index API request
parameters.')
Code:
for doc in documents1.find():
    doc_sanitized = json.loads(json_util.dumps(doc))
    es.index(index='index-name', doc_type='intel', id=i, body=doc_sanitized)
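One possible approach, sketched below under the assumption that the MongoDB ObjectId should become the Elasticsearch document id: pop _id out of the document before serializing it, since Elasticsearch will not accept _id inside the document body.
from bson import json_util
import json

# documents1 and es are the MongoDB collection and Elasticsearch client
# from the question's snippet.
for doc in documents1.find():
    # Remove the Mongo-specific _id before indexing and keep its string
    # form to use as the Elasticsearch document id.
    mongo_id = str(doc.pop('_id'))
    doc_sanitized = json.loads(json_util.dumps(doc))
    es.index(index='index-name', doc_type='intel', id=mongo_id, body=doc_sanitized)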

How to save nested JSON data into Elasticsearch from Apache Nifi?

I am trying to insert a geo_shape type entry into Elasticsearch from Apache NiFi. The shape is just a nested JSON field; for example, in Apache NiFi my FlowFile has this nested content for the geo_shape:
{
"location": "{\"type\":\"polygon\",\"coordinates\":[[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]}"
}
In Elasticsearch the field is specified as follows:
"location": {
"type": "geo_shape"
}
When I execute PutElasticsearch1.3, I get the following error:
MapperParsingException failed to parse [location]: nested - shape must
be an object consisting of type and coordinates
How can I parse this nested JSON string in order to save it in Elasticsearch from Apache Nifi?
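For reference, Elasticsearch expects a geo_shape value to arrive as a JSON object rather than an escaped string, so the flow file content would need to look roughly like this before it reaches the PutElasticsearch processor (a sketch of the target document shape only, not NiFi configuration):
{
  "location": {
    "type": "polygon",
    "coordinates": [[[3.042514, 41.79673582], [3.04182089, 41.79738937], [3.04299467, 41.79763732], [3.042514, 41.79673582]]]
  }
}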

Getting error in Kibana 4 when using json input in Visualization while attempting to customize dashboard

I am new to the ELK stack and have it implemented with elasticsearch version 1.4.4, logstash version 1.4.2, and kibana version 4. I am able to pull a csv file into elasticsearch using logstash and have it display in kibana.
When displaying a date from the file, the values within the date are split apart as if the dash were a separator (e.g. a field value of 01-01-2015 is displayed in Kibana, regardless of display type, as three field entries: 01, 01, and 2015). Kibana shows a message that this is because the field is analyzed.
Kibana 4 has a feature to supply JSON directly from the dashboard builder (Visualization) to change this to a non-analyzed field so that the entire string is used rather than being split.
I have tried multiple formats, but this is the one that seems like it should work, as Kibana recognizes it as valid syntax:
{ "index" : "not_analyzed" }
but when attempting to apply the change, the dashboard does not change its structure and kibana generates the following exception:
Visualize: Request to Elasticsearch failed: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][0]: SearchParseException[[csvtest][0]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][0]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [2]: [index].]]; }{[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][1]: SearchParseException[[csvtest][1]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][1]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [2]: [index].]]; }{[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][2]: SearchParseException[[csvtest][2]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][2]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [2]: [index].]]; }{[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][3]: SearchParseException[[csvtest][3]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][3]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a 
VALUE_STRING in [2]: [index].]]; }{[ftpEMbcOTxu0Tdf0e8i-Ig][csvtest][4]: SearchParseException[[csvtest][4]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"#timestamp\":{\"gte\":1420092000000,\"lte\":1451627999999}}}],\"must_not\":[]}}}},\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"Conn Dt\",\"size\":100,\"order\":{\"1\":\"desc\"},\"index\":\"not_analyzed\"},\"aggs\":{\"1\":{\"cardinality\":{\"field\":\"Area Cd\"}}}}}}]]]; nested: SearchParseException[[csvtest][4]: query[ConstantScore(BooleanFilter(+cache(#timestamp:[1420092000000 TO 1451627999999])))],from[-1],size[0]: Parse Failure [Unknown key for a VALUE_STRING in [2]: [index].]]; }]"} less
Within the error you can see where index was changed from analyzed to not_analyzed; the analyze_wildcard: true setting was also changed to false in the advanced object configuration, with the same result.
Try setting the index mapping and making the date field not_analyzed.
For example:
"<index name>": {
  "mappings": {
    "<Mapping type>": {
      "properties": {
        "City": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Date": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
I had a similar issue today with the following message:
Parse Failure [Unknown key for a VALUE_STRING in [logTime]: [offset].]]; }]
I was sending a date histogram aggregation request against Elasticsearch 1.4.5 with the following payload:
['logTime'].forEach(function (field) {
  body.aggregations[field] = {
    date_histogram: {
      field: field,
      interval: 'week',
      time_zone: '+00:00',
      offset: '15h',
      min_doc_count: 0,
      extended_bounds: {
        min: 1440946800000,
        max: 1441551599999
      }
    }
  };
});
Note the use of the offset parameter in the date_histogram. This parameter was introduced only in Elasticsearch 1.5.0, so my 1.4.5 ES was complaining that the offset key was unknown.
Replacing it with post_offset as follows solved the problem, though I had to adjust the value of the time_zone parameter as well. As a side note, post_offset has been deprecated and replaced by offset since v1.5.
['logTime'].forEach(function (field) {
  body.aggregations[field] = {
    date_histogram: {
      field: field,
      interval: 'week',
      time_zone: '+09:00',
      post_offset: '-9h',
      min_doc_count: 0,
      extended_bounds: {
        min: 1440946800000,
        max: 1441551599999
      }
    }
  };
});

How to reject "invalid" documents in ElasticSearch

We're currently using a Couchbase Plugin (transport-couchbase) to transport and index the data into ElasticSearch (http://docs.couchbase.com/couchbase-elastic-search/)
I've taken a look at ElasticSearch's mapping documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
My understanding is that if you rely on defaults for ElasticSearch, once a document gets indexed, ElasticSearch will create a dynamic mapping for that document type. This is what we've defaulted to.
We ran into issues where, after adding a specific document type, the transport plugin inserts an "invalid" document (a field's type is now different, e.g. changed from string to array), Elasticsearch throws an exception, and this essentially breaks the replication from Couchbase to Elasticsearch. The exception looks like this:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property
[xyz]
java.lang.RuntimeException: indexing error MapperParsingException[failed to parse
[doc.myfield]]; nested: ElasticsearchIllegalArgumentException[unknown property[xyz]]
Is there a way we can configure ElasticSearch so that "invalid" documents simply get filtered without throwing exception and breaking the replication?
Thanks.
You can control how unexpected content is handled through the dynamic setting in the type's mapping, for example:
{
  "tweet": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "string", "store": true }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
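A minimal sketch of applying such a mapping through the put mapping API (the index name myindex is assumed here):
PUT /myindex/_mapping/tweet
{
  "tweet": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "string", "store": true }
    }
  }
}
Note that with "dynamic": "strict" Elasticsearch rejects documents containing fields not declared in the mapping, while "dynamic": false makes it ignore such fields instead of raising an error.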
