From MongoDB into ElasticSearch: Converting meta field of _id - elasticsearch

I'm using pymongo to read documents from MongoDB and insert them into Elasticsearch, and I'm getting the error below. The problem looks like the '_id' field. How can I convert "_id" to "id" before inserting the document into Elasticsearch?
Error:
elasticsearch.exceptions.RequestError: TransportError(400,
u'mapper_parsing_exception', u'Field [_id] is a metadata field and
cannot be added inside a document. Use the index API request
parameters.')
Code:
for doc in documents1.find():
    doc_sanitized = json.loads(json_util.dumps(doc))
    es.index(index='index-name', doc_type='intel', id=i, body=doc_sanitized)
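One way to avoid the error, sketched under assumptions: remove "_id" from the document body and pass its value through the id parameter of the index call instead. The helper name split_mongo_id is hypothetical, and the sketch assumes the document has already gone through json_util.dumps, which renders an ObjectId as {"$oid": "<hex string>"}.

```python
def split_mongo_id(doc_sanitized):
    """Return (es_id, body); body is a copy of the document without '_id'."""
    body = dict(doc_sanitized)  # shallow copy so the caller's dict is untouched
    raw_id = body.pop("_id", None)
    if isinstance(raw_id, dict) and "$oid" in raw_id:
        # json_util.dumps renders an ObjectId as {"$oid": "<hex string>"}
        es_id = raw_id["$oid"]
    elif raw_id is not None:
        es_id = str(raw_id)
    else:
        es_id = None
    return es_id, body


# Example with a made-up ObjectId value
es_id, body = split_mongo_id(
    {"_id": {"$oid": "5f1d7f3e2c9a4b0012345678"}, "name": "example"}
)
```

Inside the loop from the question, the call would then become es.index(index='index-name', doc_type='intel', id=es_id, body=body).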

Related

Elasticsearch: search_as_you_type datatype vs. tokenizer edge_ngram

What is the difference between new search_as_you_type datatype in Elasticsearch and tokenizer type edge_ngram? Which one to prefer in building search-as-you-type search engine?
Documentation of Elasticsearch gives both implementations:
search_as_you_type datatype: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html
tokenizer type edge_ngram: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html (Look at the example of how to set up a field for search-as-you-type.)
UPDATE
Elasticsearch version : 7.6.1
I indexed my data with the search_as_you_type data type according to the latest Elasticsearch documentation, and I am trying to build a simple query via the Java API based on the example below:
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
The point that I struggle with is adding "type": "bool_prefix".
A) I tried with MultiMatchQueryBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MatchQuery.Type.BOOLEAN_PREFIX);
and got an exception at the second line of the code above:
org.elasticsearch.ElasticsearchParseException: failed to parse [multi_match] query type [boolean_prefix]. unknown type.
B) Then I tried with MatchBoolPrefixQueryBuilder
MatchBoolPrefixQueryBuilder matchBoolPrefixQueryBuilder=new MatchBoolPrefixQueryBuilder(value, fields);
and got an exception:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=parsing_exception, reason=[match_bool_prefix] unknown token [START_ARRAY] after [query]]
...
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/my_dictionary/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57}],"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57},"status":400}
at line
SearchResponse searchResponse=restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
What am I doing wrong? Which one should I use and how?
SOLUTION
I solved the issue just by changing the type to:
MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type("bool_prefix");
But I don't understand why the type must be hardcoded as "bool_prefix" instead of using MatchQuery.Type.BOOLEAN_PREFIX, or why it is not possible to use MatchBoolPrefixQueryBuilder. There are not many implementation examples of this query.
The two are different things.
edge_ngram is a tokenizer, which means it kicks in at indexing time to tokenize your input data. There is also an edge_ngram token filter. Both are similar but work at different levels. See this thread to learn about the main differences.
search_as_you_type is a field type which contains a few sub-fields, one of which is called _index_prefix and which leverages an edge n-gram token filter.
So basically, the edge n-gram approach you see in the edge_ngram tokenizer documentation has actually been leveraged when they decided to add the new search_as_you_type field type.
UPDATE
You actually need to use
MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);
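You can see here how that enumeration value is built. Judging by the error message in the question, MatchQuery.Type.BOOLEAN_PREFIX ended up as the string boolean_prefix, which multi_match does not recognize, whereas MultiMatchQueryBuilder.Type.BOOL_PREFIX corresponds to bool_prefix.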
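For comparison, here is a minimal Python sketch that builds the same request body as the GET my_index/_search example in the question. The function name is hypothetical; the ._2gram and ._3gram sub-field suffixes are taken from that example.

```python
def search_as_you_type_query(text, field):
    """Build a multi_match bool_prefix body for a search_as_you_type field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                # The root field plus the shingle sub-fields used in the question
                "fields": [field, field + "._2gram", field + "._3gram"],
            }
        }
    }


body = search_as_you_type_query("brown f", "my_field")
```

The resulting dict can be sent as the body of a search request against the index.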

Update a document using another field than _id in ElasticSearch

I would like to do a partial update of a document in ElasticSearch 2.3. The documentation shows:
POST /website/blog/1/_update
{
  "doc" : {
    "tags" : [ "testing" ],
    "views": 0
  }
}
Is there a way to update a document using a field other than the _id (here 1) to identify the document?
Use the update_by_query API and run a query that selects the documents matching the other field. With that query you identify the documents you want to update according to your own rules.
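A rough sketch of what such a request body could look like, built in Python. Several things here are assumptions and not part of the answer: the helper name, the use of a term query, the hypothetical field "slug", and the script syntax. The sketch uses the "source" key with the painless language found in recent Elasticsearch versions; the 2.x series used an "inline" key with a different default script language, so adapt it to your version.

```python
def build_update_by_query(match_field, match_value, changes):
    """Build an _update_by_query body that sets each key in `changes`."""
    # One assignment per changed field; values travel as params, not script text
    assignments = [
        "ctx._source['%s'] = params['%s']" % (name, name)
        for name in sorted(changes)
    ]
    return {
        # Selects the documents to update by a field other than _id
        "query": {"term": {match_field: match_value}},
        "script": {
            "lang": "painless",  # assumption: a version that ships painless
            "source": "; ".join(assignments),
            "params": dict(changes),
        },
    }


# "slug" is a hypothetical field used only for illustration
body = build_update_by_query("slug", "my-post", {"tags": ["testing"], "views": 0})
```

The body would be POSTed to /website/_update_by_query, or passed to the update_by_query method of the Python client.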

By Default couchbaseCheckPoint type created for document in elasticsearch sync with couchbase

I am new to Elasticsearch and Couchbase, and I am using both in my project. My requirement is to sync a Couchbase bucket with Elasticsearch indices using Couchbase XDCR.
In Couchbase, the bucket is named "Employee", and the structure of one of its documents is:
{
  "empName":"Stev Jobs",
  "dept":"IT",
  "company":"xxxx",
  "salary":"30000",
  "country":"USA"
}
I created an index named employee in Elasticsearch and also created a cluster reference in Couchbase pointing to the Elasticsearch cluster.
After setting all this up, I started replication between the employee bucket in Couchbase and the employee index in Elasticsearch. It created the index in Elasticsearch, but the index contains more documents than the Couchbase bucket.
My Couchbase bucket employee has 182 records, but the Elasticsearch employee index shows 1025 docs.
Also, during the sync Couchbase shows some errors, like the ones below:
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 98. Please see logs for details.
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 697. Please see logs for details.
In Elasticsearch, the documents in my employee index look like this:
{
  "_index": "employee",
  "_type": "couchbaseCheckpoint",
  "_id": "vbucket921UUID",
  "_score": 1,
  "_source": {
    "doc": {
      "uuid": "ec88aeb16c00427698f079d8a3fa7097"
    }
  }
}
I wrote the search query below and ran it in http://127.0.0.1:9200/_plugin/head/:
http://127.0.0.1:9200/employee/_search/
{
  "query": {
    "match":{
      "query":"ec88aeb16c00427698f079d8a3fa7097",
      "fields":["uuid"]
    }
  }
}
It gives this error:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[J2CjiG2vQqqrG2h5jlsudg][couchrecords][0]: SearchParseException[[couchrecords][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][1]: SearchParseException[[couchrecords][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][2]: SearchParseException[[couchrecords][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][3]: SearchParseException[[couchrecords][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][4]: SearchParseException[[couchrecords][4]: 
from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }]",
"status": 400
}
The couchbaseCheckpoint document is used by the plugin to save state for each vBucket, so that it can emulate the XDCR protocol correctly. That's why there are 1025 of them - 1024 vBuckets plus one global state doc.
The fact that you have 1025 docs in ElasticSearch means that you ONLY got the state docs and none of the actual docs got replicated. Did you set up mappings in ElasticSearch like it says in the installation guide? This looks like a problem with indexing, so the ElasticSearch log will actually have meaningful errors that will tell you why it can't index any of your documents. The Couchbase log only tells you that it couldn't replicate something.
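Separately, the parse failure quoted in the question points at the shape of the match query itself: in its simplified form, match expects the field name as the key and the search text as the value, so "query" and "fields" were read as field names. A minimal sketch of the simplified form in Python follows. The helper name is hypothetical, and the field path doc.uuid is an assumption based on the _source structure shown in the question.

```python
def match_body(field, text):
    """Build a search body using the simplified form of the match query."""
    # Simplified form: the field name is the key, the search text is the value
    return {"query": {"match": {field: text}}}


# "doc.uuid" is assumed from the nested "_source" -> "doc" -> "uuid" structure
body = match_body("doc.uuid", "ec88aeb16c00427698f079d8a3fa7097")
```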

How to add documents to existing index in elasticsearch

I am using Elasticsearch 1.4. My requirement is that I will have new data every hour that needs to be uploaded. The approach I have taken is to create an index, "demo", and upload the data, so the first hour's data gets inserted. Now, my question is how to append the subsequent hours' data to this index.
PUT /demo/userdetails/1
{
  "user" : "kimchy",
  "message" : "trying out Elastic Search"
}
Now I am trying to add another document
{"user": "swarna","message":"hi"}
You simply need to PUT the additional documents. In your example above you did
PUT /demo/userdetails/1 { "user" : "kimchy", "message" : "trying out Elastic Search" }
Now you would do this:
PUT /demo/userdetails/2 {"user": "swarna","message":"hi"}
In your command, demo is the index, userdetails is the type, and the number is the document id. If you omit the document id (which requires POST rather than PUT), ES will generate one for you.
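For an hourly batch, sending one request per document can get tedious. A minimal sketch, assuming the Python client is in use: generate one action per record for the client's bulk helper, leaving out the id so that Elasticsearch assigns one. The function name hourly_actions is hypothetical; the index and type names come from the question.

```python
def hourly_actions(records, index="demo", doc_type="userdetails"):
    """Yield one bulk action per record; no _id, so ES generates one."""
    for record in records:
        yield {"_index": index, "_type": doc_type, "_source": record}


batch = [
    {"user": "kimchy", "message": "trying out Elastic Search"},
    {"user": "swarna", "message": "hi"},
]
actions = list(hourly_actions(batch))
```

With a client instance es, the batch could then be sent with elasticsearch.helpers.bulk(es, hourly_actions(batch)). Each hour's data is appended to the same index.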

How to reject "invalid" documents in ElasticSearch

We're currently using a Couchbase Plugin (transport-couchbase) to transport and index the data into ElasticSearch (http://docs.couchbase.com/couchbase-elastic-search/)
I've taken a look at ElasticSearch's mapping documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
My understanding is that if you rely on defaults for ElasticSearch, once a document gets indexed, ElasticSearch will create a dynamic mapping for that document type. This is what we've defaulted to.
We ran into issues where after adding a specific document type, and when the transport plugin inserts an "invalid" document (the document's field type is now different -- from string -> array), ElasticSearch throws an exception and essentially breaks the replication from Couchbase to ElasticSearch. The exception looks like this:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property
[xyz]
java.lang.RuntimeException: indexing error MapperParsingException[failed to parse
[doc.myfield]]; nested: ElasticsearchIllegalArgumentException[unknown property[xyz]]
Is there a way we can configure ElasticSearch so that "invalid" documents simply get filtered without throwing exception and breaking the replication?
Thanks.
{
  "tweet" : {
    "dynamic": "strict",
    "properties" : {
      "message" : {"type" : "string", "store" : true }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
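To make the setting above more concrete, here is a small sketch that builds such a mapping in Python. With "dynamic": "strict", a document containing a field that is not in the mapping is rejected with an exception; with "dynamic": "false", unmapped fields are ignored and not indexed. The helper name and the allowed-values tuple are illustrative assumptions, and whether either setting keeps the Couchbase replication running is not something this sketch can confirm.

```python
ALLOWED_DYNAMIC = ("true", "false", "strict")


def build_mapping(type_name, properties, dynamic="strict"):
    """Build a type mapping with an explicit `dynamic` setting."""
    if dynamic not in ALLOWED_DYNAMIC:
        raise ValueError("dynamic must be one of %s" % (ALLOWED_DYNAMIC,))
    return {type_name: {"dynamic": dynamic, "properties": dict(properties)}}


# Reproduces the mapping shown in the answer
mapping = build_mapping(
    "tweet", {"message": {"type": "string", "store": True}}
)
```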
