I have a couchbase cluster setup as the primary source for data. From this a subset of data is synced to a elasticsearch cluster via the Couchbase Transport Plugin for ElasticSearch(https://github.com/couchbaselabs/elasticsearch-transport-couchbase) which sets up an XDCR stream from couchbase to elasticsearch.
Due to some issues with the elasticsearch cluster all data needs to be synced again from couchbase to elasticsearch. I have tried recreating XDCR but that does not seem to help as it only copies a very small subset of documents. Is there a way by which this can be achieved?
Additional details
Couchbase version: 3.1.0
Number of couchbase documents: 50K+
Documents synced to elasticsearch: around 700 (expected 20K+)
If a document in couchbase is modified it is successfully synced to elasticsearch
The issue you're experiencing is likely in one of the following: XDCR, the Couchbase Transport Plugin for Elasticsearch, or Elasticsearch itself.
Start by checking for XDCR errors. You can find your XDCR logs using these instructions. Be aware that the Transport Plugin uses XDCR v1 and almost everything else in Couchbase uses v2.
Consult the advice in troubleshooting the Couchbase Transport Plugin for Elasticsearch. Instructions should work for you even though they are from the 4.0 docs.
Pay attention to how your documents are being mapped to Elasticsearch. You mention that you're expecting only a subset of documents to be synced to Elasticsearch, so it's possible that you have lost a setting or misconfigured something. You can enable logging and observe a small set of test data. At TRACE level, you should be able to see each document that is inspected.
If all of that fails, make sure the basics are working by indexing the beer sample dataset, following the directions in the Couchbase docs. ES is probably not the issue, but test with a fresh ES instance will rule out problems on that side.
Related
Presently AWS Elasticsearch cluster version is 6.3 and I am planning to upgrade it to 7. reindexing is also have to be done. reindexing is required
to have _doc as type for the indices instead of our custom mapping types.
Below are my queries:
1. What is the end to end process of upgrading AWS ES cluster version.
2. What are the impacts post upgrade.
3. Any specific backup is required?
4. How to perform upgrade in AWS cluster?
5. Post upgrade , Do I need to carry any validtion?
6. when to do reindexing? post cluster upgrade?
What is the end to end process of upgrading AWS ES cluster version.
You can perform an in-place upgrade of an AWS ES cluster from the AWS console. Upgrade triggers a blue green deployment and takes quite a while. For example, We upgraded an ES 6.8 cluster with 4 nodes (10 TB each) to OpenSearch 1.3 recently and it took almost 12 hours to complete.
What are the impacts post upgrade.
By default, AWS migrates all the data and resources (mapping templates, alerts, lifecycle policies etc) into the new upgraded cluster.
If you have some scripts that uses the ES APIs, expect some API paths being changed in the upgraded one. For example, the /_template path in ES 6.8 becomes _index_template in OpenSearch 1.3.
By default, AWS routes all traffic to the new cluster and does not mess around with the ES endpoint. So, if you have some data ingestion pipelines that may use the ES endpoint, it should work automatically. However, I would still recommend you to check the logs of each of your data collectors for any errors.
For example, If you are using kinesis firehose delivery streams, check destination error logs from the AWS console. If you are using logstash or vector, check their logs too.
Any specific backup is required?
It's always a good idea to take periodic snapshots of your AWS ES domain. If something goes wrong, you can always spin up a new domain from a previous working snapshot.
How to perform upgrade in AWS cluster?
Not sure what you mean by this. There's actually no way to manually access the underlying nodes/machines and perform the upgrade yourself. This is because the ES cluster is fully managed by AWS.
Post upgrade , Do I need to carry any validtion?
As mentioned in Question no.2 answer, it's definitely a good idea to check your ingestion pipelines. Check for any warning/errors on the logs. You can also use the Kibana/OpensearchDashboard to visually inspect your data for anything weird.
When to do reindexing? post cluster upgrade?
After you perform the in-place upgrade from AWS console, your existing indices and data are all copied to the newly upgraded cluster.
I'm using Elasticsearch to drive a "search website" feature. I'd like to collect statistics about what people search for (and which search queries are popular).
Elasticsearch is currently running behind Nginx, so I could extract this information from the Nginx access logs - but maybe Elasticsearch can be made to track this iinformation itself?
I found the Index stats API but that seems to be more abstract. It can be used to determne the average time needed to answer a query and such things, but it does not keep track of individual queries.
I am using a similar configuration (ES behind nginx), and I up to now I always just checked nginx' logfiles directly. However, thinking about your question, it makes much sense to route the nginx log files through the Elastic stack to Elastic Search using logstash, this seems to be the cleanest way.
Apparently in deprecated version there were some security auditing options using a plugin termed Shield or Security, but as I said, configuring logstash to ingest nginx logfiles directly seems most endurable way for your purposes.
Further reading and detailed instructions
discuss.elastic.co: How to get elaticsearch access logs
https://sysadmins.co.za/how-to-ingest-nginx-access-logs-to-elasticsearch-using-filebeat-and-logstash/
Elasticsearch Access Log
how to enable ElasticSearch http access log
I am working on an Elasticsearch indexing task.
Currently we are indexing hundreds of thousands of documents to ES cluster (many ES instances) on daily basis. We simply read data from DB and various of data sources, collate and compile them and directly index them to ES using python elasticsearch-dsl lib.
Currently;
Now we want use Logstash between the application server and ES however we don't want to change our current codebase. Thus we could easily switch between ES to Logstash and keep business logic in our application.
My question is how can I use logstash as a transparent middleware. I simply want logstash to forward all rest messages as they are to the ES cluster.
Want to;
I have cross-datacenter replication (XDCR) setup to replicate data from a Couchbase bucket to an Elasticsearch index. Some times, however, Elasticsearch fails and needs to be restarted. During this time any changes to the Couchbase bucket are not replicated. Is there a recommended way of dealing with this? Ideally, once Elasticsearch restarts the data added to Couchbase in the interim gets replcated successfully.
I'm having an issue with my Elasticsearch replication of my Couchbase DB.
Certain documents in my DB are not being replicated correctly. Rather than replicating as full couchbase documents, there is an entry with _type of couchbaseCheckpoint pointing to certain documents. These checkpoints never seem to update or grab the documents correctly, even after a refresh. This seems to occur at random, with some documents replicating correctly and being stored as full couchbaseDocuments, while others stay as these checkpoints. I'm not seeing any errors related to this in my logs at all.
versions:
couchbase 3.0.1
elasticsearch 1.3
couchbase elasticsearch plugin 2.0.0
{
"name":"xxx",
"age": "1",
}