Elasticsearch CouchDB River no hit - elasticsearch

I have a problem with CouchDB and Elasticsearch. i use Docker to realise it. i have a working couchdb container on the default port. Now i use this container:
registry.hub.docker.com/u/jeko/elasticsearch-river-couchdb/
And i insert a new couchdb connection with this:
curl -X PUT '127.0.0.1:9200/_river/testdb/_meta' -d ' { "type" : "couchdb", "couchdb" : { "host" : "couchdb", "port" : 5984, "db" : "articles", "filter" : null }, "index" : { "index" : "articles", "type" : "articles", "bulk_size" : "100", "bulk_timeout" : "10ms" } }'
to have a working elasticsearch with the couchdb river. Now i checked with curl host/articles/articles/_search?pretty=true the documents. The Hits are empty.
{
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
i turned the debugger on and checked the logging file. The output is this: http://pastebin.com/ETkNmJzT
The only conspicuous thing i found is this line: [2015-02-20 14:04:24,554][DEBUG][plugins ] [Arc] [/elasticsearch/plugins/river-couchdb/_site] directory does not exist.
But i doesn't understand why it doesn't work. i can curl the IP

Related

elasticsearch - snapshot creation failed due to RepositoryMissingException

I'm trying to create a snapshot in s3 bucket. After running request to create the new snapshot, i'm checking the status of the new snapshot and i see that snapshot state is PARTIAL, due to RepositoryMissingException.
Why is that happening ?
More information:
snapshot configuration:
$ curl localhost:9200/_cat/repositories
s3_repository s3
creation of new snapshot:
$ curl -XPUT localhost:9200/_snapshot/s3_repository/snap10
{"accepted":true}
get details about created snapshot (here we can see the failure):
$ curl localhost:9200/_snapshot/s3_repository/snap10?pretty
{
"snapshots" : [ {
"snapshot" : "snap10",
"version_id" : 2040699,
"version" : "2.4.6",
"indices" : [ "twitter" ],
"state" : "PARTIAL",
"start_time" : "2018-09-27T08:24:13.431Z",
"start_time_in_millis" : 1538036653431,
"end_time" : "2018-09-27T08:24:13.823Z",
"end_time_in_millis" : 1538036653823,
"duration_in_millis" : 392,
"failures" : [ {
"index" : "twitter",
"shard_id" : 1,
"reason" : "RepositoryMissingException[[s3_repository] missing]",
"node_id" : "0yJw77XwSX62rUnhDAAclw",
"status" : "INTERNAL_SERVER_ERROR"
}, {
"index" : "twitter",
"shard_id" : 0,
"reason" : "RepositoryMissingException[[s3_repository] missing]",
"node_id" : "WEzVGyjXSLWuzfD_w-sBlA",
"status" : "INTERNAL_SERVER_ERROR"
} ],
"shards" : {
"total" : 2,
"failed" : 2,
"successful" : 0
}
} ]
}
Can you please assist with the issue ? why the error says that RepositoryMissingException?
Please let me know if more information is needed.
In the end the issue was that cloud-aws plugin was installed only on master node. Once I installed the plugin on the data nodes - it worked.

Elasticsearch _reindex API not copying documents

I'm trying to upgrade an old 1.5 elastic index to 6.0, according to docs (https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html)
I can create a new index in 6.0 and then use reindex from remote using reindex from remote (https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade-remote.html)
Both of these instances are running inside docker containers I just wanted to test this in local before actually doing it in production
I can see there are documents indexed in my old index.
curl -XGET 'http://localhost:9200/old_index/_search?pretty'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "old_index",
"_type" : "item",
"_id" : "92",
"_score" : 1.0,
"_source":{"user_id":3,"slug":"asdfaisjeilej","name":"lake.jpgasdad","item_type":"image","created_at":"2018-01-23T18:11:30Z","deleted_at":null,"content_length":1252171}
}]}
}
After creating a new index (new_index) in my elasticsearch 6.0 instance, with a slightly diff mapping (change string types to text), I then proceed to reindex from remote using the following command. (note than my other instance is running in port 9400)
curl -XPOST 'localhost:9400/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
"source": {
"remote": {
"host": "http://localhost:9200"
},
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}
I get the following response
{
"took" : 136,
"timed_out" : false,
"total" : 0,
"updated" : 0,
"created" : 0,
"deleted" : 0,
"batches" : 0,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
So basically, documents from old_index are not being copied to new_index, and I have no idea why this is happening. Is there a step I'm missing, I'm following elasticsearch docs exactly as they read apparently.
As I mentioned, I also had the same issue while migrating from Elasticsearch-2 to Elasticsearch-6 after I tested the remote-reindexing in staging environment without dockers.
My workaround was to create an instance of the old version (not on docker), load it from backup and reindex from it to elasticsearch 6 instance that not running on docker.
If you still want to run elasticsearch 6 on docker you can always mount the data to your container.
Hope you find it helpful.

elasticsearch doesn't update documents

I'm facing up with a trouble related with document updatings.
I'm able to index(create) documents and they are correctly added on index.
Nevertheless, when I'm trying to update one of them, the operation is not made, the document is not updated.
When I first time add the document it's like:
{
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
This is the code I'm using in order to build UpdateRequest:
this.elasticsearchResources.getElasticsearchClient()
.prepareUpdate()
.setIndex(this.user.getMe().getUser())
.setType(type)
.setId(id.toString())
.setDoc(source)
.setUpsert(source)
.setDetectNoop(true);
I've also been able to debug which's the content of this request begore sending it to elasticsearch. The document is:
{
"user":"user4",
"timestamp":"2016-12-16T15:00:22.645Z",
"startTimestamp":"2016-12-16T15:00:22.645Z",
"dueTimestamp":null,
"closingTimestamp":null,
"matter":"F1",
"comment":null,
"status":0,
"backlogStatus":20,
"metainfos":{
},
"resources":[
],
"notes":null
}
As you can see the only difference is metainfos is empty when I try to update the document.
After having performed this update request the document is not updated. I mean the content of metainfos keeps as before:
#curl -XGET 'http://localhost:9200/user4/fuas/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "327c9435-c394-11e6-aa90-02420a011808",
"_score" : 1.0,
"_routing" : "user4",
"_source" : {
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
>>>>>>>> "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
} ]
}
}
I don't quite figure out what's wrong. Any ideas?
ElasticSearch will not update an empty object. You can try with:
null "metainfos":null
or
"metainfos":"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa":[]
to clean the field.

Entries of ElasticSearch hits

I am performing an ElasticSearch query through
curl -XGET 'http://localhost:9200/_search?pretty' -d '{\"_source\":[\"data.js_event\", \"data.timestamp\",\"data.uid\",\"data.element\"],\"query\":{\"match\":{\"event\":\"first_time_user_event\"}}}'
I am ONLY interested in the ouput of _source but the retrieval leads to something like
{"took" : 46,
"timed_out" : false,
"_shards" : {
"total" : 71,
"successful" : 71,
"failed" : 0
},
"hits" : {
"total" : 2062326,
"max_score" : 4.8918204,
"hits" : [ {
"_index" : "logstash-2015.11.22",
"_type" : "fluentd",
"_id" : "AVEv_blDT1yShMIEDDmv",
"_score" : 4.8918204,
"_source":{"data":{"js_event":"leave_page","timestamp":"2015-11-22T16:18:47.088Z","uid":"l4Eys1T/rAETpysn7E/Jog==","element":"body"}}
}, {
"_index" : "logstash-2015.11.21",
"_type" : "fluentd",
"_id" : "AVEnZa5nT1yShMIEDDW8",
"_score" : 4.0081544,
"_source":{"data":{"js_event":"hover","timestamp":"2015-11-21T00:15:15.097Z","uid":"E/4Fdl5uETvhQeX/FZIWfQ==","element":"infographic-new-format-selector"}}
},
...
Is there a way to get rid of _index, _type, _id and _score? The reason is that the query is performed on a remote server and I would like to limit the size of downloaded data.
Thanks.
Yes, you can use response filtering (only available since ES 1.7) by specifying what you want in the response (i.e. filter_path=hits.hits._source) and ES will filter it out for you:
curl -XGET 'http://localhost:9200/_search?filter_path=hits.hits._source&pretty' -d '{\"_source\":[\"data.js_event\", \"data.timestamp\",\"data.uid\",\"data.element\"],\"query\":{\"match\":{\"event\":\"first_time_user_event\"}}}'

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I do have the ElasticSearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic as well as browsed similar questions in stack overflow, but have not found an answer.
Here's what I typed into a windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"
contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_atta
chment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "tce",
"_type" : "contact",
"_id" : "1",
"_score" : 1.0,
"_source":{"my_attachment":"SGVsbG8="}
} ]
}
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"
query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
if it's the standard analyser (can't see any listed) then the Hello in the text will be made lowercase in the index.
i.e. when doing a term search (which doesn't have an analyzer on it) - try searching for hello
curl localhost:9200/tce/contact/_search?pretty -d'
{"query":
{"term":
{"my_attachment":"hello"
}}}'
you can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "my_attachment"
}
}
}
}'

Resources