Autocompletion elasticsearch

I'm following along with the tutorial for Elasticsearch's completion suggester here. It's pretty easy to get going, but I'm unable to get completions for more than one word. In the example, single incomplete words give great results, e.g.:
"Nir" -> "options":[{"text":"Nevermind Nirvana..."
"Nev" -> "options":[{"text":"Nevermind Nirvana..."
But the following fail:
"Nirvana Nev" -> Nothing!
"Nevermind Nir" -> Nothing!
I can get it to work by populating combinatorial options, e.g.:
curl -X PUT "localhost:9200/music/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
"suggest" : {
"input": [ "Nevermind", "Nirvana", "Nirvana Nevermind", "Nevermind Nirvana" ],
"weight" : 34
},
"title" : "Nevermind by Nirvana"
}
'
But this approach quickly leads to a combinatorial explosion of text variants added to the input.
Is there a better way?
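One alternative I have been looking at is a plain match_phrase_prefix query on the title field, which does match multi-word prefixes, though it gives up the speed of the in-memory completion suggester (a sketch against the same index):
curl -X GET "localhost:9200/music/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_phrase_prefix": { "title": "Nevermind Nir" }
    }
}
'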

Related

Restore elasticsearch cluster onto another cluster

Hello, I have a 3-node Elasticsearch cluster (source) and a snapshot called snapshot-1 taken from that source cluster.
I also have another 6-node Elasticsearch cluster (destination).
When I restore my destination cluster from snapshot-1 using this command:
curl -X POST -u elastic:321 "192.168.2.15:9200/snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
    "indices": "*",
    "ignore_unavailable": true,
    "include_global_state": false,
    "rename_pattern": ".security(.+)",
    "rename_replacement": "delete_$1",
    "include_aliases": false
}
'
I got this error:
{
    "error" : {
        "root_cause" : [
            {
                "type" : "snapshot_restore_exception",
                "reason" : "[snapshot:snapshot-1 yjg/mHsYhycHQsKiEhWVhBywxQ] cannot restore index [.ilm-history-0003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
            }
        ]
    }
}
So as you can see, the index .ilm-history-0003 already exists in the cluster. But how can I do the rename replacement for the .security, .ilm, .slm and .transform indices using only one rename_pattern, like this one:
"rename_pattern": ".security(.+)",
From my experience the rename pattern doesn't need to be super fancy, because you will probably either
a) delete the index (as your rename replacement suggests) or
b) reindex the data from the restored index into new indices, in which case the name of the restored index is insignificant.
So this is what I would suggest: use the following rename pattern to include all indices. Again, from my experience, your first aim is to get the old data restored; after that you can manage the reindexing etc.
POST /_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore
{
    "indices": "*",
    "ignore_unavailable": true,
    "include_aliases": false,
    "include_global_state": false,
    "rename_pattern": "(.+)",
    "rename_replacement": "restored_$1"
}
This will prepend restored_ to the actual index names, resulting in the following restored indices:
restored_.security*
restored_.ilm*
restored_.slm*
restored_.transform*
I hope I could help you.
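If you want to double-check what the restore produced, listing the renamed indices afterwards should do it (a sketch; host and credentials as in your question):
curl -X GET -u elastic:321 "192.168.2.15:9200/_cat/indices/restored_*?v"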
I solved it this way:
curl -X POST -u elastic:321 "192.168.2.15:9200/snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
    "indices": "*,-.slm*,-.ilm*,-.transform*,-.security*",
    "ignore_unavailable": true,
    "include_global_state": false,
    "include_aliases": false
}
'

Indices with ! in their name can't be filtered for restore

I have an ES cluster with index names like web.analytics.data.api!monthly!2018-07_v0, and I take regular snapshots/backups.
Now, when I want to restore all of them, everything works pretty well. If I want to restore just a specific index, however, ES won't do it. The command I use:
curl -X POST "localhost:9200/_snapshot/s3_backups/20191218_060001/_restore?pretty&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
    "indices": "web.analytics.data.api!monthly!2018-07_v0",
    "index_settings": {
        "index.number_of_replicas": 0
    }
}
'
The result I get is:
{
    "snapshot" : {
        "snapshot" : "20191218_060001",
        "indices" : [ ],
        "shards" : {
            "total" : 0,
            "failed" : 0,
            "successful" : 0
        }
    }
}
Please note that if I use an index without ! in its name (e.g. .kibana), it works well. Any ideas how I can solve this, preferably without telling developers to rename the indices? The ES in question is version 1.7.3; I am aware it is EOL, but it is what I have to work with right now.
So it was my bad in the end. The index name I was given did not exist (it had a typo in it), but I had been told ! was problematic, so I did not double-check; the test indices were picked by me, so of course those were correct...

Backup and restore some records of an elasticsearch index

I wish to take a backup of some records (e.g. only the latest 1 million) of an Elasticsearch index and restore this backup on a different machine. It would be best if this could be done using built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (code below), but it looks like it takes a backup of the whole index rather than selected records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
    "type": "fs",
    "settings": {
        "compress" : true,
        "location": "es_data_dump"
    }
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
    "indices" : "index_name",
    "type": "fs",
    "settings": {
        "compress" : true,
        "location": "es_data_dump"
    }
}'
The format of the backup can be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It can take any query; after the reindex you have a new index as a backup, containing the requested records, which you can easily copy wherever you want.
Complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
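For example, a reindex limited to the most recent documents might look like this (a sketch: the index names, the timestamp field, and the query are placeholders for your own; max_docs needs a recent Elasticsearch, older versions used a top-level size instead):
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
    "max_docs": 1000000,
    "source": {
        "index": "index_name",
        "sort": [ { "timestamp": "desc" } ],
        "query": { "range": { "timestamp": { "gte": "now-7d" } } }
    },
    "dest": {
        "index": "index_name_backup"
    }
}
'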
In the end, I fetched the required data using the Python driver, because that is what I found easiest for the given use case.
I ran an Elasticsearch query and stored its response in a file in newline-separated format, then later restored the data from it using another Python script. A maximum of 10000 entries is returned per request, along with a scroll ID used to fetch the next 10000 entries, and so on.
from elasticsearch import Elasticsearch

es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
# _query holds the query dict selecting the records to export
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # store this batch as you like
    scroll_id = page['_scroll_id']
    page = es.scroll(scroll_id=scroll_id, scroll='5m')

Error handling with curl and elasticsearch

I'm currently developing bash scripts that use Elasticsearch, and I need good error handling.
In this situation I try to add a document to Elasticsearch and check whether the operation succeeded.
At first I naively tried this:
response=$(curl -XPOST 'http://localhost:9200/indexation/document' -d '
{
"content":"'"$txt"'",,
"date_treatment":"'"$(date +%Y-%m-%d)"'"
}') && echo ok || echo fail
But curl doesn't work that way and still returns success (0, which is actually logical) even though the JSON request is obviously incorrect (note the double comma on line 3) and Elasticsearch reports errors.
So the answer isn't there. Now I think I should analyze the variable $response to catch errors (grep?). I'm posting this question to get hints on how to do this reliably, and to make sure I'm not missing an obvious solution (maybe a curl option I don't know?).
Additional useful things
Parsing JSON with Unix tools
Examples of the content of $response:
Success:
{
    "_id": "AVQz7Fg0nF90YvJIX_2C",
    "_index": "indexation",
    "_shards": {
        "failed": 0,
        "successful": 1,
        "total": 1
    },
    "_type": "document",
    "_version": 1,
    "created": true
}
Error:
{
    "error": {
        "caused_by": {
            "reason": "json_parse_exception: Unexpected character (',' (code 44)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput#139163f; line: 3, column: 17]",
            "type": "json_parse_exception"
        },
        "reason": "failed to parse",
        "root_cause": [
            {
                "reason": "json_parse_exception: Unexpected character (',' (code 44)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput#139163f; line: 3, column: 17]",
                "type": "json_parse_exception"
            }
        ],
        "type": "mapper_parsing_exception"
    },
    "status": 400
}
A simple workaround is to use the -f/--fail option.
As per the documentation:
(HTTP) Fail silently (no output at all) on server errors. This is
mostly done to better enable scripts etc to better deal with failed
attempts. In normal cases when an HTTP server fails to deliver a
document, it returns an HTML document stating so (which often also
describes why and more). This flag will prevent curl from outputting
that and return error 22.
This method is not fail-safe and there are occasions where
non-successful response codes will slip through, especially when
authentication is involved (response codes 401 and 407).
Example:
response=$(curl -XPOST 'http://localhost:9200/indexation/document' -d '
{
"content":"'"$txt"'",,
"date_treatment":"'"$(date +%Y-%m-%d)"'"
}' -f ) && echo ok || echo fail
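Another option (not from the answer above, just a sketch) is to make curl print the HTTP status code with -w and test that explicitly, which also keeps the response body around for inspection; here $payload stands for the JSON document built earlier in the script:
http_code=$(curl -s -o /tmp/response.json -w '%{http_code}' -XPOST 'http://localhost:9200/indexation/document' -d "$payload")
if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
    echo ok
else
    echo "fail ($http_code)"
    cat /tmp/response.json    # elasticsearch's error details end up here
fi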

CouchDB view queries

Could you please help me create a view? Below is the requirement:
select * from personaccount where name="srini" and user="pup" order by lastloggedin
I have to send name and user as input to the view, and the data should be sorted by lastloggedin.
Below is the view I have created, but it is not working:
{
    "language": "javascript",
    "views": {
        "sortdatetimefunc": {
            "map": "function(doc) {
                emit({
                    lastloggedin: doc.lastloggedin,
                    name: doc.name,
                    user: doc.user
                }, doc);
            }"
        }
    }
}
And this is the curl command I am using:
http://uta:password@localhost:5984/personaccount/_design/checkdatesorting/_view/sortdatetimefunc?key={\"name:srini\",\"user:pup\"}
My questions are:
Since sorting is done on the key, and I want it sorted by lastloggedin, I have included that in the emit function as well.
But I am passing only name and user as parameters. Do we need to pass all the parameters that we put in the key?
First of all, thank you for the reply. I have done the same and I am getting errors. Please help.
Could you please try this on your PC? I am posting all the commands:
curl -X PUT http://uta:password@localhost:5984/person-data
curl -X PUT http://uta:password@localhost:5984/person-data/srini -d '{"Name":"SRINI", "Idnum":"383896", "Format":"NTSC", "Studio":"Disney", "Year":"2009", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:44:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/raju -d '{"Name":"RAJU", "Idnum":"456787", "Format":"FAT", "Studio":"VFX", "Year":"2010", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:50:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/vihar -d '{"Name":"BALA", "Idnum":"567876", "Format":"FAT32", "Studio":"YELL", "Year":"2011", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:55:37+0100"}'
Here's the view I created, as you said:
{
    "_id": "_design/persondestwo",
    "_rev": "1-0d3b4857b8e6c9e47cc9af771c433571",
    "language": "javascript",
    "views": {
        "personviewtwo": {
            "map": "function (doc) {\u000a emit([ doc.Name, doc.Idnum, doc.lastTimeOfCall ], null);\u000a}"
        }
    }
}
I ran this command with curl:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
I got this error:
[4] 3000
curl: (3) [globbing] error: bad range specification after pos 99
[5] 1776
[6] 2736
[3] Done descending=true
[4] Done(3) curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]
[5] Done endkey=["SRINI","383896"]
I do not know what this error is.
I have also tried passing the parameters in the following way, and it does not help:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?key={\"Name\":\"SRINI\",\"Idnum\": \"383896\"}&descending=true
But I get different errors about escape sequences.
Overall I just want this query to be satisfied through the view:
select * from person-data where Name="SRINI" and Idnum="383896" order by lastTimeOfCall
My concern is how to pass multiple parameters from the curl command, as I get a lot of errors when I do it the above way.
First off, you need to use an array as your key. I would use:
function (doc) {
    emit([ doc.name, doc.user, doc.lastLoggedIn ], null);
}
This basically outputs all the documents in order by name, then user, then lastLoggedIn. You can use the following URL to query.
/_design/checkdatesorting/_view/sortdatetimefunc?startkey=["srini","pup"]&endkey=["srini","pup",{}]&include_docs=true
Second, notice I did not output doc as the value of your query. It takes up much more disk space, especially if your documents are fairly large. Just use include_docs=true.
Lastly, refer to the CouchDB Wiki, it's pretty helpful.
I just stumbled upon this question. The errors you are getting are caused by not escaping this command:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
The & character has a special meaning on the command-line and should be escaped when part of an actual parameter.
So you should put quotes around the big URL, and escape the quotes inside it:
curl -X GET "http://uta:password#localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=[\"SRINI\",\"383896\"]&endkey=[\"SRINI\",\"383896\",{}]descending=true&include_docs=true"
