Elasticsearch create doc with custom _id - elasticsearch

Is there any way to create a document without indicating the _id in the URL? I understand there is an option to create a document using http://localhost:9200/[index_name]/[index_type]/[_id], but I have issues creating documents this way because my _id is not auto-generated and might contain special characters such as # or &. I am currently using Elasticsearch version 1.3.

One way to create documents without having to add the URL-encoded ID in the URL is to use the _bulk API. It simply goes like this:
POST index/doc/_bulk
{"index": {"_id": "1234#5678"}}
{"field": "value", "number": 34}
{"index": {"_id": "5555#7896"}}
{"field": "another", "number": 45}
As you can see, you can index several documents, and their IDs are simply given within quotes inside the body of the bulk request. The URL itself only invokes the _bulk endpoint.
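For illustration, here is a minimal Python sketch that builds such a bulk body; the index and field names are taken from the example above, and the percent-encoding note at the end shows that IDs containing # can also still be used on the single-document endpoint if they are URL-encoded first.

```python
import json
from urllib.parse import quote

def build_bulk_body(docs_by_id):
    """Build the newline-delimited JSON body for a _bulk request.

    IDs containing special characters such as # or & are safe here,
    because they travel inside the request body, not in the URL.
    """
    lines = []
    for doc_id, doc in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline

body = build_bulk_body({
    "1234#5678": {"field": "value", "number": 34},
    "5555#7896": {"field": "another", "number": 45},
})

# Alternatively, the single-document endpoint still works if you
# percent-encode the ID first:
quote("1234#5678", safe="")  # '1234%235678'
```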

Related

How to make _source field dynamic in elasticsearch search template?

When using a search query in Elasticsearch, we define which fields we want in the response:
"_source": ["name", "age"]
And when working with search templates, we have to set the _source fields when inserting the search template into the ES cluster:
"_source": ["name", "age"]
But the problem with this search template is that it will always return name and age; to get other fields we have to change the search template accordingly.
Is there any way to pass the search fields from the client, so that the response only contains the fields the user asked for?
I have achieved this for just one field: if you do this
"_source": "{{field}}"
then when searching the index via the template you can do this:
POST index_name/_search/template
{
"id": template_id,
"params": {
"field": "name"
}
}
This search query returns the name field in the response, but I could not find a way to pass the parameter as an array or in another format so that I can get multiple fields.
Absolutely!!
Your search template should look like this:
"_source": {{#toJson}}fields{{/toJson}}
And then you can call it like this:
POST index_name/_search/template
{
"id": template_id,
"params": {
"fields": ["name"]
}
}
This transforms the params.fields array into JSON, so the generated query will look like this:
"_source": ["name"]
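For intuition, {{#toJson}} essentially serializes the parameter into a JSON literal, much like json.dumps in Python. Here is a toy stand-in for that expansion (not the real Mustache engine, just an illustration of the behavior):

```python
import json

def render_source_clause(fields):
    """Toy equivalent of the "_source": {{#toJson}}fields{{/toJson}} expansion:
    the params value is serialized into a JSON array literal."""
    return '"_source": ' + json.dumps(fields)

render_source_clause(["name"])         # '"_source": ["name"]'
render_source_clause(["name", "age"])  # '"_source": ["name", "age"]'
```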

Elasticsearch query to get results irrespective of spaces in search text

I am trying to fetch data from Elasticsearch by matching on a field name. I have the following two records:
{
"_index": "sam_index",
"_type": "doc",
"_id": "key",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
and
{
"_index": "sam_index",
"_type": "doc",
"_id": "key1",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
When I search using texts like sam, sample, Sa, etc., I am able to fetch both records using a match_phrase_prefix query. The query I tried is:
GET sam_index/doc/_search
{
"query": {
"match_phrase_prefix" : {
"name": "sample"
}
}
}
I am not able to fetch the records when I search with the string samplen. I need to search and get results irrespective of spaces between the texts. How can I achieve this in Elasticsearch?
First, you need to understand how Elasticsearch works and why it does or does not return a result.
ES works on token matching: documents you index in ES go through an analysis process, and the tokens generated by that process are stored in an inverted index, which is used for searching.
When you make a query, that query also generates search tokens; these are either taken as-is from the search query (in the case of a term query) or produced by the analyzer defined on the search field (in the case of a match query). Hence it's very important to understand the internals of your search query.
It's also very important to understand the mapping of your index; ES uses the standard analyzer by default on text fields.
You can use the Explain API to understand the internals of the query: which search tokens are generated by your search query, how documents matched them, and on what basis the score is calculated.
In your case, I created the name field as text with the word-joining analyzer explained in Ignore spaces in Elasticsearch, and I was able to get the document containing sample name when searching for samplen.
Let us know if you also want to achieve the same and if it solves your issue.
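To make the token-matching point concrete, here is a toy simplification in Python (real analyzers also handle punctuation, token filters, and more): the standard analyzer produces the separate tokens sample and name, so the prefix samplen matches neither, while a space-stripping approach produces one joined token that the prefix does match.

```python
def standard_like(text):
    """Very rough imitation of the standard analyzer: lowercase + split on whitespace."""
    return text.lower().split()

def space_stripped(text):
    """Analyzer sketch that joins words by removing spaces before indexing."""
    return [text.lower().replace(" ", "")]

doc = "Sample Name"
standard_like(doc)   # ['sample', 'name']
space_stripped(doc)  # ['samplename']

# A phrase-prefix search for "samplen" needs some token starting with it:
any(t.startswith("samplen") for t in standard_like(doc))   # False
any(t.startswith("samplen") for t in space_stripped(doc))  # True
```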

How does a JSON object gets tokenized and indexed in Elasticsearch

I recently started working on Elasticsearch and could not figure out how a JSON object gets tokenized and gets stored in the inverted index.
Consider below JSON has been inserted.
{
"city": "Seattle",
"state": "WA",
"location": {
"lat": "47.6062095",
"lon": "-122.3320708"
}
}
I can perform a URI search like this
GET /my_index/_search?q=city:seattle
This search returns the above document, but how is Elasticsearch able to search for 'seattle' only in the 'city' field? If it tokenized the complete JSON, all keys and values would be separated, so how is the mapping between key tokens and value tokens maintained?
Because Elasticsearch keeps separate posting lists per field, a token is only searchable under the field it was indexed from; the indexed tokens then point back to the original document, which is also stored.
Have a look at Inverted Index in the elastic docs.
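The idea can be pictured with a minimal per-field inverted index in Python (a sketch only; real Lucene indexes are far more involved, and nested objects like location are flattened into paths such as location.lat):

```python
from collections import defaultdict

def index_docs(docs):
    """Map (field, token) -> set of doc ids: each field gets its own posting lists."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            if isinstance(value, dict):
                continue  # nested objects omitted in this sketch
            for token in str(value).lower().split():
                inverted[(field, token)].add(doc_id)
    return inverted

idx = index_docs({1: {"city": "Seattle", "state": "WA"}})
idx[("city", "seattle")]   # {1}: q=city:seattle finds the document
idx[("state", "seattle")]  # set(): the same token under another field does not
```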

Elasticsearch: Use match query along with autocomplete

I want to use a match query along with autocomplete suggestions in ES5. Basically, I want to restrict my autocomplete results based on an attribute; for example, autocomplete should return results from within a single city only.
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("cityName", city);
SuggestBuilder suggestBuilder = new SuggestBuilder()
.addSuggestion("region", SuggestBuilders.completionSuggestion("region").text(text));
SearchResponse response = client.prepareSearch(index).setTypes(type)
.suggest(suggestBuilder)
.setQuery(queryBuilder)
.execute()
.actionGet();
The above doesn't seem to work correctly: I am getting both result sets in the response, each independent of the other.
Any suggestion?
It looks like the suggestion builder is creating a completion suggester. Completion suggesters are stored in a specialized structure that is separate from the main index, which means they have no access to your filter fields like cityName. To filter suggestions, you need to explicitly define those same filter values when you create the suggestion, separately from the attributes you index for the document the suggestion is attached to. These suggester filters are called contexts. More information can be found in the docs.
The docs linked to above are going to explain this better than I can, but here is a short example. Using a mapping like the following:
"auto_suggest": {
"type": "completion",
"analyzer": "simple",
"contexts": [
{
"name": "cityName",
"type": "category",
"path": "cityName"
}
]
}
This section of the index settings defines a completion suggester called auto_suggest with a cityName context that can be used to filter the suggestions. Note that the path value is set, which means this context filter gets its value from the cityName attribute in your main index. You can remove the path value if you want to explicitly set the context to something that isn't already in the main index.
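For completeness, a hypothetical full mapping combining the suggest field with the cityName attribute it draws its context from might look like the following (the index and type names, and the keyword/text choices, are assumptions based on the question, not taken from the original post):

```json
PUT index_name
{
  "mappings": {
    "type_name": {
      "properties": {
        "cityName": { "type": "keyword" },
        "region":   { "type": "text" },
        "auto_suggest": {
          "type": "completion",
          "analyzer": "simple",
          "contexts": [
            { "name": "cityName", "type": "category", "path": "cityName" }
          ]
        }
      }
    }
  }
}
```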
To request suggestions while providing context, something like this in combination with the settings above should work:
"suggest": {
"auto_complete":{
"text":"Silv",
"completion": {
"field" : "auto_suggest",
"size": 10,
"fuzzy" : {
"fuzziness" : 2
},
"contexts": {
"cityName": [ "Los Angeles" ]
}
}
}
}
Note that this request also allows for fuzziness, to make it a little resilient to spelling mistakes. It also restricts the number of suggestions returned to 10.
It's also worth noting that in ES 5.x completion suggesters are document-centric, so if multiple documents have the same suggestion, you will receive duplicates of that suggestion when it matches the characters entered. ES 6 has an option to de-duplicate suggestions, but there is nothing similar in 5.x. Again, it's best to think of completion suggesters as living in their own index, specifically an FST, which is explained in more detail here.

Can I create a document with the update API if the document doesn't exist yet

I have a very simple question :
I want to update multiple documents in Elasticsearch. Sometimes the document already exists, but sometimes it does not. I don't want to use a GET request to check for the document's existence (this decreases my performance); I want my update request to index the document directly if it doesn't exist yet.
I know that we can use upsert to create a non-existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist; I don't know whether upsert can do this.
Can you provide me with some explanation?
Thanks in advance!
This is doable using the update API. It does require that you define the ID of each document, since the update API needs the document's ID to determine its presence.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want by using the update API with the following call:
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This API call will add the document to the index, since it does not exist yet.
Running the call a second time, after changing the color of the car, will update the existing document instead of creating a new one:
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
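To picture what doc_as_upsert does, here is a tiny in-memory model in Python (a simplification; real updates also handle versioning, scripted updates, and so on):

```python
def doc_as_upsert(store, doc_id, partial):
    """If the document exists, merge the partial doc into it;
    otherwise index the partial doc as the whole document."""
    if doc_id in store:
        store[doc_id].update(partial)
    else:
        store[doc_id] = dict(partial)
    return store[doc_id]

store = {}
doc_as_upsert(store, "3", {"color": "brown", "brand": "ford"})  # first call creates
doc_as_upsert(store, "3", {"color": "black"})                   # second call updates
store["3"]  # {'color': 'black', 'brand': 'ford'}
```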
AFAIK, when you index documents (with a PUT call), the existing version gets replaced with the newer version; if the document did not exist, it gets created. There is no need to distinguish between INSERT and UPDATE in Elasticsearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
For the bulk API, use:
bulks.push({
update: {
_index: 'index',
_type: 'type',
_id: id
}
});
bulks.push({"doc_as_upsert":true, "doc": your_doc});
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback:
after_commit on: :update do
begin
__elasticsearch__.update_document
rescue Elasticsearch::Transport::Transport::Errors::NotFound
__elasticsearch__.index_document
end
end
I think you want the "create" action.
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.
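The three bulk actions can be contrasted with an in-memory sketch (placeholder logic only; real Elasticsearch also tracks versions and reports a per-item result for each action):

```python
def apply_action(store, action, doc_id, doc):
    """Mimic the semantics of the bulk create / index / update actions."""
    if action == "create":
        if doc_id in store:
            raise ValueError("create fails if the document already exists")
        store[doc_id] = dict(doc)
    elif action == "index":
        store[doc_id] = dict(doc)      # adds or fully replaces
    elif action == "update":
        store[doc_id].update(doc)      # partial merge into the existing doc
    return store

store = {}
apply_action(store, "create", "1", {"a": 1})
apply_action(store, "index", "1", {"b": 2})    # replaces: now {"b": 2}
apply_action(store, "update", "1", {"c": 3})   # merges:   now {"b": 2, "c": 3}
```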

Resources