Elasticsearch Multi Get working through curl, but no results are returned through Java API - elasticsearch

I am running elasticsearch 2.3.4, but the syntax does not seem to have changed in 5.x.
Multiget over curl is working just fine. Here is what my curl looks like:
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "logs-2017-04-30",
"_id" : "e72927c2-751c-4b33-86de-44a494abf78f"
}
]
}'
And when I want to pull the "message" field off that response, I use this request:
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "logs-2017-04-30",
"_id" : "e72927c2-751c-4b33-86de-44a494abf78f",
"fields" : ["message"]
}
]
}'
Both of the above queries return the log and information that I am looking for.
But when I try to translate it to Java like this:
MultiGetRequestBuilder request = client.prepareMultiGet();
request.add("logs-2017-04-30", null, "e72927c2-751c-4b33-86de-44a494abf78f");
MultiGetResponse mGetResponse = request.get();
for (MultiGetItemResponse itemResponse : mGetResponse.getResponses()) {
GetResponse response = itemResponse.getResponse();
logger.debug("Outputing object: " + ToStringBuilder.reflectionToString(response));
}
I appear to be getting null objects back. When I try to grab the message field off the null-looking GetResponse object, nothing is there:
GetField field = response.getField("message"); <--- returns null
What am I doing wrong? doing a rest call to elasticsearch proves the log exists, but my Java call is wrong somehow.

The documentation page for the Java multi get completely skips over the extra syntax required to retrieve data beyond the _source field. Just like the REST API, doing a multi get with the minimum information required to locate a log gets very limited information about it. In order to get specific fields from a log in a multi get call through the Java API, you must pass in a MultiGetRequest.Item to the builder. This item needs to have the fields you want specified in it before you execute the request.
Here is the code change (broken into multiple lines for clarity) that results in the fields I want being present when I make the query:
MultiGetRequestBuilder request = client.prepareMultiGet();
MultiGetRequest.Item item = new MultiGetRequest.Item("logs-2017-04-30", "null", "e72927c2-751c-4b33-86de-44a494abf78f");
item.fields("message");
request.add(item);
MultiGetResponse mGetResponse = request.get();
Now I can ask for the field I specified earlier:
GetField field = response.getField("message");

Related

how to use Postman to Insert Document in Elastic Search

What i need
i need to Insert Test document.
i have followed link https://www.elastic.co/guide/en/elasticsearch/reference/current/_index_and_query_a_document.html
Here im trying
PUT /customer/_doc/1?pretty
{
"name": "John Doe"
}
Error request body is required
snapshot when i try to run url
1.http://localhost:9200/customer?pretty
2.http://localhost:9200/customer/_doc/1?pretty&pretty
3.http://localhost:9200/customer/_doc/1?pretty&pretty&name=test
can anyone suggest how to Insert document in Elastic search.
any suggestion is most welcome.
Pass the Json in body and change the URL as shown below:
Use the PUT method for the same.
The url would be "http://localhost:9200/movies/"
movies is the name of the index.
The json to post is :
{
"id" : 1,
"title" : "Toy Story 1",
"year" : 1995,
"category" : "Adventure"
}
Here is the screenshot of the postman pushing the data to elasticseach :
You can verify the data as below :
Use the GET method and the url as http://localhost:9200/movies/_search?

Bulk indexing using elastic search

Till now i was indexing data to elastic document by document and now as the data started increasing it has become very slow and not an optimized approach. So i was searching for a bulk insert thing and found Elastic Bulk API. From the documents in their official site i got confused. The approach i am using is by passing the data as WebRequest and executing them in the elastic server. So while creating a batch/bulk insert request the API wants us to form a template like
localhost:9200/_bulk as URL and
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
to index a document with id 1 and field1 values as value 1. Also the API suggests to send the data as JSON (unpretty, to maintain a non escaping character or so). So to pass multiple document with multiple properties how can i structure my data.
I tried like this in FF RestClient , with POST and header as JSON , but RestClient is throwing some error and i know its not a valid JSON
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" },
{ "Name" : "CHRIS","Age" : "23" },"Gender" : "M"}
Your data is not well-formed:
You don't need the comma after the first line
You're missing a closing } on the first line
You have a closing } in the middle of your second line you need to remove it as well.
The correct way of formatting your data for a bulk insert look like this:
curl -XPOST localhost:9200/_bulk -d '
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" }}
{ "Name" : "CHRIS","Age" : "23" ,"Gender" : "M"}
-H 'Content-Type: application/x-ndjson'
This will work.
UPDATE
Using Postman on Chrome it looks like this. Make sure to add a new line after line 2:
Using the elasticsearch 7.9.2
In order to send the bulk update I was getting the error of new line as below
Failed update without new line
This is wierd but after adding the new line in the last of the all the operations it is working fine with postman, notice line number 5 in below screenshot
bulk update success after adding newline in last of all the commands in postman

Retrieving specific fields using the Elasticsearch Java API

I am using the Java API for Elasticsearch.
Having saved entities into indexes, it is possible to retrieve them together with the complete source. However, I only want to retrieve selected fields, and that is not working.
The folowing sample code:
SearchResponse response = client.prepareSearch("my-index")
.setTypes("my-type")
.setSearchType(SearchType.QUERY_AND_FETCH)
.setFetchSource(true)
.setQuery(QueryBuilders.termsQuery("field1", "1234"))
.addFields("field1")
.execute()
.actionGet();
for (SearchHit hit : response.getHits()){
Map<String, SearchHitField> fields = hit.getFields();
System.out.println(fields.size());
Map map = hit.getSource();
map.toString();
}
will retrieve the correct entities from the index, including the complete source.
For example, this is a snippet of the response :
"hits" : {
"total" : 1301,
"max_score" : 0.99614644,
"hits" : [ {
"_index" : "my-index",
"_type" : "my-type",
"_id" : "AU2P68COpzIypBTd80np",
"_score" : 0.99614644,
"_source":{"field1":"1234", ...}]}
}, {
However, while response.getHits() returns the expected number of hits, the fields and source within each hit is empty.
I am expecting each hit to contain the field specified in the line:
.addFields("field1")
Commenting out the line
.setFetchSource(true)
will cause the response not to include the source at all.
The version of Elasticsearch is 1.5.0
The following is the maven dependency the Java API:
<dependency>
<groupId>com.sksamuel.elastic4s</groupId>
<artifactId>elastic4s_2.11</artifactId>
<version>1.5.5</version>
</dependency>
Obiously, for performance reasons, I don't want to have to retrieve the complete entity.
Does anyone know how to limit the retrieval to selected fields?
Thanks
You can specify the fields you need using the setFetchSource(String[] includes, String[] excludes) method. Try this instead
SearchResponse response = client.prepareSearch("my-index")
.setTypes("my-type")
.setSearchType(SearchType.QUERY_AND_FETCH)
.setFetchSource(new String[]{"field1"}, null)
.setQuery(QueryBuilders.termsQuery("field1", "1234"))
.execute()
.actionGet();
for (SearchHit hit : response.getHits()){
Map map = hit.getSource();
map.toString();
}
map will only contain the fields you've specified.
Note that .setFetchSource("field1", null) (if you need a single field) or .setFetchSource("field*", null) (if you need several wildcarded fields) would work, too.

Get all results of elasticsearch without pagination

Elastic-search has from/size parameter and this size is 10 by default. How to get all the results without pagination?
first get count of results . you can get it by count api. the put n into following method of QueryBuilder.
CountResponse response = client.prepareCount("test")
.setQuery(termQuery("_type", "type1"))
.execute()
.actionGet();
then call
n=response.getCount();
you can use setSize(n).
for non java use like this curl request.
curl -XGET 'http://localhost:9200/twitter/tweet/_count' -d '
{
"query" : {
"term" : { "user" : "kimchy" }
}
}'
more on this can be found on this link.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html

Can I create a document with the update API if the document doesn't exist yet

I have a very simple question :
I want to update multiple documents to elasticsearch. Sometimes the document already exists but sometimes not. I don't want to use a get request to check the existence of the document (this is decreasing my performance). I want to use directly my update request to index the document directly if it doesn't exist yet.
I know that we can use upsert to create a non existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist. I don't know if upsert can do this.
Can you provide me some explaination ?
Thanks in advance!
This is doable using the update api. It does require that you define the id of each document, since the update api requires the id of the document to determine its presence.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want using the update api with the following api call.
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This api call will add the document to the index since it does not exist.
Running the call a second time after changing the color of the car, will update the document, instead of creating a new document.
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
AFAIK when you index the documents (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to make a distinction between INSERT and UPDATE in ElasticSearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
For bulk API use
bulks.push({
update: {
_index: 'index',
_type: 'type',
_id: id
}
});
bulks.push({"doc_as_upsert":true, "doc": your_doc});
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.
after_commit on: :update do
begin
__elasticsearch__.update_document
rescue Elasticsearch::Transport::Transport::Errors::NotFound
__elasticsearch__.index_document
end
end
I think you want "create" action
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.

Resources