Retrieving specific fields using the Elasticsearch Java API - elasticsearch

I am using the Java API for Elasticsearch.
Having saved entities into indexes, it is possible to retrieve them together with the complete source. However, I only want to retrieve selected fields, and that is not working.
The following sample code:
SearchResponse response = client.prepareSearch("my-index")
        .setTypes("my-type")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setFetchSource(true)
        .setQuery(QueryBuilders.termsQuery("field1", "1234"))
        .addFields("field1")
        .execute()
        .actionGet();
for (SearchHit hit : response.getHits()) {
    Map<String, SearchHitField> fields = hit.getFields();
    System.out.println(fields.size());
    Map map = hit.getSource();
    map.toString();
}
will retrieve the correct entities from the index, including the complete source.
For example, this is a snippet of the response:
"hits" : {
"total" : 1301,
"max_score" : 0.99614644,
"hits" : [ {
"_index" : "my-index",
"_type" : "my-type",
"_id" : "AU2P68COpzIypBTd80np",
"_score" : 0.99614644,
"_source":{"field1":"1234", ...}]}
}, {
However, while response.getHits() returns the expected number of hits, the fields and source within each hit are empty.
I am expecting each hit to contain the field specified in the line:
.addFields("field1")
Commenting out the line
.setFetchSource(true)
will cause the response not to include the source at all.
The version of Elasticsearch is 1.5.0.
The following is the Maven dependency for the Java API:
<dependency>
    <groupId>com.sksamuel.elastic4s</groupId>
    <artifactId>elastic4s_2.11</artifactId>
    <version>1.5.5</version>
</dependency>
Obviously, for performance reasons, I don't want to have to retrieve the complete entity.
Does anyone know how to limit the retrieval to selected fields?
Thanks

You can specify the fields you need using the setFetchSource(String[] includes, String[] excludes) method. Try this instead:
SearchResponse response = client.prepareSearch("my-index")
        .setTypes("my-type")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setFetchSource(new String[]{"field1"}, null)
        .setQuery(QueryBuilders.termsQuery("field1", "1234"))
        .execute()
        .actionGet();
for (SearchHit hit : response.getHits()) {
    Map map = hit.getSource();
    map.toString();
}
map will only contain the fields you've specified.
Note that .setFetchSource("field1", null) (if you need a single field) or .setFetchSource("field*", null) (if you need several wildcarded fields) would work, too.
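For illustration, here is a minimal sketch of actually reading the filtered values back out of the source map (the field name field1 comes from the question; everything else continues the snippet above):
for (SearchHit hit : response.getHits()) {
    // Only the included fields survive the source filtering.
    Map<String, Object> source = hit.getSource();
    System.out.println(hit.getId() + " -> " + source.get("field1"));
}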

Related

Elasticsearch Multi Get working through curl, but no results are returned through Java API

I am running Elasticsearch 2.3.4, but the syntax does not seem to have changed in 5.x.
Multiget over curl is working just fine. Here is what my curl looks like:
curl 'localhost:9200/_mget' -d '{
    "docs" : [
        {
            "_index" : "logs-2017-04-30",
            "_id" : "e72927c2-751c-4b33-86de-44a494abf78f"
        }
    ]
}'
And when I want to pull the "message" field off that response, I use this request:
curl 'localhost:9200/_mget' -d '{
    "docs" : [
        {
            "_index" : "logs-2017-04-30",
            "_id" : "e72927c2-751c-4b33-86de-44a494abf78f",
            "fields" : ["message"]
        }
    ]
}'
Both of the above queries return the log and information that I am looking for.
But when I try to translate it to Java like this:
MultiGetRequestBuilder request = client.prepareMultiGet();
request.add("logs-2017-04-30", null, "e72927c2-751c-4b33-86de-44a494abf78f");
MultiGetResponse mGetResponse = request.get();
for (MultiGetItemResponse itemResponse : mGetResponse.getResponses()) {
    GetResponse response = itemResponse.getResponse();
    logger.debug("Outputing object: " + ToStringBuilder.reflectionToString(response));
}
I appear to be getting null objects back. When I try to grab the message field off the null-looking GetResponse object, nothing is there:
GetField field = response.getField("message"); <--- returns null
What am I doing wrong? A REST call to Elasticsearch proves the log exists, but my Java call is wrong somehow.
The documentation page for the Java multi get completely skips over the extra syntax required to retrieve data beyond the _source field. Just like the REST API, doing a multi get with the minimum information required to locate a log gets very limited information about it. In order to get specific fields from a log in a multi get call through the Java API, you must pass in a MultiGetRequest.Item to the builder. This item needs to have the fields you want specified in it before you execute the request.
Here is the code change (broken into multiple lines for clarity) that results in the fields I want being present when I make the query:
MultiGetRequestBuilder request = client.prepareMultiGet();
// Pass null for the type (as in the working request above) so any type matches.
MultiGetRequest.Item item =
        new MultiGetRequest.Item("logs-2017-04-30", null, "e72927c2-751c-4b33-86de-44a494abf78f");
item.fields("message"); // request the specific field(s) you want back
request.add(item);
MultiGetResponse mGetResponse = request.get();
Now I can ask for the field I specified earlier:
GetField field = response.getField("message");
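For completeness, a hedged sketch of pulling the actual value out of the response (GetField.getValue() returns the first value of the field; the null checks are defensive assumptions, not something the original answer shows):
for (MultiGetItemResponse itemResponse : mGetResponse.getResponses()) {
    GetResponse response = itemResponse.getResponse();
    if (response != null && response.isExists()) {
        GetField field = response.getField("message");
        if (field != null) {
            logger.debug("message = " + field.getValue());
        }
    }
}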

Bulk indexing using Elasticsearch

Until now I was indexing data into Elasticsearch document by document, and as the data volume grew this became very slow and not an optimal approach. So I searched for a bulk-insert mechanism and found the Elasticsearch Bulk API. The documentation on the official site confused me. The approach I am using is to pass the data as a WebRequest and execute it on the Elasticsearch server. To create a batch/bulk insert request, the API wants us to form a template like
localhost:9200/_bulk as URL and
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
to index a document with id 1 and field1 set to value1. The API also suggests sending the data as unpretty (newline-delimited) JSON. So how should I structure my data to pass multiple documents with multiple properties?
I tried the following in the Firefox RESTClient, with POST and a JSON header, but RESTClient throws an error and I know it's not valid JSON:
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" },
{ "Name" : "CHRIS","Age" : "23" },"Gender" : "M"}
Your data is not well-formed:
You don't need the comma after the first line
You're missing a closing } on the first line
You have a closing } in the middle of your second line; you need to remove it as well.
The correct way of formatting your data for a bulk insert looks like this:
curl -XPOST localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" }}
{ "Name" : "CHRIS", "Age" : "23", "Gender" : "M" }
'
This will work.
UPDATE
Using Postman on Chrome it looks like the screenshot below. Make sure to add a new line after line 2:
[screenshot: the bulk request in Postman with a trailing newline after the source line]
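If you are assembling the request in code rather than with curl or Postman, a minimal Java sketch of building and POSTing the NDJSON body could look like this (HttpURLConnection is my assumption for the HTTP client, since the question only mentions a WebRequest; index and field names mirror the example above):
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BulkInsert {
    public static void main(String[] args) throws Exception {
        // One metadata line, then one source line per document;
        // the whole body must end with a newline.
        String body =
                "{ \"index\" : { \"_index\" : \"indexName\", \"_type\" : \"type1\", \"_id\" : \"111\" }}\n"
              + "{ \"Name\" : \"CHRIS\", \"Age\" : \"23\", \"Gender\" : \"M\" }\n";

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:9200/_bulk").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/x-ndjson");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}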
Using Elasticsearch 7.9.2:
When sending the bulk update I was getting the newline error shown below:
[screenshot: failed bulk update without a trailing newline]
This is weird, but after adding a newline at the end of all the operations it works fine with Postman; notice line number 5 in the screenshot below:
[screenshot: bulk update succeeding after adding a newline at the end of all the commands in Postman]

File Download through Rest Call after querying from MongoDB and creating a csv Through Java

Mongo Collection has the following data...
{ "_id" : "Sims", "count" : 32 }
{ "_id" : "Autumn", "count" : 35 }
{ "_id" : "Becker", "count" : 35 }
{ "_id" : "Cecile", "count" : 40 }
{ "_id" : "Poole", "count" : 32 }
{ "_id" : "Nanette", "count" : 31 }
Through a REST call, taking the id from the URL, can I query MongoDB through Java and download a CSV file containing all the data for that id?
Which APIs can be used for the file download? Note that the file is not located on some server.
Flow should be as follows:
http://localhost:8080/Application/Poole
Now my Java code would be something like
MongoClient mongoClient = new MongoClient("localhost", 27017);
MongoDatabase mongoDatabase = mongoClient.getDatabase("test123");
MongoCollection<Document> mongoCollection = mongoDatabase.getCollection("testcoll");
mongoCollection.find(Filters.eq("_id", id));
The returned query result should be written to a CSV file, and the file download should then happen.
Also, is there any API to convert the result of the Mongo query to a CSV file?
The best solution could be to use a plugin for saving the file and write custom code which converts your JSON data (coming from the MongoDB collection) to CSV. I have written such code for myself, but it's in JavaScript. Let me know if you need it. Thanks.
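Since the JavaScript version is not shown, here is a hedged Java sketch of the same idea: query by the id taken from the URL and stream the result back as a CSV download. The servlet wiring, database name (test123) and collection name (testcoll) are taken from the question's pseudocode and are assumptions:
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;

public class CsvDownloadServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // e.g. http://localhost:8080/Application/Poole -> id = "Poole"
        String id = req.getPathInfo().substring(1);

        // Opening a client per request is only for illustration; share one in real code.
        try (MongoClient mongoClient = new MongoClient("localhost", 27017)) {
            MongoCollection<Document> collection =
                    mongoClient.getDatabase("test123").getCollection("testcoll");

            // Content-Disposition makes the browser treat the response as a file download.
            resp.setContentType("text/csv");
            resp.setHeader("Content-Disposition", "attachment; filename=\"" + id + ".csv\"");

            PrintWriter out = resp.getWriter();
            out.println("_id,count"); // header row matching the sample documents
            for (Document doc : collection.find(Filters.eq("_id", id))) {
                out.println(doc.getString("_id") + "," + doc.get("count"));
            }
        }
    }
}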

How to perform Sum on a Map Key in the Mongo DB document within Spring

My MongoDB document looks something like as following:
{
    "_class" : "com.foo.foo.FooClass",
    "_id" : ObjectId("5441948f3004e65fbda72d9c"),
    "actionType" : "LOGIN",
    "actor" : "bolt",
    "extraDataMap" : {
        "workHours" : NumberLong(11869)
    }
}
Here extraDataMap is a HashMap stored from the Java code. I have to get all the documents where "actionType" is "LOGIN", group on "actor", and sum all the "workHours" for each individual actor.
If I run the query below on MongoDB directly, it works:
db.activityLog.aggregate([
    { $match : { actionType : "LOGIN" } },
    { $group : { "_id" : "$actor", "hours" : { "$sum" : "$extraDataMap.workHours" } } },
    { $sort : { _id : 1 } }
]);
But if I run the query from Java code:
TypedAggregation<ActivityLog> agg = Aggregation.newAggregation(ActivityLog.class,
        buildCriteria(),
        group("actor").sum("extraDataMap.workHours").as("hours"),
        sort(Sort.Direction.ASC, MongoActivityLogRepository.DOCUMENT_ID_FIELD_NAME)
);
AggregationResults<ActivityLog> result = mongoOperations.aggregate(agg, ActivityLog.class);
List<ActivityLog> results = result.getMappedResults();
it gives the error below:
Caused by: org.springframework.data.mapping.PropertyReferenceException: No property work found for type java.lang.String
at org.springframework.data.mapping.PropertyPath.<init>(PropertyPath.java:75)
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:327)
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:353)
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:307)
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:290)
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:274)
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:245)
at org.springframework.data.mongodb.core.aggregation.TypeBasedAggregationOperationContext.getReference(TypeBasedAggregationOperationContext.java:91)
at org.springframework.data.mongodb.core.aggregation.GroupOperation$Operation.getValue(GroupOperation.java:359)
at org.springframework.data.mongodb.core.aggregation.GroupOperation$Operation.toDBObject(GroupOperation.java:355)
at org.springframework.data.mongodb.core.aggregation.GroupOperation.toDBObject(GroupOperation.java:300)
at org.springframework.data.mongodb.core.aggregation.Aggregation.toDbObject(Aggregation.java:228)
at org.springframework.data.mongodb.core.MongoTemplate.aggregate(MongoTemplate.java:1287)
at org.springframework.data.mongodb.core.MongoTemplate.aggregate(MongoTemplate.java:1264)
at org.springframework.data.mongodb.core.MongoTemplate.aggregate(MongoTemplate.java:1253)
Really appreciate all the prompt responses :)
I had the same problem as you, and I found this solution:
Instead of using a TypedAggregation, use a plain Aggregation. This way, Spring Data won't perform type checking.
It would be as follows:
Aggregation agg = Aggregation.newAggregation(
        buildCriteria(),
        group("actor").sum("extraDataMap.workHours").as("hours"),
        sort(Sort.Direction.ASC, MongoActivityLogRepository.DOCUMENT_ID_FIELD_NAME)
);
List<ActivityLog> results = mongoOperations.aggregate(agg,
        mongoOperations.getCollectionName(ActivityLog.class), ActivityLog.class).getMappedResults();
Note that I used a different mongoOperations.aggregate signature: since we are not using a TypedAggregation, we have to indicate which collection we are running the aggregation against.
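The buildCriteria() call from the question is not shown; a plausible reconstruction, assuming it mirrors the $match stage of the working shell query, would be:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.match;

import org.springframework.data.mongodb.core.aggregation.MatchOperation;
import org.springframework.data.mongodb.core.query.Criteria;

// Hypothetical reconstruction: the $match stage from the shell query.
private MatchOperation buildCriteria() {
    return match(Criteria.where("actionType").is("LOGIN"));
}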
I hope this helps you.

Elasticsearch - how to return only data, not meta information?

When doing a search, Elasticsearch returns a data structure that contains various meta information.
The actual result set is contained within a "hits" field within the JSON result returned from the database.
Is it possible for Elasticsearch to return only the needed data (the contents of the "hits" field) without it being embedded within all the other meta data?
I know I could parse the result into JSON and extract it, but I don't want the complexity, hassle, and performance hit.
thanks!
Here is an example of the data structure that Elasticsearch returns.
{
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 1,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1",
                "_source" : {
                    "user" : "kimchy",
                    "postDate" : "2009-11-15T14:12:12",
                    "message" : "trying out Elastic Search"
                }
            }
        ]
    }
}
You can at least filter the results, even if you cannot extract them. The "common options" page of the REST API explains the "filter_path" option. This lets you filter only the portions of the tree you are interested in. The tree structure is still the same, but without the extra metadata.
I generally add the query option:
&filter_path=hits.hits.*,aggregations.*
The documentation doesn't say anything about this making your query any faster (I doubt that it does), but at least you could return only the interesting parts.
Corrected to show only hits.hits.*, since the top level "hits" has metadata as well.
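As an illustration (index and document taken from the example above), a request with filter_path and its trimmed response would look roughly like this:
curl 'localhost:9200/twitter/_search?q=user:kimchy&filter_path=hits.hits.*'

{
    "hits" : {
        "hits" : [ {
            "_index" : "twitter",
            "_type" : "tweet",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
                "user" : "kimchy",
                "postDate" : "2009-11-15T14:12:12",
                "message" : "trying out Elastic Search"
            }
        } ]
    }
}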
No, it's not possible at the moment. If performance and parsing complexity are the main concerns, you might want to consider using different clients: the Java client or the Thrift plugin, for example.
