Elasticsearch and Spark: Updating existing entities - elasticsearch

What is the correct way, when using Elasticsearch with Spark, to update existing entities?
I wanted to do something like the following:
Get existing data as a map.
Create a new map, and populate it with the updated fields.
Persist the new map.
However, there are several issues:
The list of returned fields cannot contain the _id, as it is not part of the source.
If, for testing, I hardcode an existing _id in the map of new values, the following exception is thrown:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest
How should the _id be retrieved, and how should it be passed back to Spark?
I include the following code below to better illustrate what I was trying to do:
JavaRDD<Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, INDEX_NAME+"/"+TYPE_NAME,
"?source=,field1,field2").values();
Iterator<Map<String, Object>> iter = esRDD.toLocalIterator();
List<Map<String, Object>> listToPersist = new ArrayList<Map<String, Object>>();
while(iter.hasNext()){
Map<String, Object> map = iter.next();
// Get existing values, and do transformation logic
Map<String, Object> newMap = new HashMap<String, Object>();
newMap.put("_id", ??????);
newMap.put("field1", new_value);
listToPersist.add(newMap);
}
JavaRDD javaRDD = jsc.parallelize(ImmutableList.copyOf(listToPersist));
JavaEsSpark.saveToEs(javaRDD, INDEX_NAME+"/"+TYPE_NAME);
Ideally, I would want to update the existing map in place, rather than create a new one.
Does anyone have any example code to show, when using Spark, the correct way to update existing entities in elasticsearch?
Thanks

This is how I've done it (Scala/Spark 2.3/Elastic-Hadoop v6.5).
To read (id or other metadata):
spark
.read
.format("org.elasticsearch.spark.sql")
.option("es.read.metadata",true) // allow to read metadata
.load("yourindex/yourtype")
.select(col("_metadata._id").as("myId"),...)
To update particular columns in ES:
myDataFrame
.select("myId","columnToUpdate")
.saveToEs(
"yourindex/yourtype",
Map(
"es.mapping.id" -> "myId",
"es.write.operation" -> "update", // important to change operation to partial update
"es.mapping.exclude" -> "myId"
)
)

Try setting the write operation to upsert in your Spark configuration:
.config("es.write.operation", "upsert")
That will let you add new fields to existing documents.

According to the Elasticsearch configuration documentation, you can get document metadata such as _id by setting the read metadata option to true:
.config("es.read.metadata", "true")
I don't think you can use '_id' as a field name, but you can create a new field with a different name, for example:
newMap.put("idfield", yourId);
Then set the name of that field as the value of the mapping id option, to tell Elasticsearch that this field holds the document id:
.config("es.mapping.id", "idfield")
Also, don't forget to set the write operation to update:
.config("es.write.operation", "update")
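To tie the answers above together, here is a minimal sketch in plain Java (no Spark or Elasticsearch dependency) of what each map handed to saveToEs could look like. As far as I can tell, the pair RDD returned by JavaEsSpark.esRDD is keyed by document id, so calling .values() on it, as the question does, throws the ids away. The class name, the method names and the "idfield" name are hypothetical, chosen only for illustration; the three setting keys are the ones quoted in the answers.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class EsUpdateSketch {
    // Hypothetical name of the field that carries the document id.
    // It must match the value given to es.mapping.id.
    static final String ID_FIELD = "idfield";

    // The connector settings quoted in the answers, collected in one map.
    static Map<String, String> writeSettings() {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("es.mapping.id", ID_FIELD);      // which field holds the _id
        cfg.put("es.write.operation", "update"); // partial update, not index
        cfg.put("es.mapping.exclude", ID_FIELD); // keep the helper field out of _source
        return cfg;
    }

    // Builds the partial document for one entity: only the changed fields
    // plus the id helper field. The input map is left untouched.
    static Map<String, Object> toUpdateDoc(String id, Map<String, Object> changedFields) {
        Map<String, Object> doc = new HashMap<>(changedFields);
        doc.put(ID_FIELD, id);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> changed = new HashMap<>();
        changed.put("field1", "new value");
        System.out.println(toUpdateDoc("some-existing-id", changed));
        System.out.println(writeSettings());
    }
}
```

In a real job you would presumably build these maps inside a map over the pair RDD (id as key, source as value) rather than collecting everything to the driver with toLocalIterator, and pass the settings map as the extra argument to saveToEs.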

Related

Spring Data Elasticsearch without entity fields

I'm using Spring Data Elasticsearch. My documents do not have any static fields; the data accumulates per quarter, at roughly 6 GB per quarter (we call these versions). Let's say we get 5 GB of data in Jan 2021 with 140 columns; the next version may have 130 or 120 columns, and we do not know in advance. The end user needs to get the information from the database, see it in a tabular format, and filter it. In MongoDB we have BasicDBObject; is there anything similar in Spring Boot Elasticsearch?
I can provide, let's say, 4-5 columns that are common to every version record. Apart from those, I need to retrieve the data without naming the columns in the POJO, and I need to filter on them just as I can in MongoDB:
List<BaseClass> getMultiSearch(@RequestBody Map<String, Object>[] attributes) {
Query orQuery = new Query();
Criteria orCriteria = new Criteria();
List<Criteria> orExpression = new ArrayList<>();
for (Map<String, Object> accounts : attributes) {
Criteria expression = new Criteria();
accounts.forEach((key, value) -> expression.and(key).is(value));
orExpression.add(expression);
}
orQuery.addCriteria(orCriteria.orOperator(orExpression.toArray(new Criteria[orExpression.size()])));
return mongoOperations.find(orQuery, BaseClass.class);
}
You can define an entity class for example like this:
public class GenericEntity extends LinkedHashMap<String, Object> {
}
To have that returned in your calling site:
public SearchHits<GenericEntity> allGeneric() {
var criteria = Criteria.where("fieldname").is("value");
Query query = new CriteriaQuery(criteria);
return operations.search(query, GenericEntity.class, IndexCoordinates.of("indexname"));
}
But note: when writing data into Elasticsearch, the mapping for new fields/properties in that index is updated dynamically, and there is a limit on how many entries a mapping can have (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). Take care not to run into that limit.

Update data in mongo Db using spring (MongoTemplate)

I have a collection (A) in MongoDB whose documents have two fields (a String and an integer). I want to update a document by adding some value to the integer field.
For example, say I have a document A[field1 : ABC, field2 : 25]. I want to add, say, 5 to it, so that it looks like A[field1 : ABC, field2 : 30] after the update.
The code I have used for this is as follows:
Query query = new Query();
query.addCriteria(Criteria.where("field1").is("ABC"));
BeanName beanName = template.findOne(query, BeanName.class,collectionName);
if(null != beanName){
Update update = new Update();
update.set("field1", "ABC");
update.set("field2", beanName.getField2() + 5);
template.updateFirst(query, update, BeanName.class,collectionName);
}
else{
template.save(beanName, collectionName); // the values of field1 and field2 are populated in a bean instance 'beanName'
}
The code works fine and gives the expected results, but it is very slow. Is there a more efficient way to do this?
I have a large amount of data to update.
I would suggest using the findAndModify() method in combination with the upsert = true option. The official documentation is here:
https://docs.mongodb.org/manual/reference/method/db.collection.findAndModify/

saving & updating full json document with Spring data MongoTemplate

I'm using Spring data MongoTemplate to manage mongo operations. I'm trying to save & update json full documents (using String.class in java).
Example:
String content = "{\"MyId\": \"1\",\"code\":\"UG\",\"variables\":[1,2,3,4,5]}";
String updatedContent = "{\"MyId\": \"1\",\"code\":\"XX\",\"variables\":[6,7,8,9,10]}";
I know that I can update code & variables independently using:
Query query = new Query(where("MyId").is("1"));
Update update1 = new Update().set("code", "XX");
getMongoTemplate().upsert(query, update1, collectionId);
Update update2 = new Update().set("variables", "[6,7,8,9,10]");
getMongoTemplate().upsert(query, update2, collectionId);
But due to our application architecture, it could be more useful for us to directly replace the full object. As I know:
getMongoTemplate().save(content,collectionId)
getMongoTemplate().save(updatedContent,collectionId)
implements saveOrUpdate functionality, but this creates two objects and does not update anything.
Am I missing something? Is there another approach? Thanks
You can use the following code:
Query query = new Query();
query.addCriteria(Criteria.where("MyId").is("1"));
Update update = new Update();
Iterator<String> iterator = json.keys();
while(iterator.hasNext()) {
String key = iterator.next();
if(!key.equals("MyId")) {
Object value = json.get(key);
update.set(key, value);
}
}
mongoTemplate.updateFirst(query, update, entityClass);
There may be other ways to get the key set from the JSON; use whichever is convenient.
For example, you can use BasicDBObject to get the key set.
You can obtain a BasicDBObject via mongoTemplate.getConverter().

How to iterate through Elasticsearch source using Apache Spark?

I am trying to build a recommendation system by integrating Elasticsearch with Apache Spark. I am using Java, with the MovieLens dataset as example data. I have already indexed the data into Elasticsearch. So far, I have been able to read the input from the Elasticsearch index as follows:
SparkConf conf = new SparkConf().setAppName("Example App").setMaster("local");
conf.set("spark.serializer", org.apache.spark.serializer.KryoSerializer.class.getName());
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc, "movielens/recommendation");
Using esRDD.collect() function, I can see that I am retrieving the data from elastic search correctly. Now I need to feed the user id, item id and preference from the Elasticsearch result to Spark's recommendation. If I am using a csv file, I would be able to do it as follows:
String path = "resources/user_data.data";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<Rating> ratings = data.map(
new Function<String, Rating>() {
public Rating call(String s) {
String[] sarray = s.split(" ");
return new Rating(Integer.parseInt(sarray[0]), Integer.parseInt(sarray[1]),
Double.parseDouble(sarray[2]));
}
}
);
What could be an equivalent mapping if I need to iterate through the elastic search output stored in esRDD and create a similar map as above? If there is any example code that I could refer to, that would be of great help.
Apologies for not answering the Spark question directly, but in case you missed it, there is a description of doing recommendations on MovieLens data using elasticsearch here: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_significant_terms_demo.html
You have not specified the format of the data in ElasticSearch. But let's assume it has fields userId, movieId and rating so an example document looks something like {"userId":1,"movieId":1,"rating":4}.
Then you should be able to do (ignoring null checks etc):
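// esRDD is a pair RDD (document id -> source map), so map over its values
JavaRDD<Rating> ratings = esRDD.values().map(
    new Function<Map<String, Object>, Rating>() {
        public Rating call(Map<String, Object> m) {
            // The map values are typed as Object, so go through their string form
            int userId = Integer.parseInt(m.get("userId").toString());
            int movieId = Integer.parseInt(m.get("movieId").toString());
            double rating = Double.parseDouble(m.get("rating").toString());
            return new Rating(userId, movieId, rating);
        }
    }
);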
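Since the snippet above ignores null checks, here is a hedged sketch of the per-document conversion with those checks added, written in plain Java so it can be tried without Spark. SimpleRating is a hypothetical stand-in for Spark's Rating class, and the field names userId, movieId and rating are the same assumption about the document layout made above.

```java
import java.util.HashMap;
import java.util.Map;

class RatingSketch {
    // Hypothetical stand-in for Spark MLlib's Rating, so this compiles without Spark.
    static final class SimpleRating {
        final int user;
        final int product;
        final double rating;

        SimpleRating(int user, int product, double rating) {
            this.user = user;
            this.product = product;
            this.rating = rating;
        }
    }

    // Reads a numeric field whether it arrived as a Number or as a numeric string.
    // Throws IllegalArgumentException if the field is missing or not numeric.
    static double numberOf(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (value == null) {
            throw new IllegalArgumentException("missing field: " + field);
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return Double.parseDouble(value.toString());
    }

    // Converts one source map into a rating triple.
    static SimpleRating toRating(Map<String, Object> doc) {
        return new SimpleRating(
            (int) numberOf(doc, "userId"),
            (int) numberOf(doc, "movieId"),
            numberOf(doc, "rating"));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("userId", 1);
        doc.put("movieId", 1);
        doc.put("rating", 4);
        SimpleRating r = toRating(doc);
        System.out.println(r.user + " " + r.product + " " + r.rating);
    }
}
```

Inside the Spark job, the body of call() could delegate to a helper like toRating and build a real Rating from the three values. Whether to fail on a bad document or filter it out beforehand is a design choice; this sketch fails fast.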

How to update multiple fields using java api elasticsearch script

I am trying to update multiple values in an index using the Java API through an Elasticsearch script, but I am not able to update the fields.
Sample code:
1:
UpdateResponse response = request.setScript("ctx._source").setScriptParams(scriptParams).execute().actionGet();
2:
UpdateResponse response = request.setScript("ctx._source.").setScriptParams(scriptParams).execute().actionGet();
If I include the dot in ("ctx._source.") I get an IllegalArgumentException; if I leave the dot out, no exception is thrown but the values are not updated in the index.
Can anyone tell me how to resolve this?
First of all, your script (ctx._source) doesn't do anything, as one of the commenters already pointed out. If you want to update, say, field "a", then you would need a script like:
ctx._source.a = "foobar"
This would assign the string "foobar" to field "a". You can do more than simple assignment, though. Check out the docs for more details and examples:
http://www.elasticsearch.org/guide/reference/api/update/
Updating multiple fields with one script is also possible. You can use semicolons to separate different MVEL instructions. E.g.:
ctx._source.a = "foo"; ctx._source.b = "bar"
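If the list of fields is only known at runtime, the script string can be assembled rather than hard-coded. Below is a small sketch in plain Java, assuming (hypothetically) that every field is fed from a script parameter with the same name as the field; the class and method names are made up for illustration, and field names are not validated or escaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class ScriptSketch {
    // Builds one assignment per field, separated by semicolons.
    // Assumes each field has a script parameter of the same name.
    static String assignAll(List<String> fields) {
        StringJoiner script = new StringJoiner("; ");
        for (String field : fields) {
            script.add("ctx._source." + field + " = " + field);
        }
        return script.toString();
    }

    public static void main(String[] args) {
        // prints: ctx._source.a = a; ctx._source.b = b
        System.out.println(assignAll(Arrays.asList("a", "b")));
    }
}
```

The resulting string would then be passed to setScript, with one script parameter registered per field name.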
Elasticsearch has an Update Java API. Look at the following code:
client.prepareUpdate("index","type","1153")
.addScriptParam("assignee", assign)
.addScriptParam("newobject", responsearray)
.setScript("ctx._source.assignee=assignee;ctx._source.responsearray=newobject ").execute().actionGet();
Here, the assign variable contains an object value and the responsearray variable contains a list of data.
You can do the same using spring java client using the following code. I am also listing the dependencies used in the code.
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.index.query.QueryBuilder;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.data.elasticsearch.core.query.UpdateQueryBuilder;
private UpdateQuery updateExistingDocument(String Id) {
// Add updatedDateTime, CreatedDateTime, CreateBy, UpdatedBy field in existing documents in Elastic Search Engine
UpdateRequest updateRequest = new UpdateRequest().doc("UpdatedDateTime", new Date(), "CreatedDateTime", new Date(), "CreatedBy", "admin", "UpdatedBy", "admin");
// Create updateQuery
UpdateQuery updateQuery = new UpdateQueryBuilder().withId(Id).withClass(ElasticSearchDocument.class).build();
updateQuery.setUpdateRequest(updateRequest);
// Execute update
elasticsearchTemplate.update(updateQuery);
// The method is declared to return UpdateQuery, so hand the query back
return updateQuery;
}
XContentType contentType =
org.elasticsearch.client.Requests.INDEX_CONTENT_TYPE;
public XContentBuilder getBuilder(User assign){
try {
XContentBuilder builder = XContentFactory.contentBuilder(contentType);
builder.startObject();
Map<String,?> assignMap=objectMap.convertValue(assign, Map.class);
builder.field("assignee",assignMap);
builder.endObject(); // close the object opened by startObject()
return builder;
} catch (IOException e) {
log.error("custom field index",e);
return null; // the caller has to handle a failed build
}
}
IndexRequest indexRequest = new IndexRequest();
indexRequest.source(getBuilder(assign));
UpdateQuery updateQuery = new UpdateQueryBuilder()
.withType(<IndexType>)
.withIndexName(<IndexName>)
.withId(String.valueOf(id))
.withClass(<IndexClass>)
.withIndexRequest(indexRequest)
.build();
