Spring Data elastic search with out entity fields - elasticsearch

I'm using spring data elastic search, Now my document do not have any static fields, and it is accumulated data per qtr, I will be getting ~6GB/qtr (we call them as versions). Lets say we get 5GB of data in Jan 2021 with 140 columns, in the next version I may get 130 / 120 columns, which we do not know, The end user requirement is to get the information from the database and show it in a tabular format, and he can filter the data. In MongoDB we have BasicDBObject, do we have anything in springboot elasticsearch
I can provide, let say 4-5 columns which are common in every version record and apart from that, I need to retrieve the data without mentioning the column names in the pojo, and I need to use filters on them just like I can do in MongoDB
List<BaseClass> getMultiSearch(#RequestBody Map<String, Object>[] attributes) {
Query orQuery = new Query();
Criteria orCriteria = new Criteria();
List<Criteria> orExpression = new ArrayList<>();
for (Map<String, Object> accounts : attributes) {
Criteria expression = new Criteria();
accounts.forEach((key, value) -> expression.and(key).is(value));
orExpression.add(expression);
}
orQuery.addCriteria(orCriteria.orOperator(orExpression.toArray(new Criteria[orExpression.size()])));
return mongoOperations.find(orQuery, BaseClass.class);
}

You can define an entity class for example like this:
public class GenericEntity extends LinkedHashMap<String, Object> {
}
To have that returned in your calling site:
public SearchHits<GenericEntity> allGeneric() {
var criteria = Criteria.where("fieldname").is("value");
Query query = new CriteriaQuery(criteria);
return operations.search(query, GenericEntity.class, IndexCoordinates.of("indexname"));
}
But notice: when writing data into Elasticsearch, the mapping for new fields/properties in that index will be dynamically updated. And there is a limit as to how man entries a mapping can have (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). So take care not to run into that limit.

Related

Mongo Aggregation of $where: function() using MongoTemplate in SpringBoot [duplicate]

i'm trying to compare two string in mongoDB spring Data.
My Code:
#GET
#Path("/reqvolatility")
#Produces(MediaType.APPLICATION_JSON)
public long getReqVolatility() throws JsonParseException, JsonMappingException, IOException{
String query = "{},{_id:0,levelId:1,reqID:1,creationTime:1,lastModified:1}";
Query query1 = new BasicQuery(query);
query1.addCriteria(Criteria.where("creationTime").ne("lastModified"));
long reqvolatility = getMongoOperation().count(query1,RequirmentVO.class);
return reqvolatility;
}
In the above code "creationTime" & "lastModified" columns in mongoDB.I'm comparing those two fields, but its not giving correct count.
Is this correct? if it is wrong, How can i compare two fileds?
Standard query operations do not compare the values of one field against another. In order to do this, you need to employ the JavaScript evaluation server side which can actually compare the two field values:
Assuming both fields are ISODate instances
BasicQuery query = new BasicQuery(
new BasicDBObject("$where", "this.creationTime.getTime() != this.lastModified.getTime()")
);

Spring jdbcTemplate -add filter to query

I write a queries in jdbcTemplate to create reports, Now I want to add filter to the reports. for example if i have query to create report of all contacts per day, now I want to filter it that be just between two dates not all
What the best way to do it ?
There is a special way to do it in Spring jdbcTemplate?
See following tutorial and example code taken from that one below.
public Person select(String name){
Map<String, Object> parameters = new HashMap<String, Object>();
parameters.put("name", name);
String selectAllSql = "SELECT * FROM PERSON where name = :name";
List<Person> persons = getJdbcTemplate().query(selectAllSql, new PersonRowMapper(),parameters);
return persons.get(0);
}
}
To answer your comment:
If some times i dont use the filter what i do with the parmeters what i have to send
You have two options
1) You will need to use if statements to construct your sql and parameters.
2) Use another library for this purpose. I used ElSql in production before.

saving & updating full json document with Spring data MongoTemplate

I'm using Spring data MongoTemplate to manage mongo operations. I'm trying to save & update json full documents (using String.class in java).
Example:
String content = "{MyId": "1","code":"UG","variables":[1,2,3,4,5]}";
String updatedContent = "{MyId": "1","code":"XX","variables":[6,7,8,9,10]}";
I know that I can update code & variables independently using:
Query query = new Query(where("MyId").is("1"));
Update update1 = new Update().set("code", "XX");
getMongoTemplate().upsert(query, update1, collectionId);
Update update2 = new Update().set("variables", "[6,7,8,9,10]");
getMongoTemplate().upsert(query, update2, collectionId);
But due to our application architecture, it could be more useful for us to directly replace the full object. As I know:
getMongoTemplate().save(content,collectionId)
getMongoTemplate().save(updatedContent,collectionId)
implements saveOrUpdate functionality, but this creates two objects, do not update anything.
I'm missing something? Any approach? Thanks
You can use Following Code :
Query query = new Query();
query.addCriteria(Criteria.where("MyId").is("1"));
Update update = new Update();
Iterator<String> iterator = json.keys();
while(iterator.hasNext()) {
String key = iterator.next();
if(!key.equals("MyId")) {
Object value = json.get(key);
update.set(key, value);
}
}
mongoTemplate.updateFirst(query, update, entityClass);
There may be some other way to get keyset from json, you can use according to your convenience.
You can use BasicDbObject to get keyset.
you can get BasicDbObject using mongoTemplate.getConverter().

Elasticearch and Spark: Updating existing entities

What is the correct way, when using Elasticsearch with Spark, to update existing entities?
I wanted to something like the following:
Get existing data as a map.
Create a new map, and populate it with the updated fields.
Persist the new map.
However, there are several issues:
The list of returned fields cannot contain the _id, as it is not part of the source.
If, for testing, I hardcode an existing _id in the map of new values, the following exception is thrown:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest
How should the _id be retrieved, and how should it be passed back to Spark?
I include the following code below to better illustrate what I was trying to do:
JavaRDD<Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, INDEX_NAME+"/"+TYPE_NAME,
"?source=,field1,field2).values();
Iterator<Map<String, Object>> iter = esRDD.toLocalIterator();
List<Map<String, Object>> listToPersist = new ArrayList<Map<String, Object>>();
while(iter.hasNext()){
Map<String, Object> map = iter.next();
// Get existing values, and do transformation logic
Map<String, Object> newMap = new HashMap<String, Object>();
newMap.put("_id", ??????);
newMap.put("field1", new_value);
listToPersist.add(newMap);
}
JavaRDD javaRDD = jsc.parallelize(ImmutableList.copyOf(listToPersist));
JavaEsSpark.saveToEs(javaRDD, INDEX_NAME+"/"+TYPE_NAME);
Ideally, I would want to update the existing map in place, rather than create a new one.
Does anyone have any example code to show, when using Spark, the correct way to update existing entities in elasticsearch?
Thanks
This is how I've done it (Scala/Spark 2.3/Elastic-Hadoop v6.5).
To read (id or other metadata):
spark
.read
.format("org.elasticsearch.spark.sql")
.option("es.read.metadata",true) // allow to read metadata
.load("yourindex/yourtype")
.select(col("_metadata._id").as("myId"),...)
To update particular columns in ES:
myDataFrame
.select("myId","columnToUpdate")
.saveToEs(
"yourindex/yourtype",
Map(
"es.mapping.id" -> "myId",
"es.write.operation" -> "update", // important to change operation to partial update
"es.mapping.exclude" -> "myId"
)
)
Try adding this upsert to your Spark:
.config("es.write.operation", "upsert")
that will let you add new fields to existing documents
According to Elasticsearch Configuration you can get document metadata like _id by set read metadata option to true:
.config("es.read.metadata", "true")
And i think you cannot use '_id' as field name.
But you can create new field with different name like:
newMap.put("idfield", yourId);
then set name of the new field as a value for mapping id option to inform elastic that this field has the document id:
.config("es.mapping.id", "idfield")
BTW don't forget to set write operation as update:
.config("es.write.operation", "update")

How to iterate through Elasticsearch source using Apache Spark?

I am trying to build a recommendation system by integrating Elasticsearch with Apache Spark. I am using Java. I am using movilens dataset as example data. I have indexed the data to Elasticsearch as well. So far, I have been able to read the input from Elasticsearch index as follows:
SparkConf conf = new SparkConf().setAppName("Example App").setMaster("local");
conf.set("spark.serializer", org.apache.spark.serializer.KryoSerializer.class.getName());
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc, "movielens/recommendation");
Using esRDD.collect() function, I can see that I am retrieving the data from elastic search correctly. Now I need to feed the user id, item id and preference from the Elasticsearch result to Spark's recommendation. If I am using a csv file, I would be able to do it as follows:
String path = "resources/user_data.data";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<Rating> ratings = data.map(
new Function<String, Rating>() {
public Rating call(String s) {
String[] sarray = s.split(" ");
return new Rating(Integer.parseInt(sarray[0]), Integer.parseInt(sarray[1]),
Double.parseDouble(sarray[2]));
}
}
);
What could be an equivalent mapping if I need to iterate through the elastic search output stored in esRDD and create a similar map as above? If there is any example code that I could refer to, that would be of great help.
Apologies for not answering the Spark question directly, but in case you missed it, there is a description of doing recommendations on MovieLens data using elasticsearch here: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_significant_terms_demo.html
You have not specified the format of the data in ElasticSearch. But let's assume it has fields userId, movieId and rating so an example document looks something like {"userId":1,"movieId":1,"rating":4}.
Then you should be able to do (ignoring null checks etc):
JavaRDD<Rating> ratings = esRDD.map(
new Function<Map<String, Object>, Rating>() {
public Rating call(Map<String, Object> m) {
Int userId = Integer.parseInt(m.get("userId"));
Int movieId = Integer.parseInt(m.get("movieId"));
Double rating = Double.parseDouble(m.get("rating"));
return new Rating(userId, movieId, rating);
}
}
);

Resources