Fetching data between two ISODate from mongodb-spring - spring

Hi, in my Spring with MongoDB project I want to fetch data between two ISODates. How is this possible? Please help me.
The data in MongoDB looks like the following:
The collection is named mycollection and it contains a field named creationTime like this:
"creationTime" : {
"logtime" : ISODate("2013-09-12T08:39:07.227Z"),
"logtimeStr" : "12-09-2013 02:09:07",
"day" : 12,
"month" : 9,
"year" : 2013,
"hour" : 14,
"min" : 9,
"second" : 7
}
Now I want to retrieve data from this collection between two logtime values using Spring.
Please help.

I solved it:
Take the date from the user in dd-mm-yyyy format, remove the "-" characters, convert it into a Date using SimpleDateFormat and the Calendar class (also setting the hour, minute and second with Calendar), and build a query like this:
Query query = new Query().addCriteria(Criteria
.where("creationTime.logtime").gte(startDate).lte(endDate));

Related

field type mismatches in elasticsearch

I am dumping data into Elasticsearch from a JSON file exported from MongoDB. I am facing an issue where array fields from the JSON were converted into a string.
"_source" : {
"CITIES" : [
"ABC"
],
"CITY_AREAS" : """["COLONY (AIT)"]""",
"INTERESTS" : [
"CARS"
]}
I am not doing any mapping, and I know Elasticsearch uses its default mapping based on the very first document that was inserted into ES.
I want to find a solution where I can run an update command to change the fields to array type for all documents containing "CITY_AREAS", e.g.:
"CITY_AREAS" : ["COLONY (AIT)"]
P.S.: Some documents have the "CITY_AREAS" key and some don't.
You will need to reindex for this so that the field uses the correct mapping type; an update will not work.

Spring Data mongoDB : how to fetch one document per name based on date

I have a mongoDB request to perform which initially sounded pretty easy to me but made me sweat a lot for a result I’m not too convinced with.
I searched a lot for similar questions but found none really answering my need.
I’m using Spring Boot framework for my backend.
The objective is rather simple.
I have a collection of documents in database as in the example below:
[{"name" : "Bob",
"contract" : false,
"ranking" : 3,
"timestamp" : 1606867200},
{"name" : "Bob",
"contract" : true,
"ranking" : 5,
"timestamp" : 1606953600},
{"name" : "Roger",
"contract" : true,
"ranking" : 25,
"timestamp" : 1607040000},
{"name" : "Bob",
"contract" : true,
"ranking" : 5,
"timestamp" : 1607040000}]
I want to fetch one document per unique name where the timestamp (i.e. the date in epoch seconds) is the highest (so the latest date).
So in the example above I would have two results:
[{"name" : "Roger",
"contract" : true,
"ranking" : 25,
"timestamp" : 1607040000},
{"name" : "Bob",
"contract" : true,
"ranking" : 5,
"timestamp" : 1607040000}]
To do this, the only solution I found was to use an Aggregation.
This can be done in Spring using MongoTemplate or, as in my case, using aggregation repository methods, which also works fine; see here for details.
What I want is to fetch one document per name in Mongo, the document picked for each name being the one with the max timestamp.
From what I understand, this is not doable in a straightforward way.
We have to create a new document using the aggregation pipeline, telling it to group on the names (_id: '$name') and to pick the highest timestamp (timestamp: {$max: '$timestamp'}). However, doing so creates a document with only two fields, _id and timestamp, whereas my POJO in Spring needs (name, contract, ranking, timestamp). The only way I found to add the rest of the fields was by doing this:
String myAggregation = "{ $group: {" +
        "_id: '$name'," +
        "name: {$first: '$name'}," +
        "contract: {$first: '$contract'}," +
        "ranking: {$first: '$ranking'}," +
        "timestamp: {$max: '$timestamp'}" +
        "}}";
// find the latest scan status for each individual UID
@Aggregation(myAggregation)
List<HotlistStatusEntity> findLastStatusOfEachName();
But I feel that this could lead to undesired results in some cases, with $first picking its value from a different document than the one $max took its value from.
I lack a clear way of explaining my concern, but by doing this I do not fetch a list of existing documents; I recreate documents by picking values "here and there".
Although my tests never led to unexpected results, I'm still not confident with this.
So would there be a more secure or cleaner way to achieve this?
Thanks in advance for your help.
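One commonly suggested way to remove that ambiguity (not from the original post, so treat it as a sketch) is to $sort on timestamp descending before grouping and then take $first for every field, so all values come from the same, newest document per name:
// Sketch only: sorting newest-first before $group means every $first below reads
// from the same (latest) document per name, avoiding the $first/$max mismatch risk.
@Aggregation({
        "{ $sort: { timestamp: -1 } }",
        "{ $group: {" +
                "_id: '$name'," +
                "name: {$first: '$name'}," +
                "contract: {$first: '$contract'}," +
                "ranking: {$first: '$ranking'}," +
                "timestamp: {$first: '$timestamp'}" +
        "}}"
})
List<HotlistStatusEntity> findLastStatusOfEachName();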

Sum field and sort on Solr

I'm implementing a grouped search in Solr. I'm looking for a way to sum one field and sort the results by this sum. I hope the following data example makes it clearer.
{
[
{
"id" : 1,
"parent_id" : 22,
"valueToBeSummed": 3
},
{
"id" : 2,
"parent_id" : 22,
"valueToBeSummed": 1
},
{
"id" : 3,
"parent_id" : 33,
"valueToBeSummed": 1
},
{
"id" : 4,
"parent_id" : 5,
"valueToBeSummed": 21
}
]
}
If the search is made over this data I'd like to obtain
{
[
{
"numFound": 1,
"summedValue" : 21,
"parent_id" : 5
},
{
"numFound": 2,
"summedValue" : 4,
"parent_id" : 22
},
{
"numFound": 1,
"summedValue" : 1,
"parent_id" : 33
}
]
}
Do you have any advice on this ?
Solr 5.1+ (and 5.3) introduces Solr Facet functions to solve this exact issue.
From Yonik's introduction of the feature:
$ curl http://localhost:8983/solr/query -d 'q=*:*&
json.facet={
categories:{
type : terms,
field : cat,
sort : "x desc", // can also use sort:{x:desc}
facet:{
x : "avg(price)",
y : "sum(price)"
}
}
}
'
So the suggestion would be to upgrade to the newest version of Solr (the most recent version is currently 5.2.1; be advised that some of the syntax shown at the above link will only land in 5.3, the current release target).
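Adapted to the fields in the question, the same facet request could be sent from Java with SolrJ roughly like this (a sketch only: the core URL is made up, and the HttpSolrClient.Builder API comes from a newer SolrJ than the 5.x release discussed above):
// Sketch: group on parent_id and sort each bucket by the summed valueToBeSummed.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SumFacetExample {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                          // only the facet buckets are needed
        query.set("json.facet",
                "{ parents: { type: terms, field: parent_id, sort: 's desc'," +
                "  facet: { s: 'sum(valueToBeSummed)' } } }");
        QueryResponse response = client.query(query);
        System.out.println(response.getResponse().get("facets"));  // buckets: parent_id, count, s
        client.close();
    }
}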
So you want to group your results on the field parent_id and inside each group you want to sum up the fields valueToBeSummed and then you want to sort the entire results (the groups) by this new summedvalue field. That is a very interesting use case...
Unfortunately, I don't think there is a built in way of doing what you have asked.
There are function queries which you can use to sort, there is a group.func parameter also, but they will not do what you have asked.
Have you already indexed this data, or are you still in the process of charting out how to store it? If it's the latter, then one possible way would be to have a summedvalue field for each document and calculate it as and when a document gets indexed. For example, given the sample documents in your question, the first document would be indexed as
{
"id" : 1,
"parent_id" : 22,
"valueToBeSummed" : 3,
"summedvalue" : 3,
"timestamp" : current-timestamp
},
Before indexing the second document (id:2 with parent_id:22) you would run a Solr query to get the last indexed document with parent_id:22
Solr Query: q=parent_id:22&sort=timestamp desc&rows=1
and add the summedvalue of id:1 to the valueToBeSummed of id:2.
So the next document will be indexed as
{
"id" : 2,
"parent_id" : 22,
"valueToBeSummed" : 1,
"summedvalue" : 4,
"timestamp" : current-timestamp
}
and so on.
Once you have documents indexed this way, you can run a regular Solr query with &group=true&group.field=parent_id&sort=summedvalue desc.
Please do let us know how you decide to implement it. Like I said, it's a very interesting use case! :)
You can use the query below:
select?q=*:*&stats=true&stats.field={!tag=piv1 sum=true}valueToBeSummed&facet=true&facet.pivot={!stats=piv1 facet.sort=index}parent_id&wt=json&indent=true
You need to use the Stats Component for this requirement. You can get more information here. The idea is to first define what you need stats on. Here it is valueToBeSummed, and then we need to group on parent_id; we use facet.pivot for that.
Regarding sort: when we do grouping, the default sort order is based on the count in each group. We can sort on a value too. I have done this above using facet.sort=index, so it is sorted on parent_id, the field we grouped on. But your requirement is to sort on valueToBeSummed, which is different from the grouping attribute.
As of now I am not sure whether we can achieve that, but I will look into it and let you know.
In short, you have the grouping and the sum above; only the sort is still pending.
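For reference, the same stats/pivot request could be built with SolrJ along these lines (a sketch; it assumes a SolrClient named client is already set up as in the earlier example):
// Sketch: stats on valueToBeSummed tagged piv1, then a pivot facet on parent_id
// that references the tag, mirroring the query string above.
SolrQuery query = new SolrQuery("*:*");
query.set("stats", true);
query.set("stats.field", "{!tag=piv1 sum=true}valueToBeSummed");
query.set("facet", true);
query.set("facet.pivot", "{!stats=piv1 facet.sort=index}parent_id");
QueryResponse response = client.query(query);
// Each parent_id pivot bucket now carries the sum of valueToBeSummed in its stats.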

MongoDB complex find

I need to grab the top 3 results for each of 8 users. Currently I am looping over each user and making 8 calls to the db. Is there a way to structure the query to pull the same 8x3 dataset in a single db call?
selected_users = users.sample(8)
cur = 0
while cur <= selected_users.count - 1
  cursor = status_store.find({'user' => selected_users[cur]}, {:fields => params}).sort('score', -1).limit(3)
  # do something
  cur += 1
end
The collection I am pulling from looks like the one below. Each user can have an unbounded number of tweets, so I have not embedded them within a user document.
{
"_id" : ObjectId("51e92cc8e1ce7219e40003eb"),
"id_str" : "57915476419948544",
"score" : 904,
"text" : "Yesterday we had a bald eagle on the show. Oddly enough, he was in the country illegally.",
"timestamp" : "19/07/2013 08:10",
"user" : {
"id_str" : "115485051",
"name" : "Conan O'Brien",
"screen_name" : "ConanOBrien",
"description" : "The voice of the people. Sorry, people.",
}
}
Thanks in advance.
Yes, you can do this using the aggregation framework; a rough sketch follows below.
Another way would be to keep track of the top 3 scores in the user documents. Whether this is faster or not depends on how often you write scores vs. how often you read the top scores by user.
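For the aggregation route, a sketch in Spring Data MongoDB (Java, to match the rest of this page): the collection name, the user.id_str grouping key and the $slice projection (MongoDB 3.2+) are assumptions based on the sample document above.
// Sketch: restrict to the sampled users, sort by score, collect each user's
// documents, and keep only the top 3 per user in a single round trip.
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

import java.util.List;
import org.bson.Document;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.query.Criteria;

public class TopTweetsPerUser {
    public List<Document> topThreePerUser(MongoTemplate mongoTemplate, List<String> userIds) {
        Aggregation agg = newAggregation(
                match(Criteria.where("user.id_str").in(userIds)),
                sort(Sort.Direction.DESC, "score"),
                group("user.id_str").push(Aggregation.ROOT).as("tweets"),
                project().and("tweets").slice(3).as("topTweets"));      // $slice needs MongoDB 3.2+
        AggregationResults<Document> results =
                mongoTemplate.aggregate(agg, "statuses", Document.class); // "statuses" is a guessed collection name
        return results.getMappedResults();
    }
}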

jprante elasticsearch jdbc river changing the date value

I am trying to index MySQL records in Elasticsearch using jprante's elasticsearch-jdbc river. I just noticed that the value in the date field is getting changed in the index.
Mapping:
content_date:{
"type":"date"
}
content_date field for a record in mysql -> 2012-10-06 02:11:30
after running the jdbc river....
content_date field for same record in elasticsearch -> 2012-10-05T20:41:30Z
River:
curl -XPUT 'localhost:9200/_riv_index/_riv_type/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://localhost:3306/db",
"user" : "user",
"password" : "password",
"sql" : "select * from table where id=2409",
"poll" : "1d",
"versioning" : false
},
"index" : {
"index" : "myindex",
"type" : "mytype"
}
}'
A change in date format is acceptable, but why is the date value itself getting changed?
The river is adding the UTC time difference to the MySQL record's date and saving it in Elasticsearch. How do I stop this time conversion?
From the Elasticsearch point of view, here's what the docs say:
The date type is a special type which maps to JSON string type. It follows a specific format that can be explicitly set. All dates are UTC. Internally, a date maps to a number type long, with the added parsing stage from string to long and from long to string.
Not sure that you can change it.
The solution for this issue is to use the timezone setting in the jdbc block:
"timezone" : "TimeZone.getDefault()"
I am also saving the date and time in separate fields in the MySQL DB:
| date | date | YES | | NULL | |
| time | time | YES | | NULL | |
Elasticsearch uses the Joda time format to save dates, hence it automatically converts my date to a datetime.
In the date field, since I don't have a time, it automatically adds zeros to it.
Since I need to display the data via Kibana, I need this split. I converted the date and time columns to varchar(20) as a workaround (bad idea, I know) and it's working fine now.
