I am working on some dataset whose json format for one object is given below.
{
_id: ............,
code : G12220,
type : etf,
volume : 13,
modified_time:..................
.
.
.
}
This dataset is updated very frequently (every minute) and contains a few thousand unique codes. I want to write a query that fetches the latest document for each distinct "code". E.g., if two documents have the same code, the result should contain only the most recent one. I am using Spring Data.
I started writing my query; a sample is given below.
@Query("{type : ?0}......")
public List<ProductEntities> getLatestProductsSet(String type);
I am not sure how to write a complex query for this. I would be grateful for any help.
Thanks in advance,
You might want to have a look at the following mongo query to achieve this. It sorts the documents by modified_time (newest first) and then keeps the first, i.e. the latest, document of each code.
db.collection.aggregate([
{
"$sort": {
"modified_time": -1
}
},
{
"$group": {
"_id": "$code",
"doc": {
"$first": "$$ROOT"
}
}
}
]);
Following are the links which will help you understand the above query.
http://docs.mongodb.org/manual/reference/operator/aggregation/sort/
http://docs.mongodb.org/manual/reference/operator/aggregation/group/
http://docs.mongodb.org/manual/reference/operator/aggregation/first/
Spring part follows.
http://www.mkyong.com/mongodb/spring-data-mongodb-aggregation-grouping-example/
Let me know, if this helps.
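To make the expected result concrete, here is a minimal plain-JavaScript sketch of "one newest document per code", run on an in-memory array instead of a collection. The sample documents, their timestamps and the helper name latestPerCode are made up for illustration; only the field names come from the question. It can serve as a reference when checking what the aggregation returns on a small fixture.

```javascript
// Hypothetical sample documents; only the field names follow the question.
const docs = [
  { code: "G12220", type: "etf", volume: 13, modified_time: "2015-03-01T10:00:00Z" },
  { code: "G12220", type: "etf", volume: 17, modified_time: "2015-03-01T10:01:00Z" },
  { code: "H00001", type: "etf", volume: 5, modified_time: "2015-03-01T09:59:00Z" },
];

// Keep, for every code, the document with the greatest modified_time.
function latestPerCode(documents) {
  const latest = new Map();
  for (const doc of documents) {
    const seen = latest.get(doc.code);
    if (!seen || Date.parse(doc.modified_time) > Date.parse(seen.modified_time)) {
      latest.set(doc.code, doc);
    }
  }
  return [...latest.values()];
}

const result = latestPerCode(docs);
console.log(result.map((d) => `${d.code}:${d.volume}`)); // [ 'G12220:17', 'H00001:5' ]
```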
I am trying to create a method that composes a query using Spring Data, and I have a couple of questions. I want to query using top-level attributes of a document (i.e. the id field) as well as attributes of a subarray.
To do so I am using a query similar to this:
db.getCollection("journeys").find({ "_id._id": "0104", "journeyDates": { $elemMatch: { "period": { $in: [ 1,2 ] } } } })
As you can see, I would also like to filter the values of the subarray using $in. Running the above query, though, returns wrong results, as if the $elemMatch were ignored completely.
Running a similar but slightly different query like this:
db.getCollection("journeys").find({ "_id._id": { $in: [ "0104" ] } }, { journeyDates: { $elemMatch: { period: { $in: [ 1, 2 ] } } } })
does seem to yield better results, but it returns only the first element matching the $in of the subarray filter.
Now my question is: how can I query using both top-level attributes and subarrays using $in? Preferably I would like to avoid aggregations. Secondly, how can I translate this native Mongo query to a Spring Data Query object?
As per documentation it is possible to provide a hint to an update.
I'm using the Java mongo client and a mongo collection to do an update.
For this update I cannot find any way to provide a hint for which index to use.
I see in the logs that the update is doing a COLLSCAN, so I want to provide the hint.
this.collection.updateOne(
or(eq("_id", "someId"), eq("array1.id", "someId")),
and(
addToSet("array1", new Document()),
addToSet("array2", new Document())
)
);
Indexes are available for both _id and array1.id
The logs show that the query for this update uses a COLLSCAN to find the document.
Can anyone point me in the right direction?
Using AWS DocumentDB, which is MongoDB v3.6
Let's consider a document with an array of embedded documents:
{ _id: 1, arr: [ { fld1: "x", fld2: 43 }, { fld1: "r", fld2: 80 } ] }
I created an index on arr.fld1; this is a multikey index (as indexes on array fields are called). The _id field already has the default unique index.
The following query uses the indexes on both fields - arr.fld1 and the _id. The query plan generated using explain() on the query showed an index scan (IXSCAN) for both fields.
db.test.find( { $or: [ { _id: 2 }, { "arr.fld1": "m" } ] } )
Now the same query filter is used for the update operation also. So, the update where we add two sub-documents to the array:
db.test.update(
{ $or: [ { _id: 1 }, { "arr.fld1": "m" } ] },
{ $addToSet: { arr: { $each: [ { "fld1": "xx" }, { "fld1": "zz" } ] } } }
)
Again, the query plan showed that both the indexes are used for the update operation. Note, I have not used the hint for the find or the update query.
I cannot come to a conclusion about what the issue is with your code or indexes (see note 1 below).
NOTES:
1. The above observations are based on queries run on a MongoDB server version 4.0 (valid for version 3.6 also, as far as I know).
2. The explain method is used as follows for find and update: db.collection.explain().find( ... ) and db.collection.explain().update( ... ).
3. Note that you cannot generate a query plan using explain() for the updateOne method; it is only available for the findAndModify() and update() methods. You can get a list of methods that can generate a query plan by using this command in the mongo shell: db.collection.explain().help().
Note on Java Code:
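The Java code to update an array field by adding multiple sub-documents is as follows:
// eq and or are static imports from com.mongodb.client.model.Filters;
// addEachToSet is a static import from com.mongodb.client.model.Updates.
collection.updateOne(
or(eq("_id", 1), eq("arr.fld1", "m")),
addEachToSet("arr", Arrays.asList(new Document("fld1", "value-1"), new Document("fld1", "value-2")))
);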
I am new to GraphQL and I wonder how I can explore an API without a possible wildcard (*) (https://github.com/graphql/graphql-spec/issues/127).
I am currently setting up a headless Craft CMS with GraphQL and I don't really know how my data is nested.
Even with the REST API I have no chance of just getting all the data, because I have to set up all the endpoints and therefore have to know all the field names as well.
So how could I easily explore my CraftCMS data structure?
Thanks for any hints on this.
Cheers
merc
------ Edit -------
If I use @simonpedro's suggestion:
{
__schema {
types {
name
kind
fields {
name
}
}
}
}
I can see a lot of types (?)/fields (?)...
For example I see:
{
"name": "FlexibleContentTeaser",
"kind": "OBJECT",
"fields": [
{
"name": "id"
},
{
"name": "enabled"
},
{
"name": "teaserTitle"
},
{
"name": "text"
},
{
"name": "teaserLink"
},
{
"name": "teaserLinkConnection"
}
]
}
But now I would like to know how a teaserLink is structured.
I somehow found out that teaserLink (a field of type Entries, where I can link to another page) has the properties url & title.
But how would I set up a query to explore the properties available within teaserLink?
I tried all sorts of queries, but I am always confronted with messages like this:
I would be really glad if somebody could give me another pointer on how to find out which properties I can actually query...
Thank you
As far as I know, there is currently no GraphQL implementation with that capability. However, if what you want is to explore the "data structure", i.e. the schema, you should use schema introspection, which was designed for exactly that. For example, a simple GraphQL introspection query looks like this:
{
__schema {
types {
name
kind
fields {
name
}
}
}
}
References:
- https://graphql.org/learn/introspection/
UPDATE for edit:
What you want to do I think is the following:
Make a query like this
{
__schema {
types {
name
kind
fields {
name
type {
fields {
name
}
}
}
}
}
}
And then find the desired type and field to grab more information (the fields) from it. Something like this (I don't know if this works, just an idea):
const typeFlexibleContentTeaser = data.__schema.types.find(t => t.name === "FlexibleContentTeaser")
const teaserLinkField = typeFlexibleContentTeaser.fields.find(f => f.name === "teaserLink")
const teaserLinkSubFields = teaserLinkField.type.fields;
i.e., you have to traverse recursively through the type field.
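One thing to watch for when traversing: if a field is a list or non-null, its type is a wrapper and the named type sits under ofType (a standard introspection field, next to kind and name), so the wrapper has no fields of its own. The sketch below shows one way to handle that. The introspection result is hand-written and hypothetical; in particular the type name EntryInterface and the list shape of teaserLink are assumptions, not taken from a real Craft CMS schema. It assumes the query also requested type { kind name ofType { kind name } } for each field.

```javascript
// Hypothetical, trimmed introspection result. EntryInterface and the
// LIST wrapper on teaserLink are assumptions made for illustration.
const data = {
  __schema: {
    types: [
      {
        name: "FlexibleContentTeaser",
        kind: "OBJECT",
        fields: [
          { name: "id", type: { kind: "SCALAR", name: "ID", ofType: null } },
          {
            name: "teaserLink",
            // A list field: the element type sits under ofType.
            type: {
              kind: "LIST",
              name: null,
              ofType: { kind: "INTERFACE", name: "EntryInterface", ofType: null },
            },
          },
        ],
      },
      {
        name: "EntryInterface",
        kind: "INTERFACE",
        fields: [
          { name: "url", type: { kind: "SCALAR", name: "String", ofType: null } },
          { name: "title", type: { kind: "SCALAR", name: "String", ofType: null } },
        ],
      },
    ],
  },
};

// Strip LIST / NON_NULL wrappers until the named type is reached.
function unwrapType(typeRef) {
  let current = typeRef;
  while (current && current.ofType) {
    current = current.ofType;
  }
  return current;
}

// Names of the fields available on the type of `fieldName` within `typeName`.
function fieldsOf(schema, typeName, fieldName) {
  const parent = schema.types.find((t) => t.name === typeName);
  if (!parent || !parent.fields) return [];
  const field = parent.fields.find((f) => f.name === fieldName);
  if (!field) return [];
  const named = unwrapType(field.type);
  const target = schema.types.find((t) => t.name === named.name);
  return target && target.fields ? target.fields.map((f) => f.name) : [];
}

const teaserLinkFields = fieldsOf(data.__schema, "FlexibleContentTeaser", "teaserLink");
console.log(teaserLinkFields); // [ 'url', 'title' ]
```

Scalar fields such as id come back with an empty list, since scalars have no sub-fields to select.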
I have an external ES instance which I need to query for documents older than 6 months. The problem is that they store the timestamp like this:
"timestamp": {
"year": 2018,
"monthValue": 5,
"dayValue": 1
}
Is it possible to create a range query that combines these fields and returns documents "lt" "now-6M" or something like that?
You should be able to accomplish this using a Script Query. That would enable you to create a date object using the field values, and then compare that date with the current date.
Notional example
{
"query": {
"bool" : {
"filter" : {
"script" : {
"script" : {
"params": {
"monthRange": 6
},
"source": """
def today = new Date();
def timestamp = new Date(doc['timestamp.year'].value, doc['timestamp.monthValue'].value, doc['timestamp.dayValue'].value);
/* Date comparison magic (I don't know Java, so you're on your own here) */
/* return result of comparison */
""",
"lang": "painless"
}
}
}
}
}
}
I've only used Painless once before, so I'm not familiar enough to give a perfect answer. But this may help you get started. If you get stuck, just ask another question specific to the issue you're having, and someone who's more familiar with Java/Painless can help you out.
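For the comparison step that the script leaves open, the logic can be sketched outside Elasticsearch. The following is plain JavaScript, not Painless, and the function names are made up for illustration. It assumes monthValue is 1-based (1 = January) and that "older than 6 months" means strictly before the same calendar day six months ago, in UTC.

```javascript
// Build a UTC date from the split timestamp fields.
// Assumption: monthValue is 1-based, while Date.UTC expects 0-based months.
function toDate(ts) {
  return new Date(Date.UTC(ts.year, ts.monthValue - 1, ts.dayValue));
}

// Instant `months` calendar months before `now`.
// Note: a day that does not exist in the target month rolls over
// (e.g. Aug 31 minus 6 months lands in early March).
function monthsAgo(now, months) {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);
  return cutoff;
}

// True when the document timestamp is strictly before the cutoff.
function isOlderThan(ts, months, now = new Date()) {
  return toDate(ts).getTime() < monthsAgo(now, months).getTime();
}

// Fixed "today" so the example is reproducible: 2019-01-15.
const now = new Date(Date.UTC(2019, 0, 15));
console.log(isOlderThan({ year: 2018, monthValue: 5, dayValue: 1 }, 6, now)); // true
console.log(isOlderThan({ year: 2018, monthValue: 9, dayValue: 1 }, 6, now)); // false
```

Passing the current time in as a parameter, rather than reading the clock inside the comparison, keeps the check deterministic and easy to test.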
Hi Elasticsearch experts.
I have a problem which might be related to the fact that I am indexing relational DB data.
My scenario is the following:
I have two entities:
documents and meetings.
Documents and meetings are independent entities, although it is possible to assign documents to meetings in a given order.
We are using a join table for this in the DB.
meetings(id,name,date)
document(id,title,author)
meeting_document(doc_id,meeting_id,order)
In Elasticsearch I am indexing the document ids as a NESTED property of the meeting.
meeting example:
{
id: 25,
name:"test",
documents: [22,12,24,55]
}
I will fetch the meeting first. After that, I would like to send a request for the documents, filtering on document.id, and have Elasticsearch return the hits in the same order as the list of ids I passed to the filter.
What is the best way to implement this?
Thanks
Nice question,
I spent some time on this and came up with a solution. It is a bit tricky, but it works.
Let's have a look at my query.
I've used script score to sort by a user-defined list.
POST index/type/_search
{
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "ar.size()-ar.indexOf(doc['docid'].value)",
"params": {
"ar": [
"1",
"2",
"4",
"3"
]
}
}
}
]
}
},
"filter": {
"terms": {
"docid": [
"1",
"2",
"4",
"3"
]
}
}
}
The thing you have to take care of is
to send the same values in the filter and in params, as in the above query.
This returns hits with doc ids 1, 2, 4, 3.
You have to change the field name inside the script and in the filter, and you can use a termQuery inside the query object.
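The scoring expression can be checked outside Elasticsearch. This is a minimal plain-JavaScript sketch of the same arithmetic (list size minus position in the list), applied to hypothetical hits that arrive in arbitrary order; the hits array and the score helper are made up for illustration.

```javascript
// Same arithmetic as the script: an earlier position in `ar` gives a higher score.
function score(ar, docId) {
  return ar.length - ar.indexOf(docId);
}

const ar = ["1", "2", "4", "3"];

// Hypothetical hits, in arbitrary order.
const hits = [{ docid: "3" }, { docid: "1" }, { docid: "4" }, { docid: "2" }];

// Sort by descending score, as a search ordered by _score would.
const ordered = hits
  .map((h) => ({ ...h, score: score(ar, h.docid) }))
  .sort((a, b) => b.score - a.score);

console.log(ordered.map((h) => h.docid)); // [ '1', '2', '4', '3' ]

// An id missing from `ar` has indexOf === -1 and therefore the highest score.
console.log(score(ar, "99")); // 5
```

The last line shows why the filter and params must carry the same values: a document that passes the filter but is absent from params would be ranked first.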
I've tested the code. Hope this helps!
Thanks