I want to see, for every user, how many unique links that user has posted. Here is what I have come up with so far:
s.aggs.bucket('user_term', A('terms', field='user__id')).metric('url_count', A('value_count', field='link'))
However, I have yet to find a way to iterate through the result. Is there a way to do this?
This will not give you a unique count, just the number of docs with a value for that field. You want to use a cardinality aggregation instead:
s.aggs.bucket('users', 'terms', field='user.id').metric('url_count', 'cardinality', field='link')
r = s.execute()
for user in r.aggregations.users.buckets:
    print(f'User {user.key} posted {user.url_count.value} links')
Hope this helps
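To make the difference concrete without a cluster, here is a minimal pure-Python sketch of what the two aggregations compute per user. The sample documents and the helper name count_links are made up for illustration. Note also that Elasticsearch's cardinality aggregation is approximate on large data sets, while this sketch counts exactly.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names only mirror the question.
docs = [
    {"user_id": 1, "link": "http://a.example"},
    {"user_id": 1, "link": "http://a.example"},  # same link posted twice
    {"user_id": 1, "link": "http://b.example"},
    {"user_id": 2, "link": "http://a.example"},
]


def count_links(documents):
    """Return {user_id: (value_count, cardinality)} for the 'link' field."""
    links_by_user = defaultdict(list)
    for doc in documents:
        links_by_user[doc["user_id"]].append(doc["link"])
    return {
        user: (len(links), len(set(links)))  # all values vs. distinct values
        for user, links in links_by_user.items()
    }


result = count_links(docs)
for user, (value_count, cardinality) in sorted(result.items()):
    print(f"User {user}: value_count={value_count}, cardinality={cardinality}")
```

User 1 has three link values but only two distinct links, which is the gap between value_count and cardinality.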
I've noticed a slight problem with how my API is working where I'm using Spring Data JPA.
My query looks something along the lines of:
@Query("SELECT p.id AS id, COUNT(l) AS likes FROM Post p LEFT JOIN Like l ON l.post = p WHERE p.location.id = ?1")
My actual query is bigger, but this contains everything necessary to explain the issue. This query returns a list, but if the location does not exist, it should return null or an empty list, correct? Oh, how wrong you are, my sweet summer child!
This query will instead always return a list of at least one element, regardless of whether or not there are any posts linked to said location.
[{"id": null, "likes": 0}]
That is what the result looks like when serialized to JSON. I am not quite sure what to do about this little predicament, as I obviously don't want to return a list with faulty data, but needing to use processing to filter out duds also seems dumb and unnecessary.
Is there any way to prevent this that I've yet to find? If it is of any relevance, I am using projections currently for my responses.
What I've tried so far:
Adding a not null condition for fields. Does not work, ignored by COUNT.
Adding @NotNull constraints to all fields. Does not work, will still become null.
For what it's worth, I've tried different kinds of joins, though anything but LEFT JOIN doesn't make much sense.
I haven't been able to find any other case which resembles this either, although it most likely exists, but is drowned out by everything else. I'm not quite sure what can be done in this regard, so I'm curious if it's just a quirk with the framework, or if there is an actual solution.
It might be possible to solve through native queries, but I would prefer not to use them.
I'm no SQL expert but I believe that a left join will give you this result if the ID does not exist.
Have you run the query in your DB? Doesn't it give you one row in your result set for IDs that do not exist?
I believe this is intended to say there are 0 matches.
You might want to validate your input before running the query, meaning check that the location exists first.
The issue is inherently due to the COUNT and CASE keywords in my real query, which result in there always being at least one row. Since I can't find any way to avoid this automatically, the solution I've used is the following:
List<Item> items = repository.customQuery(id);
if (0 < items.size() && null == items.get(0).getId()) {
    items.remove(0);
}
The first condition is redundant, as I know there is always at least one entry, but it is there as a safety measure. A try-catch block would do the trick as well. If you use a primitive int instead of Integer, you'd need to initialize the value in the constructor to something that would normally never be present in the database, such as -1.
If anyone knows of a better method, I'd love to know about it.
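One thing that may be worth trying before filtering in code: an aggregate query without a GROUP BY always produces exactly one row, even when nothing matches, whereas grouping by the post id produces one row per post and therefore no rows at all for an unknown location. The sketch below reproduces that behaviour with Python's sqlite3 module rather than JPA, so the table and column names are assumptions, and whether a GROUP BY fits the real, larger query is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, location_id INTEGER);
    CREATE TABLE post_like (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO post (id, location_id) VALUES (1, 10), (2, 10);
    INSERT INTO post_like (post_id) VALUES (1), (1), (2);
""")

# Same shape as the query in the question: aggregate, no GROUP BY.
NO_GROUP = """
    SELECT p.id, COUNT(l.id) FROM post p
    LEFT JOIN post_like l ON l.post_id = p.id
    WHERE p.location_id = ?
"""
# Grouping by post id yields one row per matching post.
GROUPED = NO_GROUP + " GROUP BY p.id ORDER BY p.id"


def run(sql, location_id):
    return conn.execute(sql, (location_id,)).fetchall()


missing_no_group = run(NO_GROUP, 99)  # location 99 does not exist
missing_grouped = run(GROUPED, 99)
existing_grouped = run(GROUPED, 10)

print(missing_no_group)
print(missing_grouped)
print(existing_grouped)
```

In this reproduction the ungrouped query returns the same single null-id row seen in the JSON above, and the grouped one returns an empty list for the missing location.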
I have been struggling to return the count of courses from this XML file that contain "Cross-listed" in their description. The problem I encounter is that, because I am using for, it iterates and gives me "1 1" instead of "2". When I try using let instead, I get 13, which means it counts all courses without applying the condition, even when I write return count($c["Cross-listed"]). What am I doing wrong and how can I fix it? Thanks in advance.
for $c in doc("courses.xml")//Department/Course
where some $desc in $c/Description
satisfies contains($desc, "Cross-listed")
return count($c)
"The problem I encounter is because I am using for"
You are quite correct. You don't need to process items individually in order to count them.
You've made things much too difficult. You want
count(doc("courses.xml")//Department/Course[Description[contains(., "Cross-listed")]])
The key thing here is: you want a count, so call the count() function, and give it an argument which selects the set of things you want to include in the count.
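The same idea, sketched in Python with the standard library's xml.etree.ElementTree instead of XQuery: counting inside the loop gives one count per item, while selecting the matching set first and counting it once gives the total. The sample XML is invented for illustration, since the original courses.xml is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for courses.xml.
SAMPLE_XML = """
<Catalog>
  <Department>
    <Course><Description>Cross-listed as MATH 210.</Description></Course>
    <Course><Description>Introductory survey.</Description></Course>
  </Department>
  <Department>
    <Course><Description>Cross-listed as CS 105.</Description></Course>
  </Department>
</Catalog>
"""


def is_cross_listed(course):
    """True if any Description of the course contains 'Cross-listed'."""
    return any(
        "Cross-listed" in "".join(desc.itertext())
        for desc in course.findall("Description")
    )


root = ET.fromstring(SAMPLE_XML)
courses = root.findall(".//Department/Course")

# Counting per item, like the original for/return count($c): one 1 per match.
per_item_counts = [1 for course in courses if is_cross_listed(course)]

# Selecting the matching set first, then counting it once.
total = len([course for course in courses if is_cross_listed(course)])

print(per_item_counts, total)
```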
First of all, I know the title of my question is ambiguous, but I don't know how else to put it. It should become clear when you see my example.
In SQL queries, you can build dynamic conditions on top of '1=1'.
Let me give an example.
select * from PROD.EMPLY
WHERE 1=1
AND ( NAME = 'JOHN' OR NAME = 'MARY' OR NAME = 'KEVIN') ;
In the same way, I want to apply filters as below.
BooleanFilter BF = new BooleanFilter();
TermsFilter tf1 = new TermsFilter();
TermsFilter tf2 = new TermsFilter();
//tf1 => 1=1
//tf2 => name condition
tf2.addTerm("name", "JOHN");
tf2.addTerm("name", "MARY");
tf2.addTerm("name", "KEVIN");
BF.add(new FilterClause(tf1, Occur.MUST)); // 1=1 condition
BF.add(new FilterClause(tf2, Occur.MUST)); // name condition
But the results don't match what I expected (no hits).
As far as I know, that is because there is no null/empty search method in Lucene.
I also checked whether I could replace the BooleanFilter/TermsFilter with MatchAllDocsQuery.
However, in my case the Lucene version is so old (2.3.2) that there's no MatchAllDocsQuery in it, and I don't want the filter conditions to affect Lucene's scoring.
I searched Stack Overflow and Google, but found no case that matches my situation.
Any help would be much appreciated. Thanks for reading.
I didn't want to answer my own question, but I hope my poor solution may help or inspire someone who has the same trouble.
In my case, as described above, my Lucene version is so old that many filters are unavailable.
However, I didn't want to use a query, because I didn't want the filters to affect scoring.
So I solved it with a "manual" filter.
My development language is Java, so I used the contains method.
In brief, my solution is as follows:
Search Lucene the same way as before.
Within the results, compare using the contains method (I added this step).
Example:
if (result.contains("KEVIN")) {
    isFilteredResult = true;
} else if (result.contains("JOHN")) {
    isFilteredResult = true;
}
...
Only the results that contain the condition keywords are displayed.
I was worried about this solution because the filtering is done in Java rather than in Lucene, so I expected it to consume a lot of memory.
However, the performance isn't as bad as I feared, and the users are satisfied with both the performance and the results.
I know this isn't a good solution, but when the conditions are this restricted, it can still be a workable one.
Thanks for reading.
And if there's a better solution, please let me know.
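The manual filtering step can be sketched language-neutrally as below, here in Python rather than Java. It assumes each search hit exposes a name value, and post_filter is a hypothetical helper, not a Lucene API. An empty condition list plays the role of '1=1': nothing is filtered out. Because the filtering happens after the search, the scores are left untouched.

```python
def post_filter(results, allowed_names=None):
    """Keep hits whose 'name' is in allowed_names; no names means keep all."""
    if not allowed_names:  # the "1=1" case: no condition to apply
        return list(results)
    allowed = set(allowed_names)
    return [hit for hit in results if hit["name"] in allowed]


# Hypothetical hits, already scored and ordered by the search engine.
hits = [
    {"name": "JOHN", "score": 0.9},
    {"name": "ALICE", "score": 0.8},
    {"name": "KEVIN", "score": 0.4},
]

filtered = post_filter(hits, ["JOHN", "MARY", "KEVIN"])
unfiltered = post_filter(hits)

print([hit["name"] for hit in filtered])
print([hit["name"] for hit in unfiltered])
```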
We have a requirement for analyzing posts: for a specific post, we need to return a list of the posts most related to it, where relatedness is the number of tags the posts have in common. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So basically, postB gets a count of 2 and postC gets a count of 1 when compared to the tags in postA. postD gets 0 and will not be included in the result.
My current understanding is that I should use map/reduce to produce the result. I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way, like a custom sorting function, to work it out? I'm currently using pymongo, as I'm a Python developer.
You should create an index on tags:
db.posts.create_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({'_id': {'$ne': postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: sum(1 for tag in post['tags'] if tag in postA['tags'])
posts.sort(key=key, reverse=True)
Note that if postA shares at least one tag with a large number of other posts this won't perform well, because you'll send so much data from Mongo to your application; unfortunately there's no way to sort and limit by the size of the intersection using Mongo itself.
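The Python-side ranking can be tried without a database. The sketch below uses plain dictionaries in place of Mongo documents (the _id values and the helper name rank_related are illustrative), skips postA itself, drops posts with no tags in common, and sorts the rest by the size of the tag intersection.

```python
def rank_related(target, candidates):
    """Return (common_tag_count, post) pairs, most related first."""
    target_tags = set(target["tags"])
    scored = []
    for post in candidates:
        if post["_id"] == target["_id"]:
            continue  # never report the post as related to itself
        common = len(target_tags & set(post["tags"]))
        if common > 0:
            scored.append((common, post))
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical in-memory stand-ins for the documents in the question.
post_a = {"_id": "A", "tags": ["japan", "japanese style", "england"]}
others = [
    post_a,
    {"_id": "D", "tags": ["joke"]},
    {"_id": "C", "tags": ["japan"]},
    {"_id": "B", "tags": ["japan", "england"]},
]

ranked = rank_related(post_a, others)
for count, post in ranked:
    print(post["_id"], count)
```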
How would I use SolrNet to execute a GREATER THAN/LESS THAN query?
Example:
My documents have a field called "minimumDays" and I only want to return docs where that field is LESS THAN OR EQUAL TO the number I pass into the query.
I currently have this, but am not sure it's correct.
int requestedDays = 3;
var minimumNightsQuery = new SolrQueryByRange<int>("minimumDays", 0, requestedDays, true);
Am I on the right track?
The second part of my question: is there some way to better understand the query that SolrNet passes to Solr? A debugging value or something where I can inspect the "q" variable, for instance.
Thanks again for your help
You can use SolrQueryByRange for the first part of your question, and your code does look good. Debugging your query and results might help, as I have found that SolrNet does some odd things: http://code.google.com/p/solrnet/wiki/Facets#Arbitrary_facet_queries
For the second part, you can intercept the ISolrConnection and put your own in between. For a good start, check this out: http://code.google.com/p/solrnet/source/browse/trunk/SampleSolrApp/LoggingConnection.cs?r=513
I have one that logs the query and the results, and if a config setting is on it appends the debug param and logs that result as well. It's great info to have, and one of the only ways to get it.
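When inspecting the logged query, it helps to know what a range condition looks like in Solr's standard query syntax: square brackets for inclusive bounds, curly braces for exclusive ones, and * for an open end. The small Python helper below only builds that string for comparison against the log. It is a hypothetical illustration, not part of SolrNet, and the exact text SolrNet emits is not confirmed here.

```python
def solr_range(field, lower=None, upper=None, inclusive=True):
    """Build a Solr range clause; None means an open-ended bound (*)."""
    low = "*" if lower is None else str(lower)
    high = "*" if upper is None else str(upper)
    open_bracket, close_bracket = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_bracket}{low} TO {high}{close_bracket}"


requested_days = 3

# Inclusive range from 0 up to the requested number of days.
bounded = solr_range("minimumDays", 0, requested_days)

# "Less than or equal to" with no lower bound at all.
open_ended = solr_range("minimumDays", upper=requested_days)

print(bounded)
print(open_ended)
```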