I have the following method that creates a BooleanPredicateClausesStep to do a query with.
private BooleanPredicateClausesStep<?> getJournalAndSpatialSearchCriteria(GeoFilter geoFilter, SearchPredicateFactory factory, Boolean includeJournalsWithStatusFinished) {
SearchPredicate journalLocationMustResideWithinRadius = getJournalsContainedWithinRadiusPredicate(geoFilter, factory);
SearchPredicate mustOrShouldBeOfStatus = getSubmissionStatusConditionPredicate(includeJournalsWithStatusFinished, factory);
return factory.bool()
.filter( journalLocationMustResideWithinRadius )
.filter( factory.match().field( "deleted" ).matching( "false" ) )
.filter( mustOrShouldBeOfStatus )
.filter( factory.match().field( "containsHarvestEntry" ).matching( "true" ) )
.filter( factory.match().field( "grownOutdoors" ).matching( "true" ) );
}
It contains one spatial search predicate that checks whether journals fall within a predefined circular geographical area or not. All the other filters are simple ones that only check whether a certain field matches a value or not.
My question is: Do all these filters get implemented sequentially or all at once? Or to put it differently; would lucene first fetch all of the objects that fall within the defined geographical area before it checks whether they are deleted or does it check both simultaneously? The hibernate search documentation doesn't say anything about the order in which filters are processed.
In short: no, the order in which filters are declared within a single boolean predicate doesn't matter. The results will be the same regardless of order, and performance will most likely be the same regardless of order.
Detailed answer:
The order of clauses in a given boolean predicate doesn't matter in the sense that the predicate will match the same documents regardless of order.
Regarding implementation, it's a bit complex, but roughly speaking filters are turned into DocIdSetIterators which are then combined, so that the combined iterator only goes through documents returned by all iterators. That means each iterator will be incremented one after the other.
However, there are optimizations that allow iterators to "skip through" to the document matched by the previous iterator, so order might matter, but for performance only: if you have a filter that is quick and matches very few documents, it's better to execute it first.
But... Lucene often has knowledge of the "cost" of each filter/iterator, and will automatically change the order of iterators to execute the less costly ones first (see org.apache.lucene.search.ConjunctionDISI#createConjunction in Lucene 8.11).
All that to say: even for performance, while order matters internally, it shouldn't matter to you. So, don't even think about it :)
I've been playing a little bit with the Wikidata SPARQL page (here), and I was wondering if we can shuffle the order of the results set randomly.
I tried the following example, and I left the random part as a comment cause I couldn't find the correct syntax for it:
SELECT ?partyLabel ?countryLabel ?count
WHERE{
?party wdt:P31 wd:Q7278.
?party wdt:P17 ?country.
?party wdt:P2124 ?count
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
order by #random
limit 15
Did anyone try the random ordering option? and does it exist?
Possible candidates are:
RAND()
BNODE()
UUID()
STRUUID()
However, neither ORDER BY RAND() nor ORDER BY BNODE() works on Wikidata even with hint:Query hint:optimizer "None" (that's why).
As for UUID() and STRUUID(), it seems that there is no significant performance difference, so you can use any of them.
Please note that both browser and server cache results, hence, add a few spaces or commented current time to every query.
Update
With a single triple pattern, one can use the sample service.
I've recently swapped from using Lucene for Sitecore to Solr.
For the most part it has been smooth, but the way I was writing some queries (using Sitecore.ContentSearch.Linq) abstraction now don't seem to be compatible.
Specifically, I have a situation where I've got "global" content and "regional" content, like so:
Home (000)
X
Y
Z
Regions (ID: 111)
Region 1 (ID: 221)
A
B
Region 2 (ID: 222)
D
My code worked on Lucene, but now doesn't on Solr. It should find all "global" and a single region's content, excluding all other region's content. So as an example, if the user's current region was Region 1, I'd want the query to return content X, Y, Z, A, B.
Sitecore's Item Crawler has a field for each item in the index called "_path" which is a multivalued string field of IDs, so as an example, Region 1's _path field value would be [000, 111, 221 ].
When I write this using the Linq abstraction it comes out as below which doesn't return results.
-_path:(111) OR _path:(221)
But _path:(111) does return result. Mind blown.
When I use the Solr interface and wrap each side of the OR in extra brackets like below (which I'd consider redundant) it works! Mind blown v2.
(-_path:(111)) OR (_path:(221))
Firstly, what's the difference between those queries?
Secondly, my real problem is I can't add these extra brackets as I'm working in an abstraction Linq so the brackets will be "optimized" out.
Any advice would be awesome! Cheers.
The problem here is, lucene's negative queries don't work like you think they do. They only remove results from what has been found. -_path:111 doesn't find all documents which aren't in 111, it doesn't find anything at all. It only removes results. So you are finding all results with path "221", then removing any that also have path "111", which from your heirarchy, I assume is all of them. See my answer here for a bit more on that topic.
The OR makes it seem like it ought to work, but really -_path:(111) OR _path:(221) is the same as -_path:(111) _path:(221). The moral here is: Don't use Lucene's AND/OR/NOT syntax, if you can help it. Use +/-. +/- syntax actually expresses how the query operates, AND/OR/NOT doesn't. It attempts to shoehorn it into a different, SQL-like retrieval model and leads to some unexpected behavior like this.
So, what about: (-_path:(111)) OR (_path:(221))
Well, first, does it actually work? Or does it just get some results?
If it just gets some results, but just seems to get the same results as _path:221: The reason is -_path:111 gets no results, so your query is, in practice, something like: (nothing) OR (_path:221), which is equivalent to _path:221
If it really does get the results you expect (I'm guessing it probably does): Something is translating your query into something like: (*:* -_path:111) (_path:221). Solr does have some logic along these lines, though I'm not quite sure in this case. Essentially, it puts a match-all in front of any lonely negative queries it finds, allowing them to do what you were expecting. If the implicit *:* makes you nervous about performance, well, it should. But lucene is an inverted index, it does well with finding matches on a term quickly. Getting everything that doesn't match goes against the grain of that retrieval model, and will pretty much have to do a full scan of the index.
As an exercise in python lambdas (just so I can learn how to use them more properly) I gave myself an assignment to sort some strings based on something other than their natural string order.
I scraped apache for version number strings and then came up with a lambda to sort them based on numbers I extracted with regexes. It works, but I think it can be better I just don't know how to improve it so it's more robust.
from lxml import html
import requests
import re
# Send GET request to page and parse it into a list of html links
jmeter_archive_url='https://archive.apache.org/dist/jmeter/binaries/'
jmeter_archive_get=requests.get(url=jmeter_archive_url)
page_tree=html.fromstring(jmeter_archive_get.text)
list_of_links=page_tree.xpath('//a[#href]/text()')
# Filter out all the non-md5s. There are a lot of links, and ultimately
# it's more data than needed for his exercise
jmeter_md5_list=list(filter(lambda x: x.endswith('.tgz.md5'), list_of_links))
# Here's where the 'magic' happens. We use two different regexes to rip the first
# and then the second number out of the string and turn them into integers. We
# then return them in the order we grabbed them, allowing us to tie break.
jmeter_md5_list.sort(key=lambda val: (int(re.search('(\d+)\.\d+', val).group(1)), int(re.search('\d+\.(\d+)', val).group(1))))
print(jmeter_md5_list)
This does have the desired effect, The output is:
['jakarta-jmeter-2.5.1.tgz.md5', 'apache-jmeter-2.6.tgz.md5', 'apache-jmeter-2.7.tgz.md5', 'apache-jmeter-2.8.tgz.md5', 'apache-jmeter-2.9.tgz.md5', 'apache-jmeter-2.10.tgz.md5', 'apache-jmeter-2.11.tgz.md5', 'apache-jmeter-2.12.tgz.md5', 'apache-jmeter-2.13.tgz.md5']
So we can see that the strings are sorted into an order that makes sense. Lowest version first and highest version last. Immediate problems that I see with my solution are two-fold.
First, we have to create two different regexes to get the numbers we want instead of just capturing groups 1 and 2. Mainly because I know there are no multiline lambdas, I don't know how to reuse a single regex object instead of creating a second.
Secondly, this only works as long as the version numbers are two numbers separated by a single period. The first element is 2.5.1, which is sorted into the correct place but the current method wouldn't know how to tie break for 2.5.2, or 2.5.3, or for any string with an arbitrary number of version points.
So it works, but there's got to be a better way to do it. How can I improve this?
This is not a full answer, but it will get you far along the road to one.
The return value of the key function can be a tuple, and tuples sort naturally. You want the output from the key function to be:
((2, 5, 1), 'jakarta-jmeter')
((2, 6), 'apache-jmeter')
etc.
Do note that this is a poor use case for a lambda regardless.
Originally, I came up with this:
jmeter_md5_list.sort(key=lambda val: list(map(int, re.compile('(\d+(?!$))').findall(val))))
However, based on Ignacio Vazquez-Abrams's answer, I made the following changes.
def sortable_key_from_string(value):
version_tuple = tuple(map(int, re.compile('(\d+(?!$))').findall(value)))
match = re.match('^(\D+)', value)
version_name = ''
if match:
version_name = match.group(1)
return (version_tuple, version_name)
and this:
jmeter_md5_list.sort(key = lambda val: sortable_key_from_string(val))
Query's throwing an ORA-00907 Error when I try to paste a list of values into a criteria.
Background: I'm not a developer, I'm just an end user that's studied enough to where I can write queries using PS/Query within Peoplesoft,
for my company's implementation. I work with Peoplesoft's FSCM module
(Financials and Supply Chain Management), currently on Version FSCM
8.90.08.024, using I think Oracle 11g as the base database.
I'm mostly self-taught, and the technical experts we have are busy
with database/application stuff, or they aren't familiar with my
section's specific data needs.
I should point out that I'm unable to directly write SQL statements to
Query the database. I have to use a built-in program called "PS/Query"
(also known as Query Manager) with a GUI that writes the SQL for you
and saves it as a Query that you can run to the database to extract
data. This is relevant to my question only in that:
1. I cannot create or alter views/tables
2. I cannot perform any type of SQL Statement except "SELECT"
3. I can embed PL/SQL, MetaSQL and plain SQL into Expressions
4. At this point, Query Manager is the only option I have.
PS/Query is my only experience with SQL so far, aside from Oracle's
documentation and sites like this. From my research, it's considered
extremely confining by "actual" SQL programmers.The restrictions on it
require you to do things in a manner that violates what seem to be
best practices of SQL coding.
Query Request: I have a query I've been requested to write that pulls out spend (on Vouchers and POs) against certain system-defined
Category Codes. What I'm trying to do is pull in Voucher IDs, sum the
merchandise amounts on them by Vendor and Category Code, and display
the results. Or in other words, for every unique combination of
Vendor/Category, add up all the Voucher Amounts that have that
Vendor/Category combination.
Using the SUM (Fieldname) OVER (PARTITION BY fieldname, fieldname)
syntax.
So the end result should look something like...
Code Vendor Amount
123-45 Acme $5000.00
123-45 Apple $4200.00
123-46 Acme $750.00
With that said, here's the SQL that Query Manager is displaying to get the result set I showed above:
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD LIKE :1
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
Underlying Issue: This works fine, but it can only prompt on one
Category Code at a time. Category Codes are 5 digits, a 3-digit
"Class" followed by a dash and then a 2-digit "subclass. I have a list
of 375 Category Codes I need to get this Query result for.
I've set up a prompt on this version that allows entry of a Wildcard
(So 123-%%), but that's still about a hundred separate runs of the
Query. Query Manager allows use of an "In List" expression type in
Criteria, but it requires you to manually enter each entry in the
list.
I'm trying to set it up to where I can paste a plaintext copy of the
Code list into an Expression, with proper quotes/commas, and have it
evaluate that to give me a combined list of all the NIGP codes
specified. The Prompt field created by Query Manager doesn't allow
pasting of lists (as far as I know).
Attempted Solution: I viewed the page at http://peoplesoft.ittoolbox.com/groups/technical-functional/peoplesoft-other-l/create-an-expression-in-psoft-90-query-to-paste-a-list-of-emplids-2808427 and I've tried some of the answers given there, but none of them worked. That page led to me trying this modified SQL (obviously the list of codes is truncated a bit for display here):
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
And the SQL above is what's giving me the ORA-00907 error. Has anyone ran into this problem before? Massive wall of text, I know. My apologies. This is my first post here and I'm trying not to leave anything relevant out.
I've got the immediate problem that spurred this question fixed,but that request is just the tip of a very large iceberg, and at some point I need to figure out a way to be able to paste plaintext lists in as criteria using Query Manager, preferably in a way that plays nice with Analytic Grouping.
TL;DR version:
Using Peoplesoft Query Manager to do an Analytic SUM with grouping using OVER, PARTITION BY. When I try to paste a list into the criteria, it throws an ORA-00907 Error.
Any help would be greatly appreciated. Thanks!
Ok, after a bit more tweaking with this, I've found what I think is the underlying issue.
The error, in this case, is two-fold. Part of it was my fault (I didn't check for Peoplesoft mangling the quotation marks I pulled from Word), and part of it was the way Query Manager interprets some kinds of functions (you have to wrap some stuff in a Case When statement to get it to evaluate properly).
First, the "My Fault" part:
Every time I was pasting in my list of test NIGP Codes, I was doing it from a file I kept saved in Microsoft Word.
Which has the probably-handy "replace straight quotes with smart quotes" feature. Peoplesoft goes bonkers when its presented a "smart quote", and will display them as upside-down question marks (there's probably a technical term, I don't know it).
So when I'd test suggestions (such as fixing the quote/comma order as suggested by #Rene Nyffenegger and #WayneH) I'd start with my base test query, add in the expressions and test it, saving it as a separate query. If they didn't work, I'd go back to the base query. That way I could iterate changes and save potential tests as different versions.
My mistake was in not saving the different versions, leaving the application and going back in. It's when you save the query, leave the page, go somewhere else in Peoplesoft, then go back to open Query Manager that it actually shows you that it's doing the character conversion. You can't see it unless you do that. Even though Query Manager is doing it. So it was throwing a character Query Manager wouldn't recognize, but not showing me the character it wouldn't recognize.
I got a new work PC recently, and I've now disabled the Smart Quotes auto-replace for future use.
Second, the "Query Manager: part:
On the version of this that I got to work, I made use of wrapping the "IN" function inside a Case statement. I've found that a lot of SQL functions, when used "plain" (as I'd define them by just copy-pasting from Oracle's definitions pages and filling in the appropriate variables) tend to give PS/Query (Query Manager) heartburn. But if you wrap them inside a CASE...WHEN...END statement that evaluates the result of the function and then build a criteria that selects based on certain values of that result, the function will work and properly display a result.
So for an example, set up this expression (like in the example from #qyb2zm302). I'm using different codes from what was in my original example, but they work the same (they're all five-digit, character-typed codes consisting of three digits, a dash, then two digits)
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
And then set a criteria:
AND
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
= 'true'
It'll run to completion and return any rows that have that Category Code.
If you don't want to do that, you can do like in #qyb2zm302's Method 2. The only downside to that in Query Manager is that you have to enter them into individual rows in the "List", and if you can only copy-paste 25 at a time.
Wrapping it in a Case Statement lets you paste it directly into an Expression, which is far better for larger lists.
Solutions:
The above is the code I went with that worked. It's simplifying a bit for brevity's sake, but it works.
In List works through the native Query Manager option as long as you manually-populate the list
D.CATEGORY_CD = '005-00' OR works as long as you wrap it in a Case Statement
D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') works as long as you wrap it in a Case Statement
Peoplesoft hates Smart Quotes. None of the above will work if you're copying quotation marks directly from Word, but you won't see it unless you save, leave and come back to the same query in edit mode
Formatting is important. All of the above require the proper comma/quotation formatting, as pointed out by Rene and Wayne. Meaning: ('xxx-xx', 'xxx-01','xxx-02') etc
Thanks to everyone who helped on this! I don't think I've head-desked this hard before on any question, but I guess that's part of the learning process. Since all the answers posted are valid and correct (or at least a portion of the larger "correct"), I'm going to flag them all.
The
D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
part looks fishy to me
Since a '' within a string "evaluates" to a single ' the first string is
'015-00,'' '
followed by (the non-string)
015-06,
The following '' is probably the thing that the parser stumbles upon since it's pretty meaningless.
Edit try it with a D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07').
Following the link you posted, I see 2 methods for doing what you are trying to accomplish.
I also notice that you tried a 3rd method.
Method 1
Criteria > Add Criteria
Expression Type: Character
Length: 255
Expression Text: D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') AND 1
Condition Type: equal to
Constant: 1
Method 2
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: in list
Value: 015-00','015-06','015-10','615-07
Method 3 (Your Method)
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: equal to
Define Expression: '015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
Question) Does the below exactly match the text you are putting the Expression box?
'015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
If not, what are you putting in that box?
I think the D.CATEGORY_CD criteria are giving you the problems, I changed the double quotes to single quotes and then it still looked strange to me. I then notice the commas are inside your quotes and not between them, try making the one criteria line look like this:
before:
OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
after:
OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
Also, the "IN" is an implied "OR" and I am not sure if you have parenthesis around the two D.CATEGORY_CD,
I would just put the one additional code into the IN criteria and remove the "D.CATEGORY_CD =" line:
before:
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
after:
AND D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07', '005-00')
Of course, you are already ordering by CATEGORY_CD, you could remove this criteria and pull all categories in one run (that is unless there are too many rows for excel), and then you might also want to include either VENDOR_ID or NAME1 in the ORDER BY clause.
Hope that helps you.