Oracle Text catsearch left-truncated search - oracle

So according to The oracle documentation on catsearch it's not possible to use left-truncated searches with a catsearch text-query.
However, I've found that using double wildcards does have the desired effect of left-truncated queries.
E.g. select * from foo where CATSEARCH(bar, '**zing', '')>0; would find records with bar having the value of "this is amazing" while
select * from foo where CATSEARCH(bar, '*zing', '')>0; wouldn't
For the configuration of my word list, I have enabled substring_index and prefix_index. In the substring_index description, it does mention left-truncated (and double-truncated) wildcard usage.
I cannot find any reason as to why/how the double wildcard usage works, how optimized it is and whether or not it has any other side-effects.
So the ultimate questions:
Why does this syntax work, is there anywhere that this is described?
Is it equal, performance-wise?
Are there any other side-effects people should be aware of when using this syntax?

Related

How to alias the ClassDescription property in a FileNet P8 saved search?

Given this SQL statement written in ACCE:
SELECT
This,
D.ClassDescription,
D.Id myId,
D.MimeType myMimeType
FROM
Document D
WHERE
D.Id= '{00093471-0100-C09E-828F-6B22177A2922}'
How do I properly alias the ClassDescription field?, it does not work like the other columns (properties).
I've tried using 'AS' and Brackets '[]' but no such luck, the column is silently removed from the results of the query.
Thank you,
Stephen
The aliasing of columns is straight-forward, you can use the following:
foo AS bar
foo [bar]
foo bar
As a side note to anyone who may peruse this question and answer, the installation of the ACCE instance I was using was somewhat problematic (or a bug), it would NOT allow me to alias the ClassDescription Property. If I programmatically create the same query using the SearchSQL API, the aliasing works as expected as long as it conforms to SQL-92 dialect.

Solr query conundrum

I've recently swapped from using Lucene for Sitecore to Solr.
For the most part it has been smooth, but the way I was writing some queries (using Sitecore.ContentSearch.Linq) abstraction now don't seem to be compatible.
Specifically, I have a situation where I've got "global" content and "regional" content, like so:
Home (000)
X
Y
Z
Regions (ID: 111)
Region 1 (ID: 221)
A
B
Region 2 (ID: 222)
D
My code worked on Lucene, but now doesn't on Solr. It should find all "global" and a single region's content, excluding all other region's content. So as an example, if the user's current region was Region 1, I'd want the query to return content X, Y, Z, A, B.
Sitecore's Item Crawler has a field for each item in the index called "_path" which is a multivalued string field of IDs, so as an example, Region 1's _path field value would be [000, 111, 221 ].
When I write this using the Linq abstraction it comes out as below which doesn't return results.
-_path:(111) OR _path:(221)
But _path:(111) does return result. Mind blown.
When I use the Solr interface and wrap each side of the OR in extra brackets like below (which I'd consider redundant) it works! Mind blown v2.
(-_path:(111)) OR (_path:(221))
Firstly, what's the difference between those queries?
Secondly, my real problem is I can't add these extra brackets as I'm working in an abstraction Linq so the brackets will be "optimized" out.
Any advice would be awesome! Cheers.
The problem here is, lucene's negative queries don't work like you think they do. They only remove results from what has been found. -_path:111 doesn't find all documents which aren't in 111, it doesn't find anything at all. It only removes results. So you are finding all results with path "221", then removing any that also have path "111", which from your heirarchy, I assume is all of them. See my answer here for a bit more on that topic.
The OR makes it seem like it ought to work, but really -_path:(111) OR _path:(221) is the same as -_path:(111) _path:(221). The moral here is: Don't use Lucene's AND/OR/NOT syntax, if you can help it. Use +/-. +/- syntax actually expresses how the query operates, AND/OR/NOT doesn't. It attempts to shoehorn it into a different, SQL-like retrieval model and leads to some unexpected behavior like this.
So, what about: (-_path:(111)) OR (_path:(221))
Well, first, does it actually work? Or does it just get some results?
If it just gets some results, but just seems to get the same results as _path:221: The reason is -_path:111 gets no results, so your query is, in practice, something like: (nothing) OR (_path:221), which is equivalent to _path:221
If it really does get the results you expect (I'm guessing it probably does): Something is translating your query into something like: (*:* -_path:111) (_path:221). Solr does have some logic along these lines, though I'm not quite sure in this case. Essentially, it puts a match-all in front of any lonely negative queries it finds, allowing them to do what you were expecting. If the implicit *:* makes you nervous about performance, well, it should. But lucene is an inverted index, it does well with finding matches on a term quickly. Getting everything that doesn't match goes against the grain of that retrieval model, and will pretty much have to do a full scan of the index.

Find HTML Tags in Properties

My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax
Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

Selecting multiple results from XQUERY query

I am trying to select multiple columns from a query, but so far, I can only manage to select one. So I'm basically stuck with either selecting one, or all of them.
Here's my expression, what I got so far, which select only (1) column:
let $y := doc("http://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders")//table[preceding-sibling::h2//span[string() = "K2"]][1]
return $y/tr/td[2]/string()
I would love some explanation of how one would go about doing this, since there's almost no documentation of this lovely language.
How would you like the result to be returned? You could construct new elements, or concatenate strings. There are many ways that this could be accomplished.
Here's one way to get comma-separated values:
return $y/tr/fn:string-join( (td[2] | td[4]), ", " )
You can try it on zorb.io.
Update
(td[2] | td[4]) selects both elements, and passes them, as a sequence, to fn:string-join(). | is the XQuery union operator (and can be substituted for the keyword).
As far as documention, the functx site documents the standard library (all fn-prefixed functions), and has useful examples. And the specs are surprisingly readable.

ORA-00907 Error when using Analytic Function in a Query (PS/Query, Peopletools 8.51.12)

Query's throwing an ORA-00907 Error when I try to paste a list of values into a criteria.
Background: I'm not a developer, I'm just an end user that's studied enough to where I can write queries using PS/Query within Peoplesoft,
for my company's implementation. I work with Peoplesoft's FSCM module
(Financials and Supply Chain Management), currently on Version FSCM
8.90.08.024, using I think Oracle 11g as the base database.
I'm mostly self-taught, and the technical experts we have are busy
with database/application stuff, or they aren't familiar with my
section's specific data needs.
I should point out that I'm unable to directly write SQL statements to
Query the database. I have to use a built-in program called "PS/Query"
(also known as Query Manager) with a GUI that writes the SQL for you
and saves it as a Query that you can run to the database to extract
data. This is relevant to my question only in that:
1. I cannot create or alter views/tables
2. I cannot perform any type of SQL Statement except "SELECT"
3. I can embed PL/SQL, MetaSQL and plain SQL into Expressions
4. At this point, Query Manager is the only option I have.
PS/Query is my only experience with SQL so far, aside from Oracle's
documentation and sites like this. From my research, it's considered
extremely confining by "actual" SQL programmers.The restrictions on it
require you to do things in a manner that violates what seem to be
best practices of SQL coding.
Query Request: I have a query I've been requested to write that pulls out spend (on Vouchers and POs) against certain system-defined
Category Codes. What I'm trying to do is pull in Voucher IDs, sum the
merchandise amounts on them by Vendor and Category Code, and display
the results. Or in other words, for every unique combination of
Vendor/Category, add up all the Voucher Amounts that have that
Vendor/Category combination.
Using the SUM (Fieldname) OVER (PARTITION BY fieldname, fieldname)
syntax.
So the end result should look something like...
Code Vendor Amount
123-45 Acme $5000.00
123-45 Apple $4200.00
123-46 Acme $750.00
With that said, here's the SQL that Query Manager is displaying to get the result set I showed above:
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD LIKE :1
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
Underlying Issue: This works fine, but it can only prompt on one
Category Code at a time. Category Codes are 5 digits, a 3-digit
"Class" followed by a dash and then a 2-digit "subclass. I have a list
of 375 Category Codes I need to get this Query result for.
I've set up a prompt on this version that allows entry of a Wildcard
(So 123-%%), but that's still about a hundred separate runs of the
Query. Query Manager allows use of an "In List" expression type in
Criteria, but it requires you to manually enter each entry in the
list.
I'm trying to set it up to where I can paste a plaintext copy of the
Code list into an Expression, with proper quotes/commas, and have it
evaluate that to give me a combined list of all the NIGP codes
specified. The Prompt field created by Query Manager doesn't allow
pasting of lists (as far as I know).
Attempted Solution: I viewed the page at http://peoplesoft.ittoolbox.com/groups/technical-functional/peoplesoft-other-l/create-an-expression-in-psoft-90-query-to-paste-a-list-of-emplids-2808427 and I've tried some of the answers given there, but none of them worked. That page led to me trying this modified SQL (obviously the list of codes is truncated a bit for display here):
SELECT DISTINCT D.CATEGORY_CD, D.TN_DESCR1000, C.VENDOR_ID, E.NAME1, SUM ( A.MERCH_AMT_VCHR) OVER (PARTITION BY D.CATEGORY_CD, C.VENDOR_ID),E.SETID,E.VENDOR_ID
FROM PS_PO_LINE_MATCHED A, PS_PO_LINE B, PS_PO_HDR C, PS_ITM_CAT_TBL D, PS_VENDOR E, PS_PYMNT_VCHR_XREF F
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.PO_ID = B.PO_ID
AND A.LINE_NBR = B.LINE_NBR
AND B.BUSINESS_UNIT = C.BUSINESS_UNIT
AND B.PO_ID = C.PO_ID
AND D.CATEGORY_ID = B.CATEGORY_ID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_ITM_CAT_TBL D_ED
WHERE D.SETID = D_ED.SETID
AND D.CATEGORY_TYPE = D_ED.CATEGORY_TYPE
AND D.CATEGORY_CD = D_ED.CATEGORY_CD
AND D.CATEGORY_ID = D_ED.CATEGORY_ID
AND D_ED.EFFDT <= SYSDATE)
AND ( F.SCHEDULED_PAY_DT >= TO_DATE('2010-07-01','YYYY-MM-DD')
AND F.SCHEDULED_PAY_DT <= TO_DATE('2011-06-30','YYYY-MM-DD'))
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
AND E.VENDOR_ID = C.VENDOR_ID
AND A.BUSINESS_UNIT = F.BUSINESS_UNIT
AND A.VOUCHER_ID = F.VOUCHER_ID
ORDER BY 1
And the SQL above is what's giving me the ORA-00907 error. Has anyone ran into this problem before? Massive wall of text, I know. My apologies. This is my first post here and I'm trying not to leave anything relevant out.
I've got the immediate problem that spurred this question fixed,but that request is just the tip of a very large iceberg, and at some point I need to figure out a way to be able to paste plaintext lists in as criteria using Query Manager, preferably in a way that plays nice with Analytic Grouping.
TL;DR version:
Using Peoplesoft Query Manager to do an Analytic SUM with grouping using OVER, PARTITION BY. When I try to paste a list into the criteria, it throws an ORA-00907 Error.
Any help would be greatly appreciated. Thanks!
Ok, after a bit more tweaking with this, I've found what I think is the underlying issue.
The error, in this case, is two-fold. Part of it was my fault (I didn't check for Peoplesoft mangling the quotation marks I pulled from Word), and part of it was the way Query Manager interprets some kinds of functions (you have to wrap some stuff in a Case When statement to get it to evaluate properly).
First, the "My Fault" part:
Every time I was pasting in my list of test NIGP Codes, I was doing it from a file I kept saved in Microsoft Word.
Which has the probably-handy "replace straight quotes with smart quotes" feature. Peoplesoft goes bonkers when its presented a "smart quote", and will display them as upside-down question marks (there's probably a technical term, I don't know it).
So when I'd test suggestions (such as fixing the quote/comma order as suggested by #Rene Nyffenegger and #WayneH) I'd start with my base test query, add in the expressions and test it, saving it as a separate query. If they didn't work, I'd go back to the base query. That way I could iterate changes and save potential tests as different versions.
My mistake was in not saving the different versions, leaving the application and going back in. It's when you save the query, leave the page, go somewhere else in Peoplesoft, then go back to open Query Manager that it actually shows you that it's doing the character conversion. You can't see it unless you do that. Even though Query Manager is doing it. So it was throwing a character Query Manager wouldn't recognize, but not showing me the character it wouldn't recognize.
I got a new work PC recently, and I've now disabled the Smart Quotes auto-replace for future use.
Second, the "Query Manager: part:
On the version of this that I got to work, I made use of wrapping the "IN" function inside a Case statement. I've found that a lot of SQL functions, when used "plain" (as I'd define them by just copy-pasting from Oracle's definitions pages and filling in the appropriate variables) tend to give PS/Query (Query Manager) heartburn. But if you wrap them inside a CASE...WHEN...END statement that evaluates the result of the function and then build a criteria that selects based on certain values of that result, the function will work and properly display a result.
So for an example, set up this expression (like in the example from #qyb2zm302). I'm using different codes from what was in my original example, but they work the same (they're all five-digit, character-typed codes consisting of three digits, a dash, then two digits)
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
And then set a criteria:
AND
Case when E.CATEGORY_CD IN
('375-15', '375-30', '375-54', '375-60', '380-30','938-63')
then 'true'
else 'false'
end
= 'true'
It'll run to completion and return any rows that have that Category Code.
If you don't want to do that, you can do like in #qyb2zm302's Method 2. The only downside to that in Query Manager is that you have to enter them into individual rows in the "List", and if you can only copy-paste 25 at a time.
Wrapping it in a Case Statement lets you paste it directly into an Expression, which is far better for larger lists.
Solutions:
The above is the code I went with that worked. It's simplifying a bit for brevity's sake, but it works.
In List works through the native Query Manager option as long as you manually-populate the list
D.CATEGORY_CD = '005-00' OR works as long as you wrap it in a Case Statement
D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') works as long as you wrap it in a Case Statement
Peoplesoft hates Smart Quotes. None of the above will work if you're copying quotation marks directly from Word, but you won't see it unless you save, leave and come back to the same query in edit mode
Formatting is important. All of the above require the proper comma/quotation formatting, as pointed out by Rene and Wayne. Meaning: ('xxx-xx', 'xxx-01','xxx-02') etc
Thanks to everyone who helped on this! I don't think I've head-desked this hard before on any question, but I guess that's part of the learning process. Since all the answers posted are valid and correct (or at least a portion of the larger "correct"), I'm going to flag them all.
The
D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
part looks fishy to me
Since a '' within a string "evaluates" to a single ' the first string is
'015-00,'' '
followed by (the non-string)
015-06,
The following '' is probably the thing that the parser stumbles upon since it's pretty meaningless.
Edit try it with a D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07').
Following the link you posted, I see 2 methods for doing what you are trying to accomplish.
I also notice that you tried a 3rd method.
Method 1
Criteria > Add Criteria
Expression Type: Character
Length: 255
Expression Text: D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07') AND 1
Condition Type: equal to
Constant: 1
Method 2
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: in list
Value: 015-00','015-06','015-10','615-07
Method 3 (Your Method)
Criteria > Add Criteria
Field: D.CATEGORY_CD
Condition Type: equal to
Define Expression: '015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
Question) Does the below exactly match the text you are putting the Expression box?
'015-00' OR D.CATEGORY_CD IN ('015-00','015-06','015-10','615-07')
If not, what are you putting in that box?
I think the D.CATEGORY_CD criteria are giving you the problems, I changed the double quotes to single quotes and then it still looked strange to me. I then notice the commas are inside your quotes and not between them, try making the one criteria line look like this:
before:
OR D.CATEGORY_CD IN ('015-00,'' '015-06,'' '015-10,'' '615-07'')
after:
OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
Also, the "IN" is an implied "OR" and I am not sure if you have parenthesis around the two D.CATEGORY_CD,
I would just put the one additional code into the IN criteria and remove the "D.CATEGORY_CD =" line:
before:
AND D.CATEGORY_CD = '005-00' OR D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07')
after:
AND D.CATEGORY_CD IN ('015-00', '015-06', '015-10', '615-07', '005-00')
Of course, you are already ordering by CATEGORY_CD, you could remove this criteria and pull all categories in one run (that is unless there are too many rows for excel), and then you might also want to include either VENDOR_ID or NAME1 in the ORDER BY clause.
Hope that helps you.

Resources