Oracle Text curly braces behavior

I'm using Oracle Text to do a readahead (according to the spec writer) in the search bar.
Basically, a user can start typing text and we fill the suggestions bar with likely matches.
I tried using Oracle Text for this and ran into some issues, the latest one being:
The table contains this entry for answertext: ... we offer many pricing options ...
SELECT
    questiontext AS qtext,
    answertext AS text,
    questionid
FROM question
WHERE contains(answertext, '{pric}', 1) > 0;
This query returns nothing, but using {pricing} returns the correct result.
Any suggestion as to why this is happening would be great!
Edit: just wanted to add that stemming does not work for me, because the user wants to differentiate between "report" and "reporting". They also want the matching substring to be highlighted, which I can do if I can find the substring among the returned results.
Edit 2: My guess is that Oracle tokenizes each word in the index using a word boundary of some sort, so without any wildcards it looks for a token that equals 'pric' and does not find one (because the token is 'pricing'). If that guess is correct, I would love for someone to chime in on how I can make the query above work with the example entry while still respecting whitespace: if I type 'pricing options' the entry should be returned, but if I type 'many options' it should not.

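The CONTAINS operator supports wildcards and fuzzy text search. Curly braces escape everything inside them, including the % wildcard, so leave the braces off when you use a wildcard. Try:
SELECT * FROM question WHERE contains(answertext, 'pric%', 1) > 0;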
or
SELECT * FROM question WHERE contains(answertext, 'fuzzy({pric})', 1) > 0;
But with fuzzy, "prize" will also match your search criteria.
To highlight found substrings you can use CTX_DOC.MARKUP.
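To make the token guess from the question concrete, here is a minimal plain-Python sketch. It is not how Oracle Text is implemented; the function names and the whitespace tokenizer are assumptions for illustration only. It contrasts an exact token lookup with a "phrase plus prefix on the last word" match, which is the behavior the question asks for.

```python
def tokenize(text):
    # Assumption: a simple whitespace tokenizer stands in for the real lexer.
    return text.lower().split()

def exact_token_match(text, term):
    # Mirrors the guess in the question: the term must equal a whole token.
    return term.lower() in tokenize(text)

def phrase_prefix_match(text, query):
    # All words except the last must match consecutive tokens exactly;
    # the last word only needs to be a prefix of the next token.
    tokens = tokenize(text)
    words = tokenize(query)
    if not words:
        return False
    *exact, last = words
    n = len(words)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if window[:-1] == exact and window[-1].startswith(last):
            return True
    return False

entry = "we offer many pricing options"
print(exact_token_match(entry, "pric"))
print(phrase_prefix_match(entry, "pric"))
print(phrase_prefix_match(entry, "pricing opt"))
print(phrase_prefix_match(entry, "many options"))
```

With the example entry, the exact lookup for "pric" fails while the prefix variant succeeds, and "many options" is rejected because the two words are not adjacent.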

Related

Matching two words as a single word

Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, search for MC 3 (with spaces) should also work. Moreover, there can be documents that have the content without spaces and that should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work: with a wildcard search I would match too much, e.g. MC35 would also match. I only want to match two exact words concatenated together (as well as an exact single word).
So far I'm thinking of additionally indexing all combinations of two words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle":{
"type":"shingle",
"max_shingle_size":2,
"min_shingle_size":2,
"output_unigrams":"true"
}
Note that this is already the default configuration; I just added it for clarity.
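To see which tokens a two-word shingle produces for the example field, here is a small plain-Python sketch. It is an illustration of the idea, not Elasticsearch code, and the `shingles` helper is hypothetical. One assumption worth checking against your own mapping: the shingle filter joins words with a space by default, so the sketch passes an empty separator to get MC3 rather than "MC 3".

```python
def shingles(tokens, size=2, separator=" ", output_unigrams=True):
    # Collect the single tokens (if requested) plus every run of `size`
    # adjacent tokens joined by `separator`. The real filter emits them
    # interleaved by position; the order does not matter for this illustration.
    out = list(tokens) if output_unigrams else []
    for i in range(len(tokens) - size + 1):
        out.append(separator.join(tokens[i:i + size]))
    return out

tokens = "5W30 QUARTZ INEO MC 3 5L".split()
indexed = shingles(tokens, separator="")
print(indexed)
```

The result contains MC, 3 and MC3, but not MC35, which is the distinction the question is after.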

ElasticSearch Nest AutoComplete based on words split by whitespace

I have AutoComplete working with ElasticSearch (Nest) and it's fine when the user types in the letters from the beginning of the phrase, but I would like to use a specialized type of autocomplete, if possible, that caters for words in a sentence.
To clarify further, my requirement is to be able to "auto complete" like such:
Imagine the full indexed string is "this is some title". When the user types in "th", this comes back as a suggestion with my current code.
I would also like the same thing to be returned if the user types in "som" or "title" or any letters that form a word (word being classified as a string between two spaces or the start/end of the string).
The code I have is:
var result = _client.Search<ContentIndexable>(
    body => body
        .Index(indexName)
        .SuggestCompletion("content-suggest" + Guid.NewGuid(),
            descriptor => descriptor
                .OnField(t => t.Title.Suffix("completion"))
                .Text(searchTerm)
                .Size(size)));
And I would like to see if it would be possible to write something that matches my requirement using SuggestCompletion (and not by doing a match query).
Many thanks,
Update:
This question already has an answer here, but I'm leaving it up since the title/description is probably a little easier for search engines to find.
The correct solution to this problem can be found here:
Elasticsearch NEST client creating multi-field fields with completion
@Kha, I think it's better to use the NGram tokenizer.
So you should use this tokenizer when you create the mapping.
If you want more info, or maybe an example, write back.
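For what it's worth, here is a rough plain-Python sketch of why n-grams help with the "som" case. It is not NEST or Elasticsearch code, and it uses a narrower variant than the answer names: edge n-grams (prefixes) of each whitespace-separated word rather than all n-grams. The `word_prefixes` helper and the minimum length of 2 are assumptions for illustration.

```python
def word_prefixes(text, min_gram=2):
    # Return every prefix of at least `min_gram` characters for each word.
    grams = set()
    for word in text.lower().split():
        for end in range(min_gram, len(word) + 1):
            grams.add(word[:end])
    return grams

grams = word_prefixes("this is some title")
print(sorted(grams))
```

If these grams are what gets indexed, typing "th", "som" or "title" each hits a stored term, while a mid-word fragment such as "ome" does not.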

Amazon Cloudsearch not searching with partial string

I'm testing Amazon CloudSearch for my web application and I'm running into some strange issues.
I have the following domain indexes: name, email, id.
For example, I have data such as: John Doe, John@example.com, 1
When I search for jo I get nothing. If I search for joh I still get nothing. But if I search for john, I get the above document as a hit. Why am I not getting a hit for partial strings? I even put suggesters on name and email with fuzzy matching enabled. Is there something else I'm missing? I read the following pages on this:
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-compound-queries.html
I'm doing the searches using boto as well as with the form on AWS page.
What you're trying to do -- finding "john" by searching "jo" -- is called a prefix search.
You can accomplish this either by searching
(prefix field=name 'jo')
or
q=jo*
Note that if you use the q=jo* method of appending * to all your queries, you may want to do something like q=jo* |jo because john* will not match john.
This can seem a little confusing, but imagine if Google gave back results for prefix matches: if you searched for tort and got back a mess of results about tortoises and torture instead of tort (a legal term), you would be very confused (and frustrated).
A suggester is also a viable approach, but that's going to give you back suggestions (like john, jordan and jostle) rather than results. You would then need to search for those suggestions; a suggester does not return matching documents to you.
See "Searching for Prefixes in Amazon CloudSearch" at http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
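If you build the request yourself, the structured form above could be assembled roughly like this. It's a sketch under assumptions: the `prefix_query_params` helper is made up for illustration, and it assumes the structured parser is selected with q.parser=structured and that single quotes and backslashes inside the term are escaped with a backslash. Check both against the CloudSearch docs before relying on them.

```python
from urllib.parse import urlencode

def prefix_query_params(field, prefix):
    # Escape backslashes first, then single quotes, so the term stays
    # inside the quoted string of the structured query.
    escaped = prefix.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "q": f"(prefix field={field} '{escaped}')",
        "q.parser": "structured",
    }

params = prefix_query_params("name", "jo")
query_string = urlencode(params)
print(params["q"])
print(query_string)
```

The encoded string can then be appended to the search endpoint of your own domain.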
Are your index field types "Text"? If they are just "Literals", they have to be an exact match.
I think you must have your name and email fields set as the literal type instead of the text type, otherwise a simple text search of 'jo' or 'Joh' should've found the example document.
While using a prefix search may have solved your problem (and that makes sense if the fields are set as the literal type), the accepted answer isn't really correct. The notion that it's "like a google search" isn't based on anything in the documentation. It actually contradicts the example they use, and in general muddies up what's possible with the service. From the docs:
When you search text and text-array fields for individual terms, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field, in any order. For example, in the sample movie data, the title field is configured as a text field. If you search the title field for star, you will find all of the movies that contain star anywhere in the title field, such as star, star wars, and a star is born. This differs from searching literal fields, where the field value must be identical to the search string to be considered a match.

Find / search the AOT for an exact match

Is it possible to find (search) in Dynamics AX 2009 for an exact match?
For example, when I am searching in the AOT for "AddressRelationship", I don't want to see DirPartyAddressRelationship in the results.
Okay, it took me a while, but I have figured this out: it is possible.
Adding a breakpoint to the find form shows that it uses a class called SysUtilScanSource to find your string within the AX source code.
In SysUtilScanSource.do() the method match is used to find a match against the specific source code. You can read more about match here:
http://msdn.microsoft.com/en-us/library/aa886279(v=ax.10).aspx
The match method allows you to use expressions.
The expression you require is as follows:
:SPACE
Where SPACE is the character ' '. Sets the match to blanks, tabulations, and control characters such as Enter (new line).
For example:
match("ab: cd","ab cd"); //returns 1
match("ab: cd","ab\ncd"); //returns 1
match("ab: cd","ab\tcd"); //returns 1
match("ab: cd","ab  cd"); //returns 0 - only the first space is matched
Therefore, in your example you need to enter the following string in the "containing text" field:
: AddressRelationship:
Note that in the above string there are spaces in the following locations;
:SPACEAddressRelationship:SPACE
Try it. I did, it works a treat.
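As an analogy only (this is Python's re module, not the X++ match syntax), the same "one whitespace character on each side" idea looks like this. The helper name is made up for illustration.

```python
import re

def contains_whole_word(source, word):
    # Require exactly one whitespace character directly before and after
    # the word, similar in spirit to ":SPACE" on both sides.
    pattern = r"\s" + re.escape(word) + r"\s"
    return re.search(pattern, source) is not None

print(contains_whole_word("public AddressRelationship rel;", "AddressRelationship"))
print(contains_whole_word("DirPartyAddressRelationship rel;", "AddressRelationship"))
```

One consequence of this pattern is that a word at the very start or end of the text, with no whitespace next to it, is not matched.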
When you do the find, look at the "Properties" tab at the end of the find form window. This lets you narrow the search based on properties. I do not believe there is a way to get an exact match, but the properties can help you narrow your search down.

Count characters after given symbol in oracle varchar column value

How would I go about counting the characters after a certain character? I'm new to Oracle, and I've learned quite a bit, but I'm stumped at this point. I found a couple of functions that will get you a substring, and I found a function that will give you the length of a string. I am examining an email address, myemail@thedomain.com. I want to check the length after the last '.' in the email.
SELECT email
FROM user_table
WHERE length(substr(email, /*what values*/, /*to put here*/))
I don't know if it's actually possible to find the location of the final '.' in the email string?
I'm not sure I would use substr. You can try something like this:
select length('abcd@efgh.123.4567') - instr('abcd@efgh.123.4567', '.', -1) from dual
Using instr(..,..,-1) searches backwards from the last character to find the position.
Since you're doing checks, I suggest you validate the format with regular expressions using REGEXP_INSTR. For instance, an email validation I found on this site is REGEXP_INSTR(email, '\w+@\w+(\.\w+)+') > 0
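The arithmetic is easy to check outside the database. Here is a small Python sketch of the same calculation (not Oracle code; the function name is made up). Python's rfind is 0-based and returns -1 when nothing is found, while instr is 1-based and returns 0, so the two formulas line up.

```python
def chars_after_last_dot(value):
    # 1-based position of the last '.', or 0 if there is none,
    # which matches what instr(value, '.', -1) reports.
    position = value.rfind(".") + 1
    return len(value) - position

print(chars_after_last_dot("abcd@efgh.123.4567"))
print(chars_after_last_dot("myemail@thedomain.com"))
```

Note that when the string has no '.', both versions return the full length of the string, so you may want to handle that case separately.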
I didn't check it myself, but it looks quite ok.
Cheers.
