How to optimize a query with a CONTAINS clause - Oracle

When I use the CONTAINS() function on the title field with a parameter of length 1 (for example ..... where contains(title, '%x%') > 0),
my query's response time slows down.
As far as I know, this is because of the length of the parameter (i.e. length('x') = 1).
Is there any solution to optimize such a query?

Are you using CONTAINS just to check whether title contains the letter x? That is a very inefficient way to do it. Use this instead:
... where title like '%x%'
LIKE simply returns yes or no; it doesn't calculate a score the way CONTAINS does. And by not wrapping title in a function, the Oracle optimizer is free to use an index on title if you have defined one. (If you haven't, you may want to consider adding one.)
Good luck!
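A minimal sketch of that rewrite, assuming a hypothetical articles table and index name. Note that with the leading wildcard the optimizer can at best full-scan the index rather than range-scan it:

-- Hypothetical table and index names, for illustration only.
CREATE INDEX title_idx ON articles (title);

-- Plain LIKE: no score calculation; the optimizer may choose a
-- full scan of title_idx rather than a full scan of the table.
SELECT *
FROM articles
WHERE title LIKE '%x%';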

The problem is not the length of the search string.
A search for %xxxxxxxxxxx% will take a similar time.
The problem is the leading %, which disables the index range scan. To match %x%, the whole index must be scanned.
You should try to limit the search to x% only, i.e. without the leading %, which is in general impossible as you would get only a subset of the results.
In some cases it is possible, and I'll illustrate it with a simple example.
Suppose you are searching for a prefix of a table name, and the name can be both unqualified and qualified.
You must search for %x% to get a match on both of the strings:
XTABLE
OWNER.XTABLE2
What you can do is define the dot as whitespace,
which will split the owner from the table name and index three tokens:
XTABLE
OWNER
XTABLE2
That enables searching for x% only, which ends in an index range scan with better performance.
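A sketch of that setup with Oracle Text, assuming made-up names (tables_tab, name_lexer, name_ctx_idx); BASIC_LEXER's WHITESPACE attribute is what makes the dot a token separator:

-- Hypothetical names throughout; treat '.' as whitespace so that
-- OWNER.XTABLE2 is indexed as the two tokens OWNER and XTABLE2.
BEGIN
  ctx_ddl.create_preference('name_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('name_lexer', 'WHITESPACE', '.');
END;
/

CREATE INDEX name_ctx_idx ON tables_tab (name)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER name_lexer');

-- A right-truncated search now matches both XTABLE and OWNER.XTABLE2:
SELECT * FROM tables_tab WHERE CONTAINS(name, 'x%') > 0;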

Related

Query a text/keyword field in Elasticsearch that contains at least one item not matching a set

I have a document that has a "bag.contents" field (indexed as text with a .keyword sub-field) containing a comma-separated list of the items in it. Below are some samples:
`Apple, Apple, Apple`
`Apple, Orange`
`Car, Apple` <--
`Orange`
`Bus` <--
`Grape, Car` <--
`Car, Bus` <--
The desired query results should be all documents where there is at least one instance of something other than 'Apple', 'Orange', 'Grape', as per the arrows above.
I'm sure the DSL is some combination of must and must_not, but after 20 or so iterations it seems very difficult to get Elasticsearch to return the correct result set, short of one that doesn't contain any of those 3 things.
It is also worth noting that this field in the original document is a JSON array, and Kibana shows it as a single field with the elements comma-separated. I suspect this may be complicating things.
1 - If it is showing up as a single field, it's probably not indexed as an array - please make sure the document you index is formed properly, i.e. you need it to be
{ "contents": ["apple","orange","grape"]}
and not
{"contents": "apple,orange,grape"}
2 - Regarding the query - if you know all the possible terms at query time, you can form a terms_set query with every term other than apple, orange and grape. The terms_set query lets you control the minimum number of matches required (1 in your case).
If you don't know all the possible terms, consider creating a separate field that indexes all the words except apple, orange and grape, and query against that field.
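A sketch of that terms_set query, assuming the index is called bags, the keyword sub-field is bag.contents.keyword, and the vocabulary beyond the three fruits is just Car and Bus (all names here come from the samples, not your real mapping):

POST /bags/_search
{
  "query": {
    "terms_set": {
      "bag.contents.keyword": {
        "terms": ["Car", "Bus"],
        "minimum_should_match_script": { "source": "1" }
      }
    }
  }
}

The script simply returns the constant 1, i.e. a document matches as soon as it holds at least one non-fruit item.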

Elasticsearch query on string representation of number

Good day:
I have an indexed field called amount, which is of string type. The value of amount can be either one or 1. Say in this example we have amount=1 as an indexed document; if I try to search for one, Elasticsearch will not return the document unless I put 1 in the search query. Any thoughts on how I can get this to work? I'm thinking a tokenizer is what's needed.
Thanks.
You probably don't want this for sevenmillionfourhundredfifteenthousandtwohundredfourteen and the like, but only for a small number of values.
At index time I would convert everything to a proper number and store it in a numerical field, which then even allows sorting, if you need it. Apart from that, I would use synonyms at index and at query time and map everything to the digit strings, but in a general text field that is searched by default.
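A sketch of the synonym approach, assuming Elasticsearch 7.x mapping syntax and a small, known set of number words (index, filter and analyzer names are made up):

PUT /amounts
{
  "settings": {
    "analysis": {
      "filter": {
        "number_words": {
          "type": "synonym",
          "synonyms": ["one => 1", "two => 2", "three => 3"]
        }
      },
      "analyzer": {
        "number_normalizer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "number_words"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "amount": { "type": "text", "analyzer": "number_normalizer" }
    }
  }
}

With that in place, searching the amount field for either one or 1 hits the same token.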

Elasticsearch multi term search

I am using Elasticsearch to allow a user to type in a term to search. I have the following property 'name' I'd like to search; for instance:
'name': 'The car is black'
I'd like to have this document returned if the search input is black car or car black.
I've tried doing a bool must with multiple terms ['black', 'car'], but it seems like it only works if the entire string is a match.
So what I'd really like to do is more of a "does the field contain both words, in any order" search.
Can someone please get me on the right track? I've been banging my head on this one for a while.
If it seems like it only works when the entire string is a match, first make sure that in the index mapping your string property name is analysed, i.e. the mapping for this property doesn't contain "index": "not_analyzed". If it isn't analysed, you'll need to reindex your index in order to be able to search for tokens rather than only for the whole phrase.
Once you're sure your strings are analysed you can use:
Terms query with the "minimum_should_match" parameter equal to the number of words entered.
Bool query with a must clause containing a term query per word (sketched after this list).
Common terms query, which has a nice clean syntax for this purpose (you don't need to break the input string apart and construct a more complex query structure in your app, as with the previous two), in addition to taking a smarter approach to stopword analysis.
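A minimal sketch of the second option, assuming a hypothetical products index whose name field is analysed with the standard analyzer (so tokens are lowercased):

POST /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "name": "black" } },
        { "term": { "name": "car" } }
      ]
    }
  }
}

Because both term queries must match tokens of the same document, 'The car is black' is returned for both black car and car black.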

Oracle string search performance issue

I have a simple search stored procedure in Oracle 11gR2 on a table with over 1.6 million records. I am puzzled by the fact that if I search for a word inside a column, such as '%boston%', it takes 12 seconds. I have an index on the name column.
select description from travel_websites where name like '%boston%';
If I only search for a word starting with boston, like 'boston%', it takes only 0.15 seconds.
select description from travel_websites where name like 'boston%';
I added an index hint to try to force the optimizer to use my index on the name column, but it did not help either.
select description /*+ index name_idx */ from travel_websites where name like '%boston%';
Any advice would be greatly appreciated.
You cannot use an index range scan for a predicate with a leading wildcard (i.e. like '%boston%'). This makes sense if you think about how an index is stored on disk: if you don't know what the first character of the string is, you can't traverse the index to look for entries that match it. You may be able to do a full scan of the index, where you read every leaf block and check the name there to see whether it contains the string you want. But that requires a full scan of the index, plus you then have to visit the table for every ROWID you get from the index in order to fetch any columns that are not part of the index you just full-scanned. Depending on the relative sizes of the table and the index, and on how selective the predicate is, the optimizer may easily decide that it is quicker to just do a table scan when you're searching with a leading wildcard.
Oracle does support full-text search, but you have to use Oracle Text, which requires that you build an Oracle Text index on the name column and use the CONTAINS operator to do the search rather than a LIKE query. Oracle Text is a very robust product, so there are quite a few options to consider in building the index, refreshing it, and building the query, depending on how sophisticated you want to get.
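A sketch of that approach, reusing the table and column from the question (the index name is made up):

-- Hypothetical index name; requires the Oracle Text option.
CREATE INDEX name_text_idx ON travel_websites (name)
  INDEXTYPE IS ctxsys.context;

-- Token search instead of a leading-wildcard LIKE:
SELECT description
FROM travel_websites
WHERE CONTAINS(name, 'boston') > 0;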
Your index hint is not correctly specified. Assuming there is an index on name, that the index is named name_idx, and that you want to force a full scan of the index (to reiterate, a range scan on the index is not a valid option with a leading wildcard), you would need something like
select /*+ index(travel_websites name_idx) */ description
from travel_websites
where name like '%boston%'
There is no guarantee, however, that a full index scan is going to be any more efficient than a full table scan. And it is entirely possible that the optimizer is choosing the index full scan already without the hint (you don't specify what the query plans are for the three queries).
Oracle (and, as far as I know, most other databases) by default indexes strings so that the index can only be used to look up matches from the start of the string. That means a LIKE 'boston%' (starts-with) can use the index, while a LIKE '%boston' (ends-with) or LIKE '%boston%' (contains) cannot.
If you really need indexes that can find substrings fast, you can't use the regular string index types, but you can use TEXT indexes, which sadly may require slightly different query syntax.

SQLite - how to return rows containing a text field that contains one or more strings?

I need to query a table in an SQLite database and return all the rows that match a given set of words.
To be more precise: I have a database with ~80,000 records. One of the fields is a text field with around 100-200 words per record. What I want to be able to do is take a list of 200 single-word keywords {"apple", "orange", "pear", ... } and retrieve the set of all records in the table that contain at least one of the keyword terms in the description column.
The immediately obvious way to do this is with something like this:
SELECT stuff FROM table
WHERE (description LIKE '% apple %') or (description LIKE '% orange %') or ...
If I have 200 terms, I end up with a big, nasty-looking SQL statement that seems clumsy, smacks of bad practice, and, not surprisingly, takes a long time to process - more than a second per 1000 records.
This answer, Better performance for SQLite Select Statement, seemed close to what I need, and as a result I created an index, but according to http://www.sqlite.org/optoverview.html SQLite doesn't apply any optimisations if the LIKE operator is used with a leading % wildcard.
Not being an SQL expert, I am assuming I'm doing this the dumb way. I was wondering if someone with more experience could suggest a more sensible and perhaps more efficient way of doing this?
Alternatively, is there a better approach I could use to the problem?
Using SQLite full-text search would be faster than a LIKE '%...%' query. I don't think any database can use an index for a query beginning with %: if the database doesn't know what the query starts with, it can't use the index to look it up.
An alternative approach is to put the keywords in a separate table and make an intermediate table that records which row in your main table has which keywords. If you index all the relevant columns that way, it can be queried very quickly.
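A sketch of that normalized layout; all table and column names here are invented for illustration:

-- Invented schema: one row per keyword, plus a junction table linking
-- each main-table record to the keywords appearing in its description.
CREATE TABLE keywords (
  id   INTEGER PRIMARY KEY,
  word TEXT UNIQUE NOT NULL
);

CREATE TABLE record_keywords (
  record_id  INTEGER NOT NULL REFERENCES records(id),
  keyword_id INTEGER NOT NULL REFERENCES keywords(id),
  PRIMARY KEY (record_id, keyword_id)
);
CREATE INDEX idx_rk_keyword ON record_keywords(keyword_id);

-- All records whose description contains at least one of the terms:
SELECT DISTINCT r.stuff
FROM records r
JOIN record_keywords rk ON rk.record_id = r.id
JOIN keywords k        ON k.id = rk.keyword_id
WHERE k.word IN ('apple', 'orange', 'pear');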
Sounds like you might want to have a look at Full Text Search. It was contributed to SQLite by someone from Google. The description:
allows the user to efficiently query the database for all rows that contain one or more words (hereafter "tokens"), even if the table contains many large documents.
This is the same problem as full-text search, right? In which case, you need some help from the DB to construct indexes into these fields if you want to do this efficiently. A quick search for SQLite full text search yields this page.
The solution you correctly identify as clumsy will probably do up to 200 pattern matches per document in the worst case (i.e. when a document doesn't match), where each match has to traverse the entire field. With the index approach, your search speed is independent of the size of each document.
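A minimal sketch of the full-text-search route these answers point to, using SQLite's FTS5 extension (FTS3/4 in older builds); the main table and column names are assumed:

-- Assumed main table: records(id, stuff, description).
CREATE VIRTUAL TABLE records_fts USING fts5(description);

-- Populate the full-text index from the existing table.
INSERT INTO records_fts (rowid, description)
  SELECT id, description FROM records;

-- Rows containing at least one keyword; OR is FTS5's boolean operator.
SELECT r.stuff
FROM records r
WHERE r.id IN (
  SELECT rowid FROM records_fts
  WHERE records_fts MATCH 'apple OR orange OR pear'
);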
