I have a table with IDs and strings, and I need to index it properly for searching by the end of the strings. The way we currently handle it is by copying the information into another table, reversing each string, and indexing that normally. What I would like to do is use some kind of index that allows searching in reverse.
Example
Data:
F7421kFSD1234
d7421kFSD1235
F7541kFSD1236
d7421kFSD1234
F7421kFSD1235
b8765kFSD1235
d7421kFSD1234
The way our users usually input their search is something along the lines of...
*1234
By reversing the strings (and the search string: 4321*) I can find what I am looking for without scanning the whole table. My question is: is making a second table the best way of doing this?
Is there a way to reverse index?
I've tried an index like this...
create index REVERSE_STR_IDX on TABLE(STRING) REVERSE;
but Oracle doesn't seem to be using it, according to the explain plan.
Thanks in advance for the help.
Update:
I did have a problem with Unicode characters not being reversed correctly. The solution was to cast them.
Example:
select REVERSE(cast(string AS varchar2(2000)))
from tbl
where id = 1
There is a myth that a reverse key index can be used for this, but I've never seen it work in practice.
I would try a "manual" function-based index.
CREATE INDEX REVERSE_STR_IDX on TBL(reverse(string));
SELECT *
FROM TBL
WHERE reverse(string) LIKE '4321%';
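To confirm that Oracle actually picks up the function-based index, you can check the plan the same way you did for the reverse key index. A minimal sketch, assuming the TBL/STRING names from above:
EXPLAIN PLAN FOR
SELECT *
FROM TBL
WHERE reverse(string) LIKE '4321%';

SELECT * FROM TABLE(dbms_xplan.display);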
Related
We have a query that takes 48 minutes to run a search on a CLOB. The query is written as if the column were not a CLOB, and uses CONTAINS and NEAR. This search for three words within a certain distance of each other is important. I need to speed this up and want to put an index on the CLOB, but I don't know whether that would work and don't fully understand how to do it. I found this from Tom Burleson
http://www.dba-oracle.com/t_clob_search_query.htm OR https://asktom.oracle.com/pls/apex/asktom.search?tag=oracle-text-contains-search-with-near-is-very-slow
, but I can't figure out how to do it with CONTAINS and NEAR to enable the search for three words within a certain distance of each other.
current script:
SELECT clob_field
FROM clob_table
WHERE contains(clob_field,'NEAR (((QUICK),(FOX),(LAZY)),5)') > 0;
Want to use something like this if it will act like indexing:
SELECT clob_field
FROM clob_table
WHERE contains(dbms_lob.substr(clob_field), 'near(((QUICK),(FOX),(LAZY)),5)') > 0;
If not, I need to do indexing, but I don't quite understand how to use CTXCAT and CONTEXT (https://docs.oracle.com/cd/A91202_01/901_doc/text.901/a90122/ind4.htm). I also don't like what I read here, which says that if you use CTXCAT to index a CLOB you have to use CONTEXT, or something like that. It can't affect the other queries that are run on this field.
Thanks in advance!
CONTAINS won't work unless the field is globally indexed, so I had to index the field, and then I could get the original query working.
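For reference, a minimal sketch of the Oracle Text index that makes the original query work, assuming the clob_table/clob_field names from the question (the index name is an assumption):
CREATE INDEX clob_field_ctx_idx ON clob_table (clob_field)
  INDEXTYPE IS ctxsys.context;

SELECT clob_field
FROM clob_table
WHERE contains(clob_field, 'NEAR(((QUICK),(FOX),(LAZY)),5)') > 0;
Once the CONTEXT index exists, CONTAINS with NEAR is resolved against the text index instead of scanning the CLOBs.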
I have the query below. Because of the huge amount of data in the MATTER table, the LIKE statement takes a very long time to execute, so I was thinking of creating a CONTEXT index and using CONTAINS.
Should I index only MATTER_TITLE, or some other columns as well, based on the SELECT query below?
Input is highly appreciated.
SELECT DISTINCT dm.MATTER_SEQ
FROM MATTER dm
,MATTER_TYPE dmt
,MATTER_SUBTYPE dms
,STATUS ds
,FILING df
WHERE dm.MATTER_TYPE_SEQ=dmt.MATTER_TYPE_SEQ
AND dm.MATTER_SUBTYPE_SEQ=dms.MATTER_SUBTYPE_SEQ
AND dm.STATUS_CODE NOT IN ('abc','jkl','xyz')
AND dm.STATUS_CODE = DS.STATUS_CODE
AND dm.IS_EXTERNAL='1'
AND dm.IS_DELETED='0'
AND dm.MATTER_SEQ = df.MATTER_SEQ
AND trunc(dm.CREATED_DATE) between '01-NOV-95' AND '02-OCT-18'
AND upper(dm.MATTER_TITLE) like(upper (q'{%jdasuidhajsndjahs%}'))
It sounds like you're already aware that LIKE with a leading wildcard ('%ABC') is notoriously inefficient since it typically can't use indexes and does a full table scan.
If the other optimizing suggestions don't help much, you probably would see better performance with a Context index. Be sure to set the SUBSTRING_INDEX preference so it'll specifically prepare the index for infix searches like yours. See this Ask Tom for more details. (If you will also have wildcards in the middle of strings ('ABC%DEF'), you might also want to set the PREFIX options.)
begin
  ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('SUBSTRING_PREF', 'SUBSTRING_INDEX', 'TRUE');
end;
/
create index matter_title_idx on MATTER(MATTER_TITLE)
indextype is ctxsys.context
parameters ('wordlist SUBSTRING_PREF');
Also note that Context indexes are case-insensitive by default, so you don't need to do UPPER(). I haven't tried using q'' literals with contains, so I'm not sure how this'll work.
AND CONTAINS(dm.MATTER_TITLE, q'{%jdasuidhajsndjahs%}') > 0
Try creating function-based indexes on upper(dm.MATTER_TITLE) and trunc(dm.CREATED_DATE).
I am also assuming that the columns in the join conditions already have indexes. If not, index them.
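A sketch of those two function-based indexes, assuming nothing beyond the column names in the query (the index names are made up):
CREATE INDEX matter_title_upper_idx ON MATTER (upper(MATTER_TITLE));
CREATE INDEX matter_created_trunc_idx ON MATTER (trunc(CREATED_DATE));
Note that the leading-wildcard LIKE on upper(MATTER_TITLE) still won't use the first index; it mainly helps non-wildcard title searches, while the second index can serve the trunc(CREATED_DATE) range filter.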
I have two classes, claim and index. I have a field in my claim class called topic, which is a string. I am trying to index the topic column, not with the database's index features, but in code, with the following method.
Suppose I have claim 1; for claim 1's topic field ("i love muffins muffins") I'll do the following treatment:
#1. Create an empty dictionary of "word" => occurrences.
#2. Create a list of stopwords, for example stopwords = ("For", "This", ... etc.).
#3. Create a list of delimiters, for example delimiter_chars = ",.;:!?".
#4. Split the text (topic field) into words delimited by whitespace.
#5. Remove unwanted delimiter characters adjoining words.
#6. Remove stopwords.
#7. Remove duplicates.
#8. Now I create multiple index objects: (word="love", occurrences = 1, looked = 0, reference to claim 1), (word="muffins", occurrences = 2, looked = 0, reference to claim 1).
Now whenever the word "muffins", for example, is looked up, looked will increase by one and I will move the record up in my database. So my questions are the following: is this method good? Is it better than the database's index features? Are there some ways to improve it?
What I think you are looking for is something called a B-tree. In your case, you would use a 26-branch node (or 52 branches if you need case sensitivity) in the tree. This will make finding objects very fast; lookup time is O(log n). In each node, you would keep a pointer to the actual data in an array, list, file, or something else.
However, unless you are willing to put in the time to code something specific to your application, you might be better off using a database such as Oracle, Microsoft SQL Server, or MySQL, because these are professionally developed and profiled to get the maximum performance possible.
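If you do go the database route, Oracle Text will do much of the treatment described in the question (tokenizing, delimiter handling, stopword removal) for you. A rough sketch, with table and column names made up to mirror the question:
CREATE TABLE claim (
  claim_id NUMBER PRIMARY KEY,
  topic    VARCHAR2(2000)
);

CREATE INDEX claim_topic_ctx_idx ON claim (topic)
  INDEXTYPE IS ctxsys.context;

-- stopwords, delimiters and duplicates are handled by the lexer;
-- this lookup uses the word-level index instead of hand-rolled index rows
SELECT claim_id
FROM claim
WHERE contains(topic, 'muffins') > 0;
The "looked" popularity counter would still have to be tracked separately; the text index only covers the word lookup itself.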
I need to search over a DB table using some kind of fuzzy search, like the one from Oracle, and using indexes, since I do not want a table scan (there is a lot of data).
I want to ignore case, language-specific characters (ñ, ß, ...), and special characters like _, (), -, etc.
Search for "maria (cool)" should get "maria- COOL" and "María_Cool" as matches.
Is that possible in Oracle in some way?
Regarding case, I think it can be solved by creating the index directly on the lower-cased value and always searching lower-cased. But I do not know how to solve the special-characters part.
I thought about storing the data without special characters in a separate column, searching on that, and returning the real one, but I am not 100% sure whether that is the right solution.
Any ideas?
Maybe UTL_MATCH can help.
But you can also create a function-based index on, let's say, something like this:
regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ')
And try to match like this:
...
WHERE regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ') =
regexp_replace('maria (cool)' , '[^0-9a-zA-Z]+', ' ')
Here is a sqlfiddle demo. It's not complete, but it can be a start.
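Combining that with the lower-casing idea from the question, the function-based index could look like this (table and column names are placeholders):
CREATE INDEX your_table_norm_idx
  ON your_table (lower(regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ')));

SELECT *
FROM your_table
WHERE lower(regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ')) =
      lower(regexp_replace('maria (cool)', '[^0-9a-zA-Z]+', ' '));
The predicate has to use exactly the same expression as the index for Oracle to consider it.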
I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using Coldfusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender", with "Male" and "Female" as the only possible values), there is no point in an index. Generally, you want an index that will restrict the rows that need to be processed by a significant amount (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are searching for x LIKE '%something', then there is no point in an index. If you think of an index as specifying a particular order for the rows, then an ordering on x is useless when you're searching for "%something": you're going to have to scan all the rows anyway.
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT
*
FROM (
SELECT TOP 10
ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
sq.*
FROM (
SELECT
c.*,
p.providername,
p.school,
p.website,
p.type
FROM
cpd_COURSES c, cpd_PROVIDERS p
WHERE
c.providerid = p.providerid AND
c.activatedYN = 'Y' AND
(
c.name like '%accounting%' OR
c.title like '%accounting%' OR
c.keywords like '%accounting%'
)
) sq
) AS temp
WHERE
Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid in which case you don't need an index, because it'll already have one.
Additionally, the activatedYN column is a T/F column, and (according to my rule above about reducing the search space) a T/F column should not be indexed either.
Finally, because you're searching with an x LIKE '%accounting%' query, you don't need an index on name, title, or keywords either, because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key for cpd_PROVIDERS.providerid.
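A hedged sketch of what that could look like in SQL Server, if the constraint isn't already there (the constraint name is an assumption):
ALTER TABLE cpd_COURSES
  ADD CONSTRAINT FK_cpd_COURSES_providerid
  FOREIGN KEY (providerid) REFERENCES cpd_PROVIDERS (providerid);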
SQL Server Specific
Because you're using SQL Server, Management Studio has a number of tools to help you decide where to put indexes. The "Index Tuning Wizard" is actually usually pretty good at telling you what will give you good performance improvements. You just cut and paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns, "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), and a composite clustered index on courseid + keyword (sketched below)?
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
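A sketch of that keyword table in SQL Server terms (names are assumptions; the extra nonclustered index is my addition, for lookups that start from the keyword rather than the course):
CREATE TABLE Course_Keywords (
  courseid INT         NOT NULL,
  keyword  VARCHAR(24) NOT NULL,
  CONSTRAINT PK_Course_Keywords PRIMARY KEY CLUSTERED (courseid, keyword)
);

CREATE NONCLUSTERED INDEX IX_Course_Keywords_keyword
  ON Course_Keywords (keyword, courseid);

SELECT courseid
FROM Course_Keywords
WHERE keyword = 'accounting';
An exact keyword match can then use an index seek instead of pattern-matching every row with LIKE.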
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed yellow pages - if you want to look up a person by their last name, the phonebook is already sorted in that way - Last Name is the clustered index field. If you wanted to find phone numbers for people named Jennifer or the person with the phone number 867-5309, you'd have to search through every entry and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order along with the page in the phonebook that the person is listed, it would be a lot faster. These would be the unclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the zip code lookup. My experience is that IN statements work great for small lists, but as the lists get larger you get better performance out of EXISTS, since the query engine stops searching for a specific value at the first instance it runs into.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>