If I need to check a field in a table against an array of possible strings (using Postgres with the Sequel gem), what's the fastest way to do so? I've tried building a regular expression from the array with |s between the entries and searching the table with .where, but it's slow... and I'm hoping there may be a faster way.
If your field can be equal to one of multiple words, you can use IN:
SELECT * FROM table WHERE field IN ('apple', 'banana', 'carrot', 'dog', ...)
PostgreSQL will probably not use an index for a regular expression lookup with |s, which would explain the slow speed on large datasets. The IN operator is what you want to use, as it should be able to use an index lookup (assuming the appropriate index). In Sequel:
DB[:table].where(:field=>['string1', 'string2', 'string3'])
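For reference, the Sequel expression above generates SQL roughly like the following; the CREATE INDEX line is an assumption about your schema, with a hypothetical index name:
-- Roughly the SQL Sequel generates for the expression above:
SELECT * FROM "table" WHERE ("field" IN ('string1', 'string2', 'string3'));
-- The IN lookup can only use an index if one exists on the column:
CREATE INDEX table_field_index ON "table" ("field");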
I have a field tags, which is a string of comma-separated values.
When I get the data from the database, I want them separated.
jsonObject(
    .......
    key("tags").value(splitPart(TABLENAME.TAGS, ",", 1))
    .......
)
From the above, I can only get one value but I want the entire array.
What should I do to get the entire array after splitting?
Unfortunately, the PostgreSQL SPLIT_PART function doesn't give you access to the entire split array, only a single "part". You probably want to use string_to_array instead. That will produce a PostgreSQL array, which can be converted to JSON(B) using the array_to_json function.
jOOQ does not yet support the latter (see #12841), but as always, when jOOQ doesn't know a vendor-specific SQL feature, you can easily resort to using plain SQL templating.
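The templated plain SQL might look like this (a sketch; tablename stands in for your actual table, and tags for the column from the question):
SELECT array_to_json(string_to_array(tags, ',')) AS tags_json
FROM tablename;
-- e.g. tags = 'a,b,c' yields ["a","b","c"]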
How do I filter all the records from a table where a column 'name' is like (in any order) all the values from an array of strings?
You could use REGEXP:
Model.where('name REGEXP ?', array_of_string.join('|'))
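Note that joining with | matches names containing any of the strings. If you need the name to contain all of them, in any order, one approach is to AND together one condition per term; in raw SQL that is roughly (placeholder table and values):
SELECT * FROM models
WHERE name LIKE '%apple%'
  AND name LIKE '%banana%'
  AND name LIKE '%carrot%';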
If you're using a Postgres database, please check the following answer, since Postgres has native arrays.
You might want to do it at the application level if your db is small.
If REGEXP is viable for you, then @khiav reoy's answer is the best you can do.
Consider a collection of Users:
{ name: 'Jeff' }
{ name: 'Joel' }
Is there a way to efficiently get all the unique values for name?
User.pluck(:name).uniq
To return
[ 'Jeff', 'Joel' ]
I think this would get the whole collection, so it would be inefficient.
However, if there is an index on name, is there a way to get all the unique values without getting all the documents?
Or is there another way to efficiently get the unique names?
As indicated in the comments, you can efficiently get the unique values of a field over all docs in a collection using distinct.
The documentation specifically mentions that indexes are used when possible, and that they can cover the distinct query. This means that only the supporting index needs to be loaded into memory to get the results.
When possible, db.collection.distinct() operations can use indexes. Indexes can also cover db.collection.distinct() operations. See Covered Query for more information on queries covered by indexes.
In Ruby (with Mongoid), you would perform the distinct query as:
User.distinct(:name)
I have a simple search stored procedure in Oracle 11gR2 on a table with over 1.6 million records. I am puzzled by the fact that if I search for a word inside a column, such as '%boston%', it takes 12 seconds. I have an index on the name column.
select description from travel_websites where name like '%boston%';
If I only search for a word starting with boston, like 'boston%', it takes only 0.15 seconds.
select description from travel_websites where name like 'boston%';
I added an index hint to try to force the optimizer to use my index on the name column, but it did not help either.
select description /*+ index name_idx */ from travel_websites where name like '%boston%';
Any advice would be greatly appreciated.
You cannot use an index range scan for a predicate that has a leading wildcard (i.e. like '%boston%'). This makes sense if you think about how an index is stored on disk: if you don't know what the first character of the string you are searching for is, you can't traverse the index to look for entries that match that string.
You may be able to do a full scan of the index, where you read every leaf block and check the name there to see if it contains the string you want. But that requires a full scan of the index, plus you then have to visit the table for every ROWID you get from the index in order to fetch any columns that are not part of the index you just full-scanned. Depending on the relative sizes of the table and the index and how selective the predicate is, the optimizer may easily decide that it is quicker to just do a table scan when you're searching for a leading wildcard.
Oracle does support full-text search, but you have to use Oracle Text, which requires building an Oracle Text index on the name column and using the CONTAINS operator to do the search rather than a LIKE query. Oracle Text is a very robust product, so there are quite a few options to consider in building the index, refreshing it, and writing the query, depending on how sophisticated you want to get.
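A minimal Oracle Text sketch (the index name is hypothetical; note that a CONTEXT index is not maintained automatically on DML and has to be synchronized):
-- build a full-text index on the name column
CREATE INDEX name_text_idx ON travel_websites (name)
  INDEXTYPE IS CTXSYS.CONTEXT;
-- CONTAINS replaces the LIKE predicate
SELECT description
  FROM travel_websites
 WHERE CONTAINS(name, 'boston') > 0;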
Your index hint is not correctly specified. Assuming there is an index on name, that the name of that index is name_idx, and that you want to force a full scan of the index (just to reiterate, a range scan on the index is not a valid option if there is a leading wildcard), you would need something like
select /*+ index(travel_websites name_idx) */ description
from travel_websites
where name like '%boston%'
There is no guarantee, however, that a full index scan is going to be any more efficient than a full table scan. And it is entirely possible that the optimizer is choosing the index full scan already without the hint (you don't specify what the query plans are for the three queries).
Oracle (and as far as I know most other databases) by default indexes strings so that the index can only be used to look up matches from the start of the string. That means a LIKE 'boston%' (starts-with) can use the index, while a LIKE '%boston' (ends-with) or LIKE '%boston%' (contains) cannot.
If you really need indexes that can find substrings fast, you can't use the regular string index types; you need full-text indexes, which sadly require slightly different query syntax.
I need to query a table in an SQLite database to return all the rows in a table that match a given set of words.
To be more precise: I have a database with ~80,000 records in it. One of the fields is a text field with around 100-200 words per record. What I want to be able to do is take a list of 200 single word keywords {"apple", "orange", "pear", ... } and retrieve a set of all the records in the table that contain at least one of the keyword terms in the description column.
The immediately obvious way to do this is with something like this:
SELECT stuff FROM table
WHERE (description LIKE '% apple %') OR (description LIKE '% orange %') OR ...
If I have 200 terms, I end up with a big and nasty looking SQL statement that seems to me to be clumsy, smacks of bad practice, and not surprisingly takes a long time to process - more than a second per 1000 records.
This answer, Better performance for SQLite Select Statement, seemed close to what I need, and as a result I created an index; but according to http://www.sqlite.org/optoverview.html, SQLite doesn't apply any optimisations when the LIKE operator is used with a leading % wildcard.
Not being an SQL expert, I am assuming I'm doing this the dumb way. I was wondering if someone with more experience could suggest a more sensible and perhaps more efficient way of doing this?
Alternatively, is there a better approach I could use to the problem?
Using SQLite full-text search would be faster than a LIKE '%...%' query. I don't think any database can use an index for a query term beginning with %: if the database doesn't know what the string starts with, it can't traverse the index to look it up.
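A minimal sketch of that approach, assuming your SQLite build includes FTS4 (docs is a hypothetical name for the full-text table, and "table" stands in for your actual table):
CREATE VIRTUAL TABLE docs USING fts4(description);
-- copy the text in, keeping the original rowid so results can be joined back
INSERT INTO docs (rowid, description) SELECT rowid, description FROM "table";
-- one MATCH query replaces the long chain of LIKEs
SELECT rowid FROM docs WHERE description MATCH 'apple OR orange OR pear';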
An alternative approach is to put the keywords in a separate table and create an intermediate table recording which rows in your main table contain which keywords. If you index the relevant columns, it can be queried very quickly.
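A sketch of that normalized layout (all names here are hypothetical):
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE record_keywords (
  record_id INTEGER,  -- rowid of the row in your main table
  keyword_id INTEGER REFERENCES keywords(id)
);
CREATE INDEX record_keywords_kw_idx ON record_keywords (keyword_id);
-- all records containing at least one of the keywords
SELECT DISTINCT record_id
FROM record_keywords
JOIN keywords ON keywords.id = record_keywords.keyword_id
WHERE keywords.word IN ('apple', 'orange', 'pear');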
Sounds like you might want to have a look at Full Text Search. It was contributed to SQLite by someone from Google. The description:
allows the user to efficiently query the database for all rows that contain one or more words (hereafter "tokens"), even if the table contains many large documents.
This is the same problem as full-text search, right? In which case, you need some help from the DB to construct indexes into these fields if you want to do this efficiently. A quick search for SQLite full text search yields this page.
The solution you correctly identify as clumsy will do up to 200 substring matches per document in the worst case (i.e. when a document doesn't match), and each match has to scan the entire field. With the index approach, search speed is independent of the size of each document.