Full-text search in H2: problem with LIKE clause?

I have created a full-text search index for my H2 database,
and I can execute queries such as:
stat.executeQuery("SELECT * FROM FT_SEARCH_DATA('abc', 0, 0)")
(refer to this example)
This returns the rows containing the string abc.
How can I make the query match the word anywhere inside a value? I want the equivalent of a LIKE clause (LIKE '%abc%') in the query.

You can use regular expressions in an H2 SELECT. I think you will be interested in something like:
SELECT * FROM db.table WHERE name REGEXP '.*abc.*'

The H2 Database native full-text search feature allows searching for keywords in context. You can examine the words in your index using this query:
SELECT * FROM FT.WORDS;
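For reference, here is a minimal sketch of initializing H2's native full-text search from scratch, following the steps in the H2 documentation (the TEST table and PUBLIC schema are placeholders):
-- register and initialize the full-text search functions
CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();
-- index all columns of PUBLIC.TEST (pass a column list instead of NULL to restrict it)
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
CALL FT_CREATE_INDEX('PUBLIC', 'TEST', NULL);
-- FT_SEARCH returns a QUERY string and a SCORE for each match
SELECT * FROM FT_SEARCH('abc', 0, 0);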
Apache Lucene may be better suited to wildcard searches.

Related

How to search an array element in SurrealDB

I'm trying to perform a query that evaluates whether an array includes or contains a specific value or set of values:
SELECT * FROM table
WHERE data = ['value_a']
https://surrealdb.com/docs/surrealql/statements/select
I've read the official documentation and tried several queries based on the examples, but I didn't find any function or way to build this query; nothing has worked.
My expected behaviour is to match a specific value or set of values, like the IN operator in a relational SQL database:
https://www.w3schools.com/sql/sql_in.asp
There is a CONTAINS operator:
https://surrealdb.com/docs/surrealql/operators#contains
The query should look like:
SELECT * FROM table WHERE data CONTAINS 'value_a'
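For a set of values rather than a single one, the same operators page also documents CONTAINSANY; a sketch reusing the hypothetical table and field from the question:
SELECT * FROM table WHERE data CONTAINSANY ['value_a', 'value_b']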

Faster search for a substring through a large document

I have a CSV file of more than 1M records written in English plus another language. I have to make a UI that takes a keyword, searches through the document, and returns the records where that key appears. I look for the key in two columns only.
Here is how I implemented it:
First, I made a Postgres database for the data stored in the CSV file, then a classic website where the user can enter a keyword. This is the SQL query that I use (in Spring Boot):
SELECT * FROM table WHERE col1 LIKE %:keyword% OR col2 LIKE %:keyword%;
Right now it works perfectly fine, but I was wondering how to make the search faster. Was using SQL better than a classic document search?
If the document is only searched once and then thrown away, loading it into a database is overhead. Instead you can search the file directly using NIO and parallel streams, which use multiple threads to search the file concurrently:
List<Record> result = Files.lines(Paths.get("some/path"))  // Files.lines takes a Path, not a String
        .parallel()     // search with multiple threads
        .unordered()    // result order doesn't matter, which helps parallel performance
        .map(l -> lineToRecord(l))  // parse each CSV line into a Record
        .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
        .collect(Collectors.toList());
NOTE: you need to provide the lineToRecord() method and the Record class yourself.
If the document is going to be searched over and over again, then you can think about indexing it. This means pre-processing the document to suit the search requirements; in this case, the keywords of col1 and col2. An index is like a map in Java, e.g.:
Map<String, List<Record>> col1Index
But since you have LIKE semantics, this is not so easy: it's not as simple as splitting the string on whitespace, because the keyword could match a substring. So in this case it might be best to look for a tool to help; typically this would be something like Solr/Lucene.
Databases can also provide similar functionality, e.g. https://www.postgresql.org/docs/current/pgtrgm.html
For LIKE queries, you should look at the pg_trgm index type with the gin_trgm_ops operator class. You shouldn't need to change the query at all; just build the index on each column, or maybe one multi-column index.
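A minimal sketch of that setup, assuming the table is named records and has the two text columns from the question:
-- enable the trigram extension (once per database)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- GIN trigram indexes allow LIKE '%keyword%' to use an index scan
CREATE INDEX records_col1_trgm_idx ON records USING gin (col1 gin_trgm_ops);
CREATE INDEX records_col2_trgm_idx ON records USING gin (col2 gin_trgm_ops);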

What is the ElasticSearch equivalent for an SQL subquery?

Can you nest queries logically in Elasticsearch, so that the output of one query is the input to another query?
Another way to ask: how can I chain or pipe queries together?
This should be analogous to the IN operator or subqueries in SQL, e.g.:
select au_lname, au_fname, au_id from
  (select au_lname, au_fname, au_id from pubs.dbo.authors
   where state = 'CA') as ca_authors
or
SELECT Name
FROM AdventureWorks2008R2.Production.Product
WHERE ListPrice =
(SELECT ListPrice
FROM AdventureWorks2008R2.Production.Product
WHERE Name = 'Chainring Bolts' );
Elasticsearch doesn't support subqueries; you would need to perform your first query, then construct a second query using the results of the first query as an input.
This is not supported in Elasticsearch; you must denormalize your data so that all the fields you need live in one document.
That is correct; you must program the subquery in your favourite programming language. An example can be found here:
https://sebastianviereck.de/elasticsearch-subquery-scoring-optimization/

Oracle string search performance issue

I have a simple search stored procedure in Oracle 11gR2 against a table with over 1.6 million records. I am puzzled by the fact that if I search for a word inside a column, such as '%boston%', it takes 12 seconds. I have an index on the name column.
select description from travel_websites where name like '%boston%';
If I only search for a word starting with Boston, like 'boston%', it takes only 0.15 seconds.
select description from travel_websites where name like 'boston%';
I added an index hint to try to force the optimizer to use my index on the name column, but it did not help either.
select description /*+ index name_idx */ from travel_websites where name like '%boston%';
Any advice would be greatly appreciated.
You cannot use an index range scan for a predicate that has a leading wildcard (i.e. LIKE '%boston%'). This makes sense if you think about how an index is stored on disk: if you don't know what the first character of the string you are searching for is, you can't traverse the index to look for matching index entries.
You may be able to do a full scan of the index, where you read every leaf block and check each name to see if it contains the string you want. But that requires a full scan of the index, and you then have to visit the table for every ROWID you get from the index in order to fetch any columns that are not part of the index you just full-scanned. Depending on the relative sizes of the table and the index, and on how selective the predicate is, the optimizer may easily decide that it is quicker to just do a table scan when you're searching for a leading wildcard.
Oracle does support full-text search, but you have to use Oracle Text, which requires that you build an Oracle Text index on the name column and use the CONTAINS operator to do the search rather than a LIKE query. Oracle Text is a very robust product, so there are quite a few options to consider in building the index, refreshing the index, and building the query, depending on how sophisticated you want to get.
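A minimal sketch of that approach (the index name is made up; the table and column come from the question):
-- build an Oracle Text CONTEXT index on the name column
CREATE INDEX name_txt_idx ON travel_websites (name) INDEXTYPE IS CTXSYS.CONTEXT;
-- search with CONTAINS instead of LIKE; a score greater than 0 means a match
SELECT description FROM travel_websites WHERE CONTAINS(name, 'boston') > 0;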
Your index hint is not correctly specified. Assuming there is an index on name, that this index is named name_idx, and that you want to force a full scan of the index (to reiterate, a range scan on the index is not a valid option when there is a leading wildcard), you would need something like:
select /*+ index(travel_websites name_idx) */ description
from travel_websites
where name like '%boston%'
There is no guarantee, however, that a full index scan will be any more efficient than a full table scan. And it is entirely possible that the optimizer is already choosing the index full scan without the hint (you don't show the query plans for the three queries).
Oracle (and as far as I know most other databases) by default indexes strings so that the index can only be used to look up matches from the start of the string. That means a LIKE 'boston%' (starts-with) query can use the index, while LIKE '%boston' (ends-with) or LIKE '%boston%' (contains) cannot.
If you really need indexes that can find substrings fast, you can't use the regular string index types, but you can use TEXT indexes, which sadly may require slightly different query syntax.

SQLite - how to return rows containing a text field that contains one or more strings?

I need to query a table in an SQLite database and return all the rows that match a given set of words.
To be more precise: I have a database with ~80,000 records in it. One of the fields is a text field with around 100-200 words per record. What I want to be able to do is take a list of 200 single word keywords {"apple", "orange", "pear", ... } and retrieve a set of all the records in the table that contain at least one of the keyword terms in the description column.
The immediately obvious way to do this is with something like:
SELECT stuff FROM table
WHERE (description LIKE '% apple %') OR (description LIKE '% orange %') OR ...
If I have 200 terms, I end up with a big, nasty-looking SQL statement that seems clumsy, smacks of bad practice, and, not surprisingly, takes a long time to process: more than a second per 1000 records.
This answer, Better performance for SQLite Select Statement, seemed close to what I need, and as a result I created an index; but according to http://www.sqlite.org/optoverview.html, SQLite doesn't apply any optimisation when the LIKE operator is used with a leading % wildcard.
Not being an SQL expert, I am assuming I'm doing this the dumb way. I was wondering if someone with more experience could suggest a more sensible and perhaps more efficient way of doing this?
Alternatively, is there a better approach I could use to the problem?
Using SQLite full-text search would be faster than a LIKE '%...%' query. I don't think any database can use an index for a query beginning with %: if the database doesn't know what the string starts with, it can't use the index to look it up.
An alternative approach is to put the keywords in a separate table and create an intermediate table that records which row in your main table has which keywords. If you index all the relevant columns that way, it can be queried very quickly.
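A sketch of that schema, with all table and column names hypothetical:
-- one row per distinct keyword
CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
-- which record contains which keyword
CREATE TABLE record_keyword (
  record_id INTEGER REFERENCES records(id),
  keyword_id INTEGER REFERENCES keyword(id)
);
CREATE INDEX record_keyword_kw_idx ON record_keyword(keyword_id);
-- fetch every record containing at least one of the search terms
SELECT DISTINCT r.* FROM records r
JOIN record_keyword rk ON rk.record_id = r.id
JOIN keyword k ON k.id = rk.keyword_id
WHERE k.word IN ('apple', 'orange', 'pear');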
Sounds like you might want to have a look at Full Text Search. It was contributed to SQLite by someone from Google. The description:
"allows the user to efficiently query the database for all rows that contain one or more words (hereafter 'tokens'), even if the table contains many large documents."
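A minimal sketch using the FTS5 module in current SQLite versions (the docs virtual table is hypothetical, and the records table is assumed to hold the data from the question):
-- build a full-text index over the description column
CREATE VIRTUAL TABLE docs USING fts5(description);
INSERT INTO docs (rowid, description) SELECT id, description FROM records;
-- OR is part of the FTS query syntax, so this matches any of the keywords
SELECT rowid FROM docs WHERE docs MATCH 'apple OR orange OR pear';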
This is the same problem as full-text search, right? In that case, you need some help from the DB to build indexes over these fields if you want to do it efficiently. A quick search for SQLite full-text search yields this page.
The solution you correctly identify as clumsy will do up to 200 pattern matches per document in the worst case (i.e. when a document doesn't match), and each match has to traverse the entire field. Using the index approach means your search speed will be independent of the size of each document.
