Query a table and only match rows where a field matches "STRING" - rethinkdb

During prototyping, I imported a bunch of Facebook posts into a table in batches. After the first batch, I did a bulk update to convert the "created_date" column from a string to a native timestamp (using the handy r.ISO8601 function):
r.db('db').table('table').update({'created_date': r.ISO8601(r.row('created_date'))})
On the second pass, when I try to repeat this update, the server throws an error because not all of the row fields are of type STRING (i.e. the ones converted previously), which is what ISO8601 expects.
I've already tried to filter on r.row('created_date').typeOf() == "STRING" but got no matches. I also can't find any other way to refer to the STRING type as an object rather than a literal string.
I know I could export these rows and do the if/else logic in code, but I'd like to understand whether there's a native query that filters rows by the type of a field.

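You have to use eq for the comparison:
r.row('created_date').typeOf().eq("STRING")
Combined with the original update, only rows whose created_date is still a string get converted:
r.db('db').table('table').filter(r.row('created_date').typeOf().eq("STRING")).update({'created_date': r.ISO8601(r.row('created_date'))})
Using == only works in drivers for languages that support operator overloading (for example, Python). In JavaScript, == is evaluated on the client before the query is sent: it compares the query object with the string "STRING", which yields false, so the filter becomes filter(false) and matches no rows.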

Related

faster search for a substring through large document

I have a CSV file of more than 1M records written in English and another language. I have to make a UI that takes a keyword, searches through the document, and returns the records where that keyword appears. I look for the keyword in only two columns.
Here is how I implemented it:
First, I created a PostgreSQL database for the data stored in the CSV file. Then I made a classic website where the user can enter a keyword. This is the SQL query I use (in Spring Boot):
SELECT * FROM table WHERE col1 LIKE %:keyword% OR col2 LIKE %:keyword%;
Right now it works perfectly fine, but I was wondering how to make the search faster. Was using SQL better than a classic document search?
If the document is searched only once and then thrown away, loading it into a database is unnecessary overhead. Instead, you can search the file directly with a parallel stream over Files.lines(), which lets the parsing and matching of lines run concurrently on multiple threads:
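import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Files.lines() takes a Path (not a String) and throws IOException; close the stream in real code
List<Record> result = Files.lines(Paths.get("some/path"))
        .parallel()
        .unordered()
        .map(l -> lineToRecord(l))
        .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
        .collect(Collectors.toList());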
NOTE: you need to provide the lineToRecord() method and the Record class yourself.
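A self-contained version of the snippet above, as a sketch only. It assumes (hypothetically) that each line holds exactly two comma-separated values with no quoted commas; the Record class and lineToRecord() shown here are illustrative stand-ins, and a real CSV with quoted fields would need a proper CSV parser. It also closes the file with try-with-resources, which the one-liner above skips.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelCsvSearch {

    // Hypothetical record type holding the two searchable columns.
    static final class Record {
        private final String col1;
        private final String col2;

        Record(String col1, String col2) {
            this.col1 = col1;
            this.col2 = col2;
        }

        String getCol1() { return col1; }
        String getCol2() { return col2; }
    }

    // Hypothetical parser: assumes each line is "col1,col2" with no quoted commas.
    static Record lineToRecord(String line) {
        String[] parts = line.split(",", 2); // limit 2: everything after the first comma goes to col2
        return new Record(parts[0], parts.length > 1 ? parts[1] : "");
    }

    static List<Record> search(Path file, String keyword) throws IOException {
        // try-with-resources closes the file handle held by Files.lines()
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .parallel()   // allow lines to be parsed and matched on several threads
                    .unordered()  // the original line order is not needed in the result
                    .map(ParallelCsvSearch::lineToRecord)
                    .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".csv");
        Files.write(tmp, Arrays.asList("apple pie,dessert", "bread,bakery", "pineapple,fruit"));
        List<Record> hits = search(tmp, "apple");
        System.out.println(hits.size() + " matching records"); // prints "2 matching records"
        Files.delete(tmp);
    }
}
```

Because the stream is unordered, the matching records may come back in any order; sort them afterwards if the UI needs a stable order.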
If the document is going to be searched over and over again, you can consider indexing it. This means pre-processing the document to suit the search requirements, in this case the keywords of col1 and col2. An index is like a map in Java, e.g. (one keyword can occur in many records):
Map<String, List<Record>> col1Index
But since you need LIKE semantics, this isn't easy: it's not as simple as splitting each value on whitespace, because the keyword can match any substring, not just a whole word. So in this case it's probably best to use a dedicated tool, typically something like Solr/Lucene.
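The plain-map approach, and its limitation, can be illustrated with a small, hypothetical word index in Java. It indexes whole words only, so a lookup is a single map access, but it cannot answer substring queries: "apple" finds "apple pie" but not "pineapple juice", which LIKE '%apple%' would match. Plain strings stand in for the Record class here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class WordIndex {
    // word -> every value containing that whole word (one word can occur in many records)
    private final Map<String, List<String>> index = new HashMap<>();

    public void add(String value) {
        // LinkedHashSet: index each distinct word of this value only once
        for (String word : new LinkedHashSet<>(Arrays.asList(value.split("\\s+")))) {
            if (word.isEmpty()) continue; // split() yields "" for leading whitespace
            index.computeIfAbsent(word, k -> new ArrayList<>()).add(value);
        }
    }

    // Exact whole-word lookup: one hash-map get instead of a scan.
    public List<String> lookup(String word) {
        return index.getOrDefault(word, new ArrayList<>());
    }

    public static void main(String[] args) {
        WordIndex idx = new WordIndex();
        idx.add("apple pie");
        idx.add("pineapple juice");
        // Prints [apple pie]; LIKE '%apple%' would also match "pineapple juice".
        System.out.println(idx.lookup("apple"));
    }
}
```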
Databases can also provide similar functionality, e.g. PostgreSQL's pg_trgm module: https://www.postgresql.org/docs/current/pgtrgm.html
For LIKE queries, you should look at the pg_trgm extension's GIN index support with the gin_trgm_ops operator class. You shouldn't need to change the query at all; just build such an index on each column, or maybe one multi-column index.
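To make the trigram idea concrete, here is a simplified in-memory trigram index in Java. It illustrates the general technique, not how pg_trgm is implemented (pg_trgm, for instance, lowercases words and pads them with spaces before extracting trigrams), and the class and method names are hypothetical. Each value is broken into overlapping 3-character substrings; a keyword of length 3 or more can only occur in values that contain all of its trigrams, so intersecting the posting sets narrows the candidates, and a final contains() check removes false positives. Shorter keywords fall back to a full scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TrigramIndex {
    private final List<String> values = new ArrayList<>();
    // trigram -> ids (positions in `values`) of every value containing that trigram
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // All overlapping 3-character substrings of s.
    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    public void add(String value) {
        int id = values.size();
        values.add(value);
        for (String g : trigrams(value)) {
            postings.computeIfAbsent(g, k -> new HashSet<>()).add(id);
        }
    }

    // Same matches as value LIKE '%keyword%' (case-sensitive, keyword has no wildcards).
    public List<String> search(String keyword) {
        List<String> result = new ArrayList<>();
        if (keyword.length() < 3) {
            // Too short to yield a trigram: fall back to scanning every value.
            for (String v : values) {
                if (v.contains(keyword)) result.add(v);
            }
            return result;
        }
        // Candidates must contain every trigram of the keyword: intersect the posting sets.
        Set<Integer> candidates = null;
        for (String g : trigrams(keyword)) {
            Set<Integer> ids = postings.get(g);
            if (ids == null) return result; // this trigram occurs nowhere, so nothing can match
            if (candidates == null) {
                candidates = new HashSet<>(ids);
            } else {
                candidates.retainAll(ids);
            }
        }
        // Having all trigrams doesn't guarantee a contiguous match, so re-check each candidate.
        List<Integer> ordered = new ArrayList<>(candidates);
        Collections.sort(ordered); // report matches in insertion order
        for (int id : ordered) {
            String v = values.get(id);
            if (v.contains(keyword)) result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        TrigramIndex idx = new TrigramIndex();
        for (String v : new String[] {"apple pie", "bread", "pineapple", "grape"}) {
            idx.add(v);
        }
        System.out.println(idx.search("apple")); // [apple pie, pineapple]
        System.out.println(idx.search("ap"));    // [apple pie, pineapple, grape]
    }
}
```

The same trade-off applies to the database index: very short keywords give the index little to work with, so they are close to a full scan either way.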

How to construct subquery in the form of SELECT * FROM (<subquery>) ORDER BY column;?

I'm using gorm to interact with a Postgres database. I'm trying to ORDER BY a query that uses DISTINCT ON, and this question documents why that's not easy to do. So I need to end up with a query of the form
SELECT * FROM (<subquery>) ORDER BY column;
At first glance, it looks like I need to use db.QueryExpr() to turn my query into an expression and build another query around it. However, gorm doesn't seem to have an easy way to specify the FROM clause directly. I tried db.Model(expr) and db.Table(fmt.Sprint(expr)), but Model seems to be ignored completely, and fmt.Sprint(expr) doesn't return what I expected, since expressions contain a few private variables. If I could turn the original query into a fully rendered SQL string, I could use db.Table(query), but I'm not sure I can generate the query as a string without running it.
If I have a fully built gorm query, how can I wrap it in another query to do the ORDER BY I'm trying to do?
If you want to write raw SQL (including one that has a SQL subquery) that will be executed and the results added to an object using gorm, you can use the .Raw() and .Scan() methods:
query := `
SELECT sub.*
FROM (<subquery>) sub
ORDER BY sub.column;`
db.Raw(query).Scan(&result)
Pass .Scan() a pointer to a value structured like the resulting rows (a struct, or a slice of structs for multiple rows), much as you would with .First(). .Raw() can also take query parameters: put ? placeholders in the SQL and pass the values as additional arguments:
query := `
SELECT sub.*
FROM (<subquery>) sub
WHERE
sub.column1 = ?
AND sub.column2 = ?
ORDER BY sub.column;`
db.Raw(query, val1, val2).Scan(&result)
For more information on how to use the SQL builder, .Raw(), and .Scan(), take a look at the examples in the documentation: http://gorm.io/advanced.html#sql-builder

How to filter and delete a record that has a specific attribute

I'm trying to filter records that have a specific key, or a specific value for that key, and delete them. I tried withFields and hasFields, but it seems I can't apply delete to their results. How can I do that?
r.db('databaseFoo').table('checkpoints').filter(function (user) {
return user('type').default(false);
}).delete();
If you want all documents that have a type key, you can use hasFields for that.
r.db('databaseFoo').table('checkpoints')
.hasFields('type')
Your current query keeps only documents whose type value is truthy (anything other than false or null): documents without a type key get the default value false, and documents whose type is false also evaluate to false, so both are filtered out. This might be what you want, but it's a little confusing if you only want documents that have a type property.
Keeping a reference to the original document
The problem with hasFields is that it converts a selection (a sequence that keeps a reference to the specific rows in the database, so you can update and delete them) into a plain sequence, which you can't update or delete. This is a known issue in RethinkDB. You can read this blog post to understand the different types in ReQL a bit better.
To get around this, use hasFields inside filter:
r.db('databaseFoo').table('checkpoints')
.filter(r.row.hasFields('type'))
.delete()
This query will work since it returns a selection which can then be passed into delete.
If you want to get all records with a specific value at a specific key, you can do so in a couple of different ways. To get all documents where the property type is equal to false, you can do as follows:
r.db('databaseFoo').table('checkpoints')
.filter({ type: false })
or, you can do:
r.db('databaseFoo').table('checkpoints')
.filter(r.row('type').eq(false))

LINQ value.Contains function error

I'm getting an error when using the Contains function in a LINQ query.
The following error occurred:
Contains is not supported, doing a substring match over a text field is a very
slow operation, and is not allowed using the Linq API.
The recommended method is to use full text search (mark the field as Analyzed and
use the Search() method to query it.
here is my query
query = from u in Session.Query<Article>() where u.Tags.Contains(tags) orderby u.CreationDate descending select
StartsWith/EndsWith work fine, but they don't fulfill my requirements.
As the error states, Contains won't work and you need to use Analyzed fields. You can start here: http://ravendb.net/docs/client-api/querying/static-indexes/configuring-index-options

Query is not understandable - using field Fulltext search [Tags] = "foo"

I have a problem that happens only rarely with FT search, but once it happens, it stays. I use the following search term in the FT search box in Lotus Notes:
[Tags] = "foo"
In most applications this search term works fine, but for some applications it gives the error "query is not understandable".
It doesn't matter if I change the value: [Tags] = "boo" produces the same result, and so does FIELD Tags = "boo". For the record, [Tag] = "foo" works fine, so it seems to be an issue with the field or the field name.
The problem only happens in some applications. Once it starts, no view can be searched with that query, and I get the error message every time I search.
It does not help to remove, compact, and re-create the FT index.
I get the same error in XPages when using the same search query in a view data source.
I have seen this problem with other field names as well, in other applications.
If I remove the FT index, the search query works.
Creating a new copy of the "broken" database does not resolve the problem.
I tried keeping only one document in the database and creating a new FT index; that document (shown in the view) does not have a "Tags" field, and it still doesn't work. (There are other forms in the database with the field name "Tags".)
This is a real show stopper for me, as I have built some of my XPages based on search values from specific fields.
From my own investigation, I think this is some sort of bug in the FT index: some data contained in documents or forms seems to cause the FT index to not work correctly.
I am looking for a solution to this problem, as I have not found a way to repair it once it has become broken.
Update:
It does not help to follow this procedure
https://www-304.ibm.com/support/docview.wss?uid=swg21261002
Here is my debug info
[1078:0002-2250] IN FTGSearch
[1078:0002-2250] option = 0x400219
[1078:0002-2250] Query: ( FIELD Tags = "foo")
[1078:0002-2250] OUT FTGSearch error = F09
[1078:0002-2250] FTGSearch: found=0, returned=0, start=0, count=0, limit=0
It sounds like you need to fix the UNK table with a compact. Here is the list of compact options; use a copy-style compact, not an in-place one.
http://www-01.ibm.com/support/docview.wss?uid=swg21084388
If the Tags field is sometimes numeric, I would advise looking at the database design. The UNK table is a table of all field names in the NSF. The first time a field name is used, it is stored in the UNK table with that data type. Full-text searching uses that data type and only that data type. If you have a Tags field on more than one form in a database, once numeric and once text, you're in for big trouble with full-text searches: the data type used in searches depends on the field's type in the first saved document that had that field. Even if you delete all documents that store it as numeric, the UNK table won't change without a compact. That sounds like what you have here. Make sure the database never stores Tags as numeric: delete or change all documents that store it as numeric, then compact.
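The first-type-wins behaviour described above can be pictured with a toy model in Java. This is a hypothetical sketch of the mechanism as the answer describes it, not the Notes API or the real NSF structure: the class and method names are made up, and treating a copy-style compact as "rebuild the table from the documents that still exist" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the Notes API or the real NSF/UNK table format.
public class UnkTableModel {
    enum FieldType { TEXT, NUMBER }

    // Toy "document": field name -> type of the value stored on that document.
    static final class Doc {
        final Map<String, FieldType> fields = new LinkedHashMap<>();

        Doc with(String name, FieldType type) {
            fields.put(name, type);
            return this;
        }
    }

    final List<Doc> docs = new ArrayList<>();
    // Toy UNK table: the first type seen for a field name is the one recorded.
    final Map<String, FieldType> unk = new LinkedHashMap<>();

    void save(Doc d) {
        docs.add(d);
        d.fields.forEach(unk::putIfAbsent); // later saves never change an existing entry
    }

    void delete(Doc d) {
        docs.remove(d); // deleting documents leaves the UNK entry untouched
    }

    // Assumption: model a copy-style compact as rebuilding the table from surviving documents.
    void compact() {
        unk.clear();
        for (Doc d : docs) {
            d.fields.forEach(unk::putIfAbsent);
        }
    }

    // In this model, a text query such as [Tags] = "foo" only works if the recorded type is TEXT.
    boolean textQueryUnderstood(String field) {
        return unk.get(field) == FieldType.TEXT;
    }

    public static void main(String[] args) {
        UnkTableModel db = new UnkTableModel();
        Doc numeric = new Doc().with("Tags", FieldType.NUMBER);
        db.save(numeric);                                   // first save: Tags recorded as NUMBER
        db.save(new Doc().with("Tags", FieldType.TEXT));
        System.out.println(db.textQueryUnderstood("Tags")); // false
        db.delete(numeric);
        System.out.println(db.textQueryUnderstood("Tags")); // still false
        db.compact();
        System.out.println(db.textQueryUnderstood("Tags")); // true
    }
}
```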
Thank you all for answering. I learned a whole lot about UNK tables and FT index today.
The problem was that I had a numeric field called "Tags" in a form that I hadn't looked at and really didn't think that it would contain a field by that name.
After using the DDE search, I found all instances of the Tags field and could easily locate the problem form. I renamed the field in the form, removed the FT index, ran compact -c, and re-created the FT index. Now everything is working fine.
One other thing to note: I have several databases with the same design, but only a few of them had the FT index problem. This is probably because some of these databases were created after the form with the faulty Tags field was created.
I am so happy to have solved this.
Lessons learned:
If you plan to use a full-text index in your application, make sure you don't use the same field name on different forms with different field types.
From now on I will probably use shared fields more :-)
One more thing we discovered:
You don't actually need NotesPeek to find out which field type is stored in the UNK table; you can use the "Fields" button in the search bar. If you select the field and the right-hand box displays "contains", you know the UNK table has a text field type set.
