"Query is not understandable" when using fulltext field search [Tags] = "foo" in a view

I have a problem that happens only rarely with FT search, but once it happens it stays. I use the following search term in the FT search box in Lotus Notes:
[Tags] = "foo"
In most applications this search term works fine, but in some applications it gives the error "query is not understandable".
It does not matter if I replace the value: e.g. [Tags] = "boo" produces the same result, and so does FIELD Tags = "boo". For the record, [Tag] = "foo" works fine, so it seems to be an issue with the field or the field name.
The problem only happens in some applications. Once it starts happening, no views can be searched using that query, and I get the error message every time I search.
It does not help to remove, compact and re-create the FT index.
I get the same error in XPages when using the same search query in a view data source.
I have seen this problem with other field names as well, in other applications.
If I remove the FT index, the search query works.
Creating a new copy of the "broken" database does not resolve the problem.
I tried keeping only one document in the database and creating a new FT index; the document in the view does not have the field "Tags", and it is still not working. (There are other forms in the db with the field name "Tags".)
This is a real showstopper for me, as I have built some of my XPages on search values from specific fields.
In my own investigation of this problem, I think it has to do with some sort of bug in the FT index. There seems to be some data contained in documents or forms that causes the FT index to stop working correctly.
I am looking for a solution to this problem, as I have not found a way to repair it once it has become broken.
Update:
It does not help to follow this procedure
https://www-304.ibm.com/support/docview.wss?uid=swg21261002
Here is my debug info
[1078:0002-2250] IN FTGSearch
[1078:0002-2250] option = 0x400219
[1078:0002-2250] Query: ( FIELD Tags = "foo")
[1078:0002-2250] OUT FTGSearch error = F09
[1078:0002-2250] FTGSearch: found=0, returned=0, start=0, count=0, limit=0

It sounds like you need to fix the UNK table with a compact. Here is the listing of compact options; use a copy-style compact, not in-place.
http://www-01.ibm.com/support/docview.wss?uid=swg21084388

If the Tags field is sometimes numeric, I would advise looking at the database design. The UNK table is a table of all fields in the NSF. The first time a field name is used, it is stored in the UNK table with that data type. Full-text searching uses that data type and only that data type. If you have a field Tags on more than one form in a database, once numeric and once text, you're in for big trouble with full-text searches. The data type used in searches depends on the data type the field had on the first document saved that had that field. Even if you delete all documents that have it as numeric, you won't change the UNK table without the compact. It sounds like that's what you have here. Ensure the database never stores Tags as numeric, delete or change all docs with it stored as numeric, then compact.
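For reference, a copy-style compact can be run from the Domino server console along these lines (the database path here is just a placeholder):
load compact apps/mydb.nsf -c
The -c option forces a copy-style compact, which rewrites the database file (and with it the UNK table) rather than compacting in place.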

Thank you all for answering. I learned a whole lot about UNK tables and FT indexes today.
The problem was that I had a numeric field called "Tags" in a form that I hadn't looked at; I really didn't think it would contain a field by that name.
After using the DDE search I found all instances of the Tags field and could easily locate the problem form. I renamed the field in the form, removed the FT index, used compact -c and recreated the FT index. Now everything is working fine.
One other thing to notice is that I have several databases with the same design, but only a few of them had the FT index problem. The reason is probably that some of these databases were created after the form with the faulty Tags field was created.
I am so happy to have solved this.
Lessons learned:
If you plan to use a full-text index in your application, make sure you do not have the same field name in different forms with different field types.
From now on I will probably use shared fields more :-)
One more thing we discovered:
You actually do not need NotesPeek to find out which field type is stored in the UNK table. You can use the "Fields" button in the search bar: if you select the field and the right-hand box displays "contains", you know the UNK table has a text field type set.

Related

Control which multi fields are queried by default

I have a preexisting index that contains field mappings and is currently being queried by many applications. I would like to add additional ways for the data to be queried; specifically, to support full-text search via analysis. Multi-fields seemed like the obvious way to do this, but I found that adding new multi-fields actually changes the existing query behavior.
For example, I have an "id" field that is a keyword, and applications are already using this field to query on. After I add a new multi-field like "txt" (using the standard analyzer), new documents can be found by querying with just a partial value match. Values for "id" look like "123-abc", so now a query with just "abc" will match when querying against the "id" field. This is not how it worked previously (the keyword-only field would require the entire value "123-abc").
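For concreteness, the mapping being described would look something like this (a sketch; the index name and sub-field name are just illustrations):
PUT my-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword",
        "fields": {
          "txt": { "type": "text", "analyzer": "standard" }
        }
      }
    }
  }
}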
Ideally, the top-level "id" field would be keyword only, and if a "full text" search was required, the query would need to specify "id.txt". So my question is... is there a way to disable multi-fields and require that the query explicitly name a sub-field when needed?
My only other thought on how to solve this was to use copy_to so that these fields are completely distinct... but that is a bit more work, and there are many, many fields that would require this.

faster search for a substring through large document

I have a CSV file of more than 1M records, written in English plus another language. I have to make a UI that takes a keyword, searches through the document, and returns the records where that keyword appears. I look for the keyword in two columns only.
Here is how I implemented it:
First, I made a Postgres database for the data stored in the CSV file, then a classic website where the user can enter a keyword. This is the SQL query that I use (in Spring Boot):
SELECT * FROM table WHERE col1 LIKE '%' || :keyword || '%' OR col2 LIKE '%' || :keyword || '%';
Right now it is working perfectly fine, but I was wondering how to make the search faster. Was using SQL instead of classic document search the better approach?
If the document is only searched once and then thrown away, loading it into a database is overhead. Instead, you can search the file directly using java.nio plus a parallel stream, which uses multiple threads to search the file concurrently:
List<Record> result = Files.lines(Path.of("some/path"))   // java.nio.file.Files; throws IOException
        .parallel()                                        // search chunks of the file on multiple threads
        .unordered()
        .map(l -> lineToRecord(l))
        .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
        .collect(Collectors.toList());
NOTE: you need to provide the lineToRecord() method and the Record class yourself.
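A minimal sketch of what those could look like, assuming two comma-separated columns (the naive split does not handle quoted CSV values):
// Hypothetical holder for the two searched columns.
class Record {
    private final String col1;
    private final String col2;
    Record(String col1, String col2) { this.col1 = col1; this.col2 = col2; }
    String getCol1() { return col1; }
    String getCol2() { return col2; }
}

// Naive CSV split; a real implementation needs quote/escape handling.
static Record lineToRecord(String line) {
    String[] parts = line.split(",", -1);
    return new Record(parts[0], parts[1]);
}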
If the document is going to be searched over and over again, then you can think about indexing it. This means pre-processing the document to suit the search requirements; in this case, keywords of col1 and col2. An index is like a map in Java, e.g.:
Map<String, Record> col1Index
But since you have "LIKE" semantics, this is not so easy to do: it's not as simple as splitting the string by whitespace, because the keyword can match a substring. So in this case it might be best to look for a tool to help; typically this would be something like Solr/Lucene.
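To illustrate the limitation, a minimal sketch of such a map-based index (variable names are hypothetical; it only finds whole-word matches, which is exactly why it falls short of LIKE):
// Build a word -> records index over col1 (exact word matches only).
Map<String, List<Record>> col1Index = new HashMap<>();
for (Record r : records) {
    for (String word : r.getCol1().split("\\s+")) {
        col1Index.computeIfAbsent(word, k -> new ArrayList<>()).add(r);
    }
}
// Lookup is now a constant-time map access, but substrings won't match.
List<Record> hits = col1Index.getOrDefault(keyword, List.of());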
Databases can also provide similar functionality eg: https://www.postgresql.org/docs/current/pgtrgm.html
For LIKE queries, you should look at the pg_trgm index type with the gin_trgm_ops operator class. You shouldn't need to change the query at all; just build the index on each column, or maybe one multi-column index.
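A sketch of what that could look like (mytable stands in for the real table name, and the pg_trgm extension must be available to install):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Trigram GIN indexes let LIKE '%keyword%' use an index instead of a sequential scan.
CREATE INDEX idx_col1_trgm ON mytable USING gin (col1 gin_trgm_ops);
CREATE INDEX idx_col2_trgm ON mytable USING gin (col2 gin_trgm_ops);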

ES custom dynamic mapping field name change

I have a use case which is a bit similar to the ES example for dynamic_template, where I want certain strings to be analyzed and certain ones not.
My document fields don't have such a convention and the decision is made based on an external schema. So currently my flow is:
I grab the input document from the DB
I grab the appropriate schema (same database, currently using Logstash for import)
I adjust the name in the document accordingly (using Logstash's Ruby mutator):
if not analyzed I don't change the name
if analyzed I change it to ORIGINALNAME_analyzed
This handles the analyzed/not_analyzed problem thanks to the dynamic_template I set, but now the user doesn't know which fields are analyzed, so there's no easy way for him to write queries: he doesn't know the name of the field.
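For reference, a dynamic_template along these lines might look like the sketch below (the question doesn't show the actual template, so this is an assumption, written in current text/keyword syntax; older versions used type string with index set to analyzed or not_analyzed):
{
  "mappings": {
    "dynamic_templates": [
      {
        "analyzed_strings": {
          "match": "*_analyzed",
          "match_mapping_type": "string",
          "mapping": { "type": "text" }
        }
      },
      {
        "plain_strings": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}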
I wanted to use field name aliases, but apparently ES doesn't support them. Are there any other mechanisms I'm missing that I could use here, like renaming a field after indexing, or something else?
For example, this ancient thread mentions that field.sub.name can be queried as just name, but I'm guessing this changed when they disallowed . in field names some time ago, since I cannot get it to work.
Let the user create queries with the original name only. I believe you have some code that converts this user query into an Elasticsearch query. When converting, instead of using the field name provided by the user alone, use both field names: ORIGINALNAME as well as ORIGINALNAME_analyzed. If you are using a match query, convert it to a multi_match. If you are using a term query, convert it to a bool should query. I guess you get where I am going with this.
Elasticsearch won't mind if a field does not exist. This can be a problem if there is already a field with _analyzed appended in its original name, but with some tricks that can be fixed too.
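For example, a user query against myfield might be expanded like this (a sketch; the index and field names are placeholders):
GET myindex/_search
{
  "query": {
    "multi_match": {
      "query": "user input",
      "fields": ["myfield", "myfield_analyzed"]
    }
  }
}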

Elasticsearch not searching some fields

I have just updated a website; the update adds new fields to Elasticsearch.
In my dev environment it all works fine, but on the live site the new fields are not being found.
E.g. I have added a new field with the value: 1
However, when adding a filtered query of
{"field":1}
it does not find any matching results.
When I look in the documents, I can see docs with the field set to 1.
Would the reason for this be that the new field was added after the mapping was set? I am not all that familiar with Elasticsearch, so I am not really sure where to start looking to fix it.
Any help would be appreciated.
Update:
Querying from the URL shows nothing either:
_search/?pretty=true&size=50&q=field1:*
However, there is another field that was added at the same time which I can search on.
I can see field1 in the result set, but it just won't allow me to search on it.
The only difference I see in the mapping is that the field that works is set to type:long, whereas the one that is not working is set to type:string.
Is it a length issue with the ngram? What are your "min_gram" settings?
When you check on your index settings like this:
GET <host>/<index_name>/_settings
Does it work when you filter for a two-digit value?
Are all the field values one digit?
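For context, if the index uses an ngram analyzer, the settings returned will contain something along these lines (a sketch, not the asker's actual settings); a min_gram of 2 here would explain why single-character values never match:
{
  "analysis": {
    "filter": {
      "my_ngram": { "type": "ngram", "min_gram": 2, "max_gram": 10 }
    }
  }
}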
It's OK to add a field after the mapping was set; Elasticsearch will guess the mapping for you. (In fact, it's one of its selling features: no need to define the mapping, just throw the data at it.)
There are a few things that can go wrong:
Verify that data is actually in the index. To do that, just navigate to the _search URL with no parameters; you should see the field if it is indexed.
Look at your mapping: could it be that the field is explicitly set not to be indexed? (See the example requests below.)
Another possibility is that your query is wrong, but that is unlikely, since you're saying it works in the development environment.
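A quick sketch of those two checks, in the same style as the settings request above (host and index name are placeholders):
GET <host>/<index_name>/_search     (returns a sample of indexed documents)
GET <host>/<index_name>/_mapping    (shows each field's type and whether it is indexed)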

What indexes are created when indexing a document in elasticsearch

If I create the first document of its type, or put a mapping, is an index created for each field?
Obviously, if I set "index" to "analyzed" or "not_analyzed", the field is indexed.
Is there a way to store a field so it can be retrieved but never searched on? I imagine this would save a lot of space. If I set "index" to "no", will this save space?
Will I still be able to search on such a field, just more slowly, or will it be totally unsearchable?
Is there a way to make a field indexed later, after some documents are inserted and I change my mind?
For example, I might have a mapping:
{
  "book": {
    "properties": {
      "title": { "type": "string", "index": "not_analyzed" },
      "shelf": { "type": "long", "index": "no" }
    }
  }
}
So I want to be able to search by title, but also retrieve the shelf the book is on.
index: no will indeed not create an index for that field, so that saves some space. Once you've done that, you can't search on that particular field anymore.
Perhaps also useful in this context is to know about the _source field, which is returned by default and includes all the fields you've stored: http://www.elasticsearch.org/guide/reference/mapping/source-field/
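So with the mapping above, a search like this (a sketch; the index name books and the title value are placeholders, and since title is not_analyzed the exact stored value must match) still returns shelf in each hit's _source:
GET books/_search
{
  "query": { "term": { "title": "The Hobbit" } }
}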
As to your second question:
You can't change your mind halfway: when you want to index a particular field later on, you have to reindex the documents.
That's why you may want to reconsider setting index: no, etc. In fact, a good strategy to begin with is to not define a schema for fields at all, unless you're 100% sure you need, for instance, a non-default analyzer for a particular field. Otherwise ES will use generally usable defaults.
