Doctrine Searchable with non-ASCII characters

I have text in Turkish: "selam günaydın". Doctrine Searchable converts it to keywords in the index table:
-selam
-guenaydin
So "günaydın" was saved in the table as the keyword "guenaydin", and when somebody searches for "günaydın" they get nothing. What can I do?

Solution:
You should first run the incoming search string through the analyze() method of the Doctrine_Search_Analyzer_Standard class, so that it is converted to the same format as the keywords stored in the index table.
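A minimal sketch of that idea, assuming Doctrine 1.x (the variable names are made up for illustration):

// Run the raw user input through the same analyzer that built the index,
// so "günaydın" becomes "guenaydin" before the lookup.
$analyzer = new Doctrine_Search_Analyzer_Standard();
$terms = $analyzer->analyze('selam günaydın');
// $terms should now hold the normalized keywords, e.g. "selam" and
// "guenaydin", ready to be matched against the keyword column.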

Related

Can I use stopwords, case insensitivity and punctuation removal with the Elasticsearch keyword data type?

As we know, for the keyword data type we have to use a normalizer, but I am getting an error when I try to use a stopword filter in the normalizer. Is there any other way to add stopwords to a normalizer for the keyword data type?
It works for lowercase but not for stopwords.
Can anyone help me out? I just want a pointer or some sample code...
Or do I have to convert the keyword data type into text?
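For context, the part that does work looks something like the sketch below (the index and field names are made up). Normalizers only accept filters that work on a per-character basis, such as lowercase and asciifolding, which is why a stop filter inside a normalizer is rejected:

PUT my-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": { "type": "keyword", "normalizer": "my_normalizer" }
    }
  }
}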

Query a table and only match rows where a field is of type "STRING"

During prototyping I have imported a bunch of Facebook posts into a table in batches. After the first batch I did a bulk update to convert the "created_date" column from string to a native timestamp (using the handy r.ISO8601 function):
r.db('db').table('table').update({'created_date': r.ISO8601(r.row('created_date'))})
On the second pass, when I try to repeat this update, the server throws an error because not all rows' created_date fields are of type STRING any more (i.e. the ones previously converted), which is what r.ISO8601 expects.
I've already tried to filter on r.row('created_date').typeOf() == "STRING" but got no matches. I can't find any other way to refer to the STRING type as an object rather than a literal string either.
I know that I could pull the rows out and do the if/else logic in application code, but I'm interested to understand whether there's a native query that will filter out rows matching a certain type.
You have to use eq for the comparison, like this:
r.row('created_date').typeOf().eq("STRING")
Using == only works in language drivers that support operator overloading.
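Putting the two together, a sketch of the full conversion that skips already-converted rows (using the JavaScript driver and the db/table names from the question):

// Only touch rows where created_date is still a string, so rows already
// converted to a native TIME on the first pass are skipped.
r.db('db').table('table')
  .filter(r.row('created_date').typeOf().eq('STRING'))
  .update({ created_date: r.ISO8601(r.row('created_date')) })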

Oracle Contains Function Returning False BLOB Positives

I'm using the Contains function to search for strings in BLOB fields containing PDFs or Word documents. Recently I did the following search:
SELECT doc_id
FROM table_of_documents
WHERE CONTAINS (BLOB_FIELD, 'SDS.IF.00005') > 0
Most of the records returned were correct, but a few had PDFs in them that did not have "SDS.IF.00005" in them but did have "SDS.EL.00005" in them.
When I say the PDFs did not contain the search term, I mean I opened them in Adobe Reader and searched them using the search function and my own eyeballs, and people extremely familiar with the documents insist that the term is not there and should not be there.
I tried escaping the dots (SDS\.IF\.00005) and bracing the whole term ({SDS.IF.00005}). However, I am still getting the same results.
I also tried setting CONTAINS (BLOB_FIELD, 'SDS.IF.00005') = 100, but I'm still getting documents with SDS.EL.00005 in them and not SDS.IF.00005.
Do the dots in the search term mean something like SDS.%.00005 to Oracle? Or should I be researching how to find deep hidden text in Adobe documents that's not visible to the naked eye or to the Adobe text search function?
Thanks for your help.
As far as I know, CONTAINS is an Oracle Text function that performs full-text search, so Oracle is tokenizing your string, probably according to its BASIC_LEXER. This lexer uses . as a word separator, so Oracle understands your query as "return anything that matches at least one of the words 'SDS', 'IF' or '00005'". As your PDF will probably have been indexed using that same lexer, from Oracle Text's point of view your PDF contains the words 'SDS', 'EL' and '00005'; it matches two of the three words, so Oracle returns that row.
Actually, 'IF' is in Oracle Text's default stopword list (words that are ignored because they are so common that they mostly introduce "noise"), so your query effectively is "return anything that matches at least one of 'SDS' or '00005'". I am therefore not surprised that a PDF containing the literal text "SDS.EL.00005" gives you CONTAINS(BLOB_FIELD, 'SDS.IF.00005') = 100 (a "perfect" match), as you wrote.
If you want to search for a verbatim string, you should probably not use Oracle Text at all and instead implement a solution using plain old DBMS_LOB.INSTR. If that is not viable, you will have to find a way to make Oracle Text index those strings without tokenizing them.
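Two sketches of those options, using the table and column names from the question (note the caveats in the comments):

-- Verbatim byte search in the BLOB. Caveat: this only finds the string if it
-- is stored as literal bytes, which is often not the case inside compressed
-- PDF or Word streams.
SELECT doc_id
FROM table_of_documents
WHERE DBMS_LOB.INSTR(BLOB_FIELD, UTL_RAW.CAST_TO_RAW('SDS.IF.00005')) > 0;

-- Or keep Oracle Text, but tell the lexer that '.' joins tokens instead of
-- separating them; the preference name my_lexer is made up, and the index
-- must be rebuilt with this preference for it to take effect.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'PRINTJOINS', '.');
END;
/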

ES custom dynamic mapping field name change

I have a use case that is a bit similar to the ES dynamic_template example, where I want certain strings to be analyzed and others not.
My document fields don't have such a convention and the decision is made based on an external schema. So currently my flow is:
I grab the input document from the DB
I grab the appropriate schema (same database; currently using Logstash for the import)
I adjust the name in the document accordingly (using Logstash's Ruby mutator):
if not analyzed I don't change the name
if analyzed I change it to ORIGINALNAME_analyzed
This handles the analyzed/not_analyzed problem thanks to the dynamic_template I set, but now the user doesn't know which fields are analyzed, so there's no easy way for him to write queries because he doesn't know the actual name of the field.
I wanted to use field name aliases, but apparently ES doesn't support them. Are there any other mechanisms I'm missing that I could use here, like renaming a field after indexation, or something else?
For example, this ancient thread mentions that field.sub.name can be queried as just name, but I'm guessing that changed when they disallowed . in field names some time ago, since I cannot get it to work.
Let the user create queries with the original name only. I believe you have some code that converts this user query to an Elasticsearch query. When converting, instead of using the field name provided by the user alone, use both field names: ORIGINALNAME as well as ORIGINALNAME_analyzed. If you are using a match query, convert it to a multi_match. If you are using a term query, convert it to a bool should query. I guess you see where I am going with this.
Elasticsearch won't mind if a field does not exist. This can be a problem if there is already a field with _analyzed appended in its original name, but with some tricks that can be fixed too.
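For instance, the match-query case could look something like this sketch (index and field names are made up; the _analyzed suffix follows the renaming convention described above):

GET my-index/_search
{
  "query": {
    "multi_match": {
      "query": "the user's input",
      "fields": ["myfield", "myfield_analyzed"]
    }
  }
}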

"Query is not understandable" when using field full-text search [Tags] = "foo"

I have a problem that happens only rarely with FT search, but once it happens it stays. I use the following search term in the FT search box in Lotus Notes:
[Tags] = "foo"
In most applications this search term works fine, but in some applications it gives the error "query is not understandable".
It does not matter if I replace the value; e.g. [Tags] = "boo" produces the same result, and so does FIELD Tags = "boo". For the record, [Tag] = "foo" works fine, so it seems to be an issue with the field or the field name.
The problem only happens in some applications, and once it starts, no views can be searched using that query; I get the error message every time I search.
-It does not help to remove, compact, and re-create the FT index.
-I get the same error in XPages when using the same search query in a view data source.
-I have seen this problem with other field names as well, in other applications.
-If I remove the FT index, the search query works.
-Creating a new copy of the "broken" database does not resolve the problem.
-I tried having only one document in the database and creating a new FT index; the document in the view does not have the field "Tags", and it still does not work (there are other forms in the db with the field name "Tags").
This is a real show-stopper for me, as I have built some of my XPages on search values from specific fields.
From my own investigation I think this has to do with some sort of bug in the FT index: there seems to be some data contained in documents or forms that causes the FT index to stop working correctly.
I am looking for a solution, as I have not found a way to repair the index once it has become broken.
Update:
It does not help to follow this procedure
https://www-304.ibm.com/support/docview.wss?uid=swg21261002
Here is my debug info
[1078:0002-2250] IN FTGSearch
[1078:0002-2250] option = 0x400219
[1078:0002-2250] Query: ( FIELD Tags = "foo")
[1078:0002-2250] OUT FTGSearch error = F09
[1078:0002-2250] FTGSearch: found=0, returned=0, start=0, count=0, limit=0
It sounds like you need to fix the UNK table with a compact. Here is the listing of compact options; use a copy-style compact, not an in-place one.
http://www-01.ibm.com/support/docview.wss?uid=swg21084388
If the Tags field is sometimes numeric, I would advise looking at the database design. The UNK table is a table of all fields in the NSF. The first time a field name is used, it is stored in the UNK table with that data type, and full-text searching uses that data type and only that data type. If you have a field Tags on more than one form in a database, once numeric and once text, you're in for big trouble with full-text searches: the data type used in searches will depend on the data type of the field on the first document saved that had that field. Even if you delete all documents that have it as numeric, you won't change the UNK table without the compact. It sounds like that's what you have here. Ensure the database never stores Tags as numeric, delete or change all docs where it is stored as numeric, then compact.
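For reference, a copy-style compact from the server console would look something like this (the database path is made up):

load compact -c apps/mydb.nsf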
Thank you all for answering. I learned a whole lot about UNK tables and the FT index today.
The problem was that I had a numeric field called "Tags" in a form that I hadn't looked at and really didn't think would contain a field by that name.
After using the DDE search I found all instances of the Tags field and could easily locate the problem form. I renamed the field in the form, removed the FT index, used compact -c, and recreated the FT index. Now everything is working fine.
One other thing to note: I have several databases with the same design, but only a few of them had the FT index problem. The reason is probably that some of these databases were created after the form with the faulty Tags field was created.
I am so happy to have solved this.
Lessons learned: if you plan to use a full-text index in your application, make sure you do not use the same field name with different field types in different forms.
From now on I will probably use shared fields more :-)
One more thing we discovered: you actually do not need NotesPeek to find out which field type is stored in the UNK table. You can use the "Fields" button in the search bar; if you select the field and the right-hand box displays "contains", you know the UNK table has a text field type set.
