Oracle: search for diacritics

Oracle: search for diacritics - oracle

In Oracle I try to find all the rows that contains some diacritics in one column. I used something like:
where regexp_like(name,'(Ă|Î|Ș|Ț|Â)','i');
The problem is that it also returns rows that contain the letters without diacritics (A,I,S,T). For example the clause above will return a row that contains "Adrian" as name.
How can I search only for diacritics?
Thank you

The way diacritics is handled in comparisons and when sorting is a property of the session that depends on the value of NLS_SORT. See Linguistic Sorting and String Searching

I think it may be caused by character conversion.
What do you get when you run the query?:
select 'ĂÎȘȚÂ' from dual

Related

Oracle BI (OBIEE) - Wildcard search for numeric values only

I need to set up a query that returns only the values that would match a pattern {Any character & digit}
I've tried setting up a filter criteria to ' %_
[0-9] but it doesn't work...I want to exclude any results that have two letters as the first two characters of a value.
Thanks!

In the presentation column, you can use the OBIEE - Evaluate - Embedded DB Functions in a formula.
For example (not the exact regex you are looking for but you get the point):
This link contains the same image/example I uploaded, as well as a few others: http://gerardnico.com/wiki/dat/obiee/regexp_evaluate

how to replace characters in hive?

I have a string column description in a hive table which may contain tab characters '\t', these characters are however messing some views when connecting hive to an external application.
is there a simple way to get rid of all tab characters in that column?. I could run a simple python program to do it, but I want to find a better solution for this.

regexp_replace UDF performs my task. Below is the definition and usage from apache Wiki.
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT):
This returns the string resulting from replacing all substrings in INITIAL_STRING
that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT,
e.g.: regexp_replace("foobar", "oo|ar", "") returns fb

Custom SerDe might be a way to do it. Or you could use some kind of mediation process with regex_replace:
create table tableB as
select
columnA
regexp_replace(description, '\\t', '') as description
from tableA
;

select translate(description,'\\t','') from myTable;
Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)
Char/varchar support added as of Hive 0.14.0

You can also use translate(). If the third argument is too short, the corresponding characters from the second argument are deleted. Unlike regexp_replace() you don't need to worry about special characters.
Source code.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

There is no OOTB feature at this moment which allows this. One way to achieve that could be to write a custom InputFormat and/or SerDe that will do this for you. You might this JIRA useful : https://issues.apache.org/jira/browse/HIVE-3751. (not related directly to your problem though).

Search for similar words using an index

I need to search over a DB table using some kind of fuzzy search like the one from oracle and using indexes since I do not want a table scan(there is a lot of data).
I want to ignore case, language special stuff(ñ, ß, ...) and special characters like _, (), -, etc...
Search for "maria (cool)" should get "maria- COOL" and "María_Cool" as matches.
Is that possible in Oracle in some way?
About the case, I think it can be solved created the index directly in lower case and searching always lower-cased. But I do not know how to solve the special chars stuff.
I thought about storing the data without special chars in a separated column and searching on that returning the real one, but I am not 100% sure where that is the perfect solution.
Any ideas?

Maybe UTL_MATCH can help.
But you can also create a function based index on, lets say, something like this:
regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ')
And try to match like this:
...
WHERE regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ') =
regexp_replace('maria (cool)' , '[^0-9a-zA-Z]+', ' ')
Here is a sqlfiddle demo It's not complete, but can be a start

Reversing a string using an index in Oracle

I have a table that has IDs and Strings and I need to be able to properly index for searching for the end of the strings. How we are currently handling it is copying the information into another table and reversing each string and indexing it normally. What I would like to do is use some kind of index that allows to search in reverse.
Example
Data:
F7421kFSD1234
d7421kFSD1235
F7541kFSD1236
d7421kFSD1234
F7421kFSD1235
b8765kFSD1235
d7421kFSD1234
The way our users usually input thier search is something along the lines of...
*1234
By reversing the strings (and the search string: 4321*) I could find what I am looking for without completely scanning the whole table. My question is: Is making a second table the best way of doing this?
Is there a way to reverse index?
Ive tried an index like this...
create index REVERSE_STR_IDX on TABLE(STRING) REVERSE;
but oracle doesn't seem to be using it according to the Explain Plan.
Thanks in advance for the help.
Update:
I did have a problem with unicode characters not being reversed correctly. The solution to this was casting them.
Example:
select REVERSE(cast(string AS varchar2(2000)))
from tbl
where id = 1

There is the myth that a reverse key index can be used for that, however, I've never seen that in action.
I would try a "manual" function based index.
CREATE INDEX REVERSE_STR_IDX on TBL(reverse(string));
SELECT *
FROM TBL
WHERE reverse(string) LIKE '4321%';

Oracle empty strings

How do you guys treat empty strings with Oracle?
Statement #1: Oracle treats empty string (e.g. '') as NULL in "varchar2" fields.
Statement #2: We have a model that defines abstract 'table structure', where for we have fields, that can't be NULL, but can be "empty". This model works with various DBMS; almost everywhere, all is just fine, but not with Oracle. You just can't insert empty string into a "not null" field.
Statement #3: non-empty default value is not allowed in our case.
So, would someone be so kind to tell me - how can we resolve it?

This is why I've never understood why Oracle is so popular. They don't actually follow the SQL standard, based on a silly decision they made many years ago.
The Oracle 9i SQL Reference states (this has been there for at least three major versions):
Oracle currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.
But they don't say what you should do. The only ways I've ever found to get around this problem are either:
have a sentinel value that cannot occur in your real data to represent NULL (e.g, "deoxyribonucleic" for a surname field and hope that the movie stars don't start giving their kids weird surnames as well as weird first names :-).
have a separate field to indicate whether the first field is valid or not, basically what a real database does with NULLs.

Are we allowed to say "Don't support Oracle until it supports the standard SQL behaviour"? It seems the least pain-laden way in many respects.
If you can't force (use) a single blank, or maybe a Unicode Zero Width Non-Break Space (U+FEFF), then you probably have to go the whole hog and use something implausible such as 32 Z's to indicate that the data should be blank but isn't because the DBMS in use is Orrible.

Empty string and NULL in Oracle are the same thing. You want to allow empty strings but disallow NULLs.
You have put a NOT NULL constraint on your table, which is the same as a not-an-empty-string constraint. If you remove that constraint, what are you losing?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Oracle: search for diacritics - oracle

The way diacritics is handled in comparisons and when sorting is a property of the session that depends on the value of NLS_SORT. See Linguistic Sorting and String Searching

I think it may be caused by character conversion. What do you get when you run the query?: select 'ĂÎȘȚÂ' from dual

Related

Oracle BI (OBIEE) - Wildcard search for numeric values only

how to replace characters in hive?

Search for similar words using an index

Reversing a string using an index in Oracle

Oracle empty strings

Categories

Resources