Oracle Db Sorting Issue for Turkish Text - spring

We're developing a project with Spring Boot (JPA, Spring Data) and an Oracle database. When I sort a column that contains text, Oracle returns the wrong order for Turkish characters. Liquibase is our database migration tool.
This is the Liquibase script that creates the table's column:
<column name="CONTENT" type="VARCHAR2(255 char)">
<constraints nullable="false"/>
</column>
The entity's property:
@Column(name = "CONTENT", nullable = false)
private String content;
When ordering by CONTENT, rows that start with Turkish letters end up in the wrong place.
For example, the row 'İngilizceeee' should come right after the rows starting with 'I'. There is an issue with Turkish text.

Oracle uses a binary sort by default, which orders strings by the numeric values of their encoded characters. This works well for the English alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending numeric order. Turkish letters such as 'İ' have code values outside that range, so a binary sort places them after 'Z'.
Try using a linguistic sort. See here -
https://docs.oracle.com/cd/B10501_01/server.920/a96529/ch4.htm
Example -
SELECT * FROM test ORDER BY NLSSORT(name, 'NLS_SORT=german');
For Turkish text, use a Turkish sort name such as 'NLS_SORT=TURKISH' instead of german.
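To see why a binary sort misplaces 'İ', the effect can be reproduced outside the database. The following is only an illustrative sketch in Python, not Oracle's implementation: the word list, the helper names turkish_lower and turkish_key, and the hand-written alphabet string are all assumptions made for the example.

```python
# Illustration only: compares code-point ("binary") order with an order
# based on the Turkish alphabet. Names and sample words are hypothetical.
TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH_LOWER)}


def turkish_lower(text):
    # str.lower() maps "I" to "i", which is wrong for Turkish,
    # so the two dotted/dotless pairs are handled explicitly first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def turkish_key(text):
    # Characters outside the alphabet sort after it, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in turkish_lower(text)]


words = ["Zebra", "İngilizce", "Irmak", "Jale", "Hasan"]

# Code-point order, comparable to a binary sort: "İ" (U+0130) lands after "Z".
binary_order = sorted(words)

# Alphabet-aware order, comparable in spirit to a linguistic sort.
turkish_order = sorted(words, key=turkish_key)

print(binary_order)
print(turkish_order)
```

In the second list 'İngilizce' comes directly after 'Irmak', which is the order the question expects.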


Elastic API custom date field range query

I'm learning the Elasticsearch API. While practicing, I ran into a problem: I can't fetch documents between two dates. The documents are matched on two fields, and without the date range the query works fine.
BoolQueryBuilder filter = new BoolQueryBuilder();
BoolQueryBuilder query = QueryBuilders.boolQuery();
for (String q : list) {
    // both fields must match
    query = QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("field1", q))
            .must(QueryBuilders.matchQuery("field2", val));
    filter.should(query);
}
// restrict the results to the requested date range
filter.must(QueryBuilders.rangeQuery("datetime").gte(from).lte(to));
searchSourceBuilder.query(filter);
Where:
list contains the words to match against field1.
Both field1 and field2 must match for a document to be retrieved.
datetime is a custom date-time field whose values look like 2022-06-09 12:32:36.
Can anyone help me resolve this issue?
I think you need to specify a date format so that the date values in the query can be parsed. To format your dates, either use one of the built-in formats provided by ES - https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
or try a custom pattern - https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
For values like 2022-06-09 12:32:36 the pattern would be yyyy-MM-dd HH:mm:ss, which can be passed to the range query with .format("yyyy-MM-dd HH:mm:ss").
Once the format matches, gte and lte should work as expected.
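As a rough sketch of what the resulting request could look like, the snippet below builds the query body as plain JSON in Python rather than with the Java builders. The function name build_search_body and the sample values are hypothetical. The minimum_should_match setting is an addition that is not in the original code: once a must clause is present, should clauses are otherwise optional. Whether the format parameter alone resolves the issue depends on how the datetime field is mapped, which the question does not show.

```python
import json
from datetime import datetime

ES_PATTERN = "yyyy-MM-dd HH:mm:ss"   # Java-style pattern sent to Elasticsearch
PY_PATTERN = "%Y-%m-%d %H:%M:%S"     # the same layout for Python's strftime


def build_search_body(words, field2_value, start, end):
    # One sub-query per word: field1 and field2 must both match.
    pairs = [
        {"bool": {"must": [
            {"match": {"field1": word}},
            {"match": {"field2": field2_value}},
        ]}}
        for word in words
    ]
    # Range clause with an explicit format for the gte/lte values.
    date_range = {"range": {"datetime": {
        "gte": start.strftime(PY_PATTERN),
        "lte": end.strftime(PY_PATTERN),
        "format": ES_PATTERN,
    }}}
    return {"query": {"bool": {
        "should": pairs,
        "minimum_should_match": 1,  # assumption: at least one pair must match
        "must": [date_range],
    }}}


body = build_search_body(
    ["alpha", "beta"],            # hypothetical words for field1
    "some value",                 # hypothetical value for field2
    datetime(2022, 6, 1),
    datetime(2022, 6, 9, 12, 32, 36),
)
print(json.dumps(body, indent=2))
```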

Which Postgresql index is most efficient for text column with queries based on similarity

I would like to create an index on text column for the following use case. We have a table of Segment with a column content of type text. We perform queries based on the similarity by using pg_trgm. This is used in a translation editor for finding similar strings.
Here are the table details:
CREATE TABLE public.segments
(
id integer NOT NULL DEFAULT nextval('segments_id_seq'::regclass),
language_id integer NOT NULL,
content text NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT segments_pkey PRIMARY KEY (id),
CONSTRAINT segments_language_id_fkey FOREIGN KEY (language_id)
REFERENCES public.languages (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT segments_content_language_id_key UNIQUE (content, language_id)
)
And here is the query (Ruby + Hanami):
def find_by_segment_match(source_text_for_lookup, source_lang, sim_score)
  aggregate(:translation_records)
    .where(language_id: source_lang)
    .where { similarity(:content, source_text_for_lookup) > sim_score/100.00 }
    .select_append { float::similarity(:content, source_text_for_lookup).as(:similarity) }
    .order { similarity(:content, source_text_for_lookup).desc }
end
---EDIT---
This is the query:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity" FROM "segments" WHERE (("language_id" = 2) AND (similarity("content", 'This will not work.') > 0.45)) ORDER BY SIMILARITY("content", 'This will not work.') DESC
SELECT "translation_records"."id", "translation_records"."source_segment_id", "translation_records"."target_segment_id", "translation_records"."domain_id",
"translation_records"."style_id",
"translation_records"."created_by", "translation_records"."updated_by", "translation_records"."project_name", "translation_records"."created_at", "translation_records"."updated_at", "translation_records"."language_combination", "translation_records"."uid",
"translation_records"."import_comment" FROM "translation_records" INNER JOIN "segments" ON ("segments"."id" = "translation_records"."source_segment_id") WHERE ("translation_records"."source_segment_id" IN (27548)) ORDER BY "translation_records"."id"
---END EDIT---
---EDIT 1---
What about re-indexing? Initially we'll import about 2 million legacy records. When and how often, if at all, should we rebuild the index?
---END EDIT 1---
Would something like CREATE INDEX ON segment USING gist (content) be ok? I can't really find which of the available indices would be best suitable for our use case.
Best, seba
The 2nd query you show seems to be unrelated to this question.
Your first query can't use a trigram index, as the query would have to be written in operator form, not function form, to do that.
In operator form, it would look like this:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity"
FROM segments
WHERE language_id = 2 AND content % 'This will not work.'
ORDER BY content <-> 'This will not work.';
For % to be equivalent to similarity("content", 'This will not work.') > 0.45, you would first need to run SET pg_trgm.similarity_threshold TO 0.45;.
How you get Ruby/Hanami to generate this form, I don't know.
The % operator can be supported by either a gin_trgm_ops index or a gist_trgm_ops index. The <-> operator can only be supported by gist_trgm_ops. But it is pretty hard to predict how efficient that support will be. If your "content" column is long or the text you compare against is long, it is unlikely to be very efficient, especially in the case of GiST.
Ideally you would partition your table by language_id. If not, then it might be helpful to build a multicolumn index having both columns.
CREATE INDEX segments_language_id_idx ON segments USING btree (language_id);
CREATE INDEX segments_content_gin ON segments USING gin (content gin_trgm_ops);
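For intuition about what the similarity threshold discussed above measures, here is a simplified Python approximation of trigram similarity. It follows the documented idea (lowercase the text, pad each word with two leading spaces and one trailing space, compare the sets of three-character sequences) but it is not pg_trgm's actual code, and results may differ from PostgreSQL in edge cases. The function names and sample strings are assumptions made for the example.

```python
import re


def trigrams(text):
    # Split into alphanumeric words, pad each word, collect 3-character windows.
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    # Shared trigrams divided by all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "This will not work."
candidates = ["This will work.", "Completely different"]  # hypothetical segments

# Keep candidates above the 0.45 threshold used in the question.
matches = [c for c in candidates if similarity(c, query) > 0.45]
print(matches)
```

Longer strings produce larger trigram sets, which gives some intuition for why long content or long search text makes the index work harder.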

Rails compare same object's 1 field with another + addition of string in Active Record

I have two string fields that contain dates, e.g. field_1 = "2003.11.14". I use them through the ORM and they work just fine. Now I want to compare one field's value with the other field's value minus 18 months. Here is an example:
User.where("users.field_1 > '#{Date.today - 18.months}' AND users.field_2 > (users.field_1 - 18.months)")
or something like that. Can anyone help me?
Thanks in advance
Most databases support date calculations in SQL. Something like this should work:
query = User.where("users.field_1 > ?", 18.months.ago)
query.where("users.field_2 > users.field_1 - :time", time: 18.months.ago)
Edit: I just saw that the values are stored as strings, so you cannot use SQL date arithmetic on them directly. Consider changing the columns to a proper date type.
I cannot do that because the table has millions of records.
I don't really understand why the size of the table prevents you from using the correct data type.
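To make the intended comparison concrete, here is a small Python sketch of the logic, independent of Rails and of any particular database. It assumes every value is zero-padded in the "YYYY.MM.DD" layout shown in the question, in which case plain string comparison orders the same way as the dates do. The per-row "minus 18 months" step still needs real date arithmetic. The helper names and sample rows are hypothetical.

```python
import calendar
from datetime import date, datetime

FORMAT = "%Y.%m.%d"  # layout of the stored strings, e.g. "2003.11.14"


def parse(text):
    return datetime.strptime(text, FORMAT).date()


def minus_months(d, months):
    # Step back by whole months, clamping the day to the target month's length.
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def keep(row, today):
    cutoff = minus_months(today, 18).strftime(FORMAT)
    # Zero-padded "YYYY.MM.DD" strings compare like dates.
    field_1_recent = row["field_1"] > cutoff
    # The second condition depends on each row's own field_1.
    field_2_ok = parse(row["field_2"]) > minus_months(parse(row["field_1"]), 18)
    return field_1_recent and field_2_ok


rows = [
    {"field_1": "2003.11.14", "field_2": "2002.06.01"},
    {"field_1": "2001.01.01", "field_2": "2002.06.01"},
    {"field_1": "2003.11.14", "field_2": "2002.01.01"},
]
kept = [row for row in rows if keep(row, today=date(2004, 1, 1))]
print(kept)
```

With millions of rows this filtering belongs in the database, which is why the answers point toward a proper date column.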

Is there an ISNUMBER() or ISTEXT() equivalent for Power Query?

I have a column with mixed types of Number and Text and am trying to separate them into different columns using an if ... then ... else conditional. Is there an ISNUMBER() or ISTEXT equivalent for power query?
Here is how to check the type in Excel Power Query:
IsNumber
=Value.Is(Value.FromText([ColumnOfMixedValues]), type number)
IsText
=Value.Is(Value.FromText([ColumnOfMixedValues]), type text)
hope it helps!
That depends a bit on the nature of the data and how it is originally encoded. Power Query is more strongly typed than Excel.
For example:
Source = Table.FromRecords({[A=1],[A="1"],[A="a"]})
Creates a table with three rows. The first row's data type is number. The second and third rows are both text. But the second row's text could be interpreted as a number.
The following is a query that creates two new columns showing if each row is a text or number type. The first column checks the data type. The second column attempts to guess the data type based on the value. The guessing code assumes everything that isn't a number is text.
Example Code
Edit: Borrowing from @AlejandroLopez-Lago-MSFT's comment for the interpreted type.
let
    Source = Table.FromRecords({[A=1],[A="1"],[A="a"]}),
    #"Added Custom" = Table.AddColumn(Source, "Type", each
        let
            TypeLookup = (inputType as type) as text =>
                Table.FromRecords(
                    {
                        [Type=type text, Value="Text"],
                        [Type=type number, Value="Number"]
                    }
                ){[Type=inputType]}[Value]
        in
            TypeLookup(Value.Type([A]))
    ),
    #"Added Custom 2" = Table.AddColumn(#"Added Custom", "Interpreted Type", each
        let
            result = try Number.From([A]) otherwise "Text",
            resultType = if result = "Text" then "Text" else "Number"
        in
            resultType
    )
in
    #"Added Custom 2"
Sample output
Put it in logical test format
Value.Type([Column1]) = type number
Value.Type([Column1]) = type text
The function Value.Type returns a type, so comparing it with a type in an equation returns true or false.
Equivalently, using the named type values:
Value.Type([Column1]) = Number.Type
Value.Type([Column1]) = Text.Type
HTH
ISTEXT() doesn't exist in any language I've worked with - typically any numeric or date value can be converted to text so what would be a false result?
For ISNUMBER, I would solve this without any code by changing the Data Type to a number type e.g. Whole Number. Any rows that don't convert will show Error - you can then apply Replace Errors or Remove Errors to handle them.
Use Duplicate Column first if you don't want to disturb the original column.
I agree with Mike Honey.
I have a SKU code that is a mix of characters and numbers.
Normally the last 8 characters are digits, but in some weird circumstances the SKU is repeated with an additional letter and given the same EAN, which causes chaos.
By creating a new temporary column using Text.End(SKU, 1), I get only the last character. I then convert that column to Whole Number and remove any rows that show Error. Finally I delete the temporary column and am left with the rows I need, in the format I started with.
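The difference between the stored type and the interpreted type described in the answers above can be mimicked in a few lines of Python. This is only an analogy, not Power Query code: the function names are made up for the illustration, and Python's float() accepts some strings (such as "1e3" or "nan") that Number.From may treat differently.

```python
def stored_type(value):
    # Looks only at how the value is stored, like Value.Type in the answer.
    if isinstance(value, bool):
        return "Other"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text"
    return "Other"


def interpreted_type(value):
    # Tries a conversion, like "try Number.From(...) otherwise" in the answer.
    try:
        float(value)
    except (TypeError, ValueError):
        return "Text"
    return "Number"


rows = [1, "1", "a"]  # the same three values as the sample table
table = [(v, stored_type(v), interpreted_type(v)) for v in rows]

for value, stored, interpreted in table:
    print(repr(value), stored, interpreted)
```

The second row is the interesting one: it is stored as text but can be interpreted as a number.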

Date formatting in a grid

I'm trying to display a date column in a grid in the format "dd-mm-yyyy". In the DBF table, the date is stored in a character field in this format: "YYYY-MM-DDThh:mm:ss".
The grid is created from this cursor:
select id,beginningDate,endDate,cnp from doc ORDER BY id desc INTO CURSOR myCursor
I would like something like this:
select id,convert(beginningDate, Datetime,"dd-mm-yyyy"),endDate,cnp from doc ORDER BY id desc INTO CURSOR myCursor
Fox doesn't have a builtin function called convert(), nor can it handle your non-standard date/time string format directly.
A quick and dirty way to convert a string foo in the given format ("YYYY-MM-DDThh:mm:ss") to a date/time value is
ctot("^" + chrtran(foo, "T", " "))
The caret marks the input as the locale-independent standard format, which differs from the input format only by having a space instead of a 'T'.
You can extract the date portion from this via the ttod() function, or simply extract only the date portion from the string and convert that:
ctod("^" + left(foo, 10))
Fox's controls - including those in a grid - normally use the configured Windows system format (assuming that set("SYSFORMATS") == "ON"); you can override this by playing with the SET DATE command.
There seems to be no mask-based date formatting option as in most other languages. dtoc() and ttoc() don't take format strings, transform() takes a format string but blithely ignores it for date values.
I am with Tamar on this subject, you should have used a datetime field instead.
Since you are storing it like this anyway, you can 'convert' to datetime using the built-in cast function (or ttod(ctot()) in versions older than VFP9 - in either case you don't need to remove T character):
select id, ;
Cast(Cast("^"+beginningDate as datetime) as date) as beginningDate, ;
endDate,cnp ;
from doc ;
ORDER BY id desc ;
INTO CURSOR myCursor ;
nofilter
In a grid or any other textbox control, you can control the display style using the DateFormat property, e.g.:
* assuming it is Columns(2). 11 is DMY
thisform.myGrid.Columns(2).SetAll('DateFormat', 11)
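For comparison, the same conversion from the stored "YYYY-MM-DDThh:mm:ss" string to a "dd-mm-yyyy" display value looks like this in Python. It is a sketch of the transformation only, not something VFP can run. The function name to_display and the sample value are assumptions.

```python
from datetime import datetime

STORED_LAYOUT = "%Y-%m-%dT%H:%M:%S"   # how the character field is filled
DISPLAY_LAYOUT = "%d-%m-%Y"           # how the grid should show it


def to_display(text):
    # Parse the stored string, then keep only the date part for display.
    return datetime.strptime(text, STORED_LAYOUT).strftime(DISPLAY_LAYOUT)


print(to_display("2015-03-27T14:05:09"))
```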
