Different behaviour in "order by" clause: Oracle vs. PostgreSQL

I have the following table (created and populated in both Oracle and PostgreSQL):
> create table foo (a varchar(10));
I populated it with values, and the ORDER BY clause behaves differently in PostgreSQL and Oracle (I don't think the versions are relevant to this question):
Oracle:
> select a, length(a) from foo order by a;
A LENGTH(A)
---------- ----------
.1 2
01 2
1 1
1#0 3
1#1 3
1.0 3
1.1 3
10 2
11 2
9 rows selected.
I get what I expect: .1 comes before 01, since . is before 0 in the ASCII table.
However, in PostgreSQL I have:
=> select a, length(a) from foo order by a;
a | length
-----+--------
01 | 2
1 | 1
.1 | 2
10 | 2
1.0 | 3
1#0 | 3
11 | 2
1.1 | 3
1#1 | 3
(9 rows)
Why the difference? I know it probably has something to do with collation order or something similar, but I would like some pointers on where to read more about it.
UPDATE: collation info on the PostgreSQL database:
Encoding: UTF8
Collate: en_US.UTF-8
Ctype: en_US.UTF-8
Thanks!

Postgres has only two built-in collations: C and POSIX.
Any other collations are provided by the operating system.
On many Linux systems, in UTF-8 locales, all non-alphanumeric characters are ignored during sorting.
You can obtain the expected result using the C collation:
select a, length(a) from foo order by a collate "C";
You can find a more detailed explanation in this answer.
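If you always want this byte-order sort, you can also declare the collation on the column itself, so a plain ORDER BY picks it up. A minimal sketch in PostgreSQL syntax:
-- column-level collation: sorts and comparisons on a default to byte order
create table foo (a varchar(10) collate "C");
select a, length(a) from foo order by a; -- now matches the collate "C" result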

Related

Confusion regarding to_char and to_number

First of all, I am aware of the basics.
select to_number('A231') from dual; -- this will not work, but
select to_char('123') from dual;    -- this will work
select to_number('123') from dual;  -- this will also work
Actually, in my package we have two tables: A (X NUMBER) and B (Y VARCHAR2). There are many columns, but we are only concerned with X and Y. X contains only numeric values like 123, 456, etc., whereas Y contains some strings and some numbers, e.g. '123', 'HR123', 'Hello'. We have to join these two tables. It is a legacy application, so we cannot change the tables or columns.
Until now the following condition was working properly:
to_char(A.x) = B.y
But since there is an index on Y, the performance team suggested we use
A.x = to_number(B.y)
instead; it is currently running in the dev environment.
My question is: can this query give an error under any circumstances? If it picks '123' it will certainly give 123, but if it picks 'AB123' it will fail. Can it fail? Can it pick 'AB123' even when it is being joined with the other table?
can it fail?
Yes. It must put every row through TO_NUMBER before it can check whether or not it meets the filter condition. Therefore, if there is any single row for which the conversion fails, the query will fail.
From Oracle 12.2 (since you tagged Oracle 12) you can use:
SELECT *
FROM A
INNER JOIN B
ON (A.x = TO_NUMBER(B.y DEFAULT NULL ON CONVERSION ERROR))
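As a quick sanity check of what that clause does (the input literal here is just illustrative):
SELECT TO_NUMBER('AB123' DEFAULT NULL ON CONVERSION ERROR) AS n FROM dual;
-- returns NULL instead of raising ORA-01722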
Alternatively, put an index on TO_CHAR(A.x) and use your original query:
SELECT *
FROM A
INNER JOIN B
ON (TO_CHAR(A.x) = B.y)
Also note: Having an index on B.y does not mean that the index will be used. If you are filtering on TO_NUMBER(B.y) (with or without the default on conversion error) then you would need a function-based index on the function TO_NUMBER(B.Y) that you are using. You should profile the queries and check the explain plans to see whether there is any improvement or change in use of indexes.
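For example, such a function-based index might be sketched as below; the index name is hypothetical, the indexed expression must match the query's expression exactly for the optimizer to use it, and the conversion-error clause keeps the index build itself from failing on non-numeric rows:
-- 12.2+ sketch: precompute the conversion so the optimizer can use an index
CREATE INDEX b_y_num ON B (TO_NUMBER(y DEFAULT NULL ON CONVERSION ERROR));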
Never convert a VARCHAR2 column that can contain non-numeric strings with to_number.
This may partially work, but it will eventually fail.
Small Example
create table a as
select rownum X from dual connect by level <= 10;
create table b as
select to_char(rownum) Y from dual connect by level <= 10
union all
select 'Hello' from dual;
This could work (since you limit the rows so that the conversion succeeds, provided you are lucky and Oracle chooses the right execution plan, which is probable but not guaranteed):
select *
from a
join b on A.x=to_number(B.y)
where B.y = '1';
But this will fail:
select *
from a
join b on A.x=to_number(B.y)
ORA-01722: invalid number
Performance
But since there is index on Y, performance team suggested us to do A.x=to_number(B.y);
You should challenge the team: if you apply a function to a column (to_number(B.y)), a regular index on that column can't be used.
On the contrary, your original query can perfectly use the following indexes:
create index b_y on b(y);
create index a_x on a(x);
Query
select *
from a
join b on to_char(A.x)=B.y
where A.x = 1;
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 5 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN| A_X | 1 | 3 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| B_Y | 1 | 2 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."X"=1)
3 - access("B"."Y"=TO_CHAR("A"."X"))

Reorder factored matrix columns in Power BI

I have a matrix visual in Power BI. The columns are departments and the rows are years. The values are counts of people in each department each year. The departments obviously don't have a natural ordering, BUT I would like to reorder them by each department's column total, in descending order.
For example, if Department C has 100 people total over the years (rows), and all the other departments have fewer, I want Department C to come first.
I have seen other solutions that add an index column, but that doesn't work very well for me because the "count of people" variable is what I want to index by, and it doesn't already exist in my data. Rather, it's a calculation based on individual people, each of whom has a department and a year.
If anyone can point me to an easy way of changing the column ordering/sorting that would be splendid!
      | DeptA | DeptB | DeptC
------|-------|-------|-------
 1900 |   2   |   5   |  10
 2000 |   6   |   7   |   2
 2010 |  10   |   1   |  12
 2020 |   0   |   3   |  30
------|-------|-------|-------
Total |  18   |  16   |  54
Order:    #2      #3      #1
I don't think there is a built-in way to do this like there is for sorting the rows (there should be, though, so go vote for a similar idea here), but here's a possible workaround.
I will assume your source table is called Employees and looks something like this:
Department Year Value
A 1900 2
B 1900 5
C 1900 10
A 2000 6
B 2000 7
C 2000 2
A 2010 10
B 2010 1
C 2010 12
A 2020 0
B 2020 3
C 2020 30
First, create a new calculated table like this:
Depts = SUMMARIZE(Employees, Employees[Department], "Total", SUM(Employees[Value]))
This should give you a short table as follows:
Department Total
A 18
B 16
C 54
From this, you can easily rank the totals with a calculated column on this Depts table:
Rank = RANKX('Depts', 'Depts'[Total])
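RANKX sorts in descending order by default, so the department with the largest total gets rank 1. If you prefer to make the direction and tie handling explicit, the optional arguments can be spelled out; this sketch should be equivalent:
Rank = RANKX('Depts', 'Depts'[Total], , DESC, DENSE)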
Make sure your new Depts table is related to the original Employees table on the Department column.
Under the Data tab, use Modeling > Sort by Column to sort Depts[Department] by Depts[Rank].
Finally, replace Employees[Department] with Depts[Department] on your matrix visual, and the matrix columns should appear in the ranked order.

How to do alternate column sorting on two columns?

I know how to sort by two columns. But I want to know how to do an alternating sort across two columns in Neo4j.
Node name = Product
value | version
1 | 2
4 | 1
2 | 1
4 | 1
2 | 1
3 | 2
There are two versions, 1 and 2, and value can be anything. First it should give the highest value for version 1, then the highest value for version 2, then the second-highest value for version 1, then the second-highest value for version 2, and so on.
value | version
4 | 1
3 | 2
4 | 1
1 | 2
2 | 1
2 | 1
I don't know whether this type of sorting can logically be done through a Cypher query. I haven't done this kind of logic in MySQL either. Can anyone give me a clue toward such a Neo4j Cypher query?
Update:
Match (p:Product)
RETURN p.value as value, p.version as version
ORDER BY version ASC, value DESC
This query sorts by version first and then by value. That's not what I want; I want an alternating sort.
UNWIND [1,2,3] AS value
UNWIND [1,2] AS version
RETURN value, version
ORDER BY value DESC, version ASC
value version
3 1
3 2
2 1
2 2
1 1
1 2
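That query just interleaves two independent lists rather than alternating ranked values. One way to express the alternation from the question is to rank values within each version and then sort by that rank first; here is a sketch (it relies on collect() preserving the order established by the preceding ORDER BY):
// Rank values inside each version, then emit rank 0 of every version,
// rank 1 of every version, and so on.
MATCH (p:Product)
WITH p.version AS version, p.value AS value
ORDER BY version ASC, value DESC
WITH version, collect(value) AS values
UNWIND range(0, size(values) - 1) AS rank
RETURN values[rank] AS value, version
ORDER BY rank ASC, version ASC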

Create pivot table using KornShell

I am trying to create a pivot table (csv) using KornShell (ksh).
This is the data (csv):
name     | a1 | a2   | amount
---------|----|------|-------
product1 | 1  | 1000 | 1.5
product1 | 2  | 2000 | 2.6
product1 | 3  | 3000 | 1.2
product1 | 1  | 3000 | 2.1
product1 | 2  | 3000 | 4.1
product1 | 3  | 2000 | 3.1
The result should be:
a1 \ a2 | 1000 | 2000 | 3000
--------|------|------|------
   1    | 1.5  |      | 2.1
   2    |      | 2.6  | 4.1
   3    |      | 3.1  | 1.2
I want to "group" the data by the two attributes and create a table which contains the sums of the amount column for the respective attributes.
EDIT: The attributes a1 and a2 are dynamic. I do not know which of them will exist and which will not, or how many attributes there will be at all.
It sounds like you are not using the database as effectively as you might. Here is a thought to help you make some headway: You can generate SQL using the shell.
You know what the beginning of each query is going to look like:
select a1,
So you would want to start building some report SQL:
Report_SQL="select a1, "
You would then need to generate the column expressions for an arbitrarily sized set of pivot columns (this uses MySQL's concat; other databases would require || concatenation):
select distinct concat('sum(case a2 when ', a2, ' then amount else null end) as "_', a2,'",')
from my_database_table
order by 1
;
Because this is in the shell, it is easy to pull this into a variable as follows:
SQL=" select distinct concat('sum(case a2 when ', a2, ' then amount else null end) as _', a2,',') "
SQL+=" from my_database_table "
SQL+=" order by 1 "
# You would have to define a runsql command for your database platform.
a2_columns=$(runsql "$SQL")
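For example, runsql could be a thin wrapper around your database's command-line client. This hypothetical MySQL version prints bare result rows to stdout (database name is assumed):
# Hypothetical helper: execute a query and print unadorned rows (MySQL client).
runsql() {
    mysql --batch --skip-column-names --execute="$1" my_database
}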
At this point you will have an extra comma at the end of the a2_columns variable. That is easily removed:
a2_columns=${a2_columns%,}
Now we can concatenate these variables to create the report SQL that you really seem to need:
Report_SQL+="${a2_columns}"
Report_SQL+=" from my_database_table "
Report_SQL+=" group by 1"
Report_SQL+=" order by 1"
The resulting report SQL would look something like this:
select a1,
sum(case a2 when 1000 then amount else null end) as _1000,
sum(case a2 when 2000 then amount else null end) as _2000,
sum(case a2 when 3000 then amount else null end) as _3000
from my_database_table
group by 1
order by 1
;
The formatting of the report header is left as an exercise to the reader. :)

Oracle: identify non-unique values in a CLOB column of a table

I want to identify all rows whose content in a clob column is not unique.
The query I use is:
select
id,
clobtext
from
table t
where
(select count(*) from table innerT where dbms_lob.compare(innerT.clobtext, t.clobtext) = 0)>1
However, this query is very slow. Any suggestions to speed it up? I already tried using the dbms_lob.getlength function to eliminate more rows in the subquery, but it didn't really improve the performance (it feels the same).
To make it clearer, an example:
table
ID | clobtext
1 | a
2 | b
3 | c
4 | d
5 | a
6 | d
After running the query I'd like to get (order doesn't matter):
1 | a
4 | d
5 | a
6 | d
In the past I've generated checksums (in my C# code) for each CLOB.
Whilst this incurs a one-off increase in I/O (to generate the checksums), subsequent scans will be quicker, and you can index the checksum value too.
Tom Kyte has a good PL/SQL example here:
Ask Tom
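If you would rather stay inside the database, the same checksum idea can be sketched in plain SQL with DBMS_CRYPTO. This assumes your table is named t, that you have EXECUTE on DBMS_CRYPTO (not granted by default), and the literal 3 stands in for DBMS_CRYPTO.HASH_SH1, since package constants can't be referenced directly in plain SQL. Hash collisions are theoretically possible, so treat matches as candidates to verify:
-- hash each CLOB once, then keep rows whose hash occurs more than once
select id, clobtext
from (
    select id,
           clobtext,
           count(*) over (
               partition by dbms_crypto.hash(clobtext, 3) -- 3 = HASH_SH1
           ) as dup_count
    from t
)
where dup_count > 1;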
