I want to create random numbers between 1 and 99,999,999.
I am using the following code:
SELECT CAST(RAND() * 100000000 AS INT) AS [RandomNumber]
However, my results are always 7 or 8 digits long, which means that I never see a value lower than 1,000,000.
Is there any way to generate random numbers between a defined range?
RAND returns a pseudo-random float value from 0 through 1, exclusive.
So RAND() * 100000000 covers the range you need (strictly, the CAST can also return 0). However, assuming that every number between 1 and 99,999,999 has equal probability, about 99% of the numbers will be 7 or 8 digits long, simply because those numbers are far more common.
+--------+-------------------+----------+------------+
| Length | Range | Count | Percent |
+--------+-------------------+----------+------------+
| 1 | 1-9 | 9 | 0.000009 |
| 2 | 10-99 | 90 | 0.000090 |
| 3 | 100-999 | 900 | 0.000900 |
| 4 | 1000-9999 | 9000 | 0.009000 |
| 5 | 10000-99999 | 90000 | 0.090000 |
| 6 | 100000-999999 | 900000 | 0.900000 |
| 7 | 1000000-9999999 | 9000000 | 9.000000 |
| 8 | 10000000-99999999 | 90000000 | 90.000001 |
+--------+-------------------+----------+------------+
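The table above can be reproduced with a few lines of arithmetic. This is a minimal Python sketch, assuming every integer from 1 to 99,999,999 is equally likely; the function name `digit_length_shares` is made up for illustration.

```python
# Share of each digit length among the integers 1..99,999,999,
# assuming every integer in that range is equally likely.
TOTAL = 99_999_999

def digit_length_shares(max_digits=8):
    """Return (length, count, percent) for each digit length 1..max_digits."""
    rows = []
    for length in range(1, max_digits + 1):
        count = 9 * 10 ** (length - 1)  # e.g. 90 two-digit numbers: 10..99
        rows.append((length, count, 100 * count / TOTAL))
    return rows

shares = digit_length_shares()
for length, count, percent in shares:
    print(f"{length} digits: {count:>10,} numbers, {percent:.6f}%")

# Combined share of the 7- and 8-digit numbers
long_share = sum(percent for length, _, percent in shares if length >= 7)
print(f"7 or 8 digits: {long_share:.2f}%")
```

Seeing almost only 7- and 8-digit values is therefore what a uniform generator should produce, not a sign that it is broken.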
I created a function that might help. You need to pass it the result of RAND() as an argument for it to work.
CREATE FUNCTION [dbo].[RangedRand]
(
@MyRand float
,@Lower bigint = 0
,@Upper bigint = 999
)
RETURNS bigint
AS
BEGIN
DECLARE @Random BIGINT
SELECT @Random = ROUND(((@Upper - @Lower) * @MyRand + @Lower), 0)
RETURN @Random
END
GO
--Here is how it works.
--Create a test table for Random values
CREATE TABLE #MySample
(
RID INT IDENTITY(1,1) Primary Key
,MyValue bigint
)
GO
-- Let's use the function to populate the value column
INSERT INTO #MySample
(MyValue)
SELECT dbo.RangedRand(RAND(), 0, 100)
GO 1000
-- Let's look at what we get.
SELECT RID, MyValue
FROM #MySample
--ORDER BY MyValue -- Use this "Order By" to see the distribution of the random values
-- Let's use the function again to get a random row from the table
DECLARE @MyMAXID int
SELECT @MyMAXID = MAX(RID)
FROM #MySample
SELECT RID, MyValue
FROM #MySample
WHERE RID = dbo.RangedRand(RAND(), 1, @MyMAXID)
DROP TABLE #MySample
--I hope this helps.
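One property of the ROUND-based scaling is worth knowing: the two end values of the range each come up about half as often as the values in between, because only half of a rounding interval falls inside the range at each end. The Python sketch below illustrates this with evenly spaced stand-ins for RAND(). `ranged_rand_round` mirrors the formula in the function above for non-negative bounds; `ranged_rand_floor` is a hypothetical floor-based alternative and is not part of the original answer.

```python
import math
from collections import Counter

def ranged_rand_round(r, lower, upper):
    """Mirror of ROUND((upper - lower) * r + lower, 0) for non-negative results."""
    return math.floor((upper - lower) * r + lower + 0.5)

def ranged_rand_floor(r, lower, upper):
    """Hypothetical alternative: split [0, 1) into equal-width buckets."""
    return lower + math.floor(r * (upper - lower + 1))

# Evenly spaced stand-ins for RAND() output, so the counts are exact.
N = 8000
samples = [(i + 0.5) / N for i in range(N)]

round_counts = Counter(ranged_rand_round(r, 0, 4) for r in samples)
floor_counts = Counter(ranged_rand_floor(r, 0, 4) for r in samples)

print(sorted(round_counts.items()))  # end values 0 and 4 are under-represented
print(sorted(floor_counts.items()))  # every value gets the same share
```

If a uniform spread over the whole range matters, the floor-based form is one way to get it; otherwise the original function is fine as it stands.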
If I had a table with two columns
| Id | json_data   |
|----|-------------|
| 1  | [1, 10, 11] |
| 2  | []          |
I could easily query the results using
SELECT M1.Id
FROM MyTable M1
WHERE NOT EXISTS (
SELECT 1 FROM MyTable M2, (JSON_TABLE(M2.JsonData, '$[*]'
ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
COLUMNS(
Id NVARCHAR2(20) PATH '$'))) JT
WHERE JT.Id = M1.Id)
Now how do I index this column so the query is not doing a full table scan?
MULTIVALUE indexes are used (I believe) only for JSON_EXISTS queries like this one:
SELECT Id
FROM MyTable WHERE NOT JSON_EXISTS(JsonData, '$?(@ == 1)')
but I can't use this function for non-constant values such as M1.Id.
A MULTIVALUE INDEX is available only in 21c. The array should be a field of a record, and the column must be of type JSON; it doesn't work with a CLOB that has an "IS JSON" CHECK constraint.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9545758700346721260
create table t_test_ixjs (
id number(10,0),
json_data JSON
);
insert into t_test_ixjs(id,json_data) values(1,'{ "a" : [1, 10, 11] }') ;
insert into t_test_ixjs(id,json_data) values(2,'{ "a" : [] }') ;
create multivalue index ix_test_json_data on t_test_ixjs t ( t.json_data.a.number() );
with vals(d) as (
select 1 from dual
)
SELECT *
FROM vals v,
t_test_ixjs WHERE JSON_EXISTS(json_data, '$.a?(@ == $d)' PASSING v.d AS "d")
;
Plan hash value: 1205791918
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20513 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_TEST_IXJS | 1 | 20513 | 2 (0)| 00:00:01 |
| 2 | HASH UNIQUE | | 1 | 20513 | | |
|* 3 | INDEX RANGE SCAN (MULTI VALUE) | IX_TEST_JSON_DATA | 1 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
I encountered a very weird result while trying to filter my data using the RAND() function.
Suppose I have a table filled with some data:
CREATE TABLE `status_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rank` int(11) DEFAULT 50,
PRIMARY KEY (`id`)
)
Then I run the following simple select:
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
and have a clear and expected output:
<...skip...>
| 6575476 | 50 | 34.51090244065123 |
| 6575511 | 50 | 67.84258230388404 |
| 6575589 | 50 | 35.68020727083106 |
| 6575644 | 50 | 74.87329251586766 |
| 6575723 | 50 | 67.32584384020961 |
| 6575771 | 50 | 12.009344726809621 |
| 6575863 | 50 | 58.06919518678374 |
+---------+------+-----------------------+
66169 rows in set (2.502 sec).
So I generate a random value from 0 to 100 for each row of the table, around 66000 rows in total.
Then I want only a (random) part of the data to be shown. This has no production purpose, by the way; it's just an artificial test, so let's not discuss that.
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres>rank
order by thres;
After that I get the following:
<...skip...>
| 4396732 | 50 | 99.97966075314177 |
| 4001782 | 50 | 99.98002871869134 |
| 1788580 | 50 | 99.98064143581375 |
| 5300286 | 50 | 99.98275954274717 |
| 146401 | 50 | 99.98552389441573 |
| 4744748 | 50 | 99.98644758014609 |
+---------+------+--------------------+
16449 rows in set (2.188 sec)
With a threshold of 50, the expected number of results should be around 33000 out of the total of 66000. So it seems that the distribution of rand() is biased, correct?
Let's then change > to <:
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres<rank
order by thres;
<...skip...>
| 4653786 | 50 | 49.98035016467827 |
| 6041489 | 50 | 49.980370281245904 |
| 5064204 | 50 | 49.989308742796354 |
| 1699741 | 50 | 49.991373205549436 |
| 3234039 | 50 | 49.99390454030959 |
| 806791 | 50 | 49.99575274996064 |
| 3713581 | 50 | 49.99814410693771 |
+---------+------+----------------------+
16562 rows in set (2.373 sec)
Again about 16000! So not half but a quarter of all rows is shown!
It seems that the output of rand() inside the subquery is somehow influenced by the expression outside it. How is this possible?
I can also union it:
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres<50
UNION ALL
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres>=50;
The expected number of results has to be somewhere around 66000, but it returns only 33000 or so.
I observe this behavior only when rand() is non-deterministic and is generated dynamically each time. If I use ...select id,rank as rank,(rand(id)*100)... (i.e. make the output of rand() depend on id), I start getting the expected number of results (33000-ish). The same happens if I precalculate the values and store them in a temporary column in the table.
I also tried filtering with rank=30, and the results were ~6000 and ~32000 rows for < and > respectively.
Version 10.5.8-MariaDB-3, InnoDB
Using a single query with HAVING instead of a subquery with WHERE in the main query seems to work around it.
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
having thres > rank
order by thres
This appears to be this bug:
RAND() evaluated and filtered twice with subquery
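The numbers in the question fit that description. If the predicate is checked twice and each check sees a fresh random value, a row survives with probability 0.5 * 0.5 = 0.25 for rank = 50, and with probability 0.3 * 0.3 = 0.09 (about 6000 rows) or 0.7 * 0.7 = 0.49 (about 32000 rows) for rank = 30. The Python sketch below simulates this. It is only a model under that assumption, not a reproduction of MariaDB internals, and `count_passing` is a made-up helper name.

```python
import random

def count_passing(n_rows, rank, evaluations, seed=0):
    """Count rows for which rand()*100 > rank holds on every evaluation.

    evaluations=1 models the intended behaviour (one draw per row);
    evaluations=2 models the assumed bug, where the predicate is checked
    twice and each check sees a fresh, independent random value.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_rows):
        if all(rng.random() * 100 > rank for _ in range(evaluations)):
            passed += 1
    return passed

N_ROWS = 66169  # row count reported in the question
once = count_passing(N_ROWS, 50, evaluations=1)
twice = count_passing(N_ROWS, 50, evaluations=2)
print(once, twice)  # roughly half of the rows vs. roughly a quarter
```

With rand(id) or a precalculated column, both checks see the same value, which would explain why those variants return the expected row count.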
I'm wondering if it is possible to create a calculated member that returns the sum of distinct values for a fact. I will try to explain it with the following example:
I have a fact whose primary key is related to two dimensions (one-to-many cardinality). The fact contains a measure whose value is the same for all rows of each distinct combination of FACT_ID and DIM_1_ID. For the total, I don't want to count the same value multiple times. So, with the following values, the total should be 450 and not 850 (the default Mondrian behavior).
| FACT_ID | DIM_1_ID | DIM_2_ID | MEASURE |
|---------|----------|----------|---------|
| 1 | A | D | 100 |
| 1 | A | E | 100 |
| 1 | B | F | 50 |
| 2 | A | D | 300 |
| 2 | A | E | 300 |
|---------|----------|----------|---------|
TOTAL | 450 |
Is it possible? How can it be done with Mondrian?
Thanks in advance
UPDATE - Current status
As described in one of the comments below, based on @whytheq's answer, I managed to calculate the right value for the total using the following MDX formula for the measure:
Sum(
Order(
[dActivity.hActivity].[lActivity].MEMBERS*[dFacility.hFacility].[lFacility].MEMBERS,
[dActivity.hActivity].[lActivity].currentmember.name
) as [m_set] ,
iif(
[m_set].currentordinal = 0
OR
not(
[m_set]
.item([m_set].currentordinal)
.item(0).NAME
=
[m_set]
.item([m_set].currentordinal-1)
.item(0).NAME
) ,
[Measures].[mBudget]
,
0
)
)
However, this expression uses the complete set for every single row, so the result overrides the measure's real value on the individual fact rows:
| FACT_ID | DIM_1_ID | DIM_2_ID | MEASURE |
|---------|----------|----------|---------|
| 1 | A | D | 450 |
| 1 | A | E | 450 |
| 1 | B | F | 450 |
| 2 | A | D | 450 |
| 2 | A | E | 450 |
|---------|----------|----------|---------|
TOTAL | 450 |
Great question - really tricky to do in MDX.
If we do the following then there are 158 rows returned - a handful have duplicate values for [Measures].[Internet Sales Amount]:
SELECT
[Measures].[Internet Sales Amount] ON 0
,NON EMPTY
Order
(
[Product].[Product].[Product]
,[Measures].[Internet Sales Amount]
,bdesc
) ON 1
FROM [Adventure Works];
This only counts them if the member above is different for the respective measure:
WITH
SET [x] AS
Order
(
NonEmpty
(
[Product].[Product].[Product]
,[Measures].[Internet Sales Amount]
)
,[Measures].[Internet Sales Amount]
,bdesc
)
SET [FILTERED] AS
Filter
(
[x]
,
(
[x].Item(
[x].CurrentOrdinal - 1)
,[Measures].[Internet Sales Amount]
)
<>
(
[x].Item(
[x].CurrentOrdinal)
,[Measures].[Internet Sales Amount]
)
)
MEMBER [Measures].[distCount] AS
Count([FILTERED])
SELECT
[Measures].[distCount] ON 0
FROM [Adventure Works];
Maybe try adding the EXISTING keyword to your calculation:
Sum
(
Order
(
EXISTING //<<<
[dActivity.hActivity].[lActivity].MEMBERS
*
[dFacility.hFacility].[lFacility].MEMBERS
,[dActivity.hActivity].[lActivity].CurrentMember.Name
) AS [m_set]
,IIF
(
[m_set].CurrentOrdinal = 0
OR
(NOT
[m_set].Item(
[m_set].CurrentOrdinal).Item(0).Name
=
[m_set].Item(
[m_set].CurrentOrdinal - 1).Item(0).Name)
,[Measures].[mBudget]
,0
)
)
You could try to obtain the average over the set. The code is a bit complex.
WITH SET SomeSet AS
{
Fact.FactID.FactID.MEMBERS
*
Fact.DimID1.DimID1.MEMBERS
*
Fact.DimID2.DimID2.MEMBERS
}
MEMBER Measures.AvgVal AS
AVG
(
{Fact.FactID.CURRENTMEMBER}
*
{Fact.DimID1.CURRENTMEMBER}
*
NonEmpty
(
Fact.DimID2.DimID2.MEMBERS,
{{Fact.FactID.CURRENTMEMBER} *
{Fact.DimID1.CURRENTMEMBER}} *
[Measures].[TheMeasure]
)
,
[Measures].[TheMeasure]
)
SELECT NON EMPTY SomeSet ON 1,
NON EMPTY {
[Measures].[TheMeasure],
Measures.AvgVal
} on 0
from [YourCube]
What I am doing is this: for the current FactID-DimID1 combination on the axis, I get the list of all possible DimID2 members and then, over the internally generated non-empty tuples of FactID-DimID1-DimID2, derive the average value of the measure TheMeasure.
So, for example, the value (100+100)/2 = 100 would be displayed for the combination of FactID = 1 and DimID1 = A.
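The averaging idea can be checked outside MDX against the sample rows from the question. This is a minimal Python sketch of the arithmetic only; it does not model how Mondrian evaluates the set, and the variable names are illustrative.

```python
from collections import defaultdict

# Rows from the question: (FACT_ID, DIM_1_ID, DIM_2_ID, MEASURE)
rows = [
    (1, "A", "D", 100),
    (1, "A", "E", 100),
    (1, "B", "F", 50),
    (2, "A", "D", 300),
    (2, "A", "E", 300),
]

# Collect the measure values of each (FACT_ID, DIM_1_ID) combination.
by_combo = defaultdict(list)
for fact_id, dim1, _dim2, measure in rows:
    by_combo[(fact_id, dim1)].append(measure)

# The value repeats within a combination, so its average equals the value.
avg_per_combo = {combo: sum(v) / len(v) for combo, v in by_combo.items()}

naive_total = sum(r[3] for r in rows)           # counts repeated values again
distinct_total = sum(avg_per_combo.values())    # one value per combination
print(naive_total, distinct_total)
```

Summing the per-combination averages gives the 450 the question asks for, while the plain sum gives 850.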
I'm trying to use the recursive SQL feature but can't get it to do what I want, not even close. I've coded up the logic as an unrolled loop and am asking whether it can be converted into a single recursive SQL query, rather than the table-update style I've used.
http://sqlfiddle.com/#!4/b7217/1
There are six players to be ranked. They have id, group id, score and rank.
Initial state
+----+--------+-------+--------+
| id | grp_id | score | rank |
+----+--------+-------+--------+
| 1 | 1 | 100 | (null) |
| 2 | 1 | 90 | (null) |
| 3 | 1 | 70 | (null) |
| 4 | 2 | 95 | (null) |
| 5 | 2 | 70 | (null) |
| 6 | 2 | 60 | (null) |
+----+--------+-------+--------+
I want to take the person with the highest initial score and give them rank 1. Then I apply 10 bonus points to the score of everyone who has the same group id. Take the next highest, assign rank 2, distribute bonus points and so on until there are no players left.
User id breaks ties.
The bonus points change the ranking. id=4 initially appears to be in second place with 95, behind the leader with 100, but with the 10-point bonus, id=2 moves up and takes the spot.
Final state
+-----+---------+--------+------+
| ID | GRP_ID | SCORE | RANK |
+-----+---------+--------+------+
| 1 | 1 | 100 | 1 |
| 2 | 1 | 100 | 2 |
| 4 | 2 | 95 | 3 |
| 3 | 1 | 90 | 4 |
| 5 | 2 | 80 | 5 |
| 6 | 2 | 80 | 6 |
+-----+---------+--------+------+
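For reference, the procedure described above can be written as a plain loop. The Python sketch below is one reading of the rules, with two assumptions: the lower id wins a tie, and the bonus goes only to players who are not yet ranked (which is what the final state implies, since id=1 stays at 100). The function name `rank_with_bonus` is made up for illustration.

```python
def rank_with_bonus(players, bonus=10):
    """players: list of dicts with id, grp_id, score. Returns ranked copies."""
    unranked = [dict(p) for p in players]  # copy so the input is left untouched
    ranked = []
    while unranked:
        # Highest score first; the lower id wins a tie (assumed direction).
        best = min(unranked, key=lambda p: (-p["score"], p["id"]))
        unranked.remove(best)
        best["rank"] = len(ranked) + 1
        ranked.append(best)
        # The bonus goes to the still-unranked members of the same group.
        for p in unranked:
            if p["grp_id"] == best["grp_id"]:
                p["score"] += bonus
    return ranked

initial = [
    {"id": 1, "grp_id": 1, "score": 100},
    {"id": 2, "grp_id": 1, "score": 90},
    {"id": 3, "grp_id": 1, "score": 70},
    {"id": 4, "grp_id": 2, "score": 95},
    {"id": 5, "grp_id": 2, "score": 70},
    {"id": 6, "grp_id": 2, "score": 60},
]

final = rank_with_bonus(initial)
for p in final:
    print(p["id"], p["grp_id"], p["score"], p["rank"])
```

Each pass depends on the scores left by the previous pass, which is the part that is awkward to express as a single set-based query.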
This is quite a bit late, and I'm not sure this can be done using a recursive CTE. I did, however, come up with a solution using the MODEL clause:
WITH SAMPLE (ID,GRP_ID,SCORE,RANK) AS (
SELECT 1,1,100,NULL FROM DUAL UNION
SELECT 2,1,90,NULL FROM DUAL UNION
SELECT 3,1,70,NULL FROM DUAL UNION
SELECT 4,2,95,NULL FROM DUAL UNION
SELECT 5,2,70,NULL FROM DUAL UNION
SELECT 6,2,60,NULL FROM DUAL)
SELECT ID,GRP_ID,SCORE,RANK FROM SAMPLE
MODEL
DIMENSION BY (ID,GRP_ID)
MEASURES (SCORE,0 RANK,0 LAST_RANKED_GRP,0 ITEM_COUNT,0 HAS_RANK)
RULES
ITERATE (1000) UNTIL (ITERATION_NUMBER = ITEM_COUNT[1,1]) --ITERATE ONCE FOR EACH ITEM TO BE RANKED
(
RANK[ANY,ANY] = CASE WHEN SCORE[CV(),CV()] = MAX(SCORE) OVER (PARTITION BY HAS_RANK) THEN RANK() OVER (ORDER BY SCORE DESC,ID) ELSE RANK[CV(),CV()] END, --IF THE CURRENT ITEM SCORE IS EQUAL TO THE MAX SCORE OF UNRANKED, ASSIGN A RANK
LAST_RANKED_GRP[ANY,ANY] = FIRST_VALUE(GRP_ID) OVER (ORDER BY RANK DESC),
SCORE[ANY,ANY] = CASE WHEN RANK[CV(),CV()] = 0 AND CV(GRP_ID) = LAST_RANKED_GRP[CV(),CV()] THEN SCORE[CV(),CV()]+10 ELSE SCORE[CV(),CV()] END,
ITEM_COUNT[ANY,ANY] = COUNT(*) OVER (),
HAS_RANK[ANY,ANY] = CASE WHEN RANK[CV(),CV()] <> 0 THEN 1 ELSE 0 END --TO SEPARATE RANKED/UNRANKED ITEMS
)
ORDER BY RANK;
It's not very pretty, and I suspect there is a better way to go about this, but it does give the expected output.
Caveats:
You'd have to increase the iteration count (1000 here) if you have more rows than that.
This does a full re-ranking based on the score after each iteration. So if we took your sample data, but changed the initial score of item 2 to 95 rather than 90: after ranking item 1 and giving the 10 point bonus to item 2, it now has a score of 105. So we rank it as 1st and move item 1 down to 2nd. You'd have to make a few modifications if this is not the desired behavior.
This is a bit hard to explain in words ... I'm trying to calculate a sum of grouped distinct values in a matrix. Let's say I have the following data returned by a SQL query:
------------------------------------------------
| Group | ParentID | ChildID | ParentProdCount |
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 1 | 3 | 2 |
| A | 1 | 4 | 2 |
| A | 2 | 5 | 3 |
| A | 2 | 6 | 3 |
| A | 2 | 7 | 3 |
| A | 2 | 8 | 3 |
| B | 3 | 9 | 1 |
| B | 3 | 10 | 1 |
| B | 3 | 11 | 1 |
------------------------------------------------
There's some other data in the query, but it's irrelevant. ParentProdCount is specific to the ParentID.
Now, I have a matrix in the MS Report Designer in which I'm trying to calculate a sum for ParentProdCount (grouped by "Group"). If I just add the expression
=Sum(Fields!ParentProdCount.Value)
I get a result 20 for Group A and 3 for Group B, which is incorrect. The correct values should be 5 for group A and 1 for group B. This wouldn't happen if there wasn't ChildID involved, but I have to use some other child-specific data in the same matrix.
I tried to nest FIRST() and SUM() aggregate functions but apparently it's not possible to have nested aggregation functions, even when they have scopes defined.
I'm pretty sure there is some way to calculate the grouped distinct sum without needing to create another SQL query. Anyone got an idea how to do that?
OK, I got this sorted out by adding a ROW_NUMBER() function to my SQL query:
SELECT [Group], ParentID, ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID ASC) AS Position, ChildID, ParentProdCount FROM [Table]
and then I replaced the SSRS SUM function with
=SUM(IIF(Fields!Position.Value = 1, Fields!ParentProdCount.Value, 0))
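The effect of the ROW_NUMBER() approach can be checked against the sample data from the question. This is a minimal Python sketch of the same logic (number the rows within each ParentID, then add ParentProdCount only where the position is 1); it is an illustration of the arithmetic, not SSRS code.

```python
from collections import defaultdict

# (Group, ParentID, ChildID, ParentProdCount) rows from the question
rows = [
    ("A", 1, 1, 2), ("A", 1, 2, 2), ("A", 1, 3, 2), ("A", 1, 4, 2),
    ("A", 2, 5, 3), ("A", 2, 6, 3), ("A", 2, 7, 3), ("A", 2, 8, 3),
    ("B", 3, 9, 1), ("B", 3, 10, 1), ("B", 3, 11, 1),
]

# Equivalent of ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID)
seen_per_parent = defaultdict(int)
numbered = []
for group, parent_id, child_id, count in sorted(rows, key=lambda r: (r[1], r[2])):
    seen_per_parent[parent_id] += 1
    numbered.append((group, parent_id, child_id, count, seen_per_parent[parent_id]))

# Equivalent of SUM(IIF(Position = 1, ParentProdCount, 0)) per Group
totals = defaultdict(int)
for group, _parent, _child, count, position in numbered:
    totals[group] += count if position == 1 else 0

naive = defaultdict(int)
for group, _parent, _child, count in rows:
    naive[group] += count

print(dict(naive), dict(totals))
```

The plain sum gives 20 and 3, while the position-filtered sum gives 5 for Group A and 1 for Group B, as required.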
Put a grouping on ParentID and sum over that group. For example, if the group on ParentID is named "ParentIDGroup", then the column sum of ParentProdCount is:
=SUM(Fields!ParentProdCount.Value, "ParentIDGroup")