MonetDB table statistics - monetdb

Below is a portion of the statistics for one of my tables. I'm not sure how to interpret the width column. Are those values in bytes? If so, they don't match my data: I know fname and lname contain values longer than 5 and 6 ASCII characters, respectively, and mname contains some 1-character values.
Update 1.
Below is the output of select * from statistics. I'm only showing the first 5 columns of the output.
+--------+---------+------------------------+---------+-------+
| schema | table   | column                 | type    | width |
+========+=========+========================+=========+=======+
| abc    | targets | fname                  | varchar |     5 |
| abc    | targets | mname                  | varchar |     0 |
| abc    | targets | lname                  | varchar |     6 |
+--------+---------+------------------------+---------+-------+

The width column shows the "byte-width of the atom array" (as defined in gdk.h). However, that is not the whole story for string columns, because for them the atom array only stores offsets into a separate string heap.
MonetDB uses variable-width offsets, because when there are few distinct string values, 64-bit offsets would waste memory. So in your case, the fname column needs 5-byte (40-bit) string offsets and lname needs 6-byte (48-bit) offsets. These widths can change if new values are inserted.
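To make the offset idea concrete, here is a toy Python model. It is not MonetDB's code, and the function names are hypothetical: distinct strings are stored once in a heap, the column itself holds one offset per row, and the offset width is taken to be the smallest whole number of bytes that can address the heap. The model ignores any restriction the real engine may place on which widths it actually uses.

```python
def build_string_column(values):
    """Toy string column: distinct strings stored once in a heap, plus one offset per row."""
    heap = bytearray()
    offset_of = {}  # string value -> its offset in the heap
    offsets = []
    for value in values:
        if value not in offset_of:
            offset_of[value] = len(heap)
            heap += value.encode("utf-8") + b"\x00"  # NUL-terminated heap entry
        offsets.append(offset_of[value])
    return bytes(heap), offsets


def offset_width(heap_size):
    """Smallest whole number of bytes able to hold any offset into a heap of heap_size bytes."""
    largest_offset = max(heap_size - 1, 0)
    return max(1, (largest_offset.bit_length() + 7) // 8)


heap, offsets = build_string_column(["Ann", "Bob", "Ann", "Cy"])
print(len(heap), offsets, offset_width(len(heap)))  # 11 [0, 4, 0, 8] 1
```

Repeated values ("Ann") share one heap entry, so a column with few distinct values keeps a small heap and can use narrow offsets; in this model the width only grows once the heap outgrows what the current number of bytes can address.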
The zero value for mname is interesting, because the width is initialised to 1 for new columns. Which version are you using?

Related

Powerquery - appending the same table to itself using differing columns

I have a list of properties, each with its last servicing date and the next four servicing dates, e.g.:
Property | Last    | Next1   | Next2   | Next3   | Next4
123 Road | 01-2019 | 03-2019 | 05-2019 | 07-2019 | 09-2019
444 Str  | 01-2019 | 07-2019 | 01-2020 | 07-2020 | 01-2021
etc.
I want to see:
Property | Date
123 Road | 01-2019
444 Str  | 01-2019
123 Road | 03-2019
123 Road | 05-2019
123 Road | 07-2019
444 Str  | 07-2019
etc.
In SQL this would be a UNION; in Power Query I think it's an Append, but I'm not sure how to go about it, i.e. how to select some columns from a table and then append the same table with a different selection of columns. I can append the full table easily, but not specific columns.
Select the date columns and do Transform > Unpivot Columns.
Then you can rename the Value column to Date, remove the Attribute column if you want, and sort as desired.
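To picture what Unpivot Columns produces, here is a small Python illustration (not Power Query M code): each date column of each row becomes its own row, after which the Value column plays the role of Date. The MM-YYYY parsing used for sorting is an assumption based on the date format shown in the question.

```python
def unpivot(rows, id_column, value_columns):
    """One output row per (id, source column) pair, like Transform > Unpivot Columns."""
    return [
        {id_column: row[id_column], "Attribute": column, "Value": row[column]}
        for row in rows
        for column in value_columns
    ]


def month_key(mm_yyyy):
    """Sort key for an 'MM-YYYY' string: (year, month)."""
    month, year = mm_yyyy.split("-")
    return int(year), int(month)


rows = [
    {"Property": "123 Road", "Last": "01-2019", "Next1": "03-2019",
     "Next2": "05-2019", "Next3": "07-2019", "Next4": "09-2019"},
    {"Property": "444 Str", "Last": "01-2019", "Next1": "07-2019",
     "Next2": "01-2020", "Next3": "07-2020", "Next4": "01-2021"},
]
long_rows = unpivot(rows, "Property", ["Last", "Next1", "Next2", "Next3", "Next4"])

# Rename Value to Date, drop Attribute, then sort by date and then by property.
result = sorted(
    ((r["Property"], r["Value"]) for r in long_rows),
    key=lambda pair: (month_key(pair[1]), pair[0]),
)
for prop, date in result[:6]:
    print(prop, "|", date)
```

Sorting on the raw "MM-YYYY" text would put 01-2020 before 03-2019, which is why the sketch sorts on (year, month); in Power Query the equivalent is to change the Date column to a date type before sorting.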

How to count the number of words in each column delimited by a "|" separator using Hive?

The input data is:
+----------------------+--------------------------------+
| movie_name | Genres |
+----------------------+--------------------------------+
| digimon | Adventure|Animation|Children's |
| Slumber_Party_Massac | Horror |
+----------------------+--------------------------------+
I need output like this:
+----------------------+--------------------------------+-----------------+
| movie_name | Genres | count_of_genres |
+----------------------+--------------------------------+-----------------+
| digimon | Adventure|Animation|Children's | 3 |
| Slumber_Party_Massac | Horror | 1 |
+----------------------+--------------------------------+-----------------+
select *
,size(split(coalesce(Genres,''),'[^|\\s]+'))-1 as count_of_genres
from mytable
This solution covers several edge cases, including:
NULL values
Empty strings
Empty tokens (e.g. Adventure||Animation or Adventure| |Animation)
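The trick is that splitting on the tokens themselves (runs of characters that are neither "|" nor whitespace) leaves one more piece than there are tokens. The Python sketch below mirrors the Hive expression to show this on the edge cases listed above; it assumes Python's re.split, which keeps leading and trailing empty pieces, behaves like Hive's split() for these inputs. The function name count_genres is hypothetical.

```python
import re

# Runs of characters that are neither '|' nor whitespace, i.e. the genre tokens.
TOKEN = re.compile(r"[^|\s]+")


def count_genres(genres):
    """Mirror of size(split(coalesce(Genres, ''), '[^|\\s]+')) - 1 from the Hive answer."""
    text = genres if genres is not None else ""  # coalesce(Genres, '')
    # n token matches split the text into n + 1 pieces, so subtract one.
    return len(TOKEN.split(text)) - 1


for value in ["Adventure|Animation|Children's", "Horror", None, "", "Adventure| |Animation"]:
    print(repr(value), count_genres(value))
```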
This is a really, really bad way to store data. You should have a separate MovieGenres table with one row per movie and genre.
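A minimal sketch of that normalized design, run with Python's built-in sqlite3 module rather than Hive. The table and column names (movies, movie_genres, movie_id) are hypothetical, as is the third movie with no genres. With one row per movie/genre pair, the count becomes an ordinary GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (
        movie_id   INTEGER PRIMARY KEY,
        movie_name TEXT NOT NULL
    );
    -- One row per movie/genre pair instead of a '|'-delimited string.
    CREATE TABLE movie_genres (
        movie_id INTEGER NOT NULL REFERENCES movies (movie_id),
        genre    TEXT    NOT NULL,
        PRIMARY KEY (movie_id, genre)
    );
""")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [
    (1, "digimon"),
    (2, "Slumber_Party_Massac"),
    (3, "Untitled"),  # hypothetical movie with no genres yet
])
conn.executemany("INSERT INTO movie_genres VALUES (?, ?)", [
    (1, "Adventure"), (1, "Animation"), (1, "Children's"),
    (2, "Horror"),
])

# LEFT JOIN keeps movies that have no genre rows; COUNT(g.genre) ignores the NULLs.
rows = conn.execute("""
    SELECT m.movie_name, COUNT(g.genre) AS count_of_genres
    FROM movies AS m
    LEFT JOIN movie_genres AS g ON g.movie_id = m.movie_id
    GROUP BY m.movie_id, m.movie_name
    ORDER BY m.movie_id
""").fetchall()
print(rows)
```

The primary key on (movie_id, genre) also prevents the same genre from being recorded twice for one movie, which the delimited string cannot enforce.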
One method is to use length() and replace():
select t.*,
(1 + length(genres) - length(replace(genres, '|', ''))) as num_genres
from t;
This assumes that each movie has at least one genre. If not, you need to test for that as well.
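The formula counts the "|" characters and adds one. The sketch below runs it with Python's built-in sqlite3 module (SQLite, not Hive; its length() and replace() behave the same way for these ASCII strings). The last two rows are hypothetical and show the caveat: an empty string counts as 1 genre and NULL yields NULL. The CASE guard is one possible way to test for that; it is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (movie_name TEXT, genres TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("digimon", "Adventure|Animation|Children's"),
    ("Slumber_Party_Massac", "Horror"),
    ("empty_example", ""),   # hypothetical row: empty string
    ("null_example", None),  # hypothetical row: NULL
])

rows = conn.execute("""
    SELECT movie_name,
           -- the answer's formula: one more than the number of '|' characters
           1 + length(genres) - length(replace(genres, '|', '')) AS num_genres,
           -- guarded variant: treat NULL and '' as zero genres
           CASE WHEN genres IS NULL OR genres = '' THEN 0
                ELSE 1 + length(genres) - length(replace(genres, '|', ''))
           END AS num_genres_guarded
    FROM t
    ORDER BY rowid
""").fetchall()
for row in rows:
    print(row)
```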

Using a non-literal value in Apache Derby's OFFSET clause

Using Derby, is it possible to offset by a value from the query rather than an integer literal?
When I run this query, it complains about the value I've given to the offset clause.
select
    PRIZE."NAME" as "Prize Name",
    PRIZE."POSITION" as "Position",
    (select
         PARTICIPANT."NAME"
     from PARTICIPANT
     order by POINTS desc
     -- notice I'm trying to pass in a value to offset by
     -- (POSITION is 1-based, so skip POSITION - 1 rows)
     offset PRIZE."POSITION" - 1 rows fetch next 1 row only
    ) as "Participant"
from PRIZE
I expect the results to look like this:
| Prize Name   | Position | Participant   |
|--------------|----------|---------------|
| Gold medal   | 1        | Mari Loudi    |
| Silver medal | 2        | Keesha Vacc   |
| Bronze medal | 3        | Melba Hammit  |
| Hundredth    | 100      | James Thornby |
The documentation suggests that it's possible to pass in a value from Java code, but I want to use a value from the query itself.
By the way, this is just an example schema to illustrate the point.
I know there are other ways to achieve the ranking, but I'm specifically interested if there's a way to pass values to the offset clause.

Creating SQL statements to return information from a table

I am writing SQL queries to return information from a table, but I am having trouble with one in particular: I want to return all of the urban areas that are in the state of Colorado.
The exact task is:
Return the names (name10) of all urban areas (in alphabetical order) that are entirely contained within Colorado. Return the results in alphabetical order. (64 records)
The first table I am using is tl_2010_us_state10, which stores information about the states. I think I should use its name10 column, because it holds the state names.
Table "public.tl_2010_us_state10"
Column | Type | Modifiers
------------+-----------------------------+-------------------------------------
gid | integer | not null default
region10 | character varying(2) |
division10 | character varying(2) |
statefp10 | character varying(2) |
statens10 | character varying(8) |
geoid10 | character varying(2) |
stusps10 | character varying(2) |
name10 | character varying(100) |
The second table, tl_2010_us_uac10, holds the urban-area information. Again, I think I should use its name10 column, because it stores the names of all the urban areas.
Table "public.tl_2010_us_uac10"
Column | Type | Modifiers
------------+-----------------------------+-------------------------------------
gid | integer | not null default
uace10 | character varying(5) |
geoid10 | character varying(5) |
name10 | character varying(100) |
The SQL I wrote is:
select a.name10 from tl_2010_us_uac10 as a join tl_2010_us_state10 as b where (b.name10 = 'colorado');
but I get this error:
LINE 1: ...l_2010_us_uac10 as a join tl_2010_us_state10 as b where (b.n...
gid is a primary key
An inner join must have a join condition (an ON clause). Then you need an ORDER BY to meet the sorting requirement.
select a.name10 as urban_area
from tl_2010_us_uac10 as a
join tl_2010_us_state10 as b
on b.gid = a.gid
where b.name10 = 'colorado'
order by a.name10;

SQL Server - RAND() - range

I want to create random numbers between 1 and 99,999,999.
I am using the following code:
SELECT CAST(RAND() * 100000000 AS INT) AS [RandomNumber]
However, my results always have 7 or 8 digits; I have never seen a value lower than 1,000,000.
Is there any way to generate random numbers within a defined range?
RAND returns a pseudo-random float value from 0 through 1, exclusive.
So CAST(RAND() * 100000000 AS INT) already does what you need (strictly, it covers 0 through 99,999,999, because CAST truncates and very small RAND() values give 0). However, assuming every number between 1 and 99,999,999 is equally likely, about 99% of the results will have 7 or 8 digits, simply because most numbers in that range have 7 or 8 digits:
+--------+-------------------+----------+------------+
| Length | Range | Count | Percent |
+--------+-------------------+----------+------------+
| 1 | 1-9 | 9 | 0.000009 |
| 2 | 10-99 | 90 | 0.000090 |
| 3 | 100-999 | 900 | 0.000900 |
| 4 | 1000-9999 | 9000 | 0.009000 |
| 5 | 10000-99999 | 90000 | 0.090000 |
| 6 | 100000-999999 | 900000 | 0.900000 |
| 7 | 1000000-9999999 | 9000000 | 9.000000 |
| 8 | 10000000-99999999 | 90000000 | 90.000001 |
+--------+-------------------+----------+------------+
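The table is plain counting: there are 9 * 10**(n - 1) integers with n digits, out of 99,999,999 in total. This short Python check reproduces the Count and Percent columns; it is an illustration, not part of the original answer.

```python
def digit_length_distribution(upper=99_999_999):
    """Count how many integers in 1..upper have each digit length, with their share in percent."""
    rows = []
    for length in range(1, len(str(upper)) + 1):
        low = 10 ** (length - 1)
        high = min(10 ** length - 1, upper)
        count = high - low + 1
        rows.append((length, low, high, count, 100 * count / upper))
    return rows


for length, low, high, count, percent in digit_length_distribution():
    print(f"{length} | {low}-{high} | {count} | {percent:.6f}")
```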
I created a function that might help. You need to pass RAND() to it as an argument, because SQL Server doesn't allow RAND() to be called inside a user-defined function.
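CREATE FUNCTION [dbo].[RangedRand]
(
    @MyRand float            -- pass RAND() here
   ,@Lower  bigint = 0
   ,@Upper  bigint = 999
)
RETURNS bigint
AS
BEGIN
    DECLARE @Random bigint
    -- Scale [0, 1) to [0, @Upper - @Lower + 1) and truncate, so every integer
    -- from @Lower to @Upper (inclusive) is equally likely
    SELECT @Random = FLOOR((@Upper - @Lower + 1) * @MyRand) + @Lower
    RETURN @Random
END
GO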
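Rounding versus truncating matters here. A mapping of the form ROUND((@Upper - @Lower) * @MyRand + @Lower, 0) gives @Lower and @Upper only half the probability of the values between them, because each endpoint collects only half a rounding interval, while FLOOR((@Upper - @Lower + 1) * @MyRand) + @Lower gives every integer in the range the same share. The Python sketch below checks this by feeding an evenly spaced grid of exact fractions in [0, 1), standing in for RAND(), through both formulas. It assumes T-SQL's ROUND rounds halves away from zero, which the helper mimics for these non-negative inputs; the helper names are hypothetical.

```python
import math
from fractions import Fraction


def sql_round_mapping(r, lower, upper):
    """ROUND((upper - lower) * r + lower, 0), rounding halves away from zero (inputs here are >= 0)."""
    return math.floor((upper - lower) * r + lower + Fraction(1, 2))


def floor_mapping(r, lower, upper):
    """FLOOR((upper - lower + 1) * r) + lower."""
    return math.floor((upper - lower + 1) * r) + lower


def histogram(mapping, lower, upper, steps=300):
    """Feed evenly spaced stand-ins for RAND() in [0, 1) through a mapping and count each result."""
    counts = {}
    for i in range(steps):
        value = mapping(Fraction(i, steps), lower, upper)
        counts[value] = counts.get(value, 0) + 1
    return dict(sorted(counts.items()))


print("ROUND:", histogram(sql_round_mapping, 0, 2))  # ROUND: {0: 75, 1: 150, 2: 75}
print("FLOOR:", histogram(floor_mapping, 0, 2))      # FLOOR: {0: 100, 1: 100, 2: 100}
```

Exact fractions are used instead of floats so that the rounding boundaries (such as exactly 0.5) are hit precisely and the counts are deterministic.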
-- Here is how it works.
-- Create a test table for random values
CREATE TABLE #MySample
(
    RID     int IDENTITY(1,1) PRIMARY KEY
   ,MyValue bigint
)
GO
-- Let's use the function to populate the value column
-- (GO 1000 makes SSMS/sqlcmd run this batch 1000 times)
INSERT INTO #MySample (MyValue)
SELECT dbo.RangedRand(RAND(), 0, 100)
GO 1000
-- Let's look at what we get.
SELECT RID, MyValue
FROM #MySample
--ORDER BY MyValue -- Use this ORDER BY to see the distribution of the random values
-- Let's use the function again to get a random row from the table
DECLARE @MyMAXID int
SELECT @MyMAXID = MAX(RID)
FROM #MySample
SELECT RID, MyValue
FROM #MySample
WHERE RID = dbo.RangedRand(RAND(), 1, @MyMAXID)
DROP TABLE #MySample
-- I hope this helps.
