Count instances on a table - relational-algebra

If I have a table with duplicate instances, how can I count them if I don't have a count function?
All I have is select, project, union, difference, product, intersect, and njoin. I am using WinRDBI.
The tables look like this:
Children
ID | NAME
---+---------
A  | 'alice'
A  | 'jon'
A  | 'alex'
B  | 'joe'
B  | 'mary'
C  | 'amy'
Parent
ID | NAME
---+-----------
A  | 'Smith'
B  | 'Johnson'
C  | 'Meyer'
I want to know which parent has two children.

Use the difference operator and the fact that (n*n)-n = n is true only for n = 2 when n > 0.
For each parent, create the cross product of their Children tuples [call this "C"] with itself, renaming the second copy of Children [call this "C1"]; let's call the product "CxC1".
If (((select the attributes of C from CxC1) - C) = C), then the parent has exactly 2 children. [1]
[1] Assuming referential integrity such that a parent cannot have zero children.
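For comparison, the same "exactly two" requirement can also be expressed in SQL set operations without COUNT, using the classic "at least two minus at least three" self-join rather than the (n*n)-n trick above. This is only a sketch, assuming the Parent and Children tables shown in the question and that child names are unique per parent:
select p.ID, p.NAME
from Parent p
join Children c1 on c1.ID = p.ID
join Children c2 on c2.ID = p.ID and c2.NAME <> c1.NAME            -- parents with at least two children
except
select p.ID, p.NAME
from Parent p
join Children c1 on c1.ID = p.ID
join Children c2 on c2.ID = p.ID and c2.NAME <> c1.NAME
join Children c3 on c3.ID = p.ID and c3.NAME <> c1.NAME and c3.NAME <> c2.NAME;  -- minus parents with at least three
On the data above this returns only B | 'Johnson'.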

Related

How do I strip rows inside a column-table, based on the "outer" table's value, in Power Query?

Being pretty new to Power Query, I find myself faced with this problem I wish to solve.
I have a table, TableA, with these columns. Example:
Key | Sprint | Index
-------------------------
A | PI1-I1 | 1
A | PI1-I2 | 2
B | PI1-I3 | 1
C | PI1-I1 | 1
I want to end up with a set looking like this:
Key | Sprint | Index | HasSpillOver
-------------------------
A | PI1-I1 | 1 | Yes
A | PI2-I2 | 2 | No
B | PI1-I3 | 1 | No
C | PI1-I1 | 1 | No
I thought I could maybe NestedJoin TableA on itself, then compare indices and strip rows away, and then count the rows in the nested table, as outlined below.
TableA=Key, Sprint, Index
// TableA Nested joined on itself (Key, Sprint, Index, Nested)
TableB=NestedJoin(#"TableA", "Key", #"TableA", "Key", "Nested", JoinKind.Inner)
TableC= Table.TransformColumns(#"TableB", {"Nested", (x)=>Table.SelectRows(x, each [Index] <x[Index])} )
... and then do the count; however, this throws an error:
Can not apply operator < on types List and Number.
Any suggestions on how to approach this problem? Possibly (probably) in a different way.
You did not define very well what "spillover" means, but this should get you most of the way.
(The error in your attempt comes from x[Index]: inside Table.TransformColumns, x is the whole nested table, so x[Index] is the entire Index column, i.e. a list, which cannot be compared to a single number with <.)
Mine assumes adding another index; you could use the one you already have if it is relevant.
The code then counts the number of rows where the (second) index is higher and the [Key] field matches. You could add a condition so that the Sprint field matches as well, if relevant.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index.1", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Count", (i) => Table.RowCount(Table.SelectRows(#"Added Index", each [Key] = i[Key] and [Index.1] > i[Index.1])))
in
    #"Added Custom"

Scanning a gorm query result into a struct

I'm trying to scan the result of a query into a res structure.
The code builds and the query runs, but the result slice consists only of zero values, like this:
[{0 0 0} {0 0 0} {0 0 0} {0 0 0} {0 0 0} {0 0 0}]
The result slice does, however, have exactly the length the query result should have.
When I try the generated query in the postgres shell, it returns the result correctly.
Code:
type res struct {
    id      int
    number  int
    user_id int
}

func getDataJoin() {
    new := []res{}
    db.Db.Table("users").Select("users.id as id, credit_cards.number as number, credit_cards.user_id as user_id").Joins("left join credit_cards on credit_cards.user_id = users.id").Scan(&new)
    fmt.Println("user\n", new)
}
Generated Query:
SELECT users.id as id, credit_cards.number as number, credit_cards.user_id as user_id FROM "users" left join credit_cards on credit_cards.user_id = users.id
Database result
id | number | user_id
----+--------+---------
1 | 1 | 1
1 | 2 | 1
2 | 1 | 2
2 | 2 | 2
3 | 1 | 3
3 | 2 | 3
(6 rows)
Since go-gorm has certain naming conventions, you might want to try two things.
Make your res struct exported, with exported fields:
type Res struct {
    ID     int
    Number int
    UserID int
}
Or, specify explicit mappings between columns and fields (the fields still have to be exported so that gorm can set them):
type res struct {
    ID     int `gorm:"column:id"`
    Number int `gorm:"column:number"`
    UserID int `gorm:"column:user_id"`
}
gorm can only read and write exported fields, much like the Marshal/Unmarshal functions of the json package. If the first letter of a field name is capitalized, the field is exported and gorm will use it. By default, gorm maps struct fields to snake_cased column names; you can also define your own column names with tags.
Since both ID and Id map to the column id, it should work as long as the first letter of the field is capitalized. On a different note, it's good Go practice to write ID, i.e., with both letters capitalized.

ActiveRecord - How are Associations stored/generated? Need to import XML to Postgres database in Sinatra App

I have an XML dataset I am trying to import into a Postgresql database to be used by a Sinatra app.
The data consists, essentially, of:
ArchiveObjects
Tags
I would like to define these relationships in my DB Migration:
ArchiveObject - has_many Tags
ArchiveObject - has_and_belongs_to_many ArchiveObjects
Tag - belongs_to ArchiveObject
We can manipulate the XML and were thinking it is best to treat this relationship as an array of IDs, for example:
ArchiveObject.tags = [Array, of, Tags]
However, I'm unsure how we should format the XML. Am I correct in assuming that the migration behind has_many :Tags creates a column to hold an array? Or does ActiveRecord do something completely different (e.g. create a junction table)?
has_many :tags implies that the database schema will look as follows:
create table archive_object(
id int not null
);
create table tag(
id int not null,
archive_object_id int not null
);
i.e. each tag record will contain the id of the archive object that it belongs to.
So there is a column involved, but it doesn't contain an array; it contains a reference to another table, and it is the owned object (the tag) that holds the reference to its parent.
Edit: An attempt to further explain this.
Given the structures:
A1 : ArchiveObject, A1.tags => [T1, T2, T3]
A2 : ArchiveObject, A2.tags => [T4, T5]
You would have the SQL tables:
archive_objects
id | name
-------------
1 | A1
2 | A2
tags
id | archive_object_id | name
-----------------------------
1 | 1 | T1
2 | 1 | T2
3 | 1 | T3
4 | 2 | T4
5 | 2 | T5
This is the whole point behind a relational database model - the logical list of [T1,T2,T3] belonging to A1 is mapped as a table with 3 rows, linked in a SQL join:
A1.tags # would run the query SELECT * FROM tags WHERE archive_object_id = 1;
A2.tags # would run the query SELECT * FROM tags WHERE archive_object_id = 2;
and is part of the heavy lifting that ActiveRecord endeavours to do for you.
Edit 2: demonstration of how to create and save an ArchiveObject
class ArchiveObject < ActiveRecord::Base
  has_many :tags
end

class Tag < ActiveRecord::Base
  belongs_to :archive_object
end

# create a structure
ao = ArchiveObject.new({
  :name => 'A1',
  :p1   => 'some param',
  :p2   => 'some other param'})
t1 = Tag.new({
  :name           => 'T1',
  :somevalue      => 'S1',
  :someothervalue => 'S2'})
t2 = Tag.new({
  :name           => 'T2',
  :somevalue      => 'S3',
  :someothervalue => 'S4'})
ao.tags = [t1, t2]
ao.save
# at this point you have a record in archive_objects and two records in tags
Any fields you want present in either ArchiveObject or Tags you would generate through a migration.

Relational algebra - List the names of employees who do not work on project 1

I have the following tables:
Employee
|name|employee_cod|
Project
|name|project_cod|
Work
|employee_cod|project_cod|
So, how can I list the names of the employees who do not work, for example, on the project with project_cod = 1, using relational algebra?
The following does not work:
π employee.name (σ work.project_cod != 1 (Employee ∞ Work ∞ Project))
Because if I have the following data in the Work table:
 employee_cod | project_cod
--------------+-------------
            1 |           1
            1 |           2
            1 |           3
            2 |           2
this σ work.project_cod != 1 will result in:
 employee_cod | project_cod
--------------+-------------
            1 |           2
            1 |           3
            2 |           2
But the employee with code = 1 should not be returned, because that employee participates in project 1.
You first find all the employees who do work on the project. Those who don't are produced by the relational difference (minus) operator.
Thank you for the tip Tegiri.
The solution is:
π name (Employee) - ( π name (σ project_cod = 1 (Employee ∞ Work)))
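For reference, the same difference pattern reads naturally in SQL with EXCEPT. This is only a sketch, assuming the Employee and Work tables above and that employee names are unique:
-- all employee names, minus the names of employees who work on project 1
select name from Employee
except
select e.name
from Employee e
join Work w on w.employee_cod = e.employee_cod
where w.project_cod = 1;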

Bitwise operations in Postgres

I have the following tables:
types
 id | name
----+------
  1 | A
  2 | B
  4 | C
  8 | D
 16 | E
 32 | F
and
vendors
 id | name   | type
----+--------+------
  1 | Alex   |    2   // type B only
  2 | Bob    |    5   // A, C
  3 | Cheryl |   32   // F
  4 | David  |   43   // F, D, A, B
  5 | Ed     |   15   // A, B, C, D
  6 | Felix  |    8   // D
  7 | Gopal  |    4   // C
  8 | Herry  |    9   // A, D
  9 | Iris   |    7   // A, B, C
 10 | Jack   |   23   // A, B, C, E
I would like to query now:
select id, name from vendors where type & 16 > 0  -- should return Jack, as he is type E
select id, name from vendors where type & 7 = 7   -- should return Ed, Iris, Jack (vendors that have all of A, B and C)
select id, name from vendors where type & 8 > 0   -- should return David, Ed, Felix, Herry
What is the best possible index for the types and vendors tables in Postgres? I may have millions of rows in vendors. Moreover, what are the tradeoffs of this bitwise method compared with a many-to-many relation using a third table? Which is better?
You can use partial indices to work around the fact that "&" isn't an indexable operator (afaik):
CREATE INDEX vendors_typeA ON vendors(id) WHERE (type & 1) > 0;
CREATE INDEX vendors_typeB ON vendors(id) WHERE (type & 2) > 0;
Of course, you'll need to add a new index every time you add a new type, which is one of the reasons for expanding the data into an association table that can then be indexed properly. You can always write triggers to maintain the bitmask in addition, but use the many-to-many table to actually maintain the data, as it will be much clearer.
If your entire evaluation of scaling and performance is to say "I may have millions of rows", you haven't done enough to start going for this sort of optimisation. Create a properly-structured clear model first, optimise it later on the basis of real statistics about how it performs.
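For illustration, here is a minimal sketch of the normalized many-to-many alternative mentioned above; the vendor_types junction table and its column names are illustrative, not something from the question:
create table vendor_types (
    vendor_id int not null references vendors(id),
    type_id   int not null references types(id),
    primary key (vendor_id, type_id)
);
create index on vendor_types (type_id);

-- "has type E" (the equivalent of type & 16 > 0) becomes an ordinary, indexable join:
select v.id, v.name
from vendors v
join vendor_types vt on vt.vendor_id = v.id
join types t on t.id = vt.type_id
where t.name = 'E';
Each type membership is then a single indexed row, so adding a new type needs no new index and the planner can use ordinary join strategies even with millions of vendors.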
