Tagged: relational algebra
Distribution, commutation and association in database queries
Another reminder note.
The RDBMS query engine relies on these laws to build efficient queries:
Commutation:
Numbers may travel (commute)
or in relational algebra
Association:
Numbers may freely associate
or in relational algebra
Distribution:
Operators can be distributed
or, in relational algebra
A more general statement of these laws:
Commutative law:
Associative law:
Distributive law:
Date has an excellent discussion of all this on pp 124 – 127 of SQL and Relational Theory: How to Write Accurate SQL Code
Related articles
- Relational algebra problem (cartesianproduct.wordpress.com)
- Hall algebras are Grothendieck groups (sbseminar.wordpress.com)
Solving a relational query: part 5
Found: the relational algebra to tackle the problem outlined in part 3.
This is based on a solution to a similar problem found in Database Management Systems, Third Edition, though, unfortunately, that book prints a solution that is not relational algebra at all, as it relies on the order in which results are returned (relational algebra is a form of set algebra and sets have no order).
(Rename the attributes in TEMP1 – this is the step that is broken in the book)
Related articles
- Solving a relational query: part 4 (cartesianproduct.wordpress.com)
- Solving a relational query: part 3 (cartesianproduct.wordpress.com)
- Solving a relational query: part 1 (cartesianproduct.wordpress.com)
- Solving a relational query: part 2 (cartesianproduct.wordpress.com)
- Relational algebra problem (cartesianproduct.wordpress.com)
- Types of Database Management Systems (brighthub.com)
Solving a relational query: part 4
As Paul Rubin pointed out the SQL in the last post was broken – here’s some that, I think (please do not let it be wrong a second time!), by relying on SQL’s coercion of a tuple to a scalar, would work (Paul proposed an even more ambitions way of doing this in a single join):
SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND (SELECT COUNT(*) FROM
BID, LOT WHERE
BID.LOT_ID = B.LOT_ID AND BID.BIDDER_ID = U.USER_ID
AND BID.LOT_ID = LOT.LOT_ID) > 1
Still no clue as to how to do this with relational algebra though.
Related articles
- Solving a relational query: part 2 (cartesianproduct.wordpress.com)
- Solving a relational query: part 1 (cartesianproduct.wordpress.com)
- Relational division again (cartesianproduct.wordpress.com)
Solving a relational query: part 3
This is starting to depress me, now, because I thought I was full on top of all this material until I tried this question.
So, going back to the database specified in part 1 – I now need to find the relational algebra (not the SQL, I have done that) for: “find the name and email of any user who has made multiple bids for a lot including a bid exceeding £100″.
Various ideas have crossed my mind and some bits are easy eg gets a relation from which we coul;d extract users who bid more than £100 for anything, but how can I get it all to string together.
I think the SQL that works is:
SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND L.LOT_ID IN (
SELECT COUNT(*) > 1 FROM
LOT, BID WHERE
LOT.LOT_ID = L.LOT_ID AND BID.LOT_ID = LOT.LOT_ID
AND BID.BIDDER_ID = U.USER_ID)
Update: See part 4 (next post)
Related articles
- Solving a relational query: part 1 (cartesianproduct.wordpress.com)
- Solving a relational query: part 2 (cartesianproduct.wordpress.com)
- Relational algebra problem (cartesianproduct.wordpress.com)
- A co-Relational Model of Data for Large Shared Data Banks – ACM Queue (queue.acm.org)
Solving a relational query: part 2
With thanks to Professor Paul A. Rubin, I think this may be the way to solve this:

This joins LOT to a relation with one attribute (LOT_ID) made up from tuples where LOT_ID should only be returned where no bids have been made that are equal or greater than the reserve price.
Very difficult part question for an exam, though
Related articles
- Solving a relational query: part 1 (cartesianproduct.wordpress.com)
Solving a relational query: part 1
I have not found the answer to this, so thought I’d go through the steps.
Assume we have a relational database with these tables:
USER (USER_ID, NAME, EMAIL)
LOT (LOT_ID, SLLER_ID, TYPE, DESCRIPTION, RESERVE_PRICE, START_DATE, END_DATE)
BID (BID_ID, LOT_ID, BIDDER_ID, BID_DATE, AMOUNT)
FEE (LOT_ID, AMOUNT)
What is the relational algebra that finds “the description and reserve price of any lot where all bids have been less than the reserve price”.
I have been struggling with this and came up with this:
But that will not work as will return any lot where a bid has been made for less than the reserve price.
I am sure this is a question of relational division but have not yet been able to get my mind round the right way to formulate it.
Related articles
- Relational algebra problem (cartesianproduct.wordpress.com)
- Relational division again (cartesianproduct.wordpress.com)
Relational division again
Take the same relational database tables as before:
RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)
Find the names of the reviewers who reviewed all the restaurants in Earlsfield.
But what about SQL? As I wrote of relational division when I started the blog:
one has to implement a “double not exists” query
In this case that means “find the name of any reviewer for whom there is no Earlsfield restaurant not reviewed by that reviewer”:
SELECT NAME
FROM USER
WHERE NOT EXISTS (
SELECT *
FROM RESTAURANT
WHERE LOCATION=’Earlsfield’ AND
NOT EXISTS (
SELECT *
FROM REVIEW
WHERE REVIEW.U_ID = USER.U_ID AND
REVIEW.RT_ID = RESTAURANT.RT_ID))
At least, I hope so! I am finding this all quite heavy going – and have ordered a book – SQL and Relational Theory: How to Write Accurate SQL Code – in the hope that it will help makes things clearer. Any other recommendations gratefully received.
Update: It does work, amazingly enough.
Related Articles
- Relational algebra problem (cartesianproduct.wordpress.com)
- Microsoft Researchers: NoSQL Needs Standardization (pcworld.com)
- NoSQL, NewSQL highly scalable databases (nextbigfuture.com)
Relational algebra problem
Have a relational database with the following tables
RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)
What’s the relational algebra for “name of any user who has never given Chez Brian a rating lower than 5″?
Is this right?
Related Articles
Basic symbols of relational algebra
Apologies in advance but this blog might start to look like online revision notes (and a LaTeX scratchpad) for the next six weeks.
Projection
Selection
Rename
Conditional Join
Natural Join
(Page 100 onwards of Database Management Systems, Third Edition)
Relational division in SQL
In 1970 EF Codd proposed the relational model for databases. Although other and older models exist and are in use (primarily in the legacy world), relational databases are the dominant form.
Well, you know all that already – probably.
You probably also know that most database management systems make the relational model available to end users and programmers through SQL – though SQL is not a pure implementation of the relational model.
One of the missing chunks is the idea of relational division.
This is supposed to work like this:
(Apologies for the crudeness of the drawing: My xfig skills are not all they could be).
Taking the relations (tables) to be A, B and C as we move across and then down the graphic: A/B = C.
In other words A/B will return in C all the tuples (rows) in relation A (table A) where the divisor attribute values are present in A.
The parallel with simple arithmetic is this: B x C = A, hence A/B = C.
In relational algebra B x C would be the cartesian product of B and C, which in the simple example opposite would be A.
So the above example could be restated “Find all the rows in A where the sign column equals QQ (leaving out the sign column)”.
That’s a very useful function to have, but unfortunately it is not present in SQL. So do this one has to implement a “double not exists” query.
This particular construct caused me some difficulties but now I have got my head around it I thought I’d try and explain it – I hope – a bit more clearly than in one or two places.
Thinking of the query above again this can be restated as “Find me all the rows where it is not the case that the sign column is not QQ”
or in SQL:
SELECT Letter, Number, Code
FROM BIGTABLE AS A
WHERE NOT EXISTS (
SELECT *
FROM BIGTABLE AS BT
WHERE NOT EXISTS (
SELECT *
FROM BIGTABLE
WHERE A. SIGN = 'QQ')
Or in the real world (table slightly expanded):
mysql> SELECT * FROM BIGTABLE;
+--------+--------+------+------+
| LETTER | NUMBER | CODE | SIGN |
+--------+--------+------+------+
| A | 10 | ZX | QQ |
| B | 12 | TR | QQ |
| A | 15 | CB | NN |
+--------+--------+------+------+
3 rows in set (0.00 sec)
mysql> SELECT Letter, Number, Code FROM BIGTABLE AS A WHERE NOT EXISTS ( SELECT * FROM BIGTABLE AS BT WHERE NOT EXISTS ( SELECT * FROM BIGTABLE WHERE A.SIGN = 'QQ'));
+--------+--------+------+
| Letter | Number | Code |
+--------+--------+------+
| A | 10 | ZX |
| B | 12 | TR |
+--------+--------+------+
2 rows in set (0.02 sec)
So how does this work?
As the query scans through BIGTABLE AS A then the innermost query returns a row where A.SIGN = QQ. But the ‘NOT EXISTS’ means that is converted to ‘FALSE’ when such a row is return and ‘TRUE’ when no row is returned. The query above that then converts that to ensure only those rows where Sign = ‘QQ’ is returned.
OK, by now some of you may have realised that a much simpler query will also work in this case:
mysql> SELECT Letter, Number, Code FROM BIGTABLE AS A WHERE EXISTS ( SELECT * FROM BIGTABLE WHERE A.SIGN = 'QQ');
+--------+--------+------+
| Letter | Number | Code |
+--------+--------+------+
| A | 10 | ZX |
| B | 12 | TR |
+--------+--------+------+
2 rows in set (0.00 sec)
So here’s a more complex example that does require the two loops:
These are the relations
PART
SQL> select * from part;
P# PNAME COLOUR WEIGHT CITY
------ -------------------- ------ ---------- ---------------
P1 NUT RED 12 LONDON
P2 BOLT GREEN 17 PARIS
P3 SCREW BLUE 17 ROME
P4 SCREW RED 14 LONDON
P5 CAM BLUE 12 PARIS
P6 COG RED 19 LONDON
PROJECT
SQL> select * from project;
J# JNAME CITY
---- ---------- ---------------
J1 SORTER PARIS
J2 DISPLAY ROME
J3 OCR ATHENS
J4 CONSOLE ATHENS
J5 RAID LONDON
J6 EDS OSLO
J7 TAPE LONDON
7 rows selected.
SUPPLIER
SQL> select * from supplier;
S# SNAME STATUS CITY
----- -------------------- ---------- ---------------
S1 SMITH 20 LONDON
S2 JONES 10 PARIS
S3 BLAKE 30 PARIS
S4 CLARK 20 LONDON
S5 ADAMS 30 ATHENS
SUPPLY
SQL> select * from supply;
S# P# J# QUANTITY
----- ------ ---- ----------
S1 P1 J1 200
S1 P1 J4 700
S2 P3 J1 400
S2 P3 J2 200
S2 P3 J3 200
S2 P3 J4 500
S2 P3 J5 600
S2 P3 J6 400
S2 P3 J7 800
S2 P5 J2 100
S3 P3 J1 200
S3 P4 J2 500
S4 P6 J3 300
S4 P6 J7 300
S5 P2 J2 200
S5 P2 J4 100
S5 P5 J5 500
S5 P5 J7 100
S5 P6 J2 200
S5 P1 J4 100
S5 P3 J4 200
S5 P4 J4 800
S5 P5 J4 400
S5 P6 J4 500
24 rows selected.
Query is “Find the name of each part with weight greater than 15 that is supplied to all projects”. Restated in “double not exists” form this becomes “Find the name of each part with weight greater than 15 that no project is not supplied with”.
SELECT P.PNAME FROM PART P WHERE P.WEIGHT > 15 AND NOT EXISTS (SELECT * FROM PROJECT J WHERE NOT EXISTS (SELECT * FROM PROJECT, SUPPLY, PART WHERE PROJECT.J# = J.J# AND SUPPLY.J# = J.J# AND SUPPLY.P# = P.P#));
PNAME
--------------------
SCREW
If we just used the inner loop here then the query simply fails as it is not properly formed.
