## Solving a relational query: part 1

Assume we have a relational database with these tables:

USER (USER_ID, NAME, EMAIL)
LOT (LOT_ID, SLLER_ID, TYPE, DESCRIPTION, RESERVE_PRICE, START_DATE, END_DATE)
BID (BID_ID, LOT_ID, BIDDER_ID, BID_DATE, AMOUNT)
FEE (LOT_ID, AMOUNT)

What is the relational algebra that finds “the description and reserve price of any lot where all bids have been less than the reserve price”.

I have been struggling with this and came up with this:

$\Pi_{(DESCRIPTION,RESERVE\_PRICE)}\ \sigma_{(\Pi_{(LOT\_ID=(\Pi_{LOT\_ID}(LOT\Join BID)\ -\ \Pi_{LOT\_ID}(LOT\Join BID-\sigma_{(AMOUNT

But that will not work as will return any lot where a bid has been made for less than the reserve price.

I am sure this is a question of relational division but have not yet been able to get my mind round the right way to formulate it.

## Relational division again

Take the same relational database tables as before:

RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)

Find the names of the reviewers who reviewed all the restaurants in Earlsfield.

$\Pi _{NAME} ( USER \Join REVIEW ) \div \Pi _{RT\_ID} ( \sigma _{LOCATION = 'Earlsfield'} (RESTAURANT))$

But what about SQL? As I wrote of relational division when I started the blog:

one has to implement a “double not exists” query

In this case that means “find the name of any reviewer for whom there is no Earlsfield restaurant not reviewed by that reviewer”:

SELECT NAME
FROM USER
WHERE NOT EXISTS (
SELECT *
FROM RESTAURANT
WHERE LOCATION=’Earlsfield’ AND
NOT EXISTS (
SELECT *
FROM REVIEW
WHERE REVIEW.U_ID = USER.U_ID AND
REVIEW.RT_ID = RESTAURANT.RT_ID))

At least, I hope so! I am finding this all quite heavy going – and have ordered a book – SQL and Relational Theory: How to Write Accurate SQL Code – in the hope that it will help makes things clearer. Any other recommendations gratefully received.

Update: It does work, amazingly enough.

## Relational division in SQL

In 1970 EF Codd proposed the relational model for databases. Although other and older models exist and are in use (primarily in the legacy world), relational databases are the dominant form.

Well, you know all that already – probably.

You probably also know that most database management systems make the relational model available to end users and programmers through SQL – though SQL is not a pure implementation of the relational model.

One of the missing chunks is the idea of relational division.

This is supposed to work like this:

(Apologies for the crudeness of the drawing: My xfig skills are not all they could be).

Taking the relations (tables) to be A, B and C as we move across and then down the graphic: A/B = C.

In other words A/B will return in C all the tuples (rows) in relation A (table A) where the divisor attribute values are present in A.

The parallel with simple arithmetic is this: B x C = A, hence A/B = C.

In relational algebra B x C would be the cartesian product of B and C, which in the simple example opposite would be A.

So the above example could be restated “Find all the rows in A where the sign column equals QQ (leaving out the sign column)”.

That’s a very useful function to have, but unfortunately it is not present in SQL. So do this one has to implement a “double not exists” query.

This particular construct caused me some difficulties but now I have got my head around it I thought I’d try and explain it – I hope – a bit more clearly than in one or two places.

Thinking of the query above again this can be restated as “Find me all the rows where it is not the case that the sign column is not QQ”

or in SQL:
 SELECT Letter, Number, Code FROM BIGTABLE AS A WHERE NOT EXISTS ( SELECT * FROM BIGTABLE AS BT WHERE NOT EXISTS ( SELECT * FROM BIGTABLE WHERE A. SIGN = 'QQ') 
Or in the real world (table slightly expanded):

 mysql> SELECT * FROM BIGTABLE; +--------+--------+------+------+ | LETTER | NUMBER | CODE | SIGN | +--------+--------+------+------+ | A      |     10 | ZX   | QQ   | | B      |     12 | TR   | QQ   | | A      |     15 | CB   | NN   | +--------+--------+------+------+ 3 rows in set (0.00 sec)
 mysql> SELECT Letter, Number, Code FROM BIGTABLE AS A WHERE NOT EXISTS ( SELECT * FROM BIGTABLE AS BT WHERE NOT EXISTS ( SELECT * FROM BIGTABLE WHERE A.SIGN = 'QQ')); +--------+--------+------+ | Letter | Number | Code | +--------+--------+------+ | A      |     10 | ZX   | | B      |     12 | TR   | +--------+--------+------+ 2 rows in set (0.02 sec) 
So how does this work?

As the query scans through BIGTABLE AS A then the innermost query returns a row where A.SIGN = QQ. But the ‘NOT EXISTS’ means that is converted to ‘FALSE’ when such a row is return and ‘TRUE’ when no row is returned. The query above that then converts that to ensure only those rows where Sign = ‘QQ’ is returned.

OK, by now some of you may have realised that a much simpler query will also work in this case:
 mysql> SELECT Letter, Number, Code FROM BIGTABLE AS A WHERE EXISTS ( SELECT * FROM BIGTABLE WHERE A.SIGN = 'QQ'); +--------+--------+------+ | Letter | Number | Code | +--------+--------+------+ | A      |     10 | ZX   | | B      |     12 | TR   | +--------+--------+------+ 2 rows in set (0.00 sec) 
So here’s a more complex example that does require the two loops:

These are the relations

PART
 SQL> select * from part;
 P#     PNAME                COLOUR     WEIGHT CITY ------ -------------------- ------ ---------- --------------- P1     NUT                  RED            12 LONDON P2     BOLT                 GREEN          17 PARIS P3     SCREW                BLUE           17 ROME P4     SCREW                RED            14 LONDON P5     CAM                  BLUE           12 PARIS P6     COG                  RED            19 LONDON 
PROJECT
 SQL> select * from project;  J#   JNAME      CITY ---- ---------- --------------- J1   SORTER     PARIS J2   DISPLAY    ROME J3   OCR        ATHENS J4   CONSOLE    ATHENS J5   RAID       LONDON J6   EDS        OSLO J7   TAPE       LONDON

7 rows selected.

SUPPLIER
 SQL> select * from supplier;  S#    SNAME                    STATUS CITY ----- -------------------- ---------- --------------- S1    SMITH                        20 LONDON S2    JONES                        10 PARIS S3    BLAKE                        30 PARIS S4    CLARK                        20 LONDON S5    ADAMS                        30 ATHENS 
SUPPLY
 SQL> select * from supply;  S#    P#     J#     QUANTITY ----- ------ ---- ---------- S1    P1     J1          200 S1    P1     J4          700 S2    P3     J1          400 S2    P3     J2          200 S2    P3     J3          200 S2    P3     J4          500 S2    P3     J5          600 S2    P3     J6          400 S2    P3     J7          800 S2    P5     J2          100 S3    P3     J1          200 S3    P4     J2          500 S4    P6     J3          300 S4    P6     J7          300 S5    P2     J2          200 S5    P2     J4          100 S5    P5     J5          500 S5    P5     J7          100 S5    P6     J2          200 S5    P1     J4          100 S5    P3     J4          200 S5    P4     J4          800 S5    P5     J4          400 S5    P6     J4          500

24 rows selected.

Query is “Find the name of each part with weight greater than 15 that is supplied to all projects”. Restated in “double not exists” form this becomes “Find the name of each part with weight greater than 15 that no project is not supplied with”.
 SELECT P.PNAME FROM PART P WHERE P.WEIGHT > 15 AND NOT EXISTS (SELECT * FROM PROJECT J WHERE NOT EXISTS (SELECT * FROM PROJECT, SUPPLY, PART WHERE PROJECT.J# = J.J# AND SUPPLY.J# = J.J# AND SUPPLY.P# = P.P#));  PNAME -------------------- SCREW 
If we just used the inner loop here then the query simply fails as it is not properly formed.