## Distribution, commutation and association in database queries

Another reminder note.

The RDBMS query engine relies on these laws to build efficient queries:

Commutation:

Numbers may travel (commute) $a + b = b + a$ or in relational algebra $A \Join B = B \Join A$

Association:

Numbers may freely associate $(a + b) + c = a + (b + c)$ or in relational algebra $(A \Join B) \Join C = A \Join (B \Join C)$

Distribution:

Operators can be distributed $3 \times (4 + 5) = 3 \times 4 + 3 \times 5$ or, in relational algebra $\sigma(A \Join B) = \sigma(A) \Join \sigma(B)$

A more general statement of these laws:

Commutative law: $f(a, b) = f(b, a)$
Associative law: $f(a, f(b, c)) = f(f(a, b), c)$
Distributive law: $f(g(a,b)) = g(f(a,b))$

Date has an excellent discussion of all this on pp 124 – 127 of SQL and Relational Theory: How to Write Accurate SQL Code Posted on

## Solving a relational query: part 5

Found: the relational algebra to tackle the problem outlined in part 3.

This is based on a solution to a similar problem found in Database Management Systems, Third Edition , though, unfortunately, that book prints a solution that is not relational algebra at all, as it relies on the order in which results are returned (relational algebra is a form of set algebra and sets have no order). $\rho (TEMP1, \rho (BIDDER\_ID \rightarrow USER\_ID, \sigma _{amount > 100} (BID)) \Join USER \Join LOT)$

(Rename the attributes in TEMP1 – this is the step that is broken in the book) $\rho (name \rightarrow TEMP1\_name, TEMP1)$ $\rho(TEMP2, TEMP1 \times ( \rho(BIDDER\_ID \rightarrow USER\_ID, BID) \Join USER \Join LOT ))$ $\Pi _{NAME, EMAIL} ( \sigma _{TEMP1\_LOT\_ID=LOT\_ID \wedge TEMP1\_USER\_ID=USER\_ID \wedge TEMP1\_BID\_ID \neq BID\_ID} (TEMP2))$

## Solving a relational query: part 4

As Paul Rubin pointed out the SQL in the last post was broken – here’s some that, I think (please do not let it be wrong a second time!), by relying on SQL’s coercion of a tuple to a scalar, would work (Paul proposed an even more ambitions way of doing this in a single join):

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND (SELECT COUNT(*) FROM
BID, LOT WHERE
BID.LOT_ID = B.LOT_ID AND BID.BIDDER_ID = U.USER_ID
AND BID.LOT_ID = LOT.LOT_ID) > 1

Still no clue as to how to do this with relational algebra though.

## Solving a relational query: part 3

This is starting to depress me, now, because I thought I was full on top of all this material until I tried this question.

So, going back to the database specified in part 1 – I now need to find the relational algebra (not the SQL, I have done that) for: “find the name and email of any user who has made multiple bids for a lot including a bid exceeding £100”.

Various ideas have crossed my mind and some bits are easy eg $USER \Join \sigma _{AMOUNT > 100} \rho (BIDDER\_ID -> USER\_ID,\ BID)$ gets a relation from which we coul;d extract users who bid more than £100 for anything, but how can I get it all to string together.

I think the SQL that works is:

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND L.LOT_ID IN (
SELECT COUNT(*) > 1 FROM
LOT, BID WHERE
LOT.LOT_ID = L.LOT_ID AND BID.LOT_ID = LOT.LOT_ID
AND BID.BIDDER_ID = U.USER_ID)

## Solving a relational query: part 2

With thanks to Professor Paul A. Rubin, I think this may be the way to solve this: This joins LOT to a relation with one attribute (LOT_ID) made up from tuples where LOT_ID should only be returned where no bids have been made that are equal or greater than the reserve price.

Very difficult part question for an exam, though 😦

## Solving a relational query: part 1

Assume we have a relational database with these tables:

USER (USER_ID, NAME, EMAIL)
LOT (LOT_ID, SLLER_ID, TYPE, DESCRIPTION, RESERVE_PRICE, START_DATE, END_DATE)
BID (BID_ID, LOT_ID, BIDDER_ID, BID_DATE, AMOUNT)
FEE (LOT_ID, AMOUNT)

What is the relational algebra that finds “the description and reserve price of any lot where all bids have been less than the reserve price”.

I have been struggling with this and came up with this: $\Pi_{(DESCRIPTION,RESERVE\_PRICE)}\ \sigma_{(\Pi_{(LOT\_ID=(\Pi_{LOT\_ID}(LOT\Join BID)\ -\ \Pi_{LOT\_ID}(LOT\Join BID-\sigma_{(AMOUNT

But that will not work as will return any lot where a bid has been made for less than the reserve price.

I am sure this is a question of relational division but have not yet been able to get my mind round the right way to formulate it.

## Looks like a great book

My copy of C. J. Date‘s SQL and Relational Theory: How to Write Accurate SQL Code turned up today – I have only read one chapter so far but already I have a feeling that it is going to have been worth every penny.

Date is concerned not to write another SQL primer but to give those with some SQL experience a good grounding in the relational model and the set theory on which it is based. It just feels more like a proper scientific/mathematical text book and it also seems to be well written.

Very pleased with the purchase and will keep you all posted on how it goes: at the moment I am just wishing I had bought it months ago.

## Relational division again Take the same relational database tables as before:

RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)

Find the names of the reviewers who reviewed all the restaurants in Earlsfield. $\Pi _{NAME} ( USER \Join REVIEW ) \div \Pi _{RT\_ID} ( \sigma _{LOCATION = 'Earlsfield'} (RESTAURANT))$

But what about SQL? As I wrote of relational division when I started the blog:

one has to implement a “double not exists” query

In this case that means “find the name of any reviewer for whom there is no Earlsfield restaurant not reviewed by that reviewer”:

SELECT NAME
FROM USER
WHERE NOT EXISTS (
SELECT *
FROM RESTAURANT
WHERE LOCATION=’Earlsfield’ AND
NOT EXISTS (
SELECT *
FROM REVIEW
WHERE REVIEW.U_ID = USER.U_ID AND
REVIEW.RT_ID = RESTAURANT.RT_ID))

At least, I hope so! I am finding this all quite heavy going – and have ordered a book – SQL and Relational Theory: How to Write Accurate SQL Code – in the hope that it will help makes things clearer. Any other recommendations gratefully received.

Update: It does work, amazingly enough.

## Relational algebra problem $\Pi _{NAME} (USER) - \Pi _{NAME} ( \sigma _{RESTNAME='Chez\ Brian' \wedge RATING < 5}( \rho (RESTAURANT(NAME \rightarrow RESTNAME), RESTAURANT) \Join REVIEW) \Join USER)$