Tag Archives: Relational database

Distribution, commutation and association in database queries


An image showing the commutativity of addition
Image via Wikipedia

Another reminder note.

The RDBMS query engine relies on these laws to build efficient queries:

Commutation:

Numbers may travel (commute)

a + b = b + a or in relational algebra A \Join B = B \Join A

Association:

Numbers may freely associate

(a + b) + c = a + (b + c) or in relational algebra (A \Join B) \Join C = A \Join (B \Join C)

Distribution:

Operators can be distributed

3 \times (4 + 5) = 3 \times 4 + 3 \times 5 or, in relational algebra \sigma(A \Join B) = \sigma(A) \Join \sigma(B)

A more general statement of these laws:

Commutative law: f(a, b) = f(b, a)
Associative law: f(a, f(b, c)) = f(f(a, b), c)
Distributive law: f(g(a,b)) = g(f(a,b))

Date has an excellent discussion of all this on pp 124 – 127 of SQL and Relational Theory: How to Write Accurate SQL Code

Solving a relational query: part 5


Found: the relational algebra to tackle the problem outlined in part 3.

This is based on a solution to a similar problem found in Database Management Systems, Third Edition, though, unfortunately, that book prints a solution that is not relational algebra at all, as it relies on the order in which results are returned (relational algebra is a form of set algebra and sets have no order).

\rho (TEMP1, \rho (BIDDER\_ID \rightarrow USER\_ID, \sigma _{amount > 100} (BID)) \Join USER \Join LOT)

(Rename the attributes in TEMP1 – this is the step that is broken in the book)

\rho (name \rightarrow TEMP1\_name, TEMP1)

\rho(TEMP2, TEMP1 \times ( \rho(BIDDER\_ID \rightarrow USER\_ID, BID) \Join USER \Join LOT ))

\Pi _{NAME, EMAIL} ( \sigma _{TEMP1\_LOT\_ID=LOT\_ID \wedge TEMP1\_USER\_ID=USER\_ID \wedge TEMP1\_BID\_ID \neq BID\_ID} (TEMP2))

Solving a relational query: part 4


As Paul Rubin pointed out the SQL in the last post was broken – here’s some that, I think (please do not let it be wrong a second time!), by relying on SQL’s coercion of a tuple to a scalar, would work (Paul proposed an even more ambitions way of doing this in a single join):

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND (SELECT COUNT(*) FROM
BID, LOT WHERE
BID.LOT_ID = B.LOT_ID AND BID.BIDDER_ID = U.USER_ID
AND BID.LOT_ID = LOT.LOT_ID) > 1

Still no clue as to how to do this with relational algebra though.

Solving a relational query: part 3


This chart represents several constituent comp...
Image via Wikipedia

This is starting to depress me, now, because I thought I was full on top of all this material until I tried this question.

So, going back to the database specified in part 1 – I now need to find the relational algebra (not the SQL, I have done that) for: “find the name and email of any user who has made multiple bids for a lot including a bid exceeding £100″.

Various ideas have crossed my mind and some bits are easy eg USER \Join \sigma _{AMOUNT > 100} \rho (BIDDER\_ID -> USER\_ID,\ BID) gets a relation from which we coul;d extract users who bid more than £100 for anything, but how can I get it all to string together.

I think the SQL that works is:

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND L.LOT_ID IN (
SELECT COUNT(*) > 1 FROM
LOT, BID WHERE
LOT.LOT_ID = L.LOT_ID AND BID.LOT_ID = LOT.LOT_ID
AND BID.BIDDER_ID = U.USER_ID)

Solving a relational query: part 2


Tuple Image from [www.youvan.com], Example 7 i...
Image via Wikipedia

With thanks to Professor Paul A. Rubin, I think this may be the way to solve this:

Relational algebra formula
This joins LOT to a relation with one attribute (LOT_ID) made up from tuples where LOT_ID should only be returned where no bids have been made that are equal or greater than the reserve price.

Very difficult part question for an exam, though :(

Solving a relational query: part 1


I have not found the answer to this, so thought I’d go through the steps.

Assume we have a relational database with these tables:

USER (USER_ID, NAME, EMAIL)
LOT (LOT_ID, SLLER_ID, TYPE, DESCRIPTION, RESERVE_PRICE, START_DATE, END_DATE)
BID (BID_ID, LOT_ID, BIDDER_ID, BID_DATE, AMOUNT)
FEE (LOT_ID, AMOUNT)

What is the relational algebra that finds “the description and reserve price of any lot where all bids have been less than the reserve price”.

I have been struggling with this and came up with this:

\Pi_{(DESCRIPTION,RESERVE\_PRICE)}\ \sigma_{(\Pi_{(LOT\_ID=(\Pi_{LOT\_ID}(LOT\Join BID)\ -\ \Pi_{LOT\_ID}(LOT\Join BID-\sigma_{(AMOUNT<RESERVE\_PRICE)}(LOT\Join BID))})(LOT)}

But that will not work as will return any lot where a bid has been made for less than the reserve price.

I am sure this is a question of relational division but have not yet been able to get my mind round the right way to formulate it.

Looks like a great book


Relational model Data items organized as a set...
Image via Wikipedia

My copy of C. J. Date‘s SQL and Relational Theory: How to Write Accurate SQL Code turned up today – I have only read one chapter so far but already I have a feeling that it is going to have been worth every penny.

Date is concerned not to write another SQL primer but to give those with some SQL experience a good grounding in the relational model and the set theory on which it is based. It just feels more like a proper scientific/mathematical text book and it also seems to be well written.

Very pleased with the purchase and will keep you all posted on how it goes: at the moment I am just wishing I had bought it months ago.

Relational division again


Relational divisionTake the same relational database tables as before:

RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)

Find the names of the reviewers who reviewed all the restaurants in Earlsfield.

In relational algebra:

\Pi _{NAME} ( USER \Join REVIEW ) \div \Pi _{RT\_ID} ( \sigma _{LOCATION = 'Earlsfield'} (RESTAURANT))

But what about SQL? As I wrote of relational division when I started the blog:

one has to implement a “double not exists” query

In this case that means “find the name of any reviewer for whom there is no Earlsfield restaurant not reviewed by that reviewer”:

SELECT NAME
FROM USER
WHERE NOT EXISTS (
SELECT *
FROM RESTAURANT
WHERE LOCATION=’Earlsfield’ AND
NOT EXISTS (
SELECT *
FROM REVIEW
WHERE REVIEW.U_ID = USER.U_ID AND
REVIEW.RT_ID = RESTAURANT.RT_ID))

At least, I hope so! I am finding this all quite heavy going – and have ordered a book – SQL and Relational Theory: How to Write Accurate SQL Code – in the hope that it will help makes things clearer. Any other recommendations gratefully received.

Update: It does work, amazingly enough.

Relational algebra problem


An illustration of Relational model concepts, ...
Image via Wikipedia

Have a relational database with the following tables

RESTAURANT(RT_ID, NAME, TYPE, LOCATION)
PROXIMITY(RT1_ID, RT2_ID, DISTANCE)
USER(U_ID, NAME, EMAIL)
REVIEW(U_ID, RT_ID, RT_DATE, RATING, COMMENT)

What’s the relational algebra for “name of any user who has never given Chez Brian a rating lower than 5″?

\Pi _{NAME} (USER) - \Pi _{NAME} ( \sigma _{RESTNAME='Chez\ Brian' \wedge RATING < 5}( \rho (RESTAURANT(NAME \rightarrow RESTNAME), RESTAURANT) \Join REVIEW) \Join USER)

Is this right?