# Distribution, commutation and association in database queries

Image via Wikipedia

Another reminder note.

The RDBMS query engine relies on these laws to build efficient queries:

Commutation:

Numbers may travel (commute)

$a + b = b + a$ or in relational algebra $A \Join B = B \Join A$

Association:

Numbers may freely associate

$(a + b) + c = a + (b + c)$ or in relational algebra $(A \Join B) \Join C = A \Join (B \Join C)$

Distribution:

Operators can be distributed

$3 \times (4 + 5) = 3 \times 4 + 3 \times 5$ or, in relational algebra $\sigma(A \Join B) = \sigma(A) \Join \sigma(B)$

A more general statement of these laws:

Commutative law: $f(a, b) = f(b, a)$
Associative law: $f(a, f(b, c)) = f(f(a, b), c)$
Distributive law: $f(g(a,b)) = g(f(a,b))$

Date has an excellent discussion of all this on pp 124 – 127 of SQL and Relational Theory: How to Write Accurate SQL Code

I did warn that this blog, might be used to remind myself of things: and this is one.

Inconsistent analysis is a problem with databases when an aggregate function is used on data that is being updated by another transaction eg

T1                                            T2

$\alpha$ =SUM(accounts(4 .. 7))

account(1) = $\alpha$          account(4) = account(4) + 10

account(2) = $\alpha + 5$

By the final line $\alpha$ is out of date (as account 4 has been updated but the SUM total has not).

# The long way round?

Image via Wikipedia

This SQL works for the problem posed below:

SELECT X.name, X.journeys, X.cname
FROM
(SELECT P.name, COUNT(T.trip_no) AS journeys, C.name AS cname
FROM
Passenger P, Trip T, Company C, Pass_in_trip PIT
WHERE
P.ID_psg = PIT.ID_psg AND PIT.trip_no = T.trip_no
AND T.ID_comp = C.ID_comp
AND
NOT EXISTS (SELECT * FROM
Trip, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C.ID_comp)
GROUP BY P.name, C.name
) X WHERE X.journeys =
(SELECT MAX(Y.travels) FROM
(SELECT COUNT(T2.trip_no) AS travels
FROM
Passenger P2, Trip T2, Company C2, Pass_in_trip PIT2
WHERE
P2.ID_psg = PIT2.ID_psg AND PIT2.trip_no = T2.trip_no
AND T2.ID_comp = C2.ID_comp
AND
NOT EXISTS (SELECT * FROM
Trip, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P2.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C2.ID_comp)
GROUP BY P2.name, C2.name) Y)

Though it seems very long winded and I have seen other ways, involving OUTER JOIN suggested on the internet.

# Another SQL puzzle

Image by mobology via Flickr

Once more from http://www.sql-ex.ru which seems to have a phase-of-the-moon related availability.

Database is defined as:

Database schema consists of 4 tables:
Company(ID_comp, name)
Trip(trip_no, ID_comp, plane, town_from, town_to, time_out, time_in)
Passenger(ID_psg, name)
Pass_in_trip(trip_no, date, ID_psg, place)
Company table has ID and name of the company, which transports passengers. Trip table has information about trips: trip number, company ID, plane type, departure city, arrival city, departure time, and arrival time. The passenger table has passenger’s ID and passenger’s name. Pass_in_trip table has information about the flights: trip number, departure date (day), passenger’s ID and his place during the flight. We should note that,
– Any trip is being accomplished every day; duration of a flight is less than a calendar-day (24 hours);
– Time and date are considered comparatively one time zone;
– The departure time and the arrival time are given to within a minute;
– There can be the passengers bearing the same names (for example, Joe Bloggs);
– The place during the flight is a number followed by a letter; the number defines the row number, the letter (a – d) – the place in the row (from the left to the right) in the alphabetical order;
– Relationships and restrictions are shown in the data schema.

And the question is:

Among the clients which only use a single company, find the different passengers who have flown more often than others. Result set: passenger name, number of trips, and company name.

So, I have this SQL:

SELECT P.name, COUNT(T.trip_no) AS journeys, C.name
FROM
Passenger P, Trip T, Company C, Pass_in_trip PIT
WHERE
P.ID_psg = PIT.ID_psg AND PIT.trip_no = T.trip_no
AND T.ID_comp = C.ID_comp
AND
NOT EXISTS (SELECT * FROM
Passenger, Trip, Company, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C.ID_comp)
GROUP BY P.name, C.name

Which correctly, as far as I can see, outputs the names of clients who use only one company, the number of trips they have made and the company name.

But how do I limit that to the MAX?

I have tried adding “HAVING MAX(COUNT(T.trip))” but the RDMS objects – telling me I am trying to evaluate something that does not give a Boolean answer as a Boolean.

# Let’s try that again, shall we?

This query works for the previously mentioned SQL problem:

SELECT K.BATTLE
FROM
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) K CROSS JOIN
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) J
WHERE K.COUNTRY = J.COUNTRY AND K.BATTLE = J.BATTLE
GROUP BY K.BATTLE
HAVING COUNT(K.BATTLE) >=9

But it’s still not perfect, because it would record a positive answer if a single ship from 9 countries took part in a battle or two ships from three countries (something that certainly took place in the Pacific).

So I cheated and looked for clues – and this works properly:

SELECT K.BATTLE
FROM
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) K
GROUP BY K.BATTLE
HAVING COUNT(K.COUNTRY) >=3

# Solving an old SQL puzzle

Image by stewartmorris via Flickr

Ages ago I was puzzled by an online SQL test (sadly no longer online, but here’s the text):

DESCRIPTION

The database of naval ships that took part in World War II is under consideration. The database has the following relations:
Classes(class, type, country, numGuns, bore, displacement)
Ships(name, class, launched)
Battles(name, date)
Outcomes(ship, battle, result)
Ships in classes are arranged to a single project. A class is normally assigned the name of the first ship in the class under consideration (head ship); otherwise, the class name does not coincide with any ship name in the database.
The Classes relation includes the class name, type (bb for a battle ship, or bc for a battle cruiser), country where the ship was built, number of main guns, gun caliber (diameter of the gun barrel, in inches), and displacement (weight in tons). The Ships relation includes the ship name, its class name, and launch year. The Battles relation covers the name and date of a battle the ships participated; while the result of their participation in the battle (sunk, damaged, or unharmed – OK) is in the Outcomes relation. Note: the Outcomes relation may include the ships not included in the Ships relation.

PROBLEM

Point out the battles in which at least three ships from the same country took part.

Well, I now think I can solve it:

SELECT BETA.name
FROM (SELECT S.name, C.country
FROM Ships S, Classes C WHERE S.class = C.class) ALPHA,
(SELECT B.name, O.ship
FROM Battles B, Outcomes O WHERE B.name = O.battle) BETA
WHERE
ALPHA.name = BETA.ship
GROUP BY BETA.name
HAVING COUNT (
SELECT * FROM
(SELECT S.name, C.country
FROM Ships S, Classes C WHERE S.class = C.class) X,
(SELECT B.name, O.ship
FROM Battles B, Outcomes O WHERE B.name = O.battle) Y
WHERE
X.name = ALPHA.name AND
Y.name = BETA.name AND
X.name = B.ship AND
A.country = ALPHA.country
)> 2

SELECT BETA.BATTLE
FROM (SELECT S.NAME, C.COUNTRY
FROM SHIPS S, CLASSES C WHERE S.CLASS = C.CLASS) ALPHA,
(SELECT B.NAME, O.SHIP
FROM BATTLES B, OUTCOMES O WHERE B.NAME = O.BATTLE) BETA
WHERE
ALPHA.SHIP = BETA.SHIP
GROUP BY BETA.BATTLE
HAVING COUNT (
SELECT * FROM
(SELECT S.NAME, C.COUNTRY
FROM SHIPS S, CLASSES C WHERE S.CLASS = C.CLASS) X,
(SELECT B.NAME, O.SHIP
FROM BATTLES B, OUTCOMES O WHERE B.NAME = O.BATTLE) Y
WHERE
X.BATTLE = ALPHA.BATTLE AND
Y.BATTLE = BETA.BATTLE AND
A.COUNTRY = ALPHA.COUNTRY)
)> 2

Sadly as the original is not online I cannot test this and it is so complex I am far from 100% confident there is no mistake, but….

# Solving a relational query: part 5

Found: the relational algebra to tackle the problem outlined in part 3.

This is based on a solution to a similar problem found in Database Management Systems, Third Edition, though, unfortunately, that book prints a solution that is not relational algebra at all, as it relies on the order in which results are returned (relational algebra is a form of set algebra and sets have no order).

$\rho (TEMP1, \rho (BIDDER\_ID \rightarrow USER\_ID, \sigma _{amount > 100} (BID)) \Join USER \Join LOT)$

(Rename the attributes in TEMP1 – this is the step that is broken in the book)

$\rho (name \rightarrow TEMP1\_name, TEMP1)$

$\rho(TEMP2, TEMP1 \times ( \rho(BIDDER\_ID \rightarrow USER\_ID, BID) \Join USER \Join LOT ))$

$\Pi _{NAME, EMAIL} ( \sigma _{TEMP1\_LOT\_ID=LOT\_ID \wedge TEMP1\_USER\_ID=USER\_ID \wedge TEMP1\_BID\_ID \neq BID\_ID} (TEMP2))$

# Solving a relational query: part 4

As Paul Rubin pointed out the SQL in the last post was broken – here’s some that, I think (please do not let it be wrong a second time!), by relying on SQL’s coercion of a tuple to a scalar, would work (Paul proposed an even more ambitions way of doing this in a single join):

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND (SELECT COUNT(*) FROM
BID, LOT WHERE
BID.LOT_ID = B.LOT_ID AND BID.BIDDER_ID = U.USER_ID
AND BID.LOT_ID = LOT.LOT_ID) > 1

Still no clue as to how to do this with relational algebra though.

# Solving a relational query: part 3

Image via Wikipedia

This is starting to depress me, now, because I thought I was full on top of all this material until I tried this question.

So, going back to the database specified in part 1 – I now need to find the relational algebra (not the SQL, I have done that) for: “find the name and email of any user who has made multiple bids for a lot including a bid exceeding £100″.

Various ideas have crossed my mind and some bits are easy eg $USER \Join \sigma _{AMOUNT > 100} \rho (BIDDER\_ID -> USER\_ID,\ BID)$ gets a relation from which we coul;d extract users who bid more than £100 for anything, but how can I get it all to string together.

I think the SQL that works is:

SELECT DISTINCT U.NAME, U.EMAIL
FROM USER U, BID B, LOT L
WHERE U.USER_ID = B.BIDDER_ID AND
B.AMOUNT > 100 AND B.LOT_ID = L.LOT_ID
AND L.LOT_ID IN (
SELECT COUNT(*) > 1 FROM
LOT, BID WHERE
LOT.LOT_ID = L.LOT_ID AND BID.LOT_ID = LOT.LOT_ID
AND BID.BIDDER_ID = U.USER_ID)

# Solving a relational query: part 2

Image via Wikipedia

With thanks to Professor Paul A. Rubin, I think this may be the way to solve this:

This joins LOT to a relation with one attribute (LOT_ID) made up from tuples where LOT_ID should only be returned where no bids have been made that are equal or greater than the reserve price.

Very difficult part question for an exam, though