Distribution, commutation and association in database queries


An image showing the commutativity of addition
Image via Wikipedia

Another reminder note.

The RDBMS query engine relies on these laws to build efficient queries:

Commutation:

Numbers may travel (commute)

a + b = b + a or in relational algebra A \Join B = B \Join A

Association:

Numbers may freely associate

(a + b) + c = a + (b + c) or in relational algebra (A \Join B) \Join C = A \Join (B \Join C)

Distribution:

Operators can be distributed

3 \times (4 + 5) = 3 \times 4 + 3 \times 5 or, in relational algebra \sigma(A \Join B) = \sigma(A) \Join \sigma(B)

A more general statement of these laws:

Commutative law: f(a, b) = f(b, a)
Associative law: f(a, f(b, c)) = f(f(a, b), c)
Distributive law: f(g(a,b)) = g(f(a,b))

Date has an excellent discussion of all this on pp 124 – 127 of SQL and Relational Theory: How to Write Accurate SQL Code

Inconsistent analysis (unrepeatable read)


I did warn that this blog, might be used to remind myself of things: and this is one.

Inconsistent analysis is a problem with databases when an aggregate function is used on data that is being updated by another transaction eg

T1                                            T2

\alpha =SUM(accounts(4 .. 7))

account(1) = \alpha           account(4) = account(4) + 10

account(2) = \alpha + 5

By the final line \alpha is out of date (as account 4 has been updated but the SUM total has not).

The long way round?


This chart represents several constituent comp...
Image via Wikipedia

This SQL works for the problem posed below:

SELECT X.name, X.journeys, X.cname
FROM
(SELECT P.name, COUNT(T.trip_no) AS journeys, C.name AS cname
FROM
Passenger P, Trip T, Company C, Pass_in_trip PIT
WHERE
P.ID_psg = PIT.ID_psg AND PIT.trip_no = T.trip_no
AND T.ID_comp = C.ID_comp
AND
NOT EXISTS (SELECT * FROM
Trip, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C.ID_comp)
GROUP BY P.name, C.name
) X WHERE X.journeys =
(SELECT MAX(Y.travels) FROM
(SELECT COUNT(T2.trip_no) AS travels
FROM
Passenger P2, Trip T2, Company C2, Pass_in_trip PIT2
WHERE
P2.ID_psg = PIT2.ID_psg AND PIT2.trip_no = T2.trip_no
AND T2.ID_comp = C2.ID_comp
AND
NOT EXISTS (SELECT * FROM
Trip, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P2.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C2.ID_comp)
GROUP BY P2.name, C2.name) Y)

Though it seems very long winded and I have seen other ways, involving OUTER JOIN suggested on the internet.

Another SQL puzzle


Time And Date
Image by mobology via Flickr

Once more from http://www.sql-ex.ru which seems to have a phase-of-the-moon related availability.

Database is defined as:

Database schema consists of 4 tables:
Company(ID_comp, name)
Trip(trip_no, ID_comp, plane, town_from, town_to, time_out, time_in)
Passenger(ID_psg, name)
Pass_in_trip(trip_no, date, ID_psg, place)
Company table has ID and name of the company, which transports passengers. Trip table has information about trips: trip number, company ID, plane type, departure city, arrival city, departure time, and arrival time. The passenger table has passenger’s ID and passenger’s name. Pass_in_trip table has information about the flights: trip number, departure date (day), passenger’s ID and his place during the flight. We should note that,
– Any trip is being accomplished every day; duration of a flight is less than a calendar-day (24 hours);
– Time and date are considered comparatively one time zone;
– The departure time and the arrival time are given to within a minute;
– There can be the passengers bearing the same names (for example, Joe Bloggs);
– The place during the flight is a number followed by a letter; the number defines the row number, the letter (a – d) – the place in the row (from the left to the right) in the alphabetical order;
– Relationships and restrictions are shown in the data schema.

And the question is:

Among the clients which only use a single company, find the different passengers who have flown more often than others. Result set: passenger name, number of trips, and company name.

So, I have this SQL:

SELECT P.name, COUNT(T.trip_no) AS journeys, C.name
FROM
Passenger P, Trip T, Company C, Pass_in_trip PIT
WHERE
P.ID_psg = PIT.ID_psg AND PIT.trip_no = T.trip_no
AND T.ID_comp = C.ID_comp
AND
NOT EXISTS (SELECT * FROM
Passenger, Trip, Company, Pass_in_trip
WHERE Pass_in_trip.ID_psg = P.ID_psg AND Pass_in_trip.trip_no = Trip.trip_no
AND Trip.ID_comp <> C.ID_comp)
GROUP BY P.name, C.name

Which correctly, as far as I can see, outputs the names of clients who use only one company, the number of trips they have made and the company name.

But how do I limit that to the MAX?

I have tried adding “HAVING MAX(COUNT(T.trip))” but the RDMS objects – telling me I am trying to evaluate something that does not give a Boolean answer as a Boolean.

Let’s try that again, shall we?


This query works for the previously mentioned SQL problem:

SELECT K.BATTLE
FROM
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) K CROSS JOIN
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) J
WHERE K.COUNTRY = J.COUNTRY AND K.BATTLE = J.BATTLE
GROUP BY K.BATTLE
HAVING COUNT(K.BATTLE) >=9

But it’s still not perfect, because it would record a positive answer if a single ship from 9 countries took part in a battle or two ships from three countries (something that certainly took place in the Pacific).

So I cheated and looked for clues – and this works properly:

SELECT K.BATTLE
FROM
(SELECT S.name AS SHIP, B.name AS BATTLE, C.country AS COUNTRY
FROM
Ships S, Classes C, Battles B, Outcomes O
WHERE
S.class = C.class AND O.battle = B.name AND O.ship = S.name) K
GROUP BY K.BATTLE
HAVING COUNT(K.COUNTRY) >=3

Solving an old SQL puzzle


HMS Belfast
Image by stewartmorris via Flickr

Ages ago I was puzzled by an online SQL test (sadly no longer online, but here’s the text):

DESCRIPTION

The database of naval ships that took part in World War II is under consideration. The database has the following relations:
Classes(class, type, country, numGuns, bore, displacement)
Ships(name, class, launched)
Battles(name, date)
Outcomes(ship, battle, result)
Ships in classes are arranged to a single project. A class is normally assigned the name of the first ship in the class under consideration (head ship); otherwise, the class name does not coincide with any ship name in the database.
The Classes relation includes the class name, type (bb for a battle ship, or bc for a battle cruiser), country where the ship was built, number of main guns, gun caliber (diameter of the gun barrel, in inches), and displacement (weight in tons). The Ships relation includes the ship name, its class name, and launch year. The Battles relation covers the name and date of a battle the ships participated; while the result of their participation in the battle (sunk, damaged, or unharmed – OK) is in the Outcomes relation. Note: the Outcomes relation may include the ships not included in the Ships relation.

PROBLEM

Point out the battles in which at least three ships from the same country took part.

Well, I now think I can solve it:

SELECT BETA.name
FROM (SELECT S.name, C.country
FROM Ships S, Classes C WHERE S.class = C.class) ALPHA,
(SELECT B.name, O.ship
FROM Battles B, Outcomes O WHERE B.name = O.battle) BETA
WHERE
ALPHA.name = BETA.ship
GROUP BY BETA.name
HAVING COUNT (
SELECT * FROM
(SELECT S.name, C.country
FROM Ships S, Classes C WHERE S.class = C.class) X,
(SELECT B.name, O.ship
FROM Battles B, Outcomes O WHERE B.name = O.battle) Y
WHERE
X.name = ALPHA.name AND
Y.name = BETA.name AND
X.name = B.ship AND
A.country = ALPHA.country
)> 2

SELECT BETA.BATTLE
FROM (SELECT S.NAME, C.COUNTRY
FROM SHIPS S, CLASSES C WHERE S.CLASS = C.CLASS) ALPHA,
(SELECT B.NAME, O.SHIP
FROM BATTLES B, OUTCOMES O WHERE B.NAME = O.BATTLE) BETA
WHERE
ALPHA.SHIP = BETA.SHIP
GROUP BY BETA.BATTLE
HAVING COUNT (
SELECT * FROM
(SELECT S.NAME, C.COUNTRY
FROM SHIPS S, CLASSES C WHERE S.CLASS = C.CLASS) X,
(SELECT B.NAME, O.SHIP
FROM BATTLES B, OUTCOMES O WHERE B.NAME = O.BATTLE) Y
WHERE
X.BATTLE = ALPHA.BATTLE AND
Y.BATTLE = BETA.BATTLE AND
A.COUNTRY = ALPHA.COUNTRY)
)> 2

Sadly as the original is not online I cannot test this and it is so complex I am far from 100% confident there is no mistake, but….

Solving a relational query: part 5


Found: the relational algebra to tackle the problem outlined in part 3.

This is based on a solution to a similar problem found in Database Management Systems, Third Edition, though, unfortunately, that book prints a solution that is not relational algebra at all, as it relies on the order in which results are returned (relational algebra is a form of set algebra and sets have no order).

\rho (TEMP1, \rho (BIDDER\_ID \rightarrow USER\_ID, \sigma _{amount > 100} (BID)) \Join USER \Join LOT)

(Rename the attributes in TEMP1 – this is the step that is broken in the book)

\rho (name \rightarrow TEMP1\_name, TEMP1)

\rho(TEMP2, TEMP1 \times ( \rho(BIDDER\_ID \rightarrow USER\_ID, BID) \Join USER \Join LOT ))

\Pi _{NAME, EMAIL} ( \sigma _{TEMP1\_LOT\_ID=LOT\_ID \wedge TEMP1\_USER\_ID=USER\_ID \wedge TEMP1\_BID\_ID \neq BID\_ID} (TEMP2))