Many people struggle with complex data models that often require multiple tables to be joined together to satisfy a single query. Joining up to three tables is usually not a problem for most DBAs and developers. However, queries that may involve 6 or more tables (including joins on the same table) can often confuse even the most stalwart developer. Troubleshooting such a query can be a major test of faith and fortitude as you scan lines of JOIN expressions and WHERE clauses. I am a visual person, I usually find it easier to solve a problem if I can visualize the various elements and components of the problem. For that reason I developed a query modeling technique several years ago that allows me to diagram even the most puzzling and arcane query problems. I recently modified that modeling 'language' and have converted the main concepts into the UML. My original modeling language was based on ERD diagrams, but it translates very well to UML. I find this approach to query modeling extremely helpful and hopefully it will prove useful to others also.
The basic format is quite simple and is a stereotyped version of a standard UML Class Diagram:

The major elements of the model for a query element (a database table) is show in the diagram on the right. It consists of a box with the name of the table (stereotyped with the label FROM) and the box is divided into two compartments. The first compartment is stereotyped as the SELECT compartment and the second is stereotyped as the WHERE compartment. In the SELECT compartment the fields that will be retrieved from the database by the SQL SELECT statement along with their field aliases as expressed in the general format: "field : alias".
The WHERE compartment lists the constraints that must be placed on the table to limit the return results. These are typically expressed as generic SQL WHERE clauses (see the example below). The dotted box attached to the FROM area represents the alias used to identify the table. In UML this element typically represents the template or generic type of a class, but here it has been stereotyped and reused for an entirely different purpose. As you will see later clearly marking the table alias in this way makes referencing the same table with different aliases as if they were logically two different tables a very intuitive concept. Visualizing tables by their aliases in this way also makes reverse engineering a complex query much more transparent.
A query model is composed of a table definition for each table that will be referenced to satisfy the query, joined by a line that indicates the join criteria (i.e. what key / foreign key relationships need to be expressed in the query). It is important to know that query modeling should be conducted in an agile modeling approach. The model needs only to be sufficiently complete to solve the problem at hand. It is not a design artifact, only a tool for increasing understanding. Typically I develop a model on a whiteboard and treat it as a highly disposable diagram. Once the problem has been solved the model serves no purpose. Creating a model in this way usually takes a matter of minutes and saves a significant amount of time that would otherwise be spent gazing at a convoluted query in SQL form.

This is a full example model of a query to retrieve a user name and the name of their associated organization under some specific criteria of selection. We are trying to extract only people who are associated with organizations identified as consultancies. Further we are only interested in users with an id less than 2000 and excluding the admin user who has the id number of 1.
There are five logical tables that must be referenced to satisfy this query and two of those logical tables are actually the same physical table (named "records"). The two "records" tables are modeled as separate tables with the same FROM designation but with different table aliases (i.e. "user" and "org"). From this we can see that user records and organization records are both stored in the same physical table. These records are related to each other by a generic table called "primaryrel". Further the type of organization is stored in a generic link table called "pickrel" that cross references to a table that stores the types of organizations "orgtypes".
Once the tables that are required to satify the query have been identified you can add the fields that will need to be returned from each table in the FROM compartment of the respective table elements. Note that in our example most of the "utility" tables do not contribute any fields to the query result and are therefore left blank.
Next you add the constraints on the tables in the WHERE compartment of each table. The constraints can be expressed in pseudo-SQL format. In our example query we wanted to retrieve only users who are not the admin (with id 1) and have an id number less than 2000. This requirement is expressed in the WHERE compartment of the "user" table. Similarly other requirements for limiting the results on each tables are placed into the WHERE compartments.
Finally, the tables can be related with JOIN lines and the join criteria added as labels to those lines. Once that has been done the model is basically complete. Sure, you can add the ORDER BY and GROUP BY criteria as a note as done in this diagram, but in most cases this information is obvious and not necessary to add to the model when used as a problem-solving tool.
If you wish to visualize the expected result set for different types of join (INNER, OUTER, etc.) you may find
Jeff Atwood & Ligaya Turmelle's use of Venn Diagrams to visualize joins of some interest.
Building a model of the query in this way is very quick to do - it is also easy to convert an existing query into model format to start analyzing it for troubleshooting or performance analysis. Looking at the diagram above for example you can probably spot two table joins that could potentially be removed, making for a more efficient query (Here's a clue: what if you could hard code the id for a "Consultancy"?). Converting an adjusted model into SQL is also very easy to do. Using the query model above:
First take all the fields and aliases in the FROM compartment of each table element and add them to your SELECT clause and choose the base table for the FROM clause:
SELECT user.id AS uid, user.fname AS firstname, user.lname AS lastname, org.name AS oname, org.address AS oaddr
FROM records user
Then you can start adding the joins from the lines:
INNER JOIN primaryrel AS rel ON user.id = rel.id1
INNER JOIN records AS org ON org.id = rel.id2
INNER JOIN pickrel AS pick ON pick.id = org.type
INNER JOIN orgtypes AS otype ON otype.id = pick.id
Finally add the WHERE clauses from the WHERE compartments:
WHERE user.id <> 1 AND user.id < 2000 AND user.datatype=1
AND rel.relid = 30
AND org.datatype = 2
AND pick.table = "orgtypes"
AND otype.name = "Consultancy"
Once you have constructed those portions of the query you can add any group by and order by clauses and you have your query!
One important note - I have not provided an example of a query that uses "HAVING" statements in the GROUP BY. However, you simply follow the same scheme adding an additional compartment. Hopefully, this modeling approach may prove as useful to you as it has to me over the years.
Comments
Thu, 14.08.2008 16:58
Thanks for the tip. I made a slight mod you might be interested [...]
Mon, 28.07.2008 15:06
Solution (to my issue): Views > Tools > Flush Views Cache It explains that Views doesn't always keep up with changes [...]
Mon, 28.07.2008 14:52
Thanks for this helpful post. I've seen this effect too. I'm running into a different (but related?) issue - the Views [...]
Tue, 15.07.2008 20:25
Oscar, Krista from Calais here, writing to let you know that Calais 2.1 is live. In addition to our ongoing [...]
Tue, 01.07.2008 11:30
Dan, You are absolutely correct and I should have stated this within my post; the described steps within the post [...]
Mon, 30.06.2008 09:45
i wouldnt recomand this at all, because if something happens and the conection is lost u will have your data lost if the [...]
Mon, 09.06.2008 13:42
PDT syntax highlighting support does not seem to work when subclipse is installed, any one else had this problem?
Mon, 09.06.2008 11:56
I didn't mean to imply that you were bashing unit tests.
Mon, 09.06.2008 11:52
My point isn't to bash unit tests, but rather to say there are a bunch of things you should be doing before you get [...]
Mon, 09.06.2008 11:43
I agree with, what I think is, the gist of your argument. That is, if you don't write code that anticipates failure, [...]
Mon, 09.06.2008 08:58
clipse is an open source IDE — or as they put it themselves: “universal toolset for development”. It [...]
Tue, 27.05.2008 12:17
Navigation links should fill their container to ensure ease of selection. A good method for that is to make them [...]
Thu, 22.05.2008 10:35
One of the better comments I've seen in a while: "Although I like PHP, I agree the language is only as good as the [...]
Tue, 20.05.2008 14:03
Oscar, Yahoo's Term Extraction service takes an entire article and returns a few of (what it thinks are) the most [...]
Tue, 20.05.2008 13:13
Hi, Tom Tague from Calais here. First, thanks for taking note of Calais. And integrating an example right within the [...]