Typically, we have an expression of need in natural language, which we are to turn into a query of the database. This gets easier with practice, but some good steps in the thought process are:

  • What columns must be returned: these will make up the SELECT list.
  • In which tables and columns are the data? The tables will be in the FROM ...JOIN part of the query.
  • Do I need intermediary tables to conduct the JOINs, even if they aren't in the final SELECT list? They will also be in the FROM ...JOIN.
  • What constraints are to be imposed? The tables containing this data will also have to be in the FROM ...JOIN part of the query even if they do not feature in the final SELECT list, and these columns will be in the WHERE clause of the query.

Important things to remember:

  • WHERE determines what is put in the result set: no operations (such as arithmetic) will occur on rows selected out with WHERE
  • Use parentheses () to group together the subjects of AND, OR and NOT operators
  • Use <> for "not equals"
  • Remember to use IS NULL and IS NOT NULL to check for NULLs. Be careful of NULLs in general!

With this in mind, here is an example of the process, on this section of a database:

An Instruments table, an Instruments_models table,
        a Measurements table and a Weatherstations table; Instruments and Measurements are joined by a foreign key, as are Weatherstations and Measurements
        and Instruments_models and Instruments
Part of a weather data database in Third Normal Form

The question, in natural language, is "I want the rainfall and temperature measurements from the Ithaca weather station between January 1st 2020 and January 1st 2022". How to proceed? Following the advice above:

  1. The information returned comprises the measurements themselves and presumably the measurement units and measurement type.
  2. The tables containing the returned data are the Measurements and Instrument_models tables.
  3. The query will need to include the Instruments table as it bridges the two via the instrument_id and instrument_model foreign keys.
  4. To restrict the returned dataset to just the required subset of Ithaca weatherstation data, we will need weatherstation_name, which is in both the Measurements and Weatherstations tables; as the weatherstion_location is not needed for this query, there is no need to include the Weatherstations table (note, however, that if the weatherstation identifier was just a surrogate integer key with no semantic information, say weatherstation_id, we'd probably need to link to the Weatherstations table to get the weatherstation's name as that is where it would be). We will need the datetime, also in the Measurements table, and we will need use the measurement_type to restrict to rainfall and temperature measurements. All of these are in tables already identified.

Having identified the tables, the query can be constructed. Deciding in advance that our table aliases will be "M" for Measurements, "I" for Instruments and "IM" for Instrument_models, the SELECT list will look like:

The table collection will look like:

WHERE restrictions will look like:

It's not necessary to construct the query in separate sections, but thinking of the parts separately does allow a visualization of the separate parts. Here's the final query:

 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement