In a relational database the data are stored in tables of rows. Each table row comprises a data record; this might be, say, an item, event or relationship between records, with the columns defining what values can be stored for each record. In addition to the data contained in a table, the table also has a name, as do the columns; each column also specifies the allowed datatype for data in that column, and whether or not it needs a value at all. As a practical matter, while columns do have an order within a table and one could in many cases rely on this to address them either implicitly or by ordinal number, it is in normally much better to address them by name. As with other programming, it is good to use names which are meaningful, and to adopt a consistent approach to naming; databases can be long-lived and used by many people, so deciphering what has been done is much easier with clear, consistent names. Be aware that some RDBMSs will care about capital letters and some will not; again, this argues for consistency!

Basic table containing one file per row, consisting of file_name, file_location, file_size and creation_date
A basic table containing file locations: each row is a different file, and the columns are file_name, file_location, file_size and creation_date. Each column will specify a given type of allowed data (datetime, int, etc.). In a production relational database, there may be many tables so it is good to give them informative names
Data types

If a transaction attempts to insert a value into a row which does not comply with the specified datatype, the RDBMS will reject the transaction. There is some variation between different RDBMSs, but the core datatypes available in every RDBMS are:

Common datatypes and their SQL names
Data type Name
String CHAR(X) (X characters) or VARCHAR(X) (up to X characters)
Integer INTEGER or INT
Floats/decimals FLOAT, DOUBLE or DECIMAL
Times/dates DATE, TIME, DATETIME

In addition, most RDBMSs will accept binary data although one should be careful as it can impact performance. In your RDBMS's documentation you may find a Boolean type (BOOLEAN, BOOL), although in practice it may simply be a synonym for a small numeric type, (such as BIT or TINYINT(1)), or you may have to explicitly use an integer instead (as in Oracle and SqLite). Finally, your RDBMS may also have support for complex types such as JSON or XML or even tables themselves! Those types are outside of the scope of this workshop and will generally come with RDBMS-specific SQL, so consult your RDBMS's documentation for details.

What if there's no value for a column? NULLs

In the original relational model, having no value for a given column in a row was disallowed. In the same way that the RDBMS would reject data that didn't match the data type specification of the column, the RDBMS would also reject data which had no value at all for a given column. Most relational databases do, however, allow values to be omitted for those columns that have been explicitly set to allow NULLs (a per-column setting). NULL is a special value that means "no value at all", for whatever reason and although the use of NULLs is widespread in modern relational databases, there are some voices who oppose allowing NULLs at all. In this workshop, we do assume the use of NULLs, but beware as they aren't a normal sort of value; in particular, if two values A and B are both NULL, it is not the case that A=B, and the behaviour of NULLs under sorting operations varies between RDBMSs! It is often risky to make assumptions about why there is no value.

 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement