Using a few tables to hold all Domain values: Good Idea or Bad?
A "domain" is a set of items that define the permissible values ("discrete values") that a database column may take. For example, Gender may take the values Male, Female or
Unknown. In many applications, each item is defined by a pair of values: an internal code, typically a number, and an accompanying descriptive phrase. For example, many items in questionnaires are scored on a 1-5 Likert Scale, such as:
1= Disagree strongly, 2 = Disagree Somewhat, 3 = Neutral,. 4= Agree somewhat, 5=Agree strongly.
In user interfaces, database columns based on domains are typically presented as
combo boxes, list boxes or radio button lists, depending on the application.
Combo boxes take the least space and are convenient for keyboard-oriented users;
radio-buttons are fastest to enter with the mouse, but take up more screen real
estate.
In a database where a large number of columns are based on domains, one
typically chooses to store the domain values in a table so as to simplify the
presentation and validation logic. There are two ways to do this. In one, the
database column becomes a foreign key into a table of domain values, so we have
one table for each domain. In another, we seek to combine the separate domain
tables into fewer tables in some way.
Perspective 1: Combining Domain Values into Fewer Tables is Bad
Louis Davidson's website
http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/
advocates against using the second approach. The example Davidson uses is a
table GenericDomain consisting of the columns (GenericDomainID int primary key,
RelatedToTable text, RelatedToColumn text, Code int, Description text). Davidson
then goes on to show why this design is unworkable.
While I agree strongly with the other nine issues Davidson emphasizes (and I believe that he performs a very valuable service to the developer community through his site) I would disagree with this one.
First, Davidson's GenericDomain table is not in third-normal form . (To recap,
in third-normal form, all columns of a table depend on the key, the whole key
and nothing but the key.) A domain is a Set of values, not an individual
Code-Description pair. The correct design would be to use three tables:
- Generic_Domains (Domain_ID int primary key, Name text, Description text),
containing a definition of the Domain; - Domain_Values (Domain_ID int, Code int, Description text), containing a
definition of each member (code-description pair) of each domain.
The primary key is (Domain_ID, Code) and Domain_ID is a foreign key into
Domains. - Domain_Mapping (Domain_ID int, RelatedToTable text, RelatedToColumn text)
that documents which Domain is used for a given table and column. (Sometimes, as
in Likert-scale questionnaires, the same Domain is used for multiple columns.)
Davidson's query: "Give me all the domain values for the Customers table" is now
answered very easily using the following query, which uses a two-table join:
select M.RelatedToColumn, DV.Code, DV.Description from Domain_Values
DV, Domain_Mapping M where M.Domain_ID= DV.Domain_ID order by M.RelatedToColumn, DV.Code
Second, Davidson raises the issue that the GenericDomain design makes it messy
to validate data using foreign key constraints. This argument is not valid for
today's mainstream RDBMS technology. Implementing a foreign-key constraint involves creating a Stored Function and using it in a constraint. This is something that both Oracle and MS SQL Server
are capable of.
The stored function illustrated below, CheckDomainCol, (I have used SQL
Server syntax: the code is pasted from SQL Server Management Studio) is general,
parameterized, and needs to be created only once: it can be used multiple times,
once for each column whose values you need to constrain to a specific domain.
The first part of the code, and the code that proves that the constraint works,
is just there to save you the trouble of typing: the critical part of the code -
the stored function, and its use in a constraint - is highlighted in bold.
Domain_ID int IDENTITY(1,1) NOT NULL primary key,
NAME varchar(50) NOT NULL, DESCRIPTION varchar(255) NULL
)
go
CREATE TABLE Domain_Values(
Domain_ID int NOT NULL,
Code int not null,
DESCRIPTION varchar(200) NULL,
DATETIME_LAST_MODIFIED datetime NULL default getdate(),
CONSTRAINT PRIM_Domain_Values PRIMARY KEY CLUSTERED (Domain_ID, Code) )
GO
Create table Domain_Mapping (
Domain_ID int not null,
RelatedToTable varchar(255) not null,
RelatedToColumn varchar(255) not null,
constraint Prim_Dom_Map Primary Key Clustered (Domain_ID, RelatedToTable,
RelatedToColumn)
)
go
ALTER TABLE Domain_Values ADD CONSTRAINT FKEY1 FOREIGN KEY(Domain_ID)
REFERENCES Generic_Domains (Domain_ID) ON DELETE CASCADE
GO
ALTER TABLE Domain_Values CHECK CONSTRAINT FKEY1
go
insert into Generic_Domains (name, description) values('test', 'a test domain')
go
/* the above record gets the primary key value 1 *//span>
insert into Domain_Values(Domain_ID,Code,Description) values (1,1,'Yes');
insert into Domain_Values(Domain_ID,Code,Description) values (1,0,'No');
insert into Domain_Values(Domain_ID,Code,Description) values (1,99,'Unknown');
go
create table CheckTbl (col1 varchar(255), col2 int)
go
insert into Domain_Mapping values(1, 'CheckTbl', 'col2')
go
CREATE FUNCTION CheckDomainCol(@Val int,@Tablename
varchar(255), @ColumnName varchar(255) ) RETURNS bit AS
BEGIN
DECLARE @retval int
SELECT @retval = COUNT(*) FROM Domain_Values DV, Domain_mapping M
where M.Domain_ID = DV.Domain_ID and M.RelatedDV. Code = @Val and
DV.Domain_ID= @Domain_ID
RETURN (@retval <> 0)
END;
GO
ALTER TABLE CheckTbl ADD CONSTRAINT chkCol2 CHECK (dbo.CheckDomainCol (col2, 'CheckTbl', 'col2') = 1);
GO
insert into CheckTbl values(1, 45) /* this will fail */
I fully agree with Davidson that the GenericDomain design approach does not make
sense if you need multiple columns, not just a Code and Description, associated
with each domain value.
In this case, you should create a separate table for that particular domain.
I conclude that the arguments against Generic Domain tables are not valid any
more. (Older versions of MySQL, pre 5.0, do not support stored functions, but if
you are a MySQL developer, I don't see any reason why you should not be using v 5.x or later.)
What are the advantages of a Generic Domain-Value table approach?
The above text should constitute reasonable proof that Domain-Value tables
aren't bad. But to justify using them, there must be some tangible benefits,
not just the absence of negatives. I state these below.
- Efficiency Issues: From the hardware/DBMS perspective, separate
domain tables and consolidated tables are equally efficient. Davidson points out
correctly that even if numerous tables were to be created, these tables are
small enough to be cached, and RDBMS vendor-specific Data Definition Language
can ensure that they will be. I am far more concerned with human
efficiency: the schema will be maintained by developers who come and go, and
whose first recourse to understanding a system will be Entity-Relationship
diagrams and the Schema DDL - the CREATE/ALTER TABLE statements like the
above.
Domain definitions constitute metadata. My personal preference is to have
metadata concentrated in a few tables rather than dispersed across a hundred or
more. With the latter approach, separate domain-value tables become, in my
opinion, the equivalent of pigeon droppings in an E-R diagram: visually, a
domain-value table, which may contain as few as three rows, is as prominent in
the diagram as a table that may contain several million rows, other than the
fact that it contains fewer columns. Only techniques such as the use of
naming conventions, e.g., special prefixes, would indicate its domain-value
nature. - The consolidation of domain-value tables simplifies the creation of
user-interface generation code, or the use of generic routines to present these values in the interface. Such code has to look only in one place to consult
metadata, which is then used to populate the contents of a control such as a
combo-box or radio-button-list. While it is possible to perform
code-generation by consulting the RDBMS's system dictionary (specifically the
foreign key constraints) such code is much more elaborate, and is not
readily portable across RDBMSs. If you are a professional developer building
front-ends to databases, you will maximize your productivity by the use of
frameworks or libraries (some of which you may have developed yourself) unless
you are billing your client by the hour or by the number of lines of code that
you write. - It simplifies the administration of content maintenance greatly .
The Domain values are mostly static, but not guaranteed to
be so: they will change over time. The individuals
responsible for maintaining the domain values will be content experts or
privileged users rather than database developers. It is much simpler to give
such individuals read-write access to the contents of a few generic domain
tables, while most other users of the system will have read-only access to these
tables. Consider, by contrast, the issues of ensuring appropriate
role-based access to hundreds of tables. (Again, if you're a DBA, are the
number of DDL statements that you write reflected in your pay or your level of
importance, or do you care about getting the job done efficaciously?)
The job of a developer who has to build a content-maintenance user interface is
simplified considerably. The Domain / Domain-value tables can be presented for
editing through a simple form/sub-form user. A given domain can be searched by a
keyword that is part of its description, or by the Table or Column that maps to
it. Note: In a usable interface, you want to support search by the
Business Description associated with a Table or Column rather than the
table/column internal names: this is simple enough to do if there is additional
metadata that associates the internal table/column names with business
descriptions.
All three issues above emphasize the importance of looking
at the big picture. The database schema is not only the Database
Administrator's concern: the DBA has to partner with applications developers and
power users/content experts to make sure that the schema is designed so as to
facilitate productive software development and metadata-content maintenance.