<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5726392040504510271</id><updated>2011-07-07T16:45:03.786-07:00</updated><category term='metadata'/><category term='referential integrity'/><category term='domain values'/><title type='text'>Active Metadata</title><subtitle type='html'>This blog discusses issues related to metadata-driven software architectures, including database engine features that impact metadata subschema design. "Metadata-driven" means that a subschema of the database contains machine-readable descriptions of the system that are sufficiently rich to enable software to accomplish tasks, such as data presentation or validation, that might otherwise have to be coded on a case-by-case basis.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://metadata-musings.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5726392040504510271/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://metadata-musings.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Prakash M. Nadkarni</name><uri>http://www.blogger.com/profile/13718313173239117147</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5726392040504510271.post-5736505491404449206</id><published>2009-06-14T09:18:00.000-07:00</published><updated>2009-06-14T09:30:58.321-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='domain values'/><category scheme='http://www.blogger.com/atom/ns#' term='referential integrity'/><title type='text'>Consolidating Domain Values into Generic Tables</title><content type='html'>&lt;h2&gt;Using a few tables to hold all Domain values: Good Idea or Bad?&lt;/h2&gt;&lt;p&gt;A "domain" is a set of items that define the permissible values ("discrete values") that a database column may take. For example, Gender may take the values Male, Female or&lt;br /&gt;Unknown. In many applications, each item is defined by a pair of values: an internal &lt;b&gt;code&lt;/b&gt;, typically a number, and an accompanying &lt;b&gt;descriptive phrase&lt;/b&gt;. For example, many items in questionnaires are scored on a 1-5 Likert Scale, such as:&lt;br /&gt;&lt;br /&gt; 1= Disagree strongly, 2 = Disagree Somewhat, 3 = Neutral,. 4= Agree somewhat, 5=Agree strongly.&lt;br /&gt;&lt;br /&gt;In user interfaces, database columns based on domains are typically presented as&lt;br /&gt;combo boxes, list boxes or radio button lists, depending on the application.&lt;br /&gt;Combo boxes take the least space and are convenient for keyboard-oriented users;&lt;br /&gt;radio-buttons are fastest to enter with the mouse, but take up more screen real&lt;br /&gt;estate.&lt;/p&gt;&lt;p&gt;In a database where a large number of columns are based on domains, one&lt;br /&gt;typically chooses to store the domain values in a table so as to simplify the&lt;br /&gt;presentation and validation logic. There are two ways to do this. In one, the&lt;br /&gt;database column becomes a foreign key into a table of domain values, so we have&lt;br /&gt;one table for each domain. In another, we seek to combine the separate domain&lt;br /&gt;tables into fewer tables in some way.&lt;/p&gt;&lt;h3&gt;Perspective 1: Combining Domain Values into Fewer Tables is Bad&lt;/h3&gt;&lt;p&gt;Louis Davidson's website&lt;br /&gt;&lt;a href="http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/"&gt;http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/&lt;/a&gt;&lt;br /&gt;advocates against using the second approach. The example Davidson uses is a&lt;br /&gt;table GenericDomain consisting of the columns (GenericDomainID int primary key,&lt;br /&gt;RelatedToTable text, RelatedToColumn text, Code int, Description text). Davidson&lt;br /&gt;then goes on to show why this design is unworkable.  &lt;/p&gt;&lt;p&gt;While I agree strongly with the other nine issues Davidson emphasizes (and I believe that he performs a very valuable service to the developer community through his site) I would disagree with this one.&lt;br /&gt;&lt;br /&gt;First, Davidson's GenericDomain table is not in third-normal form . (To recap,&lt;br /&gt;in third-normal form, all columns of a table depend on the key, the whole key&lt;br /&gt;and nothing but the key.) A domain is a &lt;i&gt;Set&lt;/i&gt; of values, not an individual&lt;br /&gt;Code-Description pair. The correct design would be to use three tables: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Generic_Domains&lt;/b&gt; (Domain_ID int primary key, Name text, Description text),&lt;br /&gt;containing a definition of the Domain; &lt;/li&gt;&lt;li&gt;&lt;b&gt;Domain_Values&lt;/b&gt; (Domain_ID int, Code int, Description text), containing a&lt;br /&gt;definition of each &lt;i&gt;member&lt;/i&gt; (code-description pair) of each domain. &lt;br /&gt;The primary key is (Domain_ID, Code) and Domain_ID is a foreign key into&lt;br /&gt;Domains.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Domain_Mapping&lt;/b&gt; (Domain_ID int, RelatedToTable text, RelatedToColumn text)&lt;br /&gt;that documents which Domain is used for a given table and column. (Sometimes, as&lt;br /&gt;in Likert-scale questionnaires, the same Domain is used for multiple columns.)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Davidson's query: "Give me all the domain values for the Customers table" is now&lt;br /&gt;answered very easily using the following query, which uses a two-table join:&lt;br /&gt;&lt;br /&gt; select M.RelatedToColumn, DV.Code, DV.Description from Domain_Values&lt;br /&gt;DV, Domain_Mapping M where M.Domain_ID= DV.Domain_ID order by M.RelatedToColumn, DV.Code&lt;/p&gt;&lt;p&gt;Second, Davidson raises the issue that the GenericDomain design makes it messy&lt;br /&gt;to validate data using foreign key constraints. This argument is not valid for&lt;br /&gt;today's mainstream RDBMS technology. Implementing a foreign-key constraint involves creating a Stored Function and using it in a constraint. This is something that both Oracle and MS SQL Server&lt;br /&gt;are capable of.&lt;br /&gt;&lt;br /&gt;The stored function illustrated below, &lt;b&gt;CheckDomainCol,&lt;/b&gt; (I have used SQL&lt;br /&gt;Server syntax: the code is pasted from SQL Server Management Studio) is general,&lt;br /&gt;parameterized, and needs to be created only once: it can be used multiple times,&lt;br /&gt;once for each column whose values you need to constrain to a specific domain.&lt;br /&gt;&lt;br /&gt;The first part of the code, and the code that proves that the constraint works,&lt;br /&gt;is just there to save you the trouble of typing: the critical part of the code -&lt;br /&gt;the stored function, and its use in a constraint - is highlighted in bold.&lt;/p&gt;&lt;div style="BACKGROUND-COLOR: white" class="small"&gt;&lt;span style="font-size:85%;"&gt;CREATE TABLE Generic_Domains(&lt;br /&gt; Domain_ID int IDENTITY(1,1) NOT NULL primary key,&lt;br /&gt; NAME varchar(50) NOT NULL, DESCRIPTION varchar(255) NULL&lt;br /&gt;)&lt;br /&gt;go&lt;br /&gt;CREATE TABLE Domain_Values(&lt;br /&gt; Domain_ID int NOT NULL,&lt;br /&gt; Code int not null,&lt;br /&gt; DESCRIPTION varchar(200) NULL,&lt;br /&gt; DATETIME_LAST_MODIFIED datetime NULL default getdate(),&lt;br /&gt;CONSTRAINT PRIM_Domain_Values PRIMARY KEY CLUSTERED (Domain_ID, Code) )&lt;br /&gt;&lt;br /&gt; GO&lt;br /&gt;&lt;br /&gt;Create table Domain_Mapping (&lt;br /&gt;   Domain_ID int not null,&lt;br /&gt;   RelatedToTable varchar(255) not null,&lt;br /&gt;   RelatedToColumn varchar(255) not null,&lt;br /&gt;   constraint Prim_Dom_Map Primary Key Clustered (Domain_ID, RelatedToTable,&lt;br /&gt;RelatedToColumn)&lt;br /&gt;)&lt;br /&gt;go&lt;br /&gt;&lt;br /&gt;ALTER TABLE Domain_Values ADD CONSTRAINT FKEY1 FOREIGN KEY(Domain_ID)&lt;br /&gt;   REFERENCES Generic_Domains (Domain_ID) ON DELETE CASCADE&lt;br /&gt;GO&lt;br /&gt;&lt;br /&gt;ALTER TABLE Domain_Values CHECK CONSTRAINT FKEY1&lt;br /&gt;go&lt;br /&gt;&lt;br /&gt;insert into Generic_Domains (name, description) values('test', 'a test domain')&lt;br /&gt;go&lt;br /&gt;/* the above record gets the primary key value 1 *//span&gt;&lt;br /&gt;&lt;br /&gt; insert into Domain_Values(Domain_ID,Code,Description) values (1,1,'Yes');&lt;br /&gt; insert into Domain_Values(Domain_ID,Code,Description) values (1,0,'No');&lt;br /&gt; insert into Domain_Values(Domain_ID,Code,Description) values (1,99,'Unknown');&lt;br /&gt;go&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;b&gt;create table CheckTbl (col1 varchar(255), col2 int)&lt;br /&gt;go&lt;br /&gt;insert into Domain_Mapping values(1, 'CheckTbl', 'col2')&lt;br /&gt;go&lt;br /&gt;CREATE FUNCTION CheckDomainCol(@Val int,@Tablename&lt;br /&gt;varchar(255), @ColumnName varchar(255) ) RETURNS bit AS&lt;br /&gt;BEGIN&lt;br /&gt; DECLARE @retval int&lt;br /&gt;SELECT @retval = COUNT(*) FROM Domain_Values DV, Domain_mapping  M&lt;br /&gt;   where M.Domain_ID = DV.Domain_ID and M.RelatedDV. Code = @Val and&lt;br /&gt;DV.Domain_ID= @Domain_ID&lt;br /&gt;RETURN (@retval &lt;&gt; 0)&lt;br /&gt;END;&lt;br /&gt;&lt;br /&gt;GO&lt;br /&gt;ALTER TABLE CheckTbl ADD CONSTRAINT chkCol2 CHECK (dbo.CheckDomainCol (col2, 'CheckTbl', 'col2') = 1);&lt;/b&gt;&lt;br /&gt;GO&lt;br /&gt;&lt;br /&gt;insert into CheckTbl values(1, 45) /* this will fail */&lt;/span&gt; &lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;p&gt;I fully agree with Davidson that the GenericDomain design approach does not make&lt;br /&gt;sense if you need multiple columns, not just a Code and Description, associated&lt;br /&gt;with each domain value. &lt;br /&gt;In this case, you should create a separate table for that particular domain.&lt;/p&gt;&lt;p&gt;I conclude that the arguments against Generic Domain tables are not valid any&lt;br /&gt;more. (Older versions of MySQL, pre 5.0, do not support stored functions, but if&lt;br /&gt;you are a MySQL developer, I don't see any reason why you should not be using v 5.x or later.)&lt;/p&gt;&lt;h3&gt;What are the advantages of a Generic Domain-Value table approach?&lt;/h3&gt;&lt;p&gt;The above text should constitute reasonable proof that Domain-Value tables&lt;br /&gt;aren't bad. But to justify using them, there must be some tangible &lt;i&gt;benefits&lt;/i&gt;,&lt;br /&gt;not just the absence of negatives. I state these below.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Efficiency Issues: &lt;/b&gt;From the &lt;i&gt;hardware/DBMS&lt;/i&gt; perspective, separate&lt;br /&gt;domain tables and consolidated tables are equally efficient. Davidson points out&lt;br /&gt;correctly that even if numerous tables were to be created, these tables are&lt;br /&gt;small enough to be cached, and RDBMS vendor-specific Data Definition Language&lt;br /&gt;can ensure that they will be. I am far more concerned with &lt;i&gt;human&lt;/i&gt;&lt;br /&gt;efficiency: the schema will be maintained by developers who come and go, and&lt;br /&gt;whose first recourse to understanding a system will be Entity-Relationship&lt;br /&gt;diagrams and the Schema DDL - the CREATE/ALTER TABLE  statements like the&lt;br /&gt;above.&lt;br /&gt;&lt;span class="style1"&gt;Domain definitions constitute &lt;/span&gt;&lt;span class="style2"&gt;&lt;em&gt;metadata&lt;/em&gt;&lt;/span&gt;&lt;span class="style1"&gt;. &lt;/span&gt;My personal preference is to have&lt;br /&gt;metadata concentrated in a few tables rather than dispersed across a hundred or&lt;br /&gt;more.  With the latter approach, separate domain-value tables become, in my&lt;br /&gt;opinion, the equivalent of pigeon droppings in an E-R diagram: visually, a&lt;br /&gt;domain-value table, which may contain as few as three rows, is as prominent in&lt;br /&gt;the diagram as a table that may contain several million rows, other than the&lt;br /&gt;fact that it contains fewer columns. Only techniques such as  the use of&lt;br /&gt;naming conventions, e.g., special prefixes, would indicate its domain-value&lt;br /&gt;nature.&lt;/li&gt;&lt;li&gt;&lt;b&gt;The consolidation of domain-value tables simplifies the creation of&lt;br /&gt;user-interface generation code, or the use of generic routines to present these values in the interface. &lt;/b&gt;Such code has to look only in one place to consult&lt;br /&gt;metadata, which is then used to populate the contents of a control such as a&lt;br /&gt;combo-box or radio-button-list.  While it is possible to perform&lt;br /&gt;code-generation by consulting the RDBMS's system dictionary (specifically the&lt;br /&gt;foreign key constraints) such code is much more elaborate,  and is not&lt;br /&gt;readily portable across RDBMSs. If you are a professional developer building&lt;br /&gt;front-ends to databases, you will maximize your productivity by the use of&lt;br /&gt;frameworks or libraries (some of which you may have developed yourself) unless&lt;br /&gt;you are billing your client by the hour or by the number of lines of code that&lt;br /&gt;you write.&lt;/li&gt;&lt;li&gt;&lt;b&gt;It simplifies the administration of content maintenance greatly . &lt;/b&gt;&lt;br /&gt;&lt;span class="style1"&gt;The Domain values are mostly static, but not guaranteed to&lt;br /&gt;be so: they will change over time.&lt;/span&gt;&lt;b&gt; &lt;/b&gt; The individuals&lt;br /&gt;responsible for maintaining the domain values will be content experts or&lt;br /&gt;privileged users rather than database developers. It is much simpler to give&lt;br /&gt;such individuals read-write access to the contents of a few generic domain&lt;br /&gt;tables, while most other users of the system will have read-only access to these&lt;br /&gt;tables.  Consider, by contrast, the issues of ensuring appropriate&lt;br /&gt;role-based access to hundreds of tables.  (Again, if you're a DBA, are the&lt;br /&gt;number of DDL statements that you write reflected in your pay or your level of&lt;br /&gt;importance, or do you care about getting the job done efficaciously?)&lt;br /&gt;&lt;br /&gt;The job of a developer who has to build a content-maintenance user interface is&lt;br /&gt;simplified considerably. The Domain / Domain-value tables can be presented for&lt;br /&gt;editing through a simple form/sub-form user. A given domain can be searched by a&lt;br /&gt;keyword that is part of its description, or by the Table or Column that maps to&lt;br /&gt;it.  Note: In a usable interface, you want to support search by the&lt;br /&gt;Business Description associated with a Table or Column rather than the&lt;br /&gt;table/column internal names: this is simple enough to do if there is additional&lt;br /&gt;metadata that associates the internal table/column names with business&lt;br /&gt;descriptions.&lt;br /&gt;&lt;br /&gt;&lt;span class="style1"&gt;All three issues above emphasize the importance of looking&lt;br /&gt;at the big picture. &lt;/span&gt;The database schema is not only the Database&lt;br /&gt;Administrator's concern: the DBA has to partner with applications developers and&lt;br /&gt;power users/content experts to make sure that the schema is designed so as to&lt;br /&gt;facilitate productive software development and metadata-content maintenance.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5726392040504510271-5736505491404449206?l=metadata-musings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadata-musings.blogspot.com/feeds/5736505491404449206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://metadata-musings.blogspot.com/2009/06/consolidating-domain-values-into.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5726392040504510271/posts/default/5736505491404449206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5726392040504510271/posts/default/5736505491404449206'/><link rel='alternate' type='text/html' href='http://metadata-musings.blogspot.com/2009/06/consolidating-domain-values-into.html' title='Consolidating Domain Values into Generic Tables'/><author><name>Prakash M. Nadkarni</name><uri>http://www.blogger.com/profile/13718313173239117147</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
