
Using Dia to Interact With tedia2sql


Notes on Case

In general, you should try to use lowercase when dealing with tedia2sql and dealing with Dia in preparation for using tedia2sql to generate SQL DDL. Where you don't use lowercase, assume tedia2sql will be case-sensitive. In the near future, tedia2sql will support an option to convert every object, datatype, and keyword in your schema to UPPERCASE or lowercase.

In any case, tedia2sql prefers to get commands in lowercase. Attempts have been made to allow ALL UPPERCASE in such items as "NULL" or "NOT NULL" or whatever, but it is known that the code will perform best if you use lowercase when passing keywords to tedia2sql. In some cases, tedia2sql will convert your input to lowercase when dealing with it.

It bears repeating: soon there will be an option you can pass to tedia2sql to specify that you want keywords, datatypes, and the like in ALL UPPERCASE, so you shouldn't sweat this small detail right now.

UML and ERD

tedia2sql can interpret a Dia UML diagram in two different modes. The default is to use the ERD interpretation, where the diamond UML 'aggregation' symbol represents the ERD 'crows foot' symbol for the 'many' end of a one-to-many relationship, and the relationship direction symbol is used to indicate the 'owner' end of a one-to-one, or one-to-zero-or-one relationship. The relationship direction arrow points towards the 'owner' end.

In UML mode, selected with the -u flag, the conventional UML interpretation of aggregation and composition associations is used. The direction symbol is still used to indicate the 'owner' end of one-to-one associations that do not have aggregation or composition semantics.

Creating the Dia UML Diagram

Start up your copy of Dia. Choose the UML shape set.

It's important to note that you're editing a UML diagram, and not a database ERD. Thus, we don't call things "tables" and "columns," but instead, we call them "classes" and "attributes." However, since we're all DBAs here, sometimes we lapse into table/column terminology when it's convenient or when we forget. For the purpose of this document, they're synonymous. Here's a short list of common synonyms:

  • class == table == relation (== view)
  • attribute == column
  • association == foreign-key relationship == FK constraint
  • primary key == PK constraint

In general, you should find the following instructions mostly intuitive. In fact, you might start creating Classes and Associations right away without reading further, pausing to read these instructions again when you're ready to create views, figure out how to specify datatypes, nullability, primary keys, indexes, permissions, etc.

UML Classes

Create a new class and pick its background colour. Whenever you want to create another class of the same style, copy this generic class and paste it to make the new one.

Edit your class. Set the name to whatever you want. Note that the generated SQL DDL script will use the same case as you name your class here. This doesn't matter on RDBMSs like Oracle and Postgres that are case-insensitive, but it does matter on others like Sybase that are case-sensitive.

If you choose the "Abstract" checkbox, this will represent a view, rather than a table, in the final DDL.

Attributes for Tables

Each Attribute that you create will become a column in your table. The Attribute name and type will become column name/type.

Attribute types should be ANSI SQL 1992 (SQL'92) standard datatypes. Of course you can use datatypes for your target database, but this will limit portability of your schema. If you use SQL'92 datatypes, tedia2sql will automatically pick a datatype for your RDBMS if yours does not support SQL'92 datatypes. For example, if you use "timestamp" as a datatype and generate SQL DDL for Oracle, it will automatically convert this to "date".

The Value field can be "NULL" or "NOT NULL" to generate DDL indicating that this column is NULLable or not NULLable.

Value can also indicate default values: "default defaultValue" will specify that the column should default to defaultValue. Follow your RDBMS syntax here. In most RDBMSs, if the column type is a string, you must put single-quotes (or even, sometimes, double-quotes) around the default value, thusly: "default 'defaultValue'".

You can mix the NULLability of the column and DEFAULT of the column as well. For example, "default defaultValue NOT NULL" or "default defaultValue NULL" will specify a default value of defaultValue and that the column is not NULLable and NULLable respectively.

If your style (like mine) calls for not capitalising NULL and NOT NULL, you can put them in lowercase. However, tedia2sql will not properly parse mixed-case "NULL" and "NOT NULL". Choose either all uppercase or all lowercase and stick with it.

Set Visibility to "Protected" if you want this Attribute to be part of the Primary Key. This will put the "pound sign" ("#") to the left of the Attribute on the Class object in your diagram, visually indicating it's part of the Primary Key. Do not put "NOT NULL" or any variation into your value field, as tedia2sql will add "NOT NULL" automatically to all attributes that participate in the primary key.
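As a rough illustration of the rules above, here is a minimal Python sketch of how an Attribute's name, type, Value field, and primary key flag might combine into a column definition. The function name column_definition and the exact output layout are assumptions made for this example; this is not tedia2sql's own code.

```python
def column_definition(name, sql_type, value="", primary_key=False):
    """Build one column definition from a Dia attribute (illustrative only)."""
    parts = [name, sql_type]
    value = value.strip()
    if value:
        # The Value field is passed through as typed: "NULL", "NOT NULL",
        # "default 'x'", or a combination such as "default 0 NOT NULL".
        parts.append(value)
    if primary_key:
        # Per the text, NOT NULL is added automatically for attributes that
        # participate in the primary key, so Value should not repeat it.
        parts.append("NOT NULL")
    return " ".join(parts)


print(column_definition("id", "integer", primary_key=True))
print(column_definition("status", "varchar(10)", "default 'new' NOT NULL"))
```

The sketch appends "NOT NULL" in uppercase only for readability; the case tedia2sql actually emits is not specified here.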

Operations for Tables

Operations are used to create either indexes or permissions (grant) statements.

In the case of creating indexes, you want to fill out the operation as follows:

  • Operation Name: Index name, eg: idx_person_phone
  • Operation Type: Index type, eg: index or unique index
  • Parameters: Index columns -- Each parameter name is the column name
  • Stereotype: Index type -- For PostgreSQL, you can choose other than btree.

Example statement created:

			create unique index INDEXNAME on CLASSNAME (param1, param2)
		

In the case of creating permissions statements, you should fill out the operation thusly:

  • Operation Name: Permission scope, eg: select, insert, or all
  • Operation Type: Permission type, eg: grant
  • Parameters: Roles given the permission -- each parameter name is a role name given the permission.

Each parameter name given will create a separate permissions statement in the DDL file.

Example statements created:

			grant select,insert,update on CLASSNAME to param1
			grant select,insert,update on CLASSNAME to param2

You can ignore the Stereotype, Visibility, Class Scope, Query, and Inheritance types of the Operation in both Index and Permissions cases.

You can also ignore the Type and Default Value fields for parameters.
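The way Operation fields map onto index and permissions statements can be sketched in Python as follows. The function names are hypothetical and the code only mirrors the example statements shown above; it is not how tedia2sql is implemented.

```python
def index_statement(op_name, op_type, class_name, params):
    """Operation Name is the index name, Operation Type is 'index' or
    'unique index', and each parameter name is a column name."""
    columns = ", ".join(params)
    return f"create {op_type} {op_name} on {class_name} ({columns})"


def grant_statements(op_name, op_type, class_name, params):
    """Operation Name is the permission scope, Operation Type is 'grant',
    and each parameter name is a role; one statement is made per role."""
    return [f"{op_type} {op_name} on {class_name} to {role}" for role in params]


print(index_statement("INDEXNAME", "unique index", "CLASSNAME", ["param1", "param2"]))
for statement in grant_statements("select,insert,update", "grant", "CLASSNAME", ["param1", "param2"]):
    print(statement)
```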

Attributes for Views

If you have chosen the Abstract checkbox for the Class, then this represents a view. Each attribute will be a column. If your view joins multiple tables, you should prefix each column with the table name or table alias, eg: table.column.

You do not need to fill in the type or value fields of the attributes, since a view doesn't have such a concept.

Each attribute name will be turned into a single part of the select statement for the view, so you can get creative here, such as:

			tab1.name || '.' || tab2.owner as owner
		

Operations for Views

Each operation will become part of the from, where, order by, group by, and having clauses of your view. The operation name is the argument to the select section, and the type is the section it belongs to.

For instance, you might want two parts in your where clause. Thus, you'd create two different operations:

			name: (tab1.col1 = 'hello') type: where
			name: and (tab2.col1 = 7) type: where

Note that you should include the 'and' or 'or' in the Operation name field for the second and subsequent parts of the where clause.
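To make the view rules concrete, here is a small Python sketch that assembles a view from attribute names and (name, type) operations. The function name, the "create view ... as" wrapper, the clause ordering, and the choice of separators are all assumptions for illustration, not a description of tedia2sql's actual output.

```python
# Order in which SQL expects the clauses to follow the select list.
CLAUSE_ORDER = ["from", "where", "group by", "having", "order by"]


def view_statement(view_name, attributes, operations):
    """attributes: list of attribute names (select-list parts).
    operations: list of (operation name, operation type) pairs."""
    lines = [f"create view {view_name} as", "select " + ", ".join(attributes)]
    for clause in CLAUSE_ORDER:
        pieces = [name for name, op_type in operations if op_type == clause]
        if not pieces:
            continue
        # List-style clauses are comma-separated; where/having pieces carry
        # their own 'and'/'or', so they are simply joined with spaces.
        separator = ", " if clause in ("from", "group by", "order by") else " "
        lines.append(f"{clause} " + separator.join(pieces))
    return "\n".join(lines)


print(view_statement(
    "v_example",
    ["tab1.col1", "tab2.col1"],
    [("tab1", "from"), ("tab2", "from"),
     ("(tab1.col1 = 'hello')", "where"), ("and (tab2.col1 = 7)", "where")],
))
```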

UML Associations

Make sure both sides of your association are really attached to a connection point on the classes, or your SQL DDL won't properly create foreign key statements; an error message will be printed.

Association Name

If omitted, tedia2sql will generate a name for the association's foreign key constraints and for the centre (join) table for many-to-many associations.

ERD mode

This will be the name of the foreign key created. Commonly, DBAs name their foreign keys to give an idea of which two tables are related, eg: the Person and Account tables are related by a foreign key constraint called fk_prsn_acct. The automatically generated name in this case would be person_fk_accountId, if the name of the primary key is id (realistically, of course, we should expect that Person-Account would be many-to-many, but we ignore for now the possibility of joint accounts).

UML mode

In UML mode, the automatically generated name is always used for the name of the foreign key constraint. The relationship name may be used for documentation purposes.

Role

Each role is the name of an attribute in the class. This must be correct for the foreign-key statement to be syntactically correct. For one-to associations, the 'one' end of the association must be the (comma-separated) name(s) of the primary key's attributes, or the attributes of a unique index, or the name of a unique index on the table. Multiple names in a role must be in the same order as they appear in the table or respectively in the unique index's argument list.

At the other end, the role must be the name(s) of the foreign key attributes.

The types of corresponding primary and foreign key names must be the same.

If the role is omitted at the 'one' end, the primary key is used automatically as the attribute name at that end; if the role is omitted at the other end, a generated name (or names) is used (see Name Generation below).

In the Person/Account/id example above, if the role at the 'one' end is omitted it will use id as the primary key, and personId as the foreign key name if the role name at the 'many' end is omitted.

Automatic key generation

The -p name:type flag allows the user to omit primary keys from the Dia UML diagram; if a primary key is needed in a table, and there is no primary key marked in the class, then the primary key with the given name and type will be added.

The -f flag allows for the automatic addition of the required foreign key columns to the generated tables if the (specified or generated) foreign key names do not exist in the class. The generated entries take their type from the corresponding primary key names.

The key names are generated using the names as they would be expected - if a role name is used, then the generated key uses that name (or names); otherwise it uses the automatically generated name(s) as described above.

Automatic key generation works in both ERD and UML modes, and allows the user to eliminate much of what is really relational DBMS implementation detail from the data model diagram.

Aggregate/Composition

ERD mode

Whichever side is aggregate or composition is the 'many' side in the one-to-many relationship. The mode does not distinguish between aggregate and composition.

To create a one-to-one relationship, do not make either end of the relationship aggregate, but choose the proper direction to decide which is the parent table and which the child. I believe, but am not certain today (October 6th 2002) that the child table is really either 0 or 1 entry, and the parent 1, so that this is really one-to-0,1.

UML mode

Whichever side is aggregate or composition is the 'one' side in the one-to-many relationship. The multiplicity of the other side may be specified in the usual way.

If the relationship is aggregate, an on delete set NULL clause is added to the foreign key constraint in databases that support it; if it is a composition, then on delete cascade is added instead.
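The difference between the two modes can be summarised in a short Python sketch. The function and its return shape are hypothetical, and the sketch ignores whether the target database supports the on delete clause.

```python
# Clause added to the foreign key constraint in UML mode, per the text.
ON_DELETE = {
    "aggregate": "on delete set NULL",
    "composition": "on delete cascade",
}


def describe_adorned_end(mode, kind):
    """Return (which side the aggregate/composition end is, on-delete clause).

    mode is 'erd' or 'uml'; kind is 'aggregate' or 'composition'.
    """
    if kind not in ON_DELETE:
        raise ValueError(f"unknown association kind: {kind!r}")
    if mode == "erd":
        # ERD mode: the adorned end is the 'many' side, with no distinction
        # between aggregate and composition.
        return ("many", None)
    if mode == "uml":
        # UML mode: the adorned end is the 'one' side.
        return ("one", ON_DELETE[kind])
    raise ValueError(f"unknown mode: {mode!r}")


print(describe_adorned_end("erd", "aggregate"))
print(describe_adorned_end("uml", "composition"))
```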

Multiplicity

ERD mode

If you would like to add constraint enforcement clauses, such as this one for Oracle:

alter table child add foreign key (idx_iiparent_id) references parent (id) on delete cascade;

Then add the on delete cascade or other constraint enforcement into the Multiplicity text box. Currently, this constraint enforcement clause is only inserted for InnoDB, Postgres and Oracle, but if your DBMS of choice allows this, by all means submit a (trivial) patch to generate this syntax.

UML mode

In UML mode, Multiplicity is just multiplicity. If it is explicitly specified, it must be 1 (or 1..1) for the 'one' end of an aggregation or composition association.

Many-to-many associations

Both modes allow the specification of many-to-many relationships. They are signified by an unadorned relationship, using the multiplicity as it is intended in a UML diagram. The multiplicity of both ends of the relation must be specified for the association to be treated as many-to-many.

For example, in a more realistic Person/Account/id case where there can be joint accounts, and Persons are not only account-holders, specify the Person side as 1..* in the association (an Account must have at least one account holder), and the Account side as 0..* (Persons are not necessarily account-holders, but may have as many accounts as they wish).

A centre (join) table will be generated. Its name will be PersonAccount (or AccountPerson; the A, or left, name is used first), unless the association is named, in which case the name of the association is used as the name of the table.

If role names are not used, and the name of the primary key in both tables is id, then the table will be generated with columns personId and accountId; their types are taken from the types of the corresponding primary keys. The foreign key constraints in the generated table will have an on delete cascade or on delete set NULL clause if it is supported by the database.

Role names can be used to control the names of the columns in the centre table. The role name has the syntax [fkName][:pkName].

The primary key part of the role name names the primary key attribute(s) (or the name or attribute(s) of a unique index on the table) of the class at that (the referring) end of the association. This part of the role name would normally only be used to refer to a unique index, since the default is to use the primary key attributes.

The foreign key part of the name at the referring end of an association is used to name the column that holds the foreign key for the end of the association referred to.

For example, if there is a unique index on accountNumber in Account and a unique index on ssn in Person, then an association between them using the indexes instead of the primary keys could be set up with a role name :accountNumber at the Account end of the association, and :ssn at the Person end. If control of the names of the columns in the centre (join) table is also required, then the role names might be person_ssn:accountNumber at the Account end of the association, and account_acNum:ssn at the Person end.
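The [fkName][:pkName] role syntax can be illustrated with a small Python parser. The function name parse_role is an assumption for this example, as is the treatment of comma-separated names on either side of the colon (taken from the description of multi-attribute roles earlier in this document).

```python
def split_names(text):
    """Split a comma-separated list of names, dropping empty entries."""
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_role(role):
    """Split a role of the form [fkName][:pkName] into (fk names, pk names).

    Either part may be absent, in which case its list is empty and the
    defaults described in the text would apply.
    """
    fk_part, _, pk_part = role.partition(":")
    return split_names(fk_part), split_names(pk_part)


print(parse_role(":ssn"))
print(parse_role("person_ssn:accountNumber"))
```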

The foreign key naming scheme is consistent with the convention used for one-to-many relations: The name for the foreign key is at the opposite end of the association from the table being referred to.

One-to-one associations in UML mode

Wherever possible, UML mode makes use of multiplicity to determine the 'owner' end of a foreign key relationship. The one instance where this is not possible is in an unadorned (i.e. not composition or aggregate) one-to-one association. In this case, the same convention as in ERD mode is used: the association direction arrow indicates the 'owner', pointing towards the table containing the primary key.

Views and associations

In both modes, many-to-many associations are not permitted if either side of the association is a view (marked as a UML abstract class). Otherwise, version 1.2.10 tries to generate the same SQL DDL as 1.2.9 where there are associations involving views. Both cases probably generate bad SQL DDL, since "alter table" (not view) statements are generated, and if the view were alterable, then the constraints on the underlying base tables should apply.

If the "one" (owner) end of a one-to-many (or one-to-one) association is a view, the role must name the "primary key". Primary key and foreign key attributes will not be automatically generated for views if the -p or -f flags are used.

Placeholders - For Multi-page and Multi-file Models

It can be convenient to lay out large data models so that logical parts of them print on single sheets of paper, or even to split large models over several files. tedia2sql versions from 1.2.10b4 allow the definition of "placeholder" classes to make this more convenient (and in the case of multi-file models, possible).

A placeholder class is a class with the same name as a real class, but with no attributes or operations, and with a stereotype <<placeholder>>. Any associations between placeholder classes and real classes (or even between pairs of placeholders, though that is probably poor style) will act as though the association with the placeholder class is actually with its real class counterpart.

If an association is needed between a class and a class on another page, or in another file, create a placeholder class on the page where you want to place the association, and make the association to the placeholder. The association will act as though it refers to the corresponding real class.

Overuse of placeholders can probably render data models almost unreadable.

When tedia2sql processes a placeholder class, its corresponding real class must be accessible, either in the same file or in another file being processed at the same time. If a.dia contains class MyClass and b.dia contains a placeholder <<placeholder>>MyClass, then both a.dia and b.dia must appear in the argument list of tedia2sql so that <<placeholder>>MyClass can be processed correctly.
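The requirement that every placeholder has a real counterpart among the files being processed can be expressed as a simple check. The Python below is only a sketch; the data layout (a dict from file name to a list of class name and stereotype pairs) is invented for the example and does not reflect how tedia2sql reads Dia files.

```python
def missing_real_classes(files):
    """Return the names of placeholder classes that have no real class.

    files maps a file name to a list of (class_name, stereotype) pairs;
    a stereotype of 'placeholder' marks a placeholder class.
    """
    real, placeholders = set(), set()
    for classes in files.values():
        for name, stereotype in classes:
            if stereotype == "placeholder":
                placeholders.add(name)
            else:
                real.add(name)
    return sorted(placeholders - real)


# Processing b.dia alone leaves the placeholder unresolved ...
print(missing_real_classes({"b.dia": [("MyClass", "placeholder")]}))
# ... while processing a.dia and b.dia together resolves it.
print(missing_real_classes({
    "a.dia": [("MyClass", "")],
    "b.dia": [("MyClass", "placeholder")],
}))
```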

UML Components

The UML Component is used to insert initial values into tables in the SQL DDL. This is useful for codes tables and the like. tedia2sql uses the Component 'stereotype' as the table that will be inserted into, including columns if you don't want to insert all column values. The text inside the Component becomes the list of values that are inserted into the table.

Stereotype

This value simply becomes the table name in the insert statement, thusly:
			insert into stereotype values ( . . . )
		

Component Text

The text you type in simply becomes the values to put inside the values clause of the insert statement, thusly:
			insert . . . values ( componentText )
		
Each newline denotes a new set of values to insert. You must specify every column in your table, even if the column is nullable or has a default value. If you want to generate a SQL statement thusly:
			insert into stereotype ( col1, col2 ) values ( 1, 2) 
		
Then you should define your Component stereotype as name ( col1, col2 ) and define the columns correctly after that.
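A minimal Python sketch of the Component rules described above: the stereotype is used verbatim as the insert target, and each line of text becomes one values list. The function name and the skipping of blank lines are assumptions for illustration.

```python
def insert_statements(stereotype, component_text):
    """Build one insert statement per non-empty line of Component text."""
    statements = []
    for line in component_text.splitlines():
        line = line.strip()
        if line:
            statements.append(f"insert into {stereotype} values ( {line} )")
    return statements


# A stereotype that lists columns restricts the insert to those columns.
for statement in insert_statements("name ( col1, col2 )", "1, 2\n3, 4"):
    print(statement)
```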

UML SmallPackages

The UML SmallPackage is how tedia2sql inserts database-specific SQL that you want to include in your schema. This was specifically implemented for dropping and creating sequences and triggers, but it could be any RDBMS-specific SQL you want.

The SQL can be applied to the database generally, before or after ("pre" and "post" stereotypes), or appended to tables ("table" stereotypes), indexes ("index" stereotypes) and primary keys ("pk" stereotypes).

Stereotype

This value should be a comma-separated list of the databases for which you want to generate the SQL. Use all lowercase. Valid values are: sybase, oracle, db2, postgres, mysql, mssql, informix. In other words, any string valid to pass to the -t option is (once made lowercase) valid here.

Pre and Post

At the end of your comma-separated list of databases, you must also put a colon then "pre" or "post" to instruct tedia2sql to place these special SQL statements before (pre) or after (post) the schema (table) statements.

Typically, sequences would be :pre statements, and triggers would be :post.

The Stereotype might be "oracle,postgres:pre" meaning put these SQL statements before the table create statements, and put this SQL only for Oracle and Postgres databases.

Table

Adds the SmallPackage text as SQL statements between the closing ")" of the table column definitions and the create table statement end. After the comma-separated list of databases, put :table, and then a comma-separated list of table names (including generated table names), enclosed in parentheses, that are to have the extra clauses added.

For example, a Stereotype oracle:table(Person,Account) could be used to add a storage clause to the Person and Account tables for the example above.

		<<oracle:table(Person,Account)>>
		    storage (initial 10M next 1M pctincrease 0)
		

Index

Adds the SmallPackage text as SQL statements between the closing ")" of the column name list and the create index statement end. Similar syntax to the table specification, but use :index and the list of names is the list of index names that are to have the extra SQL clauses.

Primary Key

Adds the SmallPackage text as SQL statements between the closing ")" of the primary key name list and the constraint statement end. Similar syntax to the table specification, but use :pk and the list of names is the list of table names that are to have the extra SQL clauses.

Columns

Adds the SmallPackage text to the column definition part of the named tables. The text is inserted after the column definitions and the primary key constraint (if there is one), and just before the closing parenthesis. Use :columns in the SmallPackage name. For example, to check the type column of PersonType:

		<<oracle:columns(PersonType)>>
		constraint perstype_type
		    check (lower(type) in ('staff', 'accountholder', 'other'))
		

SmallPackage Text

For all of the above, this text is passed as-is into the SQL DDL. Thus, the text you enter here must be valid SQL for every RDBMS you name in the Stereotype.
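The SmallPackage stereotype forms described above (a database list, a colon, a kind, and an optional parenthesised name list) could be parsed along these lines. The regular expression and function name are assumptions for this Python sketch and are not taken from tedia2sql itself.

```python
import re

# databases ':' kind [ '(' names ')' ], allowing optional spaces.
STEREOTYPE = re.compile(r"^\s*([a-z0-9_,\s]+?)\s*:\s*(\w+)\s*(?:\(([^)]*)\))?\s*$")


def parse_stereotype(stereotype):
    """Return (databases, kind, names) for a SmallPackage stereotype."""
    match = STEREOTYPE.match(stereotype)
    if match is None:
        raise ValueError(f"unrecognised stereotype: {stereotype!r}")
    databases = [d.strip() for d in match.group(1).split(",") if d.strip()]
    kind = match.group(2)
    names = [n.strip() for n in (match.group(3) or "").split(",") if n.strip()]
    return databases, kind, names


print(parse_stereotype("oracle,postgres:pre"))
print(parse_stereotype("oracle:table(Person,Account)"))
```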

Type map

A "typemap" stereotype can be used to remap types used in the UML diagram to types used in the generated SQL. The type map itself is in the SmallPackage text and is a list of semicolon-terminated type definitions. In the example below, each definition has a comma-separated list of the newly defined types on the left of a colon and the type they map to on the right. The database name list makes the mapping specific to the databases listed.

		<<oracle: typemap>>
		int: number(10);
		string, character: varchar2;
		fixed3: number(25.3);
		

The interpretation is recursive (a type may be defined in terms of another defined type). Because the interpretation doesn't occur until after all types have been defined, the definitions may appear in any order. The interpretation stops when no further mappings are applicable, which means that if a typemap entry does not exist for a type, the type in the UML is passed through unchanged to the SQL. To guard against infinite recursion, the number of mappings applied to any single type is limited to about 100.
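The recursive interpretation can be sketched in Python as below. The sketch follows the layout of the example typemap (names being defined on the left of the colon, the type they map to on the right); the function names, the exact limit of 100, and the choice to raise an error when the limit is hit are assumptions, since the text does not say what tedia2sql does in that case.

```python
MAX_REMAPS = 100  # the text says "about 100"; the exact figure is assumed


def parse_typemap(text):
    """Turn 'a, b: target;' entries into a dict {a: target, b: target}."""
    mapping = {}
    for entry in text.split(";"):
        if ":" not in entry:
            continue  # skip the empty piece after the final semicolon
        names, target = entry.split(":", 1)
        for name in names.split(","):
            mapping[name.strip()] = target.strip()
    return mapping


def resolve_type(type_name, mapping):
    """Follow mappings until none applies; unmapped types pass through."""
    current = type_name
    for _ in range(MAX_REMAPS):
        if current not in mapping:
            return current
        current = mapping[current]
    raise ValueError(f"too many remappings for {type_name!r}")


typemap = parse_typemap("""
int: number(10);
string, character: varchar2;
money: fixed3;
fixed3: number(25.3);
""")
print(resolve_type("money", typemap))
print(resolve_type("date", typemap))
```

The money entry is a hypothetical addition to the example typemap, included only to show a type defined in terms of another defined type and listed before it.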

Type interpretation occurs late, only when the SQL statements are being generated. For other purposes, only the user types are considered. For example, when checking the types of foreign vs primary keys in a relationship, tedia2sql will consider int and number(10) to be different types even if you use the typemap above.

Class and Attribute Comments

Version 0.91 (and later) of dia allows comments in classes and on class attributes. Class comments (in the Class tab of the Class Properties popup window) are passed through into the generated SQL - placed just before the table definition for the class. Class attribute comments (in the Attributes tab of Class Properties popup window) are placed inline after the row in the table definition that corresponds to the attribute.

The implementation of this feature allows classes and attributes to not have <comment> attributes in the dia files, so it is backward-compatible with pre 0.91 files that do not have comments on classes.

UML Notes text is not passed through to the SQL, though that would probably be useful.

Name Generation

The name generation code in tedia2sql follows a fairly simple set of rules.

Table Names

The only generated table names are for centre (join) tables in many-to-many associations. They are constructed by concatenating class name on the left (dia 'A') side with the class name on the right (dia 'B') side of the association. The left side name is unmodified; the first character of the right side name is capitalised. So an unnamed many-to-many association between person and account would have a centre table named personAccount.

The automatic name generation can be overridden by naming the association.
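The centre table naming rule is simple enough to show in a few lines of Python. The function name is hypothetical.

```python
def join_table_name(left_class, right_class):
    """Left (dia 'A') name unchanged, right (dia 'B') name with its first
    character capitalised."""
    return left_class + right_class[:1].upper() + right_class[1:]


print(join_table_name("person", "account"))
```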

Foreign Key Names

A foreign key name is generated from the name of the class that it refers to and the name of the (element of) the primary key of that class.

Any leading capitalised part of the class name is converted to lower case, and the first part of the primary key name is capitalised. The "leading capitalised part" is either a single capital at the start, or the string of all but the last capitals if there are more than one: the leading capitalised part of the class name UKPrimeMinisters is UK (not UKP).

In the example many-to-one relationship Account to Person (a person may have multiple accounts, but there are no joint accounts), if the primary key of Person is id, then the generated foreign key name for the relationship is personId.
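The foreign key naming rule can be sketched in Python as follows. The function names are hypothetical, and the handling of a class name that consists entirely of capitals is an assumption, since the text does not cover that case.

```python
import re


def lower_leading_capitals(name):
    """Lower-case the 'leading capitalised part' of a class name."""
    run = re.match(r"[A-Z]+", name)
    if run is None:
        return name
    length = run.end()
    # With several capitals followed by more text, the last capital starts
    # the next word and is left alone: UKPrimeMinisters -> ukPrimeMinisters.
    # (Assumption: a name made up only of capitals is lower-cased whole.)
    if 1 < length < len(name):
        length -= 1
    return name[:length].lower() + name[length:]


def foreign_key_name(referenced_class, pk_name):
    """Referenced class name (leading capitals lowered) + capitalised pk."""
    return lower_leading_capitals(referenced_class) + pk_name[:1].upper() + pk_name[1:]


print(foreign_key_name("Person", "id"))
print(foreign_key_name("UKPrimeMinisters", "id"))
```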

Primary key constraint names

Primary key constraint names are built by prefixing the string pk_ to the capitalised class name. The primary key constraint name for Person is pk_Person. This rule is also followed for the constraint names in tables with generated names.

Foreign key constraint names

Foreign key constraint names are generated by concatenating the lower-cased (as for the leading part of foreign key names) referenced class name, the string _fk_, the capitalised class name of the referencing class, followed by the capitalised name of (the first element of) the primary key of the referenced class; this part after the _fk_ is essentially the capitalised name of the foreign key.

Name shortening

If a generated name is too long, it is shortened to the maximum name allowed by the DBMS.

The name is first shortened by removing vowels from the name; part at a time, starting with the first part.

If this is insufficient, the centre of each of the parts of the name is taken out. Up to this point, names remain more-or-less comprehensible.

If the name is still too long, a base-64 MD5 checksum is constructed for the whole name and that is used for the name of the table (shortened if necessary by truncation). This results in a most-likely-unique, but incomprehensible, name.
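The three shortening stages can be sketched in Python as below. Many details are assumptions made for the example: that the "parts" of a name are separated by underscores, that the first character of a part survives vowel removal, and that taking the centre out of a part leaves its first and last characters. The real rules live in sub makeName and may well differ.

```python
import base64
import hashlib

VOWELS = set("aeiouAEIOU")


def drop_vowels(part):
    # Keep the first character so the part stays recognisable.
    return part[:1] + "".join(c for c in part[1:] if c not in VOWELS)


def drop_centre(part):
    # Keep only the first and last characters of a longer part.
    return part if len(part) < 3 else part[0] + part[-1]


def shorten(name, max_len, sep="_"):
    """Shorten name to at most max_len characters, stage by stage."""
    if len(name) <= max_len:
        return name
    parts = name.split(sep)
    for squeeze in (drop_vowels, drop_centre):
        # Work through the parts one at a time, starting with the first,
        # and stop as soon as the name fits.
        for i in range(len(parts)):
            parts[i] = squeeze(parts[i])
            candidate = sep.join(parts)
            if len(candidate) <= max_len:
                return candidate
    # Last resort: base-64 MD5 checksum of the whole name, truncated.
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:max_len]


print(shorten("person_fk_accountId", 15))
```

Note that base-64 text can contain characters such as + and /, which most DBMSs do not accept in unquoted identifiers; this sketch makes no attempt to deal with that.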

De gustibus non disputandum est

Name conventions in programming are a matter of taste. What's been implemented suits the author of the name generation code and the project he's working on. It may not suit you.

If it doesn't, the code for name generation is all in sub makeName. Please feel free to add code to suit other name generation conventions, with suitable switches for enabling it. Try not to change the code in ways that might change the table names or column names in existing DIAgrams.