Using Dia to Interact With tedia2sql
Notes on Case
In general, you should try to use lowercase when dealing with tedia2sql and
dealing with Dia in preparation for using tedia2sql to generate SQL DDL. Where
you don't use lowercase, assume tedia2sql will be case-sensitive. In the
near future, tedia2sql will support an option to convert every object, datatype,
and keyword in your schema to UPPERCASE or lowercase.
In any case, tedia2sql prefers to get commands in lowercase.
Attempts have been made to allow ALL UPPERCASE in such items as "NULL" or
"NOT NULL" or whatever, but it is known that the code will perform best if you
use lowercase when passing keywords to tedia2sql. In some cases, tedia2sql will
convert your input to lowercase when dealing with it.
It bears repeating, soon there will be an option you can pass to tedia2sql to
specify you want keywords, datatypes, and the like in ALL UPPERCASE, so you
shouldn't sweat this small detail right now.
UML and ERD
tedia2sql can interpret a Dia UML diagram in two different
modes. The default is to use the ERD interpretation, where
the diamond UML 'aggregation' symbol represents the ERD
'crows foot' symbol for the 'many' end of a one-to-many
relationship, and the relationship direction symbol
is used to indicate the 'owner' end of a one-to-one,
or one-to-zero-or-one relationship. The relationship direction
arrow points towards the 'owner' end.
In UML mode, using the -u flag, the conventional UML interpretation
of aggregation and composition associations is used. The direction symbol
is still used to indicate the 'owner' end of one-to-one associations
that do not have aggregation or composition semantics.
Creating the Dia UML Diagram
Start up your copy of Dia. Choose the UML shape set.
It's important to note that you're editing a UML diagram, and not a database
ERD. Thus, we don't call things "tables" and "columns," but instead, we call
them "classes" and "attributes." However, since we're all DBAs here, sometimes
we lapse into table/column terminology when it's convenient or when we forget.
For the purpose of this document, they're synonymous. Here's a short list of
equivalent terms:
- class == table == relation (== view)
- attribute == column
- association == foreign-key relationship == FK constraint
- primary key == PK constraint
In general, you should find the following instructions mostly intuitive. In
fact, you might start creating Classes and Associations right away without
reading further, pausing to read these instructions again when you're ready
to create views, figure out how to specify datatypes, nullability, primary keys,
indexes, permissions, etc.
Create a new class and pick its background colour. Whenever you want to create
another class of the same style, copy this generic class and paste it to make
the new one.
Edit your class. Set the name to whatever you want. Note that the generated
SQL DDL script will use the same case as you name your class here. This doesn't
matter on RDBMSs like Oracle and Postgres that are case-insensitive, but it
does matter on others like Sybase that are case-sensitive.
If you choose the "Abstract" checkbox, this will represent a view, rather than
a table, in the final DDL.
Attributes for Tables
Each Attribute that you create will become a column in your table. The Attribute
name and type will become column name/type.
Attribute types should be ANSI SQL 1992 (SQL'92) standard datatypes. Of course you can
use datatypes for your target database, but this will limit portability of your
schema. If you use SQL'92 datatypes, tedia2sql will automatically pick
a datatype for your RDBMS if yours does not support SQL'92 datatypes. For example,
if you use "timestamp" as a datatype and generate SQL DDL for Oracle, it will
automatically convert this to "date".
The Value field can be "NULL" or "NOT NULL" to generate DDL indicating that this
column is NULLable or not NULLable.
Value can also indicate default values: "default defaultValue" will specify
that the column should default to defaultValue. Follow your RDBMS syntax
here. In most RDBMSs, if the column type is a string, you must put single-quotes
(or even, sometimes, double-quotes) around the default value, thusly: "default 'defaultValue'".
You can mix the NULLability of the column and DEFAULT of the column as well. For
example, "default defaultValue NOT NULL" or "default defaultValue NULL"
will specify a default value of defaultValue and that the column is
not NULLable and NULLable respectively.
If your style (like mine) calls for not capitalising NULL and NOT NULL, you can
put them in lowercase. However, tedia2sql will not properly parse mixed-case "NULL"
and "NOT NULL". Choose either all uppercase or all lowercase and stick with it.
Set Visibility to "Protected" if you want this Attribute to be
part of the Primary Key. This will put the "pound sign" ("#")
to the left of the Attribute on the Class object in your
diagram, visually indicating it's part of the Primary Key. Do
not put "NOT NULL" or any variation into your value field, as
tedia2sql will add "NOT NULL" automatically to all attributes
that participate in the primary key.
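The attribute rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not tedia2sql's own code; the function name column_definition and its parameters are made up for the example.

```python
def column_definition(name, sql_type, value="", primary_key=False):
    """Build one column line from an attribute's name, type and Value field.

    Hypothetical helper: it mirrors the rules described in the text only.
    """
    parts = [name, sql_type]
    value = value.strip()
    if value:
        # The Value field (default and/or nullability) is passed through as typed.
        parts.append(value)
    # Attributes in the primary key get "not null" automatically,
    # so it is added only when the Value field does not already say so.
    if primary_key and "not null" not in value.lower():
        parts.append("not null")
    return " ".join(parts)


print(column_definition("id", "integer", primary_key=True))
print(column_definition("name", "varchar(40)", "default 'unknown' NOT NULL"))
```

The first call shows why you leave the Value field empty for a "Protected" attribute: the not null is supplied for you.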
Operations for Tables
Operations are used to create either indexes or permissions (grant) statements.
In the case of creating indexes, you want to fill out the operation as follows:
- Operation Name: Index name, e.g. idx_person_phone
- Operation Type: Index type, e.g. index or unique index
- Parameters: Index columns -- Each parameter name is the column name
- Stereotype: Index type -- For PostgreSQL, you can choose other than btree.
Example statement created:
create unique index INDEXNAME on CLASSNAME (param1, param2)
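As a rough sketch of how the operation fields line up with the statement above (an illustration only, with a made-up function name, not the code tedia2sql runs):

```python
def index_statement(class_name, op_name, op_type, params):
    """Assemble a create-index statement from the Operation fields.

    op_name is the index name, op_type is "index" or "unique index",
    and params holds the parameter names, i.e. the indexed columns.
    """
    return f"create {op_type} {op_name} on {class_name} ({', '.join(params)})"


print(index_statement("person", "idx_person_phone", "unique index",
                      ["area_code", "phone"]))
```

The table and column names here are invented for the example.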
In the case of creating permissions statements, you should fill out the operation thusly:
- Operation Name: Permission scope, e.g. select, insert, or all
- Operation Type: Permission type, e.g. grant
- Parameters: Roles given the permission -- each parameter name is
a role name given the permission.
Each parameter name given will create a separate permissions statement in the
generated SQL DDL.
Example statements created:
grant select,insert,update on CLASSNAME to param1
grant select,insert,update on CLASSNAME to param2
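The one-statement-per-role behaviour can be pictured like this. It is a sketch under the assumptions stated in the text; grant_statements is a hypothetical name and the role names are invented.

```python
def grant_statements(class_name, op_name, op_type, roles):
    """Return one permissions statement per role (parameter name).

    op_name is the permission scope (e.g. "select,insert,update"),
    op_type is the permission type (e.g. "grant").
    """
    return [f"{op_type} {op_name} on {class_name} to {role}" for role in roles]


for statement in grant_statements("person", "select,insert,update", "grant",
                                  ["app_user", "report_user"]):
    print(statement)
```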
You can ignore the Stereotype, Visibility, Class Scope, Query, and Inheritance types of
the Operation in both Index and Permissions cases.
You can also ignore the Type and Default Value fields for parameters.
Attributes for Views
If you have chosen the Abstract checkbox for the Class, then this represents a
view. Each attribute will be a column. If your view joins multiple tables, you
should prefix each column with the table name or table alias, e.g. table.column.
You do not need to fill in the type or value fields of the attributes, since
a view doesn't have such a concept.
Each attribute name will be turned into a single part of the select statement
for the view, so you can get creative here, such as:
tab1.name || '.' || tab2.owner as owner.
Operations for Views
Each operation will become part of the from, where, order by, group by, and having
clauses of your view. The operation name is the argument to the select section,
and the type is the section it belongs to.
For instance, you might want two parts in your where clause. Thus, you'd choose
for two different operations:
name: (tab1.col1 = 'hello') type: where
name: and (tab2.col1 = 7) type: where
Note that you should include the 'and' or 'or' in the Operation name field for
the second and subsequent parts of the where clause.
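To make the attribute and operation roles concrete, here is a small sketch of how a view statement could be put together from them. It is not tedia2sql's implementation: the function name, the clause ordering, and the choice to join several from entries with commas are all assumptions made for the example.

```python
# Assumed order in which the clause types are emitted.
CLAUSE_ORDER = ["from", "where", "group by", "having", "order by"]


def view_statement(view_name, attributes, operations):
    """attributes: select expressions; operations: (name, type) pairs."""
    lines = [f"create view {view_name} as",
             "select " + ", ".join(attributes)]
    for clause in CLAUSE_ORDER:
        parts = [name for name, op_type in operations if op_type == clause]
        if not parts:
            continue
        # where/having parts already carry their own 'and'/'or',
        # so they are joined with a space; list-like clauses use commas.
        sep = " " if clause in ("where", "having") else ", "
        lines.append(f"{clause} " + sep.join(parts))
    return "\n".join(lines)


print(view_statement(
    "v_example",
    ["tab1.name", "tab2.owner"],
    [("tab1", "from"), ("tab2", "from"),
     ("(tab1.col1 = 'hello')", "where"),
     ("and (tab2.col1 = 7)", "where")],
))
```

This shows why the second where operation has to carry its own 'and': the parts are simply strung together in order.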
Make sure both sides of your association are really attached to a connection
point on the classes, or your SQL DDL won't properly create foreign key
statements; an error message will be printed.
If omitted, tedia2sql will generate a name for the association's
foreign key constraints and for the centre (join) table
for many-to-many associations.
If given, the association name becomes the name of the foreign key created. Commonly,
DBAs name their foreign keys to give an idea of which two tables
are related, e.g. the Person and Account tables might be related
by a foreign key constraint called fk_prsn_acct.
The automatically generated name in this case
would be person_fk_accountId, if the name of the
primary key is id (realistically, of course, we
should expect that Person-Account would be many-to-many,
but we ignore for now the possibility of joint accounts).
In UML mode, the automatically generated name is always used
for the name of the foreign key constraint. The relationship
name may be used for documentation purposes.
Each role is the name of an attribute in the class. This must
be correct for the foreign-key statement to be syntactically
correct. For one-to
associations, the 'one' end of
the association must be the (comma-separated) name(s) of the
primary key's attributes, or the attributes of a unique index,
or the name of a unique index on the table. Multiple names in a
role must be in the same order as they appear in the table or
respectively in the unique index's argument list.
At the other end, the role must be the name(s) of the foreign key attribute(s).
The types of corresponding primary and foreign key names must be the same.
If the role is omitted at the 'one' end, the primary key
is used automatically as the attribute name at that end;
if the role is omitted at the other end, a generated name
(or names) is used (see Name Generation below).
In the Person/Account/id example above,
if the role at the 'one' end is omitted it will use id
as the primary key, and personId as the foreign key
name if the role name at the 'many' end is omitted.
Automatic key generation
The -p name:type flag allows the user to omit primary keys
from the Dia UML diagram; if a primary key is needed in a table, and
there is no primary key marked in the class, then the primary key
with the given name and type will be added.
The -f flag allows for the automatic addition of the required
foreign key columns to the generated tables if the (specified or
generated) foreign key names do not exist in the class. The generated
entries take their type from the corresponding primary key names.
The key names are generated using the names as they would be expected:
if a role name is used, then the generated key uses that name (or names);
otherwise it uses the automatically generated name(s) as described above.
Automatic key generation works in both ERD and UML modes,
and allows the user to eliminate much of what is really
relational DBMS implementation detail from the data model.
In ERD mode, whichever side is aggregate or composition is the 'many' side in
the one-to-many relationship. ERD mode does not distinguish between
aggregate and composition.
To create a one-to-one relationship, do not make either end of the relationship
aggregate, but choose the proper direction to decide which is the parent
table and which the child. I believe, but am not certain today (October 6th 2002)
that the child table is really either 0 or 1 entry, and the parent 1, so that
this is really one-to-0,1.
In UML mode, whichever side is aggregate or composition is the 'one'
side in the one-to-many relationship. The multiplicity of
the other side may be specified in the usual way.
If the relationship is aggregate, an on delete set NULL
clause is added to the foreign key constraint in
databases that support it; if it is a composition, then
on delete cascade is added instead.
In ERD mode, if you would like to add constraint enforcement clauses, such as this one for Oracle:
alter table child add foreign key (idx_iiparent_id) references parent (id) on delete cascade;
then put the on delete cascade or other constraint
enforcement clause into the Multiplicity text box. Currently, this
constraint enforcement clause is only inserted for InnoDB,
Postgres and Oracle, but if your DBMS of choice allows this,
by all means submit a (trivial) patch to generate this syntax.
In UML mode, Multiplicity is just multiplicity. If it is
explicitly specified, it must be 1 (or 1..1) for the 'one'
end of an aggregation or composition association.
Both modes allow the specification of many-to-many relationships.
They are signified by an unadorned relationship, using the multiplicity
as it is intended in a UML diagram. The multiplicity of both ends of
the relation must be specified for the association to be treated as
many-to-many.
For example, in a more realistic Person/Account/id
case where there can be joint accounts, and Persons are not only
account-holders, specify the Person side as 1..* in the association
(an Account must have at least one account holder), and the Account
side as 0..* (Persons are not necessarily account-holders, but may
have as many accounts as they wish).
A centre (join) table will be generated; its name will be
PersonAccount (or AccountPerson; the A, or left, name is
used first), unless the association is named; in which case
the name of the association is used as the name of the table.
If role names are not used, and the name of the primary key
in both tables is id, then the table will be generated
with columns personId and accountId; their
types are taken from the types of the corresponding primary
keys. The foreign key constraints in the generated table will
have an on delete cascade or
set NULL clause if it is supported by the database.
Role names can be used to control the names of the
columns in the centre table. The role name has the
The private key part of the role name names the private key
attribute(s) (or the name or attribute(s) of a unique index
on the table) of the class at that (the referring)
end of the association. This part of the role name would normally
only be used to refer to a unique index, since the default is to use
the private key attributes.
The foreign key part of the name at the referring
end of an association is used to name the column that holds
the foreign key for the end of the association referred to.
For example, if there is a unique index on accountNumber
in Account and a unique index on ssn in
Person, then an association between them using the
indexes instead of the primary keys could be set up with a
role name of :accountNumber at the Account
end of the association, and :ssn at the Person
end. If control of the names of the columns in the centre
(join) table is also required, then the role names might be
person_ssn:accountNumber at the Account
end of the association, and a correspondingly formed role name
at the Person end.
The foreign key naming scheme is consistent with the
convention used for one-to-many relations: The name for the
foreign key is at the opposite end of the association from
the table being referred to.
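Judging only from the examples above (:accountNumber and person_ssn:accountNumber), a role name appears to consist of an optional foreign key part, a colon, and a private key part. The sketch below splits a role name on that assumption. The function name is hypothetical, and the handling of a role with no colon at all is a guess rather than documented behaviour.

```python
def parse_role(role):
    """Split a role name into (foreign key names, private key names).

    Assumes the form "foreignKeyPart:privateKeyPart", with either part
    optionally empty and each part a comma-separated list of names.
    A role without a colon is treated here as a foreign key part only.
    """
    fk_part, _, key_part = role.partition(":")

    def names(text):
        return [n.strip() for n in text.split(",") if n.strip()]

    return names(fk_part), names(key_part)


print(parse_role(":accountNumber"))
print(parse_role("person_ssn:accountNumber"))
```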
One-to-one associations in UML mode
Wherever possible UML mode makes use of multiplicity
to determine the 'owner' end of a foreign key
relationship. The one instance where this is not possible
is in an unadorned (i.e. not composition or aggregate)
one-to-one association. In this case, the same convention
as in ERD mode applies: the association direction arrow indicates
the 'owner'; the arrow points in the direction of the
'owner'/table containing the primary key.
Views and associations
In both modes, many-to-many associations are not permitted
if either side of the association is a view (marked as
a UML abstract class). Otherwise, version 1.2.10 tries
to generate the same SQL DDL as 1.2.9 where there are
associations involving views. Both cases probably generate
bad SQL DDL, since "alter table" (not view) statements
are generated, and if the view were alterable, then the
constraints on the underlying base tables should apply.
If the "one" (owner) end of a one-to-many (or one-to-one)
association is a view, the role must name the "primary
key". Primary key and foreign key attributes will not be
automatically generated for views if the -p or
-f flags are used.
Placeholders - For Multi-page and Multi-file Models
It can be convenient to lay out large data models so
that logical parts of them print on single sheets of
paper, or even to split large models over several files.
tedia2sql versions from 1.2.10b4 allow the definition
of "placeholder" classes to make this more convenient
(and in the case of multi-file models, possible).
A placeholder class is a class with the same name as a
real class, but with no attributes or operations, and with
the <<placeholder>> stereotype.
Any associations between placeholder classes and real
classes (or even between pairs of placeholders, though
that is probably poor style) will act as though the
association with the placeholder class is actually with
its real class counterpart.
If an association is needed between a class and a
class on another page, or in another file, create
a placeholder class on the page where you want to
place the association, and make the association to the
placeholder. The association will act as though it refers
to the corresponding real class.
Overuse of placeholders can probably render data models almost unreadable.
When a placeholder class is processed in tedia2sql,
its corresponding real class must be accessible,
either in the same file, or in another file
being processed at the same time. If
a.dia contains class MyClass and
b.dia contains a placeholder for it, then both files
must appear in the argument list of tedia2sql so that
<<placeholder>>MyClass can be resolved.
The UML Component is used to insert initial values into
tables in the SQL DDL. This is useful for codes tables
and the like. tedia2sql uses the Component 'stereotype'
as the table that will be inserted into, including columns
if you don't want to insert all column values. The text
inside the Component becomes the list of values that are
inserted into the table.
The stereotype value simply becomes the table name in the insert
statement:
insert into stereotype values ( . . . )
The text you type in simply becomes the values to put
inside the values clause of the insert statement, thusly:
insert . . . values ( componentText )
Each newline denotes a new set of values to insert. You
must specify every column in your table, even if the
column is nullable or has a default value. If you want
to generate a SQL statement thusly:
insert into stereotype ( col1, col2 ) values ( 1, 2)
Then you should define your Component stereotype as name ( col1, col2 )
and define the columns correctly after that.
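Put together, the Component rules amount to something like the following sketch. The function name and the sample table are invented for illustration, and this is not the tedia2sql source.

```python
def insert_statements(stereotype, component_text):
    """One insert per non-empty line of the Component text.

    The stereotype is used verbatim after "insert into", so it may be a
    bare table name or a name followed by a column list.
    """
    return [f"insert into {stereotype} values ( {line.strip()} )"
            for line in component_text.splitlines() if line.strip()]


text = "1, 'staff'\n2, 'accountholder'\n"
for statement in insert_statements("person_type ( id, name )", text):
    print(statement)
```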
The UML SmallPackage is how tedia2sql inserts
database-specific SQL that you want to include in your
schema. This was specifically implemented for dropping
and creating sequences and triggers, but it could be
any RDBMS-specific SQL you want.
The SQL can be applied to the database generally, before
or after ("pre" and "post" stereotypes), or appended
to tables ("table" stereotypes), indexes ("index"
stereotypes) and primary keys ("pk" stereotypes).
This value should be a comma-separated list of the
databases for which you want to generate the SQL. Use all
lowercase. Valid values are: sybase, oracle,
db2, postgres, mysql, mssql,
informix. In other words, any string valid to
pass to the -t option is (once made lowercase) valid here.
Pre and Post
At the end of your comma-separated list of databases,
you must also put a colon then "pre" or "post" to
instruct tedia2sql to place these special SQL statements
before (pre) or after (post) the schema.
Typically, sequences would be :pre statements, and
triggers would be :post.
The Stereotype might be "oracle,postgres:pre" meaning put
these SQL statements before the table create statements,
and put this SQL only for Oracle and Postgres databases.
Adds the SmallPackage text as SQL statements between
the closing ")" of the table column definitions
and the create table statement end.
After the comma-separated list of databases, put
:table, and then a comma-separated list of
table names (including generated table names), enclosed
in parentheses, that are to have the extra clauses added.
For example, a Stereotype of
oracle:table(Person,Account) could be used to
add a storage clause to the Person and Account tables from
the example above, with SmallPackage text such as:
storage (initial 10M next 1M pctincrease 0)
Adds the SmallPackage text as SQL statements between the
closing ")" of the column name list and the create
index statement end. Similar syntax to the table
specification, but use :index, and the list
of names is the list of index names that are to have
the extra SQL clauses.
Adds the SmallPackage text as SQL statements between
the closing ")" of the primary key name list and the
constraint statement end. Similar syntax to the
table specification, but use :pk, and the
list of names is the list of table names that are to
have the extra SQL clauses.
Adds the SmallPackage text to the column definition
part of the named tables. The text is inserted
after the column definitions and the primary key
constraint (if there is one), and just before the
closing parenthesis. Use :columns in the
SmallPackage name. For example, to check the type
column of PersonType:
check (lower(type) in ('staff', 'accountholder', 'other'))
For all of the above, this text is passed as-is into the SQL
DDL. Thus, the SQL you type here must be valid for every RDBMS
you list in the Stereotype.
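The stereotype forms described above (a database list, a colon, a keyword, and for some keywords a parenthesised name list) can be picked apart with a small parser. The sketch below is an illustration only; the regular expression and function name are not taken from tedia2sql, and the typemap stereotype is left out.

```python
import re

# databases ":" kind [ "(" names ")" ]
STEREOTYPE = re.compile(
    r"^(?P<dbs>[^:]+):(?P<kind>[a-z]+)(?:\((?P<names>[^)]*)\))?$")


def parse_stereotype(stereotype):
    """Return (databases, kind, names) for a SmallPackage stereotype."""
    match = STEREOTYPE.match(stereotype.strip())
    if match is None:
        raise ValueError(f"unrecognised stereotype: {stereotype!r}")
    databases = [d.strip() for d in match.group("dbs").split(",")]
    names = [n.strip() for n in (match.group("names") or "").split(",")
             if n.strip()]
    return databases, match.group("kind"), names


print(parse_stereotype("oracle,postgres:pre"))
print(parse_stereotype("oracle:table(Person,Account)"))
```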
A "typemap" stereotype can be used to remap types used in
the UML diagram to types used in the generated SQL. The
type map itself is in the SmallPackage text and is a list
of semicolon-separated type definitions, with the base type
on the left of a colon, and a comma-separated list of new
defined types on the right. The database name list makes the
mapping specific to the databases listed.
string, character: varchar2;
The interpretation is recursive (a type may be defined
in terms of another defined type), and because the
interpretation doesn't occur until after all types have
been defined, may be in any order. The interpretation
stops when no further mappings are applicable, which
means that if a typemap entry does not exist for a type,
the type in the UML is passed through unchanged to the
SQL. To guard against infinite recursion the number of
mappings to remap any single type is limited to about 100.
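The recursive interpretation with its guard can be sketched as a simple loop. This assumes the typemap has already been parsed into a dictionary from diagram type to replacement type. The names are hypothetical, and raising an error when the limit is hit is an assumption, since the text does not say what tedia2sql does at that point.

```python
MAX_REMAPS = 100  # the text says "about 100"


def resolve_type(user_type, typemap):
    """Follow typemap entries until no further mapping applies."""
    current = user_type
    for _ in range(MAX_REMAPS):
        if current not in typemap:
            # No entry: the type passes through unchanged.
            return current
        current = typemap[current]
    raise RuntimeError(f"too many remappings for type {user_type!r}")


typemap = {"name": "string", "string": "varchar2"}
print(resolve_type("name", typemap))
print(resolve_type("integer", typemap))
```

Because lookups happen only at resolution time, the order in which entries were defined does not matter.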
Type interpretation occurs late; only when the
SQL statements are being generated. For other purposes,
only the user types are considered. For example,
when checking the types of foreign vs private keys in a
relationship, tedia2sql will consider a user-defined type and
number(10) to be different
types even if you use a typemap that maps one to the other.
Class and Attribute Comments
Version 0.91 (and later) of dia allows comments in classes and on
class attributes. Class comments (in the Class tab of
the Class Properties popup window) are passed through
into the generated SQL - placed just before the table
definition for the class. Class attribute comments
(in the Attributes tab of Class Properties popup window)
are placed inline after the row in the table definition
that corresponds to the attribute.
The implementation of this feature allows classes and
attributes to not have comments
in the dia files, so it is backward-compatible with pre-0.91
files that do not have comments on classes.
UML Notes text is not passed through to the SQL, though
that would probably be useful.
Name Generation
The name generation code in tedia2sql follows a fairly
simple set of rules.
The only generated table names are for centre
(join) tables in many-to-many associations. They are
constructed by concatenating class name on the left
(dia 'A') side with the class name on the right (dia
'B') side of the association. The left side name is
unmodified; the first character of the right side name
is capitalised. So the unnamed many-to-many association
between Person and Account described earlier would
have a centre table named PersonAccount.
The automatic name generation can be overridden by naming
the association.
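A minimal sketch of the table-name rule just described, with a made-up function name, and not the code in tedia2sql itself:

```python
def join_table_name(left_class, right_class, association_name=""):
    """Name of the centre (join) table for a many-to-many association."""
    if association_name:
        # A named association overrides the generated name.
        return association_name
    # Left name unmodified; first character of the right name capitalised.
    return left_class + right_class[:1].upper() + right_class[1:]


print(join_table_name("Person", "Account"))
print(join_table_name("Person", "Account", "account_holder"))
```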
Foreign Key Names
A foreign key name is generated from the name of the
class that it refers to and the name of the
(element of) the primary key of that class.
Any leading capitalised part of the class name
is converted to lower case, and the first part of the
primary key name is capitalised. The "leading capitalised
part" is either a single capital at the start, or the
string of all but the last capitals if there are more
than one: the leading capitalised part of the class
In the example many-to-one relationship between
Account and Person (a person
may have multiple accounts, but there are no joint
accounts), if the primary key of Person is
id, then the generated foreign key name
for the relationship is personId.
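The foreign key naming rule can be sketched as follows. This is one reading of the "leading capitalised part" rule described above, written for illustration; the function names and the XMLDocument class are hypothetical and do not come from tedia2sql.

```python
import re


def lower_leading_capitals(name):
    """Lower-case the leading capitalised part of a class name."""
    caps = re.match(r"[A-Z]*", name).group(0)
    if len(caps) > 1:
        # With several leading capitals, the last one is kept as-is.
        caps = caps[:-1]
    return caps.lower() + name[len(caps):]


def foreign_key_name(referenced_class, pk_name):
    """Referenced class (leading part lower-cased) + capitalised key name."""
    return (lower_leading_capitals(referenced_class)
            + pk_name[:1].upper() + pk_name[1:])


print(foreign_key_name("Person", "id"))
print(foreign_key_name("XMLDocument", "id"))
```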
Primary key constraint names
Primary key constraint names are built by prefixing
pk_ to the capitalised class name. The
primary key constraint name for Person is
pk_Person. This rule is followed for
the constraint names in tables with generated names.
Foreign key constraint names
Foreign key constraint names are generated by
concatenating the lower-cased (as for the leading part
of foreign key names) referenced class name, the string
_fk_, the capitalised class name of
the referencing class, followed by the capitalised
name (of the first element of) the primary key of the
referenced class; this part after the _fk_
is essentially the capitalised name of the foreign key.
If a generated name is too long, it is shortened to the
maximum name allowed by the DBMS.
The name is first shortened by removing vowels from the name,
one part at a time, starting with the first part.
If this is insufficient, the centre of each of the
parts of the name is taken out. Up to this point, names
remain more-or-less comprehensible.
If the name is still too long, a base-64 MD5 checksum
is constructed for the whole name and that is used
for the name of the table (shortened if necessary by
truncation). This results in a most-likely-unique,
but incomprehensible, name.
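The three shortening stages can be sketched as below. Several details are assumptions made for the example and are not documented above: that the "parts" of a name are separated by underscores, that the first character of each part survives vowel removal, and that taking out the centre means keeping two characters at each end. The function name is also made up.

```python
import base64
import hashlib
import re


def shorten(name, max_len):
    """Shorten a generated name to at most max_len characters."""
    if len(name) <= max_len:
        return name
    parts = name.split("_")  # assumed meaning of "parts"
    # Stage 1: remove vowels, one part at a time, first part first.
    for i, part in enumerate(parts):
        parts[i] = part[:1] + re.sub(r"[aeiouAEIOU]", "", part[1:])
        if len("_".join(parts)) <= max_len:
            return "_".join(parts)
    # Stage 2: take the centre out of each part.
    for i, part in enumerate(parts):
        if len(part) > 4:
            parts[i] = part[:2] + part[-2:]
        if len("_".join(parts)) <= max_len:
            return "_".join(parts)
    # Stage 3: base-64 MD5 checksum of the whole name, truncated.
    digest = base64.b64encode(hashlib.md5(name.encode()).digest()).decode()
    return digest[:max_len]


print(shorten("account_fk_PersonId", 14))
print(shorten("account_fk_PersonId", 12))
```

The first two stages keep the name recognisable; only the checksum stage gives up on readability.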
De gustibus non disputandum est
Name conventions in programming are a matter of
taste. What's been implemented suits the author of the
name generation code and the project he's working on. It
may not suit you.
If it doesn't, the code for name generation is all in
sub makeName, please feel free to add code
to suit other name generation conventions with suitable
switches for enabling it. Try not to change the code in
ways that might change the table names or column names
in existing DIAgrams.