Reverse Engineer a DB [4] – Visio

The professional edition of MS Visio has a database reverse-engineering feature; see the edition comparison chart. If you are new to the series, you may want to scroll down to the first two posts, where I describe the sample database and the ODBC connection setup. What follows is a step-by-step reverse-engineering procedure.

Step 1

Start a new file by selecting File –> New –> Software and Database –> Database Model Diagram.

Step 2

Once the new file opens, give it a name and save it. After that, select Database –> Reverse Engineer from the top menu bar.

Step 3

The Reverse Engineering Wizard opens. First we have to select a driver to connect to the database — select Microsoft SQL Server from the list.

Step 4

When the driver is selected, the Data Source list is populated with entries from the ODBC list that use the driver. I added the database to the ODBC list in a previous post — select CodeDB from the list.

Step 5

Enter the password and click OK to continue.

Step 6

Select which objects to reverse-engineer. Visio offers tables, views, indexes, constraints and code (stored procedures and functions). Select all and click Next.

Step 7

A list of tables to include appears; as in the previous post, I have skipped the sysdiagrams table.

Step 8

A list of stored procedures to include appears. There is a small catch here: the ‘sp_’ prefix is reserved for SQL Server system procedures, so it is bad practice to name your own procedures with it. Mine all start with ‘p_’.
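
As an aside, a procedure following this convention might look like the sketch below. It is hypothetical, written against the CodeDB schema from the first post of the series; every column name except TXT is my assumption.

-- Return the snippets of one block, in stored order.
CREATE PROCEDURE dbo.p_GetBlockSnippets
    @BlockID INT
AS
BEGIN
    SELECT s.TXT
        FROM dbo.Snippet AS s JOIN
             dbo.Block_mm_Snippet AS bs
                 ON bs.SnippetID = s.SnippetID
        WHERE bs.BlockID = @BlockID
        ORDER BY bs.Position;
END;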

Step 9

Select Yes to add object shapes to the current page.

Step 10

Review the summary and click Finish.

Step 11

The initially generated diagram (left) looks quite good. By rearranging shapes and tinkering with the display options, I created the one on the right. When moving shapes around, make sure not to accidentally disconnect a connector line from a shape: the connector represents a relationship object between tables, and disconnecting a line may remove a foreign key from a table.

Step 12

To inspect table columns, keys, triggers etc., click on a table in the diagram or in the table list.

Step 13

To examine stored procedures, switch to the code tab and double-click on a procedure name; the code editor opens. Among other features, the editor allows for specification of a mirror file in which to store the code outside Visio.

Reverse Engineer a DB [3] – Visual UML

In the last two posts I described an existing database and created an ODBC connection to it; now it's time to actually do some reverse engineering. For me, the primary purpose of reverse engineering is to document a database — the first step in any kind of refactoring or upgrade.

This should not be too complicated, since the database was originally designed using Visual UML. An object modeling tool, Visual UML is not just for databases: it generates or reverse-engineers Java, VB/VBA, C++, and more. It is great for getting a handle on projects where developers did not have much time for documentation. I have an older version (2.9.2) which still works fine; a newer (5.3) trial version is available for download here.

Step 1

To start a new model, select File –> New Model, enter the model name, and click OK.

Step 2

Now it is time to select a target language for the model; we need SQL Server.

Step 3

From the Tools menu, select Reverse Engineer –> Database.

Step 4

From the ODBC DSN dropdown list, select CodeDB. The list is populated from the ODBC data sources, and we added the CodeDB database as an ODBC source in the previous post.

Step 5

Fill in the User ID, Password, and Database name – make sure they match what was entered during the ODBC source definition.

Step 6

A list of tables from the database appears. I have selected all tables except sysdiagrams, which belongs to the system.

Step 7

Select Yes to create a diagram.

Leave everything at the defaults and click OK. A new diagram is created; it is quite messy, so I will have to clean it up later.

Step 8

Let us look at the list of objects created.

  • Tables are created as classes;
  • Database is the equivalent of a package;
  • Associations are created from foreign key constraints (see the example below);
  • There is one Class Diagram object that was created during the last step.
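
For example, a foreign key constraint like the one below is what becomes an association between the Block_mm_Snippet and Block classes. This is a sketch against the CodeDB schema; the constraint and column names are my assumptions.

-- The FK behind one of the associations on the diagram.
ALTER TABLE dbo.Block_mm_Snippet
    ADD CONSTRAINT FK_BlockMMSnippet_Block
    FOREIGN KEY (BlockID) REFERENCES dbo.Block (BlockID);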

Time to clean up the diagram. After some rearranging and selecting what to display and what to hide, here is the result – not bad. You can compare it with the original model from which the database was created. There are some minor differences:

  • In the original model, the primary key constraint was hidden (a display option), so it does not show explicitly in the table attributes.
  • I did not explicitly create indexes for primary keys at the design stage; they were created automatically when the database was built. I could have decided not to process indexes during reverse engineering, or simply hide them on the diagram.
  • Arrows are reversed. On the reverse-engineered model they follow the direction of the one-to-many relationships; by default, the "many" side has the arrow. On my original model, I set up the arrows to point to the primary key, the '1' side of the relationship. This is just my preference, similar to the key side of an SQL Server diagram; arrow directions can be reversed by editing the relationship properties.

My version of Visual UML did not offer to reverse-engineer stored procedures; however, according to the website, the latest version does support stored procedures and triggers.

When it comes to documenting databases, I find that understanding tables and relationships is the most important part; stored procedures and triggers can always be looked up using Management Studio or a similar tool.
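
For instance, the full text of any procedure can be pulled straight from the server with the built-in sp_helptext; the procedure name below is the hypothetical one from the naming example above.

EXEC sp_helptext 'dbo.p_GetBlockSnippets';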


To be continued.

Reverse Engineer a DB [2] – ODBC setup

Database tools and applications frequently use ODBC to connect to databases. In the previous post I described the database that we will play with. The database, named CodeDB, is located on a server named DAMIR5 running MS SQL Server 2005. This article provides a step-by-step procedure for creating an ODBC connection to the database.

Note: this setup is done on a "remote PC" (not the server) — the same PC that will be used to test all previously mentioned database tools.

Step 1

Under Administrative Tools, select Data Sources (ODBC); the Data Source Administrator opens.
You may also use Control Panel –> Administrative Tools –> Data Sources (ODBC).

Step 2

The System DSN tab lists data sources available to all users on the machine, including services. This is where you would usually want to add a new source. If you want to create a data source for a current user only, use the User DSN tab. Click the Add button to create a new data source.

Step 3

The Create New Data Source dialog opens. It lists drivers for different databases; for MS SQL Server 2005, use the SQL Native Client. Select the driver and click Finish.

Step 4

A new dialog for configuring the SQL Native Client opens.

  • Enter a name for the data source — I used CodeDB
  • Enter a description (optional)
  • Enter the server name or an IP address; for a local database use "localhost"

Step 5

On the next dialog we define the user name and the password for the database; enter whatever is appropriate for your database.

Step 6

Select the default database and check the ANSI options.

Step 7

Almost there: leave everything at the defaults and click Finish.

Step 8

The summary appears; click the Test Data Source button.

Step 9

And voilà, the test succeeds; click OK to close the dialog.

Step 10

We are back at the beginning, but this time the new CodeDB data source is listed under System Data Sources. Click OK to finish and exit the dialog. From this point on, the database connection is available to any application that can use an ODBC data source.
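
For example, an application could connect through the new DSN with an ODBC connection string along these lines (the user name and password are placeholders):

DSN=CodeDB;UID=code_user;PWD=your_password;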

Reverse Engineer a DB [1]

In the next few posts I will tinker with database modelling and reverse-engineering tools. I plan to use Visio, Visual UML, Erwin, Toad, Oracle Designer, DB Explorer, Squirrel, and maybe some more. A Google search for "database modeling tools" returns quite a list. But, one step at a time; the first thing we need is a database.

A few years ago, I put together a simple code repository for one of my Excel VBA projects. The idea was to store separate chunks of code in a database and dynamically assemble the project code as required. The model I chose follows a typical book layout: sentence, paragraph, chapter, book. Originally I designed it in Visual UML — see the diagram — and implemented it in MS SQL Server.

Entity (Table)    Book Equivalent    Excel VBA Object
Snippet           Sentence           Chunk of code, one line minimum
Block             Paragraph          Procedure, Function, Comment block
Module            Chapter            Code module, Class, Form
Project           Book               VBA Project

A Snippet is a chunk of code, minimum one line, up to a whole procedure. A Block has many snippets, and one snippet can belong to many blocks; for example, if a block represents a subroutine, one of its snippets may be ‘End Sub’. Block_mm_Snippet is a many-to-many relationship table and also holds the position of a snippet within a block. The Module table represents a code module, a class, or a form in VBA. One module can have many blocks, and one block can belong to many modules — keep in mind that not all modules have to belong to the same project. A project has many modules, and a module can belong to several projects.
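
To make the model concrete, here is a minimal DDL sketch of the snippet-to-block part; except for TXT, the key and column names are my assumptions, and the real tables carry more columns.

CREATE TABLE dbo.Snippet (
    SnippetID INT PRIMARY KEY
    ,TXT VARCHAR(MAX)  -- the code text, one line minimum
);

CREATE TABLE dbo.Block (
    BlockID INT PRIMARY KEY
    ,Name VARCHAR(50)
);

-- Many-to-many link; Position orders snippets within a block.
CREATE TABLE dbo.Block_mm_Snippet (
    BlockID INT NOT NULL REFERENCES dbo.Block (BlockID)
    ,SnippetID INT NOT NULL REFERENCES dbo.Snippet (SnippetID)
    ,Position INT NOT NULL
    ,PRIMARY KEY (BlockID, SnippetID)
);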

Although it worked fine for me, the model is far from ideal. It is obviously missing a few constraints: the Snippet TXT field should be unique, and so should the relationship keys in the Module_mm_Block and Project_mm_Module tables, to prevent repeating a procedure within a module or repeating a module within a project. I could also add version fields to the Block, Module, and Project tables. It will be interesting to see how changes can be made in the physical or logical layer of a diagram and deployed back to the existing database.
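
For instance, the missing uniqueness on the relationship keys could be added along these lines; the column names are my assumptions.

ALTER TABLE dbo.Module_mm_Block
    ADD CONSTRAINT UQ_Module_Block UNIQUE (ModuleID, BlockID);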

To be continued.


Bulk change post categories in WordPress

The following code moves all WordPress posts from one category to another. Both categories must already exist in the database. The from_category is not deleted from the database; only the posts are moved. Needless to say, back up your database before trying this.
Make sure to specify category slugs (not names) in the first two lines of the code. The comments tell the story; you may also want to look at the database model.

-- Move all posts from one category to another;
-- use category SLUG, not name.
-- Define categories here.
SET @from_cat_slug = _utf8'from_category_slug_here';
SET @to_cat_slug = _utf8'to_category_slug_here';
 
-- Fetch term_taxonomy_id for both categories.
SELECT @from_cat_tax_id := tax.`term_taxonomy_id`
    FROM `wp_term_taxonomy` AS tax JOIN
         `wp_terms` AS trm ON tax.`term_id` = trm.`term_id`
    WHERE trm.`slug` = @from_cat_slug
        AND tax.taxonomy = 'category';
 
SELECT @to_cat_tax_id := tax.`term_taxonomy_id`
    FROM `wp_term_taxonomy` AS tax JOIN
         `wp_terms` AS trm ON tax.`term_id` = trm.`term_id`
    WHERE trm.`slug` = @to_cat_slug
        AND tax.taxonomy = 'category';
 
-- Create table to hold post ids.
CREATE TEMPORARY TABLE `ds_cat_mrg` (`id` BIGINT);
 
-- List all posts linked to the from_category.
INSERT INTO ds_cat_mrg (`id`)
    SELECT `object_id`
    FROM `wp_term_relationships`
    WHERE `term_taxonomy_id` = @from_cat_tax_id;
 
-- Delete links between these posts and both categories.
DELETE FROM `wp_term_relationships`
    WHERE `object_id` IN (SELECT `id` FROM `ds_cat_mrg`)
    AND `term_taxonomy_id`
               IN (@to_cat_tax_id, @from_cat_tax_id);
 
-- Insert links to the new category.
INSERT INTO `wp_term_relationships`
        (`object_id`, `term_taxonomy_id`)
    SELECT `id`, @to_cat_tax_id FROM `ds_cat_mrg`;
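
-- Optional clean-up. The temporary table would also
-- disappear on its own when the session ends.
DROP TEMPORARY TABLE `ds_cat_mrg`;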


Coral8 – Randomly Sampling a Stream

The example below randomly samples an input stream and inserts samples into a named window for later processing. A variable named Dice is "rolled" each time a new row enters the input stream. A row is copied into the window if Dice = 1.

-- Variable for random number
CREATE VARIABLE INTEGER Dice;
CREATE VARIABLE INTEGER DiceSides = 6;
 
-- Input stream
CREATE INPUT STREAM StreamIn
SCHEMA (
    Val INTEGER
);
 
-- Each time a row enters the input stream,
-- roll the Dice
ON StreamIn
SET Dice =
    TO_INTEGER(RANDOM() * TO_FLOAT(DiceSides)) + 1;
 
-- Some data entering the input stream
ATTACH INPUT ADAPTER RMG
    TYPE RandomTuplesGeneratorAdapterType
TO STREAM StreamIn
PROPERTIES
    RATE     = "20"
    ,ROWCOUNT = "600"
;
 
-- Named window for samples
CREATE WINDOW WR
SCHEMA(
    Val INTEGER
    ,Dice INTEGER
)
KEEP 150 ROWS
;
 
-- Randomly sample the input stream
INSERT
    WHEN Dice = 1 THEN WR
SELECT Val, Dice
    FROM StreamIn
;


Here is a result: out of 600 messages that entered the input stream, 96 ended up in the window. With a six-sided die the expected count is 100; not bad.


Bloated WordPress Database

It all started when the Google crawler reported a few long-to-load pages on the site. After some poking around, I finally looked at the database: the wp_options table was three times bigger than the wp_posts table — oops. It seems that wp_options is also used for the RSS cache, so it eventually bloats.

As a quick fix, you may simply run this query to get rid of the RSS cache entries.

DELETE
  FROM `wp_options`
  WHERE `option_name` LIKE 'rs_%'
  AND `autoload` = 'no'
  AND CHAR_LENGTH(`option_name`) > 25;

After that, optimize the table using:

OPTIMIZE TABLE `wp_options`;

In my case, the wp_options table went from 950 kb down to 47 kb; almost 20 times smaller. For finer control over what to delete — and if you do not like SQL — try the WP-Options-Manager plug-in. The manager also allows cleanup of junk data left behind by inactive plug-ins.

Somewhere around version 2.6, WordPress introduced auto-save and revisions — you may or may not like these. The following query lists all revisions in the wp_posts table.

SELECT *
  FROM `wp_posts`
  WHERE `post_type` = 'revision';
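
If you decide to delete the revisions, a minimal query would be the one below. Note that this blunt delete leaves any related rows in wp_postmeta orphaned, which is one more reason to prefer the plug-in route.

DELETE
  FROM `wp_posts`
  WHERE `post_type` = 'revision';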

Auto-save and revisions can be disabled using this plug-in. And a final warning: before you run any SQL scripts, make sure you have the latest database backup somewhere safe; bad things do happen.

Update:

The database backup dropped from 209 kb to 86 kb — by about 60%.


Read from DB – Coral8 [8]

As previously discussed, the Coral8 Read from Database Adapter is best suited for replay of historical records, or for testing algorithms against existing data.

For this article, I will create a program to monitor a measurement stream and raise alarms based on a statistical index. The adapter will be used to read existing data from one of my databases and test the program. I have some real-life measurements generated during assembly of car parts. Among other columns, the table of measurements includes:

  • Record Time — time-stamp;
  • Serial Number — unique part id;
  • Piston Press Force — a measurement.

Here is a SQL snippet:

CREATE TABLE dbo.PartData(
    RecordTime DATETIME
    ,SerialNumber VARCHAR(13)
    ,PistonPressForce DECIMAL(8,3)
    -- more columns here
);

This will do for the example; now the to-do list:

  1. Create an input stream.
  2. Attach the read-database adapter to the stream.
  3. Test the connection to the DB.
  4. Create a stream for statistical data (average, sigma, Cpk).
  5. Create a query to calculate statistics over a window of data rows.
  6. Test the statistics.
  7. Create an output stream for alarm messages.
  8. Create a query to raise alarm messages.
  9. Test alarms.


1. The Input Stream

The schema of the input stream matches the database columns. Note the column-type mappings between SQL Server and Coral8: DATETIME maps to TIMESTAMP, VARCHAR to STRING, and DECIMAL to FLOAT. Stream fields have the same names as the database columns — this is necessary for the adapter type.

-- Input stream
CREATE INPUT STREAM StreamIn
SCHEMA (
    RecordTime TIMESTAMP
    ,SerialNumber STRING
    ,PistonPressForce FLOAT
);


2. The Adapter

As in the previous post, I have used point-and-click to insert the adapter; check out the adapter form.

-- Read from DB adapter
ATTACH INPUT ADAPTER ReadFromDB
    TYPE ReadFromDatabaseAdapterType
TO STREAM StreamIn
PROPERTIES
    DBNAME = "RCI4122_1"
    ,TABLE = "PartData"
    ,TIMESTAMPCOLUMN = "RecordTime"
    ,RATE = "1"
    ,WHERECLAUSE =
        "PistonPressForce > 0"
    ,TIMESTAMPCOLUMNINITVALUE =
        "2008-01-01"
;


The adapter parameters:

  • DB Connection Name – name of the database connection, as specified in the coral8-services.xml file; for more details see the previous post.
  • Table or View – name of the database table.
  • Where Clause – the expression in the WHERE clause of the query.
  • Loop Count – how many times to loop through the data.
  • Rate – number of reads per second.
  • Timestamp Column – name of the time-stamp column in the database.
  • Timestamp Column Initial Value – initial value for the time-stamp; in the example above, the adapter reads all records since New Year 2008.


The adapter example is the equivalent of running this SQL query:

USE RCI4122_1;
SELECT RecordTime, SerialNumber, PistonPressForce
  FROM PartData
  WHERE RecordTime > '2008-01-01'
    AND PistonPressForce > 0
ORDER BY RecordTime;


3. Test the Connection to the DB

Looks OK.


4. Stream for Statistical Data

I would like to calculate the following statistics for the measurement:

  • average,
  • sigma (standard deviation),
  • Cpk — process capability index — the distance from the average to the nearest specification limit, measured in three-sigma units.

The stream definition:

-- Local stream for stats
CREATE LOCAL STREAM StreamStats
SCHEMA (
    RecordTime TIMESTAMP
    ,p_Avg FLOAT
    ,p_Std FLOAT
    ,p_CpkUp FLOAT
    ,p_CpkLo FLOAT
);

The Cpk is broken into p_CpkUp and p_CpkLo: p_CpkUp is the distance from the average to the upper limit, and p_CpkLo is the distance to the lower limit; both distances are in three-sigma units.
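
For reference, the standard definition is Cpk = min( (USL - avg) / (3 * sigma), (avg - LSL) / (3 * sigma) ), which is exactly the smaller of p_CpkUp and p_CpkLo; the alarm query in step 8 takes that minimum.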


5. Query to Calculate Statistics

Process limits could be read from a database or a file, but using pre-defined variables is simpler.

-- Lower and Upper limits
CREATE VARIABLE FLOAT p_LSL = 2.485;
CREATE VARIABLE FLOAT p_USL = 2.515;

The query creates a sliding window of the last 35 rows and inserts the calculated statistics into StreamStats. Thirty-five is frequently used as the minimum sample size for this type of statistic to make sense. The query fires after each new row of data (each new part) and calculates statistics over the last 35 parts.

-- Calculate stats
INSERT INTO StreamStats
SELECT
  MAX(StreamIn.RecordTime)
  ,AVG(StreamIn.PistonPressForce)
  ,STDDEVIATION(StreamIn.PistonPressForce)
  ,(p_USL - AVG(StreamIn.PistonPressForce))/
     (3 * STDDEVIATION(StreamIn.PistonPressForce))
  ,(AVG(StreamIn.PistonPressForce) - p_LSL)/
     (3 * STDDEVIATION(StreamIn.PistonPressForce))
FROM StreamIn
    KEEP 35 ROWS
;


6. Test the Statistics

This works nicely. Now I will raise an alarm when the Cpk drops below 1.33 — that is, when the distance between the average and a limit is less than four sigma.


7. Output Stream for Alarm Messages

The output stream has an alarm message, a time-stamp, and the value that raised the alarm.

-- Output stream for Cpk alarm
CREATE OUTPUT STREAM StreamOut
SCHEMA (
    RecordTime TIMESTAMP
    ,p_Cpk FLOAT
    ,AlmMsg STRING
);


8. Query to Raise Alarms

The query selects all rows from StreamStats with a Cpk below 1.33 and inserts the time-stamp, the smaller of the two Cpk values, and the alarm message "Low Cpk" into StreamOut.

-- Monitor Cpk and raise alarm
INSERT INTO StreamOut
    SELECT RecordTime, MIN(p_CpkUp, p_CpkLo), 'Low Cpk'
FROM StreamStats
    WHERE p_CpkUp < 1.33
       OR p_CpkLo < 1.33
;


9. Test Alarms

The final test shows alarm messages in the output stream, generated when the Cpk drops below 1.33.


The next step would be to attach an output adapter to this stream to log alarms and send notifications. The code still needs some optimization; once that is done, the input adapter can be replaced by a live measurement feed to generate alarms in real time.