De los Tilos sabiduría

bases += patterns += relations += data

Build a Head Version model with Oracle SQL Developer Modeler 4.0


In the previous post we started with the generic steps in OSDM to create a historic version of your relational model. This post will explain the next four steps we need to create a Head Version Model.

  1. create a new relational model with one table that contains the ‘default columns’
  2. forward engineer the surrogated logical model to the new relational model, using the table with the ‘default columns’ as template
  3. split the tables using the Split Table wizard to get the correct tables
  4. extend the primary key of the ‘version/history’ tables

The new relational model will be named ‘example_head_version’. I will add a ‘defaultcolumns’ template table with the columns:

  • dwh_valid_from, the start date of the version validity
  • dwh_valid_to, the end date of the validity
  • dwh_status, the status of the record. I think it is a kind of ‘valid’ or ‘deleted’/‘voided’ indicator used to get a continuous time line
  • dwh_source, the original source of the record

The first three columns should land in the ‘Version’ tables and the last one lands in the ‘Head’ tables.
1. Create a new relational model with one table that contains the ‘default columns’
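As a sketch, the ‘defaultcolumns’ template table could look like this as DDL (the column names come from the list above; the data types are my assumptions, since in OSDM you simply draw the table in the diagram):

CREATE TABLE defaultcolumns
( dwh_valid_from DATE NOT NULL   -- start date of the version validity
, dwh_valid_to   DATE            -- end date of the validity
, dwh_status     VARCHAR (20)    -- 'valid' / 'deleted' / 'voided' indicator
, dwh_source     VARCHAR (255)   -- original source of the record
)
;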



Now that we have a new relational model with the ‘defaultcolumns’ template table we can forward engineer the logical model.
2. Forward engineer the surrogated logical model to the new relational model, using the table with the ‘default columns’ as template



The forward-engineered relational model now looks like this:



Now we can start splitting the tables via the Split Table wizard. We split off the ‘Version’ tables; the remaining part is then the ‘Head’ table. The ‘Head’ tables contain the business key and the attributes that are not going to change over time. The ‘Version’ tables contain the dynamic attributes, the ones that change over time.
3. Split the tables using the Split Table wizard to get the correct tables

Split off the Product version table.




The Product table has no foreign keys, so we go on to the next step in the wizard and move the ‘dynamic’ Price attribute and the ‘defaultcolumns’ needed for the ‘Version’ table.
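
Expressed as DDL, the result of this split would be roughly the following (a sketch; the surrogate key name comes from the surrogated logical model and is my assumption here):

CREATE TABLE Product_HEAD
( Product_key INTEGER NOT NULL       -- surrogate key
, ProdName    VARCHAR (255) NOT NULL -- business key stays in the 'Head' table
, dwh_source  VARCHAR (255)          -- 'Head' default column
, CONSTRAINT Product_HEAD_PK PRIMARY KEY ( Product_key )
)
;

CREATE TABLE Product_VERSION
( Product_key    INTEGER NOT NULL    -- references the 'Head' table
, dwh_valid_from DATE NOT NULL       -- 'Version' default columns
, dwh_valid_to   DATE
, dwh_status     VARCHAR (20)
, Price          DECIMAL (12,2)      -- the dynamic attribute
, CONSTRAINT Product_VERSION_PK PRIMARY KEY ( Product_key )
, CONSTRAINT Product_VERSION_Product_FK FOREIGN KEY ( Product_key ) REFERENCES Product_HEAD ( Product_key )
)
;

Note that the primary key of the ‘Version’ table still has to be extended with dwh_valid_from; that is step 4.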



Split off the Interest version table.



The Interest table’s foreign keys are part of the business key and have to stay in the ‘Head’ table. They are not dynamic and do not change over time! Let’s go on with the attributes.


Now that the Interest version table is created, we go on with the Employee table.
Split off the Employee version table.



In the Employee table the foreign keys are not part of the business key and are qualified as dynamic. We have to add them to the ‘Version’ table to track the changes. Let’s go on and add the dynamic attributes.



As you can see I left the Gender attribute in the ‘Head’ table, because I assume it does not change. The table split of the Department table is similar to the split of the Product table.
Finally we are at the last step.
4. Extend the primary key of the ‘version/history’ tables
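
For the Employee table, the combined result of the split and this primary key extension would look roughly like this as DDL (a sketch; key names and types are my assumptions):

CREATE TABLE Employee_HEAD
( Employee_key INTEGER NOT NULL       -- surrogate key
, EmpName      VARCHAR (255) NOT NULL -- business key
, Gender       CHAR (1)               -- static attribute, stays in the 'Head' table
, dwh_source   VARCHAR (255)
, CONSTRAINT Employee_HEAD_PK PRIMARY KEY ( Employee_key )
)
;

CREATE TABLE Employee_VERSION
( Employee_key   INTEGER NOT NULL     -- references the 'Head' table
, dwh_valid_from DATE NOT NULL
, dwh_valid_to   DATE
, dwh_status     VARCHAR (20)
, Department_key INTEGER              -- dynamic foreign keys are tracked here
, Manager_key    INTEGER
, Job            VARCHAR (255)        -- dynamic attributes
, HoursPerWeek   SMALLINT
-- the primary key is extended with dwh_valid_from, so an employee can have many versions
, CONSTRAINT Employee_VERSION_PK PRIMARY KEY ( Employee_key, dwh_valid_from )
)
;
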
I’ve also added some classifications with some coloring. Then the resulting model is:



Looks OK to me :-) The only thing left is to show the mappings from the ‘source’ model to the ‘history’ model.

The mappings on table level for Employee:



The mappings on attribute level for the business key of Employee:




And finally the mappings for a dynamic attribute of Employee:



The resulting model is stored in the branch head_version in this GitHub repo.
Nice, we now have mappings from the source to the target relational model via the logical model. We’re just a few steps away from generating ETL based on these mappings. Next post: the ‘regular’ Anchor Vault. Even more splitting of tables ;-)

Written by delostilos

April 9th, 2014 at 12:10 am

Build history models with Oracle SQL Developer Modeler 4.0, intro


Oracle SQL Developer Modeler (OSDM) is a nice free data modeling tool with a lot of useful features. I’m going to use a set of these features to create historic versions of a sample model. Hans Hultgren and Martijn Evers have made a classification of Ensemble Modeling Forms or Styles, the Data Vault-like historical data warehouse modeling styles. The classifications I will use:

  • Classical (a.k.a. Dimensional) Data Vault of Dan Linstedt
  • Anchored Data Vault (a.k.a. Anchor Vault)
    • the strong version, with no end-dated links
      • used in the open source DWH automation tool Quipu and explained by my colleague Lulzim
    • the weak version, with end-dated links
    • elementary Anchor Vault: all attributes are split into separate tables except the business key
    • Anchor Modeling: all attributes are split into separate tables including the business key
      • Focal Point modeling: externalizes the business key but groups attributes

    I’m going to build two variants of the weak Anchor Vault:

    • a Head Version model: the maximally grouped variant of an Anchor Vault
    • a ‘regular’ Anchor Vault: a less grouped variant, where we split off the foreign keys and the attributes

    The third model will be an Anchor Model, the most split-up variant of them all.

    The general approach used for all three models in OSDM is:

    1. reverse engineer a (source) model in OSDM into a relational model
    2. forward engineer the relational model to a logical model
    3. ‘surrogate’ the model in the logical model

    This starting point is the same for all of the models. The next steps are used for all three variants, but differ slightly for each model type:

    1. create a new relational model with one table that contains the ‘default columns’
    2. forward engineer the surrogated logical model to the new relational model, using the table with the ‘default columns’ as template
    3. split the tables using the Split Table wizard to get the correct tables
    4. extend the primary key of the ‘version/history’ tables

    The nice thing about this approach in OSDM is that at the end we have a mapping between the ‘source model’ and the ‘historical model’ via the logical model.
    Let’s get started with the first three generic steps. First we have the DDL of the source model:

    CREATE TABLE Department
    ( DepName VARCHAR (255) NOT NULL
    , Budget  DECIMAL (12,2)
    )
    ;
    
    ALTER TABLE Department ADD CONSTRAINT Department_PK PRIMARY KEY ( DepName )
    ;
    
    CREATE TABLE Employee
    ( EmpName      VARCHAR (255) NOT NULL
    , Gender       CHAR (1)
    , DepName      VARCHAR (255) NOT NULL
    , Job          VARCHAR (255)
    , HoursPerWeek SMALLINT
    , Manager      VARCHAR (255)
    )
    ;
    
    ALTER TABLE Employee
    ADD CONSTRAINT Employee_PK
    PRIMARY KEY ( EmpName )
    ;
    
    CREATE TABLE Interest
    ( EmpName  VARCHAR (255) NOT NULL
    , ProdName VARCHAR (255) NOT NULL
    , Degree   SMALLINT
    )
    ;
    
    ALTER TABLE Interest
    ADD CONSTRAINT Interests_PK
    PRIMARY KEY ( EmpName, ProdName )
    ;
    
    CREATE TABLE Product
    ( ProdName VARCHAR (255) NOT NULL
    , Price    DECIMAL (12,2)
    )
    ;
    
    ALTER TABLE Product
    ADD CONSTRAINT Product_PK
    PRIMARY KEY ( ProdName )
    ;
    
    ALTER TABLE Employee
    ADD CONSTRAINT Employee_Department_FK 
    FOREIGN KEY (DepName ) REFERENCES Department ( DepName ) ;
    
    ALTER TABLE Interest
    ADD CONSTRAINT Interest_Employee_FK FOREIGN KEY ( EmpName
    ) REFERENCES Employee ( EmpName )
    ;
    
    ALTER TABLE Interest
    ADD CONSTRAINT Interest_Product_FK
    FOREIGN KEY ( ProdName ) REFERENCES Product ( ProdName )
    ;
    
    ALTER TABLE Employee
    ADD CONSTRAINT Manager_FK
    FOREIGN KEY ( Manager ) REFERENCES Employee ( EmpName )
    ;
    

    1. Import the DDL file into a Relational model. I used the DB2/UDB 7.1 setting to import this DDL correctly. The result is a nice model:


    Ok, now we have the model in OSDM.
    2. We can forward engineer it to a logical model.


    3. The last generic step is to surrogate the model. I made a nice JavaScript you can use to automate this part. You can add this as a custom transformation script:



    In the screenshot above the ‘Mozilla Rhino’ engine is not listed, but on my Ubuntu machine the ‘Rhino’ engine is. Somehow the same JavaScript engine is reported differently.
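
    To make the surrogate step concrete, this is its effect on, for example, the Department table, expressed as DDL (a sketch; the key and constraint names are my assumptions, not necessarily what the script generates):

    -- add a surrogate key column
    ALTER TABLE Department ADD Department_key INTEGER NOT NULL;
    -- swap the primary key to the surrogate key...
    ALTER TABLE Department DROP CONSTRAINT Department_PK;
    ALTER TABLE Department ADD CONSTRAINT Department_PK PRIMARY KEY ( Department_key );
    -- ...and keep the natural key as a unique key
    ALTER TABLE Department ADD CONSTRAINT Department_UK UNIQUE ( DepName );

    In the full transformation the foreign keys end up pointing to the new surrogate keys as well. After the transformation the logical model now looks like this: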



    The model in this state is preserved in this GitHub repository as master.
    All the next versions will be saved as branches of the master model. That’s it for now. The next posts will cover the Head Version model, the ‘regular’ Anchor Vault and the Anchor Model.

    Written by delostilos

    April 8th, 2014 at 9:27 am

    Filling out timelines in SQL


    In this post I’ll show a method to fill out timelines in SQL. We’ll define a timeline as a set of periods without gaps from ‘0001-01-01’ to ‘9999-12-31’. I was reading Filling out and slicing timelines in one or more tables, which also describes a pattern to fill the gaps in a timeline. I will describe a different pattern to fill the timeline.
    Let’s pick one of the example tables used, the customer_shipping_address:

    Create the table and insert records into it. I’m using PostgreSQL as the database platform here and used pgModeler to create the table.

    CREATE TABLE public.customer_shipping_address
    ( customer_id smallint
    , valid_from date
    , valid_to date
    , address varchar(255)
    , CONSTRAINT pk_customer_s PRIMARY KEY (customer_id,valid_from)
    );
    
    INSERT INTO public.customer_shipping_address
    ( customer_id , valid_from                        , valid_to                          , address ) VALUES
    ( 1           , TO_DATE('2004-01-01','YYYY-MM-DD'), TO_DATE('2005-01-01','YYYY-MM-DD'), 'West Avenue 1'),
    ( 1           , TO_DATE('2005-01-01','YYYY-MM-DD'), TO_DATE('9999-12-31','YYYY-MM-DD'), 'The Docks 4'),
    ( 2           , TO_DATE('2004-05-01','YYYY-MM-DD'), TO_DATE('2005-09-01','YYYY-MM-DD'), 'East Avenue 2'),
    ( 2           , TO_DATE('2006-01-11','YYYY-MM-DD'), TO_DATE('2006-05-01','YYYY-MM-DD'), 'Arlington Road 7');
    

    It has several gaps in the timeline. The timeline of customer with id 2 does not start at ‘0001-01-01’ and it does not end at ‘9999-12-31’. The two periods defined for customer id 2 also have a gap between them.
    The pattern I’m going to show uses three queries in a union all construction to fill the timeline. The first one is easy: just get all the records.

    -- get the original timelines
    select customer_id
         , address
         , valid_from
         , valid_to
         , false as filled
      from customer_shipping_address
    

    The second part is a query that will fill the timeline from ‘0001-01-01′ to the first valid_from date we can find for a customer id.

    -- fill the first part of the timeline
    -- 0001-01-01 -> first valid_from
    select customer_id
         , null as address
         , cast('0001-01-01' as date)
         , min(valid_from)
         , true as filled
      from customer_shipping_address
     group
        by customer_id
        -- if the first valid_from is not equal to 0001-01-01 we have to fill the first part of the timeline
    having min(valid_from) <> cast('0001-01-01' as date)
    

    The last query detects if there is a gap after a period and fills it.

    -- fill the gaps that occur after a period, including
    -- last valid-to -> 9999-12-31
    select customer_id
         , null as address
         , valid_to as valid_from
         , expected_valid_to as valid_to
         , true as filled
      from (select customer_id
                 , valid_from
                 , valid_to
                 , lead(valid_from,1,cast('9999-12-31' as date))
                   over(partition by customer_id
                            order by valid_from) expected_valid_to
              from customer_shipping_address
           ) check_gaps
        -- if the expected valid_to is not equal to the real valid_to we have a gap to fill
     where expected_valid_to <> valid_to
    

    Now we’ll combine the three queries into one with a union all construction and order the result by customer_id and valid_from to get a nice ordered result set.

    -- get the original timelines
    select customer_id
         , address
         , valid_from
         , valid_to
         , false as filled
      from customer_shipping_address
     union all
    -- fill the first part of the timeline
    -- 0001-01-01 -> first valid_from
    select customer_id
         , null as address
         , cast('0001-01-01' as date)
         , min(valid_from)
         , true as filled
      from customer_shipping_address
     group
        by customer_id
        -- if the first valid_from is not equal to 0001-01-01 we have to fill the first part of the timeline
    having min(valid_from) <> cast('0001-01-01' as date)
     union all
    -- fill the gaps that occur after a period, including
    -- last valid-to -> 9999-12-31
    select customer_id
         , null as address
         , valid_to as valid_from
         , expected_valid_to as valid_to
         , true as filled
      from (select customer_id
                 , valid_from
                 , valid_to
                 , lead(valid_from,1,cast('9999-12-31' as date))
                   over(partition by customer_id
                            order by valid_from) expected_valid_to
              from customer_shipping_address
           ) check_gaps
        -- if the expected valid_to is not equal to the real valid_to we have a gap to fill
     where expected_valid_to <> valid_to
    -- order the periods in the timelines
     order by 1,3
    

    Now let’s check the result and see what happened.

    customer_id  address           valid_from         valid_to           filled
    1                              January 1, 0001    January 1, 2004    true
    1            West Avenue 1     January 1, 2004    January 1, 2005    false
    1            The Docks 4       January 1, 2005    December 31, 9999  false
    2                              January 1, 0001    May 1, 2004        true
    2            East Avenue 2     May 1, 2004        September 1, 2005  false
    2                              September 1, 2005  January 11, 2006   true
    2            Arlington Road 7  January 11, 2006   May 1, 2006        false
    2                              May 1, 2006        December 31, 9999  true

    Cool, all the gaps are filled and we have a complete timeline, which makes joining over timelines easier :)
    I’ve created the query as a view in the database, so now I can reuse it. The HTML export was made via DBeaver, a nice universal query tool.
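
    To illustrate why complete timelines make joining easier, here is a sketch of joining two filled timelines on overlapping periods (the view names are hypothetical; assume a second timeline, say billing addresses, filled with the same pattern):

    select s.customer_id
         , s.address as shipping_address
         , b.address as billing_address
         -- the intersection of the two periods
         , greatest(s.valid_from, b.valid_from) as valid_from
         , least(s.valid_to, b.valid_to) as valid_to
      from customer_shipping_address_filled s
      join customer_billing_address_filled b
        on b.customer_id = s.customer_id
       -- overlap test; because both timelines run gap-free from
       -- 0001-01-01 to 9999-12-31, every instant finds a match
       and b.valid_from < s.valid_to
       and s.valid_from < b.valid_to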

    Update: I’ve had a discussion on LinkedIn in the Temporal Data group. We came up with a variant of my query that does not use a union all but a three-record table to generate the same result, with only one scan of the table:

    select customer_id
         , case when rec_type = 1 
                then address 
                else null 
           end as address
         , case rec_type when 1 then valid_from
                         when 2 then valid_to   
                         else cast('0001-01-01' as date) 
           end as valid_from
         , case rec_type when 1 then valid_to
                         when 2 then next_valid_from 
                         else min_valid_from 
           end as valid_to
         , case when rec_type = 1 
                then false 
                else true 
           end as filled
      from (select customer_id
                 , address
                 , valid_from
                 , valid_to
                 , lead(valid_from,1,cast('9999-12-31' as date))
                   over(partition by customer_id
                            order by valid_from) next_valid_from             
                 , min(valid_from)
                   over(partition by customer_id) min_valid_from
              from customer_shipping_address 
           ) check_gaps
    -- rec_type 1 is the original table
    -- rec_type 2 fills the gaps after the periods
    -- rec_type 3 fills the first part of the timeline
     cross 
      join generate_series(1,3) as three_record_table(rec_type)    
    -- keep the original rows (1), the gaps after periods (2) and the missing first part of the timeline (3)
     where (rec_type = 1)
        or (rec_type = 2 and valid_to < next_valid_from)
        or (rec_type = 3 and valid_from = min_valid_from and min_valid_from <> cast('0001-01-01' as date))
       -- order the periods in the timelines
     order by 1,3 
    

    Written by delostilos

    July 2nd, 2013 at 12:05 am

    Posted in SQL


    Grails with Envers


    I needed some form of revision system for a Grails project I’m working on. So I searched the Internet hoping to find a nice solution and found an entry on Stack Overflow: How to manage object revisions in Grails?. The options mentioned:

    1. the Grails Envers plugin
    2. the Gvers plugin
    3. build it yourself
    4. use a version control system

    Grails is built on top of Hibernate, so the Envers option should be the logical choice. Sadly the Grails Envers plugin does not work :(

    The author of the Stack Overflow post chose the Gvers plugin. I’ve tried it and it works, but it does not use a central revision number like Subversion does.

    Searching on Grails and Envers gave me an extra link: Grails 1.1.1 + Envers 1.2.1.GA. They have tried it, but it only works for JPA-annotated classes.

    So let’s try it and see what happens. I installed the latest Grails version, 1.3.7, which uses Hibernate 3.3 under the covers. Then I went to the Envers site, downloaded the 1.2.2.GA release and put envers-1.2.2.ga-hibernate-3.3.jar in the Grails library folder. Envers uses Hibernate event listeners, so let’s find out how to add them. Luckily there is an easy way to do this as of Grails 1.3.4 (Allow plugins to easily insert hibernate event listeners). This is what I’ve put in grails-app > conf > spring > resources.groovy:

    beans = {
        auditListener(org.hibernate.envers.event.AuditEventListener)
        hibernateEventListeners(org.codehaus.groovy.grails.orm.hibernate.HibernateEventListeners) {
            listenerMap = ['post-insert':auditListener,
                           'post-update':auditListener,
                           'post-delete':auditListener,
                           'pre-collection-update':auditListener,
                           'pre-collection-remove':auditListener,
                           'post-collection-recreate':auditListener]
        }
    }
    

    Now let’s test it and create a domain class.

    package nl.delostilos
    
    import org.hibernate.envers.Audited
    
    @Audited
    class Entity {
    
     String name
    
    }
    

    Create a scaffolded controller:

    package nl.delostilos
    
    class EntityController {
    
     def scaffold = true
    }
    

    Ok, ready to go. I started the Grails application and checked what’s in the database. Cool, Grails made the table ‘entity’ and Envers added the tables ‘entity_aud’ and ‘revinfo’. Now let’s insert some data via the scaffolded interface and see what happens. The data is inserted in the Grails table but not in the Envers tables :(
    What’s happening? I found the answer in a post on the Envers forum: Spring and Envers. It mentions that you have to do everything transactionally, so that Envers can record it.

    Ok, let’s make the scaffolded controller actions transactional. First install the templates:

    grails install-templates
    

    In the folder src > templates > scaffolding you’ll find the Controller.groovy template. We’ll make the save, update and delete actions transactional.
    My code snippet of the save action:

       def save = {
            def ${propertyName} = new ${className}(params)
            ${className}.withTransaction{
                if (${propertyName}.save(flush: true)) {
                    flash.message = "\${message(code: 'default.created.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), ${propertyName}.id])}"
                    redirect(action: "show", id: ${propertyName}.id)
                }
                else {
                    render(view: "create", model: [${propertyName}: ${propertyName}])
                }
            }
        }
    

    The update action code snippet:

        def update = {
            def ${propertyName} = ${className}.get(params.id)
            if (${propertyName}) {
                if (params.version) {
                    def version = params.version.toLong()
                    if (${propertyName}.version > version) {
                        <% def lowerCaseName = grails.util.GrailsNameUtils.getPropertyName(className) %>
                        ${propertyName}.errors.rejectValue("version", "default.optimistic.locking.failure", [message(code: '${domainClass.propertyName}.label', default: '${className}')] as Object[], "Another user has updated this ${className} while you were editing")
                        render(view: "edit", model: [${propertyName}: ${propertyName}])
                        return
                    }
                }
                ${propertyName}.properties = params
                ${className}.withTransaction{
                    if (!${propertyName}.hasErrors() && ${propertyName}.save(flush: true)) {
                        flash.message = "\${message(code: 'default.updated.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), ${propertyName}.id])}"
                        redirect(action: "show", id: ${propertyName}.id)
                    }
                    else {
                        render(view: "edit", model: [${propertyName}: ${propertyName}])
                    }
                }
            }
            else {
                flash.message = "\${message(code: 'default.not.found.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), params.id])}"
                redirect(action: "list")
            }
        }
    

    And finally the delete action code snippet:

        def delete = {
            def ${propertyName} = ${className}.get(params.id)
            if (${propertyName}) {
                try {
                    ${className}.withTransaction{
                        ${propertyName}.delete(flush: true)
                    }
                    flash.message = "\${message(code: 'default.deleted.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), params.id])}"
                    redirect(action: "list")
    
                }
                catch (org.springframework.dao.DataIntegrityViolationException e) {
                    flash.message = "\${message(code: 'default.not.deleted.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), params.id])}"
                    redirect(action: "show", id: params.id)
                }
            }
            else {
                flash.message = "\${message(code: 'default.not.found.message', args: [message(code: '${domainClass.propertyName}.label', default: '${className}'), params.id])}"
                redirect(action: "list")
            }
        }
    

    OK, now let’s try again. I added some data via the scaffolded controller and … yes, it works :) The data is audited in the ‘entity_aud’ table and the ‘revinfo’ table has a revision added. Cool. Now I’ll have to find out how to change some Envers properties/behavior and make the default GORM actions transactional.
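
    For reference, the tables Envers generates look roughly like this (a sketch; the exact column types depend on the database dialect):

    -- one row per committed revision: the central revision number
    CREATE TABLE revinfo
    ( rev      INTEGER NOT NULL PRIMARY KEY
    , revtstmp BIGINT                        -- revision timestamp
    );

    -- one row per audited change of an Entity
    CREATE TABLE entity_aud
    ( id      BIGINT   NOT NULL              -- id of the audited Entity
    , rev     INTEGER  NOT NULL REFERENCES revinfo (rev)
    , revtype SMALLINT                       -- 0 = insert, 1 = update, 2 = delete
    , name    VARCHAR (255)                  -- the audited property
    , PRIMARY KEY ( id, rev )
    );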

    Update: Lucas Ward wrote a new Grails-Envers-plugin :)
    Now you won’t have to add the Spring stuff to the resources.groovy file anymore. It’s added by the plugin, which also adds some handy methods on the GORM classes and includes the Envers library. You still have to make every database action transactional to have it recorded in the Envers tables.

    Written by delostilos

    March 15th, 2011 at 9:04 am

    Posted in Grails


    Hello


    Hi,

    My first blog post on my own site :)

    Written by delostilos

    August 27th, 2010 at 8:36 pm

    Posted in Other