History | Ebean

Videos

Introduction

Walk through the @History / SQL2011 support with Oracle and Postgres

Comparison to Hibernate Envers

Comparison with the approach taken by Hibernate Envers

Overview

SQL2011 introduced temporal extensions to SQL including AS OF SYSTEM TIME and VERSIONS BETWEEN SYSTEM TIME.

Ebean has support for this extension to SQL. The support for this falls into 2 general cases where some databases have SQL2011 support built in (Oracle, DB2, MS SQL Server 2016) or where it is not built in (Postgres, MySql) and Ebean generates history tables/triggers/views to support the feature.

Current support in Ebean

Oracle via Total recall
Postgres via Triggers/history table
MySql via Triggers/history table
H2 via Triggers/history table

History tables

For Postgres, MySql and H2 (and in general databases that don't have SQL2011 support built in) Ebean generates DDL for History tables and associated triggers and views. This is a common "Temporal database design" approach that has been used for some time and not specific to Ebean. For example, this approach is already implemented as a Postgres extension - arkhipov/temporal_tables.

In general the history table approach results in:

A "history" table created that has the same (or very similar) structure as the "base table"
The "history" table does not have constraints by default (no primary key, foreign key or indexes)
2 extra columns added to the base table and history table for "effective start" and "effective end" timestamp range. Postgres has a timestamp range type so that is preferred to 2 separate columns for the Postgres implementation.
A view is used to union all combine the "base table" and the "history table" to simplify history queries
Triggers are used on update and delete to copy the existing row data from the "base table" to the associated "history table"
Schema changes to the "base table" (add column etc) result in mirrored changes to the "history table" and potentially updates to the associated triggers

Compared to Change Log

Transactional

With @History changes are transactional and have the associated guarantees - changes can't be missed or lost.

Not bypassed

With @History the database handles all changes. If other frameworks (not Ebean) or stored procedures or raw JDBC is used to update the database all these changes are captured by the database triggers. Although Ebean makes it easier to query and generate the appropriate DDL the capturing of changes is a database function and not strictly tied to Ebean or application code.

Easy to query

With @History it is easy to query and view the data "As of" a given timestamp and view the versions and historic changes for a given bean/row. This is also possible with Change log but requires more work (loading the change log into say ElasticSearch and creating appropriate queries to return as of and version type data).

DB Overhead

A downside to @History relative to using Change log is that it does introduce extra costs to the database in terms of storage (history tables taking storage space) and in terms of impact to response time as updates and deletes now also copy data into the history tables.

The approach with Oracle Total Recall reduces the impact to response time by processing the copy of data from redo into the history tables in the background and in batch and that probably means it can generally be used more widely (on more base tables) than the traditional trigger based approach.

Schema changes

A downside to @History relative to using Change log is that schema changes (like adding a column) can mean more work / complexity as changes need to also be reflected in associated history tables and triggers. The DB migration support in Ebean mitigates this by generating all the appropriate changes to history tables and triggers etc.

@History

Add the @History annotation to the entity beans that you want history support on. In the case of Postgres and MySql this means that Ebean will generate DDL (triggers, views etc) to support history on the underlying tables. For Oracle adding @History implies that the table has had flashback archive already assigned to the table.

@HistoryExclude

Add the @HistoryExclude annotation on entity bean properties that should be excluded from history. This is expected to be used on large text or blob columns where there can be a relatively high storage cost associated with keeping the history values and excluding these columns is desired.

ManyToMany intersection

In the case of intersection (bridge) tables for ManyToMany relationships by default the intersection tables also have history. That is, if @History is put on an entity bean then by default the intersection tables for @ManyToMany properties have history support (associated history tables, views and triggers).

To exclude history from intersection tables @HistoryExclude should be put on the @ManyToMany property.

As of query

A Timestamp can be used with Query.asOf(Timestamp) and Ebean will generate the query such that the returning object graph represents the state as of the given timestamp. More accurately for tables with history the query returns rows representing the table as of the given timestamp and for other tables included in the query the rows returned represent 'current values'.

// asOf some time in the past like 1 hour ago, 1 week ago, 1 month ago etc
Timestamp asOf = ...;

Customer customer =
    Customer.find.query()
        .asOf(asOf)
        .fetch("billingAddress")
        .where().eq("name", "jim")
        .findOne();

Scenario: Customer and Address both have history

In the above query if both Customer Address have history then both the customer data and address data will be returned 'As of' the given timestamp.

Scenario: Customer has history but Address does not

In the above query if Customer has history but Address does not then the customer data will be returned 'As of' the given timestamp but the address data will be returned using 'current data'.

Scenario: Address lazy loaded

The AS OF timestamp is propagated to all secondary queries including lazy loading queries. That is, the AS OF timestamp applies to the entire object graph loaded by and not just the "origin query". If Address is lazy loaded and Address has history the address data will also be 'as of' the original timestamp.

Versions between

Query.findVersionsBetween() is used to return a list of versions for a given object over time. The version beans returned contain a "diff" to the prior version as well as the effective start and effective end timestamps.

Timestamp start = ...;
Timestamp end = ...;

List<Version<Customer>> customerVersions =
    Customer.find.query()
      .where()
      .idEq(42)
      .findVersionsBetween(start, end);

for (Version<Customer> customerVersion : customerVersions) {
  Customer bean = customerVersion.getBean();
  Map<String, ValuePair> diff = customerVersion.getDiff();
  Timestamp effectiveStart = customerVersion.getStart();
  Timestamp effectiveEnd = customerVersion.getEnd();
}

Who and When

It is generally expected that the entity beans should contain @WhoCreated, @WhoModified, @WhenCreated and @WhenModified properties.

/**
 * Common properties used by many entity beans.
 */
@MappedSuperclass
public class BaseModel {

  @Id
  Long id;

  @Version
  Long version;

  @WhenCreated
  Timestamp whenCreated;

  @WhenModified
  Timestamp whenModified;

  @WhoCreated
  String whoCreated;

  @WhoModified
  String whoModified;
  ...

@History
@Entity
@Table(name="customer")
public class Customer extends BaseModel {

  @Size(max = 100)
  String name;
  ...

Postgres

Postgres has an existing arkhipov/temporal_tables extension and Postgres/Ebean users could use this extension. By default Ebean does not use this extension but instead generates similar triggers. The difference between using arkhipov/temporal_tables or not comes down to DDL generation. Ultimately it would be good for Ebean to support DDL generation for arkhipov/temporal_tables but that isn't built into Ebean yet.

The other notable aspect of history support in Postgres is that we can make use of the timestamp range type with expected benefits in terms on indexing performance relative to using 2 separate timestamp columns.

MySql

There is a concern with MySql in the case where DDL changes need to be applied to a production database. In the case where Triggers need to change due to a schema change (like add column) MySql has no 'create or replace trigger' and instead a table lock needs to be held to drop and create the trigger. Holding a table lock in a production database might be a problem and restrict schema changes.

Oracle

Rob's View: Oracle Total recall has some nice advantages over the explicit trigger/history table approach. The main advantages being:

Performance : low foreground cost - History table population occurs in the background and results in a much lower overhead/impact on foreground response time
Performance : batch processing - History table population can be processed in batch reducing the overhead/impact for many small updates
Admin : reduced DDL - Table alterations (add columns etc) are handled automatically reducing the admin costs of maintaining triggers/views/history tables.