6.3. Hooks and Operations

6.3.1. Generalities

Paraphrasing the Emacs documentation, hooks are an important mechanism for customizing an application. A hook is basically a list of functions to be called on some well-defined occasion (this is called running the hook).
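To make the idea concrete, here is a minimal plain-Python sketch of "a list of functions run on an occasion". It is not CubicWeb code: the names HOOKS, add_hook and run_hook are hypothetical and only illustrate the general mechanism.

```python
from collections import defaultdict

# Hypothetical registry: event name -> list of callables.
HOOKS = defaultdict(list)


def add_hook(event, func):
    """Register func to be called when event is run."""
    HOOKS[event].append(func)


def run_hook(event, *args):
    """Run the hook: call every registered function, in registration order."""
    return [func(*args) for func in HOOKS[event]]


add_hook('after_save', lambda name: 'logged %s' % name)
add_hook('after_save', lambda name: 'indexed %s' % name)
results = run_hook('after_save', 'report.txt')
```

CubicWeb hooks follow the same principle, but they are classes selected by event and predicates rather than bare functions.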

6.3.1.1. Hooks

In CubicWeb, hooks are subclasses of the Hook class. They are selected over a set of pre-defined events (and possibly more conditions, hooks being selectable appobjects like views and components). They should implement a __call__() method that will be called when the hook is triggered.

There are two families of events: data events (before/after any individual update of an entity or a relation in the repository) and server events (such as server startup or shutdown). In a typical application, most of the hooks are defined over data events.

Also, hooks may register Operation instances, which will be fired when the transaction is committed or rolled back.

The purpose of data event hooks is usually to complement the data model defined in the schema, which is static by nature and only provides a restricted built-in set of dynamic constraints, with dynamic or value-driven behaviours. For instance, they can serve the following purposes:

  • enforcing constraints that the static schema cannot express (spanning several entities/relations, exotic value ranges and cardinalities, etc.)

  • implementing computed attributes

A data event hook is functionally equivalent to a database trigger, except that database trigger definition languages are not standardized, hence not portable (for instance, Oracle's PL/SQL and PostgreSQL's PL/pgSQL are similar but not interchangeable, and SQLite supports neither).

Hint

It is good practice to write unit tests for each hook. See an example in Unit test by example.

6.3.1.2. Operations

Operations are subclasses of the Operation class that may be created by hooks and scheduled to happen on a precommit, postcommit or rollback event (i.e. respectively before/after a commit or before a rollback of a transaction).

Hooks are fired immediately on data operations, and it is sometimes necessary to delay the actual work until a time when all information can be expected to be there, or when all other hooks have run (though take care, since operations may themselves trigger hooks). Also, while the order of execution of hooks is data dependent (and thus hard to predict), it is possible to force an order on operations.

So, in cases where you may be missing some information that could be set later in the transaction, you should instantiate an operation in the hook.

Operations may be used to:

  • implement a validation check that requires all relations to be already set on an entity

  • process various side effects associated with a transaction, such as filesystem updates, mail notifications, etc.
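The reason for deferring a check can be shown with a toy model in plain Python. ToyTransaction and the relation names below are hypothetical and do not exist in CubicWeb. The "hook" runs as soon as a relation is added, but it only schedules the check, which then runs at commit time when every relation of the transaction is known.

```python
class ToyTransaction:
    """Hypothetical stand-in for a transaction; not a CubicWeb class."""

    def __init__(self):
        self.relations = set()   # (subject, rtype, object) triples
        self.pending = []        # callables scheduled for commit time

    def add_relation(self, subj, rtype, obj):
        self.relations.add((subj, rtype, obj))
        # The "hook" fires immediately, but only schedules the check.
        if rtype == 'works_for':
            self.pending.append(lambda: self.check_has_manager(obj))

    def check_has_manager(self, company):
        # Deferred check: by now every relation of the transaction is set.
        if not any(r == 'managed_by' and s == company
                   for s, r, _ in self.relations):
            raise ValueError('%s has no manager' % company)

    def commit(self):
        for operation in self.pending:
            operation()
        return 'committed'


tx = ToyTransaction()
tx.add_relation('alice', 'works_for', 'acme')   # manager not known yet
tx.add_relation('acme', 'managed_by', 'bob')    # set later, same transaction
status = tx.commit()
```

Had the check run inside add_relation(), the first call would have failed although the transaction as a whole is consistent.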

6.3.2. Events

Hooks are mostly defined and used to handle dataflow operations. This means that as data gets in (entities added, updated, relations set or unset), specific events are issued and the hooks matching these events are called.

You can get the event that triggered a hook by accessing its event attribute.

6.3.2.3. Non data events

Hooks called on server start/maintenance/stop events (e.g. server_startup, server_maintenance, before_server_shutdown, server_shutdown) have a repo attribute, but their `_cw` attribute is None. The server_startup event is fired on regular startup, while server_maintenance is fired on cubicweb-ctl upgrade or shell commands. server_shutdown is fired in all cases, but connections to the native source are impossible at that point; use before_server_shutdown when such a connection is needed.

Hooks called on backup/restore events (e.g. server_backup, server_restore) have repo and timestamp attributes, but their `_cw` attribute is None.

6.3.3. API

6.3.3.1. Hooks control

It is sometimes convenient to explicitly enable or disable some hooks, for instance when you want to disable some integrity checking hooks. This can be controlled more finely through the category class attribute, which is a string giving a category name. One can then use the deny_all_hooks_but() and allow_all_hooks_but() context managers to explicitly enable or disable some categories.

The existing categories are:

  • security, security checking hooks

  • workflow, workflow handling hooks

  • metadata, hooks setting meta-data on newly created entities

  • notification, email notification hooks

  • integrity, data integrity checking hooks

  • activeintegrity, data integrity consistency hooks, that you should never want to disable

  • syncsession, hooks synchronizing existing sessions

  • syncschema, hooks synchronizing instance schema (including the physical database)

  • email, email address handling hooks

  • bookmark, bookmark entities handling hooks

Nothing prevents you from inventing new categories and using the existing mechanisms to filter them in or out.
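The effect of the two context managers can be pictured with the following toy model. It is a simplified sketch in plain Python, not CubicWeb's implementation: ToyHooksControl and is_enabled are hypothetical names, and the subtleties of nesting several contexts are ignored (the previous state is simply restored on exit).

```python
from contextlib import contextmanager


class ToyHooksControl:
    """Toy model of category-based hook filtering (not CubicWeb code)."""

    def __init__(self):
        self.mode = 'allow_all'    # or 'deny_all'
        self.exceptions = set()    # categories excepted from the mode

    def is_enabled(self, category):
        if self.mode == 'allow_all':
            return category not in self.exceptions
        return category in self.exceptions

    @contextmanager
    def _switch(self, mode, categories):
        saved = (self.mode, self.exceptions)
        self.mode, self.exceptions = mode, set(categories)
        try:
            yield self
        finally:
            # Restore the previous filtering when leaving the block.
            self.mode, self.exceptions = saved

    def allow_all_hooks_but(self, *categories):
        return self._switch('allow_all', categories)

    def deny_all_hooks_but(self, *categories):
        return self._switch('deny_all', categories)


control = ToyHooksControl()
with control.deny_all_hooks_but('metadata'):
    inside = (control.is_enabled('metadata'), control.is_enabled('integrity'))
after = control.is_enabled('integrity')
```

Inside the block only the 'metadata' category is active; once the block is left, the 'integrity' category is active again.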

6.3.3.2. Hooks specific predicates

class cubicweb.server.hook.match_rtype(*expected, **more)[source]

Accept if the relation type is found in the expected ones. The optional named parameters frometypes and toetypes can be used to restrict the subject and/or object entity types of the relation.

Parameters
  • *expected

    possible relation types

  • frometypes – candidate entity types as subject of relation

  • toetypes – candidate entity types as object of relation

class cubicweb.server.hook.match_rtype_sets(*expected)[source]

Accept if the relation type is in one of the sets given as initializer arguments. The point of this predicate is that it keeps a reference to the original sets, so modifications to those sets are taken into account by the predicate. For instance:

from cubicweb.server.hook import Hook, match_rtype_sets

MYSET = set()

class Hook1(Hook):
    __regid__ = 'hook1'
    __select__ = Hook.__select__ & match_rtype_sets(MYSET)
    ...

class Hook2(Hook):
    __regid__ = 'hook2'
    __select__ = Hook.__select__ & match_rtype_sets(MYSET)

Client code can now change MYSET; this will change the selection criteria of both Hook1 and Hook2.
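Why keeping a reference matters can be seen with a small stand-in written in plain Python. ToyRtypeSets is a hypothetical class that only mimics the behaviour described above; it is not the real predicate.

```python
class ToyRtypeSets:
    """Toy predicate keeping references to the sets it was given."""

    def __init__(self, *sets):
        self.sets = sets    # references to the caller's sets, not copies

    def __call__(self, rtype):
        return any(rtype in rtypes for rtypes in self.sets)


MYSET = set()
predicate = ToyRtypeSets(MYSET)
before = predicate('boss')
MYSET.add('boss')           # mutate the shared set in place
after = predicate('boss')
```

Note that this only works for in-place modifications (add, discard, update, ...). Rebinding the name, as in MYSET = {'boss'}, would create a new set that the predicate knows nothing about.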

6.3.3.3. Hooks and operations classes

class cubicweb.server.hook.Hook(req, event, **kwargs)[source]

Base class for hook.

Hooks being appobjects like views, they have a __regid__ and a __select__ class attribute. Like all appobjects, hooks have the self._cw attribute which represents the current connection. In entity hooks, a self.entity attribute is also present.

The events tuple is used by the base class selector to dispatch the hook on the right events. It is possible to dispatch on multiple events at once if needed (though take care, as hook attributes may vary as described above).

Note

Do not forget to extend the base class selectors as in:

class MyHook(Hook):
  __regid__ = 'whatever'
  __select__ = Hook.__select__ & is_instance('Person')

Otherwise your hooks will be called indiscriminately, whatever the event.

class cubicweb.server.hook.Operation(cnx, **kwargs)[source]

Base class for operations.

Operation may be instantiated in the hooks’ __call__ method. It always takes a connection object as first argument (accessible as .cnx from the operation instance), and optionally all keyword arguments needed by the operation. These keyword arguments will be accessible as attributes from the operation instance.

An operation is triggered on connections set events related to the commit or rollback of transactions. Possible events are:

  • precommit:

    the transaction is being prepared for commit. You can freely do any heavy computation, raise an exception if the commit can’t go ahead, or even add some new operations during this phase. If you do anything which has to be reverted if the commit fails afterwards (e.g. altering the file system), you’ll have to support the ‘revertprecommit’ event to revert things by yourself

  • revertprecommit:

    if an operation failed while being pre-committed, this event is triggered for all operations which had their ‘precommit’ event already fired to let them revert things (including the operation which made the commit fail)

  • rollback:

    the transaction has been rolled back, either:

    • intentionally

    • because a ‘precommit’ event failed, in which case all operations are rolled back once ‘revertprecommit’ has been called

  • postcommit:

    the transaction is over. All the ORM entities accessed by the earlier transaction are invalid. If you need to work on the database, you need to start a new transaction, for instance using a new internal connection, which you will need to commit.

For an operation to support an event, one has to implement the <event name>_event method with no arguments.

The order of operations may be important, and is controlled by the output of the insert_index() method (whose implementation varies according to the base operation class used).
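The sequence of events described above can be simulated with a few lines of plain Python. This is only a sketch under stated assumptions: ToyOperation and toy_commit are hypothetical names, the order in which operations are visited within each phase is assumed, and the toy returns False where a real commit would let the exception propagate.

```python
class ToyOperation:
    """Toy operation recording which events it received (not CubicWeb code)."""

    def __init__(self, name, log, fail=False):
        self.name, self.log, self.fail = name, log, fail

    def precommit_event(self):
        self.log.append((self.name, 'precommit'))
        if self.fail:
            raise ValueError('%s refused the commit' % self.name)

    def revertprecommit_event(self):
        self.log.append((self.name, 'revertprecommit'))

    def rollback_event(self):
        self.log.append((self.name, 'rollback'))

    def postcommit_event(self):
        self.log.append((self.name, 'postcommit'))


def toy_commit(operations):
    """Run the event sequence described above over a list of operations."""
    processed = []
    try:
        for op in operations:
            processed.append(op)   # counted even if its precommit fails
            op.precommit_event()
    except Exception:
        # Let every operation whose precommit was fired revert its work...
        for op in processed:
            op.revertprecommit_event()
        # ...then roll back all operations.
        for op in operations:
            op.rollback_event()
        return False
    for op in operations:
        op.postcommit_event()
    return True


log = []
ok = toy_commit([ToyOperation('a', log),
                 ToyOperation('b', log, fail=True),
                 ToyOperation('c', log)])
```

In this run, operation 'c' never sees a precommit event, so it receives no revertprecommit event either, only the final rollback.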

class cubicweb.server.hook.LateOperation(cnx, **kwargs)[source]

Special operation which should be called after all other (i.e. non-late) operations.

class cubicweb.server.hook.DataOperationMixIn(*args, **kwargs)[source]

Mix-in class to ease applying a single operation on a set of data, avoiding creating as many operations as there are individual modifications. The body of the operation must then iterate over the values that have been stored in a single operation instance.

You should try to use this instead of creating one operation for each value, since handling operations becomes costly on massive data import.

Usage looks like:

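from cubicweb.predicates import is_instance
from cubicweb.server.hook import Hook, Operation, DataOperationMixIn

class MyEntityHook(Hook):
    __regid__ = 'my.entity.hook'
    __select__ = Hook.__select__ & is_instance('MyEntity')
    events = ('after_add_entity',)

    def __call__(self):
        # add the entity to the single operation shared by the transaction
        MyOperation.get_instance(self._cw).add_data(self.entity)


class MyOperation(DataOperationMixIn, Operation):
    def precommit_event(self):
        # process() stands for your own treatment of each stored value
        for bucket in self.get_data():
            process(bucket)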

You can modify the containercls class attribute, which defines the container class that should be instantiated to hold payloads. An instance is created on instantiation, and then the add_data() method will add the given data to the existing container. It defaults to a set. Use list if you want to keep arrival ordering. You can also use another kind of container by redefining _build_container() and add_data().

More optional parameters can be given to the get_instance() class method; they will be passed on to the operation constructor (for obvious reasons, those parameters should not vary across different calls to this method for a given operation).

Note

For sanity reasons, get_data will reset the operation, so that once the operation has started its treatment, if some hook wants to push additional data to this same operation, a new instance will be created (otherwise that data would most likely never be treated). This implies:

  • you should always call get_data when starting treatment

  • you should never call get_data for another reason.
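The reset behaviour described in this note can be modelled as follows. This is a plain-Python sketch, not the real mix-in: ToyTransaction and ToyDataOperation are hypothetical, and keying the shared instance by operation class in a transaction_data dict is an assumption made for the illustration.

```python
class ToyTransaction:
    """Hypothetical transaction holding per-transaction data."""

    def __init__(self):
        self.transaction_data = {}
        self.pending = []            # operations scheduled so far


class ToyDataOperation:
    """Toy model of an operation accumulating data (not CubicWeb code)."""
    containercls = set

    def __init__(self, tx):
        self.tx = tx
        self._container = self.containercls()
        tx.pending.append(self)

    @classmethod
    def get_instance(cls, tx):
        # One shared instance per transaction and per operation class.
        try:
            return tx.transaction_data[cls]
        except KeyError:
            op = tx.transaction_data[cls] = cls(tx)
            return op

    def add_data(self, data):
        self._container.add(data)

    def get_data(self):
        # Detach from the transaction: later get_instance() calls
        # will build a new operation instead of reusing this one.
        self.tx.transaction_data.pop(type(self), None)
        return self._container


tx = ToyTransaction()
first = ToyDataOperation.get_instance(tx)
first.add_data(1)
ToyDataOperation.get_instance(tx).add_data(2)   # same instance as first
data = first.get_data()                          # treatment starts here
second = ToyDataOperation.get_instance(tx)       # fresh instance
```

Calling get_data() anywhere other than at the start of the treatment would detach the operation too early and scatter the data over several instances.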

6.3.4. Example using dataflow hooks

We will use a very simple example to show hooks usage. Let us start with the following schema.

from yams.buildobjs import EntityType, Int

class Person(EntityType):
    age = Int(required=True)

We would like to add a range constraint over a person’s age. Let’s write a hook (supposing yams cannot handle this natively, which is wrong). It shall be placed into mycube/hooks.py. If this file were to grow too much, we can easily have a mycube/hooks/… package containing hooks in various modules.

from cubicweb import ValidationError
from cubicweb.predicates import is_instance
from cubicweb.server.hook import Hook

class PersonAgeRange(Hook):
    __regid__ = 'person_age_range'
    __select__ = Hook.__select__ & is_instance('Person')
    events = ('before_add_entity', 'before_update_entity')

    def __call__(self):
        if 'age' in self.entity.cw_edited:
            if 0 <= self.entity.age <= 120:
                return
            msg = self._cw._('age must be between 0 and 120')
            raise ValidationError(self.entity.eid, {'age': msg})

In our example the base __select__ is augmented with an is_instance selector matching the desired entity type.

The events tuple is used to specify that our hook should be called before the entity is added or updated.

Then in the hook’s __call__ method, we:

  • check if the ‘age’ attribute is edited

  • if so, check the value is in the range

  • if not, raise a validation error properly
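Since writing unit tests for hooks is recommended, it can help to isolate the decision logic in a plain function that is testable without a repository. The helper below is a hypothetical sketch: check_age is not part of CubicWeb, and a hook using it would still have to translate the message and raise ValidationError itself.

```python
def check_age(edited, low=0, high=120):
    """Return an error dict for an out-of-range age, or {} if all is fine.

    `edited` maps the names of the edited attributes to their new values.
    """
    if 'age' not in edited:
        return {}                    # age untouched: nothing to check
    if low <= edited['age'] <= high:
        return {}                    # value within the accepted range
    return {'age': 'age must be between %d and %d' % (low, high)}


errors = check_age({'age': 130})
```

The three branches correspond to the three steps listed above.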

Now let’s augment our schema with a new Company entity type with some relation to Person (in ‘mycube/schema.py’).

from yams.buildobjs import EntityType, String, SubjectRelation

class Company(EntityType):
    name = String(required=True)
    boss = SubjectRelation('Person', cardinality='1*')
    subsidiary_of = SubjectRelation('Company', cardinality='*?')

We would like to constrain the company’s bosses to have a minimum (legal) age. Let’s write a hook for this, which will be fired when the boss relation is established (still supposing we could not specify that kind of thing in the schema).

from cubicweb import ValidationError
from cubicweb.server.hook import Hook, match_rtype

class CompanyBossLegalAge(Hook):
    __regid__ = 'company_boss_legal_age'
    __select__ = Hook.__select__ & match_rtype('boss')
    events = ('before_add_relation',)

    def __call__(self):
        # eidto is the object of the relation, i.e. the Person
        boss = self._cw.entity_from_eid(self.eidto)
        if boss.age < 18:
            msg = self._cw._('the minimum age for a boss is 18')
            raise ValidationError(self.eidfrom, {'boss': msg})

Note

We use the match_rtype selector to select the proper relation type.

The essential difference with respect to an entity hook is that there is no self.entity, but self.eidfrom and self.eidto hook attributes which represent the subject and object eid of the relation.

Suppose we want to check that there is no cycle through the subsidiary_of relation. This is best achieved in an operation, since all relations are likely to be set at commit time.

from cubicweb import ValidationError
from cubicweb.server.hook import Hook, DataOperationMixIn, Operation, match_rtype

def check_cycle(cnx, eid, rtype, role='subject'):
    # follow the relation from `eid`, remembering every eid already visited
    parents = set([eid])
    parent = cnx.entity_from_eid(eid)
    while parent.related(rtype, role):
        # entities=True yields entities instead of raw result set rows
        parent = parent.related(rtype, role, entities=True)[0]
        if parent.eid in parents:
            # translate the message template first, then interpolate
            msg = cnx._('detected %s cycle') % rtype
            raise ValidationError(eid, {rtype: msg})
        parents.add(parent.eid)


class CheckSubsidiaryCycleOp(Operation):

    def precommit_event(self):
        # the connection is available as self.cnx, keyword arguments as attributes
        check_cycle(self.cnx, self.eidto, 'subsidiary_of')


class CheckSubsidiaryCycleHook(Hook):
    __regid__ = 'check_no_subsidiary_cycle'
    __select__ = Hook.__select__ & match_rtype('subsidiary_of')
    events = ('after_add_relation',)

    def __call__(self):
        CheckSubsidiaryCycleOp(self._cw, eidto=self.eidto)

Like in hooks, ValidationError can be raised in operations. Other exceptions are usually programming errors.

In the above example, our hook will instantiate an operation each time the hook is called, i.e. each time the subsidiary_of relation is set. There is an alternative method to schedule an operation from a hook, using the get_instance() class method.

class CheckSubsidiaryCycleHook(Hook):
    __regid__ = 'check_no_subsidiary_cycle'
    events = ('after_add_relation',)
    __select__ = Hook.__select__ & match_rtype('subsidiary_of')

    def __call__(self):
        # reuse the single operation instance of this transaction
        CheckSubsidiaryCycleOp.get_instance(self._cw).add_data(self.eidto)

class CheckSubsidiaryCycleOp(DataOperationMixIn, Operation):

    def precommit_event(self):
        for eid in self.get_data():
            check_cycle(self.cnx, eid, 'subsidiary_of')

Here, we call add_data() so that we simply accumulate the eids of the entities to check at the end in a single CheckSubsidiaryCycleOp operation. Values are stored in the operation’s container (a set by default); the operation itself is created by the first get_instance() call and reused afterwards.

A more realistic example can be found in the advanced tutorial chapter Step 2: security propagation in hooks.
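The traversal performed by check_cycle() can also be tried out without a repository. The sketch below replaces entities with a plain dict mapping each company to its parent, which is possible because the '*?' cardinality allows at most one parent per company. The function has_cycle and the dict representation are assumptions made for the illustration.

```python
def has_cycle(parent_of, start):
    """Follow single-valued parent links from start.

    Return True as soon as a node is visited twice, False when the
    chain ends without repetition.
    """
    seen = {start}
    current = start
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            return True
        seen.add(current)
    return False


# 1 is a subsidiary of 2, which is a subsidiary of 3.
chain = {1: 2, 2: 3}
# Same as above, but 3 is also declared a subsidiary of 1.
loop = {1: 2, 2: 3, 3: 1}
found = (has_cycle(chain, 1), has_cycle(loop, 1))
```

As in check_cycle(), the set of visited nodes guarantees that the loop terminates even when the data is cyclic.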

6.3.5. Inter-instance communication

If your application consists of several instances, you may need some means to communicate between them. CubicWeb provides a publish/subscribe mechanism using ØMQ. In order to use it, call add_subscription() on the repo.app_instances_bus object. The callback will get the message (as a list). A message can be sent by calling publish() on repo.app_instances_bus. The first element of the message is the topic, which is used for filtering and dispatching messages.

from cubicweb.server import hook

class FooHook(hook.Hook):
    events = ('server_startup',)
    __regid__ = 'foo_startup'

    def __call__(self):
        def callback(msg):
            # msg is a list whose first element is the topic
            self.info('received message: %s', ' '.join(msg))
        self.repo.app_instances_bus.add_subscription('hello', callback)


# elsewhere, in a method of an object having a _cw attribute
def do_foo(self):
    actually_do_foo()
    self._cw.repo.app_instances_bus.publish(['hello', 'world'])

The zmq-address-pub configuration variable contains the address used by the instance for sending messages, e.g. tcp://*:1234. The zmq-address-sub variable contains a comma-separated list of addresses to listen on, e.g. tcp://localhost:1234, tcp://192.168.1.1:2345.
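The topic-based dispatching can be illustrated with an in-process stand-in that involves no ØMQ at all. ToyBus is a hypothetical class that only borrows the add_subscription() and publish() method names mentioned above; matching topics by exact equality is an assumption of this sketch.

```python
from collections import defaultdict


class ToyBus:
    """In-process stand-in for a publish/subscribe bus (no ØMQ involved)."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def add_subscription(self, topic, callback):
        self._callbacks[topic].append(callback)

    def publish(self, msg):
        # The first element of the message is the topic used for dispatching.
        for callback in self._callbacks.get(msg[0], ()):
            callback(msg)


received = []
bus = ToyBus()
bus.add_subscription('hello', received.append)
bus.publish(['hello', 'world'])      # delivered to the subscriber
bus.publish(['other', 'ignored'])    # nobody subscribed to this topic
```

As with the real bus, the callback receives the whole message as a list, topic included.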

6.3.6. Hooks writing tips

6.3.6.1. Reminder

You should never use the entity.foo = 42 notation to update an entity. It will not do what you expect (updating the database). Instead, use the cw_set() method or direct access to entity’s cw_edited attribute if you’re writing a hook for ‘before_add_entity’ or ‘before_update_entity’ event.

6.3.6.2. How to choose between a before and an after event?

before_* hooks give you access to the old attribute (or relation) values. You can also intercept and update edited values in the case of entity modification before they reach the database.

Otherwise the question is: do I need to do things before or after the actual modification? If the answer is “it doesn’t matter”, use an ‘after’ event.
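The difference between the two kinds of events can be pictured with a toy update function in plain Python. Nothing here is CubicWeb API: apply_update, the hook signatures and the dict-based storage are all hypothetical. The 'before' callables still see the stored (old) values and may alter the edited ones, while the 'after' callables see the result.

```python
def apply_update(stored, edited, before_hooks=(), after_hooks=()):
    """Apply `edited` onto `stored`, running hooks around the modification."""
    for hook in before_hooks:
        hook(stored, edited)     # stored still holds the old values
    stored.update(edited)        # the actual modification
    for hook in after_hooks:
        hook(stored, edited)     # stored now holds the new values
    return stored


seen_old = []
seen_new = []


def remember_old(stored, edited):
    seen_old.append(stored.get('name'))


def normalize_name(stored, edited):
    # Intercept the edited value before it reaches the storage.
    if 'name' in edited:
        edited['name'] = edited['name'].strip()


def remember_new(stored, edited):
    seen_new.append(stored.get('name'))


row = apply_update({'name': 'old'}, {'name': '  new '},
                   before_hooks=[remember_old, normalize_name],
                   after_hooks=[remember_new])
```

If a piece of logic needs neither the old values nor a chance to alter the edited ones, it fits equally well on the 'after' side.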

6.3.6.3. Validation Errors

When a hook which is responsible for maintaining the consistency of the data model detects an error, it must use a specific exception named ValidationError. Raising anything but a (subclass of) ValidationError is a programming error. Raising it entails aborting the current transaction.

This exception is used to convey enough information up to the user interface. Hence its constructor is different from the default Exception constructor. It accepts, positionally:

  • an entity eid (not the entity itself),

  • a dict whose keys are attribute (or relation) names and whose values are end-user facing messages (hence properly translated) describing the problem.

raise ValidationError(earth.eid, {'sea_level': self._cw._('too high'),
                                  'temperature': self._cw._('too hot')})

6.3.6.4. Checking for object created/deleted in the current transaction

In hooks, you can use the added_in_transaction() or deleted_in_transaction() methods of the session object to check whether an eid has been created or deleted during the hook’s transaction.

This is useful for enabling or disabling some behaviour when an entity is being added or deleted.

if self._cw.deleted_in_transaction(self.eidto):
    return

6.3.6.5. Peculiarities of inlined relations

Relations which are defined in the schema as inlined (see Relation type for details) are inserted in the database at the same time as entity attributes.

This may have some side effects. For instance, when creating an entity and setting an inlined relation in the same RQL query, the relation will already exist in the database at before_add_relation time (which is otherwise not the case).