cubicweb.dataimport

Package containing various utilities to import data into cubicweb.

Utilities

cubicweb.dataimport.count_lines(stream_or_filename)
cubicweb.dataimport.ucsvreader_pb(stream_or_path, encoding='utf-8', delimiter=',', quotechar='"', skipfirst=False, withpb=True, skip_empty=True, separator=None, quote=None)

Same as ucsvreader(), but displays a progress bar while iterating over rows.

cubicweb.dataimport.ucsvreader(stream, encoding='utf-8', delimiter=',', quotechar='"', skipfirst=False, ignore_errors=False, skip_empty=True, separator=None, quote=None)

A CSV reader that accepts files in any encoding and outputs unicode strings.

If skip_empty is true (the default), lines without any values (only separators) are skipped. This is useful for Excel exports, which may be full of such lines.
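The decode-then-parse behaviour described above can be illustrated with the standard csv and io modules. This is a simplified sketch, not the library's implementation: ucsvreader_sketch is a hypothetical name, and it ignores the ignore_errors, separator and quote parameters.

```python
import csv
import io


def ucsvreader_sketch(stream, encoding="utf-8", delimiter=",", quotechar='"',
                      skipfirst=False, skip_empty=True):
    """Yield rows of unicode strings from a binary CSV stream (hypothetical helper)."""
    # Decode the byte stream with the requested encoding before parsing.
    text = io.TextIOWrapper(stream, encoding=encoding, newline="")
    reader = csv.reader(text, delimiter=delimiter, quotechar=quotechar)
    if skipfirst:
        next(reader, None)  # drop the header row
    for row in reader:
        # A line made only of separators parses as a list of empty strings,
        # and a blank line parses as an empty list: both are falsy overall.
        if skip_empty and not any(row):
            continue
        yield row


data = "name;city\nZoé;Paris\n;\nBob;Lyon\n".encode("latin-1")
rows = list(ucsvreader_sketch(io.BytesIO(data), encoding="latin-1",
                              delimiter=";", skipfirst=True))
```

Here the ";" line, typical of spreadsheet exports, is dropped, and the latin-1 bytes come back as regular str values.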

cubicweb.dataimport.callfunc_every(func, number, iterable)

Yield the items of iterable one by one, calling func every number iterations. func is always called at the end as well.
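A minimal sketch of this pattern, written from the description above rather than copied from CubicWeb (callfunc_every_sketch is a hypothetical name; the exact iteration on which the real function first calls func may differ):

```python
def callfunc_every_sketch(func, number, iterable):
    """Yield items of iterable, calling func after every `number` items and once at the end."""
    for idx, item in enumerate(iterable, start=1):
        yield item
        if idx % number == 0:
            func()
    # Always called at the end, even if the last batch was already complete.
    func()


calls = []
items = list(callfunc_every_sketch(lambda: calls.append("tick"), 2, range(5)))
```

A typical use would be wrapping a row iterator so that a store commits periodically, e.g. passing store.commit as func, though this usage is an assumption rather than something the reference states.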

cubicweb.dataimport.lazytable(*args, **kwargs)
cubicweb.dataimport.lazydbtable(*args, **kwargs)
cubicweb.dataimport.mk_entity(*args, **kwargs)

Sanitizing/coercing functions

cubicweb.dataimport.optional(*args, **kwargs)
cubicweb.dataimport.required(*args, **kwargs)
cubicweb.dataimport.todatetime(*args, **kwargs)
cubicweb.dataimport.call_transform_method(*args, **kwargs)
cubicweb.dataimport.call_check_method(*args, **kwargs)

Integrity functions

cubicweb.dataimport.check_doubles(*args, **kwargs)
cubicweb.dataimport.check_doubles_not_none(*args, **kwargs)

Object Stores

class cubicweb.dataimport.ObjectStore

Stores objects in memory for faster validation (development mode).

However, it does not enforce the schema constraints and will therefore miss some problems.

>>> store = ObjectStore()
>>> user = store.prepare_insert_entity('CWUser', login=u'johndoe')
>>> group = store.prepare_insert_entity('CWGroup', name=u'unknown')
>>> store.prepare_insert_relation(user, 'in_group', group)

commit()

Nothing to commit for this store.

finish()

Nothing to do once import is terminated for this store.

flush()

Nothing to flush for this store.

prepare_insert_entity(etype, **data)

Given an entity type, attributes and inlined relations, return an eid for the entity that would be inserted with a real store.

prepare_insert_relation(eid_from, rtype, eid_to, **kwargs)

Record in the relations attribute that a relation rtype exists between the entities with eids eid_from and eid_to.

prepare_update_entity(etype, eid, **kwargs)

Given an entity type and eid, updates the corresponding fake entity with specified attributes and inlined relations.
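To make the in-memory store API concrete, here is a hypothetical stand-in class with the same method names. It is not ObjectStore's actual code: the fake eid counter, the entities dict and the use of a set for relations are assumptions made for illustration.

```python
import itertools


class InMemoryStoreSketch:
    """Hypothetical in-memory store mimicking the prepare_* methods above."""

    def __init__(self):
        self._eids = itertools.count(1)  # fake eid generator
        self.entities = {}               # eid -> (etype, attributes dict)
        self.relations = set()           # (eid_from, rtype, eid_to) triples

    def prepare_insert_entity(self, etype, **data):
        eid = next(self._eids)
        self.entities[eid] = (etype, dict(data))
        return eid

    def prepare_insert_relation(self, eid_from, rtype, eid_to, **kwargs):
        self.relations.add((eid_from, rtype, eid_to))

    def prepare_update_entity(self, etype, eid, **kwargs):
        stored_etype, attrs = self.entities[eid]
        if stored_etype != etype:
            raise ValueError(f"eid {eid} is a {stored_etype}, not a {etype}")
        attrs.update(kwargs)

    # Nothing is pending in memory, so these are no-ops.
    def flush(self):
        pass

    def commit(self):
        pass

    def finish(self):
        pass


store = InMemoryStoreSketch()
user = store.prepare_insert_entity("CWUser", login="johndoe")
group = store.prepare_insert_entity("CWGroup", name="unknown")
store.prepare_insert_relation(user, "in_group", group)
store.prepare_update_entity("CWUser", user, firstname="John")
```

Because nothing touches a database, schema constraints are never checked, which is exactly the limitation noted for ObjectStore.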

class cubicweb.dataimport.RQLObjectStore(cnx, commit=None)

Bases: object

Store that works by issuing RQL queries, hence with all of CubicWeb’s machinery active.

commit()

Commit the database transaction.

finish()

Nothing to do once import is terminated for this store.

flush()

Nothing to flush for this store.

prepare_insert_entity(*args, **kwargs)

Given an entity type, attributes and inlined relations, returns the inserted entity’s eid.

prepare_insert_relation(eid_from, rtype, eid_to, **kwargs)

Insert into the database a relation rtype between entities with eids eid_from and eid_to.

prepare_update_entity(etype, eid, **kwargs)

Given an entity type and eid, updates the corresponding entity with specified attributes and inlined relations.

rql(*args)

Execute an RQL query. This is NOT part of the store API.
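An RQL-based store ultimately turns prepare_insert_entity calls into INSERT queries. The helper below is a hypothetical sketch of building such a query with named placeholders; it is not RQLObjectStore's actual code. The resulting pair would then be handed to something like cnx.execute(rql, args).

```python
def build_insert_rql(etype, **data):
    """Build an RQL INSERT query and its arguments (hypothetical helper)."""
    # Each attribute becomes "X attr %(attr)s"; values are passed separately
    # as query arguments rather than interpolated into the query string.
    restrictions = ", ".join(f"X {attr} %({attr})s" for attr in sorted(data))
    rql = f"INSERT {etype} X"
    if restrictions:
        rql += f": {restrictions}"
    return rql, dict(data)


rql, args = build_insert_rql("CWUser", login="johndoe", upassword="secret")
```

Keeping values out of the query string lets the repository handle quoting, which is the usual reason for placeholder-based queries.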

class cubicweb.dataimport.NoHookRQLObjectStore(cnx, metagen=None)

Bases: cubicweb.dataimport.stores.RQLObjectStore

Store that works by accessing CubicWeb’s low-level source API, with all hooks deactivated. It may be given a metadata generator object to handle the metadata that hooks usually take care of.

Arguments:

  • cnx: a connection to the repository
  • metagen: an optional MetadataGenerator instance
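Since hooks are off, something must fill in the metadata they would normally set. The class below is a hypothetical sketch of that idea only; its method name and signature are assumptions and do not reflect CubicWeb's MetadataGenerator interface. The attribute names (creation_date, modification_date, cwuri) are chosen to resemble CubicWeb metadata.

```python
from datetime import datetime, timezone


class MetaGenSketch:
    """Hypothetical metadata generator; not CubicWeb's MetadataGenerator API."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def entity_attrs(self, etype, eid, attrs):
        """Return attrs completed with metadata a hook would normally set.

        etype is unused here but kept so subclasses could customize per type.
        """
        now = datetime.now(timezone.utc)
        completed = dict(attrs)
        # Only fill in values the caller did not provide explicitly.
        completed.setdefault("creation_date", now)
        completed.setdefault("modification_date", now)
        completed.setdefault("cwuri", f"{self.base_url}/{eid}")
        return completed


gen = MetaGenSketch("http://example.org/")
attrs = gen.entity_attrs("CWUser", 42, {"login": "johndoe"})
```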

prepare_insert_entity(etype, **kwargs)

Given an entity type, attributes and inlined relations, returns the inserted entity’s eid.

prepare_insert_relation(eid_from, rtype, eid_to, **kwargs)

Insert into the database a relation rtype between entities with eids eid_from and eid_to.

class cubicweb.dataimport.SQLGenObjectStore(cnx, dump_output_dir=None, nb_threads_statement=1)

Bases: cubicweb.dataimport.stores.NoHookRQLObjectStore

Controller of the data import process. This version is based on direct insertions through SQL commands (COPY FROM or executemany).

>>> store = SQLGenObjectStore(cnx)
>>> store.create_entity('Person', ...)
>>> store.flush()

Initialize a SQLGenObjectStore.

Parameters:

  • cnx: connection on the cubicweb instance
  • dump_output_dir: a directory to dump failed statements for easier recovery. Default is None (no dump).

create_indexes(etype)

Recreate indexes for a given entity type.

drop_indexes(etype)

Drop indexes for a given entity type.

flush()

Flush data to the database.
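The general bulk-loading pattern that drop_indexes, flush and create_indexes suggest (drop indexes, insert in batches, rebuild indexes once) can be illustrated with Python's stdlib sqlite3 and its executemany. This is only an illustration of the technique: the person table and index name are hypothetical, and SQLGenObjectStore works against the instance's own database, where the COPY FROM variant is database-specific and not shown.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert rows into person(name, age) in one batch, rebuilding the index afterwards."""
    cur = conn.cursor()
    # Dropping the index first avoids updating it once per inserted row.
    cur.execute("DROP INDEX IF EXISTS person_name_idx")
    cur.executemany("INSERT INTO person (name, age) VALUES (?, ?)", rows)
    # Recreate the index once all rows are in place.
    cur.execute("CREATE INDEX person_name_idx ON person (name)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.execute("CREATE INDEX person_name_idx ON person (name)")
bulk_insert(conn, [("Alice", 30), ("Bob", 25)])
```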

Import Controller

class cubicweb.dataimport.CWImportController(store, askerror=0, catcherrors=None, tell=<function wrapped>, commitevery=50)

Bases: object

Controller of the data import process.

>>> ctl = CWImportController(store)
>>> ctl.generators = list_of_data_generators
>>> ctl.data = dict_of_data_tables
>>> ctl.run()

index(name, key, value, unique=False)

Create a new index.

If unique is True, only the first occurrence is kept; subsequent ones are ignored.
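One plausible reading of index() is a two-level mapping from index name to key to a list of values, where unique=True prevents a value from being recorded twice under the same key. The class below is a hypothetical sketch of that reading, not the controller's actual code.

```python
from collections import defaultdict


class IndexSketch:
    """Hypothetical re-creation of the index() behaviour described above."""

    def __init__(self):
        # name -> key -> list of values
        self.data_index = defaultdict(lambda: defaultdict(list))

    def index(self, name, key, value, unique=False):
        values = self.data_index[name][key]
        # With unique=True, a value already recorded under key is not added again.
        if unique and value in values:
            return
        values.append(value)


idx = IndexSketch()
idx.index("users", "john", 1, unique=True)
idx.index("users", "john", 1, unique=True)  # ignored: already present
idx.index("users", "john", 2)
```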

iter_and_commit(datakey)

Iterate over rows, triggering a commit every self.commitevery iterations.