Darktable2Piwigo
Darktable to Piwigo Web Version

Introduction

This Python project connects two well-known open-source products for photographers: the photo-editing software darktable and the web photo-gallery software Piwigo.

A working Python script for this purpose already exists; it runs on a desktop computer and provides a UI. The existing app, however, has some drawbacks:

  • Some of the necessary steps (e.g. linking the tags between both applications) are currently implemented in Microsoft Excel using PowerQuery.
  • The process currently consists of a number of distinct manual steps: editing & tagging the pictures, generating the JPG images, identifying new tags, uploading new tags and then uploading new images to Piwigo.
  • Although this works reliably, the steps are cumbersome and, moreover, error-prone when it comes to assigning images to multiple albums in Piwigo.
  • Functionality recently introduced in Piwigo (updating tags) is not supported.

Objective

For this re-architected, web-based version of the app the objective is to reduce the user effort to the first step above, i.e., editing and tagging the images. Everything else should be fully automated (as far as possible with reasonable effort), so that all changes in darktable are automatically synchronized to Piwigo.

Architectural Principles

  • The master for the synchronization to Piwigo is always darktable (the darktable databases). This applies to tags, images to be uploaded / deleted, as well as any album assignments.
  • The app will be implemented in object-oriented Python using SOLID principles.
  • The album structure of Piwigo is mastered by a (parametrized) branch of the darktable tag tree.
  • All data will be persisted in a MySQL database.
  • All database interaction will be accomplished using the SQLAlchemy ORM and model libraries.
  • All functionality will be implemented as a service using FastAPI.
  • The user frontend will be implemented by either:
    • Budibase or
    • Flask / Django

Necessary Design decisions

  • Replicate the entire darktable db or only parts: Only parts (see Diagram)
  • Decided against importing the locations table. This table requires locations (geographical areas) to be created manually in the maps module. Each location created also creates a tag with the name / hierarchy darktable|locations|location_name|sub_location_name. Instead, we use the self-created branch of the tag tree representing locations and pull the respective geocoordinates from the Google Places API.

Challenges / Open issues

  • Creation of JPG images from RAW files can potentially be accomplished through the command line - check whether this is possible (including the need for an XMP file) using e.g. linuxio.darktableimage (the proof of concept was successful, but all folder names need to be updated to Linux paths!)
  • Is there a requirement to manage albums in Piwigo which are not part of the darktable database (e.g. albums which are generated/populated with mobile uploads during travel)? Yes, there is, and the solution is rather simple: as we use the darktable db as the master, we simply do not touch anything in Piwigo unless it is connected to darktable (no back-synchronization from Piwigo to darktable).
  • Assignment to the latest-images album is currently managed through a flag - should we use a tag in the future, or the inherent Piwigo menu items? We simply add the album Latest Images to the album tree in darktable and assign the respective tag to images in darktable accordingly.
  • We need to create a number of views / tables that combine / compare darktable and Piwigo data. We can do this either using views in the MySQL database or by creating respective SQL queries representing joins. Both solutions would require building SQLModel files for these combined data - for now we see no way to create them automatically as we do for the sources.
  • A fair number of images do not have location data, but we have attached a location tag to (almost) every image. The location tags are extracted from the darktable db, geocoordinates are generated for each location using the Google Places API, and the data is loaded into the MySQL table t_locations_dtdata. We still need logic to attach this data to the respective images by (1) updating the respective fields of the image record in the darktable db and (2) updating the EXIF data of each image on disk (so that we can subsequently use it in immich) - this remains quite a challenge for the moment.

Development / Testing / Production

The approach of working with different sets of environment variables has been completely reworked. We only use two environment variables in a *.env* file to decide which environment type we run on (production or test) and whether the program runs in a Docker container or not (mainly used for pytest testing; spinning up a Docker container for every test run is not convenient).

The actual parameters for each use case are loaded - depending on the *.env* variables - from various config.yaml files and injected into a Config class.

.env file:

        ENVIRONMENT=TEST
        DOCKER=True

and the generation of the *Config* class:

        import os
        from pathlib import Path

        import yaml


        def create_config_env() -> Config:
            # Get environment variables
            env = os.getenv("ENVIRONMENT", "PRODUCTION")
            docker = os.getenv("DOCKER", "False") == "True"

            # Set the folder path based on whether running in Docker
            if docker:
                folder = Path("/app")
            else:
                folder = Path("/home/ubuntu/development/darktable")

            # Determine the configuration file path based on the environment
            if env == "PRODUCTION":
                config_file_path = folder / "config_prod.yaml"
            elif env == "TEST" and docker:
                config_file_path = folder / "tests" / "supporting" / "config_test_docker.yaml"
            else:
                config_file_path = folder / "tests" / "supporting" / "config_test.yaml"

            # Load the YAML configuration file
            config_yaml = yaml.safe_load(config_file_path.read_text())

            # Create database descriptors and add them to the configuration
            database_desc = create_db_descriptors(config_yaml)
            config_yaml["DATABASE_DESC"] = database_desc

            # Create and return the Config object
            config = Config(**config_yaml)
            return config

Workpackages

All graphics / diagrams / sketches for this project can be found here

Documentation is generated using Doxygen and is available in Personal OneDrive under Daten/Documentation/Darktable [Link]()

Class hierarchies / Process Groups

NamedTuples

All major data containers are implemented as NamedTuples in shared.configuration

Class diagram shared.configuration

Creation of APIRouter Objects

Overview of classes in module services/database

Database Management

Databases / connections are handled in the Database class. Each database is managed in its own class providing an Engine object. Session objects are only created when required (in the DatabaseTable object) using a context manager.

Just to repeat: each SA Base object can hold multiple models in its metadata (Base.metadata.tables). Each model is connected to one (or, in principle, more than one) Session, and each Session is connected to an Engine.

Database Tables

The class DatabaseTable combines an SA model and a database, allowing us to easily retrieve data from or write data to a physical database table. This lets us implement all CRUD operations in a DatabaseTable object.
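To illustrate the pattern, a minimal sketch of such a class is shown below; the method names and constructor signature are assumptions for illustration, not the actual implementation:

        from contextlib import contextmanager

        from sqlmodel import Session, SQLModel, select


        class DatabaseTable:
            """Combines an SQLModel model with a database engine and offers basic CRUD."""

            def __init__(self, model: type[SQLModel], engine):
                self.model = model
                self.engine = engine

            @contextmanager
            def session(self):
                # Sessions are short-lived and only created when actually needed
                with Session(self.engine) as session:
                    yield session

            def read_all(self) -> list[SQLModel]:
                with self.session() as session:
                    return list(session.exec(select(self.model)).all())

            def write(self, records: list[SQLModel]) -> None:
                with self.session() as session:
                    session.add_all(records)
                    session.commit()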

DataObject

A data object implements all data and methods connecting data of the same type from a source (darktable or Piwigo) to a destination (MySQL). The DataObject class implements the necessary load and compare methods to synchronize data between source and destination. Each DataObject also gets a TransformerBase class injected, which provides the functionality to change / amend data in the destination compared to the source. The transformer functionality is injected into the DataObject as a TransformerBase object, which only contains the abstract method. Data-object-specific transformers are implemented as child classes of TransformerBase and (for now) only carry the transform function.
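A sketch of this injection pattern follows; apart from TransformerBase and DataObject, the class names, the method name transform, and the example column are illustrative assumptions:

        from abc import ABC, abstractmethod

        import pandas as pd


        class TransformerBase(ABC):
            """Abstract transformer; child classes adapt source data for the destination."""

            @abstractmethod
            def transform(self, data: pd.DataFrame) -> pd.DataFrame:
                ...


        class TagsTransformer(TransformerBase):
            """Example child class: amend tag rows before writing them to MySQL."""

            def transform(self, data: pd.DataFrame) -> pd.DataFrame:
                data = data.copy()
                data["name"] = data["name"].str.strip()  # illustrative amendment
                return data


        class DataObject:
            """Connects a source table to a destination table; the transformer is injected."""

            def __init__(self, source, destination, transformer: TransformerBase):
                self.source = source
                self.destination = destination
                self.transformer = transformer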

APIRouter Objects

To create FastAPI endpoints for each DataObject we use a RouterBase class. A RouterBase gets the respective DataObject injected as an argument; the initializer extracts the required response models for source and destination. For now each RouterBase provides only three methods, but it can easily be extended.
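A sketch of how such a router class could wire the injected DataObject to FastAPI endpoints; the endpoint paths and the synchronize method are assumptions for illustration:

        from fastapi import APIRouter


        class RouterBase:
            """Builds a FastAPI router for one DataObject."""

            def __init__(self, data_object, prefix: str):
                self.data_object = data_object
                self.router = APIRouter(prefix=prefix)

                # Response models are extracted from the injected DataObject
                source_model = data_object.source.model
                destination_model = data_object.destination.model

                @self.router.get("/source", response_model=list[source_model])
                def read_source():
                    return data_object.source.read_all()

                @self.router.get("/destination", response_model=list[destination_model])
                def read_destination():
                    return data_object.destination.read_all()

                @self.router.post("/synchronize")
                def synchronize():
                    return data_object.synchronize()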

Creating SQLModel Data Models

In order to keep things as simple as possible we use SQLModel, as it allows us to combine the required SQLAlchemy database models and the (Pydantic) response models we need for FastAPI in a single model file (SQLModel classes are both Pydantic and SQLAlchemy objects).

Overview of the Model Creation Process

Model files are generated automatically. The creation of a usable SA MetaData() object is handled by a ModelCreatorXXX class; at the time of writing, only a working version of ModelCreatorDatabase exists (the folder tests/supporting/shards contains an incomplete piece of code for creating models from JSON / DataFrames).

The MetaData() object is injected into the ModelCreator() class, which takes care of all actions required to create the models.py files for source and destination models. Destination models can be enriched with additional columns, or existing columns can be replaced. The intended generic principle, however, is to create a 1:1 copy of each darktable database table and copy it over to MySQL, allowing only additional columns to be added for image / album management purposes. There are nevertheless cases where we want to exclude certain columns from the copy (e.g. images->color_matrix, which has a BLOB format and causes problems) or replace existing column definitions with something different (the position column of some darktable tables has format <int> but needs to be <BIGINT> in MySQL).
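The following sketch shows the general idea of excluding and replacing columns when copying a reflected SQLite table definition; the file path, table name and the exact ModelCreator internals are assumptions:

        from sqlalchemy import BigInteger, Column, MetaData, create_engine

        # Reflect the darktable SQLite schema (the path is illustrative)
        sqlite_engine = create_engine("sqlite:///library.db")
        metadata = MetaData()
        metadata.reflect(bind=sqlite_engine)

        images = metadata.tables["images"]

        # Columns we do not want to copy to MySQL (e.g. BLOB columns)
        excluded = {"color_matrix"}

        # Column definitions that need to be replaced for MySQL
        replacements = {"position": Column("position", BigInteger)}

        columns = [
            replacements.get(col.name, Column(col.name, col.type, primary_key=col.primary_key))
            for col in images.columns
            if col.name not in excluded
        ]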

Source table models can also be cloned so that we can create derived models, like for Tags, Albums and Locations which use the same base table in darktable (tags).

As this requires rather complex logic (see the next paragraph), there are two helper classes used by ModelCreator: ColumnCreator() and (currently) two flavors of DialectHandler() (for SQLite and MySQL respectively). Both classes bundle specific tasks of the model creation and hence reduce the complexity of the code (while hopefully increasing readability).

The complexity of this task arises from the fact that:

  • we have a single destination database (self-generated MySQL) and multiple source "databases" (darktable SQLite and Piwigo API calls, reflecting the hidden database layer of Piwigo)
  • we have database models (for SQLAlchemy) reflecting database tables and FastAPI response models required to return data from our own API calls
  • we have overlapping object names in destination and sources (like Images, which is quite obviously a common term / object occurring in all three databases)
  • for the relationships (ForeignKeys) between the tables we need to generate the class names and table names at all levels (columns, imports, etc.)

In order to avoid confusion (and issues with same names for objects in the SQLModel registry) we have to use a naming convention that identifies source and destination:

  • for source objects we use SourceObject as class names (DtlibImages, PwImages)
  • for destination objects we use SourceObjectDestination as class names (DtlibImagesMysql, PwImagesMysql)
  • following the SQLModel paradigm we do not differentiate between SQLAlchemy data models and FastAPI response models. However, for the FastAPI response models we need to avoid the cyclic references described below. Where required, a parent model (containing only the items which should be returned in a relationship request) and a child model including all items are created (the child gets Full appended to its class name, e.g. SourceObjectDestinationFull). Only the (full) child is implemented as a table in the database.

Registries

As we need to manage quite a large number of databases, models, transformers and routers, a registry for each of these is created.

Database objects are registered under their respective DatabaseDescriptor.name tag. All databases are defined in a single dictionary named db_descriptor (which is generated as part of the Config class) holding a DatabaseDescriptor object for each database.

For models, each directory containing model files holds an *__init__.py* file which imports all models of the respective folder under the name ClassNameFolderName (e.g. the model file containing the Images() class in module / folder models.dt is imported as ImagesDt()). To create the model registry, all of these modules are imported and the classes are registered using their ClassNameFolderName (e.g. ImagesDt, ImagesMysql). Together with the model we also register a key to the respective database, which holds the engine (and allows us to create the respective session) for a specific model. As a consequence we can, knowing only the class name, load a complete database table (model, base, and engine). Note, however, that in the DatabaseTable class we have the option to connect a particular model with a database of our choice (which we use for testing purposes, for example).
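A sketch of what such a registry could look like; the class name ModelRegistry and the register/get methods are assumptions:

        from sqlmodel import SQLModel


        class ModelRegistry:
            """Maps a model class name to the model class and the key of its database."""

            def __init__(self) -> None:
                self._models: dict[str, tuple[type[SQLModel], str]] = {}

            def register(self, model: type[SQLModel], db_key: str) -> None:
                self._models[model.__name__] = (model, db_key)

            def get(self, class_name: str) -> tuple[type[SQLModel], str]:
                return self._models[class_name]


        # Usage (class and database names are illustrative):
        # registry.register(DtlibImagesMysqlFull, db_key="mysql")
        # model, db_key = registry.get("DtlibImagesMysqlFull")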

For TransformerBase child classes (see above) an equivalent mechanism is used as for models. However, as we do not need to know anything about the database, the database key is omitted.

The same approach is also taken for RouterBase classes which are collected in a RouterRegistry class.

As a consequence we can automatically create a DatabaseTable by just providing the model class name describing the table we want to read from / write to. From the model registry we get the model class itself and the db_key, which in turn is used to get the database from the database registry.

For a data object we provide the source and the destination DatabaseTable. From the source database table we retrieve the model_name, which we in turn use to get the related TransformerBase class from the transformer registry.

For the creation of a RouterBase class we only need to provide a DataObject and the defined FastAPI endpoints are created automatically.

This coupling of registries and objects is on the one hand very convenient and on the other hand not ideal, as it introduces a certain risk of error. Using class names generated by *__init__.py* and dictionary keys is certainly error-prone and needs to be watched carefully. For now, however, we do not have a better solution.

Another design topic is how we design the registries - ChatGPT consistently suggests using getter methods for the registry elements, so they are implemented in all registries. In the code, however, objects and elements from the registries are usually just accessed by key, in either ["key"] or *.key* notation.

Pydantic / FastAPI response models (in folders models/...)

There are some foreign-key relationships implemented in the darktable database. This needs to be considered when implementing the respective SQLAlchemy / response models. For both parent (mandatory) and child (optional) we have to include the data retrieved via the relationship in the response model file of the respective endpoint. To accomplish this, just add the model type of the relationship data to the respective response models:

dtlib_images.py: 

        class DtlibImagesMysqlFull(DtlibImagesMysql, table=True):
            __tablename__ = "t_images_dtlib"

            film_id: int = Field(foreign_key='t_filmrolls_dtlib.id', primary_key=True)
            film_rolls: "DtlibFilmRollsMysql" = Relationship()
            width:  int | None = Field(nullable=True)
            height:  int | None = Field(nullable=True)
            maker_id: int = Field(foreign_key='t_makers_dtlib.id', primary_key=True)
            makers: "DtlibMakersMysql" = Relationship()
            model_id: int = Field(foreign_key='t_models_dtlib.id', primary_key=True)
            models: "DtlibModelsMysql" = Relationship()
            lens_id: int = Field(foreign_key='t_lens_dtlib.id', primary_key=True)
            lens: "DtlibLensMysql" = Relationship()

The model names (here DtlibFilmRollsMysql or DtlibLensMysql) have to be in quotes only if we refer to them in other files. In earlier Python versions the type List has to be imported from typing and used to declare a list of references; from Python 3.9 onwards we can use the built-in type list. Similarly, Optional[type] can be written as type | None from Python 3.10 onwards (N.B. this has not been consistently implemented throughout the code). The SQLModel Config class uses the from_attributes property, which replaces orm_mode in newer Pydantic versions.

Fixing Cyclic References and Recursions

In order to add the related fields into the response we need to avoid cyclic references at two places:

  1. As all our destination data schemas are in separate files and we need (parts of) the models to be included in each other (the parent schema needs to import the child schema, and the child schema can import the parent schema), we need to make sure Pydantic can validate the models without throwing cyclic import errors.
  2. If we include all data from the parent in the child and vice versa and then retrieve all child or all parent records, we run into the issue of recursion, i.e., we will load data infinitely as each parent includes each child, which in turn includes all parents.

Fixing cyclic imports

Pydantic provides a mechanism that allows putting the imports at the end of each file or in the *__init__.py* file and then rebuilding the model later in order to avoid cyclic references. N.B. that this method changed from update_forward_refs() to model_rebuild(), and the former no longer works in newer Pydantic versions. We change the *__init__.py* file to include our DestinationDataSchemas:

__init__.py:

        import pkgutil
        import importlib
        import os
        import inspect
        from sqlmodel import SQLModel
        from shared.configuration import log

        # pylint: disable=W1203

        # Get the current package (models)
        package = os.path.dirname(__file__)

        # Iterate over all modules in the package
        for _, module_name, _ in pkgutil.iter_modules([package]):

            # Dynamically import the module
            module_name = f".{module_name}"
            module = importlib.import_module(module_name, package=__name__)

            # Get all the classes defined in the module
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Ensure that the class is a subclass of SQLModel but not SQLModel itself
                if issubclass(obj, SQLModel) and obj != SQLModel:

                    alias = f"{name}"
                    log.logger.debug(f"Adding class {alias} to global namespace for {obj.__module__}")

                    # Add the class with the alias to the global namespace
                    globals()[alias] = obj

            # In order for SQLModel relationships to work we need to rebuild all models
            for name, obj in globals().items():
                if isinstance(obj, type) and issubclass(obj, SQLModel) and obj != SQLModel:
                    obj.model_rebuild()

With this method the import displayed above works without cyclic import errors. N.B. that we also need to include the block

        if TYPE_CHECKING:
            from schemas import Objectname

in the DestinationDataModel definition (see above) in order to only import the respective models for type checking purposes.

For a good explanation of the importing / cyclic reference issue see the explanation in the Tutorial for SQLModel.

Fixing Recursion

The easiest way to fix recursion is to load only one record at a time. This allows to keep the schema as is and not run into recursion errors. This is however not the use case we want to accomplish here.

In order to avoid recursions in the use case at hand, we split the models for each object in a part which does not include the relationship data recursively and a full model including all data. For pulling the data elements through the relationship we only pull the first part of the split model. In order to accomplish this we use class inheritance:

    class DtlibImagesMysql(SQLModel):
        id: int = Field(primary_key=True)
        filename: str | None = Field(sa_column=Column(CustomString(), nullable=True))
        datetime_taken:  int | None = Field(sa_column=Column(CustomInteger(), nullable=True))


        model_config = ConfigDict(from_attributes=True, arbitrary_types_allowed=True, protected_namespaces=())   # type: ignore

    class DtlibImagesMysqlFull(DtlibImagesMysql, table=True):
        __tablename__ = "t_images_dtlib"

In this example we only use the DtlibImagesMysql class as a reference in the images response model. N.B. the inheritance in the second model DtlibImagesMysqlFull - with this we can read all relationship data correctly. As outlined above only the second model class is implemented as a table in the database.

Queries / API endpoints using queries with relationships

In principle there are two different ways to retrieve the data from the mysql database (where data is joined through foreign keys):

  1. use the pandas.read_sql_query method like in:
         df = pd.read_sql_query(
             sql=session_local.query(TaggedImages).statement,
             con=session_local.get_bind(TaggedImages).connect(),
         )
    
  2. use the sqlalchemy.query query method and convert into a dataframe:
         result = session_local.query(TaggedImages).all()
         data_dict_list = [item.__dict__ for item in result]
         df = pd.DataFrame(data_dict_list)
    
    Both have their advantages and disadvantages. pandas.read_sql_query produces a dataframe with all related query objects (so from the example above one will get all fields from tagged_images_dt, t_images_dt, t_tags_dt, t_filmrolls_dt, a total of 59 columns). Columns can only be constrained in the query() statement by using the class_name.field_name terminology (TaggedImages.imgid, Images.filename, Filmrolls.folder) - this is a severe limitation in the flexibility of the query as we need to determine a (limited) number of output columns upfront including their relationship information. And this output then needs to be linked to the DestinationDataSchema manually.

The second method produces a result object that also includes the relationships, but as their models:

        df.iloc[0]:
        _sa_instance_state    <sqlalchemy.orm.state.InstanceState object at ...
        id                                                                    1
        position                                                   176093659136
        imgid                                                                31
        tagid                                                                19
        hash                                                              31|19
        filename              <models.images_dt.DestinationDataModel object ...
        tagname               <models.tags_dt.DestinationDataModel object at...
        Name: 0, dtype: object

This is not very handy if we want to access the query fields directly (df.iloc[0]['filename'].filename), but we can define the output of the API endpoint directly by using the DestinationDataSchema and the models referenced therein:

        schemas/tagged_images_dt.py
        from pydantic import BaseModel

        class SourceDataSchema(BaseModel):

            imgid : int
            tagid : int
            position: int
            hash: str

        class DestinationDataSchema(SourceDataSchema):

            id: int
            filename: "Images"
            tagname: "Tags"

where Images and Tags do refer to the respective DestinationDataModels defined for the related objects (see also above)

darktable database partial

Synchronize tables from darktable into dt2pw mysql db

For most of the tables we have to calculate a hash in order to identify changed rows. Only the images table in library.db has a changed_at timestamp we can use to identify changes.

Approach:

  1. Load the data from the sqlite table
  2. calculate the hash (where required) in memory. The hash includes only columns which exist in sqlite db
  3. load the reduced data (only selected columns) from the mysql database; optionally compute the checksum (or load from db)
  4. identify the records to be updated, inserted or deleted in the MySQL database based on the checksum comparison (see the sketch after this list)
    1. full outer join with left = sqlite data and right = mysql data on
      1. id if existing (all tables except mapping tables)
      2. hash (all mapping tables)
    2. mark delete and update using pandas _merge field
      1. left_only = insert
      2. right_only = delete
      3. both -> compare hash if existing
  5. filter update and insert from step before
  6. transform update and insert data records
  7. execute update and insert in mysql database
  8. filter delete records
  9. delete identified records from mysql database
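A minimal sketch of steps 2, 4 and 5 using a pandas merge with an indicator column; the column names (id, hash) and the choice of sha256 are assumptions:

        import hashlib

        import pandas as pd


        def row_hash(df: pd.DataFrame, columns: list[str]) -> pd.Series:
            """Hash over the selected columns (only columns that exist in the SQLite source)."""
            joined = df[columns].astype(str).agg("|".join, axis=1)
            return joined.map(lambda row: hashlib.sha256(row.encode()).hexdigest())


        def classify(sqlite_df: pd.DataFrame, mysql_df: pd.DataFrame, key: str = "id"):
            """Join sqlite (left) against mysql (right) and classify each record."""
            merged = sqlite_df.merge(
                mysql_df, how="outer", on=key, suffixes=("_src", "_dst"), indicator=True
            )
            inserts = merged[merged["_merge"] == "left_only"]
            deletes = merged[merged["_merge"] == "right_only"]
            both = merged[merged["_merge"] == "both"]
            updates = both[both["hash_src"] != both["hash_dst"]]
            return inserts, updates, deletes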

darktable business logic

Create a tag hierarchy in German and English language

The initial tag hierarchy provided in darktable is mono-lingual only.

We are using the synonyms field of a tag to provide a translation (and also synonyms in a secondary language). The synonyms string for normal tags must be structured as follows:

synonym1, synonym2...synonym_n (in PL) |SL: translation, synonym1, synonym2...synonym_n (in SL)

where PL and SL stand for primary and secondary language respectively, and must be replaced with the lower-case ISO two-letter codes (en, de) of the corresponding languages. All parts can be left out, i.e. the synonyms field can also be left completely empty; in this case the primary-language values will be used for the secondary language as well.

The translation into the secondary language is separated by the separator *|(two-letter-language-code):*, e.g. *|en:* for English.

The name field in the darktable tags table contains the full hierarchy of a given tag in the primary language (here: DE), separated by "|". To translate this full hierarchy into the secondary language (here: EN) we split the name field using split('|') and map the resulting list against a translation dictionary (which holds the key (name_p): value (name_s) pairs from the tag table). This works fine but can run into pitfalls, e.g. if different translations are accidentally entered for the same tag, or if a word has more than one meaning (the name Miller versus the profession miller).
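A sketch of the two pieces described above; the function names and the exact fallback behaviour are illustrative:

        def parse_synonyms(synonyms: str, secondary: str = "en") -> tuple[list[str], list[str]]:
            """Split 'syn1, syn2 |en: translation, syn1' into primary and secondary terms."""
            primary_part, _, secondary_part = synonyms.partition(f"|{secondary}:")
            primary = [s.strip() for s in primary_part.split(",") if s.strip()]
            secondary_terms = [s.strip() for s in secondary_part.split(",") if s.strip()]
            return primary, secondary_terms


        def translate_hierarchy(name: str, translations: dict[str, str]) -> str:
            """Translate each level of a 'parent|child|grandchild' tag name individually.
            Levels without a translation fall back to the primary-language value."""
            return "|".join(translations.get(level, level) for level in name.split("|"))


        # translations holds the name_p -> name_s pairs collected from the tags table, e.g.:
        # translate_hierarchy("Orte|Deutschland|Berlin", {"Orte": "Places", "Deutschland": "Germany"})
        # returns "Places|Germany|Berlin"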

Create image information in Piwigo from darktable metadata

As we use the image name in Piwigo to store the correlation between the Piwigo id and the darktable id, we have to use the comment field in Piwigo to add further information to an image.

In darktable we use the metadata area to enter the data which will show up as Piwigo comments. Data is entered in the title field of the metadata area in darktable. Languages are separated by '||'; if the separator is not included in the title, the text is considered to be in the primary language.
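A minimal split sketch; treating the secondary comment as empty when the separator is missing is an assumption:

        def split_title(title: str) -> tuple[str, str]:
            """Split a darktable title into primary- and secondary-language comments.
            Without the '||' separator the whole text counts as primary language."""
            if "||" in title:
                primary, secondary = title.split("||", 1)
                return primary.strip(), secondary.strip()
            return title.strip(), ""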

Create private images in Piwigo by using darktable notes

Besides making a complete category / album visible only to logged-in users, we also implement the possibility to make individual images private (i.e. guest users see all non-private images; logged-in users see both private and non-private images).

In order to identify images which are supposed to be private we will use the notes field in the metadata area of darktable. If the word (sic!) 'private' is included in the notes, the image will be set to non-public in Piwigo
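A sketch of the check; the word-boundary match follows the 'word (sic!)' remark above, while the case-insensitive match is an assumption:

        import re


        def is_private(notes: str | None) -> bool:
            """True if the word 'private' occurs in the darktable notes field.
            The word boundary avoids false positives such as 'privateer'."""
            return bool(notes) and re.search(r"\bprivate\b", notes, flags=re.IGNORECASE) is not None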

Create an album table from the tag hierarchy tree below a keyword

We want to store the album hierarchy for Piwigo by means of a separate branch in the darktable tags table. The first element of this branch can be parametrized. Any hierarchy below the DT_ALBUM_TAG will be taken as the album hierarchy for Piwigo.

  • We need to split off everything left of the keyword (the keyword is allowed to be part of a hierarchy itself)
  • We want the following output:
    • album_name_p
    • album_name_s -> same as for all other tags, use |(language): separator
    • album_hierarchy; path information is also available in the Piwigo data (pw_album_fullname), hence we should use the same format as for the tags
    • album_description_p -> both descriptions are entered in the synonyms field. They are separated from the translation with |dsc: and the two languages with primary||secondary.
    • album_description_s (see above)
    • private_flag: indicated by the separator *|priv: 1* - this defaults to not private

The synonyms string for album tags must be structured as follows:

synonym1, synonym2...synonym_n (in PL) |SL: translation, synonym1, synonym2...synonym_n (in SL) |dsc: (album description in PL)||(album description in SL) |priv:

where PL and SL stand for primary and secondary language respectively. All parts can be left out, i.e. the synonyms field can also be left empty; in this case the primary-language values will be used for the secondary language as well. N.B. that the album description will then be empty in both cases.
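A sketch of parsing the album synonyms string into its parts; the function name, return keys and the handling of missing sections are assumptions:

        import re


        def parse_album_synonyms(synonyms: str, secondary: str = "en") -> dict:
            """Split 'syn_p |en: syn_s |dsc: descr_p||descr_s |priv: 1' into its parts.
            Missing sections simply stay empty / default to not private."""
            result = {"synonyms_p": "", "synonyms_s": "", "description_p": "",
                      "description_s": "", "private": False}

            parts = re.split(rf"\|({secondary}|dsc|priv):", synonyms)
            result["synonyms_p"] = parts[0].strip()

            for key, value in zip(parts[1::2], parts[2::2]):
                value = value.strip()
                if key == secondary:
                    result["synonyms_s"] = value
                elif key == "dsc":
                    descr_p, _, descr_s = value.partition("||")
                    result["description_p"] = descr_p.strip()
                    result["description_s"] = descr_s.strip()
                elif key == "priv":
                    result["private"] = value == "1"
            return result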

Appendix

Modules

models

Class diagram of models module

routers

Class diagram of routers module

services

database

Class diagram of database module

factories

Class diagram of factories module

shared

Class diagram of shared module

transformers

Class diagram of transformers module

-------------------- TO BE REWORKED --------------------

Piwigo database partial

Combining database partial

Album Image darktable

Purpose: prepare the data for a smooth upload to Piwigo. Note that we want a list of all images in all albums; we can aggregate later. Objectives:

  1. name in both languages (dt_id/pw_id)
  2. description / comment in both languages
  3. list of tags as ids
  4. Author (metadata, id = 0)
  5. change_timestamp (identify update)

For humans only ;-) :

  • width
  • height
  • filename
  • maker
  • model
  • datetime_taken
  • album_hierarchy_p (we do not need the individual album name or the secondary language)
  • taglist as text (really)

Populate / update darktable db

CRUD processes for Piwigo db

Delta generation

Identify images to be updated in Piwigo

Generic data flow:

  1. Run updateable-endpoint for darktable tags, albums - do not synchronize
  2. Run through update of Piwigo objects tags, albums as required (create, update, delete)
  3. Run synchronize for darktable object to update MySQL
  4. Run synchronize for darktable metadata, tagged_images, color_label, filmrolls
  5. Run updateable for darktable album_images

Identify albums to be updated in Piwigo

N.B. Action on Piwigo needs to happen before we update the MySQL database tables. Use the /darktable/albums/updateable endpoint to identify inserts, updates and deletes. We want a fully mastered list so that we can just run the respective update functions using a join through the t_category_album_map table. In parallel, update the t_category_album_map table:

  • update: no action
  • delete: remove line
  • insert: create line

N.B. We do not impact Piwigo-generated albums with this process, as we do not touch anything in Piwigo that is not connected to darktable (see above).

Identify tags to be updated in Piwigo

N.B. Action on Piwigo needs to happen before we update the MySQL database tables. Use the /darktable/tags/updateable endpoint to identify inserts, updates and deletes. We want a fully synchronized list so that we can just run the respective update functions using a join through the t_tags_map table. In parallel, update the t_tags_map table:

  • update: no action
  • delete: remove line
  • insert: create line

Synchronize delta