Darktable2Piwigo
This Python code connects two well-known open-source products for photographers: the photo editing software darktable and the web photo gallery software Piwigo.
A working Python script for this purpose already exists; it runs on a desktop computer with a UI. There are, however, some drawbacks to the existing app:
For this re-architected, web-based version of the app the objective is to reduce the user effort to the first step above, i.e., editing and tagging the images. Everything else should be automated as far as reasonably possible, so that all changes in darktable are synchronized to Piwigo automatically.
The approach of working with different sets of environment variables has been completely reworked. We only use two environment variables in a *.env* file: one decides which environment type we run on (production or test), the other whether the program runs in a Docker container (mainly used for pytest, since spinning up a Docker container for every test run is not always convenient).
The actual parameters for each use case are loaded - depending on the *.env* variables - from various config.yaml files and injected into a Config class.
.env file:
```
ENVIRONMENT=TEST
DOCKER=True
```
and the generation of the *Config* class:
```python
def create_config_env() -> Config:
    # Get environment variables
    env = os.getenv("ENVIRONMENT", "PRODUCTION")
    docker = os.getenv("DOCKER", "False") == "True"

    # Set the folder path based on whether running in Docker
    if docker:
        folder = Path("/app")
    else:
        folder = Path("/home/ubuntu/development/darktable")

    # Determine the configuration file path based on the environment
    if env == "PRODUCTION":
        config_file_path = folder / "config_prod.yaml"
    elif env == "TEST" and docker:
        config_file_path = folder / "tests" / "supporting" / "config_test_docker.yaml"
    else:
        config_file_path = folder / "tests" / "supporting" / "config_test.yaml"

    # Load the YAML configuration file
    config_yaml = yaml.safe_load(config_file_path.read_text())

    # Create database descriptors and add them to the configuration
    database_desc = create_db_descriptors(config_yaml)
    config_yaml["DATABASE_DESC"] = database_desc

    # Create and return the Config object
    config = Config(**config_yaml)
    return config
```
All graphics / diagrams / sketches for this project can be found here
Documentation is generated using Doxygen and is available in Personal OneDrive under Daten/Documentation/Darktable [Link]()
All major data containers are implemented as NamedTuples in shared.configuration
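As an illustration, a minimal sketch of what such NamedTuple containers could look like; the field names beyond DatabaseDescriptor.name and DATABASE_DESC are assumptions, not the project's actual layout:

```python
from typing import NamedTuple


class DatabaseDescriptor(NamedTuple):
    """Connection details for one database (field names are assumptions)."""
    name: str
    dialect: str             # e.g. "sqlite" or "mysql"
    connection_string: str   # SQLAlchemy connection URL


class Config(NamedTuple):
    """Top-level configuration container (only a subset of fields shown)."""
    ENVIRONMENT: str
    DATABASE_DESC: dict[str, DatabaseDescriptor]
```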
Databases / connections are handled in the Database class. Each database is managed in its own Database instance providing an Engine object. Session objects are only created when required (in the DatabaseTable object) using a context manager.
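A minimal sketch of such a Database class, assuming the DatabaseDescriptor fields sketched above; it only owns the Engine, while sessions are opened later in DatabaseTable:

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


class Database:
    """Wraps one physical database and exposes its Engine (sketch only)."""

    def __init__(self, descriptor: DatabaseDescriptor) -> None:
        self.name = descriptor.name
        # One Engine per database; Sessions are created on demand elsewhere
        self.engine: Engine = create_engine(descriptor.connection_string)
```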
Just to repeat this here: each SA Base object can hold multiple models in its metadata (Base.metadata.tables). Each model is connected to one Session (or, in principle, more than one), and each Session is connected to an Engine.
The class DatabaseTable combines an SA model and a database, allowing us to easily read data from and write data to a physical database table. All CRUD operations are therefore implemented in the DatabaseTable object.
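A minimal sketch of such a DatabaseTable, assuming the Database class sketched above; the method names (read_all, write) are assumptions, not the project's actual API:

```python
from sqlmodel import Session, select


class DatabaseTable:
    """Couples one SQLModel model with one Database (sketch only)."""

    def __init__(self, model: type, database: Database) -> None:
        self.model = model
        self.database = database

    def read_all(self) -> list:
        # A Session is only opened for the duration of the query
        with Session(self.database.engine) as session:
            return list(session.exec(select(self.model)).all())

    def write(self, rows: list) -> None:
        with Session(self.database.engine) as session:
            session.add_all(rows)
            session.commit()
```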
A data object is meant to implement all data and methods connecting data of the same type from a source (darktable or Piwigo) to a destination (MySQL). The DataObject class implements the load and compare methods needed to synchronize data between source and destination. Each DataObject also gets a TransformerBase class injected, which provides the functionality to change / amend data in the destination relative to the source. The transformer functionality is injected into the DataObject as a TransformerBase object, which only contains the abstract call method. Data-object-specific transformers are implemented as child classes of TransformerBase and (for now) only carry the transform function.
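A hedged sketch of how this injection could look, building on the DatabaseTable sketched above; the method names (transform, synchronize) and the concrete transformer are assumptions:

```python
from abc import ABC, abstractmethod


class TransformerBase(ABC):
    """Carries only the abstract method; concrete transformers implement it."""

    @abstractmethod
    def transform(self, rows: list) -> list:
        ...


class TagsTransformer(TransformerBase):
    """Hypothetical data-object-specific transformer."""

    def transform(self, rows: list) -> list:
        # Amend / change destination rows here; pass-through as a placeholder
        return rows


class DataObject:
    """Connects a source and a destination DatabaseTable (sketch only)."""

    def __init__(self, source: DatabaseTable, destination: DatabaseTable,
                 transformer: TransformerBase) -> None:
        self.source = source
        self.destination = destination
        self.transformer = transformer

    def synchronize(self) -> None:
        # Load from the source, transform, write to the destination
        rows = self.transformer.transform(self.source.read_all())
        self.destination.write(rows)
```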
To create FastAPI endpoints for each DataObject we create a RouterBase class. A RouterBase class gets the respective DataObject injected as an argument; the initializer extracts the required response models for source and destination. For now each RouterBase provides only three methods, but it can easily be extended.
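A sketch of such a RouterBase, assuming the DataObject sketched above; the concrete paths and handler names are assumptions:

```python
from fastapi import APIRouter


class RouterBase:
    """Creates FastAPI endpoints for one DataObject (sketch only)."""

    def __init__(self, data_object: DataObject, prefix: str) -> None:
        self.data_object = data_object
        self.router = APIRouter(prefix=prefix)
        # Three endpoints for now; more can be added easily
        self.router.add_api_route("/source", self.read_source, methods=["GET"])
        self.router.add_api_route("/destination", self.read_destination, methods=["GET"])
        self.router.add_api_route("/updateable", self.read_updateable, methods=["GET"])

    def read_source(self) -> list:
        return self.data_object.source.read_all()

    def read_destination(self) -> list:
        return self.data_object.destination.read_all()

    def read_updateable(self) -> list:
        # Hypothetical: rows that differ between source and destination
        return self.data_object.compare()
```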
To keep things as easy as possible we use SQLModel, as it allows us to combine the required SQLAlchemy database models and the (Pydantic) response models we need for FastAPI into a single model file (SQLModel classes are both Pydantic and SQLAlchemy objects).
Model files are generated automatically. The creation of a usable SA MetaData() is handled by a ModelCreatorXXX class; at the time of writing, only ModelCreatorDatabase exists as a working version (the folder tests/supporting/shards contains an incomplete piece of code for creating models from JSON / DataFrames).
The MetaData() object is injected into the class ModelCreator(), which takes care of all actions required to create the models.py files for the source and destination models. Destination models can be enriched with additional columns, or existing columns can be replaced. The intended generic principle is, however, to create a 1:1 copy of the darktable database table and copy it over to MySQL, only allowing additional columns to be added for image / album management purposes. There are nevertheless some cases where we want to exclude certain columns from the copy (e.g. images->color_matrix, which has a BLOB format and causes problems) or replace existing column definitions with something different (the position column of some darktable tables is <int> but needs to be <BIGINT> in MySQL).
Source table models can also be cloned so that we can create derived models, like for Tags, Albums and Locations which use the same base table in darktable (tags).
As this requires rather complex logic (see the next paragraph), ModelCreator uses helper classes: ColumnCreator() and (currently) two flavors of DialectHandler() (for SQLite and MySQL respectively). These classes bundle specific tasks of the model creation and hence reduce the complexity of the code (while hopefully increasing readability).
The complexity of this task arises from the fact that:
In order to avoid confusion (and issues with same names for objects in the SQLModel registry) we have to use a naming convention that identifies source and destination:
As we need to manage quite a large number of databases, models, transformers and routers, a registry for each of these is created.
Database objects are registered under their respective DatabaseDescriptor.name tag. All databases are defined in a single dictionary named db_descriptor (which is generated as part of the Config class) holding a DatabaseDescriptor object for each database.
For models, each directory containing files that implement models holds an *__init__.py* file which imports all models of the respective folder under the name ClassNameFolderName (e.g. the model file containing the Images() class in the module / folder models.dt is imported as ImagesDt()). To create the model registry, all of these modules are imported and the classes are registered under their ClassNameFolderName (e.g. ImagesDt, ImagesMysql). With each model we also register a key to the respective database, holding the engine (and allowing us to create the respective session) for that model. In consequence we can, knowing only the class name, load a complete database table (model, base, and engine). Note, however, that in the DatabaseTable class we have the option to connect a particular model with a database of our choice (which we use for testing purposes, for example).
For TransformerBase children (see above), the same approach as for models is used. However, as we do not need to know anything about the database, the database key is omitted.
The same approach is also taken for RouterBase classes which are collected in a RouterRegistry class.
In consequence we can automatically create a DatabaseTable by just providing the model class name which describes the table we want to read from / write to. From the model registry we get the model class itself and the db_key, which in turn is used to get the database from the database registry.
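A sketch of this lookup, assuming simple dictionary-based registries and the DatabaseTable class sketched earlier; the registry layout is an assumption:

```python
def create_database_table(model_name: str,
                          model_registry: dict,
                          database_registry: dict) -> DatabaseTable:
    """Build a DatabaseTable from a model class name alone (sketch only)."""
    entry = model_registry[model_name]             # e.g. "ImagesDt"
    database = database_registry[entry["db_key"]]  # Database holding the engine
    return DatabaseTable(model=entry["model"], database=database)
```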
For a data object we provide the source and the destination DatabaseTable. From the source database table we retrieve the model_name, which we in turn use to get the related TransformerBase class from the transformer registry.
For the creation of a RouterBase class we only need to provide a DataObject and the defined FastAPI endpoints are created automatically.
This coupling of registries and objects is very convenient on the one hand, but not ideal on the other, as it introduces a certain risk of error. Using class names generated by *__init__.py* and dictionary keys is certainly error-prone and needs to be watched carefully. For now, however, we do not have a better solution.
Another design topic is how we design the registries - ChatGPT consistently suggests using getter methods for the registry elements, and they are consequently implemented in all registries. In the code, however, registry elements are usually just accessed by key, in either ["key"] or *.key* notation.
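A minimal sketch of the resulting pattern: the registry offers a getter, but callers mostly use plain key access (class and method names are assumptions):

```python
class Registry:
    """Generic registry offering both a getter and plain key access (sketch)."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def register(self, key: str, item: object) -> None:
        self._items[key] = item

    def get(self, key: str) -> object:
        # Getter method as suggested
        return self._items[key]

    def __getitem__(self, key: str) -> object:
        # ...but in practice elements are usually accessed as registry["key"]
        return self._items[key]
```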
There are some foreign key relationships implemented in the darktable database. This needs to be considered when implementing the respective SQLAlchemy / response models. For both parent (mandatory) and child (optional) we have to include the data retrieved through the relationship in the response model file of the respective endpoint. To accomplish this, just add the model type of the relationship data to the respective response models:
```python
# dtlib_images.py
class DtlibImagesMysqlFull(DtlibImagesMysql, table=True):
    __tablename__ = "t_images_dtlib"

    film_id: int = Field(foreign_key='t_filmrolls_dtlib.id', primary_key=True)
    film_rolls: "DtlibFilmRollsMysql" = Relationship()
    width: int | None = Field(nullable=True)
    height: int | None = Field(nullable=True)
    maker_id: int = Field(foreign_key='t_makers_dtlib.id', primary_key=True)
    makers: "DtlibMakersMysql" = Relationship()
    model_id: int = Field(foreign_key='t_models_dtlib.id', primary_key=True)
    models: "DtlibModelsMysql" = Relationship()
    lens_id: int = Field(foreign_key='t_lens_dtlib.id', primary_key=True)
    lens: "DtlibLensMysql" = Relationship()
```
The model names (here DtlibFilmRollsMysql or DtlibLensMysql) have to be in quotes only if we refer to classes that are defined in other files. In earlier Python versions the type List has to be imported from typing and must be used to declare a list of references; from Python 3.9 onwards we can use the built-in type list. Similarly, Optional[type] can be written as type | None from Python 3.10 onwards (N.B. that this has not been consistently implemented throughout the code). The SQLModel config uses the from_attributes property, which replaces orm_mode in newer Pydantic versions.
In order to add the related fields into the response we need to avoid cyclic references at two places:
Pydantic provides a function which allows us to put the imports at the end of each file, or in the *__init__.py* file, and update the model later in order to avoid cyclic references. N.B. that this method changed from update_forward_refs() to model_rebuild(), and the earlier one no longer works in recent Pydantic versions. We change the *__init__.py* file to include our DestinationDataSchemas:
```python
# __init__.py
import pkgutil
import importlib
import os
import inspect

from sqlmodel import SQLModel

from shared.configuration import log

# pylint: disable=W1203

# Get the current package (models)
package = os.path.dirname(__file__)

# Iterate over all modules in the package
for _, module_name, _ in pkgutil.iter_modules([package]):
    # Dynamically import the module
    module_name = f".{module_name}"
    module = importlib.import_module(module_name, package=__name__)

    # Get all the classes defined in the module
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Ensure that the class is a subclass of SQLModel but not SQLModel itself
        if issubclass(obj, SQLModel) and obj != SQLModel:
            alias = f"{name}"
            log.logger.debug(f"Adding class {alias} to global namespace for {obj.__module__}")
            # Add the class with the alias to the global namespace
            globals()[alias] = obj

# In order for SQLModel relationships to work we need to rebuild all models
for name, obj in list(globals().items()):
    if isinstance(obj, type) and issubclass(obj, SQLModel) and obj != SQLModel:
        obj.model_rebuild()
```
With this method the import displayed above works without cyclic import errors. N.B. that we also need to include the block
```python
if TYPE_CHECKING:
    from schemas import Objectname
```
in the DestinationDataModel definition (see above) in order to only import the respective models for type checking purposes.
For a good explanation of the import / cyclic reference issue, see the SQLModel tutorial.
The easiest way to avoid recursion is to load only one record at a time. This allows us to keep the schema as is and not run into recursion errors. This is, however, not the use case we want to support here.
To avoid recursion in the use case at hand, we split the models for each object into a part which does not recursively include the relationship data and a full model including all data. When pulling data elements through a relationship we only pull the first part of the split model. To accomplish this we use class inheritance:
```python
class DtlibImagesMysql(SQLModel):
    id: int = Field(primary_key=True)
    filename: str | None = Field(sa_column=Column(CustomString(), nullable=True))
    datetime_taken: int | None = Field(sa_column=Column(CustomInteger(), nullable=True))

    model_config = ConfigDict(from_attributes=True, arbitrary_types_allowed=True, protected_namespaces=())  # type: ignore


class DtlibImagesMysqlFull(DtlibImagesMysql, table=True):
    __tablename__ = "t_images_dtlib"
```
In this example we only use the DtlibImagesMysql class as a reference in the images response model. N.B. the inheritance in the second model DtlibImagesMysqlFull - with this we can read all relationship data correctly. As outlined above only the second model class is implemented as a table in the database.
In principle there are two different ways to retrieve the data from the MySQL database (where data is joined through foreign keys):
```python
df = pd.read_sql_query(
    sql=session_local.query(TaggedImages).statement,
    con=session_local.get_bind(TaggedImages).connect(),
)
```
```python
result = session_local.query(TaggedImages).all()
data_dict_list = [item.__dict__ for item in result]
df = pd.DataFrame(data_dict_list)
```

Both have their advantages and disadvantages. pandas.read_sql_query produces a dataframe with all related query objects (so from the example above one gets all fields from tagged_images_dt, t_images_dt, t_tags_dt and t_filmrolls_dt, a total of 59 columns). Columns can only be constrained in the query() statement by using the class_name.field_name notation (TaggedImages.imgid, Images.filename, Filmrolls.folder) - this is a severe limitation of the query's flexibility, as we need to determine a (limited) set of output columns upfront, including their relationship information. This output then needs to be linked to the DestinationDataSchema manually.
The second method produces a result object that also includes the relationships, but as their models:
```
df.iloc[0]:
_sa_instance_state    <sqlalchemy.orm.state.InstanceState object at ...
id                                                                    1
position                                                   176093659136
imgid                                                                31
tagid                                                                19
hash                                                              31|19
filename              <models.images_dt.DestinationDataModel object ...
tagname               <models.tags_dt.DestinationDataModel object at...
Name: 0, dtype: object
```
This is not very handy if we want to access the query fields directly (df.iloc[0]['filename'].filename), but we can define the output of the API endpoint directly by using the DestinationDataSchema and the models referenced therein:
```python
# schemas/tagged_images_dt.py
class SourceDataSchema(BaseModel):
    imgid: int
    tagid: int
    position: int
    hash: str


class DestinationDataSchema(SourceDataSchema):
    id: int
    filename: "Images"
    tagname: "Tags"
```
where Images and Tags refer to the respective DestinationDataModels defined for the related objects (see also above).
For most of the tables we have to calculate a hash in order to identify changed rows. Only the images table in library.db has a changed_at timestamp we can use to identify changes.
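A sketch of how such a row hash could be calculated; the column selection and hash algorithm are assumptions, not the project's actual scheme:

```python
import hashlib


def row_hash(row: dict) -> str:
    """Hash all column values of a row so changed rows can be detected."""
    payload = "|".join(f"{key}={value}" for key, value in sorted(row.items()))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```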
Approach:
The initial tag hierarchy provided in darktable is mono-lingual only.
We are using the synonyms field of a tag to provide a translation (and also synonyms in a secondary language). The synonyms string for normal tags must be structured as follows:
synonym1, synonym2...synonym_n (in PL) |SL: translation, synonym1, synonym2...synonym_n (in SL)
where PL and SL stand for primary and secondary language respectively, and must be replaced with the lower-case two-letter ISO codes (en, de) of the corresponding languages. All parts can be left out, i.e. the synonyms field can also be left completely empty; in this case the primary language values will be used for the secondary language as well.
The translation into the secondary language is set off by the separator *|(two-letter-language-code):*, e.g., *|en:* for English.
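A sketch of how the synonyms string could be split into primary- and secondary-language parts according to this convention; the function name and the fallback behaviour are assumptions:

```python
def split_synonyms(synonyms: str, secondary: str = "en") -> tuple[list[str], list[str]]:
    """Split a synonyms string into primary- and secondary-language lists."""
    separator = f"|{secondary}:"
    if separator in synonyms:
        primary_part, secondary_part = synonyms.split(separator, 1)
    else:
        # No secondary-language block: fall back to the primary values
        primary_part = secondary_part = synonyms
    primary = [s.strip() for s in primary_part.split(",") if s.strip()]
    translated = [s.strip() for s in secondary_part.split(",") if s.strip()]
    return primary, translated
```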
The name field in the darktable table tags contains the full hierarchy of a given tag in the primary language (here: DE), separated by "|". To translate this full hierarchy into the secondary language (here: EN) we split the name field using split('|') and map the resulting list against a translation dictionary (which holds the key (name_p): value (name_s) pairs from the tags table). This works fine but can run into pitfalls, e.g. if different translations are accidentally entered for the same tag, or if a word has more than one meaning (the name Miller versus the profession miller).
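A sketch of this hierarchy translation; the example hierarchy string is illustrative only:

```python
def translate_hierarchy(name: str, translations: dict[str, str]) -> str:
    """Translate a full tag hierarchy such as 'Orte|Deutschland|Berlin'
    using a {name_primary: name_secondary} dictionary (sketch only)."""
    parts = name.split("|")
    # Fall back to the primary-language value if no translation is known
    return "|".join(translations.get(part, part) for part in parts)
```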
As we use the image name in Piwigo to store the correlation between the Piwigo id and the darktable id, we have to use the comment field in Piwigo to add further information to an image.
In darktable we use the metadata area to enter the data which will show up as Piwigo comments. Data is entered in the title field of the metadata area in darktable. Languages are separated by '||'; if the separator is not included in the title, the text is considered to be in the primary language.
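A sketch of splitting the title field into the two language versions; the fallback for a missing secondary text is an assumption:

```python
def split_title(title: str) -> tuple[str, str]:
    """Split a darktable title into primary- and secondary-language comments."""
    if "||" in title:
        primary, secondary = title.split("||", 1)
        return primary.strip(), secondary.strip()
    # No separator: the whole text is primary language; secondary left empty
    return title.strip(), ""
```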
Besides making a complete category / album visible only to logged-in users, we also implement the possibility of making individual images private (i.e. guest users see all non-private images, while logged-in users see both private and non-private images).
To identify images which are supposed to be private we use the notes field in the metadata area of darktable. If the word (sic!) 'private' is included in the notes, the image is set to non-public in Piwigo.
We want to store the album hierarchy for Piwigo by means of a separate branch in the darktable tags table. The first element of this branch can be parametrized. Any hierarchy below the DT_ALBUM_TAG is taken as the album hierarchy for Piwigo.
The synonyms string for album tags must be structured as follows:
synonym1, synonym2...synonym_n (in PL) |SL: translation, synonym1, synonym2...synonym_n (in SL) |dsc: (album description in PL)||(album description in SL) |priv:
where PL and SL stand for primary and secondary language respectively. All parts can be left out, i.e. the synonyms field can also be left completely empty; in this case the primary language values will be used for the secondary language as well. N.B. that the album description will be empty in both cases.
Purpose: prepare the data for a smooth upload to Piwigo. Note that we want a list of all images in all albums; we can aggregate later. Objectives:
Generic data flow:
N.B. Actions on Piwigo need to happen before we update the MySQL database tables. Use the /darktable/albums/updateable endpoint to identify inserts, updates and deletes. We want a fully mastered list so we can just run the respective update functions using a join through the t_category_album_map table. In parallel, update the t_category_album_map table:
N.B. We do not impact Piwigo-generated albums with this process, as we do not
N.B. Actions on Piwigo need to happen before we update the MySQL database tables. Use the /darktable/tags/updateable endpoint to identify inserts, updates and deletes. We want a fully synchronized list so we can just run the respective update functions using a join through the t_tags_map table. In parallel, update the t_tags_map table: