nexedi / slapos / Merge requests / !1679

Open
Created Nov 05, 2024 by Xavier Thompson (@xavier_thompson, Owner). 7 of 18 tasks completed.

Draft: erp5: Introduce mariadb replication at SlapOS level


1. Remove mariadb_update service

Instead, initialize databases and users on creation, and run updater and apply timezones info on every (re)start. This covers the actions that mariadb_update used to handle.

In particular: before this, mariadb_update would regularly overwrite any changes to a user (e.g. password change) made through direct interaction with mariadb. Now the configuration in SlapOS is really only an initial configuration.

This is a prerequisite to mariadb replication because mariadb_update was a) interfering with replication and b) overwriting the users replicated from a primary.

To facilitate these changes, component/mariadb now exposes a template script for the mariadbd service, with ready hooks to take actions on database creation and on database (re)start.
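
The template script itself is not reproduced here. As a minimal sketch of the intended split between "on creation" and "on every (re)start" actions, assuming hypothetical `on_create` and `on_start` callables and a hypothetical `start_database` helper (none of these names come from component/mariadb):

```python
import os
import tempfile


def start_database(datadir, on_create, on_start):
    """Run on_create only when datadir is first created; run on_start every time.

    Hypothetical helper: the real hooks live in the mariadbd template script.
    """
    created = not os.path.isdir(datadir)
    if created:
        os.makedirs(datadir)
        on_create()  # e.g. initialize databases and users (initial configuration only)
    on_start()       # e.g. run the updater and apply timezone info
    return created


if __name__ == "__main__":
    calls = []
    datadir = os.path.join(tempfile.mkdtemp(), "mariadb")
    for _ in range(2):  # first start creates the database, second is a restart
        start_database(datadir, lambda: calls.append("create"),
                       lambda: calls.append("start"))
    print(calls)
```

The point of the design is that a user changed directly in mariadb after creation is never touched again, since only the first call reaches `on_create`.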


2. Allow requesting a mariadb set-up to replicate another mariadb

Using parameters of the form:

'replication': {
  'bootstrap-url': 'http(s)://<recent-backup-of-primary>',
  'primary-url': 'mysql://<replication-user>:<password>@<ip>:<port>',
  'seconds-behind-master-threshold': <integer, defaults to 0>,
}

This takes effect on mariadb database creation - when no data exists yet. That way existing data cannot be deleted by setting or changing the replication parameters after the fact.

A promise checks that the state of the running mariadb matches the requested state (replica/primary, replication source); but if not, the mariadb database will not automatically converge without human intervention once ~/srv/mariadb directory exists.
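
A rough sketch of the kind of comparison such a promise could make, assuming the status is read with `SHOW SLAVE STATUS` and passed in as a dict. The function name, the shape of `expected` and the `max_lag` key are illustrative assumptions, not the actual promise code:

```python
def replication_mismatches(expected, status):
    """Return a list of mismatches; an empty list means the promise would pass.

    expected: None when a primary is requested, else a dict with 'host', 'port'
              and optionally 'max_lag' (hypothetical shape).
    status:   one row of SHOW SLAVE STATUS as a dict, or None if not replicating.
    """
    if expected is None:
        return ["replicating although a primary was requested"] if status else []
    if status is None:
        return ["not replicating although a replica was requested"]
    problems = []
    source = (status["Master_Host"], int(status["Master_Port"]))
    if source != (expected["host"], expected["port"]):
        problems.append("unexpected replication source")
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status[thread] != "Yes":
            problems.append("%s is %s" % (thread, status[thread]))
    lag = status["Seconds_Behind_Master"]
    if lag is None or int(lag) > expected.get("max_lag", 0):
        problems.append("replica lags behind the primary")
    return problems
```

Such a check only reports a mismatch; as stated above, nothing in the instance makes the database converge once ~/srv/mariadb exists.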

The bootstrap-url may be omitted: this skips replication bootstrap and requires that all binlogs still be available on the primary. This is useful when the primary is recent and may not have a ready backup for bootstrap yet.
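
For illustration, a primary-url of the form above could be turned into a `CHANGE MASTER TO` statement roughly as follows. This is a sketch under assumptions: the helper names and the 3306 default port are hypothetical, and whether the replica-initialisation logic in this MR builds the statement this way is not shown here.

```python
from urllib.parse import unquote, urlsplit


def sql_quote(value):
    """Quote a string literal for SQL, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_statement(primary_url, default_port=3306):
    """Build a CHANGE MASTER TO statement from mysql://<user>:<password>@<ip>:<port>."""
    parts = urlsplit(primary_url)
    if parts.scheme != "mysql" or not parts.hostname:
        raise ValueError("expected mysql://<user>:<password>@<ip>:<port>")
    return (
        "CHANGE MASTER TO MASTER_HOST=%s, MASTER_PORT=%d, "
        "MASTER_USER=%s, MASTER_PASSWORD=%s"
    ) % (
        sql_quote(parts.hostname),  # brackets around IPv6 are already stripped
        parts.port or default_port,
        sql_quote(unquote(parts.username or "")),
        sql_quote(unquote(parts.password or "")),
    )


if __name__ == "__main__":
    print(change_master_statement("mysql://repl:s3cret@[2001:db8::1]:3307"))
```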

Finally, a mariadb replica can optionally disable TCP access:

'replication': {
  # ...
  'allow-tcp-connections-on-replica': <True or False, defaults to True>,
}

The allow-tcp-connections-on-replica option defaults to true and only concerns replica mariadbs: TCP connections are always enabled when replication parameters are unset, even if the database is actually replicating, in contradiction with the parameters.

This option corresponds to skip-networking in mariadb configuration; this setting is static, so when it changes the mariadb process will be automatically restarted by SlapOS to apply the new configuration.
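
The rule described above can be summarised by a small sketch. The function name is hypothetical and the real logic presumably lives in the configuration template rather than in Python:

```python
def networking_options(replication):
    """Return the extra mariadb config lines implied by the replication parameters.

    replication: the 'replication' dict from the instance parameters, or None.
    """
    if not replication:
        # No replication requested: TCP is always enabled.
        return []
    if replication.get("allow-tcp-connections-on-replica", True):
        return []
    return ["skip-networking"]
```

Because the result only depends on the instance parameters, a replica that was taken over manually keeps `skip-networking` until its parameters are changed.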

Note: disabling TCP connections on replicas with this option currently breaks the property that takeover can be done without having to change the instance parameters and reprocess the partition, as until then the taken-over mariadb will still have TCP disabled and remain unusable.

TODO:

  • Allow a replica mariadb to stop replicating and become a primary without requiring manual login to the instance and manual operations on the DB (e.g. by providing a url where the user can click to perform this action). This will be a necessary step of an eventual automated takeover procedure.

  • Find a better solution for mariadb_update functionality. See #1.

  • Make the mariadb_replication promise avoid needless partition processing (bang): currently, it will trigger a bang when the state of mariadb (replica/primary, replication source) does not match the expected state (corresponding to the parameters), even though SlapOS only controls the initial state on database creation, and reprocessing the partition will by design not make it converge to the expected state.

  • For mariadb replicas requested with allow-tcp-connections-on-replica=false (which results in skip-networking being written in the config file), find a way to take over without needing to edit the instance parameters and reprocess the partition. This requires a way to restart mariadb with different options, using only the privileges of the partition. This could maybe be done by wrapping the mariadb service in a wrapper program (maybe an ad-hoc script, maybe supervisord) that allows restarting mariadb with skip-networking enabled or disabled as appropriate. Note that when allow-tcp-connections-on-replica=true, takeover requires neither editing the instance parameters nor reprocessing (which is the main reason true is the current default).


3. Automate mariadb replication bootstrapping

Make any mariadb (replica or primary) a) statically serve recent backups (dumps) on the same IP as the mariadb server and b) have a configured replication_user with a random password, and publish two corresponding connection parameters, replication-bootstrap-url and replication-primary-url, to be used to set up a replica mariadb.
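
As an illustration of what the two published parameters could look like, here is a sketch. The function name, the port arguments, the plain-http scheme and the root path of the bootstrap URL are all assumptions; only the parameter names and replication_user come from this MR.

```python
import secrets
from urllib.parse import quote


def replication_connection_parameters(ip, mariadb_port, http_port,
                                      user="replication_user", password=None):
    """Build the two connection parameters a replica needs (illustrative only)."""
    if password is None:
        password = secrets.token_urlsafe(16)  # random password for the replication user
    host = "[%s]" % ip if ":" in ip else ip   # IPv6 literals need brackets in URLs
    return {
        "replication-bootstrap-url": "http://%s:%d/" % (host, http_port),
        "replication-primary-url": "mysql://%s:%s@%s:%d" % (
            quote(user, safe=""), quote(password, safe=""), host, mariadb_port),
    }
```

A replica would then be requested with these two values as bootstrap-url and primary-url in its replication parameters.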

TODO:

  • Use mariabackup instead of dumps to allow fast bootstrapping of a replica. This will affect the replica-initialisation logic as well.

  • Propagate these mariadb connection parameters in erp5 root instance.
    --> mariadb-replication-primary-url and mariadb-replication-bootstrap-url


4. Authenticate and Encrypt Primary <---> Replica communications with TLS

Feature

  • Use TLS on public IPv6: a) serve the backups with TLS on IPv6 and b) proxy the mariadb server with TLS on IPv6. Another option would be to enable TLS in mariadb directly, but proxying decouples mariadb user configuration from TLS configuration, ensures all users are protected by TLS, and allows using TLS on IPv6 and not on IPv4.

Each mariadb instance (whether in primary or replica mode) now has by default a dedicated caucased server. This caucased is conceptually responsible for authenticating and encrypting access to that mariadb instance, although currently only when that access is over IPv6.

Besides the caucased itself, a caucase user certificate (user meaning admin) for this caucased is automatically issued inside the same instance. This certificate is then used by a dedicated service inside the instance to sign Certificate Signing Requests (CSR) that are passed via instance parameters.

To control whether caucased is enabled and to pass CSRs to sign, mariadb now takes parameters of the form:

"caucased": {
  "enable": true or false, true by default,
  "csr-to-sign": <PEM-encoded string representing one or more CSRs>
}
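
Since csr-to-sign may carry several CSRs in one string, the signing service has to split them at some point. A minimal sketch of such a split using only the standard PEM markers (the function name is hypothetical and this is not the code of the sign-csr service):

```python
import re

# Standard PEM armor for PKCS#10 certificate signing requests.
CSR_PATTERN = re.compile(
    r"-----BEGIN CERTIFICATE REQUEST-----\r?\n"
    r".*?\r?\n"
    r"-----END CERTIFICATE REQUEST-----",
    re.DOTALL,
)


def split_csrs(pem_bundle):
    """Return each CSR found in pem_bundle as its own PEM string."""
    return [match.group(0) + "\n" for match in CSR_PATTERN.finditer(pem_bundle)]
```

Text outside the PEM markers is ignored, so CSRs pasted one after another with blank lines or comments in between still split cleanly.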

In addition, two caucase service certificates are also automatically issued in the instance. Conceptually these are used to authenticate and encrypt access to the mariadb server. Ideally there should be only one, but as a first step it was easier to have two: one for authenticating access to mariadb itself and one for authenticating access to the bootstrap (HTTP) server, because different bundling and naming conventions are expected.

The url of the caucased is published in connection parameters under replication-caucased-url. When a replica instance is requested to replicate from a mariadb that uses caucased, it should receive that caucased-url in replication parameters like this:

"replication": {
  "caucased-url": <primary-caucased-url>,
  [...]
} 

It then requests a certificate on the primary's caucased and publishes the corresponding CSR under caucased-csr-to-sign. This can then be passed in the instance parameters of the primary to make the primary caucased "semi-automatically" approve that CSR. After that the replica will obtain and keep up-to-date a certificate that it can use to connect to the primary.

This scheme allows establishing a secure encrypted connection with mutual authentication (mTLS) between two SlapOS instances, with zero knowledge in the SlapOS master. It is generic and could be reused for any instance-to-instance authentication in SlapOS. To obtain access to resources of an instance protected in this way, one has to prove they have the right to modify its instance parameters.

To enable access on IPv6 and IPv4 simultaneously, each mariadb now also has by default a proxy that listens on IPv6 and forwards connections to the mariadb server.

  • If caucased is disabled, TLS is not enforced on IPv6 and this proxy is a single HAProxy that exposes both the mariadb server and the bootstrap (HTTP) server on IPv6.
  • If caucased is enabled, TLS is enforced on IPv6. HAProxy is then used to proxy the bootstrap (HTTP) server and enforce TLS. ProxySQL is used to proxy the mariadb server and enforce TLS. ProxySQL is used in this case instead of HAProxy because HAProxy is not compatible with the STARTTLS-like protocol the mariadb replica will use to connect.

These proxies are controlled by instance parameters of the form:

"haproxy": {
  "enable": true or false, true by default
}

Despite the name, this also affects ProxySQL.

If caucased is enabled but haproxy is disabled, the caucase service certificates are unused, and the mariadb server and the bootstrap (HTTP) server only listen on IPv4 (unless legacy parameter use-ipv6 is used, in which case they only listen on IPv6).
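
The combinations of the two enable flags described above can be summarised in a small decision function. This is only a restatement of the text as a sketch; the function name and return shape are hypothetical, and the legacy use-ipv6 case is left out:

```python
def ipv6_frontends(caucased_enabled=True, haproxy_enabled=True):
    """Map each service to (proxy, tls_enforced) on IPv6, or None if not proxied."""
    if not haproxy_enabled:
        # No IPv6 proxy at all: both servers only listen on IPv4
        # (legacy use-ipv6 is ignored in this sketch).
        return {"mariadb": None, "bootstrap": None}
    if caucased_enabled:
        # TLS enforced: ProxySQL for mariadb (STARTTLS-like protocol),
        # HAProxy for the bootstrap (HTTP) server.
        return {"mariadb": ("proxysql", True), "bootstrap": ("haproxy", True)}
    # No caucased: a single HAProxy exposes both services without TLS.
    return {"mariadb": ("haproxy", False), "bootstrap": ("haproxy", False)}
```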

TODO:

  • Use only one service certificate on the primary side, instead of two currently.

  • Instead of auto-approving the service certificates for the mariadb server via caucased's auto-approve feature, use the sign-csr service to approve all service certificates, including the ones on the same instance. This will be safer (no risk another certificate than the one intended is auto-approved) and more consistent.

  • Decide whether ProxySQL's lack of CRL support is an issue, and find a workaround or another solution if it is.

  • Deprecate use-ipv6 somehow.

  • Find a more generic name than "haproxy" for the HAProxy and ProxySQL related parameters; maybe simply IPv6-proxy?

  • Forward the publication of these new mariadb published parameters in erp5 root instance.

  • Find better naming conventions for published parameters: replication-caucased-url is suboptimal because that caucased-url may end up being used to grant IPv6 access in use cases other than replication. caucased-csr-to-sign does not make it clear that the CSRs need to be signed by the primary's caucased, not the caucased of the replica. Generally, caucased and replication related parameters tend to be ambiguous as to whether they refer to the local mariadb or to the primary when the current instance is a replica.

  • Add these new instance parameters in JSON Schema.


5. Automate neo asynchronous replication

When upstream-cluster and upstream-masters are given, also pass --backup to the neo master so that it converges automatically to BACKINGUP state.

In other words, when a neo is requested with upstream- parameters, make it automatically start in BACKINGUP state without requiring manual intervention. This applies only on neo database creation.

Add a promise that asserts neo is in BACKINGUP state when upstream- parameters are set (but does not assert neo is in RUNNING state when upstream- parameters are unset, for backwards compatibility with current usage).
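
A sketch of the two rules above, with hypothetical function names. Only the `--backup` flag, the two upstream- parameter names and the state names come from this MR; how the state is actually queried is not shown:

```python
def expects_backup(params):
    """True when both upstream- parameters are given."""
    return bool(params.get("upstream-cluster") and params.get("upstream-masters"))


def master_extra_args(params):
    """Extra command-line arguments for the neo master."""
    return ["--backup"] if expects_backup(params) else []


def neo_state_promise_ok(params, cluster_state):
    """Assert BACKINGUP when upstream- parameters are set; assert nothing otherwise."""
    if expects_backup(params):
        return cluster_state == "BACKINGUP"
    return True  # backwards compatible: RUNNING is not asserted
```

The asymmetry in `neo_state_promise_ok` is deliberate: existing instances without upstream- parameters keep passing regardless of their state.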

TODO:

  • Make the neo state promise avoid needless partition processing (bang): currently, it will trigger a bang e.g. when the state is RUNNING and the promise expects BACKINGUP, even though SlapOS only controls the initial state on database creation, and reprocessing the partition will by design not attempt to make it converge to the expected state.

6. Make zope aware of replication

Deactivate zope promises when the neo is expected to be BACKINGUP: in that state the zope process crashes, which is currently expected, so the promises would fail needlessly.

Deactivating the zope process entirely in that case is not desired because reactivating it would require updating instance parameters and reprocessing the partition. Instead, ideally, the zope service should adapt to the state of the neo.

Also, move zope service from etc/service to etc/run to make it not be "on-watch", so that when the neo is BACKINGUP and zope crashes, the partition does not bang and reprocess continuously. This seems ok because the promise already asserts the service is running.

TODO:

  • Adapt the zope service so that it detects when neo is in BACKINGUP state and goes on standby until neo is RUNNING as part of normal execution of the service, instead of crashing. One envisioned way is to wrap the existing zope service in a wrapper program that will handle this additional functionality, catch zope crashing, poll neo state or otherwise be notified of neo state changes, and relaunch zope as needed. Such a program could be an ad-hoc wrapper script, or maybe a supervisord launching a zope and a kind of neo-listener service.

  • Standardize operations related to creating an ERP5 clone of a production ERP5: this implies creating a replica, "detaching" it (like taking it over without stopping the original primary), selectively starting an admin zope while making sure activity zopes remain stopped, and changing all that is required to prevent the clone from interfering with the actual production ERP5 before starting the remaining zopes. One way is simply to start only selected zope partitions via SlapOS. Another way could be that zope services may be started or stopped directly: this could also be achieved via a wrapper program such as supervisord, but would require that it offer a remote interface. Or maybe the right thing to do would be to standardize a way to control network access of each partition via a firewall, so as to be able to selectively cut network access.
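
To make the first item above more concrete, the core decision of such a wrapper could look like the following sketch. Everything here is hypothetical (function name, action names, and the assumption that the wrapper can observe both the neo state and whether zope is running); no such wrapper exists in this MR.

```python
def next_action(neo_state, zope_running):
    """Decide what a zope wrapper should do on each poll or state notification."""
    if neo_state == "RUNNING":
        # Normal operation: make sure zope is up.
        return "noop" if zope_running else "start"
    # Any other state (e.g. BACKINGUP): stay on standby instead of crashing.
    return "stop" if zope_running else "wait"
```

A supervising loop would call this after each neo state change and after each zope exit, which covers both the takeover case (BACKINGUP to RUNNING) and the crash case.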


7. Miscellaneous fixes

Include some miscellaneous fixes for mariadb-with-IPv6 and gcc-version-for-Python2-SRs.

Edited Jun 12, 2025 by Xavier Thompson
Source branch: feat/mariadb-replication