Cloud Database Migration from AWS to GCP, Accompanied by pgAudit

Felipe Veloso
5 min read · Aug 30, 2022

In one of my most recent migration challenges, a case arose where the transactionality and consistency of data were affected by a cloud-to-cloud migration. The plan seemed solid, but the restrictions of the legacy software prevented us from breaking away from auto-incrementing IDs, so inconsistencies became a real risk factor. Given how interesting this turned out to be, I will describe some of the strategies that served as our safety net during the migration.

Starting the adventure

A migration is always complex, especially with legacy software or systems that were never designed for distributed scaling or similar strategies.

The challenge itself was simple: moving some systems, networks, and infrastructure from one cloud to another. The task was thoroughly studied, but not the consistency problems involved in switching databases with minimal downtime while transmitting at least 1 terabyte of information in less than 2 hours.

Delving deeper: the systems to be migrated depend on regional transactional databases (PostgreSQL), and this is where the adventure begins in the fields of data, migration, and preserving the guarantees of an ACID database.

Specifically, ACID is an acronym for Atomicity, Consistency, Isolation, and Durability, and those guarantees had to hold for the systems involved.

Now the problems

There are several tools for running ETL from one database to another, and we tried several of them: Airbyte, Google Dataflow, and DMS (both Google's and AWS's). Rollback plans were mandatory, however, because losing information was not acceptable.

I will start with some approaches for this migration.

Diagram: AWS to GCP
  • Initially, the idea was to create a CDC flow between both clouds to maintain a constant stream of updates between the two database engines; a CDC pipeline was set up through DMS from AWS to GCP.
  • The result was successful: the information was sent from AWS to GCP in less than 36 hours, and once both databases were synchronized, the GCP side was updated almost in real time.
  • In order to test the migration with some clients, we then tried to keep both databases synchronized in real time in both directions, which we would call bidirectional CDC.
  • After many tests, a search for resources, and a few PoCs, this option was dismissed as not usable; the main problem was the auto-incrementing fields (see the sketch after this list).
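To make the auto-increment problem concrete, here is a purely illustrative Python sketch (the values and field names are hypothetical, not taken from the real system): with bidirectional CDC, each side hands out the next value of the same sequence independently, so concurrent inserts end up claiming the same primary key once the changes are replicated.

```python
# Purely illustrative sketch of the bidirectional-CDC conflict with
# auto-incrementing ids (values and fields are hypothetical).

# Both databases start from the same replicated sequence position.
next_id_aws = 1001
next_id_gcp = 1001

# One client writes on AWS while another client writes on GCP.
row_from_aws = {"id": next_id_aws, "source": "aws", "data": "order A"}
row_from_gcp = {"id": next_id_gcp, "source": "gcp", "data": "order B"}

# When each CDC stream ships its insert to the other side, both rows
# claim primary key 1001: one replication fails with a duplicate key,
# or worse, silently overwrites the other row.
print(row_from_aws["id"] == row_from_gcp["id"])  # True -> conflict
```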

Some of these tests were:

  1. Deploying CDC from GCP to AWS at the moment any customer would start operating on top of GCP.
  2. Using Debezium as a bidirectional CDC operator.
  3. Certain paid web tools that promise to provide this service, none of which actually worked.

Leaving the exploration and getting to the solution

After many iterations, the company’s back-end architect came up with an extension that changed the approach, and I went on to build a PoC that would become our viable rollback option in case the cloud migration failed.

PostgreSQL Audit Extension

The PostgreSQL Audit Extension (or pgaudit) provides detailed session and/or object audit logging via the standard logging facility provided by PostgreSQL. The goal of PostgreSQL Audit is to provide the tools needed to produce audit logs required to pass certain government, financial, or ISO certification audits.

For the purposes of the pgaudit project, the term “audit” refers to an official inspection of an individual’s or organization’s accounts, typically by an independent body.

The pgaudit project builds on the work done by members of the PostgreSQL Development Community in support of an open source audit capability for PostgreSQL. The source code associated with pgaudit is available under the PostgreSQL License.

This extension allowed us to audit the transactions carried out on the target database (GCP), so that any operation performed in GCP could later be replayed on our leading database (AWS).
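As a minimal sketch, this is roughly what enabling the extension on the target database can look like; the connection string and database name are hypothetical, and on a managed service the same settings are usually applied through the provider's database flags rather than directly.

```python
# Minimal sketch, assuming a PostgreSQL instance where pgaudit is available
# and you have the privileges to install it; the DSN and database name are
# hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=app user=postgres host=10.0.0.5")
conn.autocommit = True
cur = conn.cursor()

# Install the extension on the target (GCP) database.
cur.execute("CREATE EXTENSION IF NOT EXISTS pgaudit;")

# Audit only data-changing statements (INSERT, UPDATE, DELETE, ...) so that
# reads do not flood the logs, as suggested later in this article.
cur.execute("ALTER DATABASE app SET pgaudit.log = 'write';")

cur.close()
conn.close()
```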

New strategy

The pgaudit strategy allowed us to store the audit trail in GCP’s logging tool, which left us with a JSON entry of no less than 300 characters for each action (in any case, the audited interactions with the database can be limited, for example to writes and updates only). Even so, it became unwieldy: extracting the entries with queries against the logging tool limited our ability to perform a rollback in a short time, since the data would have to be extracted case by case.
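For reference, this is roughly what that “query the logs case by case” approach looks like with the google-cloud-logging client; the project id and the filter are assumptions on my part, and the exact payload shape depends on how the instance emits the pgaudit lines.

```python
# Sketch of the "query Cloud Logging case by case" approach that turned out
# to be too slow for a fast rollback. Project id and filter are assumptions.
from google.cloud import logging

client = logging.Client(project="my-gcp-project")  # hypothetical project

# Pull only pgaudit entries coming from the Cloud SQL instance.
log_filter = 'resource.type="cloudsql_database" AND textPayload:"AUDIT:"'

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    # Each entry carries one payload per audited statement,
    # which then has to be inspected one by one.
    print(entry.timestamp, entry.payload)
```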

Our experience with GCP let us solve this problem quickly and practically unattended: Cloud Logging sinks, which can route logs to other kinds of storage, allowed us to move the data without further management.
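A sketch of what creating such a sink could look like with the same client library; the sink name, topic, and filter are hypothetical, and the same thing can be done from the console or the gcloud CLI.

```python
# Sketch of routing the pgaudit entries out of Cloud Logging via a sink to a
# Pub/Sub topic; sink name, topic, and filter are hypothetical.
from google.cloud import logging

client = logging.Client(project="my-gcp-project")

sink = client.sink(
    "pgaudit-to-pubsub",
    filter_='resource.type="cloudsql_database" AND textPayload:"AUDIT:"',
    destination="pubsub.googleapis.com/projects/my-gcp-project/topics/pgaudit-events",
)

if not sink.exists():
    sink.create()  # Cloud Logging starts streaming matching entries to the topic
```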

Finally, our solution architecture (and de facto solution) can be seen in the following image.

This flow allowed us to register/audit the events in a tabular way and later extract the executed query from the payload without major headaches. From there it is straightforward to export everything to a CSV or a SQL script file that would re-execute, in order of arrival (according to Pub/Sub), the commands applied to our migrated database (see the sketch below).
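Here is a minimal sketch of that extraction step, assuming the audit entries arrive on a Pub/Sub subscription and that each message carries a raw pgaudit line in its standard CSV format (audit type, statement id, substatement id, class, command, object type, object name, statement, parameter); the project, subscription, and file names are hypothetical.

```python
# Minimal sketch: pull pgaudit entries from a Pub/Sub subscription, extract the
# SQL statement from each AUDIT line, and append it to a replay script.
# Project, subscription, and file path are hypothetical; each message is assumed
# to contain a raw pgaudit line, e.g.
#   AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,public.orders,INSERT INTO orders ...,<none>
import csv
import io

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-gcp-project", "pgaudit-events-sub")

response = subscriber.pull(request={"subscription": subscription, "max_messages": 100})

with open("rollback_replay.sql", "a", encoding="utf-8") as replay:
    for received in response.received_messages:
        line = received.message.data.decode("utf-8")
        audit = line.split("AUDIT: ", 1)[-1]           # drop the prefix
        fields = next(csv.reader(io.StringIO(audit)))  # pgaudit uses CSV fields
        statement = fields[7]                          # 8th field is the statement
        replay.write(statement.rstrip(";") + ";\n")    # replay in arrival order

# Acknowledge only after the statements are safely on disk.
ack_ids = [m.ack_id for m in response.received_messages]
if ack_ids:
    subscriber.acknowledge(request={"subscription": subscription, "ack_ids": ack_ids})
```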

Final results

We never had to use this path: the migration went through without relevant incidents at the database level, and it was not necessary to perform a rollback.

The experience, and the confidence of having this safety net, allowed us to solve system-related problems with the certainty that we could keep our transactions consistent at any time.

To close

I invite you to try “out of the box” solutions. In my working life I have been through at least 8 migrations, 3 of them large scale (the migration and merger of 2 banks), and given that experience I feel that the solution described here allows some freedom for systems and clouds that suffer from lock-in today.
