What powers elyclover.com?

This is a fairly straightforward website: a static site generated with Hugo. The site is hosted on Azure Storage and served to your browser via Microsoft’s global CDN.

All traffic is encrypted by default.

The hosting platform’s environments (dev, stg, and the beloved prod) are built with Pulumi IaC code written in Go. The code creates a unique Resource Group per environment with everything needed to host the site, including DNS records and, in non-prod environments, a TLS certificate that’s auto-rotated by the Azure platform. Due to Azure limitations on auto-TLS provisioning for root domains, the prod environment uses a regular Comodo-sourced certificate. Microsoft would prefer users migrate to the more expensive Front Door service, which provides auto-rotating TLS certificates on root domains (as of Q4 2023).

All secrets are encrypted at rest with SOPS and stored in Git or in Pulumi Cloud (as state). Secrets that Pulumi generates from Azure are shipped into GitHub’s Secrets provider and are accessible per environment (e.g. dev, stg). These secrets are used by CI/CD Actions invoked by a simple caller workflow in the Hugo site’s repo. A PR opened in the Hugo site’s repo automatically triggers a build and deploy to dev.elyclover.com.

All secrets are auto-provisioned into their necessary locations this way with no user intervention required. A DevOps/SRE role will not need to touch keys, rotate them, or share them in a shared browser-based password vault with your team. The Pulumi IaC code can be used to rotate these credentials within the CI/CD system automatically. This also provides DR capabilities and eases Change Management concerns out of the box.

Another action named Release Please then sweeps all PRs whose titles parse as Conventional Commits into Release PRs and auto-generates release notes summarizing what’s in the release. Each release PR is also assigned a unique semantic version like v2.1.2. This version is derived from the size and risk of the changes, as signaled by simple PR title prefixes like feat or fix. Every time an additional PR like this is merged to main, it is auto-added to the Release rollup and a build and deploy of the site to stg.elyclover.com is re-triggered. The purpose here is to see all individual PRs, each with its own agenda, merged and co-existing in a single environment before release to prod.
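
For reference, a Release Please workflow along these lines might look like the sketch below. It is not the exact workflow used for this site: the workflow name, the simple release type, and the RELEASE_PLEASE_TOKEN secret name are assumptions for illustration. A personal access token (or GitHub App token) is assumed because releases created with the default GITHUB_TOKEN do not trigger other workflows, such as one listening for release events.

```yaml
name: release-please

on:
  push:
    branches:
      - main

permissions:
  contents: write
  pull-requests: write

jobs:
  release-please:
    runs-on: ubuntu-latest
    steps:
      - name: open or update the release PR, cut a release when it merges
        uses: googleapis/release-please-action@v4
        with:
          # hypothetical secret name; a PAT or App token so downstream workflows fire
          token: ${{ secrets.RELEASE_PLEASE_TOKEN }}
          # assumption: "simple" release type (changelog plus a version file)
          release-type: simple
```

With the default versioning rules, a merged PR titled fix: ... bumps the patch version, feat: ... bumps the minor version, and a breaking change marked with ! (for example feat!: ...) bumps the major version.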

Finally, when these release PRs are merged into main, a full GitHub Release is triggered along with a corresponding tag in git referencing the semantic version for this release. This triggers a build and deploy to elyclover.com, the production environment in this case.

This GitOps flow is possible thanks to a small workflow that invokes reusable Actions I’ve written and released under an open-source license. Here’s how they are invoked from the repo holding our Hugo site code. Three jobs run sequentially and handle the entire operation.

name: hugo-cicd

on:
  pull_request:
    types: [ synchronize, opened ]
  release:
    types:
      - released

defaults:
  run:
    shell: bash

concurrency: cicd-v2

jobs:
  # determine what kind of event triggered us, what framework we'll ultimately build with
  gen-metadata:
    runs-on: ubuntu-latest
    outputs:
      build-id: ${{ steps.build-metadata.outputs.artifact-id }}
      build-domain: ${{ steps.build-metadata.outputs.composite-domain }}
      env-target: ${{ steps.build-metadata.outputs.env-target }}
      hugo-version: ${{ steps.build-metadata.outputs.hugo-version }}
    steps:
      - name: generate build metadata
        id: build-metadata
        uses: kevholmes/hugo-azure-actions/.github/actions/generate-metadata@v1
        with:
          hugo-version: 0.119.0
          site-base-tld: elyclover.com
  # compile the hugo site and save it as an artifact, use metadata generated earlier for context
  build:
    runs-on: ubuntu-latest
    needs: gen-metadata
    environment:
      name: ${{ needs.gen-metadata.outputs.env-target }}
      url: ${{ needs.gen-metadata.outputs.build-domain }}
    steps:
      - name: checkout source
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: build hugo site
        uses: kevholmes/hugo-azure-actions/.github/actions/build@v1
        with:
          build-id: ${{ needs.gen-metadata.outputs.build-id }}
          build-domain: ${{ needs.gen-metadata.outputs.build-domain }}
          hugo-version: ${{ needs.gen-metadata.outputs.hugo-version }}
  # deploys to the azure blob storage container; secrets/params are scoped by the Actions
  # Environment chosen during metadata generation. clears the CDN cache and clobbers any
  # files in the blob container that conflict with the new release
  deploy-azure:
    runs-on: ubuntu-latest
    needs: [gen-metadata, build]
    environment:
      name: ${{ needs.gen-metadata.outputs.env-target }}
      url: ${{ needs.gen-metadata.outputs.build-domain }}
    steps:
      - name: deploy hugo site to azure
        uses: kevholmes/hugo-azure-actions/.github/actions/deploy@v1
        with:
          build-id: ${{ needs.gen-metadata.outputs.build-id }}
          az-client-id: ${{ vars.CLIENT_ID }}
          az-client-secret: ${{ secrets.CLIENT_SECRET }}
          az-subscription-id: ${{ vars.SUBSCRIPTION_ID }}
          az-tenant-id: ${{ vars.TENANT_ID }}
          az-storage-acct: ${{ vars.AZ_STORAGE_ACCT }}
          az-cdn-profile-name: ${{ vars.AZ_CDN_PROFILE_NAME }}
          az-cdn-endpoint: ${{ vars.AZ_CDN_ENDPOINT }}
          az-resource-group: ${{ vars.AZ_RESOURCE_GROUP }}

The above workflow has a simple concurrency lock that allows only one CI/CD workflow to execute at any time, no matter the target environment. The workflow runs for Pull Requests and for GitHub Release events (generated when Release Please PRs are merged into main).

The overall goals of the three jobs are as follows:

1. Generate metadata for this build - determine what environment we are deploying to
2. Run the build and generate an artifact
3. Deploy the build to the target environment determined in step 1
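
To make step 1 concrete, here is a rough sketch of how a metadata job could pick the target environment from the triggering event. It is an illustration only, not the code inside the generate-metadata action: the branch-prefix check for staging is an assumption (Release Please names its PR branches release-please--branches--...), and only the env-target output is shown.

```yaml
jobs:
  gen-metadata:
    runs-on: ubuntu-latest
    outputs:
      env-target: ${{ steps.build-metadata.outputs.env-target }}
    steps:
      - name: pick target environment from the triggering event
        id: build-metadata
        shell: bash
        env:
          EVENT_NAME: ${{ github.event_name }}
          # only set for pull_request events, empty otherwise
          HEAD_REF: ${{ github.head_ref }}
        run: |
          if [[ "$EVENT_NAME" == "release" ]]; then
            # a published GitHub Release goes to production
            target=prod
          elif [[ "$HEAD_REF" == release-please--* ]]; then
            # assumption: the Release Please rollup PR maps to staging
            target=stg
          else
            # any other pull request goes to dev
            target=dev
          fi
          echo "env-target=${target}" >> "$GITHUB_OUTPUT"
```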

The CI/CD actor’s auth is tightly scoped, following the principle of least privilege (PoLP), to allow only the few operations deployments need: read/write/delete on the Azure Storage Account and clearing the CDN cache on Microsoft’s side.

Each environment has its own Resource Group and Service Principal for CI/CD pipelines.

The Actions repo where the reusable website CI/CD workflows live has some additional automation, like Dependabot, to ensure dependencies are kept up to date. The same goes for the repo holding our Pulumi IaC: Dependabot updates any Go and Actions dependencies for us on a weekly basis. I am using a newer Dependabot feature called groups that places related dependency updates into a single PR, reducing the merge -> Dependabot rebases PR -> merge cycle a developer has to repeat when keeping the project up to date.

I’ve configured it to group all minor and patch level updates (in regard to semantic versions) in a single PR, and leave the major version updates to their own unique Dependabot PRs.
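
A dependabot.yml along these lines would produce that behavior. Treat it as a sketch rather than the exact file in the repo: the group names and the root directory are assumptions, while the weekly schedule and the minor/patch grouping follow the description above.

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: gomod
    directory: /
    schedule:
      interval: weekly
    groups:
      # hypothetical group name
      go-minor-and-patch:
        patterns:
          - "*"
        update-types:
          - minor
          - patch
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: weekly
    groups:
      # hypothetical group name
      actions-minor-and-patch:
        patterns:
          - "*"
        update-types:
          - minor
          - patch
```

Major version updates match neither group, so Dependabot opens a separate PR for each of them.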

[Screenshot: Dependabot group feature]

I still have extra work on this project, including sorting out a more advanced concurrency scheme.
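
One possible direction, sketched here under assumptions: move the lock from the workflow level to the deploy job and key it on the target environment, so that a dev deploy and a prod deploy no longer wait on each other. The group name prefix is hypothetical; env-target is the output the gen-metadata job already exposes in the workflow above.

```yaml
jobs:
  deploy-azure:
    runs-on: ubuntu-latest
    needs: [gen-metadata, build]
    # one lock per target environment instead of one lock for the whole workflow
    concurrency:
      # hypothetical group name, e.g. deploy-dev, deploy-stg, deploy-prod
      group: deploy-${{ needs.gen-metadata.outputs.env-target }}
      # let a running deploy finish rather than cancelling it midway
      cancel-in-progress: false
```

The needs context is available in job-level concurrency expressions but not at the workflow level, which is why the lock moves down to the job in this sketch.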

Another to-do item is GitOps automation tied to the Pulumi IaC repo. I want a new PR to apply only to the dev environment’s Azure Resource Group, a Release Please PR with rollups of all individual changes to apply to staging, and a Release event in GitHub to trigger a pulumi up or similar for prod. The pulumi up process has been working idempotently about 99.99% of the time over the last month, so it seems ready for automation. There’s also a Pulumi feature for deploying PRs to ephemeral environments that I could use to skip the static dev.elyclover.com setup and move to some semi-unique-id.elyclover.com (maybe using the PR #?), then clean these ephemeral environments up every 48 hours to save on costs. This effort could also tie into the concurrency work, since we could build and deploy multiple dev-level environments simultaneously.
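
As a starting point, the dev leg of that automation could look something like the sketch below. None of this exists yet: the workflow name, the dev stack name, the PULUMI_ACCESS_TOKEN secret name, and the reuse of the per-environment variables from the site workflow are all assumptions.

```yaml
name: pulumi-dev

on:
  pull_request:
    types: [ synchronize, opened ]

jobs:
  up-dev:
    runs-on: ubuntu-latest
    environment:
      name: dev
    steps:
      - name: checkout source
        uses: actions/checkout@v4
      - name: set up go for the pulumi program
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: apply the PR to the dev stack only
        uses: pulumi/actions@v5
        with:
          command: up
          # hypothetical stack name
          stack-name: dev
        env:
          # hypothetical secret name for Pulumi Cloud auth
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
          # service principal credentials read by the Azure provider
          ARM_CLIENT_ID: ${{ vars.CLIENT_ID }}
          ARM_CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
          ARM_TENANT_ID: ${{ vars.TENANT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ vars.SUBSCRIPTION_ID }}
```

The staging and prod legs would follow the same shape, triggered by the Release Please PR and the release event respectively.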

I have found what appears to be a bug in how the Azure native Storage and Azure legacy CDN providers interact within Pulumi. It manifests as a CDN endpoint you can’t delete until the DNS CNAME records are gone: the providers’ combined DAG doesn’t seem to get the order right, so the CNAME record is not deleted before our CDN endpoint’s custom domain (at least that’s how it manifests). This out-of-order issue throws an error and can break the automation when tearing environments down programmatically. Right now, I have to delete the CNAME record manually, refresh Pulumi’s state for that environment, and then continue with the automated destruction process via Pulumi.
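
Until that is fixed, the manual workaround could in principle be scripted as workflow steps. The fragment below is a sketch under several assumptions: the CNAME lives in an Azure DNS zone, earlier steps have already checked out the IaC repo and logged in to both Azure and Pulumi, and the environment variable names are hypothetical placeholders.

```yaml
steps:
  - name: delete the CNAME record ahead of the CDN endpoint
    shell: bash
    run: |
      # DNS_ZONE and CNAME_RECORD are hypothetical variables set elsewhere
      az network dns record-set cname delete \
        --resource-group "$AZ_RESOURCE_GROUP" \
        --zone-name "$DNS_ZONE" \
        --name "$CNAME_RECORD" \
        --yes
  - name: refresh pulumi state so it notices the record is gone
    shell: bash
    run: pulumi refresh --stack "$PULUMI_STACK" --yes
  - name: continue the automated teardown
    shell: bash
    run: pulumi destroy --stack "$PULUMI_STACK" --yes
```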

If you have any questions or want to contribute, please reach out.

- Kevin