Ungathered Thoughts

Automating commit messages better

I use Git[1] on the daily and in most contexts[2] I care a lot about the commit messages being meaningful.

I also have a small handful of repositories where commits are automated (automated data feeds; daily timesheets; my lab journal, etc).

In the case of these repos, I want to track the incoming changes, but do not want the effort of composing a message. There's no benefit to me in manually composing a history entry, either because I'm not present, or because a descriptive message wouldn't add to the record.

(Sometimes that's not true; when I backfilled hundreds of dates into old journal entries, I did write a commit message for the change. But not when the commit is happening from cron.)

For a long time I used a git commit -m "journal update on ${HOSTNAME} @ ${DATE}" to automate; it gave me visibility on the origin and recency of the changes, and I could check the history for more information if necessary.
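A minimal sketch of that kind of automated commit, for illustration only: auto_commit is a hypothetical helper name, the timestamp format is an assumption, and the function expects the path of an existing repository.

```shell
# Hypothetical helper: stage everything in a repository and commit it with
# a "journal update" style message, but only if something actually changed.
auto_commit() {
    local repo_dir=$1
    local stamp
    stamp=$(date '+%Y-%m-%d %H:%M')   # assumed format for the date part

    git -C "$repo_dir" add --all || return 1

    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git -C "$repo_dir" diff --cached --quiet; then
        return 0
    fi

    git -C "$repo_dir" commit --quiet \
        -m "journal update on ${HOSTNAME:-unknown} @ ${stamp}"
}
```

From cron this could be called as auto_commit "$HOME/journal", where the path is a placeholder.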

But I want more. I want to be able to look at a commit message and see something more like:

Changed: 2024/03/09.md, Presentations/DS2024/Recipes.md

- 2024/03/09.md:
  - Modified section "Tasks"
  - Added section "Git commit message"
- Presentations/DS2024/Recipes.md
  - Added section "Config validation"

That seems to match what Martin Fowler has referred to as Semantic Diff (like Martin, I ask that if you know a better name, please tell me).

I don't have that part yet. I expect it wouldn't be terrible to implement; in Markdown you'd be looking for the parent "element" (ie, heading), and maybe reporting back on the parent of that if you wanted it.
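A rough sketch of that lookup in shell, not a finished semantic diff. It assumes ATX-style headings only (lines starting with #), does not skip fenced code blocks, and reports just the nearest heading rather than its parents. heading_for_line and hunk_start_lines are hypothetical names.

```shell
# Print the text of the nearest heading at or above line $2 of file $1.
# Prints an empty line if no heading precedes that line.
heading_for_line() {
    awk -v n="$2" '
        NR > n   { exit }                        # stop once past the target line
        /^#+ /   { h = $0; sub(/^#+ +/, "", h) } # remember the latest heading text
        END      { print h }
    ' "$1"
}

# Read a unified diff on stdin and print the new-side starting line
# number of each hunk, taken from the "@@ -a,b +c,d @@" headers.
hunk_start_lines() {
    sed -n 's/^@@ -[0-9,]* +\([0-9][0-9]*\).*/\1/p'
}

# Possible combination for one staged file (untested against real notes):
#   git diff --cached -U0 -- "$file" | hunk_start_lines |
#       while read -r n; do heading_for_line "$file" "$n"; done | sort -u
```

This only says which sections were touched; telling "Added section" apart from "Modified section" would need a comparison of the headings before and after the change.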

I see also that there are some LLM-based tools which will generate a commit message for you based on staged changes, which is perhaps nice for some but I'll give a hard no on feeding my daily work notes to an LLM for the provider to train against.

But to improve on the lack of information in "journal update on ${HOSTNAME} @ ${DATE}" I decided to at least capture a high-level view. Here's what I came up with.

A Git prepare-commit-msg hook with the following content:

#!/usr/bin/env bash
# prepare-commit-msg hook: put change counts in the subject line and
# append the staged diffstat as the body.

# Arguments Git passes to the hook (only the first is used here).
COMMIT_MSG_FILE=$1
COMMIT_SOURCE=$2
SHA1=$3

# Count entries by the first (index) status column of the porcelain output.
# grep -c prints a bare number; wc -l pads its output on some platforms,
# which would break the comparisons with "0" below.
ADDED=$(git status --porcelain | grep -c '^A')
DELETED=$(git status --porcelain | grep -c '^D')
MODIFIED=$(git status --porcelain | grep -c '^M')
RENAMED=$(git status --porcelain | grep -c '^R')
OTHER=$(git status --porcelain | grep -Ecv '^[ADMR]')
HOSTNAME=$(hostname)
SUBJECT="Files changed on $HOSTNAME: "
ORIG_SUBJECT=$SUBJECT

# Append an item to the subject, comma-separated after the first one.
function push_subject() {
    if [ "$SUBJECT" != "$ORIG_SUBJECT" ]; then
        echo "$SUBJECT, $1"
    else
        echo "$SUBJECT $1"
    fi
}

if [ "$ADDED" != "0" ]; then
    SUBJECT=$( push_subject "$ADDED added" )
fi
if [ "$DELETED" != "0" ]; then
    SUBJECT=$( push_subject "$DELETED removed" )
fi
if [ "$MODIFIED" != "0" ]; then
    SUBJECT=$( push_subject "$MODIFIED modified" )
fi
if [ "$RENAMED" != "0" ]; then
    SUBJECT=$( push_subject "$RENAMED renamed" )
fi
if [ "$OTHER" != "0" ]; then
    SUBJECT=$( push_subject "$OTHER other" )
fi

# Subject line, blank line, then the diffstat of what is staged.
echo "$SUBJECT" > "$COMMIT_MSG_FILE"
echo >> "$COMMIT_MSG_FILE"
git diff --cached --stat >> "$COMMIT_MSG_FILE"

Will generate commit messages like this:

commit 7f9cb0ff06afaa5db8733869ca5552c86aa7fc99 (HEAD -> main, origin/main, origin/HEAD)
Author: Chris Burgess <chris@giantrobot.co.nz>
Date:   Sat Mar 9 07:16:04 2024 +1300

    Files changed on thip:  44 added, 1 removed, 23 modified, 1 renamed
    
     2021/03/23/Some Notes.md          | 374 +++++++++++++++++++++
     2023/09/04.md                     |   2 +-
     2024/02/04.md                     | 260 +++++++++++++-
     2024/02/05.md                     | 121 +++++++
     ...

Or this:

commit 36a61c68c8b0031ad556c6be15ffe8399c2903a9 (HEAD -> main, origin/main, origin/HEAD)
Author: Chris Burgess <chris@giantrobot.co.nz>
Date:   Sat Mar 9 07:26:37 2024 +1300

    Files changed on thip:  1 added
    
     2024/03/09.md | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     1 file changed, 82 insertions(+)
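The counts in the subject line come from git status --porcelain, which also lists untracked files (??) and unstaged edits, so the "other" bucket can include files that are not part of the commit. A possible alternative, sketched on the assumption that only staged changes should be counted, reads the output of git diff --cached --name-status instead. summarise_name_status is a hypothetical name, and the input arrives on stdin so the function can be tried without a repository.

```shell
# Read `git diff --cached --name-status` output on stdin and print a
# summary such as "1 added, 2 modified". Statuses other than A, D, M
# and R (copies, type changes, ...) are lumped together as "other".
summarise_name_status() {
    awk '
        { count[substr($1, 1, 1)]++ }   # R100, R087 etc. all count as R
        END {
            n = split("A:added D:removed M:modified R:renamed", pairs, " ")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")
                c = count[kv[1]] + 0
                known += c
                if (c > 0) out = out (out == "" ? "" : ", ") c " " kv[2]
            }
            if (NR > known) out = out (out == "" ? "" : ", ") (NR - known) " other"
            print out
        }
    '
}

# Intended use inside the hook:
#   git diff --cached --name-status | summarise_name_status
```

A side effect of building the list in one place is that the first item is not preceded by an extra space.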

It's not perfect. It doesn't handle git commit -m <msg>, for one thing, and I hope to explore further a diff that is capable of precis-ing a set of changes to Markdown files. But it'll improve the history I see when working with those repos where I'm not writing commit messages for each change.
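One possible way to cope with git commit -m, sketched here rather than tested in these repos: Git passes the origin of the message as the hook's second argument (message for -m or -F, and also template, merge, squash or commit), and leaves it out for a plain git commit. A guard could leave any message Git already has untouched. should_generate_message is a hypothetical name.

```shell
# Succeed only when Git supplied no commit source, i.e. nothing else
# (-m, -F, a merge, a squash, --amend, a template) provided a message.
should_generate_message() {
    local commit_source=$1
    [ -z "$commit_source" ]
}

# Intended use near the top of the hook:
#   should_generate_message "$COMMIT_SOURCE" || exit 0
```

With this guard the generated summary would only appear for commits that have no message of their own, so whether it suits the cron jobs depends on how they invoke git commit.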

I'll report back on whatever annoys me about this first quick implementation 😀


  1. If you use another SCM, please substitute yours for "Git" in this. I'm not saying "SCM" here, picking clarity over correctness. ↩︎

  2. On a WIP branch, I will push any and all manner of rubbish commit messages, and rebase later. ↩︎