Ungathered Thoughts

Yakfilling dates

I have a journal / lab book markdown collection in which many entries have date-based file paths (2015/01/...). Only the more recent files also had inline date metadata.

I wanted to backfill missing day properties for these markdown files into the yaml frontmatter. Initially my notes were just text, and over time I recorded additional metadata in the entries, so making the date records consistent adds context to the metadata already in the files.

Tools such as MarkdownDB and DataView, or SSGs such as Eleventy and Hugo, permit querying a markdown corpus for data. Migrating the day "data" from the file path into yaml frontmatter makes the records for those dates accessible in the same way as everything else.

Thinking lots about data preservation and the maintenance costs of running dynamically generating systems at the moment ...

Here's the bash script I came up with to sort this quickly and not prettily. I tested it out in a copy of the repo (I'm not stupid!) and was delighted with the results.

This really accounts for a pretty simple markdown structure only, making assumptions about the structure of the documents. I know what this specific corpus mess is and isn't and accept the T&C of any breakages.

#!/usr/bin/env bash

# Directory to scan, given as the only argument.
SOURCE="$1"

if [[ "$SOURCE" == "" ]] ; then
  echo "Usage: <path>"
  exit 1
fi

find "$SOURCE" -name '*.md' | while read -r FILE ; do
  # Pull YYYY/MM/DD out of the path and turn it into YYYY-MM-DD.
  DAY=$( echo "$FILE" | sed -n -e 's|^.*\([0-9]\{4\}\)/\([0-9]\{2\}\)/\([0-9]\{2\}\).*|\1-\2-\3|p' )
  if [[ "$DAY" == "" ]] ; then
    echo "File $FILE does not match date pattern"
    continue
  fi
  # Use -Pzo to see matched output for testing.
  if grep -Pzl "^---\n(.|\n)*\n---" "$FILE" > /dev/null
  then
    # echo "$FILE has frontmatter."
    if ! grep -E -- '^day:' "$FILE" > /dev/null
    then
      echo "$FILE has no day entry, should be $DAY."
      PROPERTY="day: $DAY"
      echo "Inserting: $PROPERTY"
      # GNU sed: append the property after the first '---' line.
      sed -i "0,/^---/a $PROPERTY" "$FILE"
    # else
    #   # echo "$FILE has day entry."
    #   grep -E -- '^day:' "$FILE"
    fi
  else
    echo "$FILE has no frontmatter, should be $DAY."
    METADATA="---\nday: $DAY\n---"
    echo "Inserting: $METADATA"
    # GNU sed: insert a new frontmatter block before line 1.
    sed -i "1i $METADATA" "$FILE"
  fi
done

Usage: /path/to/journal. Because of the layout of my journal (folders with and without date-based naming) I targeted each year one by one, working only on a separate copy so I could review changes and merge via Git when happy.
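A read-only check along these lines could preview which files in a given year still lack a day entry, before and after a run. This is only a sketch, assuming GNU grep; `list_missing_day` is a name invented here, and unlike the script it does not care whether the `day:` line actually sits inside the frontmatter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list markdown files under a directory that contain no
# line starting with "day:". Read-only; it never modifies any file.
# Assumes GNU grep (-r, --include, -L).
list_missing_day() {
  grep -rL --include='*.md' -E -- '^day:' "$1"
}

# Example call (the path is a placeholder):
# list_missing_day /path/to/journal-copy/2015
```

An empty listing after a run would suggest every markdown file under that folder now carries a day line somewhere.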

I hope this script may be useful or interesting for you to build on. This script is No Maintenance Intended :)

It's a pretty crude approach, but it suffices nicely. I was happy to quickly see results like this:

Screenshot of git commit message showing backfilled dates in journal entries from years ago

A happy accident of this implementation was that some notes with paths like 2019/02/03/Some were also correctly dated. A win for UNIX-y simplicity I guess. I'd forgotten that at one point I tried such a layout briefly ... it didn't really work for me, so I'd dropped it.
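To see why both layouts match, here is the script's sed expression pulled out into a small function. `extract_day` is a name made up for this illustration and the example paths are hypothetical. The pattern only looks for a YYYY/MM/DD run somewhere in the path, so whatever follows the day digits (a file extension or a further folder) is ignored.

```shell
#!/usr/bin/env bash
# extract_day is a hypothetical wrapper around the sed expression from the script.
# It prints YYYY-MM-DD when the path contains YYYY/MM/DD, and nothing otherwise.
extract_day() {
  printf '%s\n' "$1" |
    sed -n -e 's|^.*\([0-9]\{4\}\)/\([0-9]\{2\}\)/\([0-9]\{2\}\).*|\1-\2-\3|p'
}

extract_day "journal/2015/01/02.md"        # day as a file name
extract_day "journal/2019/02/03/notes.md"  # day as a folder
extract_day "journal/misc/ideas.md"        # no date in the path: prints nothing
```

The first two calls print 2015-01-02 and 2019-02-03 respectively.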

Feeling grateful for the workshop on the Atom Journal plugin at OS//OS in 2015 which got me to start using date-based filenames!

Whew, 882 modified files!

Screenshot of the PR in GitHub

On review, I found one case that hadn't been handled and which wanted tidying. Where the original markdown file had an empty frontmatter block -

---
---

the script did not recognise it as frontmatter, so the result was that a duplicate frontmatter block would be introduced.

I used grep -riPzl -- "---\n---" <path> to locate seven affected files like that, and fixed them by hand. All were from a similar date range ... I didn't bother handling that in the script above.
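Had it been worth automating, one option would be to test for an empty block first and put the day between the existing delimiters. A rough sketch, assuming GNU sed and that the empty block is exactly the first two lines of the file; `has_empty_frontmatter` and `fill_empty_frontmatter` are hypothetical names, not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original script.

# Succeeds when the first two lines of the file are both exactly "---".
has_empty_frontmatter() {
  [ "$(head -n 2 "$1")" = "$(printf '%s\n' '---' '---')" ]
}

# Append "day: <date>" after line 1, i.e. between the two existing delimiters.
# Uses GNU sed's in-place editing and one-line "a" form.
fill_empty_frontmatter() {
  sed -i "1a day: $2" "$1"
}

# Example use inside a loop like the script's (FILE and DAY as in the script):
# if has_empty_frontmatter "$FILE" ; then
#   fill_empty_frontmatter "$FILE" "$DAY"
# fi
```

Running such a check ahead of the frontmatter test would keep these files from falling into the "no frontmatter" branch.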