Alright, so I've put off creating the build script for long enough. So far, I have three markdown files (including this one) that look enough like posts anyhow, and I'm more than ready to at least attempt to publish something today.
Things I'm addressing now:
Decide what markdown dialect to use starting out. Since I don't have any fancy diagrams yet, I can use Markdeep, which lets me defer decisions around the mermaid-supporting alternative. So, I'll need to rename these files. For vanilla markdown documents that would render fine in any dialect, it hardly matters which one is chosen.
Markdeep expects its script to be appended as a footer to a `.md.html` file, and I certainly do not like the notion of having to fetch and replicate this code in every post file I author; it is an unnecessary point of friction. I am going to name Markdeep files `.deep.md` so that editors still see them as markdown, while my automation takes care of inserting the Markdeep code to make everything work properly. Markdeep documents will appear hosted at `.md.html` URLs. For Markdeep to work in a way satisfying one of my requirements laid out before, all I have to do is bring the latest version of Markdeep into my project and include it when deploying static resources.
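Just to make the renaming concrete, here's a sketch of the one-time move to the new extension; the paths are illustrative, not what the repo actually looks like:

```bash
# One-time rename of existing posts to the .deep.md convention.
# Paths are illustrative; adjust to wherever the posts really live.
for f in blog/*/*.md; do
  git mv "$f" "${f%.md}.deep.md"
done
```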
In the above bullet, I found a need to link to a specific part of a previous post. Markdeep has great features for linking to sections within a page, but I want (as in that instance) higher precision when targeting a spot in a different page. In practice, text fragment links work excellently for this, and what's great is that Safari finally supports the standard, so I can now confidently employ it. This lets me avoid building something with frontend code, or sprinkling hidden elements with unique ids to achieve something similar (though inferior, since a plain `#` fragment in the link URL gives autoscroll but lacks the text highlight).
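For illustration, this is the shape of such a link; the URL and the highlighted phrase here are made up:

```bash
# Hypothetical text fragment link to a spot in another post.
# Everything after #:~:text= is a URL-encoded phrase that the browser
# scrolls to and highlights on load.
open "https://example.com/blog/2024-09.md.html#:~:text=build%20script"
```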
I realized that I will want to include images and videos at some point, and that this would easily lead to git repo bloat. One natural way to address this is with Git LFS, but it isn't entirely clear yet whether that would constrain things too much. From a quick look at GitHub documentation, using LFS would let me store files up to 2GB in size even in free plan repos. That would appear to conflict with only 1GB of LFS storage being provided for free. Anyway, even if storing content like that in LFS is feasible short- and medium-term, bulkier data is ideally kept in the storage layer that's already in play, separate from the git repo. Looks like I'll need to design around this to keep the git repo compact while letting the volume of served content grow in a truly scalable way.
The idea is a `files/` directory; whether it lives in the repo or outside of it probably doesn't matter (I could use a symlink to help get links to work). The gist is that everything under `files/` is not stored by or known to GitHub (or whoever I'm hosting the repo with), but will be synchronized to the data source referenced by the public website's CDN. Since git's behavior with symlinks is generally to just store a text file whose content is the target path, it looks like I can make that symlink, named `files` and located in the root of this repo, at some point in the future, and work out later the particulars of how syncing should happen in the local dev and CI/CD contexts.
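A rough sketch of what I have in mind; the local media path and bucket name are placeholders, not decisions:

```bash
# Keep bulky media outside the repo; expose it locally via a symlink that
# git stores as a tiny text file containing the target path.
ln -s ~/blog-media files

# Later (locally or in CI), push the media to the bucket the CDN reads from.
# The bucket name and paths are placeholders.
aws s3 sync ~/blog-media/ s3://example-blog-media/files/ --delete
```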
I also got to thinking about a potential vulnerability in the approach of hosting static files on S3 for CloudFlare to consume. There were some recent murmurs of a possible billing attack via unauthorized incoming requests, but it seems AWS has addressed that particular issue. The concern I came up with is a more basic form of that attack via successful GETs, since I do make the bucket publicly readable.
Now that I have a loose spec of sorts, and since I have no large binary files I want to host yet, for a first baby step it looks like I can go live just by implementing the Markdeep preprocessing in my build script:
#!/bin/bash
MARKDEEP_FOOTER='omitted due to code injection issues'
MARKDEEP_HEADER='<meta charset="utf-8"><link rel="stylesheet" href="/resources/slate.css">'
STAGING_DIR="${0%/*}/../stage"
# Keep a staging dir that will be the source for an s3 sync. It can also be
# used for local web previewing via `python3 -m http.server`.
rm -rf "$STAGING_DIR"
mkdir -p "$STAGING_DIR"
# Note: pages/ and resources/ are relative, so this assumes the script is
# invoked from the repo root.
cp -r pages resources "$STAGING_DIR/"
mkdir -p "$STAGING_DIR/blog"

# Process top-level directories in the blog: each directory becomes one
# .md.html page whose title is the directory name.
for dir in "${0%/*}"/../blog/*/; do
    dir_name=$(basename "$dir")
    output_file="$STAGING_DIR/blog/${dir_name}.md.html"

    echo "$MARKDEEP_HEADER" > "$output_file"
    echo "${dir_name}" >> "$output_file"
    echo "===========" >> "$output_file"
    echo "" >> "$output_file"

    # Find and concatenate all .deep.md files in this directory, in sorted order
    find "$dir" -maxdepth 1 -type f -name "*.deep.md" -print0 | sort -z | while IFS= read -r -d '' file; do
        filename=$(basename "$file" .deep.md)
        # echo "## ${filename}" >> "$output_file"
        echo "" >> "$output_file"
        cat "$file" >> "$output_file"
        echo "" >> "$output_file"
    done

    echo "$MARKDEEP_FOOTER" >> "$output_file"
done
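For local preview, running it looks roughly like this; the script name and location are just illustrative:

```bash
# Build the staging dir, then serve it for a quick look at
# http://localhost:8000/blog/. Script path is illustrative.
./scripts/build.sh
(cd stage && python3 -m http.server 8000)
```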
Only a little experimentation got me here, starting from the templates provided by the author of Markdeep. I then had to change my files a fair bit because this dialect handles whitespace somewhat differently, but there's nothing wild there. The free table of contents generated from headings is welcome, and I decided to use the folder structure to lay things out automatically:
... and so on.
It's a decent format for a blog, but I also know that at some point I can't just have one endless page. I also have some small changes to make, because here on macOS there are a few pixels of horizontal scroll that the perfectionist in me will not accept. This needs a good bit of work to add the kind of flexibility I will actually need, but it's a good starting point.
Next up, I will re-evaluate the design from multiple angles, and at some point the sync to S3 will be implemented, which will kick off the go-live process!