In this first post I want to talk a bit about a program I have written during the course of setting this site up. What I wanted was
For the markup language, since I like writing email, I wanted something like Markdown, but the markup processor should
Yes, ma, I did consider some established software packages:
In summary, not only am I too lazy to learn new languages, I also don’t want to learn the intricacies of configuring the existing popular (and hence big) choices, and then I thought for some time that this could be an interesting program to write — or in the words of Tom St Denis:
- I am too lazy to figure out someone else’s API. I’d rather invent my own simpler API and use that.
- It was (still is) good coding practice.
And this is how bah was born, the program which now generates this website. Its name of course abbreviates “a blog and homepage”. The Duden has an entry for “bah” (that links to its sibling “bäh”) and lists the following meanings:
If you are curious whether bah fulfills at least the first meaning of “bah”, do keep on reading.
bah
One of the criteria for my markup language was Markdown-alikeness, because it is convenient to write, and another was being able to access and modify the parse tree of the document to set up custom filters for the document. I learned that those processors are scarce and doubly scarce if you’re lazy about learning new languages.
I considered for example AsciiDoc, which is what jemdoc is allegedly based on. The reference implementation in Python doesn’t have anything resembling a parse tree in/out format, as far as I could see. An alternative implementation, Asciidoctor, does, but it’s in Ruby and Ruby-only…
My savior was pandoc. It has a very feature-rich variant of Markdown that allows fenced code blocks with classes, images with attributes, definition lists, tables, inline and block mathematics and other cool things[2]. It handles UTF-8 and even turns --- or " into their typographically preferable — so-called “smart” — counterparts, as demonstrated by this sentence. More importantly, it can input and output a parse tree in its internal Haskelly format or JSON, and provides a --filter option to install parse tree filter scripts which pandoc will insert into its processing pipeline at the appropriate point. This means that I can modify the document using external filters in any language I like. Perl happens to have excellent support for that in the Pandoc::Filter module. This is the basis for overcoming the insufficiencies of Jekyll and Hugo.
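To make the filter idea concrete, here is a minimal sketch of a JSON filter. It is written in Python rather than the Perl that bah uses, and it is not taken from bah: the function names `walk`, `shout` and `main` are made up for illustration. The only thing it relies on is pandoc's JSON layout, where every node is an object with a type tag `t` and an optional content field `c`.

```python
"""Minimal pandoc JSON filter skeleton (illustrative sketch, not bah's code)."""
import json
import sys


def walk(node, action):
    """Rebuild the tree depth-first, offering every tagged node to `action`.

    `action(tag, content)` returns None to keep the node as it is,
    or a replacement node to substitute for it.
    """
    if isinstance(node, list):
        return [walk(child, action) for child in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node["t"], node.get("c"))
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node


def shout(tag, content):
    """Example action: upper-case the text of every Str node."""
    if tag == "Str":
        return {"t": "Str", "c": content.upper()}
    return None


def main():
    """Read the document pandoc writes to stdin, transform it, print it back."""
    document = json.load(sys.stdin)
    json.dump(walk(document, shout), sys.stdout)

# To use this file as a filter, end it with a call to main().
```

With a call to `main()` at the bottom, a script like this could be passed to pandoc's `--filter` option, which pipes the JSON tree through it between parsing and writing.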
So I based bah on pandoc and Pandoc Markdown.
For some time I was stunned by the Perl 6 syntax highlighting situation that ruled out otherwise fit systems like the aforementioned Jekyll and Hugo. Indeed, even the Perl 6 Advent Calendar has to resort to posting a gist of an article to GitHub and scraping the syntax highlighted code blocks from there back into the article.
Folk wisdom has it that »only perl can parse Perl« and that holds double for Perl 6. So you might ask: »Who even can highlight Perl 6 at all?« Well, vim can, as can a bunch of other text editors that people use to write Perl 6[3]. »But who in their right mind uses vim as a syntax highlighting engine?« The answer is Perl, in its infinite TIMTOWTDIty.
That’s right. There is a module called Text::VimColor on CPAN that allows you to call out to vim with some text and a filetype and get a stream of your text with interlaced highlighting instructions back, or you can get straight HTML back, which is what I’m using here. And I think that’s super cool. Couple this with the vim-perl6 syntax file and you’ve got a capable highlighter for Perl 6 code snippets, and many, many other languages. Putting this behind a Pandoc::Filter, I can turn every fenced code block in my source Markdown document into a highlighted HTML code block.
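The tree surgery behind this can be sketched as follows, again in Python and again as an illustration rather than bah's actual Perl filter. A fenced code block arrives as a `CodeBlock` node whose content is `[[id, classes, attributes], code]`, and it leaves as a `RawBlock` of HTML. The highlighter is a pluggable function; the default `highlight_plain` is a stand-in that only escapes the code, where the real thing would call out to vim. All function names and the `<pre class=…>` wrapper are assumptions.

```python
import html


def highlight_plain(code, language):
    """Stand-in highlighter: only HTML-escapes the code, adds no colors."""
    return html.escape(code)


def highlight_code_block(block, highlight=highlight_plain):
    """Turn a pandoc CodeBlock node into a RawBlock of HTML; pass others through."""
    if block.get("t") != "CodeBlock":
        return block
    (_identifier, classes, _attributes), code = block["c"]
    # By convention the first class of a fenced block names the language.
    language = classes[0] if classes else "text"
    markup = '<pre class="{}"><code>{}</code></pre>'.format(
        html.escape(language, quote=True), highlight(code, language))
    return {"t": "RawBlock", "c": ["html", markup]}


def highlight_document(document, highlight=highlight_plain):
    """Apply the conversion to the top-level blocks of a pandoc JSON document."""
    blocks = [highlight_code_block(block, highlight) for block in document["blocks"]]
    return dict(document, blocks=blocks)
```

This sketch only looks at top-level blocks; code blocks nested inside lists or block quotes would need a recursive walk over the tree.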
The Perl 6 syntax file is not perfect, but usually pretty close. See for yourself:
#|«
Return a Unicode clock character that approximately represents the time
component of the given DateTime. There is one character for every half-hour
of an analogue clock, 24 in total. They start at C<U+1F550> (E<0x1F550>).
The mapping from non-half-hours to half-hours is specified via the
C<round> parameter which defaults to C<Closest>.
»
sub unitime (DateTime:D() $dt, Round :$round? = Closest --> Str) is export {
    my $half-hour = do given $round {
        # Minute with second and millisecond as fraction
        my $minute = $dt.minute + $dt.second / 60;
        when Up { ceiling $minute / 30 }
        when Down { floor $minute / 30 }
        when Closest { round $minute / 30 }
    }
    my $hour = $dt.hour + $half-hour div 2;
    $half-hour mod= 2;
    $hour = ($hour - 1) mod 12 + 1; # 0100 to 1230
    my $handle = $half-hour == 0 ?? ' OCLOCK' !! '-THIRTY';
    uniparse "CLOCK FACE %ENGLISH{$hour}$handle"
}
I can highlight every language my vim installation is capable of, for example Gambas, which (nearly?) nobody can handle, not even GitHub. Since every Gambas programmer just uses the Gambas IDE, there is not much motivation to support its syntax elsewhere, even though it isn’t all that difficult. Well, now I have a source of motivation for finishing my vim-gambas syntax file. The result isn’t pretty yet, but I’ll keep working on it as I have time.
'' Sort this bucket. This is Mergesort + Insertionsort. Only the last
'' instance (maximum index) of a particular key survives.
Static Private Sub _Sort(Entries As _Entry[]) As _Entry[]
  Dim aSorted As _Entry[]
  Dim hEnt As _Entry
  Dim iMid As Integer
  If Entries.Count < MergesortLimit Then ' Insertionsort
    aSorted = New _Entry[]
    For Each hEnt In Entries
      _Insert(aSorted, hEnt)
    Next
    Return aSorted
  Endif
  ' Mergesort
  iMid = Entries.Count / 2
  Return _Merge(_Sort(Entries.Copy(0, iMid)), _Sort(Entries.Copy(iMid, Entries.Count - iMid)))
End
Pandoc Markdown recognizes “maths”, either inline between $ signs or as a display block between $$ signs. Using LaTeX the other half of the week, I appreciate the consistency[4].
However, the built-in math rendering in pandoc either calls out to external services, relies on JavaScript, or embeds the typeset formulas as images. It can also produce MathML, which gets an honorable mention but isn’t portable. Pandoc::Filters come to the rescue again! What I do is intercept all the math blocks in the document and convert them on my own, using KaTeX via Node.js.
The project prides itself, among other things, on
- Server side rendering: KaTeX produces the same output regardless of browser or environment, so you can pre-render expressions using Node.js and send them as plain HTML.
In my opinion, that pride is completely justified and I can barely contain my amazement. As you can see above, I can even turn the logo into a link, it scales when you zoom in or out of this page, because it is just HTML, and it still looks as nice as if it came straight out of pdflatex. The formulas are statically generated once and for everyone on my server. The only external resources I embed are the required fonts and CSS files from KaTeX’s recommended CDN, and no client-side JavaScript is involved. If I cared to host these resources myself, this site would be uMatrix-clean.
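The interception step can be sketched like this, in Python for illustration only (bah does this in a Perl filter). In pandoc's JSON tree a formula is a `Math` node with content `[{"t": "InlineMath" or "DisplayMath"}, tex]`, and the sketch swaps it for a `RawInline` of pre-rendered HTML. The renderer is pluggable; `render_with_katex_cli` assumes that the command-line tool shipped in the `katex` npm package is reachable via `npx` and that it reads TeX from standard input, which may well differ from how bah talks to Node.js.

```python
import subprocess


def render_with_katex_cli(tex, display):
    """Assumed renderer: pipe TeX through the `katex` command-line tool."""
    command = ["npx", "katex"]
    if display:
        command.append("--display-mode")
    result = subprocess.run(command, input=tex, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def render_math(node, render=render_with_katex_cli):
    """Replace a pandoc Math node with static HTML; pass other nodes through."""
    if node.get("t") != "Math":
        return node
    math_type, tex = node["c"]
    display = math_type["t"] == "DisplayMath"
    return {"t": "RawInline", "c": ["html", render(tex, display)]}
```

Because the renderer is just a function argument, the tree rewrite can be tried out with a dummy renderer before Node.js is involved at all.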
To flex, let me show you how nicely a result from my Master’s thesis can be reproduced using KaTeX. For context, define the undirected simple graph for as follows: its vertices are the -faces of the -cube and two such faces are connected by an edge in the graph if and only if there is a -dimensional face which intersects and each in at least -dimensional faces. Then the following holds:
Theorem. The graph is transitive, hence regular. It is complete if and only if . The degree of any vertex can be calculated as follows:
where the sum extends over pairs which satisfy the feasibility and connectivity conditions
The basic operation of bah is what you would imagine. It crawls a project directory and either copies files over or, if they’re Markdown files, converts them to HTML using the filters discussed above. Different parts of the site can have different Mustache templates holding the Markdown-converted content.
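The crawl-and-convert loop can be sketched in a few lines. This is a Python illustration under my own assumptions, not bah's Perl code: the names `build_site` and `pandoc_to_html` are invented, Markdown files are recognized by an `.md` extension, and the filters and Mustache templates are left out.

```python
import os
import shutil
import subprocess


def pandoc_to_html(markdown_path):
    """Convert one Markdown file to an HTML fragment by calling the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html", markdown_path],
        capture_output=True, text=True, check=True)
    return result.stdout


def build_site(source_dir, build_dir, convert=pandoc_to_html):
    """Mirror source_dir into build_dir, converting *.md files and copying the rest.

    Returns the sorted list of written files, relative to build_dir.
    """
    written = []
    for directory, _subdirs, filenames in os.walk(source_dir):
        relative = os.path.relpath(directory, source_dir)
        target_dir = os.path.normpath(os.path.join(build_dir, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            source = os.path.join(directory, name)
            stem, extension = os.path.splitext(name)
            if extension == ".md":
                target = os.path.join(target_dir, stem + ".html")
                with open(target, "w", encoding="utf-8") as handle:
                    handle.write(convert(source))
            else:
                target = os.path.join(target_dir, name)
                shutil.copy2(source, target)
            written.append(os.path.relpath(target, build_dir))
    return sorted(written)
```

The build directory is assumed to live outside the project directory, otherwise the crawl would pick up its own output.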
That holds for the static part of the site anyway[5]. There is also a blog part, on which you are right now. The blog is a bit more dynamic in that posts are scattered in a blog subdirectory outside the static part of the site and are rendered into files whose path depends on the month, year and post title found in the header of the source file. They are also categorized by tags. The /blog URL and its descendants provide lists of posts which fall into their buckets. There is a global RSS feed /blog/feed for the blog, as well as one for every tag, e.g. you will find this article in the perl feed.
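How a post header might map to an output path and to tag buckets is sketched below in Python. Everything concrete here is hypothetical: the header fields `date`, `title` and `tags`, the `blog/YEAR/MONTH/slug.html` layout and the slug rule are stand-ins chosen for the example, since the text above only says that the path depends on month, year and title.

```python
import re
from collections import defaultdict


def slugify(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def post_path(header):
    """Derive an output path from a header with 'date' (YYYY-MM-DD) and 'title'."""
    year, month, _day = header["date"].split("-")
    return "blog/{}/{}/{}.html".format(year, month, slugify(header["title"]))


def tag_buckets(headers):
    """Map every tag to the output paths of the posts that carry it."""
    buckets = defaultdict(list)
    for header in headers:
        for tag in header.get("tags", []):
            buckets[tag].append(post_path(header))
    return dict(buckets)
```

Each bucket is then enough to render both a tag's list page and its feed.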
Yes, bah even has “tooling”. The CSS file for the syntax highlighter is generated by a little script vim2css from the peachpuff color theme that comes with vim. I had to tweak it by hand a little, of course, but it’s better than starting from scratch. (You may notice that picking colors that go well together isn’t my forte.)
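What a script in the spirit of vim2css could do is sketched here in Python. It is a guess at the idea, not the actual script: it reads the `hi Group guifg=… guibg=… gui=…` lines of a color scheme and emits one CSS rule per group. The `syn` class prefix is my assumption about the class names in the highlighter's HTML output, and color values are passed through unchanged, so vim color names that CSS does not know would need manual fixing.

```python
import re

# Matches lines such as:  hi Comment term=bold ctermfg=4 guifg=#406090
HIGHLIGHT_LINE = re.compile(r"^\s*hi(?:ghlight)?!?\s+(\w+)\s+(.*)$")


def colorscheme_to_css(vim_source, prefix="syn"):
    """Translate the GUI colors of a vim color scheme into CSS rules."""
    rules = []
    for line in vim_source.splitlines():
        match = HIGHLIGHT_LINE.match(line)
        if not match:
            continue
        group, arguments = match.groups()
        # Lines like "hi link A B" carry no key=value pairs and yield no rule.
        settings = dict(arg.split("=", 1) for arg in arguments.split() if "=" in arg)
        declarations = []
        if settings.get("guifg", "NONE").upper() != "NONE":
            declarations.append("color: {};".format(settings["guifg"]))
        if settings.get("guibg", "NONE").upper() != "NONE":
            declarations.append("background-color: {};".format(settings["guibg"]))
        if "bold" in settings.get("gui", "").split(","):
            declarations.append("font-weight: bold;")
        if declarations:
            rules.append(".{}{} {{ {} }}".format(prefix, group, " ".join(declarations)))
    return "\n".join(rules)
```

The output is a starting point for hand-tweaking, much as described above.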
In the beginning, I mentioned that I wanted the live website to be generated out of a version control system. bah itself is completely agnostic of what the project directory is. The wrapper bah-git can be installed as a post-receive hook into a git repository holding the site. It maintains a checked-out version of the repository and calls bah whenever new commits come in. It also handles locking of the project, moving and chowning the build directory for the webserver, and error recovery, because bah itself merrily ignores these aspects.
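The locking, moving and error recovery that such a wrapper has to take care of can be sketched as follows. This is a Python illustration with invented names (`deploy`, `live_dir`, `lock_path`), not the code of bah-git; updating the checked-out copy and changing ownership are left out, and the site generator is represented by a `build` callable.

```python
import fcntl
import os
import shutil
import tempfile


def deploy(build, live_dir, lock_path):
    """Build into a scratch directory under an exclusive lock, then swap it in.

    `build` is a callable that fills the directory it is given. If it raises,
    the scratch directory is removed and the live site stays untouched.
    """
    parent = os.path.dirname(os.path.abspath(live_dir))
    with open(lock_path, "w") as lock:
        # Blocks while another deployment is running; released on close.
        fcntl.flock(lock, fcntl.LOCK_EX)
        scratch = tempfile.mkdtemp(prefix="build-", dir=parent)
        try:
            build(scratch)
        except Exception:
            shutil.rmtree(scratch, ignore_errors=True)
            raise
        os.chmod(scratch, 0o755)  # mkdtemp creates the directory owner-only
        retired = scratch + ".old"
        if os.path.exists(live_dir):
            os.rename(live_dir, retired)
        os.rename(scratch, live_dir)
        shutil.rmtree(retired, ignore_errors=True)
```

The swap consists of two renames, so there is a brief moment without a live directory; a symlink flip would close that gap at the cost of a little more bookkeeping.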
For local testing, I wrote bah-watch, which uses inotify on Linux to watch the site’s project directory for changes. On every change, it updates the build directory in a temporary location. The script has an embedded HTTP server which then serves the build directory. It would have been very annoying to write such a long post without this tool.
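A watch-and-serve loop of this kind can be approximated with the Python standard library alone. Note the swapped-in technique: Python has no inotify binding in its standard library, so this sketch polls file modification times instead of subscribing to kernel events. The names `snapshot`, `serve` and `watch` are made up, and none of this is bah-watch's actual code.

```python
import functools
import http.server
import os
import threading
import time


def snapshot(root):
    """Map every file below root to its (modification time, size)."""
    state = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def serve(build_dir, port=8000):
    """Serve build_dir over HTTP on localhost from a background thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=build_dir)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def watch(project_dir, rebuild, interval=1.0):
    """Call rebuild() whenever project_dir changes. Runs until interrupted."""
    previous = snapshot(project_dir)
    while True:
        time.sleep(interval)
        current = snapshot(project_dir)
        if current != previous:
            rebuild()
            previous = current
```

Calling `serve(build_dir)` and then `watch(project_dir, rebuild)` gives an edit, save, reload cycle; polling costs a directory walk per interval, which is fine for a small site.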
In summary, this site relies on pandoc, vim, and git, all champions of their respective discipline, glued together by perl, the champion of gluing things together. And, well, the webserver is nginx; there’d be nothing here without it, either.
And with that we’re at the end. I haven’t released bah yet and I’m not sure whether the world needs yet another static site generator. On the other hand, I’m quite proud of it, in that it does all the things I wanted it to do, and in my opinion it does them The Right Way, and thus better than all the alternatives. (If only releasing software weren’t such a pain…)
If you have any comments or inquiries, please direct them to me via email at post@$this-domain.de. My PGP fingerprint can be found in Home.
[1] By which I mean not “best-effort” rendering to Unicode and not rendering to static images, because not only do they not look nice, they also don’t scale with the rest of the text.
[2] Like footnotes or citations!
[3] Update 1 Mar 2019: As I’ve learned in the meantime, pygments could do it all along but didn’t list Perl 6 in their language list.
[5] “Static” refers to their location here; all content is of course a priori static.