Awesome Babashka: Parse & produce HTML and SQLite

Babashka is a lightning-fast Clojure scripting tool with batteries included. It provided almost everything I needed to turn an AsciiDoctor document into a SQLite database and HTML in the format Dash - the offline documentation browser - requires to use it as a navigable, searchable "docset". While Babashka offers a lot out of the box, it can be further extended leveraging a number of available "pods" or extensions. This is a brief story of how I used Babashka to glue together Hickory, Selmer, and SQLite to make my html2dash.bb script.

First I add the pods and requires I need:

(pods/load-pod 'retrogradeorbit/bootleg "0.1.9") ; provides Hickory
(pods/load-pod 'org.babashka/go-sqlite3 "0.0.1")

(require
  '[babashka.fs :as fs]
  '[selmer.parser :as selmer] ; provided by Babashka
  '[pod.retrogradeorbit.bootleg.utils :as bootleg]
  '[pod.retrogradeorbit.hickory.select :as s]
  '[pod.retrogradeorbit.hickory.render :as hick.render]
  '[pod.babashka.go-sqlite3 :as sqlite])

Then I read in and parse the AsciiDoctor-generated HTML (Bootleg does not support AsciiDoctor (yet?)):

(def html (slurp "web-cheatsheet.html"))
(def hickory (bootleg/convert-to html :hickory))

Next I use Hickory to select the parts I need and turn parts back to HTML:

(->> hickory
  (s/select (s/and (s/tag :div) (s/class :sect2)))
  (map #(->> % :content (remove string?)))
  (map (fn [[heading-elm & body-elms]]
         [(-> heading-elm :content first)
          (->> body-elms (map hick.render/hickory-to-html) (str/join "\n")]))))

I do a few more of these transformations to end up with the cheatsheet-data data I need.

I have also a Selmer template to output what Dash requires:

...
{% for category in categories %}
<section class='category'>
  <h2 id='//dash_ref/Category/{{category.0}}/1'>{{category.0}}</h2>
  <table>
    {% for entry in category.1 %}
    <tr id='//dash_ref_{{category.0}}/Entry/{{entry.0|abbreviate:20}}/0'>
    <td class='description'>
    <div class='name'><p>{{entry.0}}</p></div>
    <div class='notes'><p>{{entry.1|safe}}</p></div>
    </td></tr>
    {% endfor %}
  </table>
</section>
{% endfor %}
...

At the end, I bind them all together, leveraging the fs utils to prepare the file structure:

(let [...]
  (when (fs/directory? "AsciiDoctor.docset")
    (fs/delete-tree "AsciiDoctor.docset"))
  (fs/create-dirs "AsciiDoctor.docset/Contents/Resources/Documents")
  (fs/copy-tree "resources/Contents" "AsciiDoctor.docset/Contents")

  (selmer/set-resource-path! (System/getProperty "user.dir"))
  (spit "AsciiDoctor.docset/Contents/Resources/Documents/index.html"
    (selmer/render-file "./cheatsheet.template.html"
      {:categories cheatsheet-data}))

  (printf "DocSet file `%s` written\n" html-file)

  (sqlite/execute! index-file
    ["create table searchIndex(id integer primary key, name TEXT, type TEXT, path TEXT)"])
  ;; etc...

  (printf "Index file `%s` written\n" index-file))

There is even a Babashka GitHub Action so I was able to make a GH Actions workflow that takes the input .adoc, runs it through AsciiDoctor and Babashka, and releases the docset .zip file.

Conclusion

Babashka is awesome, especially with the pods. Hickory and Selmer are neat for igesting and producing HTML.

You can also give a try to the AsciiDoctor cheat sheet for Dash and let me know what you think :)


Tags: clojure productivity babashka


Copyright © 2024 Jakub Holý
Powered by Cryogen
Theme by KingMob