Checkpointing and Persistent Workflows with Serializable Continuations

Save and resume computations across process boundaries using serializable continuations.

Overview

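Turmeric v2 will support serializable continuations: a suspended computation can be marshalled to bytes, persisted to disk or sent over a network, and then resumed in a fresh process. This enables persistent workflows that survive crashes, migration of in-progress tasks between nodes, web continuations, and periodic checkpointing of long-running computations.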

Core Concept

Phase 18's delimited continuations reify the call stack as a heap-allocated closure chain in which each frame is a struct on the heap. Serialization traverses this chain and emits a stable encoding; deserialization reconstructs the chain on load.

;; Capture a continuation
(def saved #f)
(def result
  (+ 1 (cloneable-shift [k]
         (set! saved k)
         10)))

;; Serialize to bytes
(def bytes (serialize saved))

;; Write to disk or send over network
(write-file "continuation.dat" bytes)

;; Later, in another process:
(def k (deserialize (read-file "continuation.dat")))
(resume k 42)  ; => 43

Serialization Design

Stable Symbol Table

Function pointers are not portable across builds, so each continuation frame stores a stable symbol name in place of a pointer:

(struct continuation-frame
  [fn-symbol : string  ; e.g., "mymodule.myfunction"
   args : (list any)   ; serializable arguments
   captures : (map symbol any)])  ; captured variables

On deserialization, the symbol is resolved to the current build's function pointer.
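
The symbol-resolution step can be sketched outside Turmeric. The following Python is an illustration only, not the Turmeric implementation: the `SYMBOLS` registry, the `register` decorator, and the JSON encoding are all assumptions made for the example. It shows frames being stored by name and re-bound to whatever functions the current build registers under those names.

```python
import json

# Hypothetical registry: stable name -> function in the *current* build.
SYMBOLS = {}

def register(name):
    """Register a function under a stable, build-independent name."""
    def wrap(fn):
        SYMBOLS[name] = fn
        return fn
    return wrap

@register("mymodule.add_one")
def add_one(x):
    return x + 1

@register("mymodule.double")
def double(x):
    return x * 2

def serialize_chain(frames):
    """Encode (symbol, captures) frames as JSON bytes; no pointers are stored."""
    records = [{"fn_symbol": name, "captures": caps} for name, caps in frames]
    return json.dumps(records).encode("utf-8")

def deserialize_chain(data):
    """Resolve each stored symbol against the current registry."""
    frames = []
    for record in json.loads(data.decode("utf-8")):
        name = record["fn_symbol"]
        if name not in SYMBOLS:
            raise LookupError(f"unknown function symbol: {name}")
        frames.append((SYMBOLS[name], record["captures"]))
    return frames

def resume(frames, value):
    """Feed a value through the frames in order, innermost first."""
    for fn, _captures in frames:
        value = fn(value)
    return value

chain = [("mymodule.add_one", {}), ("mymodule.double", {})]
restored = deserialize_chain(serialize_chain(chain))
```

Here `resume(restored, 42)` applies `add_one` and then `double`. A symbol missing from the registry fails at load time rather than at resume time, which is the behaviour one would probably want from the real resolver as well.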

The Serializable Typeclass

Not all types can be serialized. Types opt in by implementing the Serializable typeclass:

(defclass Serializable [a]
  (serialize [x : a] : bytes)
  (deserialize [b : bytes] : a))

;; Primitive implementations
(instance Serializable int64 ...)
(instance Serializable string ...)
(instance Serializable bool ...)

;; Derived implementations
(instance Serializable (Pair a b) [Serializable a, Serializable b] ...)

;; Not serializable by default: resources such as FileHandle need a custom instance
(instance Serializable FileHandle
  ;; Custom handler: store the file path as bytes, re-open on deserialize
  (serialize [fh] (serialize (file-handle-path fh)))
  (deserialize [b] (open-file (deserialize b))))

Resource Types

File handles, sockets, and other system resources can define custom marshal/unmarshal hooks:

(defclass Resource-Serializable [a]
  ;; Serialize to a stable representation
  (marshal [x : a] : resource-token)
  ;; Restore from token in new process
  (unmarshal [token : resource-token] : a))

(instance Resource-Serializable FileHandle
  (marshal [fh] (file-handle-path fh))
  (unmarshal [path] (open-file path)))
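
As a rough illustration of the marshal/unmarshal pair, here is a Python sketch. It is not the Turmeric API: the token layout (a dict with `path` and `mode`) is an assumption for the example, and the file position is deliberately not preserved, matching the path-only token shown above.

```python
import os
import tempfile

def marshal(fh):
    """Reduce an open file to a token that stays meaningful in another process."""
    return {"path": os.path.abspath(fh.name), "mode": fh.mode}

def unmarshal(token):
    """Re-open the file described by the token."""
    return open(token["path"], token["mode"])

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "w") as out:
            out.write("hello")
        fh = open(path, "r")
        token = marshal(fh)
        fh.close()  # the original handle does not cross the process boundary
        with unmarshal(token) as restored:
            return restored.read()
```

A token like this only works if the path is still valid where the continuation resumes, so migration to another node would need a shared filesystem or a different token scheme.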

Ownership Model and Serialization

Serializing a continuation produces a deep copy of its captured state. Ownership is transferred to the serialized form, and the originals are invalidated:

(def r (ref 42))
(def bytes
  (cloneable-shift [k]
    (serialize k)))  ; Serialization deep-copies r
                     ; Original r is now inaccessible
(deserialize bytes)  ; Deserialize: new r created with value 42

This is safe because:

  - The original continuation is no longer reachable (it was consumed by shift).
  - The deserialized continuation has a fresh copy of captured state.
  - No aliasing between old and new process.
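
The no-aliasing property can be demonstrated with any byte-level round trip. The Python sketch below uses JSON as a stand-in encoding, which is an assumption for illustration, and models the captured ref as a plain dict.

```python
import json

def roundtrip(captured):
    """Serialize to bytes and back; the result shares no objects with the input."""
    data = json.dumps(captured).encode("utf-8")
    return json.loads(data.decode("utf-8"))

original_ref = {"value": 42}       # stands in for (ref 42)
captured = {"r": original_ref}     # state captured by the continuation
restored = roundtrip(captured)

# Mutating the restored copy leaves the original untouched.
restored["r"]["value"] = 99
```

After the mutation, `original_ref["value"]` is still 42. Python does not invalidate the original the way the ownership model above does, so this sketch covers only the copy semantics.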

Example: Persistent Workflow

A multi-step business process that survives crashes:

(defn process-order [order-id]
  ;; Step 1: Validate order
  (def order (load-order order-id))
  (unless (valid-order? order)
    (throw (validation-error "Invalid order")))
  (checkpoint "order-validated" order)

  ;; Step 2: Charge payment (slow network call)
  (def charge-result (charge-payment order.payment-info))
  (checkpoint "payment-charged" charge-result)

  ;; Step 3: Fulfill order
  (def fulfillment (fulfill order charge-result))
  (checkpoint "order-fulfilled" fulfillment)

  fulfillment)

;; Checkpointing macro
(defmacro checkpoint [name value]
  `(cloneable-shift [k]
     ;; Save continuation to disk
     (def checkpoint-file (str "checkpoint-" ~name ".bin"))
     (write-file checkpoint-file (serialize k))
     ;; Resume immediately on first run
     (resume k ~value)))

;; On crash, user can resume from last checkpoint
(defn resume-from-checkpoint [name]
  (def checkpoint-file (str "checkpoint-" name ".bin"))
  (def k (deserialize (read-file checkpoint-file)))
  (resume k))
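
For readers who want to experiment with the control flow, the workflow can be approximated in Python. Python has no serializable continuations, so this sketch swaps in a different technique: the "rest of the computation" is an explicit step index plus a state dict. The step functions, the file name, and the `crash_before` parameter are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical steps standing in for validate / charge / fulfill.
def validate(state):
    state["validated"] = True
    return state

def charge(state):
    state["charged"] = state["amount"]
    return state

def fulfill(state):
    state["fulfilled"] = True
    return state

STEPS = [validate, charge, fulfill]

def run(state, checkpoint_path, start=0, crash_before=None):
    """Run steps from `start`, writing a checkpoint after each completed step."""
    for index in range(start, len(STEPS)):
        if index == crash_before:
            raise RuntimeError(f"simulated crash before step {index}")
        state = STEPS[index](state)
        # Write to a temp file and rename, so a crash mid-write
        # cannot leave a torn checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as out:
            json.dump({"next_step": index + 1, "state": state}, out)
        os.replace(tmp_path, checkpoint_path)
    return state

def resume_from_checkpoint(checkpoint_path):
    """Reload the last checkpoint and continue with the remaining steps."""
    with open(checkpoint_path) as src:
        saved = json.load(src)
    return run(saved["state"], checkpoint_path, start=saved["next_step"])

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint-order.json")
        try:
            run({"order_id": 7, "amount": 25}, path, crash_before=2)
        except RuntimeError:
            pass  # the process "crashed" after payment was charged
        return resume_from_checkpoint(path)
```

In `demo`, the payment step runs exactly once: the resumed run starts at the fulfilment step. A step that crashes after its side effect but before its checkpoint is written would be repeated on resume, so slow or non-idempotent steps such as charging a payment need their own deduplication.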

Example: Distributed Task Migration

Send a half-finished computation to another node:

;; Node A: long-running job, half done
(def job
  (cloneable-reset
    (fn []
      (def task1-result (run-task1))
      ;; Suspend here: return the rest of the job as a continuation
      (cloneable-shift [k] k)
      (def task2-result (run-task2 task1-result))
      (def task3-result (run-task3 task2-result))
      task3-result)))

;; Save state
(def bytes (serialize job))
(send-to-node-b bytes)

;; Node B: resume
(def job (deserialize (receive-bytes)))
(def result (resume job))

Example: Web Continuations (Racket-style)

Store "what to do when the form is submitted" as a serialized continuation and reference it by a URL token:

;; Initial page
(defn get-checkout [req]
  (cloneable-shift [k]
    ;; Save continuation to the database, return URL token
    (def token (save-continuation-to-db k))
    (render-page
      (form :action (str "/checkout-submit?token=" token)))))

;; Form submission handler
(defn post-checkout-submit [token req]
  ;; Load and resume continuation
  (def k (load-continuation-from-db token))
  (def response (resume k (parse-form-data req)))
  response)
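
The helpers `save-continuation-to-db` and `load-continuation-from-db` are not defined in this document. One possible shape for them, sketched in Python with an in-memory dict standing in for the database (the class name and token length are assumptions):

```python
import secrets

class ContinuationStore:
    """In-memory stand-in for the database used by the handlers above."""

    def __init__(self):
        self._rows = {}

    def save(self, serialized):
        """Store serialized continuation bytes and return an unguessable token."""
        token = secrets.token_urlsafe(16)  # URL-safe, so it can go in a query string
        self._rows[token] = serialized
        return token

    def load(self, token):
        """Return the stored bytes, or fail for unknown tokens."""
        if token not in self._rows:
            raise KeyError("unknown or expired continuation token")
        return self._rows[token]

store = ContinuationStore()
token = store.save(b"serialized-continuation-bytes")
url = "/checkout-submit?token=" + token
```

Because the token is the only thing the client holds, it should be random rather than sequential. Whether a token may be resumed more than once is a policy decision this sketch leaves open.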

Example: Checkpointing Long-Running Computation

Periodic snapshots for crash recovery:

(defn analyze-large-dataset [data]
  (defn checkpoint-every-n [n items]
    (let [processed []]
      (for-each-with-index items
        (fn [i item]
          (set! processed (conj processed (process item)))
          (when (= (mod (+ i 1) n) 0)
            ;; Checkpoint every n items
            (cloneable-shift [k]
              (write-file (str "checkpoint-" i ".bin")
                         (serialize k))
              (resume k)))))))

  (checkpoint-every-n 1000 data))

Reconstruction and Error Handling

Schema Versioning

Continuation frames carry a schema version. A mismatch between the checkpoint and the current code produces an error on deserialization:

(def k (deserialize bytes))  ; May fail if:
                             ; - Function no longer exists
                             ; - Argument types changed
                             ; - Captured types are incompatible

Error handling:

(try-with
  (fn []
    (deserialize (read-file "checkpoint.bin")))
  (fn [e k]
    (match e
      (schema-mismatch _ old-version) ->
        (throw (error (str "Cannot resume: checkpoint uses version " old-version
                           " but current code is version " (current-version)))))))
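
A version check of this kind can be sketched as an envelope around the payload. The Python below is illustrative only: the envelope layout, the `CURRENT_VERSION` constant, and the `SchemaMismatch` exception are assumptions, not the Turmeric wire format.

```python
import json

CURRENT_VERSION = 3  # hypothetical version of the running code

class SchemaMismatch(Exception):
    """Raised when a checkpoint was written by a different schema version."""

    def __init__(self, found, expected):
        super().__init__(
            f"Cannot resume: checkpoint uses version {found} "
            f"but current code is version {expected}"
        )
        self.found = found
        self.expected = expected

def serialize(payload, version=CURRENT_VERSION):
    """Wrap the payload in an envelope that records the schema version."""
    envelope = {"schema_version": version, "payload": payload}
    return json.dumps(envelope).encode("utf-8")

def deserialize(data):
    """Check the version before touching the payload."""
    envelope = json.loads(data.decode("utf-8"))
    found = envelope.get("schema_version")
    if found != CURRENT_VERSION:
        raise SchemaMismatch(found, CURRENT_VERSION)
    return envelope["payload"]
```

Checking the version before decoding the payload means a stale checkpoint fails with a clear message instead of a confusing type error deeper in reconstruction.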

Partial Reconstruction

If deserialization of a captured value fails, the whole continuation fails. To tolerate missing state:

;; Wrap potentially failing values in Option
(def opt-value
  (try
    (deserialize captured-value)
    (catch [e] (None))))

Performance Considerations

Serialization Overhead

Strategies

  1. Limit continuation depth — Design workflows to have shallow call stacks.
  2. Minimize captured state — Use identifiers (e.g., order ID) instead of entire objects.
  3. Lazy serialization — For large state, write once, reference by ID on resumption.
  4. Incremental checkpointing — Save deltas instead of full continuation.
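
Strategy 2 is easy to quantify. The Python sketch below compares the encoded size of a checkpoint that captures a whole order with one that captures only its ID. The order contents are invented for the example, and JSON stands in for the real encoding.

```python
import json

# Hypothetical order with many line items.
order = {
    "id": 1042,
    "items": [{"sku": f"SKU-{n}", "qty": 1} for n in range(500)],
}

# Capturing the whole object drags every line item into the checkpoint.
full_capture = json.dumps(
    {"step": "payment-charged", "order": order}
).encode("utf-8")

# Capturing only the identifier keeps the checkpoint small; the order
# is reloaded from its source of truth on resumption.
id_capture = json.dumps(
    {"step": "payment-charged", "order_id": order["id"]}
).encode("utf-8")

savings = len(full_capture) - len(id_capture)
```

The trade-off is that the resumed computation sees the order as it exists at resume time, not as it was at checkpoint time, which may or may not be what the workflow wants.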

API Summary

;; Serialize a continuation
(serialize cont : (cloneable-shift [k] k)) : bytes

;; Deserialize a continuation
(deserialize bytes : bytes) : (cloneable-shift [k] k)

;; Resume a continuation with a value
(resume k : (cloneable-shift [k] k) v : a) : a

;; Checkpoint macro (example)
(checkpoint name value)

;; Resource marshalling
(marshal resource : a) : resource-token
(unmarshal token : resource-token) : a

See Also