Declarative Cloudflare-Hosted Provider Plugins

Status: Proposal (CAS-3499)
Date: 2026-05-17
Owner: Backend (Artificer)
Supersedes: PLAN-unified-providers.md Phase 3, CAS-3494

1. Decision Summary

Casaconomy provider plugins become downloadable data specs hosted on Cloudflare and executed by one in-app interpreter on desktop and iOS.

Hard invariant:

  • Downloaded provider plugins are declarative data only.
  • No downloaded executable code.
  • No eval/runtime scripting.
  • No WASM/native plugin loading.

The existing config-driven pipeline (ProviderConfig transport + ProcessorConfig parse + generic processors) is promoted to the long-term architecture, not deprecated.

2. Goals And Non-Goals

Goals:

  • Add/update providers without app-store release.
  • Keep one provider model across desktop + iOS.
  • Preserve App Store policy safety boundary.
  • Make provider authoring compatible with recorder workflow.
  • Add strong trust + provenance controls because plugins parse financial data.

Non-goals:

  • Arbitrary scripting language for providers.
  • JS/WASM/native provider runtime.
  • Replacing current transport/parser model with hardcoded provider modules.
  • Dynamic execution outside the bounded interpreter.

3. Architecture

3.1 Runtime Components

  • Provider Registry Service (new): fetches plugin index + specs from Cloudflare, verifies signatures, validates schema, stores approved versions locally.
  • Provider Execution Engine (existing+hardening): interprets provider transport/fetch spec and processor mapping spec, then yields normalized transaction rows.
  • Provider Store (existing DB tables extended): stores installed plugin metadata, enabled/disabled state, pinning, provenance, and local override markers.
  • UI surfaces (new): plugin catalog, install/update/rollback, trust status, enable toggles.

3.2 Data Model

A plugin package is versioned JSON (optionally compact JSON plus a detached signature) with these fields:

  • plugin_id: stable logical id (seb.se style namespace)
  • version: semantic-ish provider version (2026.05.17.1 acceptable)
  • provider_config: declarative transport spec
  • processor_config: declarative parse/transform spec
  • capabilities: declared feature flags (pagination, oauth, csv, json, etc.)
  • constraints: engine version range + platform compatibility
  • redaction_hints: declarative map of sensitive fields so masking/redaction remains enforced for downloaded specs
  • integrity: hash of canonical payload
  • signature: detached signature over canonical payload
  • published_at, publisher, changelog

No field in schema may contain executable snippets. Any existing free-form script surfaces are removed/locked.
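
As a rough illustration, the package shape and a deny-by-default check on top-level keys might look like the sketch below. The field names come from the list above; the value types, the `platforms` member, the `ALLOWED_KEYS` list, and the `unknownTopLevelKeys` helper are assumptions made for this example, not the actual schema.

```typescript
// Hypothetical typing of the plugin package. Field names follow the list
// above; every value type here is an assumption.
interface PluginPackage {
  plugin_id: string;
  version: string;
  provider_config: object;
  processor_config: object;
  capabilities: string[];
  constraints: { engine_version: string; platforms?: string[] };
  redaction_hints: { [field: string]: string };
  integrity: string;
  signature: string;
  published_at: string;
  publisher: string;
  changelog: string;
}

const ALLOWED_KEYS: string[] = [
  "plugin_id", "version", "provider_config", "processor_config",
  "capabilities", "constraints", "redaction_hints", "integrity",
  "signature", "published_at", "publisher", "changelog",
];

// Deny-by-default: return every top-level key the schema does not declare.
function unknownTopLevelKeys(raw: { [key: string]: unknown }): string[] {
  return Object.keys(raw).filter((key) => ALLOWED_KEYS.indexOf(key) === -1);
}
```

A validator built this way would refuse activation whenever the returned list is non-empty, so a stray `script` field fails closed instead of being ignored.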

3.3 Storage And Resolution

Resolution order at runtime:

  1. Local pinned plugin version (if pinned)
  2. Latest locally installed verified version
  3. Bundled fallback provider templates (read-only)

Provider execution only runs fully validated + signature-verified specs.
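
The resolution order above can be sketched as a small pure function. `InstalledSpec`, `ResolvedSpec`, and `resolveSpec` are hypothetical names, and the fallback taken when a pinned version is missing or unverified (use the latest verified install) is an assumption this document does not settle.

```typescript
// Hypothetical record of one locally installed spec version.
interface InstalledSpec {
  version: string;
  verified: boolean; // passed schema validation and signature verification
}

interface ResolvedSpec {
  source: "pinned" | "installed" | "bundled";
  version: string;
}

// `installed` is assumed to be ordered oldest to newest.
function resolveSpec(
  pinnedVersion: string | null,
  installed: InstalledSpec[],
  bundledVersion: string | null
): ResolvedSpec | null {
  let latest: InstalledSpec | null = null;
  for (const spec of installed) {
    if (!spec.verified) continue; // unverified specs are never executed
    if (pinnedVersion !== null && spec.version === pinnedVersion) {
      return { source: "pinned", version: spec.version };
    }
    latest = spec;
  }
  if (latest !== null) return { source: "installed", version: latest.version };
  if (bundledVersion !== null) return { source: "bundled", version: bundledVersion };
  return null;
}
```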

3.4 Schema-Level Safety Constraints

Safety constraints that apply at plugin-schema validation time (not deferred to runtime hardening):

  • Regex pattern length limits for FetchValue::Prefetch::pattern and processor Regex.pattern.
  • Regex feature restrictions to avoid catastrophic backtracking patterns.
  • Maximum replacement length for processor Regex.replacement.
  • redaction_hints required when fields contain account numbers, personal identifiers, auth tokens, or provider session identifiers.
  • Reject plugin activation if any schema safety constraint fails.
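
A minimal sketch of how the regex constraints could be checked at validation time. The numeric limits, the list of rejected constructs, and the `checkRegexSafety` helper are assumptions; the nested-quantifier test is a simple heuristic, not a complete backtracking analysis.

```typescript
// Assumed limits; this document does not state concrete numbers.
const MAX_PATTERN_LENGTH = 256;
const MAX_REPLACEMENT_LENGTH = 128;

// Returns a list of violations; an empty list means the pattern passes.
function checkRegexSafety(pattern: string, replacement: string): string[] {
  const problems: string[] = [];
  if (pattern.length > MAX_PATTERN_LENGTH) problems.push("pattern too long");
  if (replacement.length > MAX_REPLACEMENT_LENGTH) problems.push("replacement too long");
  // Backreferences (\1 to \9) and lookaround groups are rejected outright.
  if (/\\[1-9]/.test(pattern)) problems.push("backreference");
  if (/\(\?<?[=!]/.test(pattern)) problems.push("lookaround");
  // Heuristic for nested quantifiers such as (a+)+ or (a*)*.
  if (/[+*]\)[+*{]/.test(pattern)) problems.push("nested quantifier");
  try {
    new RegExp(pattern);
  } catch (e) {
    problems.push("does not compile");
  }
  return problems;
}
```

Under these assumed rules, a simple single-capture pattern such as `"accountKey":"([^"]+)"` passes, while `(a+)+$` is flagged as a nested quantifier.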

3.5 Provider Expression Model (Existing Engine, No New Code)

The central design question (“how does pure data express bank-specific extraction without per-provider code?”) is already answered by the current engine:

  • preload.js is already a generic parameterized extractor that reads window.providerConfig and adapts behavior from config fields (readySelector, autoClickSelectors, dataArrayKey, parentFields, dateField, idField, etc.).
  • ProviderConfig is the transport DSL: typed fetch/lookup expressions (Prefetch { method, url, pattern }, Template, Static, Initial) with data-only parameters.
  • ProcessorConfig is the parse DSL: typed transforms and mapping/filter vocabulary (ParseDate, ToNumber, Regex, field mappings, JSON path, excel sheet/header selectors, Equals/Contains/IsEmpty conditions).

Design consequence:

  • A downloadable provider spec is a ProviderConfig + ProcessorConfig bundle interpreted by already-bundled engines.
  • Adding a new provider requires zero new preload.js code and zero new Rust execution-engine code.
  • Downloaded plugins remain data-only and stay inside the iOS-safe execution boundary.

Engine-hardening audit target:

  • Audit all current provider configs to confirm there is no free-form script/eval surface; reject future schema additions that introduce one.

4. Cloudflare Distribution Model

4.1 Hosting Layout

Cloudflare R2 + CDN (or Workers static assets):

  • plugins/index.json signed catalog
  • plugins/<plugin_id>/<version>/plugin.json
  • plugins/<plugin_id>/<version>/plugin.sig
  • optional metadata blobs (CHANGELOG.md, diagnostics)

index.json contains the current channel pointers (stable; possibly beta later), the minimum app engine version, and the list of revoked versions.

4.2 Update Protocol

  • App fetches signed index.json on schedule + manual refresh.
  • For each enabled plugin, app compares installed version vs catalog target.
  • Download candidate spec, verify hash + signature + schema.
  • Stage as pending until the update is approved, either automatically or by the user, depending on the update-policy setting.
  • Promote to active only after successful dry-run sanity checks.
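
The staging steps above amount to a small decision function. The sketch below is one possible reading: `CandidateChecks` and `evaluateCandidate` are hypothetical names, and treating a failed dry run as a rejection regardless of approval is an assumption about ordering that the protocol leaves open.

```typescript
type CandidateOutcome = "rejected" | "pending" | "active";

// Hypothetical results of the checks named in the update protocol.
interface CandidateChecks {
  hashOk: boolean;
  signatureOk: boolean;
  schemaOk: boolean;
  dryRunOk: boolean; // dry-run sanity checks passed
  approved: boolean; // user approval, or auto-approval under the update policy
}

function evaluateCandidate(c: CandidateChecks): CandidateOutcome {
  // Any integrity, signature, or schema failure discards the candidate.
  if (!c.hashOk || !c.signatureOk || !c.schemaOk) return "rejected";
  // A candidate that fails its dry run is never promoted.
  if (!c.dryRunOk) return "rejected";
  // Verified candidates wait as pending until approval arrives.
  return c.approved ? "active" : "pending";
}
```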

4.3 Offline Behavior

If network/catalog unavailable:

  • Keep using last verified installed versions.
  • Never disable an already working installed provider solely due to fetch outage.
  • Surface stale-catalog warning in UI.

5. Security And Trust Model

5.1 Threats

  • Tampered plugin payload in transit/storage
  • Malicious or compromised publisher key
  • Unsafe parser config causing data exfiltration or overreach
  • Regression update breaking parse correctness silently

5.2 Controls

  • Detached signatures (Ed25519 recommended) on catalog and every plugin payload.
  • Trust root key(s) pinned in app bundle; support key rotation manifest.
  • Strict JSON schema validation with deny-by-default unknown executable fields.
  • Domain allowlist constraints in ProviderConfig enforced by engine.
  • Bounded transform language only (existing typed transforms), no code interpolation.
  • Size/time guards: response size cap, parse row cap, mapping recursion cap.
  • Provenance ledger in DB: who published, hash, installed-at, activated-at.
  • Revocation list in signed catalog; revoked versions are blocked from activation.
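
To illustrate the domain allowlist control, here is a minimal sketch of a host check. `hostOf` and `isAllowedUrl` are hypothetical helpers, and the policy shown (https only, exact host match, no userinfo) is an assumption, stricter than the engine may need.

```typescript
// Extract the lower-cased host from an https URL, or null if the URL is
// not https or carries userinfo (user:password@host).
function hostOf(url: string): string | null {
  const m = /^https:\/\/([^\/?#]+)/i.exec(url);
  if (m === null) return null;
  const authority = m[1];
  if (authority.indexOf("@") !== -1) return null; // userinfo can disguise the real host
  return authority.replace(/:\d+$/, "").toLowerCase(); // drop an explicit port
}

// Exact-match allowlist check; subdomains are deliberately not implied.
function isAllowedUrl(url: string, allowlist: string[]): boolean {
  const host = hostOf(url);
  return host !== null && allowlist.indexOf(host) !== -1;
}
```

Rejecting userinfo outright avoids the case where `https://bank.example@evil.example/` looks like an allowed host to a naive prefix check.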

5.3 User Trust UX

Every plugin shows:

  • publisher identity
  • signature status
  • publish date
  • permissions/capabilities summary
  • whether it is official-household or imported-local

Imported local configs remain supported but are explicitly marked untrusted until approved.

5.4 Redaction And Telemetry Parity

Downloaded specs must not weaken current data-protection guarantees:

  • Extracted financial data follows existing redaction/masking guarantees (parity with current redact.ts behavior) before any bug-report, telemetry, diagnostics, or review-snapshot surface.
  • redaction_hints is part of plugin metadata and is validated before activation.
  • Engine diagnostics must default to masked output for sensitive fields.
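
A minimal sketch of hint-driven masking for diagnostics output. The hint format (field name mapped to a rule), the two rule names, and the `redactRow` helper are all assumptions; the sketch does not reproduce the actual redact.ts behavior.

```typescript
// Hypothetical masking rules a redaction_hints entry might select.
type RedactionRule = "mask_all" | "keep_last4";

function stars(count: number): string {
  let out = "";
  for (let i = 0; i < count; i++) out += "*";
  return out;
}

function maskValue(value: string, rule: RedactionRule): string {
  // Short values are masked entirely so "keep_last4" never reveals a whole value.
  if (rule === "keep_last4" && value.length > 4) {
    return stars(value.length - 4) + value.slice(-4);
  }
  return stars(value.length);
}

// Mask every field named in the hints; other fields pass through unchanged.
function redactRow(
  row: { [field: string]: string },
  hints: { [field: string]: RedactionRule }
): { [field: string]: string } {
  const out: { [field: string]: string } = {};
  for (const field of Object.keys(row)) {
    const hinted = Object.prototype.hasOwnProperty.call(hints, field);
    out[field] = hinted ? maskValue(row[field], hints[field]) : row[field];
  }
  return out;
}
```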

6. Generic Scraper Engine Design

Per regent direction: “think really hard how to do this — the ENGINE will get really complex.”

The engine splits into two distinct parts:

Scraper (this section — the hard part): Navigate to the right authenticated page state, execute the right fetch or in-page extraction, deliver raw bytes/JSON to Rust.

Mapper (§3.5 — the easy part): Take raw bytes/JSON from the scraper, apply ProcessorConfig field mappings and transforms, produce Vec<TransactionRow>.

A downloadable plugin is a data bundle configuring both halves. The scraper config lives in ProviderConfig; the mapper config lives in ProcessorConfig. The bank-specific knowledge is entirely in the data. Neither half downloads or executes new code.

6.1 Extraction Strategy Taxonomy

The fetch_url field is the strategy selector. The bundled engine dispatches on it:

| Strategy | fetch_url pattern | How data is obtained |
| --- | --- | --- |
| http-fetch | https://... real URL | HTTP GET or POST within the authenticated webview session. Session cookies are present; response bytes forwarded to Rust + ProcessorConfig. Current example: AMEX (/api/servicing/v1/financials/documents?file_format=excel). |
| nextjs-rsc | extract://nextjs-rsc | Extracts from the RSC flight payload (self.__next_f global). Handles Next.js 13+ App Router banks. Retries up to 20× at 500 ms intervals to cover async streaming/hydration. Current example: SEB/Spendwise portal. |
| nextjs-data | extract://nextjs-data | Extracts from the __NEXT_DATA__ JSON script tag. Handles Next.js Pages Router banks (the older, still-common variant). |
| script-json | extract://script-json | Scans all inline <script> tags for embedded JSON matching a declared dataArrayKey. Handles SPAs that inline transaction data in script blocks. |
| (future v1.1) xhr-capture | extract://xhr-capture | Intercepts the XHR the bank’s own page issues during normal load, forwards the matching response. Requires new bundled extractor entry. See §6.7 Gap 1. |
| (future v1.2) dom-table | extract://dom-table | Extracts rows from an HTML <table> matching a declared CSS selector and column index mapping. See §6.7 Gap 2. |

The strategy name is a string constant in the config. Adding a new strategy means adding one entry to the bundled extractor registry in preload.js — a new engine capability, not a per-provider script. Plugins declare their required engine version in constraints.engine_version; the app will not activate a plugin requiring a strategy the installed engine does not support.
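
The dispatch on fetch_url can be sketched as a lookup against the bundled registry. `EXTRACT_STRATEGIES` and `selectStrategy` are hypothetical names; only the v1.0 strategy names are taken from the table.

```typescript
type Strategy = "http-fetch" | "nextjs-rsc" | "nextjs-data" | "script-json";

// extract:// strategies assumed to be bundled in engine v1.0.
const EXTRACT_STRATEGIES: Strategy[] = ["nextjs-rsc", "nextjs-data", "script-json"];

// Map a fetch_url to a bundled strategy, or null when the engine has no
// matching extractor (for example a v1.1 strategy on a v1.0 engine).
function selectStrategy(fetchUrl: string): Strategy | null {
  const prefix = "extract://";
  if (fetchUrl.indexOf(prefix) === 0) {
    const name = fetchUrl.slice(prefix.length);
    for (const strategy of EXTRACT_STRATEGIES) {
      if (strategy === name) return strategy;
    }
    return null;
  }
  return /^https:\/\//.test(fetchUrl) ? "http-fetch" : null;
}
```

Because the registry is a fixed list inside the bundle, an unknown strategy name in a downloaded spec resolves to null and cannot introduce new behavior.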

6.2 Auth and Navigation Layer

Before the main fetch or extract, the scraper handles page navigation and auth completion. All auth/navigation config is declarative — no provider code.

| Config key | Location | Purpose |
| --- | --- | --- |
| readySelector | fetch_body static | CSS selector: wait until this element appears in DOM before triggering the fetch. Handles SPA hydration. Timeout: 15 s, then overlay hides and fetch proceeds anyway. |
| autoClickSelectors | fetch_body static | Comma-separated CSS selectors: click matching elements as they appear. Handles BankID initiation buttons, cookie banners, “continue” steps. Per-selector timeout: 1.5 s. |
| overlayText | fetch_body static | User-visible overlay text while the engine navigates (default: “Loading…”). |

For http-fetch providers, the webview navigates to login_url, then landing_page_url. The readySelector / autoClickSelectors can run on the landing page before the final fetch_url request fires.

6.3 Prefetch Step Model

Some banks require extracting a CSRF token, session ID, or account key from one endpoint before the main data fetch. FetchValue::Prefetch models this:

    "fetch_query": {
      "account_key": {
        "prefetch": { "method": "GET", "url": "https://bank/api/session", "pattern": "\"accountKey\":\"([^\"]+)\"" }
      }
    }

  • The webview issues the prefetch request within its authenticated session.
  • The pattern (a regex) extracts a named value from the response body.
  • The extracted value is stored in window.extractedValues[key].
  • In the main fetch, a body/query field left as an empty string is substituted with the matching extracted value.
  • Multiple prefetch steps execute in parallel before the main fetch fires.

Prefetch is data-only. The pattern is a regex string, not code. Length and construct constraints apply (§3.4).
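
A sketch of the two data-only steps described above: applying the pattern to a prefetch response, then filling empty-string fields in the main request. `extractPrefetchValue` and `substituteExtracted` are hypothetical helpers, and taking the first capture group as the extracted value is an assumption based on the example pattern.

```typescript
// Apply a prefetch pattern to a response body. The first capture group is
// treated as the extracted value (assumption).
function extractPrefetchValue(body: string, pattern: string): string | null {
  const m = new RegExp(pattern).exec(body);
  return m !== null && m[1] !== undefined ? m[1] : null;
}

// Replace fields left as "" with the extracted value stored under the same key.
function substituteExtracted(
  fields: { [key: string]: string },
  extracted: { [key: string]: string }
): { [key: string]: string } {
  const out: { [key: string]: string } = {};
  for (const key of Object.keys(fields)) {
    const fill =
      fields[key] === "" && Object.prototype.hasOwnProperty.call(extracted, key);
    out[key] = fill ? extracted[key] : fields[key];
  }
  return out;
}
```

For example, a session response containing `"accountKey":"AK-123"` yields `AK-123`, which then replaces an empty `account_key` query field while non-empty fields are left alone.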

6.4 Extractor Configuration Vocabulary

When using extract:// strategies, the fetch_body carries extractor configuration as static values alongside the navigation config:

| Key | Default | Description |
| --- | --- | --- |
| dataArrayKey | "transactions" | JSON key of the target data array within the extracted content. |
| parentFields | "" | Comma-separated field names to copy from a parent object (e.g., card group) into each extracted row. Example: "nameOnCard,maskedCardNumber". |
| wrapperFormat | "cardGroups" | Output wrapper structure: "cardGroups" (nested groups → transaction groups → transactions) or "flat" (direct array). |
| dateField | "" | Field containing dates to normalize to YYYY-MM-DD (ISO truncation). |
| idField | "" | Numeric ID field to convert to string (avoids JS integer precision loss for large IDs). |
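
To make the vocabulary concrete, the sketch below applies dataArrayKey, parentFields, dateField, and idField to one parent group and returns flat rows. `ExtractorConfig` and `flattenGroup` are hypothetical; the actual preload.js extractor and the "cardGroups" wrapper are not reproduced here.

```typescript
interface ExtractorConfig {
  dataArrayKey: string;
  parentFields: string; // comma-separated, may be ""
  dateField: string;    // "" disables date normalization
  idField: string;      // "" disables id stringification
}

type Json = { [key: string]: any };

function flattenGroup(group: Json, cfg: ExtractorConfig): Json[] {
  const parents =
    cfg.parentFields === "" ? [] : cfg.parentFields.split(",").map((p) => p.trim());
  const items: Json[] = Array.isArray(group[cfg.dataArrayKey]) ? group[cfg.dataArrayKey] : [];
  return items.map((item) => {
    const row: Json = {};
    for (const key of Object.keys(item)) row[key] = item[key];
    // Copy the declared parent fields into every row.
    for (const parent of parents) row[parent] = group[parent];
    if (cfg.dateField !== "" && typeof row[cfg.dateField] === "string") {
      row[cfg.dateField] = row[cfg.dateField].slice(0, 10); // ISO truncation to YYYY-MM-DD
    }
    if (cfg.idField !== "" && typeof row[cfg.idField] === "number") {
      row[cfg.idField] = String(row[cfg.idField]);
    }
    return row;
  });
}
```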

6.5 Mapper Layer (ProcessorConfig — the Easy Part)

Once the scraper delivers raw bytes/JSON to Rust, ProcessorConfig takes over. This is conceptually solved — the bounded typed vocabulary already exists:

  • Formats: Json, Excel, Csv
  • Transforms: ParseDate { format, timezone }, ToNumber, Regex { pattern, replacement }
  • Filters: Equals { field, value }, Contains { field, value }, IsEmpty { field }
  • Selectors: json_path, excel_sheet, excel_header_rows, field_mappings

Each FieldMapping declares target_field (destination in TransactionRow), source_field (column index for Excel, JSON path for JSON), default_value, and optional transform. No code; entirely data.
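
The mapping step can be sketched as follows. The mapper itself runs in Rust; this TypeScript sketch only illustrates the data flow. The `Transform` union, the `mapRow` helper, and the number format handled by ToNumber (whitespace as thousands separator, comma or dot as decimal mark) are assumptions.

```typescript
type Transform =
  | { kind: "ToNumber" }
  | { kind: "Regex"; pattern: string; replacement: string };

interface FieldMapping {
  target_field: string;
  source_field: string;
  default_value: string;
  transform?: Transform;
}

function applyTransform(value: string, t: Transform): string | number {
  if (t.kind === "ToNumber") {
    // Assumed input format: "1 234,50" or "1234.50".
    return Number(value.replace(/\s/g, "").replace(",", "."));
  }
  return value.replace(new RegExp(t.pattern, "g"), t.replacement);
}

// Build one output row from a source row using only declared mappings.
function mapRow(
  source: { [key: string]: string },
  mappings: FieldMapping[]
): { [key: string]: string | number } {
  const out: { [key: string]: string | number } = {};
  for (const m of mappings) {
    const present = Object.prototype.hasOwnProperty.call(source, m.source_field);
    const raw = present ? source[m.source_field] : m.default_value;
    out[m.target_field] = m.transform ? applyTransform(raw, m.transform) : raw;
  }
  return out;
}
```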

6.6 Failure Handling and Robustness

Failure handling is bundled in the engine, not per-provider:

  • Async hydration delays: nextjs-rsc retries up to 20 times at 500 ms each (10 s total) before failing. This covers Next.js RSC streaming. Constant in the bundle; not configurable per plugin in v1.
  • Page readiness timeout: readySelector times out at 15 s; overlay hides and fetch proceeds. Prevents indefinite hang on slow-loading banks.
  • Empty extraction: Engine emits typed diagnostics (“why row dropped”, mapping misses, transform failures — Phase B hardening). Surfaces error to UI rather than silently succeeding with zero rows.
  • HTTP errors: Non-2xx responses from http-fetch or prefetch steps produce typed errors forwarded to Rust and surfaced to the UI.
  • Auth redirect: If the bank’s landing page redirects to login mid-session, the webview shows the login page; the user completes auth. The readySelector waits for landing-page readiness after the redirect resolves. No special handling needed.
  • Plugin-declared timeouts: constraints.scraper_timeout_ms is reserved in the schema for a future engine version to support overrides. Not in v1.
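
The fixed retry budget for nextjs-rsc can be sketched as a polling loop. `pollForPayload` is a hypothetical helper; the wait function is injected so the sketch stays synchronous, whereas a real extractor would await a timer between attempts.

```typescript
// Constants as stated above: 20 attempts at 500 ms, 10 s in total.
const RSC_MAX_ATTEMPTS = 20;
const RSC_INTERVAL_MS = 500;

// Call `probe` until it returns a non-null payload or the budget is spent.
function pollForPayload<T>(
  probe: () => T | null,
  wait: (ms: number) => void
): { value: T | null; attempts: number } {
  for (let attempt = 1; attempt <= RSC_MAX_ATTEMPTS; attempt++) {
    const value = probe();
    if (value !== null) return { value: value, attempts: attempt };
    wait(RSC_INTERVAL_MS); // one interval after each failed attempt
  }
  // The budget is exhausted: the caller reports a typed failure, not zero rows.
  return { value: null, attempts: RSC_MAX_ATTEMPTS };
}
```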

6.7 Honest Gap Assessment

The engine covers a well-defined set of real-world bank patterns. The gaps below are explicit follow-up engine capability work — not downloadable code, but new entries in the bundled extractor registry. Each has a severity estimate and requires a dedicated engine CAS before Phase C rollout can target banks that need it.

Covered by engine v1.0:

  • REST/JSON API accessible via session-cookie auth (AMEX pattern)
  • Next.js App Router RSC payload (SEB/Spendwise pattern)
  • Next.js Pages Router __NEXT_DATA__
  • Generic inline script JSON
  • BankID / single-button auth flows via autoClickSelectors
  • CSRF / auth-token prefetch via FetchValue::Prefetch
  • Wait-for-hydration via readySelector
  • Excel, CSV, JSON file downloads from API endpoint

Gap 1 — XHR intercept/capture (severity: HIGH)

Some banks serve transaction data only through XHR requests the bank’s own page makes during normal load. The window.providerConfig recorder already captures these (it wraps fetch and XMLHttpRequest in recording mode). The gap is that there is no extract://xhr-capture replay strategy: the engine has no way to intercept a specific outgoing XHR at load time and forward its response, because the XHR is authenticated with runtime state that exists only during the page’s own JS execution.

Solution design: add extract://xhr-capture extractor that installs a fetch/XHR wrapper at document_start, matches outgoing requests against a declared URL pattern, and forwards the first matching response to Rust. Config: urlPattern (regex), method (GET/POST). This requires a new extractor function in preload.js — a bundled capability addition, not downloadable code.

Action: file engine capability CAS for xhr-capture as part of Phase B hardening.

Gap 2 — HTML DOM table scraping (severity: MEDIUM)

Legacy bank portals render transactions as HTML <table> elements, with no embedded JSON and no API endpoint. No extractor handles this today.

Solution design: add extract://dom-table extractor with declared config: tableSelector (CSS selector), headerRow (boolean), columnMappings (column index → named field). Returns JSON array of row objects for ProcessorConfig to transform.

Action: file engine capability CAS for dom-table.
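
A sketch of the proposed column mapping, applied to cell text that has already been read from the table. `mapTableRows` is a hypothetical helper and the DOM access itself is left out; only headerRow and columnMappings from the solution design are modeled.

```typescript
// `cells` stands in for the text content of each <td>, row by row.
function mapTableRows(
  cells: string[][],
  headerRow: boolean,
  columnMappings: { [columnIndex: string]: string }
): { [field: string]: string }[] {
  const body = headerRow ? cells.slice(1) : cells; // skip the header row if declared
  return body.map((row) => {
    const out: { [field: string]: string } = {};
    for (const index of Object.keys(columnMappings)) {
      const i = Number(index);
      // Columns outside the row are skipped rather than filled with undefined.
      if (i >= 0 && i < row.length) out[columnMappings[index]] = row[i];
    }
    return out;
  });
}
```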

Gap 3 — Multi-fetch pagination (severity: MEDIUM)

Some banks return page 1 of N, where N is known only from the first response (a response header or body field). The current engine issues one fetch per sync.

Solution design: add a pagination block to ProviderConfig: totalPagesField (JSON path into first response), pageUrlTemplate (URL with {page} placeholder), maxPages (safety cap). Engine runs the page loop internally, merges results before handing off to ProcessorConfig.

Action: file engine capability CAS for paginated fetch.
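
The proposed page loop might look like the sketch below. `PaginationConfig` mirrors the three fields named in the solution design; `fetchAllPages`, the injected `fetchPage` function, and the `rowsKey` parameter are assumptions, and totalPagesField is simplified to a top-level key instead of a full JSON path.

```typescript
interface PaginationConfig {
  totalPagesField: string; // simplified here to a top-level key
  pageUrlTemplate: string; // contains a {page} placeholder
  maxPages: number;        // safety cap
}

type PageResponse = { [key: string]: any };

// Fetch page 1, read the page count, then fetch the remaining pages up to
// the cap and merge all rows before handing off to the mapper.
function fetchAllPages(
  cfg: PaginationConfig,
  rowsKey: string,
  fetchPage: (url: string) => PageResponse
): any[] {
  const urlFor = (page: number): string =>
    cfg.pageUrlTemplate.replace("{page}", String(page));
  const first = fetchPage(urlFor(1));
  const declared = Number(first[cfg.totalPagesField]);
  const wanted = isFinite(declared) && declared >= 1 ? Math.floor(declared) : 1;
  const total = Math.min(wanted, cfg.maxPages);
  let rows: any[] = Array.isArray(first[rowsKey]) ? first[rowsKey] : [];
  for (let page = 2; page <= total; page++) {
    const next = fetchPage(urlFor(page));
    if (Array.isArray(next[rowsKey])) rows = rows.concat(next[rowsKey]);
  }
  return rows;
}
```

The cap is applied to the count taken from the first response, so a response declaring an absurd page count cannot drive an unbounded fetch loop.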

Gap 4 — Conditional multi-step navigation (severity: LOW-MEDIUM)

Some banks require a conditional navigation sequence (select account → pick date range → confirm → download). autoClickSelectors handles simple linear click sequences; conditional state machines are not expressible.

Assessment: most Swedish target banks fall into the readySelector + autoClickSelectors pattern or have accessible API endpoints. Defer until a specific bank pattern requires it.

Action: document as known limitation; flag during recorder-based provider authoring.

Gap 5 — OAuth 2.0 authorization code flow (severity: LOW)

Some banks require a full OAuth 2.0 authorization code flow, in which the app intercepts the redirect callback and exchanges the authorization code for tokens. Most current targets use session-cookie auth within the webview.

Assessment: deferred. A separate design is warranted before attempting implementation.

6.8 Engine Version Contract

The constraints.engine_version field in the plugin schema creates a capability contract between the plugin and the installed app:

| Engine version | Strategies added |
| --- | --- |
| v1.0 | http-fetch, nextjs-rsc, nextjs-data, script-json |
| v1.1 | xhr-capture |
| v1.2 | dom-table, paginated |

The Registry Service rejects activation of a plugin declaring engine_version: ">=1.1" on an app with engine v1.0. The app surfaces a “plugin requires app update” message in the catalog UX.

Engine versioning is separate from app version: the engine capability level is a named constant in the bundle, incremented only when new extractor strategies are added.
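
A minimal sketch of the capability check. `parseEngineVersion` and `engineSatisfies` are hypothetical helpers, and only the `>=major.minor` constraint form shown above is supported; anything else is refused, which is an assumption about how strict the Registry Service should be.

```typescript
function parseEngineVersion(version: string): number[] | null {
  const m = /^(\d+)\.(\d+)$/.exec(version);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// True when the installed engine meets a ">=major.minor" constraint.
function engineSatisfies(engine: string, constraint: string): boolean {
  const m = /^>=\s*(\d+\.\d+)$/.exec(constraint);
  const have = parseEngineVersion(engine);
  const need = m ? parseEngineVersion(m[1]) : null;
  // Unparseable versions or constraints fail closed: no activation.
  if (have === null || need === null) return false;
  if (have[0] !== need[0]) return have[0] > need[0];
  return have[1] >= need[1];
}
```

With this check, a plugin declaring `>=1.1` is refused on engine 1.0 and accepted on 1.1 or later, which is the point at which the catalog would show the “plugin requires app update” message.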

6.9 Hardening Checklist (Phase B)

Before rollout, harden the engine:

  • Remove any remaining script/eval path from provider execution flow.
  • Constrain URL templating to declared FetchValue placeholders only.
  • Validate all variable substitutions against declared schema; reject unknown keys.
  • Add deterministic per-row diagnostics: why a row was dropped, which field mapping missed, which transform failed.
  • Add canonical test vectors per plugin version (used in dry-run gate).
  • Apply regex safety constraints (§3.4) to all existing provider configs before migration.

7. Recorder Pipeline Reconciliation

PLAN-provider-recorder.md becomes the authoring pipeline for this model:

  1. Record provider traffic (desktop tooling).
  2. Generate draft ProviderConfig + ProcessorConfig.
  3. Iterate with dry-run parse/live harness.
  4. Publish signed plugin artifact to Cloudflare catalog.
  5. App downloads + verifies + offers activation.

Recorder output format should be extended with publish-ready metadata (plugin id, changelog, constraints), but still emits declarative config payloads.

8. Superseded And Preserved Parts

Superseded from PLAN-unified-providers.md:

  • Phase 3 premise that config-driven path should be removed/deprecated.
  • End-state of code-first compiled provider modules as the canonical model.

Preserved:

  • Separation of transport orchestration in services from parse execution.
  • Existing ProviderConfig/ProcessorConfig semantics as base contract.
  • Generic processor path as the single runtime engine.

9. Rollout Plan

Phase A: Spec + Trust Foundations

  • Define plugin schema v1 and catalog schema v1.
  • Encode regex safety constraints directly in schema v1 (length bounds + restricted constructs).
  • Add redaction_hints to schema v1 and enforce validation requirements.
  • Implement signature verification + trust root loading.
  • Add local DB tables/columns for plugin provenance and state.
  • Retire/rewrite PLAN-unified-providers.md to remove the superseded code-first Phase 3 direction.

Phase B: Execution Hardening

  • Remove any dynamic script surfaces.
  • Add strict validators and runtime guards.
  • Add deterministic diagnostics and test vectors.

Phase C: Cloudflare Distribution

  • Publish signed catalog + artifacts.
  • Implement fetch/stage/activate lifecycle.
  • Add revocation + key-rotation support.

Phase D: UX + Policy

  • Catalog UI and enable/disable/update flows.
  • Trust surfaces and warnings.
  • Update policy controls (manual/auto).

Phase E: Recorder Integration

  • Recorder outputs publish-ready plugin manifests.
  • Publish toolchain signs and pushes to Cloudflare.

Phase F: iOS Provider Scraping Port (sequenced follow-on — NOT concurrent with A–E)

Gate: desktop Phases A–E complete, desktop test harness green, and regent has verified a working desktop round-trip with a real provider.

Scope:

  • iOS consumer side only: bundled generic scraper (preload.js + ProviderConfig + ProcessorConfig) running under iOS WKWebView, provider catalog/enable UI on mobile, auth/login flow on iOS including BankID app-handoff.
  • Recorder and authoring tooling stay desktop-only; iOS consumes specs, does not record.
  • iOS no-downloaded-code invariant: already satisfied — the declarative model downloads data specs only; the bundled scraper interprets them. This holds identically on the iOS execution path; no change to the invariant.
  • iOS test architecture leg: see §11.8.

Phase F must NOT be broken into implementation CASes until the desktop gate above is satisfied.

10. Open Decisions For Regent Sign-Off

Recommendations are included (Saga + MoC). Regent confirms or overrides from phone.

| Decision | Recommendation | Rationale |
| --- | --- | --- |
| Signing authority model | Single household key for v1; delegated per-provider keys deferred | Conservative start; rotation manifest in bundle from day one so delegation can be added without bundle change |
| Auto-update default for stable channel | Manual approve; opt-in to auto | Consistent with risk-mitigation.md conservative ethos; users of financial data tooling should see what changed |
| Dry-run gate before activation | Mandatory strict | A plugin that fails dry-run never activates; no soft-warning bypass in v1 |
| Revocation of already-enabled versions | Grace window + loud warning; hard disable only if confirmed-malicious | Avoids breaking active users on a flag error; kill-switch is still fast (config push, hours not days) |
| Beta channel | Defer | No second distribution tier in v1; stable channel only |

11. Automated Test Architecture (CF-Hosted Fake Provider Harness)

Regent mandate (2026-05-18): testing is a first-class design concern because the system is by nature painful to test manually (real bank auth, BankID, no determinism). The test architecture must exercise the real trust+distribution chain — not a local shortcut — and run in CI without a Mac, a human, or a real bank.

11.1 Design Principles

  • Real distribution chain, fake data. Fake providers are distributed via the same signed-catalog → download → hash+signature verify → enable path as production plugins. No test bypasses the integrity or signature check.
  • Auth exercised, not skipped. Fake bank servers serve canned login pages with stub auth flows (the same autoClickSelectors/readySelector config the real scraper drives). BankID is replaced by a fake button; the navigation code path is identical.
  • Gap↔fake 1:1 mapping. One fake provider per scraper extraction strategy. A green test proves “the generic scraper handles this pattern via config alone.” A red test marks a gap not yet implemented.
  • RED→GREEN is the acceptance gate. Each Phase A–E implementation CAS is not done until its corresponding fake-provider end-to-end test is green in CI.
  • Mapper tests are independent. ProcessorConfig fixture tests (raw bytes → Vec<TransactionRow>) run in the standard Rust test harness — no webview, fast, pure unit.

11.2 Fake Provider Infrastructure

Fake CF distribution layer:

A second CF Workers namespace (plugins-test.casaconomy.workers.dev or equivalent) hosts:

  • index.json — signed fake catalog pointing at fake plugin versions
  • plugins/<fake_id>/<version>/plugin.json — real signed plugin specs (using a test signing key)
  • plugins/<fake_id>/<version>/plugin.sig — detached signatures

App in test mode reads PLUGIN_CATALOG_URL from env to point at the fake catalog. The trust chain (download, hash, signature, schema validation) runs identically to production. The test signing key is pinned in the test bundle separately from the production trust root.

Fake bank server (CF Worker per fake provider):

Each fake bank Worker serves:

  • GET /login — stub login page HTML with a fake auth button matching the provider’s autoClickSelectors
  • GET /landing — landing page HTML containing canned transaction data in the extraction format the strategy reads
  • GET /api/data (for http-fetch fakes) — canned Excel/JSON bytes response

The fake bank Worker is stateless and deterministic: same request always returns the same canned response.

11.3 Gap↔Fake Mapping Table

Each row is one fake provider. “Phase gate” is the CI gate that must turn GREEN for the corresponding implementation phase to close.

| Fake provider ID | Extraction strategy | Auth simulation | Fixture data | Mirrors | Phase gate |
| --- | --- | --- | --- | --- | --- |
| fake-amex.test | http-fetch | Login redirect + readySelector | Canned Excel bytes (3 rows) | AMEX | Phase B |
| fake-seb.test | extract://nextjs-rsc | Login redirect + autoClickSelectors (fake BankID btn) + readySelector | Landing page with self.__next_f RSC payload | SEB/Spendwise | Phase B |
| fake-nextjs-pages.test | extract://nextjs-data | Login redirect + readySelector | Landing page with __NEXT_DATA__ script tag | (new pattern) | Phase B |
| fake-script-json.test | extract://script-json | Login redirect + readySelector | Landing page with inline <script> JSON block | (new pattern) | Phase B |
| fake-xhr-capture.test | extract://xhr-capture | Login redirect + readySelector | Landing page that issues an XHR to /api/data on load | Gap 1 (§6.7) | Gap 1 capability CAS |
| fake-dom-table.test | extract://dom-table | Login redirect + readySelector | Landing page with <table> transaction rows | Gap 2 (§6.7) | Gap 2 capability CAS |
| fake-paginated.test | paginated http-fetch | Login redirect + readySelector | Canned 3-page paginated API (?page=1,2,3) | Gap 3 (§6.7) | Gap 3 capability CAS |

Gap fakes start RED at Phase B. They turn GREEN when the corresponding engine capability CAS ships. That transition is the proof that “the generic scraper covers this bank pattern via config alone, zero new code.”

11.4 Full End-to-End Test Sequence

For each fake provider in §11.3, the automated test executes:

  1. Catalog fetch: fetch and verify the signed index.json from the fake CF catalog Worker
  2. Plugin download: download plugin.json + plugin.sig for the target fake provider version
  3. Integrity + signature verify: run the production verification path (hash, Ed25519 sig against test trust root)
  4. Schema validate: run the production schema validator against the plugin payload
  5. Enable plugin: install and activate via the production Provider Registry Service
  6. Scraper run: open a headless WebView pointing at fake-<id>/login; let the scraper drive nav + auth + fetch/extract via the same bundled preload.js code path as a real bank
  7. Assert output: compare extracted Vec<TransactionRow> against the expected fixture (deterministic per fake)

Steps 1–5 test the trust/distribution chain. Steps 6–7 test the scraper engine.

11.5 CI Environment

Tests run on a Linux GitHub Actions runner:

  • Headless WebView via WebKitGTK + Xvfb (the same WebKit engine Tauri uses on Linux)
  • Fake CF Workers deployed to test namespace before the CI run (or mocked via a local HTTP server for speed; the CF deployment gate runs as a separate smoke step)
  • No Mac-specific dependencies; no real bank connectivity
  • PLUGIN_CATALOG_URL, TRUST_ROOT_KEY (test key), and FAKE_BANK_BASE_URL injected via CI env vars

11.6 Mapper Fixture Tests (Separate Track)

The mapper side (ProcessorConfig) has independent fixture-based unit tests that don’t require a WebView:

  • Input: raw bytes (synthetic Excel/JSON/CSV files matching the real schemas)
  • Run: ProcessorConfig field-mapping pipeline + transforms
  • Assert: Vec<TransactionRow> matches expected baseline
  • Parity test: synthetic AMEX Excel fixture → AMEX ProcessorConfig → assert expected rows

Run via cargo test, fast, no infrastructure dependency.

11.7 Relationship to Provider Recorder

| | Provider Recorder | Fake Harness |
| --- | --- | --- |
| Purpose | Author real provider specs from live traffic | Test the scraper/mapper against deterministic fixtures in CI |
| When used | During provider development (desktop, human-in-loop) | Every PR |
| Data source | Real bank traffic | Canned/synthetic fixture data |
| Output | ProviderConfig + ProcessorConfig spec | Test pass/fail + gap coverage signal |

Recorder output seeds fake providers: capture real traffic once, adapt for the stub server, commit to the fake-providers CF deployment. They share the schema; they don’t share infrastructure.

11.8 iOS Test Leg (Phase F)

The gap↔fake test set (§11.3) runs on both legs: desktop (§11.5) and iOS (this section). The iOS leg replaces the “manual device nightmare” — the regent’s core concern — with automated iOS Simulator runs using the same fake CF infrastructure.

CI environment for iOS leg:

  • macOS GitHub Actions runner (required — iOS builds and Simulator require Xcode/macOS)
  • iOS Simulator (not physical device) via xcodebuild test
  • Same fake CF Worker endpoints used by the desktop leg — no iOS-specific server changes
  • Same test signing key + fake catalog URL injected via CI env vars
  • Gate: iOS leg tests are added in Phase F and must be green before Phase F implementation CASes can close

What the iOS leg exercises vs. desktop leg:

The end-to-end sequence (§11.4 steps 1–7) runs identically. What differs:

  • Step 6 uses iOS WKWebView instead of WebKitGTK; all extractor strategies must behave identically
  • Fake BankID auth uses the same stub button + autoClickSelectors approach for the in-webview flow. Deep-link handoff (BankID as a separate app) is NOT tested in CI — that remains manual for now; the CI leg proves the engine code path works

iOS-specific design risks for Phase F (explicit, not glossed):

  1. BankID iOS deep-link handoff (risk: HIGH): On iOS, real BankID auth redirects the user to the BankID.app and back to the calling app via a custom URL scheme. This is fundamentally different from the desktop flow (where BankID runs in the webview). The declarative autoClickSelectors config covers the in-webview tap, but the OS-level app redirect requires a registered URL scheme handler in the Tauri iOS app and a resume flow. Phase F must design this explicitly before implementation.

  2. WKWebView Tauri IPC on iOS (risk: MEDIUM): The window.__TAURI_INTERNALS__.invoke bridge is implemented differently on iOS (uses WKScriptMessageHandler rather than the desktop IPC mechanism). handle_base64_data and related commands must be verified under the iOS bridge. Known to work in current Tauri iOS builds for other features — but must be confirmed for the scraper data path.

  3. xhr-capture strategy on iOS WKWebView (risk: MEDIUM): The XHR/fetch wrapping in the xhr-capture extractor uses XMLHttpRequest.prototype.open patching. This is standard JS and should work in WKWebView on iOS, but App Store review has historically scrutinised JavaScript injection more tightly on iOS. Must verify the approach passes review and that WKWebView on iOS does not restrict the prototype patching.

  4. Cookie persistence and session scope (risk: LOW): WKWebView on iOS has its own separate cookie store (not shared with Safari). Session cookies from the bank login are scoped to the WKWebView instance. This matches desktop behavior — no architectural difference — but must be confirmed during Phase F bringup.

  5. WebKit API surface differences (risk: LOW): Minor API differences between WebKitGTK (desktop CI) and WKWebView (iOS) could affect edge cases in the RSC/__NEXT_DATA__/script-json extractors. The fake-provider CI run on iOS is the only reliable way to catch these without real devices.

12. Acceptance Criteria For This Design CAS

  • Proposal explicitly states declarative-only invariant and iOS policy boundary.
  • Proposal supersedes unified-providers Phase 3 deprecation premise.
  • Proposal reconciles recorder plan as authoring path.
  • Proposal defines trust, signature, provenance, and revocation model.
  • Proposal defines a staged rollout sequence suitable for implementation breakdown.
  • Proposal includes a concrete Generic Scraper Engine chapter (§6) with extraction-strategy taxonomy, auth/navigation model, prefetch step model, honest gap assessment, and engine version contract.
  • Proposal includes a first-class Automated Test Architecture chapter (§11) with gap↔fake 1:1 mapping table, full end-to-end test sequence, CI acceptance-gate definition, mapper fixture test track, and iOS test leg with explicit design risks.
  • §9 rollout plan includes sequenced Phase F (iOS port), gated on desktop A–E + harness green, with explicit iOS-specific scope constraints and design risks.
  • §10 open decisions confirmed by regent (2026-05-18: “agree with all”).

What changed

2026-05-18 — CAS-3499 Declarative Provider Plugins (shipped)

The declarative plugin system shipped across Phases A–E. Provider plugins are now downloadable from Cloudflare R2, signature-verified (Ed25519), schema-validated, and activated inside the app without any executable code leaving the trust boundary. The full catalog UI is live in Settings. This architecture doc was updated from proposal to delivered status.

See: CHANGELOG → 2026-05-18