47 changes: 47 additions & 0 deletions .claude/resources/build-and-images.md
@@ -0,0 +1,47 @@
# Build system and images

## rules.mk

Every env `Makefile` includes the root `rules.mk` (builders include it via `../../rules.mk`).
Defaults to know:

- `PLATFORMS ?= linux/amd64,linux/arm64` — multi-arch by default.
- `REPO ?= ghcr.io/fission`, `TAG ?= dev`.
- **`DOCKER_FLAGS ?= --push`** — a bare `make` attempts to push to ghcr.io.
- The generic rule is `%-img:` → `docker buildx build $($@-buildargs) --platform=$(PLATFORMS) -t $(REPO)/<name>:$(TAG) $(DOCKER_FLAGS) -f $< .`
- Per-target build args are declared as `<image>-img-buildargs := --build-arg KEY=value`.
- Target-specific platform overrides are supported, e.g. `tensorflow-serving-env-img: PLATFORMS=linux/amd64` (the upstream `tensorflow/serving` image is amd64-only).
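To make the composition concrete, the hypothetical function below only *prints* the command the generic `%-img` rule would run, applying the same defaults listed above. `${VAR-default}` keeps any value already set (even an empty one), which mirrors make's `?=`. The name `img_cmd` and its argument order are illustrative and not part of `rules.mk`; for the real expansion, `make -n <image>-img` prints the recipe without running it.

```sh
#!/bin/sh
# Hypothetical helper (not part of rules.mk): print the buildx command that the
# generic `%-img` rule would run for one image. Nothing is built or pushed.
img_cmd() {
  name=$1        # image name without the -img suffix, e.g. python-env
  buildargs=$2   # value of <name>-img-buildargs; may be empty
  dockerfile=$3  # the rule's first prerequisite ($<)
  # ${VAR-default} keeps any value already set, mirroring make's `?=`.
  platforms=${PLATFORMS-linux/amd64,linux/arm64}
  repo=${REPO-ghcr.io/fission}
  tag=${TAG-dev}
  flags=${DOCKER_FLAGS-"--push"}   # note: the default pushes to the registry
  echo "docker buildx build ${buildargs:+$buildargs }--platform=$platforms" \
    "-t $repo/$name:$tag $flags -f $dockerfile ."
}
```

With nothing overridden the output ends in `--push` and targets both architectures, which is exactly why a bare `make` is unsafe locally.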

## Local builds

```sh
(cd <env>/ && make <image>-img DOCKER_FLAGS=--load PLATFORMS=linux/arm64)
(cd <env>/builder/ && make <builder>-img DOCKER_FLAGS=--load PLATFORMS=linux/arm64)
```

Use a single platform with `--load`; buildx cannot `--load` a multi-arch manifest.
On Apple Silicon use `linux/arm64` for speed; CI builds run on amd64.
All base images used are multi-arch except `tensorflow/serving`.

## Where build args live (three places, keep in sync)

The same base-image pin is duplicated in:

1. `<env>/Makefile` (`<image>-img-buildargs`)
2. `<env>/builder/Makefile` (`<builder>-img-buildargs`) — easy to miss; it has bitten before
3. `skaffold.yaml` (the env's build profile `buildArgs`, used by CI)

Dockerfile `ARG` defaults are a fourth copy in some envs (python).
When bumping a base image, grep for the old value across all of these plus READMEs and `envconfig.json`'s `runtimeVersion`.
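A minimal sketch of that search, assuming the pin appears verbatim (e.g. `python:3.12-alpine`) and READMEs are `*.md` files; `find_pin` is a hypothetical name. `-F` treats the value as a fixed string, so the dots and colon are not regex syntax.

```sh
#!/bin/sh
# Hypothetical helper: list every place a base-image pin appears in the files
# that carry copies of it (Makefiles, skaffold.yaml, Dockerfile ARG defaults,
# READMEs, envconfig.json).
find_pin() {
  old=$1           # the exact old value, e.g. python:3.12-alpine
  root=${2:-.}     # repository root
  grep -rnF \
    --include=Makefile --include=Dockerfile --include=skaffold.yaml \
    --include=envconfig.json --include='*.md' \
    -- "$old" "$root"
}
```

`runtimeVersion` in `envconfig.json` may hold only the bare runtime version (e.g. `3.12`), so a second search for that shorter value is still worthwhile.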

## Build contexts

- The env image context is `<env>/`; the builder image context is `<env>/builder/`.
- A file needed by both images (e.g. `jvm/install-fission-java-core.sh`) must be physically duplicated into `builder/` — symlinks break because Docker contexts don't follow them out of tree.
Mark such copies with a keep-in-sync header comment.

## skaffold.yaml

- Per-env build profiles plus a helm deploy of the fission chart (`remoteChart` URL pins the fission version; chart release tags have no `v` prefix, e.g. `fission-all-1.25.0`).
- CI uses `SKAFFOLD_PROFILE=<env> make skaffold-run`; skaffold with the kind profile loads built images into the kind cluster.
- Some skaffold image names differ from release image names (e.g. profile builds `jvm-jersey-env` while releases publish `jvm-jersey-env-25`).
42 changes: 42 additions & 0 deletions .claude/resources/ci.md
@@ -0,0 +1,42 @@
# CI

## Workflow structure

`.github/workflows/environment.yaml` runs on PRs to master.
A `check` job runs `dorny/paths-filter` (filters in `.github/workflows/filters/filters.yaml`); each env job gates on its filter key, so only changed environments build and test.

Two kinds of env jobs:

- **Full e2e** (binary, go, jvm, nodejs, python, python-fastapi, dotnet8): setup-cluster → `SKAFFOLD_PROFILE=<env> make skaffold-run` → `make <env>-test-images` (kind-load) → `make router-port-forward` → `./test_utils/run_test.sh <env>/tests/test_*_env.sh` → fission dump on failure.
- **Build-only** (perl, php7, ruby, tensorflow, jvm-jersey): setup-cluster + `make skaffold-run` only; no functional test.
Compensate with local container smoke tests (specialize + invoke) before pushing changes to these envs.

Composite actions: `.github/actions/setup-cluster` (helm + `helm/kind-action` with `cluster_name: kind` + fission CLI + skaffold install + crds; version pins live in its input defaults) and `.github/actions/collect-fission-dump` (best-effort by design — must never mask the original failure).

## Gotchas (each of these caused a real failure)

1. **E2e tests must pin the local image.**
Test scripts must `export <ENV>_RUNTIME_IMAGE=<image>` to the kind-loaded name (e.g. `jvm-env`, `go-env`).
The fallback defaults in `test_utils/utils.sh` point at years-stale Docker Hub images (`fission/jvm-env` etc.) and silently test the wrong image.
2. **Workflow-only PRs exercise nothing.**
A PR touching only `.github/` triggers no env jobs, so composite-action changes go unvalidated and can break master for every subsequent run.
Include a small genuine change under one env dir (e.g. a `perl/` README fix) to force one job through the changed path before merging.
3. **Exact-match the filter gates.**
`packages` is a JSON array string; use quoted matches like `contains(needs.check.outputs.packages, '"jvm"')`.
Bare substrings cross-trigger: `jvm` matches `jvm-jersey`, `python` matches `python-fastapi`.
Also: it's `needs.check.outputs.packages` — `needs.check.outputs` alone never matches (historical bug that kept the python job from ever running).
4. **Don't reintroduce action pins that ship without compiled dist.**
`engineerd/setup-kind@v0.6.2` failed with `File not found: dist/main/index.js`; `helm/kind-action` is the maintained replacement.
`hiberbee/github-action-skaffold` pins skaffold 2.3.1 which cannot parse `skaffold/v4beta13` — use `make skaffold-run` instead.
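Gotcha 3 can be reproduced with a plain substring test. The sketch approximates Actions' `contains()` on a string with a shell `case` pattern; the real function is case-insensitive and this one is not, which doesn't matter for the lowercase env names here. The `packages` value is a made-up example.

```sh
#!/bin/sh
# Approximates GitHub Actions' contains(<string>, <substring>) with a shell
# case pattern; "$2" is quoted so its characters match literally.
contains() {
  case $1 in
    *"$2"*) echo true ;;
    *) echo false ;;
  esac
}

# Example `packages` output when only jvm-jersey and python-fastapi changed.
packages='["jvm-jersey","python-fastapi"]'
contains "$packages" 'jvm'           # true: bare substring wrongly triggers the jvm job
contains "$packages" '"jvm"'         # false: quoted element, jvm job stays skipped
contains "$packages" '"jvm-jersey"'  # true
```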

## Test harness

- `test_utils/run_test.sh [files...]` runs tests via GNU parallel and aggregates logs; a file containing the line `#test:disabled` is skipped.
- macOS prerequisites: `brew install coreutils findutils gnu-sed parallel` (see `test_utils/init_tools.sh`).
- Some envs have cluster-free `local_test.sh` (binary, nodejs, python, python-fastapi); run these first, since they catch dependency breakage in seconds.
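A sketch of the skip rule, assuming the marker must occupy a whole line (`grep -x`); how `run_test.sh` actually matches it, and how it hands the remaining files to GNU parallel, may differ. `enabled_tests` is a hypothetical name.

```sh
#!/bin/sh
# Print only the test files that do not contain a line `#test:disabled`.
enabled_tests() {
  for f in "$@"; do
    # -q: no output, -x: whole-line match, -F: fixed string
    if ! grep -qxF '#test:disabled' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `enabled_tests python/tests/test_*_env.sh` lists what would actually run.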

## Debugging CI

- `gh run view <id> --log-failed` for failing steps; e2e test output is embedded in the `run_test.sh` log dump.
- Function-level failures need the fission dump artifact (`<env>-fission-dump`); a `test_fn` curl loop timing out (exit 124) usually means the function pod never became ready — check the env image actually used (gotcha 1).
- Local e2e reproduction works with kind + skaffold + fission CLI installed (`make verify-kind-cluster create-crds`, then the same steps as CI).
67 changes: 67 additions & 0 deletions .claude/resources/environment-notes.md
@@ -0,0 +1,67 @@
# Per-environment notes

State as of the June 2026 dependency-update series (PRs #436–#446, #450–#451).

## jvm (Spring Boot)

- Java 25 LTS (eclipse-temurin alpine), Spring Boot 3.5.x, Maven 3.9.x.
- `io.fission:fission-java-core` was only ever published as `0.0.2-SNAPSHOT` to oss.sonatype.org (OSSRH), decommissioned in 2025 — it resolves from **no** remote repository.
`install-fission-java-core.sh` builds it from a pinned commit of [fission/fission-java-libs](https://github.com/fission/fission-java-libs) and installs it locally as `0.0.1`, `0.0.2`, and `0.0.2-SNAPSHOT` (the SNAPSHOT keeps pre-existing user functions building).
The script exists twice (env context and `builder/` context) — keep both copies in sync.
The library's 2018-era pom lacks XML namespace declarations; the script patches the root element before running maven plugins.
- The CI test builds the example jar in a clean maven container, so the test script must run the install script there too.
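The namespace patch can be a single `sed` substitution. This is a sketch, not the script's actual code: it assumes the root element is literally a bare `<project>` and uses the standard Maven 4.0.0 POM namespace and schema location. Because the pattern only matches the bare tag, re-running it is a no-op.

```sh
#!/bin/sh
# Add the standard Maven POM namespace declarations to a bare <project> root.
# Writes to a temp file and renames, avoiding non-portable `sed -i`.
patch_pom() {
  pom=$1
  sed 's|<project>|<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">|' \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}
```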

## jvm-jersey

- Jersey 2.x (javax namespace) on Jetty 9.4.x, Java 25; depends on `io.fission:fission-jvm-jersey:0.0.1` which IS on Maven Central (unlike fission-java-core).
- Image names carry the Java version suffix (`jvm-jersey-env-25`, `jvm-jersey-builder-25`); renaming requires touching Makefile target names, envconfig, and the fission.io catalog.

## python / python-fastapi

- `python:3.13-alpine`; python runs Flask 3.x on bjoern or gevent, python-fastapi runs FastAPI on uvicorn.
- bjoern needs libev headers (alpine: in image; macOS local: `brew install libev` with `CFLAGS=-I/opt/homebrew/include LDFLAGS=-L/opt/homebrew/lib`).
- `flask_sockets.py` is vendored (upstream dead); Werkzeug ≥2.3 moved `parse_cookie` to `werkzeug.sansio.http`.
- Local tests: `USERFUNCVOL=/tmp RUNTIME_PORT=<port> ./tests/local_test.sh`, then repeat with `WSGI_FRAMEWORK=GEVENT` — the gevent path exercises the fragile websocket stack.

## nodejs

- Three image flavours from one Dockerfile via `NODE_BASE_IMG`: `node-env` + `node-env-22` (alpine) and `node-env-debian`.
- ESM-first server with CJS support; `test/local_test.sh` covers both loaders.
- The Dockerfile copies only `package.json` (not the lockfile), so images resolve dependency floors at build time — lockfile refreshes need a version bump + rebuild to reach the published image.

## go

- Versioned image pair (`go-env-1.xx`, `go-builder-1.xx`) plus unversioned aliases; bump = rename targets/images in `go/Makefile`, `go/builder/Makefile`, `envconfig.json`, `skaffold.yaml`, the example spec, and the fission.io catalog.
- Plugin model: function `.so` must be built with the exact toolchain of the env server — env and builder share `GO_VERSION`.
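A quick drift check, assuming the pin appears literally as `GO_VERSION=<version>` in each file passed (for example `go/Makefile` and `go/builder/Makefile`; adjust the list to wherever the build arg is actually set). `check_go_version` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: succeed (printing the version) only if every
# GO_VERSION=<x> assignment in the given files agrees.
check_go_version() {
  versions=$(grep -ho 'GO_VERSION=[0-9][0-9.]*' "$@" | sort -u)
  if [ -z "$versions" ]; then
    echo "no GO_VERSION pin found" >&2
    return 1
  fi
  if [ "$(printf '%s\n' "$versions" | wc -l | tr -d ' ')" != 1 ]; then
    echo "GO_VERSION drift between files:" >&2
    printf '%s\n' "$versions" >&2
    return 1
  fi
  echo "${versions#GO_VERSION=}"
}
```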

## binary

- Alpine + a small Go server executing arbitrary binaries; `go mod init`/`tidy` at image build (stdlib only).

## ruby

- `ruby:3.4-alpine`; Rack pinned `~> 2.2` (Rack 3 removed `Rack::Handler`, which `server.rb` uses via thin).
- Regenerate `Gemfile.lock` inside the target container and `bundle lock --add-platform` for both gnu and musl, amd64 and arm64.
- Builder uses bundler deployment config (`bundle config set --local deployment true`), not the deprecated `--deployment` flag.

## php7 (directory name kept; runs PHP 8.3)

- `php:8.3-alpine`, react/http 1.x, Monolog 3, php-parser 5.
- Only compile extensions NOT bundled with the official image; rebuilding bundled exts (e.g. iconv) fails on musl.
`json` is core; `xmlrpc`/`mcrypt` were removed from PHP 8 and their PECL ports are unmaintained.
- The directory and image name stay `php7`/`php-env` — path filters and release derivation depend on them.

## perl

- Pinned `perl:5.42`; Dancer2 + Twiggy; v1 specialize only.

## tensorflow-serving

- Pinned `tensorflow/serving` tag; upstream publishes **amd64 only** — the Makefile target overrides `PLATFORMS=linux/amd64`.
- Go proxy built with modules initialized at build time (`pkg/errors`, `zap`).

## dotnet / dotnet20 (frozen legacy)

- .NET Core 1.1 / 2.0, both EOL years ago, on the removed `microsoft/dotnet` Docker Hub images.
- Intentionally untouched; their release matrix legs fail if a reconcile run picks them up — expected.
- `dotnet8/` is the supported .NET path.
42 changes: 42 additions & 0 deletions .claude/resources/release-process.md
@@ -0,0 +1,42 @@
# Release process

## Version-bump-driven releases

1. Bump `version` in `<env>/envconfig.json` (this is the image tag to publish).
2. Run `make update-env-json` — sorts every `envconfig.json` (jq) and regenerates the root `environments.json`.
Never hand-edit `environments.json`; commit the regenerated file with the bump.
3. On merge to master, `.github/workflows/release.yaml` (path filter `**/envconfig.json**`) runs `hack/release_check.py`, which emits a matrix of every `image:version` not yet on ghcr.io; the workflow's `docker-buildx-push` job then runs `TAG=<version> make <image>-img` (and `<builder>-img` in `builder/`) plus a `latest` push for each matrix entry.

Image content changes without an envconfig bump never release — if a merged change should reach the published image (e.g. a lockfile refresh), follow up with a version-bump PR.
Conversely, examples/ and docs changes don't need a bump (they're not in the image).

## release_check.py semantics

- Checks GHCR via the v2 API with an anonymous bearer token.
- Token endpoint 401/403/404 ⇒ the package doesn't exist yet ⇒ release needed (GHCR refuses tokens for unknown packages — this is the first-release path for renamed/new images).
- tags/list 200 ⇒ skip if tag present; 404 ⇒ release; anything else raises (fail-closed so registry hiccups can't trigger mass re-pushes of `latest`).
- Outputs go to `$GITHUB_OUTPUT`; `release_needed` is lowercase `true`/`false` and release.yaml gates on `== 'true'` — keep these in sync.
- **Reconcile mode**: invoked with no package list (e.g. `gh workflow run release.yaml`), it scans every `*/envconfig.json` and releases anything unpublished.
Use this to backfill after a failed release run.
Expect the legacy `dotnet`/`dotnet20` matrix legs to fail (EOL bases); `fail-fast: false` keeps other legs going.
- Testable locally: `GITHUB_OUTPUT= python3 hack/release_check.py '[python,go]'` (needs `requests`).
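The status-code rules above form a small decision table. The sketch encodes it for reference only: it is not `release_check.py`, it takes already-observed status codes as arguments instead of calling GHCR, and treating any other token-endpoint status as an error is an assumption consistent with the fail-closed rule. It prints the same lowercase `true`/`false` that `release_needed` uses.

```sh
#!/bin/sh
# decide TOKEN_STATUS TAGS_STATUS TAG_PRESENT(yes|no)
# Prints true/false (release needed?) or fails on unexpected statuses.
decide() {
  case $1 in
    401|403|404) echo true; return 0 ;;  # unknown package: first release
    200) ;;                              # token issued; go on to tags/list
    *) echo "unexpected token status $1" >&2; return 1 ;;  # assumption: fail closed
  esac
  case $2 in
    200) if [ "$3" = yes ]; then echo false; else echo true; fi ;;
    404) echo true ;;
    *) echo "unexpected tags/list status $2" >&2; return 1 ;;  # fail closed
  esac
}
```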

## Multi-PR trains

Every env PR rewrites the generated `environments.json`, so PRs in a series conflict with each other on that file.
Merge serially; for each next PR: `git merge origin/master`, run `make update-env-json` to resolve the conflict canonically, `git add environments.json`, commit, push, wait for green, merge.
Take master's side for workflow-file conflicts when master's change is a superset of the branch's.

## Version pin locations for the Fission version

The Fission version string must be bumped together in four places (note the differing key names — grepping for `FISSION_VERSION` alone misses two):
`FISSION_VERSION` in `rules.mk` and in `environment.yaml`'s `env`, the `fission-cli-version` input default in `setup-cluster/action.yml`, and the hardcoded skaffold `remoteChart` URL (chart tags have no `v` prefix: `fission-all-1.25.0`).
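Grepping for the version *value* instead of the key names finds all four pins at once. A sketch, assuming the files sit at the paths named above; the helper name is hypothetical. Searching for the bare number (e.g. `1.25.0`) matches whether or not a given pin carries a `v` prefix.

```sh
#!/bin/sh
# Hypothetical helper: show every line that pins the given Fission version.
find_fission_pins() {
  ver=$1          # e.g. 1.25.0, without a leading v
  root=${2:-.}    # repository root
  # -s silences errors for missing files; -F matches the dots literally.
  grep -nsF -- "$ver" \
    "$root/rules.mk" \
    "$root/.github/workflows/environment.yaml" \
    "$root/.github/actions/setup-cluster/action.yml" \
    "$root/skaffold.yaml"
}
```

Each of the four files should show at least one hit before and none after the bump.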

## Downstream: fission.io website

The site mirrors this repo's catalog.
After image renames, new environments, or removals, sync the site repo (it has an `updating-environments-and-examples` skill):

- `static/data/environments.json` stores image/builder *names* only (not versions) — only name changes matter there.
- `tools/environments.py` regenerates it from this repo's manifest and is keyed by image name (display names are not unique — both jvm and jvm-jersey report "JVM Environment").
- Docs pages may embed versioned image names in examples (grep for `go-env-1.`, old runtime versions); leave historical release-notes pages untouched.
41 changes: 41 additions & 0 deletions .claude/resources/runtime-architecture.md
@@ -0,0 +1,41 @@
# Runtime architecture

## The environment contract

Every environment is an HTTP server listening on **port 8888** that fission's fetcher/executor drives:

- `POST /specialize` (v1): load user code from the fixed path `/userfunc/user`.
- `POST /v2/specialize`: JSON body `{"filepath": "...", "functionName": "..."}`; `filepath` may be a single file or a directory (built package).
- All subsequent requests on `/` (any method) are routed to the loaded user function.
- Most servers also expose `GET /healthz`.

A container specializes exactly once; the pool manager replaces pods rather than re-specializing them.
Unspecialized containers return an error on `/` ("Container not specialized" or similar) — that response is the expected pre-specialization behaviour, not a bug.
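The contract is enough to smoke-test an image locally (specialize, then invoke), which matters most for the build-only envs. A hypothetical sketch: mounting a host directory at `/userfunc` stands in for fission's fetcher, the image tag and file/function names are placeholders, the host path must be absolute, and the fixed `sleep` is a crude readiness wait. Envs that only support v1 (perl) need `POST /specialize` with the code at `/userfunc/user` instead.

```sh
#!/bin/sh
# Build the JSON body for POST /v2/specialize (no escaping: keep paths simple).
v2_body() {
  printf '{"filepath": "%s", "functionName": "%s"}' "$1" "$2"
}

# Hypothetical local smoke test: start the image, specialize it once, invoke it.
smoke_test() {
  image=$1; codedir=$2; filepath=$3; func=$4
  cid=$(docker run -d --rm -p 8888:8888 -v "$codedir:/userfunc" "$image") || return 1
  sleep 2  # crude wait for the server to listen
  curl -s http://localhost:8888/; echo   # expected: a "not specialized" error
  curl -sf -X POST -H 'Content-Type: application/json' \
    -d "$(v2_body "$filepath" "$func")" http://localhost:8888/v2/specialize &&
    curl -sf http://localhost:8888/
  rc=$?
  docker stop "$cid" >/dev/null
  return "$rc"
}
```

For ruby, for instance, `functionName` is the handler method name (e.g. `handler`), not the file name.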

## functionName semantics differ per language

- **jvm**: fully-qualified class name implementing `io.fission.Function` (e.g. `io.fission.HelloWorld`).
- **ruby**: the *method* name defined by the loaded file(s), e.g. `handler` — NOT the filename.
  Passing a filename makes `method(func)` raise, so specialize returns 500.
- **php**: `module::function` (e.g. `hellopsr.php::handler`).
Without the `::` divider the env enters legacy echo mode: the file is `require`d and its buffered output is returned as the response body.
- **python**: `module.function` style handled by the server's module loader.
- **go**: entrypoint symbol in a Go plugin (`.so`) — see toolchain note below.

## Builder contract

Builder images run `/usr/local/bin/build` (from `build.sh`/`defaultBuildCmd`) with `SRC_PKG` and `DEPLOY_PKG` env vars, transforming a source package into a deploy package.
Examples: maven `package` (jvm), `bundle install` with deployment config (ruby), `composer install` (php), `pip install -r requirements.txt -t` (python), go plugin build.
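A hypothetical minimal builder entrypoint showing only the contract: read `SRC_PKG`, produce `DEPLOY_PKG`. It is not copied from this repo's `build.sh`; it mirrors the `pip install -r requirements.txt -t` example above, then copies the source package to the deploy path (assuming `DEPLOY_PKG` does not exist yet, so `cp -r` creates it).

```sh
#!/bin/sh
# Hypothetical builder entrypoint: transform $SRC_PKG into $DEPLOY_PKG.
build() {
  # Abort if the builder was started without the contract's variables.
  : "${SRC_PKG:?SRC_PKG must be set}" "${DEPLOY_PKG:?DEPLOY_PKG must be set}"
  # Vendor dependencies next to the user code, if the package declares any.
  if [ -f "$SRC_PKG/requirements.txt" ]; then
    pip3 install -r "$SRC_PKG/requirements.txt" -t "$SRC_PKG" || return 1
  fi
  # The deploy package is the source tree plus its vendored dependencies.
  cp -r "$SRC_PKG" "$DEPLOY_PKG"
}
```

Installed as `/usr/local/bin/build`, such a script would end with a call to `build`.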

## Language-specific runtime notes

- **go**: functions are Go plugins; the function build toolchain MUST exactly match the env server's toolchain.
Env and builder Dockerfiles share the `GO_VERSION` build arg — always bump them together.
- **jvm**: depends on `io.fission:fission-java-core`, which is not resolvable from any remote repository (see environment-notes.md); it is built from source by `install-fission-java-core.sh` in the env image, the builder image (pre-seeded `/root/.m2`), and the CI test container.
- **ruby**: `fission/specializer.rb` loads vendored gems from `vendor/bundle/ruby/*/gems/*/lib` and native extensions via a platform-wildcard glob (images are musl, amd64+arm64 — never hardcode a platform dir).
Stay on Rack 2.2.x: `server.rb` uses `Rack::Handler::Thin`, removed in Rack 3.
- **php**: react/http 1.x `HttpServer` with the auto-run global loop; uncaught handler Throwables become 500s and the process keeps serving (an `error` listener logs them).
`ob_start` must be balanced on every early-return path — the process is long-running, leaked buffer levels accumulate and can corrupt later echo-mode responses.
- **python**: serves via bjoern by default or gevent (`WSGI_FRAMEWORK=GEVENT`), with vendored `flask_sockets.py` for websockets (Werkzeug 3 moved `parse_cookie` to `werkzeug.sansio.http`).
- **perl**: Dancer2 + Twiggy; only `/specialize` (v1) and `/` routes.
- **tensorflow-serving**: a Go proxy (`server.go`) in front of `tensorflow_model_server`; built with go modules initialized at image build time.
45 changes: 45 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,45 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this repo is

Language runtime environments for [Fission](https://fission.io) (Kubernetes serverless framework).
Each top-level directory (`go/`, `python/`, `nodejs/`, `jvm/`, `binary/`, `dotnet8/`, etc.) is one self-contained environment that produces Docker images published to `ghcr.io/fission`.

Every environment follows the same layout: `server.*` (the runtime HTTP server), `Dockerfile` + `Makefile` (runtime image), optional `builder/` (builder image with `build.sh`), `envconfig.json` (metadata; the `version` field drives releases), `examples/`, and `tests/` or `test/`.

## Quick commands

```sh
# Local image build (a bare `make` tries to PUSH multi-arch to ghcr.io!)
(cd python/ && make DOCKER_FLAGS=--load PLATFORMS=linux/arm64)

# Cluster-free unit tests (binary, nodejs, python, python-fastapi)
(cd nodejs/ && ./test/local_test.sh)

# E2e against a kind cluster (envs with e2e jobs)
SKAFFOLD_PROFILE=python make skaffold-run
make python-test-images router-port-forward
./test_utils/run_test.sh ./python/tests/test_python_env.sh

# After any envconfig.json change (never hand-edit environments.json)
make update-env-json
```

## Detailed guides

Read the relevant file before working in that area:

- [.claude/resources/build-and-images.md](.claude/resources/build-and-images.md) — make/buildx system, where build args live (and drift between), multi-arch rules, local build recipes.
- [.claude/resources/runtime-architecture.md](.claude/resources/runtime-architecture.md) — the specialize protocol (v1/v2), per-language entrypoint semantics, builder contract.
- [.claude/resources/ci.md](.claude/resources/ci.md) — workflow structure, path-filter gotchas, how e2e tests pick images, debugging CI failures.
- [.claude/resources/release-process.md](.claude/resources/release-process.md) — version-bump-driven releases, the GHCR gate, reconcile mode, multi-PR trains.
- [.claude/resources/environment-notes.md](.claude/resources/environment-notes.md) — per-environment quirks and history (jvm's vendored dependency, EOL legacy dotnet, amd64-only tensorflow, etc.).

## Hard rules

- `environments.json` is generated — regenerate with `make update-env-json`, never edit by hand.
- Bumping `version` in any `envconfig.json` triggers an image release when merged to master.
- Build args are duplicated across each env `Makefile`, its `builder/Makefile`, and `skaffold.yaml` — update all three together.
- The fission.io website mirrors `environments.json`; after image renames or new environments, sync the site (see release-process.md).