feat: add embedded adapters (granite switch) to openai backend #881

Open
jakelorocco wants to merge 15 commits into main from jal/granite-switch-intrinsics

Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

  • Link to Issue: Fixes N/A

Adds support for granite switch models for OpenAIBackends.

Changes:

  • OpenAIBackends can utilize EmbeddedIntrinsicAdapters with any of the regular intrinsic functionality.
  • Adapters are loaded by default when load_embedded_adapters=True at init.
  • Added a placeholder IBM_GRANITE_SWITCH_4_1_8B that points to the temp repo for now.
  • Added documentation, examples, tests.
  • When generating from an intrinsic, we now only read temperature and seed from model options; a log message notes this.
  • call_intrinsic function now works with both OpenAIBackends and LocalHFBackends

Please see most recent comment below for additional context: #881 (comment)

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Luis Lastras and others added 3 commits April 10, 2026 18:11
Enable calling intrinsics on Granite Switch models via the OpenAI backend.
Granite Switch models embed adapter weights directly in the checkpoint and
activate them via chat template control tokens, so no PEFT loading is needed.

- Add EmbeddedIntrinsicAdapter class that carries only I/O config (no weights)
  with factory methods to load from a model directory or HuggingFace Hub
- Add register_granite_switch_model() and add_embedded_adapter() to OpenAIBackend
- Add _generate_from_intrinsic() that reuses IntrinsicsRewriter/ResultProcessor
  and injects intrinsic_name into chat_template_kwargs for the switch model
- Ensure serialized messages always include 'role' (latent issue in rewriter
  instruction messages, newly exposed by OpenAI API serialization path)
- Add unit tests for adapter loading, registration, and rewriting

Signed-off-by: lastrasl <lastrasl@us.ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@github-actions github-actions bot added the enhancement New feature or request label Apr 17, 2026
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco
Contributor Author

I'm still in the process of testing some of these changes from an e2e perspective. I also still need to add documentation for how to use embedded models along with examples.

These changes do seem to work though and allow utilizing intrinsics with an embedded adapter / granite switch model.

@jakelorocco jakelorocco changed the title feat: add embedded adapters to openai backend feat: add embedded adapters (granite switch) to openai backend Apr 17, 2026
@lastras

lastras commented Apr 19, 2026

Bug: model_options temperature is silently dropped for intrinsic calls

The seed parameter is correctly forwarded to api_params at line 639, but temperature (and any other model_options entries) are not. They are written into request_json at line 617, which gets absorbed into the ChatCompletion object by rewriter.transform(), but never extracted back out when building api_params. The result is that callers passing model_options={ModelOption.TEMPERATURE: 0.0} get the model's default temperature instead — causing non-deterministic outputs even when greedy decoding is explicitly requested.

Fix: mirror the seed pattern immediately after line 640:

if ModelOption.TEMPERATURE in model_options:
    api_params["temperature"] = model_options[ModelOption.TEMPERATURE]

Verified: with this fix, 5 back-to-back pipeline runs (guardian → rewrite → answerability → QC → base model, no sleep between calls) produce bit-for-bit identical outputs including long-form base model answers. Without the fix, outputs vary run-to-run despite temperature=0.0 being passed by the caller. The warning at line 557 ("some model options may be overwritten / ignored") is technically accurate but understated — temperature is always silently dropped, not just sometimes.
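The seed-mirroring fix can be sketched in isolation (the `ModelOption` values and the `api_params` shape below are stand-ins based on this comment, not the actual mellea source):

```python
# Minimal sketch: forward only temperature and seed from model_options
# into the OpenAI API params. ModelOption keys here are illustrative.
class ModelOption:
    TEMPERATURE = "temperature"
    SEED = "seed"

def build_api_params(model_options: dict) -> dict:
    api_params: dict = {}
    # Existing behavior per the comment: seed is forwarded...
    if ModelOption.SEED in model_options:
        api_params["seed"] = model_options[ModelOption.SEED]
    # ...and the proposed fix mirrors that pattern for temperature.
    if ModelOption.TEMPERATURE in model_options:
        api_params["temperature"] = model_options[ModelOption.TEMPERATURE]
    return api_params

params = build_api_params({ModelOption.TEMPERATURE: 0.0, ModelOption.SEED: 42})
```

With this in place, a caller passing `ModelOption.TEMPERATURE: 0.0` actually gets greedy decoding instead of the server default.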

@nrfulton
Member

@lastras Your comment states that all model options get dropped but that the fix is to add (another) special case for temperature in model_options.

It sounds from your comment like there's an underlying bug here that should be fixed. Can you clarify?

@jakelorocco
Contributor Author

@lastras @nrfulton, I am going to move this convo to slack for faster resolution.

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco jakelorocco self-assigned this Apr 20, 2026
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco jakelorocco force-pushed the jal/granite-switch-intrinsics branch from c0b163c to 3227a5b on April 20, 2026 18:20
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@jakelorocco
Contributor Author

I fixed a few random issues and also the issues we talked about (call_intrinsic and seed/temperature setting). I'm working on adding some documentation and minor examples.

The functional parts of the code should be stable now. I have additional things that I'd like to clean up, but I will delay them until after we make sure everything is working as intended and post-release, so I don't impact any functionality.

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco
Contributor Author

Updated the PR description. Fixed a few additional errors / bugs. Added documentation and examples. We should be good to start reviewing. I need to run one more round of tests with the latest from the granite-switch team.

I have plans to open a few additional issues to clean things up after this PR is merged.

Additional changes that should be made before the official release:

  • Update IBM_GRANITE_SWITCH_4_1_8B to point to the correct model
  • Update our documentation to point to the official granite switch documentation when published

@jakelorocco jakelorocco marked this pull request as ready for review April 20, 2026 21:45
@jakelorocco jakelorocco requested a review from a team as a code owner April 20, 2026 21:45
Contributor

@ajbozarth ajbozarth left a comment

Good overall direction — the EmbeddedIntrinsicAdapter class is well-structured and the design of delegating I/O config loading to the existing io.yaml / IntrinsicsRewriter machinery makes sense. A few issues need addressing before this merges, ranging from a runtime bug to a dependency footprint concern.

Comment thread pyproject.toml
"nltk>=3.9", # Needed for sentence tokenization in granite citation parsing.
"rouge_score", # Needed for Majority Voting Sampling Strategies.
"PyYAML", # Needed for backends/adapters and granite formatters.
"huggingface-hub>=0.33.4", # Needed for Granite Switch embedded adapter downloads (OpenAI backend).
Contributor

@ajbozarth ajbozarth Apr 20, 2026

huggingface-hub is now a hard dependency for all Mellea users, including those who only use the Ollama or non-switch OpenAI backends. It's also duplicated — it already appears in the hf extra at line 51.

AGENTS.md §5 calls for optional backend imports to be wrapped in try/except ImportError with a helpful message. The HF download only happens when using a Granite Switch model, so this should either be gated behind an optional extra (e.g., openai or a new switch extra) with a try/except ImportError wrapping calls in EmbeddedIntrinsicAdapter.from_hub, or at minimum the duplicate in the hf extra should be removed.
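The optional-import gating suggested here follows a common pattern: resolve the dependency only when the feature is used, and fail with an actionable message. The helper below is illustrative (not mellea's API); the error path is shown with a module name that does not exist:

```python
import importlib

def require_optional(module_name: str, hint: str):
    """Import an optional dependency, raising a helpful error if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this feature. {hint}"
        ) from e

# Error path, demonstrated with a module that is not installed:
try:
    require_optional("definitely_not_installed_xyz", "Install the extra with pip.")
except ImportError as err:
    msg = str(err)
```

`EmbeddedIntrinsicAdapter.from_hub` could call such a helper for `huggingface_hub`, letting the base install drop the hard dependency.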


# Pre-Built Granite Switch Models
IBM_GRANITE_SWITCH_4_1_8B = ModelIdentifier(
    hf_model_name="GrizleeBer/gs-test-2"  # Placeholder.
)
Contributor

@ajbozarth ajbozarth Apr 20, 2026

IBM_GRANITE_SWITCH_4_1_8B points to a personal repo (GrizleeBer/gs-test-2). Any user who imports this constant today — or who passes it as a model ID with load_embedded_adapters=True — will silently hit that personal repo.

Please block merge on updating this to the real repo ID, or make it raise explicitly when accessed so it can't accidentally ship in a release.

Comment thread mellea/backends/openai.py
# adapters during call_intrinsic, or once we support other types of adapters for
# OpenAIBackends.
# OpenAI Backends only support embedded_adapters.
self._uses_embedded_adapters = True
Contributor

@ajbozarth ajbozarth Apr 20, 2026

_uses_embedded_adapters is unconditionally True for every OpenAIBackend, regardless of whether the served model is actually a Switch model. This means calling any intrinsic on a non-Switch OpenAI backend will attempt a HuggingFace download and fail with a confusing FileNotFoundError (no adapter_index.json) rather than a clear "this model does not support embedded intrinsics" error.

Suggest tying this flag to whether load_embedded_adapters=True was passed (or whether register_embedded_adapter_model has been called), and raising a clear error in _generate_from_intrinsic when no adapters are registered.
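One way to wire the suggested guard (class and attribute names below are illustrative stand-ins, not the real backend):

```python
# Sketch: tie the capability flag to whether adapters were actually
# registered, and fail with a clear message instead of a confusing
# FileNotFoundError from a failed HuggingFace download.
class FakeOpenAIBackend:
    def __init__(self, load_embedded_adapters: bool = False):
        self._uses_embedded_adapters = load_embedded_adapters
        self._adapters: dict = {}

    def generate_from_intrinsic(self, name: str):
        if not self._uses_embedded_adapters or name not in self._adapters:
            raise ValueError(
                f"No embedded adapter registered for intrinsic '{name}'; "
                "this model may not support embedded intrinsics."
            )
        return self._adapters[name]

backend = FakeOpenAIBackend()  # load_embedded_adapters left False
try:
    backend.generate_from_intrinsic("answerability")
except ValueError as err:
    error_text = str(err)
```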

Comment thread mellea/backends/openai.py
raise NotImplementedError("Intrinsics require a chat context.")

# Intrinsics don't support streaming because of their post-processing step.
if model_options.get(ModelOption.STREAM, None) is not None:
Contributor

@ajbozarth ajbozarth Apr 20, 2026

This guard raises NotImplementedError whenever ModelOption.STREAM appears in model_options at all — including STREAM=False. Compare the analogous check at line 877 which correctly uses:

if model_opts.get(ModelOption.STREAM, False):

Fix:

if model_options.get(ModelOption.STREAM, False):
    raise NotImplementedError("Intrinsics do not support streaming.")
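The difference is easy to see with a plain dict standing in for `model_options`:

```python
# With `get(key, None) is not None`, an explicit STREAM=False still trips
# the guard; with `get(key, False)`, only a truthy value does.
STREAM = "stream"  # stand-in for ModelOption.STREAM
opts = {STREAM: False}

buggy_guard_fires = opts.get(STREAM, None) is not None  # True: False is not None
fixed_guard_fires = bool(opts.get(STREAM, False))       # False
```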

Comment thread mellea/backends/openai.py

if len(model_options.items()) > 0:
    MelleaLogger.get_logger().info(
        "passing in model options when generating with an intrinsic; only temperature and seed are kept from model options"
    )
Contributor

@ajbozarth ajbozarth Apr 20, 2026

The log message says "only temperature and seed are kept" but SYSTEM_PROMPT is also extracted and used a few lines below:

system_prompt = model_options.get(ModelOption.SYSTEM_PROMPT, "")

Either update the message to accurately list all supported options, or narrow the log condition to only fire when unrecognized options are present.
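The narrowed condition could be computed from a whitelist (option names below are stand-ins for the `ModelOption` constants):

```python
# Only log when options outside the supported set are actually present.
SUPPORTED = {"temperature", "seed", "system_prompt"}

def unrecognized_options(model_options: dict) -> set:
    return set(model_options) - SUPPORTED

leftover = unrecognized_options({"temperature": 0.0, "max_tokens": 128})
# Only `max_tokens` would trigger the log message here.
```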

Comment thread mellea/backends/openai.py

# TODO: OpenAIBackend only supports EmbeddedAdapters.
# It should be refactored into a specific adapter.transform() function.
assert isinstance(adapter, EmbeddedIntrinsicAdapter), (
Contributor

@ajbozarth ajbozarth Apr 20, 2026

assert is disabled when Python runs with -O (optimized mode), making this a silent no-op in production environments. Replace with an explicit guard:

if not isinstance(adapter, EmbeddedIntrinsicAdapter):
    raise TypeError(
        f"OpenAIBackend only supports EmbeddedIntrinsicAdapter, got: {type(adapter).__name__}"
    )
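The behavioral difference matters because `python -O` strips `assert` statements entirely, while an explicit `raise` survives. A self-contained sketch (the adapter class here is a stub, not the real one):

```python
# assert vanishes under `python -O`; an explicit raise does not.
class EmbeddedIntrinsicAdapter:  # stand-in for the real class
    pass

def check_adapter(adapter):
    if not isinstance(adapter, EmbeddedIntrinsicAdapter):
        raise TypeError(
            f"OpenAIBackend only supports EmbeddedIntrinsicAdapter, "
            f"got: {type(adapter).__name__}"
        )
    return adapter

try:
    check_adapter(object())
except TypeError as err:
    caught = type(err).__name__
```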

# adapter loading: 1. regular adapters, and 2. embedded adapters.
if not has_adapter:
    # EmbeddedAdapters get grabbed directly from the hf repo.
    if getattr(backend, "_uses_embedded_adapters", False):
Contributor

@ajbozarth ajbozarth Apr 20, 2026

Accessing private attributes (_uses_embedded_adapters, _model_id) of a concrete class via getattr from outside that class couples _util.py to OpenAIBackend internals without going through the AdapterMixin interface.

Suggest adding a method to AdapterMixin (e.g., uses_embedded_adapters() -> bool) so this is an explicit part of the interface contract rather than a duck-typed private attribute check.

)
adapter_type = AdapterType.ALORA if technology == "alora" else AdapterType.LORA
super().__init__(intrinsic_name, adapter_type)
self.intrinsic_name = intrinsic_name
Contributor

@ajbozarth ajbozarth Apr 20, 2026

self.intrinsic_name is set here, but super().__init__(intrinsic_name, adapter_type) already sets self.name = intrinsic_name (line 49 of the base class). The test even asserts adapter.intrinsic_name == adapter.name.

Pick one: use self.name (the inherited attribute) or rename self.name to self.intrinsic_name in the base class. Having both on EmbeddedIntrinsicAdapter creates ambiguity about which is canonical.
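If backward compatibility is a concern, one resolution is to keep the inherited `name` canonical and expose `intrinsic_name` as a read-only alias (class shapes below are illustrative):

```python
# Keep `name` (set by the base class) canonical; alias `intrinsic_name`
# via a property so there is a single source of truth.
class Adapter:
    def __init__(self, name: str):
        self.name = name

class EmbeddedIntrinsicAdapter(Adapter):
    @property
    def intrinsic_name(self) -> str:
        return self.name  # alias; `name` stays canonical

a = EmbeddedIntrinsicAdapter("answerability")
```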

if intrinsic_name is not None:
    raise ValueError(
        f"No adapter found for intrinsic '{intrinsic_name}' in {repo_id}"
    ) from None
Contributor

@ajbozarth ajbozarth Apr 20, 2026

raise ... from None suppresses the original exception chain, which includes the local model path that was searched — useful information for debugging. Use raise ... from e (or just raise) to preserve the context.
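A small demonstration of what `from e` preserves (the messages and path here are made up for illustration):

```python
# `raise ... from e` keeps the original exception on __cause__, so the
# searched path survives in the traceback; `from None` discards it.
def lookup(path: str):
    try:
        raise FileNotFoundError(f"no adapter_index.json under {path}")
    except FileNotFoundError as e:
        raise ValueError("No adapter found for intrinsic 'x'") from e

try:
    lookup("/tmp/models/gs-test")
except ValueError as err:
    cause_message = str(err.__cause__)
```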

int(os.environ.get("CICD", 0)) == 1,
reason="Skipping OpenAI intrinsics tests in CI",
),
pytest.mark.skip(
Contributor

@ajbozarth ajbozarth Apr 20, 2026

The entire E2E test file is unconditionally skipped. The unit tests in test_embedded_adapter.py cover adapter loading well, but the ~200-line _generate_from_intrinsic code path in OpenAIBackend has no test coverage at all.

At minimum, add unit tests that mock the OpenAI client and verify the generate path end-to-end (chat_template_kwargs is set, result_processor is applied, temperature/seed are forwarded). These can live alongside the existing unit tests rather than requiring a real vLLM server.
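A mocked-client test could look roughly like this. The attribute path mirrors the openai-python client shape; the `extra_body` / `chat_template_kwargs` wiring is assumed from the commit message, and the backend integration is left out:

```python
# Sketch: exercise the generate path against a MagicMock instead of a
# real vLLM server, then assert on what was forwarded to the client.
from unittest.mock import MagicMock

client = MagicMock()
client.chat.completions.create.return_value = MagicMock(
    choices=[MagicMock(message=MagicMock(content="yes"))]
)

# The code under test would issue a call shaped like this:
resp = client.chat.completions.create(
    model="granite-switch",
    messages=[{"role": "user", "content": "Is this answerable?"}],
    temperature=0.0,
    seed=42,
    extra_body={"chat_template_kwargs": {"intrinsic_name": "answerability"}},
)
answer = resp.choices[0].message.content

# A unit test can then inspect the forwarded kwargs:
_, kwargs = client.chat.completions.create.call_args
```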

@ajbozarth
Contributor

This is pretty cool; I had Claude explain it to me along with its review. Afaik the review comments look important to address, but I didn't block on them just in case some are intentional. I also ran uv run pytest locally and it failed on the same test as CI.

import re
from typing import TypeVar

import huggingface_hub
Contributor

TestFromHub.test_missing_huggingface_hub_raises fails (both in CI and locally) because of this top-level import.

The test patches sys.modules["huggingface_hub"] to None to simulate the library being absent, but since the name is already bound here at module load time, the patch has no effect — snapshot_download runs for real, hits HuggingFace with some/repo, gets a 401, and raises RepositoryNotFoundError instead of the expected ImportError.

Two consistent fixes:

Option A — keep huggingface_hub optional (recommended): Remove this top-level import and move it inside from_hub:

@staticmethod
def from_hub(repo_id, ...):
    try:
        import huggingface_hub
    except ImportError:
        raise ImportError(
            "huggingface_hub is required to load embedded adapters from the Hub. "
            "Install it with: pip install huggingface-hub"
        ) from None
    local_root = huggingface_hub.snapshot_download(...)

The test then passes as-is, and huggingface_hub can be removed from base dependencies.

Option B — accept it as a hard dependency: Remove test_missing_huggingface_hub_raises (the scenario it tests can never happen with a top-level import) and update the from_hub docstring to drop the ImportError entry from Raises:.
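The mechanics behind Option A can be demonstrated directly: a module-level `import` binds the module object once, so later changes to `sys.modules` don't affect that binding, whereas a lazy `import` statement goes through `sys.modules` each time, and CPython raises ImportError when the entry is `None` (the module name below is invented for the demo):

```python
# Simulate the library being absent the same way the test does:
# set the sys.modules entry to None. A lazy import inside from_hub
# would then raise ImportError, which is what the test expects.
import sys

sys.modules["fake_hub_for_demo"] = None

try:
    import fake_hub_for_demo  # noqa: F401
except ImportError:
    lazy_import_raised = True
```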
