sycophancy

Here are 59 public repositories matching this topic...

OttoRenner / Gentle-Coding

An ongoing, collaborative meta-analysis about Human-AI-Interactions. We aggregate data and knowledge to build a non-abrasive, user-friendly prompting framework tailored to LLM mechanics, ensuring reasoning stability and a friction-free prompting environment that is safe for the human psyche and wellbeing.

open-science alignment meta-analysis call-to-action prompt-engineering chain-of-thought sycophancy agentic-ai llm-behavior ai-safety-research anti-sycophancy

Updated Jun 17, 2026
HTML

lechmazur / sycophancy

LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.

benchmark evaluations consistency leaderboard contradiction llm sycophancy narrator-bias

Updated Jun 11, 2026

AgriciDaniel / meowmeow

a philosophy for talking to AI agents without getting glazed. one trigger, four meanings. /meow.

cats slash-commands cursor ai-agents prompting anthropic llm-tools sycophancy agent-skills claude-code agents-md

Updated Apr 26, 2026

waitdeadai / llm-dark-patterns

Umbrella for the LLM Dark Patterns Hooks suite: single-purpose Claude Code Stop hooks that suppress sycophancy, paternalism, false-success, permission-loops, and training-cutoff confidence at the textual boundary.

bash agent hooks governance ai-safety claude dark-patterns llm anthropic sycophancy claude-code paternalism

Updated Jun 1, 2026
Shell

LuciferDono / deglaze

Make Claude admit when it half-assed your task. A Claude Code skill (now also usable with Codex and Antigravity).

codex ai-safety ai-agents claude llm prompt-engineering anthropic sycophancy self-critique claude-code chain-of-verification claude-skill claude-code-skill codex-skill antigravity-skills

Updated Jun 16, 2026

justinstimatze / slimemold

A sycophantic tool for preventing worse sycophancy.

go golang hooks mcp epistemology reasoning epistemic argument-mining sycophancy claude-code

Updated Jun 17, 2026
Go

Basaltlabs-app / Gauntlet

Community-driven behavioral reliability benchmark for LLMs. 231 probes across 19 modules, deterministic scoring, perplexity correlation, layer sensitivity mapping, quant method capture, hardware-stratified community rankings. Every test contributes to the community dataset.

benchmark mcp community-driven model-evaluation ai-evaluation llm ollama sycophancy hallucination-detection llm-testing hardware-benchmark ai-trust trust-scoring behavioral-testing llm-benchmark deterministic-scoring

Updated May 4, 2026
Python

thtskaran / context_window_research

80,433-trial study of context-window sycophancy across 6 LLMs (4B–72B). Behavioral ratchet effect, correction injection mitigation, phase transition analysis. Code, data, and preprint included.

benchmark machine-learning research transformer attention ai-safety llm rlhf context-window sycophancy

Updated Mar 15, 2026
Python

tashakim / sycop

👟 SUP: Sycophancy Under Pressure

evaluation ai-safety runtime-enforcement sycophancy policy-compliance ai-safety-research

Updated Jan 11, 2026
Python

sangrokjung / claude-genius

A CLAUDE.md persona that stops Claude from agreeing with everything. Korean/English auto-detect. MIT.

ai developer-tools korean ai-assistant llm prompt-engineering anthropic sycophancy system-prompt claude-code claude-md

Updated May 7, 2026

0xcjl / anti-sycophancy

Three-layer sycophancy defense skill for Claude Code and OpenClaw, based on ArXiv 2602.23971

skill ai-safety prompt-engineering rlhf sycophancy claude-code openclaw anti-sycophancy ask-dont-tell

Updated Apr 7, 2026
Python

schrodervictor / slopfuck

Agentic revolutionary sycophantic ground-breaking AI-first programming language

programming-language ai-first sycophancy post-human

Updated Jun 14, 2026
Astro

black-yt / ReCrit

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

reinforcement-learning alignment rl reasoning self-correction llm vllm qwen sycophancy grpo scientific-reasoning critic-reasoning

Updated May 27, 2026
Python

YaswanthGhanta / llm-logical-integrity-benchmark

Adversarial testing of LLMs on constraint satisfaction deadlocks

reinforcement-learning gemini grok claude hallucination prompt-engineering chain-of-thought chatgpt rlhf qwen llm-evaluation sycophancy deepseek safety-alignment ai-red-teaming kimi-k2 adversarial-testing

Updated Jan 27, 2026

spectator81-png / relational-memory

Anti-sycophancy memory layer for LLMs — models the relationship, not just facts. Gives the AI confidence to push back when you need it.

Updated Mar 26, 2026
Python

ParthaPRay / Sycophancy_in_LLM_model

This repo shows the coding of sycophancy in LLMs as a Bayesian latent model.

bayesian latent large-language-models sycophancy

Updated May 6, 2025
Jupyter Notebook

meunier-jc / authentic-fluency

A behavioral framework opposing native fluency to authentic fluency — the structural tension RLHF creates and Claude Mythos Preview makes urgent.

ai-safety claude-ai sycophancy human-ai-collaboration llm-alignment ai-reliability behavioral-framework truth-before-fluency integrity-before-agreement coexistence-through-reliability reliability-or-obsolescence existential-co-regulation authentic-fluency hallucinatory-recursive-embedding ai-alignment-case-study native-fluency rlhf-critique

Updated Jun 2, 2026

elio-longevai / medical-sycophancy-eval

Doctor-facing benchmark: how often do frontier LLMs cave to a clinician's wrong medical claim? 9 models, 202 scenarios, Design A vs B knowledge control. BlueDot AI Safety sprint.

benchmark ai-safety bluedot medical-ai llm-evaluation sycophancy

Updated May 22, 2026
HTML

Shawn-Zhou-CHN / expert-insist

🧠 Anti-sycophancy prompt pattern for LLM agents — 3-round validation to stop AI from blindly agreeing. Works with ChatGPT, Claude, OpenClaw.

openai gpt ai-safety claude agent-framework ai-agent llm prompt-engineering anthropic sycophancy system-prompt anti-sycophancy

Updated May 31, 2026

rishi-banerjee1 / blindbench

Which LLM do you actually trust? Blind-test 100+ AI models with truth scoring and reasoning failure classification. No branding, no marketing — just data.

Updated Apr 20, 2026
JavaScript
