Merged
Changes from 20 commits
Commits
24 commits
bf89089
propose llm policy
minrk Feb 19, 2026
b14b690
revise llm policy
KirstieJane Feb 23, 2026
00410d3
revise llm policy
KirstieJane Feb 23, 2026
f750b38
Merge branch 'llms' of https://github.com/KirstieJane/jupyterhub-team…
KirstieJane Feb 23, 2026
95452bb
add link to contributing guidelines
KirstieJane Feb 23, 2026
b22b123
added paragraph about alignment
KirstieJane Feb 23, 2026
9f2e702
Merge pull request #3 from KirstieJane/llms
minrk Feb 23, 2026
56187f0
revise: put a little back, hedge less, mention uncopyrightability
minrk Feb 23, 2026
059e5f3
remove not-quite-endorsement of a less-bad model
minrk Feb 23, 2026
1d1dce7
Apply suggestions from code review
minrk Feb 25, 2026
f272eb4
Slightly modify and adopt the SciPy AI policy
yuvipanda Feb 25, 2026
ddebd6c
Merge pull request #4 from yuvipanda/scipy-policy
minrk Feb 25, 2026
91308c3
consolidate copyright points, add one about review
minrk Feb 25, 2026
4aea7e6
slightly more direct: "takes responsibility"
minrk Feb 25, 2026
a40c3c9
link lobsters
minrk Feb 25, 2026
41994b1
add "I wrote this" to the top-level summary
minrk Feb 23, 2026
3078379
caveats about ai undustry
minrk Mar 19, 2026
319d691
this looks like a job for—dramatic pause—em-dash
minrk Mar 19, 2026
8ba6ba6
more clearly separate harms of the AI industry from those generic to …
minrk Mar 24, 2026
e3fb80a
Apply suggestions from code review
minrk May 1, 2026
5ba0eab
trade "I wrote this" for "I am the author"
minrk May 1, 2026
a2fdd04
value contributors
minrk May 1, 2026
7d4332d
Apply suggestions from code review
minrk May 4, 2026
ec1bb72
remove note against commercial AI in copyright
minrk May 4, 2026
3 changes: 3 additions & 0 deletions docs/contribute/guide.md
@@ -63,6 +63,9 @@ or as a [link][rick_roll] (`[link](https://youtu.be/dQw4w9WgXcQ)`) to another we
GitHub has a helpful page on
[getting started with writing and formatting Markdown on GitHub][writing_formatting_github].

## Generative AI (LLM) contributions

If you're thinking of using "AI" tools to contribute, make sure to check our ["Generative AI" / Large Language Model Contribution Policy](#llm-policy) first.

## Find issues to work on

126 changes: 126 additions & 0 deletions docs/contribute/llm.md
@@ -0,0 +1,126 @@
(llm-policy)=

# Generative AI/LLM Contribution Policy

We ask that all contributions (through issues and pull requests) are made by a human who takes responsibility for the code, documentation, or comment they submit.

See [our LLM policy](#llm:policy) below. Here's a brief summary:

- **Responsibility**: You are responsible for any code you submit to JupyterHub's repositories, regardless of whether it was manually written or generated by AI.
- **Disclosure**: You must disclose whether AI has been used to assist in the development of your pull request.
- **Code Quality**: We will reject pull requests that we deem to be [AI slop](https://en.wikipedia.org/wiki/AI_slop).
- **Copyright**: We reserve the right to reject any pull request, AI-generated or not, where the copyright is in question.
Member

@jnywong jnywong May 3, 2026

> Because closed commercial LLMs do not appropriately attribute their training set, their output is never acceptable.

I think there has to be mention of this hard line at the top level if contributors are coming here for a quick glance at how to get started with contributions. I echo @consideRatio's sentiment that this gets lost in the details further down, and is important enough to highlight, since this should have a material effect on a PR workflow and is probably the most contentious effect of this policy.

Member

> Because closed commercial LLMs do not appropriately attribute their training set, their output is never acceptable.

I fail to reconcile that statement with accepting any use of Claude Code, Codex, Copilot CLI, or autocomplete from associated companies' models that leads to any character written in the PR process.

I know almost nothing about copyright (especially not in an international context), but I figure with a statement like this retained in the policy, it would be better to clarify the statement's implications concretely and discuss whether we align and accept them rather than leaving it in for people to interpret differently. For me it reads as extremely strict, while I don't pick up in this discussion that it's meant to be interpreted that strictly.

Member

I made a suggestion to remove that sentence in #880 (comment)

Collaborator

> while I don't pick up in this discussion that it's meant to be interpreted that strictly.

fwiw, the existence of this statement matches with my interpretation of this policy and the discussion here so far.

Member

My perception is influenced mostly by #880 (comment) and the fact that it's not currently stated at the top of the policy.

Contributor

This definitely seems important enough to be at the top of the policy. But I also wouldn't complain if this was merged without that change, and would be happy to open a PR.

Contributor

@consideRatio, I would not interpret this statement as a ban on all AI tool usage. It definitely discourages the "throw it over the wall" and "lacking context of the project" AI slop. Contributors who may be unsure can ask or disclose what has been used as a first step, which we can guide in a PR template.

I would recommend leaving as written for iteration 1. A separate issue can be created to discuss language improvement for a v2 of the policy.

@minrk I think this PR has reached general consensus. Let's aim for "good/well reasoned" for now over capturing everything. There's been wonderful input from the team across the board. I'm happy to press the green button to merge or leave it in your capable hands.

- **Communication**: When interacting with developers (forum, discussions, issues, pull requests, etc.) do not use AI to speak for you (except for translation).
- **AI Agents**: The use of an AI agent that writes code and then submits a pull request autonomously is not permitted.



## Principles

We, the JupyterHub community, value:

* **human co-creation**: we are proud to collaborate with developers, researchers, educators, and infrastructure specialists across our global community.
  * We respect **their copyright** over their work and appreciate their shared investment in the scientific open source ecosystem by licensing the materials under the [BSD 3-Clause "New" or "Revised" License](https://github.com/jupyterhub/jupyterhub/blob/main/LICENSE).
  * We respect the **time** taken to read and review contributions, maintaining a high standard and supportive community engagement.
* **security**: JupyterHub is trusted infrastructure for hundreds of thousands of users and we prioritize keeping their data, code, and personal configuration information secure.
* **veracity**: beyond security, we ensure our tools do what we think they do, and that we respect accuracy in communications within and beyond the scientific open source ecosystem.
* **our global society**: we seek to minimise environmental impact and human exploitation in the development and deployment of JupyterHub infrastructure.
Member Author

I love this section!

Collaborator

Big +1. This is what almost always gets left out! JupyterHub cannot exist in a world that's 2+ degrees Celsius warmer on average, not to mention the other potential economic / political harms.

Collaborator

I recognize this is hard given that we rely heavily on non-LLM services from these same organizations for many other things (like this GitHub we are using). However, I do think magnitude matters ("dose makes the poison"), as well as recognizing learned helplessness around what is 'inevitable' and what is not. I appreciate the current wording's use of 'minimize', as that is in fact the best we can do.

Collaborator

I'm glad this section resonates! I think it's important to remind us all what we are FOR rather than just what we are against 💖

I also didn't figure out how to include the political harms / threats to democracy .... but I'm very happy to include those (as they're one of the things that I am most concerned about!) Happy to iterate on any suggested phrasings (and I'll noodle myself) if folks think that would be a good additional bullet point?

Member

I like this section too! +1 to rooting ourselves in principles, values, and goals.


## Our concerns

"Generative AI" tools, such as LLMs, are

* contributing to rapid **over-burdening** of the reviewing capacity for many open source projects.
  * Unlike human contributors, models _cannot_ learn from feedback, vastly diminishing the long-term community benefit of the review process.
* trained on very large datasets, leaving many unsettled questions about copyright and consent (see the [US Copyright Office guidance](https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf)).

In particular, the models used by the commercial AI industry, which account for the vast majority of LLM-generated contributions today, are especially destructive, as they are:

* trained on very large datasets without credit or consent, with no respect for the licenses (or lack thereof) of this input data.
  * Even _if_ copyright of LLM output is settled safely worldwide, training models on open-licensed and/or proprietary inputs and generating outputs stripped of credit remain objectionable.
* consuming very large amounts of energy and potable water, both during their training phases and as they are used to generate outputs. See the MIT Technology Review's summaries of the literature at their [Super Topic: AI and our energy future](https://www.technologyreview.com/supertopic/ai-energy-package/).

(llm:policy)=
## Policy

### Responsibility

You are responsible for any code you submit to JupyterHub's repositories, regardless
of whether it was manually written or generated by AI. You must understand and be able
to explain the code you submit as well as the existing related code. It is not
acceptable to submit a patch that you cannot understand and explain yourself.
In explaining your contribution, do not use AI to automatically generate
descriptions. Always make sure you are comfortable saying "I wrote this"
before submitting code or a comment.

### Disclosure

If you've used AI in any part of the workflow that led to your contribution, you must disclose that you used AI and how you used it.
Document which tool(s) have been used, how they were used,
and specify what code or text is AI generated. We will reject any pull request
that does not include the disclosure.

### Code Quality

Code generated by AI can be of low quality. Contributors are expected to
submit code that meets JupyterHub's standards. We will reject pull requests that we
deem to be [AI slop](https://en.wikipedia.org/wiki/AI_slop). Do not waste
developers' time by submitting code that is fully or mostly generated by AI, and
doesn't meet our standards.

### Copyright

All code in JupyterHub is released under the BSD 3-Clause license.
Contributors to JupyterHub license their code under the same license when it is
included in JupyterHub's version control repository. That means contributors must
own the copyright of any code submitted to JupyterHub or must include the BSD
3-Clause-compatible open source license(s) associated with the submitted code
in the patch. Code generated by AI may infringe on copyright, and it is the
submitter's responsibility not to infringe. We reserve the right to reject any pull
request, AI-generated or not, where the copyright is in question.
Because closed commercial LLMs do not appropriately attribute their training set, their output is never acceptable.

### Communication

When interacting with developers (forum, discussions,
issues, pull requests, etc.) do not use AI to speak for you (except for
translation). If the developers want to chat with a chatbot,
they can do so themselves. Human-to-human communication is essential for an
open source community to thrive.

### AI Agents

The use of an AI agent that writes code and then submits a pull request autonomously is
not permitted.
We will not add or accept supporting resources such as `AGENTS.md`, `CLAUDE.md`, or similar files in our repos, as we prefer that human contributors follow the
[contribution guidelines](guide.md) that already exist.
If present, these files shall only include instructions to _prevent_ generating contributions, such as the one [used by lobste.rs](https://github.com/lobsters/lobsters/blob/f2b1ccaa21af0721db73b77b7fb8bffe7ba12ffb/AGENTS.md).

## Other Resources

While these do not formally form part of JupyterHub's AI policy, the following resources
may be helpful in understanding some pitfalls associated with using AI to contribute to
JupyterHub:
- https://llvm.org/docs/AIToolPolicy.html
- https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/
- https://jellyfin.org/docs/general/contributing/llm-policies/

## Acknowledgements

We thank the SciPy developers for [their AI
policy](https://github.com/scipy/scipy/blob/main/doc/source/dev/conduct/ai_policy.rst),
upon which the policy section of this document is largely based. They in turn credit the
[SymPy AI Policy](https://docs.sympy.org/dev/contributing/ai-generated-code-policy.html).

## Alignment and agreement

The [JupyterHub team](https://compass.hub.jupyter.org/team/) is diverse and passionate.
We do not all hold the same perspectives around the differential - and different - harms and benefits that can come from the training and deployment of large language models.

In our creation of this policy we sought to maximise our interpersonal alignment, compromising where appropriate to reach agreement on an actionable policy.
1 change: 1 addition & 0 deletions docs/index-team_guides.md
@@ -16,6 +16,7 @@ The sections below provide some resources to help you get started contributing t
contribute/guide
contribute/skills
contribute/documentation
contribute/llm
```

**Team information** is meant to share information and context across team members, and align us on a plan of action.