proposal: llm contribution policy #880
Changes from 23 commits
(llm-policy)=
# Generative AI/LLM Contribution Policy

We ask that all contributions (through issues and pull requests) are made by a human who takes responsibility for the code, documentation, or comments they submit.

See [our LLM policy](#llm:policy) below. Here's a brief summary:

- **Responsibility**: You are responsible for any code you submit to JupyterHub's repositories, regardless of whether it was manually written or generated by AI.
- **Disclosure**: You must disclose whether AI has been used to assist in the development of your pull request.
- **Code Quality**: We will reject pull requests that we deem to be [AI slop](https://en.wikipedia.org/wiki/AI_slop).
- **Copyright**: We reserve the right to reject any pull request, AI-generated or not, where the copyright is in question.
- **Communication**: When interacting with developers (forum, discussions, issues, pull requests, etc.), do not use AI to speak for you (except for translation).
- **AI Agents**: The use of an AI agent that writes code and then submits a pull request autonomously is not permitted.

## Principles

We, the JupyterHub community, value:

* **human co-creation**: we are proud to collaborate with developers, researchers, educators, and infrastructure specialists across our global community. We value contributors over their contributions.
  * We respect **their copyright** over their work and appreciate their shared investment in the scientific open source ecosystem when they license their materials under the [BSD 3-Clause "New" or "Revised" License](https://github.com/jupyterhub/jupyterhub/blob/main/LICENSE).
  * We respect the **time** taken to read and review contributions, maintaining high standards and supportive community engagement.
* **security**: JupyterHub is trusted infrastructure for hundreds of thousands of users, and we prioritize keeping their data, code, and personal configuration information secure.
* **veracity**: beyond security, we ensure that our tools do what we think they do, and we respect accuracy in communications within and beyond the scientific open source ecosystem.
* **our global society**: we seek to minimise environmental impact and human exploitation in the development and deployment of JupyterHub infrastructure.

Member (author): I love this section!

Collaborator: Big +1. This is what almost always gets left out! JupyterHub cannot exist in a world that's 2+ degrees Celsius warmer on average, not to mention the other potential economic and political harms.

Collaborator: I recognize this is hard given that we rely heavily on non-LLM services from these same organizations for many other things (like the GitHub we are using right now). However, I do think magnitude matters ("the dose makes the poison"), as does recognizing learned helplessness around what is 'inevitable' and what is not. I appreciate the current wording's use of 'minimize', as that is in fact the best we can do.

Collaborator: I'm glad this section resonates! I think it's important to remind us all what we are FOR rather than just what we are against 💖 I also didn't figure out how to include the political harms and threats to democracy, but I'm very happy to include those (as they're one of the things I am most concerned about!). Happy to iterate on any suggested phrasings (and I'll noodle on it myself) if folks think that would be a good additional bullet point.

Member: I like this section too! +1 to rooting ourselves in principles, values, and goals.

## Our concerns

"Generative AI" tools, such as LLMs, are

* contributing to rapid **over-burdening** of the reviewing capacity of many open source projects.
  * Unlike human contributors, models _cannot_ learn from feedback, vastly diminishing the long-term community benefit of the review process.
* trained on very large datasets, leaving many unsettled questions about copyright and consent. See the US Copyright Office's guidance, [Copyright and Artificial Intelligence, Part 2: Copyrightability](https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf).

In particular, the models in use by the commercial AI industry, which account for the vast majority of LLM-generated contributions today, are especially destructive, as they are:

* trained on very large datasets without credit or consent, and do not respect the licenses (or lack thereof) of this input data.
  * Even _if_ the copyright status of LLM output is settled safely worldwide, training models on open-licensed and/or proprietary inputs and generating outputs stripped of credit remains objectionable.
* consuming very large amounts of energy and potable water, both during their training phases and as they are used to generate outputs. See the MIT Technology Review's summaries of the literature in their [Super Topic: AI and our energy future](https://www.technologyreview.com/supertopic/ai-energy-package/).

(llm:policy)=
## Policy

### Responsibility

You are responsible for any code you submit to JupyterHub's repositories, regardless
of whether it was manually written or generated by AI. You must understand and be able
to explain the code you submit as well as the existing related code. It is not
acceptable to submit a patch that you cannot understand and explain yourself.
In explaining your contribution, do not use AI to automatically generate
descriptions. Always make sure you are comfortable saying "I am the author"
before submitting code or a comment.

### Disclosure

If you've used AI in any part of the workflow that led to your contribution, you must disclose that you used AI and how you used it.
Document which tool(s) have been used, how they were used,
and specify what code or text is AI-generated. We will reject any pull request
that does not include this disclosure.

### Code Quality

Code generated by AI can be of low quality. Contributors are expected to
submit code that meets JupyterHub's standards. We will reject pull requests that we
deem to be [AI slop](https://en.wikipedia.org/wiki/AI_slop). Do not waste
developers' time by submitting code that is fully or mostly generated by AI and
doesn't meet our standards.

### Copyright

All code in JupyterHub is released under the BSD 3-Clause license.
Contributors to JupyterHub license their code under the same license when it is
included in JupyterHub's version control repository. That means contributors must
own the copyright of any code submitted to JupyterHub or must include the
BSD 3-Clause-compatible open source license(s) associated with the submitted code
in the patch. Code generated by AI may infringe on copyright, and it is the
submitter's responsibility not to infringe. We reserve the right to reject any pull
request, AI-generated or not, where the copyright is in question.
Because closed commercial LLMs do not appropriately attribute their training set, their output is never acceptable.

### Communication

When interacting with developers (forum, discussions,
issues, pull requests, etc.), do not use AI to speak for you (except for
translation). If developers want to chat with a chatbot,
they can do so themselves. Human-to-human communication is essential for an
open source community to thrive.

### AI Agents

The use of an AI agent that writes code and then submits a pull request autonomously is
not permitted.
We will not include or accept supporting resources in `AGENTS.md`, `CLAUDE.md`, or similar files in our repositories, as we prefer that human contributors follow the
[contribution guidelines](guide.md) that already exist.
If present, these files shall only include instructions to _prevent_ generating contributions, such as the one [used by lobste.rs](https://github.com/lobsters/lobsters/blob/f2b1ccaa21af0721db73b77b7fb8bffe7ba12ffb/AGENTS.md).

## Other Resources

While these do not formally form part of JupyterHub's AI policy, the following resources
may be helpful in understanding some pitfalls associated with using AI to contribute to
JupyterHub:

- https://llvm.org/docs/AIToolPolicy.html
- https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/
- https://jellyfin.org/docs/general/contributing/llm-policies/

## Acknowledgements

We thank the SciPy developers for [their AI
policy](https://github.com/scipy/scipy/blob/main/doc/source/dev/conduct/ai_policy.rst),
upon which the policy section of this document is largely based. They in turn credit the
[SymPy AI Policy](https://docs.sympy.org/dev/contributing/ai-generated-code-policy.html).

## Alignment and agreement

The [JupyterHub team](https://compass.hub.jupyter.org/team/) is diverse and passionate.
We do not all hold the same perspectives on the differential, and different, harms and benefits that can come from the training and deployment of large language models.

In our creation of this policy, we sought to maximise our interpersonal alignment, compromising where appropriate to reach agreement on an actionable policy.

I think this hard line has to be mentioned at the top level, since contributors may come here for a quick glance at how to get started with contributions. I echo @consideRatio's sentiment that this gets lost in the details further down. It is important enough to highlight, since it should have a material effect on PR workflows and is probably the most contentious effect of this policy.

I can't reconcile that statement with accepting any use of Claude Code, Codex, Copilot CLI, or autocomplete from the associated companies' models that contributes even a single character to the PR process.
I know almost nothing about copyright (especially not in an international context), but if a statement like this is retained in the policy, I think it would be better to spell out its implications concretely and discuss whether we agree with and accept them, rather than leaving it for people to interpret differently. To me it reads as extremely strict, but I don't get the sense from this discussion that it's meant to be interpreted that strictly.

I made a suggestion to remove that sentence in #880 (comment)

fwiw, the existence of this statement matches my interpretation of this policy and of the discussion here so far.

My perception is influenced mostly by #880 (comment) and the fact that it's not currently stated at the top of the policy.

This definitely seems important enough to be at the top of the policy. But I also wouldn't complain if this was merged without that change, and would be happy to open a PR.

@consideRatio, I would not interpret this statement as a ban on all AI tool usage. It definitely discourages the "throw it over the wall" and "lacking context of the project" AI slop. Contributors who are unsure can ask, or disclose what has been used as a first step, which we can guide with a PR template.
I would recommend leaving it as written for iteration 1. A separate issue can be created to discuss language improvements for a v2 of the policy.
@minrk I think this PR has reached general consensus. Let's aim for "good/well reasoned" for now over capturing everything. There's been wonderful input from the team across the board. I'm happy to press the green button to merge or leave it in your capable hands.