proposal: llm contribution policy #880
Changes from 19 commits
bf89089
b14b690
00410d3
f750b38
95452bb
b22b123
9f2e702
56187f0
059e5f3
1d1dce7
f272eb4
ddebd6c
91308c3
4aea7e6
a40c3c9
41994b1
3078379
319d691
8ba6ba6
e3fb80a
5ba0eab
a2fdd04
7d4332d
ec1bb72
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| (llm-policy)= | ||
|
|
||
| # Generative AI/LLM Contribution Policy | ||
|
|
||
|
|
||
| ```{note} | ||
| "AI" herein refers to generative AI tools like large language models (LLMs) | ||
| that can generate, edit, and review software code, create and manipulate | ||
| images, or generate human-like communication. | ||
| ``` | ||
|
|
||
| We ask that all contributions (through issues and pull requests) are made by a human who takes responsibility for the code, documentation, or comment they submit. | ||
| In short: LLM-generated contributions are not accepted, | ||
| and please make sure you are comfortable saying "I wrote this" about everything you submit to the project. | ||
|
|
||
|
|
||
|
|
||
| ## Principles | ||
|
|
||
| We, the JupyterHub community, value: | ||
|
|
||
| * **human co-creation**: we are proud to collaborate with developers, researchers, educators, and infrastructure specialists across our global community. | ||
|
|
||
|   * We respect **their copyright** over their work and appreciate their shared investment in the scientific open source ecosystem by licensing the materials under the [BSD 3-Clause "New" or "Revised" License](https://github.com/jupyterhub/jupyterhub/blob/main/LICENSE). | ||
|   * We respect the **time** taken to read and review contributions, maintaining a high standard and supportive community engagement. | ||
| * **security**: JupyterHub is trusted infrastructure for hundreds of thousands of users and we prioritize keeping their data, code, and personal configuration information secure. | ||
| * **veracity**: beyond security, we ensure our tools do what we think they do, and that we respect accuracy in communications within and beyond the scientific open source ecosystem. | ||
| * **our global society**: we seek to minimise environmental impact and human exploitation in the development and deployment of JupyterHub infrastructure. | ||
|
Member
Author
I love this section!
Collaborator
Big +1. This is what almost always gets left out! JupyterHub cannot exist in a world that's 2+ degrees Celsius warmer on average, not to mention the other potential economic / political harms.
Collaborator
I recognize this is hard given that we rely heavily on non-LLM services from these same organizations for many other things (like this GitHub we are using). However, I do think magnitude matters ("dose makes the poison"), as well as recognizing learned helplessness around what is 'inevitable' and what is not. I appreciate the current wording's use of 'minimize', as that is in fact the best we can do.
Collaborator
I'm glad this section resonates! I think it's important to remind us all what we are FOR rather than just what we are against 💖 I also didn't figure out how to include the political harms / threats to democracy, but I'm very happy to include those (as they're one of the things that I am most concerned about!). Happy to iterate on any suggested phrasings (and I'll noodle myself) if folks think that would be a good additional bullet point?
Member
I like this section too! +1 to rooting ourselves in principles, values, and goals. |
||
|
|
||
| ## Our concerns | ||
|
|
||
| "Generative AI" tools, such as LLMs, are | ||
|
|
||
| * contributing to rapid **over-burdening** of the reviewing capacity for many open source projects. | ||
|   * Unlike human contributors, models _cannot_ learn from feedback, vastly diminishing the long-term community benefit of the review process. | ||
| * trained on very large datasets, leaving many unsettled questions about copyright and consent. See the [US Copyright Office guidance](https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf). | ||
|
|
||
| In particular, the models in use by the commercial AI industry, which account for the vast majority of LLM-generated contributions today, are especially destructive, as they are: | ||
|
|
||
| * trained on very large datasets without credit or consent, with no respect for the licenses (or lack thereof) of this input data. | ||
|   * Even _if_ copyright of LLM output is settled safely worldwide, training models on open-licensed and/or proprietary inputs and generating outputs stripped of credit remains objectionable. | ||
| * consuming very large amounts of energy and potable water, both during their training phases and as they are used to generate outputs. See the MIT Technology Review's summaries of the literature at their [Super Topic: AI and our energy future](https://www.technologyreview.com/supertopic/ai-energy-package/). | ||
|
|
||
| ## Policy | ||
|
|
||
|
|
||
| ### Responsibility | ||
|
|
||
| You are responsible for any code you submit to JupyterHub's repositories, regardless | ||
| of whether it was manually written or generated by AI. You must understand and be able | ||
| to explain the code you submit as well as the existing related code. It is not | ||
| acceptable to submit a patch that you cannot understand and explain yourself. | ||
| In explaining your contribution, do not use AI to automatically generate | ||
| descriptions. Always make sure you are comfortable saying "I wrote this" | ||
| before submitting code or a comment. | ||
|
|
||
| ### Disclosure | ||
|
|
||
| You must disclose whether AI has been used to assist in the development of | ||
| your pull request. | ||
| If so, you must document which tool(s) have been used, how they were used, | ||
| and specify what code or text is AI generated. We will reject any pull request | ||
| that does not include the disclosure. | ||
|
|
||
|
|
||
| ### Code Quality | ||
|
|
||
| Code generated by AI can be of low quality. Contributors are expected to | ||
| submit code that meets JupyterHub's standards. We will reject pull requests that we | ||
| deem to be [AI slop](https://en.wikipedia.org/wiki/AI_slop). Do not waste | ||
| developers' time by submitting code that is fully or mostly generated by AI and | ||
| does not meet our standards. | ||
|
|
||
| ### Copyright | ||
|
|
||
| All code in JupyterHub is released under the BSD 3-clause copyright license. | ||
| Contributors to JupyterHub license their code under the same license when it is | ||
| included in JupyterHub's version control repository. That means contributors must | ||
| own the copyright of any code submitted to JupyterHub or must include the BSD | ||
| 3-clause compatible open source license(s) associated with the submitted code | ||
| in the patch. Code generated by AI may infringe on copyright and it is the | ||
| submitter's responsibility to not infringe. We reserve the right to reject any pull | ||
| requests, AI generated or not, where the copyright is in question. | ||
| Because closed commercial LLMs do not appropriately attribute their training set, their output is never acceptable. | ||
|
|
||
|
|
||
| ### Communication | ||
|
|
||
| When interacting with developers (forum, discussions, | ||
| issues, pull requests, etc.) do not use AI to speak for you (except for | ||
| translation). If the developers want to chat with a chatbot, | ||
| they can do so themselves. Human-to-human communication is essential for an | ||
| open source community to thrive. | ||
|
|
||
| ### AI Agents | ||
|
|
||
| The use of an AI agent that writes code and then submits a pull request autonomously is | ||
| not permitted. | ||
| We will not include or accept supporting resources such as `AGENTS.md`, `CLAUDE.md`, or similar files in our repos, as we prefer that human contributors follow the | ||
| [contribution guidelines](guide.md) that already exist. | ||
| If present, these files shall only include instructions to _prevent_ generating contributions, such as that [used by lobste.rs](https://github.com/lobsters/lobsters/blob/f2b1ccaa21af0721db73b77b7fb8bffe7ba12ffb/AGENTS.md). | ||
|
|
||
| ### Other Resources | ||
|
|
||
| While these do not formally form part of JupyterHub's AI policy, the following resources | ||
| may be helpful in understanding some pitfalls associated with using AI to contribute to | ||
| JupyterHub: | ||
|
|
||
|
|
||
|
|
||
| - https://llvm.org/docs/AIToolPolicy.html | ||
| - https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/ | ||
| - https://jellyfin.org/docs/general/contributing/llm-policies/ | ||
|
|
||
| ## Acknowledgements | ||
|
|
||
| We thank the SciPy developers for [their AI | ||
| policy](https://github.com/scipy/scipy/blob/main/doc/source/dev/conduct/ai_policy.rst), | ||
| upon which the policy section of this document is largely based. They in turn credit the | ||
| [SymPy AI Policy](https://docs.sympy.org/dev/contributing/ai-generated-code-policy.html). | ||
|
|
||
| ## Alignment and agreement | ||
|
|
||
| The [JupyterHub team](https://compass.hub.jupyter.org/team/) is diverse and passionate. | ||
| We do not all hold the same perspectives around the differential - and different - harms and benefits that can come from the training and deployment of large language models. | ||
|
|
||
| In our creation of this policy we sought to maximise our interpersonal alignment, compromising where appropriate to reach agreement on an actionable policy. | ||
I really like the language that people must take responsibility for their code.