Big News in the 𝗚𝗶𝘁𝗛𝘂𝗯 𝗖𝗼𝗽𝗶𝗹𝗼𝘁 𝗟𝗮𝘄𝘀𝘂𝗶𝘁!
A judge has thrown out most claims in the high-profile lawsuit against GitHub, Microsoft, and OpenAI over the AI coding assistant GitHub Copilot.
Here's what happened:
𝗕𝗮𝗰𝗸𝗴𝗿𝗼𝘂𝗻𝗱: In 2022, developers filed a $1B class-action lawsuit alleging that Copilot violated copyright law by using code from GitHub repositories without proper attribution or adherence to licensing terms.
𝗥𝗲𝗰𝗲𝗻𝘁 𝗥𝘂𝗹𝗶𝗻𝗴: Of the 22 initial claims, 20 have now been dismissed, including a crucial allegation under 𝗗𝗶𝗴𝗶𝘁𝗮𝗹 𝗠𝗶𝗹𝗹𝗲𝗻𝗻𝗶𝘂𝗺 𝗖𝗼𝗽𝘆𝗿𝗶𝗴𝗵𝘁 𝗔𝗰𝘁 (𝗗𝗠𝗖𝗔) section 1202(b). The DMCA claim alleged that Copilot removed essential copyright information when suggesting code snippets.
𝗪𝗵𝗮𝘁 𝗱𝗼𝗲𝘀 𝘁𝗵𝗲 𝗗𝗠𝗖𝗔 𝘀𝗮𝘆: "No person shall, without the authority of the copyright owner or the law:
(1) intentionally remove or alter any copyright management information,
(2) distribute or import for distribution copyright management information..., or
(3) distribute, import for distribution, or publicly perform works, copies of works..." (details in comments)
𝗧𝘂𝗿𝗻𝗶𝗻𝗴 𝗣𝗼𝗶𝗻𝘁: Judge Tigar found insufficient evidence of substantial code similarity and rejected claims of exact code reproduction. One possible reason is 𝗚𝗶𝘁𝗛𝘂𝗯'𝘀 𝗮𝗱𝗷𝘂𝘀𝘁𝗺𝗲𝗻𝘁𝘀 𝘁𝗼 𝗖𝗼𝗽𝗶𝗹𝗼𝘁, which were designed to generate variations of training code rather than exact copies, helping it avoid direct infringement accusations.
𝗥𝗲𝗺𝗮𝗶𝗻𝗶𝗻𝗴 𝗖𝗹𝗮𝗶𝗺𝘀: Only two claims survive: breach of open-source licenses and breach of contract.
𝗘𝗨 𝗔𝗜 𝗔𝗰𝘁 𝗣𝗲𝗿𝘀𝗽𝗲𝗰𝘁𝗶𝘃𝗲: While the EU AI Act doesn't directly address these specific issues, it does emphasize:
- Compliance with copyright laws when using data to train AI systems (Article 10)
- Transparency requirements for AI systems, including documentation of data sources (Article 13)
- Obligations for AI providers that could relate to licensing and contractual issues (Chapter 3)
Our 𝗧𝗮𝗸𝗲: This ruling is critical, as it could set precedents for how copyright law applies to AI-generated content. The dismissal of the copyright claims may encourage AI companies to keep training on publicly available code with only slight adjustments, 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗿𝗲𝗾𝘂𝗶𝗿𝗲𝗱 𝗰𝗼𝗻𝘀𝗲𝗻𝘁 𝗼𝗿 𝗽𝗲𝗿𝗺𝗶𝘀𝘀𝗶𝗼𝗻. I strongly feel judgments like this could lead to misuse of developers' work 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗽𝗿𝗼𝗽𝗲𝗿 𝗰𝗿𝗲𝗱𝗶𝘁 𝗼𝗿 𝗰𝗼𝗺𝗽𝗲𝗻𝘀𝗮𝘁𝗶𝗼𝗻. This case also highlights the need for techniques like 𝗗𝗘-𝗖𝗢𝗣 and 𝗠𝗶𝗻-𝗞% 𝗣𝗿𝗼𝗯 to detect copyrighted content in LLM training data (link in comments)
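For the curious: the core of Min-K% Prob is surprisingly simple. A model tends to assign high probability to every token of a text it memorized during training, so you score a text by averaging the log-probabilities of its k% least-likely tokens. The sketch below shows only that scoring step, with made-up numbers standing in for the per-token log-probs a real LLM would produce; the choice of k and any decision threshold are tuning assumptions, not part of the method's definition here.

```python
import numpy as np

def min_k_prob_score(token_log_probs, k=0.2):
    """Min-K% Prob: average the log-probabilities of the k% least-likely
    tokens. A higher (less negative) score suggests the text may have
    appeared in the model's training data."""
    n = max(1, int(len(token_log_probs) * k))
    lowest = np.sort(np.asarray(token_log_probs))[:n]  # k% lowest log-probs
    return float(lowest.mean())

# Toy illustration: in practice these come from the LLM under audit.
seen   = [-0.1, -0.2, -0.3, -0.1, -0.2]   # model confident on every token
unseen = [-2.5, -3.0, -0.2, -4.1, -0.3]   # several "surprising" tokens
print(min_k_prob_score(seen) > min_k_prob_score(unseen))  # True
```

The intuition: even paraphrase-heavy generations can't hide training-set membership if the model never finds any token of the candidate text surprising.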