Hacker News (Chinese edition)

Where are the ethics? This is a call for a blanket restraint on publishing.

All the available AI models, and probably future ones, are trained on publicly available data sets, including but not limited to programming blogs, Q&A sites, discussion forums, personal blogs, open source code and free books. On top of that, almost all of the organisations behind the models have taken material from shady sources and even pirated content, not to mention artworks (photos, paintings, music) from various digital galleries, digital shops and streaming platforms. I don't have to list every source.

While this is all good for training, the result is being turned against the very programmers, creators and writers who thrive on their creations and their way of working.

From the perspective of someone trying to do something that depends on creative people, generative AI based creation has taken over the process and has killed, or is killing, the livelihoods of the millions of people who depend on it. Now, most of the argument is not about replacement, but rather about why one shouldn't use AI to support the process. Well, that is a deviation from this post's point, and I don't disagree with it.

When people published their work as open source, not even in their dreams would they have thought that it would be swindled, in such a breach of private information and intellectual property, by corrupt, for-profit organisations to train something that will eventually replace them. Why is it that no one wants to speak out or take legal action? I'm aware that there are a lot of class action lawsuits and uproar against many companies, but the damage is already done.

So, I'm proposing the following:

- Just as publicly available materials were used to train whatever AI model, give that model away for free. (It was wrong in the first place to use the material, with or without consent. If someone invested in it, that's their issue. I don't want to talk about CAPEX / OPEX etc.)
- Or pay for every open piece of information that was used in training. (This is not an option, due to copyright, copyleft and the other mindless licenses that govern OSS and other free-to-use media!)
- Or stop doing it. That will not happen...

Any other thoughts / rants?

While I have expressed some of my thoughts, I want the community's thoughts so that we can act collectively. Please spend a few minutes to voice your support or opposition, but be concise so that everything can be collated, presented and actioned (probably!) in the near future.

Thanks again.

A call for a blanket ban on open source contributions or publishing