Hi HN! We’re Adit and Raunak, co-founders of Reducto (YC W24, https://reducto.ai). Reducto turns unstructured documents (e.g., PDFs, scans, spreadsheets) into structured data. This data can then be used for retrieval, passed into LLMs, or used elsewhere downstream.

We started Reducto when we realized that so many of today’s AI applications require good-quality data. Everyone knows that good inputs lead to better outputs, but 80% of the world’s data is still trapped inside things like messy PDFs and spreadsheets. Raunak and I launched a really early MVP for parsing and extracting from unstructured documents, and were lucky to get a lot of interest from technical teams once they realized that the accuracy was something they hadn’t seen before.

We started by just releasing an API for engineers to build with, but over time we realized that an accurate API was only part of the puzzle. Our customers wanted to easily set up multi-step pipelines, evaluate and iterate on performance within their use case, and work with non-engineering teammates who were also involved in the real-world document processing flow.

That’s why we’re launching Reducto Studio, a web platform that sits on top of our APIs for users to build and iterate on end-to-end document pipelines.

With Studio, you can:

- Drop in an entire file set and get per-field and per-document accuracy scores against your eval data.
- Auto-generate and continuously optimize extraction schemas to hit production-grade quality fast.
- Save every run, iterate on parse/extract configs, and compare results side by side.

You can see some examples here (https://studio.reducto.ai) or watch this walkthrough: https://www.loom.com/share/b243551741c642c6a594c00353fcecb3.

If you’d like to upload your own document, you can log in and do so as well - we don’t make you book a demo or put a payment down to try it.

Thanks for reading and checking it out! This is only the first step for Studio, so we’d love feedback on anything: UX rough edges (we know they’re there!), features that would make evaluations better for you, hard documents you’ve had trouble with, or anything else about wrangling unstructured data.

Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast