Ask HN: How do you keep the coding vibes good in an existing codebase?
3 points • by adam_gyroscope • 3 days ago
Here's how we're working with LLMs at my startup.
We have a monorepo with scheduled Python data workflows, two Next.js apps, and a small engineering team. We use GitHub for SCM and CI/CD, deploy to GCP and Vercel, and lean heavily on automation.
Local development:
Every engineer gets Cursor Pro (plus Bugbot), Gemini Pro, OpenAI Pro, and optionally Claude Pro. We don't really care which model people use. In practice, LLMs are worth about 1.5 excellent junior/mid-level engineers per engineer, so paying for multiple models is easily worth it.
We rely heavily on pre-commit hooks: ty, ruff, TypeScript checks, tests across all languages, formatting, and other guards. Everything is auto-formatted. LLMs make types and tests much easier to write, though complex typing still needs some hand-holding.
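As a rough illustration, a pre-commit config in this spirit might look like the sketch below; the hook repos, pinned version, and commands are illustrative assumptions, not our exact setup.

    # .pre-commit-config.yaml -- illustrative sketch, not the exact config
    repos:
      - repo: https://github.com/astral-sh/ruff-pre-commit
        rev: v0.8.0                          # pin to whatever release you actually use
        hooks:
          - id: ruff                         # Python linting
          - id: ruff-format                  # Python formatting
      - repo: local
        hooks:
          - id: ty
            name: ty type check
            entry: uv run ty check           # assumes ty is a uv-managed dev dependency
            language: system
            types: [python]
            pass_filenames: false
          - id: typecheck-ts
            name: TypeScript check
            entry: npx turbo run typecheck   # assumes a `typecheck` task exists in turbo.json
            language: system
            types_or: [ts, tsx]
            pass_filenames: false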
GitHub + Copilot workflow:
We pay for GitHub Enterprise primarily because it allows assigning issues to Copilot, which then opens a PR. Our rule is simple: if you open an issue, you assign it to Copilot. Every issue gets a code attempt attached to it.
There's no stigma around lots of PRs. We frequently delete the ones we don't use.
We use Turborepo for the monorepo and are fully on uv for the Python side.
All coding practices are encoded in .cursor/rules files. For example: "If you are doing database work, only edit Drizzle's schema.ts and don't hand-write SQL." Cursor generally respects this, but other tools struggle to consistently read or follow these rules, no matter how many agent.md-style files we add.
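For instance, the database rule could live in a file like .cursor/rules/database.mdc along these lines (a paraphrased, hypothetical sketch; the exact frontmatter syntax depends on your Cursor version):

    ---
    description: Database change policy
    globs: ["**/schema.ts", "drizzle/**"]
    alwaysApply: false
    ---
    - All database work goes through Drizzle: edit schema.ts only.
    - Never hand-write SQL or edit generated migration files by hand.
    - After changing schema.ts, regenerate migrations with the project's drizzle-kit script.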
My personal dev loop:
If I'm on the go and see a bug or have an idea, I open a GitHub issue (via Slack, mobile, or web) and assign it to Copilot. Sometimes the issue is detailed; sometimes it's a single sentence. Copilot opens a PR, and I review it later.
If I'm at the keyboard, I start in Cursor as an agent in a Git worktree, using whatever the best model is. I iterate until I'm happy, ask the LLM to write tests, review everything, and push to GitHub. Before a human review, I let Cursor Bugbot, Copilot, and GitHub CodeQL review the code, and ask Copilot to fix anything they flag.
Things that are still painful:
To really know if code works, I need to run Temporal, two Next.js apps, several Python workers, and a Node worker. Some of this is Dockerized, some isn't. Then I need a browser to run manual checks.
AFAICT, there's no service that lets me: give a prompt, write the code, spin up all this infra, run Playwright, handle database migrations, and let me manually poke at the system. We approximate this with GitHub Actions, but that doesn't help with manual verification or DB work.
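The GitHub Actions approximation is roughly the shape sketched below (heavily trimmed and illustrative: the compose file, migration command, and task names are assumptions, not our actual workflow):

    # .github/workflows/e2e.yml -- illustrative sketch of the CI approximation
    name: e2e
    on: [pull_request]
    jobs:
      e2e:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
          - run: npm ci
          - name: Spin up local infra (Temporal, database, workers)
            run: docker compose up -d --wait   # assumes the stack is described in a compose file
          - name: Run database migrations
            run: npx drizzle-kit migrate       # or whatever migration script the repo uses
          - name: Build the apps
            run: npx turbo run build
          - name: Playwright checks
            run: |
              npx playwright install --with-deps
              npx playwright test              # playwright.config can start the Next.js apps via webServer

None of this gives me a running system to manually poke at, which is the real gap.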
Copilot doesn't let you choose a model when assigning an issue or during code review, and the model it uses is generally bad. You can pick a model in Copilot chat, but not in issues, PRs, or reviews.
Cursor + worktrees + agents suck. Worktrees clone from the source repo, including unstaged files, so if you want a clean agent environment, your main repo has to be clean. At times it feels simpler to just clone the repo into a new directory instead of using worktrees.
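For comparison, the two approaches in plain git look like this (branch and directory names are made up):

    # Worktree: shares the object store and refs with the main checkout
    git worktree add -b agent/fix-login ../repo-agent-1 main

    # Plain local clone: a fully isolated copy of committed state only
    git clone . ../repo-agent-2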
What's working well:
Because we constantly spin up agents, our monorepo setup scripts are well-tested and reliable. They also translate cleanly into CI/CD.
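A trimmed sketch of what a setup script like that covers (the commands and migration step are illustrative, not the real script):

    #!/usr/bin/env bash
    # scripts/setup.sh -- illustrative sketch, not the actual script
    set -euo pipefail

    uv sync                      # Python deps across the workspace
    npm ci                       # JS deps for the Turborepo workspace
    uv run pre-commit install    # assumes pre-commit is a uv-managed dev dependency
    docker compose up -d --wait  # local infra (Temporal, database, ...)
    npx drizzle-kit migrate      # or whatever the repo's migration command is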
Roughly 25% of "open issue → Copilot PR" results are mergeable as-is. That's not amazing, but it's better than zero, and it gets to ~50% with a few review comments. It would be higher if Copilot followed our setup instructions more reliably or let us use stronger models.
Overall, for roughly $1k/month, we're getting the equivalent of 1.5 additional junior/mid engineers per engineer. Those "LLM engineers" always write tests, follow standards, produce good commit messages, and work 24/7. There's friction in reviewing and context-switching across agents, but it's manageable.
What are you doing for vibe coding in a production system?