84% of developers use AI coding tools. Only 29% trust the output. Something is off, and it's worth talking about.
I've been using Cursor daily for about eight months now. Before that, GitHub Copilot for a year. And I'll be honest: I don't fully trust what either of them produces.
That's not a complaint. It's more of an observation about where we are right now.
According to the 2025 Stack Overflow Developer Survey, 84% of developers use or plan to use AI coding tools. That number is up from 76% the year before. But here's the part that stuck with me: only 29% say they actually trust the output. That's down from 40% in 2024.
More people using it. Fewer people trusting it. That gap is interesting.
My first guess would be familiarity. When you're new to a tool, you don't know enough to catch its mistakes. After a few months, you do. You start noticing the confidently wrong SQL queries. The test that passes but doesn't actually test anything. The "fixed" bug that introduces two new ones three files away.
This isn't a knock on the tools. It's just what happens when you use something long enough to understand its failure modes.
The other part is stakes. Early on, most people use AI tools for low-risk stuff: boilerplate, one-off scripts, README drafts. As adoption matures, the tasks get harder. You're asking it to touch authentication logic, refactor a payment flow, write migration scripts. The margin for error gets much thinner, and that's where trust starts to matter more.
There's a team called Kyrylai, a Toronto-based venture studio. Eight people, including four interns and one full-stack engineer, delivered one production-ready product, a semi-production tool, and three proof-of-concept projects in ten weeks. That's a real result. Their setup was Cursor plus Claude Code, at 26 pull requests a week and an average merge time of about 10 hours.
At Kalvium Labs, over 200 engineers standardized on Cursor for coding and Claude Code for PR reviews. They cut time-to-first-commit by 35 to 40% and their AI review process caught two to three potential production bugs every week.
Those are genuinely good numbers. But research on AI-generated code has also found that 45% of it contains vulnerabilities like command injection or hardcoded secrets. A 2025 audit of 1,645 web apps generated by Lovable found that 10% had critical vulnerabilities exposing user data.
So the productivity gains are real. The security risks are also real. Both things are true at the same time.
Here's something I think about more than the security stuff: what happens to junior developers?
61% of junior developers say the job market is challenging right now, compared to 34% of seniors. AI handles a lot of what used to be entry-level work. That's not speculation. It's what several hiring managers have said out loud this year.
If the pipeline of junior developers dries up because companies stop hiring them, where do senior developers come from in five years? Senior engineers got good at this job by doing entry-level work first. I don't have a clean answer to this. I just think it's a real problem that's getting glossed over in most "AI is transforming development" takes.
After a lot of trial and error, here's roughly what my workflow looks like:
High trust (let it run):
- Generating test fixtures and mock data
- Writing first drafts of documentation
- Scaffolding repetitive CRUD code
- Explaining unfamiliar code I'm reading
Low trust (always review carefully):
- Anything touching auth or permissions
- Database migrations
- External API integrations
- Anything that handles money
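If you wanted to turn the two lists above into something mechanical, one rough way is to flag changed files by path. This is a sketch, not something I'd call a real policy: the keyword list and the function names are my own assumptions, and plain substring matching is crude (a file named `author.py` would match `auth`).

```python
from pathlib import PurePosixPath

# Hypothetical keywords; tune these to your own repository layout.
LOW_TRUST_KEYWORDS = ("auth", "permission", "migration", "payment", "billing", "integrations")


def needs_careful_review(path: str) -> bool:
    """Return True if any path component contains a low-trust keyword."""
    parts = [part.lower() for part in PurePosixPath(path).parts]
    return any(keyword in part for part in parts for keyword in LOW_TRUST_KEYWORDS)


def split_by_trust(paths: list[str]) -> dict[str, list[str]]:
    """Group changed files into 'low_trust' and 'high_trust' buckets."""
    buckets: dict[str, list[str]] = {"low_trust": [], "high_trust": []}
    for path in paths:
        key = "low_trust" if needs_careful_review(path) else "high_trust"
        buckets[key].append(path)
    return buckets
```

Calling `split_by_trust` on the files in an AI-generated change tells you which ones deserve a slow, line-by-line read before anything gets merged.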
The tool is most useful when I give it enough context and a narrow scope. "Write a function that does X, here's the type it receives and the type it should return" works much better than "implement user authentication."
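Here's roughly what a narrow request looks like when the types are pinned down up front. `LineItem`, `OrderTotal`, and `summarize_order` are invented for this example; the point is only that the input and output shapes are fixed before the tool writes a single line of the body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """The type the function receives (hypothetical)."""
    unit_price_cents: int
    quantity: int


@dataclass(frozen=True)
class OrderTotal:
    """The type the function should return (hypothetical)."""
    subtotal_cents: int
    item_count: int


def summarize_order(items: list[LineItem]) -> OrderTotal:
    """The narrow task: sum prices and quantities across line items."""
    subtotal = sum(item.unit_price_cents * item.quantity for item in items)
    count = sum(item.quantity for item in items)
    return OrderTotal(subtotal_cents=subtotal, item_count=count)
```

With the two dataclasses in the prompt, there isn't much room for the tool to wander, and the result is small enough to review in a minute.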
The biggest shift wasn't speed. It was where I spend my time.
I spend less time on the parts of coding I never found interesting anyway: boilerplate, repetitive patterns, looking up library syntax I've used a hundred times. I spend more time on the parts that are actually hard: figuring out what the system should do, catching edge cases, reviewing output and making sure it's not subtly wrong.
Whether that's a net positive depends on what you liked about programming in the first place. If you liked the craftsmanship of writing every line yourself, it probably feels like something got taken away. If you liked solving hard problems and the boilerplate was always just noise, it feels like a pretty good trade.
I'm somewhere in the middle, if I'm honest.
I don't think the trust gap closes just by making the models better. Even if accuracy improves, trust is also about understanding the failure modes and having enough experience to catch them. That takes time regardless of model quality.
What I do think is that the developers who end up thriving with these tools are the ones treating AI output like code from a smart but distracted coworker. You don't blindly merge their PRs. You read them, you think about them, and sometimes you push back. That relationship is actually pretty manageable once you stop expecting the tool to be either magic or useless.