How to use AI to generate test drafts safely

AI is useful for tests when it removes the blank-page cost. It becomes dangerous when developers start treating generated tests as proof that behavior is already understood.

The safe use case is narrow and practical: you already know what the module should do, but you do not want to start from an empty test file. That is the gap AI can close.

This post sits alongside the broader workflow on the developer AI unit page, the review workflow in the AI code review checklist, and the wider starter guide on practical ways developers can start with AI.

1. Start from code whose behavior you already trust

The worst starting point is a vague feature branch where the logic is still moving. The best starting point is a stable helper, parser, validator, formatter, or service function whose expected behavior is already visible in the code.

Good fit: pure functions, validation rules, serializers, mapping logic
Riskier fit: auth flows, payment logic, race conditions, stateful async behavior

2. Ask for draft tests by path, not by confidence

Do not ask AI to “write the tests.” Ask it to draft three paths you can verify quickly:

normal path
failure path
edge case

That framing keeps the output small and reviewable. A short, incomplete draft is safer than a polished suite that only looks complete.
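As a rough illustration, a three-path draft for a small validator might look like the sketch below. The `parse_port` helper and its error messages are hypothetical, made up for this example. The tests use plain `assert` so they run under pytest or by calling the functions directly.

```python
def parse_port(value: str) -> int:
    """Hypothetical helper: parse a TCP port from text, rejecting bad input."""
    text = value.strip()
    if not text.isdecimal():
        raise ValueError(f"not a port number: {value!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def test_normal_path():
    # Normal path: a typical, valid value.
    assert parse_port("8080") == 8080


def test_failure_path():
    # Failure path: non-numeric input must be rejected, not coerced.
    try:
        parse_port("http")
    except ValueError as exc:
        assert "not a port number" in str(exc)
    else:
        raise AssertionError("expected ValueError for non-numeric input")


def test_edge_case_upper_boundary():
    # Edge case: the highest valid port passes, the next value does not.
    assert parse_port("65535") == 65535
    try:
        parse_port("65536")
    except ValueError as exc:
        assert "out of range" in str(exc)
    else:
        raise AssertionError("expected ValueError above 65535")
```

Three tests this size can be checked against the function in a minute or two, which is the point of asking by path.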

3. Review assertions before you review style

Most bad AI-generated tests do not fail because of formatting. They fail because the assertion is weak, the behavior is guessed, or the test never touches the real risk.

Check these first:

Is the assertion testing behavior or just an implementation detail?
Does the failure path actually fail for the right reason?
Is the edge case a real boundary from production use?
Did the draft assume mocks, fixtures, or setup that do not exist?
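To make the first two checks concrete, here is a minimal sketch contrasting a weak assertion with one that pins behavior. The `apply_discount` function is a hypothetical stand-in for whatever you are testing.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price in cents after a percent discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


def test_weak_assertion():
    # Weak: passes for almost any implementation, including one that ignores percent.
    result = apply_discount(1000, 25)
    assert result is not None
    assert isinstance(result, int)


def test_behavior_assertion():
    # Stronger: pins the expected value, so a wrong formula fails.
    assert apply_discount(1000, 25) == 750


def test_failure_for_the_right_reason():
    # Checking the message means an unrelated ValueError cannot satisfy the test.
    try:
        apply_discount(1000, 101)
    except ValueError as exc:
        assert "between 0 and 100" in str(exc)
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Generated drafts often resemble the first test. It looks fine in a diff, but it would not catch a wrong discount formula.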

Figure: trusted behavior, an AI-generated test draft, and human review markers on assertions, edge cases, and hidden assumptions.

4. Treat hidden assumptions as the real bug source

The most expensive mistake is not a bad test name. It is the draft silently assuming context that is not there.

Examples:

a fixture that the codebase does not use
an env var that is never set in tests
a mocked response that does not match the shape of real production responses
a side effect that the draft forgot to assert
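One way to surface an assumption like the env var case is to set it up inside the test itself. The sketch below uses the standard library's `unittest.mock.patch.dict`. The `api_base_url` helper and the `API_BASE_URL` variable name are hypothetical.

```python
import os
from unittest import mock


def api_base_url() -> str:
    """Hypothetical helper that reads its configuration from the environment."""
    return os.environ["API_BASE_URL"].rstrip("/")


def test_base_url_strips_trailing_slash():
    # The env var is set inside the test, so nothing depends on the machine running it.
    with mock.patch.dict(os.environ, {"API_BASE_URL": "https://example.test/"}):
        assert api_base_url() == "https://example.test"


def test_missing_env_var_is_an_error():
    # clear=True empties the environment for the duration of the block,
    # which makes the "never set in tests" case explicit.
    with mock.patch.dict(os.environ, {}, clear=True):
        try:
            api_base_url()
        except KeyError as exc:
            assert exc.args[0] == "API_BASE_URL"
        else:
            raise AssertionError("expected KeyError when API_BASE_URL is unset")
```

A draft that calls `api_base_url()` with no setup at all is relying on whatever happens to be in the environment, and that is the kind of assumption to flag during review.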

5. Keep one short prompt and one short checklist

A good prompt for this job is narrow on purpose.

Draft tests for this function. Split the output into normal path, failure path, and one realistic edge case. Prefer small assertions over a broad test suite. If any setup detail is unclear, mark it as uncertain instead of inventing it.
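If you reuse the prompt often, one option is to keep it as a fixed template so the wording does not drift between requests. This is only a sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and how the prompt reaches your AI tool is left out.

```python
# The prompt text is copied from above; only the pasted function source varies.
PROMPT_TEMPLATE = (
    "Draft tests for this function. Split the output into normal path, "
    "failure path, and one realistic edge case. Prefer small assertions over "
    "a broad test suite. If any setup detail is unclear, mark it as uncertain "
    "instead of inventing it.\n\n"
    "{source}"
)


def build_prompt(source: str) -> str:
    """Return the fixed prompt followed by the source of the function to test."""
    return PROMPT_TEMPLATE.format(source=source.strip())
```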

Then review the result with a fixed checklist:

Does each test map to a real behavior?
Is at least one failure path included?
Is the edge case based on real usage?
Did the draft invent setup or hidden context?
Can I explain why each assertion should pass or fail?

What to do first

Pick one stable module and ask AI for a three-path draft. Keep only the tests whose assertions you can defend immediately, then write the missing edge case yourself. That keeps AI in the role of acceleration, not authority.