When should a task be done by AI, and when should it be done by a human?

The key to getting the best results from any task.

I think a lot of people’s AI learning follows a similar path:

Start with basic queries that chatbots handle well: prompts that are an alternative to Google, generating images, summarising documents, etc.
Learn better prompting skills and gain experience through trial-and-error. Treat it like a teammate rather than a tool, give the LLM proper context, edit the prompt rather than giving feedback, and so on.
Get overconfident and ask AI to do too much, then get disappointed with the results. You ask it to create a whole presentation or design an entire onboarding flow, and the output is... terrible.

This is when you realise that to get the best results from AI, you have to break down tasks and figure out which parts should be done by a human and which by the LLM.

Almost every design and research team is trying to figure this out at the moment. They’re going through their processes step by step and asking the same question: what is the best blend of human and AI for any given task?

What to consider when deciding who (or what) does each task

The trap a lot of people fall into is asking AI to do too much in one go.

When a task feels too risky to hand over or AI just isn’t hitting the quality bar, break it down further. A high-risk task often contains smaller pieces that are lower risk. Split it up and you’ll usually find parts you can comfortably pass to AI, even if the whole thing feels too much for it to do.

This then allows you to evaluate a task against a number of questions:

How high does the quality need to be?
How much input will you have to give, to get that quality?
How long will it take?
What will it cost?
How risky is it: how much does it matter if it’s wrong?

Transcribing a meeting is low risk and needs almost nothing from you, so you can let AI handle it. Writing the executive summary for six months of work is high risk and has a high quality bar, so you’ll need to give a lot more input.

Picking how to work together

Given the profile of the task, you then need to figure out how you’ll work with AI.

Say you are writing something. There are many ways to split the work:

Fully delegate: You give it a brief, it writes the whole thing for you.
You outline, it builds on it: You write a skeleton and it fills in the gaps.
It drafts, you edit: AI writes a first draft and then you edit and build on it.
You dictate, it writes: You give it the full content, but it’s doing the writing for you like a 1950s secretary.
You write, it critiques: You write by hand, it reviews your work and gives you feedback.

There are so many ways to approach a task with AI, so which do you choose? You can easily end up picking the wrong approach and have to backtrack and try another.

This is one of the clearest cases where getting the best from AI comes down to learning by doing. But as a crude rule of thumb, here’s how I would summarise what I’ve learned:

Is this a task where the AI can verify that it has the correct answer? (Coding, data analysis, etc.) → Safer to delegate.
Is this something where there is no correct answer, and quality matters a lot? (Writing, design, etc.) → You lead at the start and end of the process, but it can help you in the middle.
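
For readers who think in code, the rule of thumb above can be written as a tiny decision function. This is only a sketch in Python: the names `Task` and `suggest_mode` are hypothetical, and the final fallback branch is an assumption, because the rule of thumb only covers the first two cases.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Illustrative task profile; the field names are assumptions for this sketch."""
    name: str
    verifiable: bool       # can the AI check that its answer is correct?
    quality_matters: bool  # is the quality bar high?


def suggest_mode(task: Task) -> str:
    """Apply the two-question rule of thumb to a task profile."""
    if task.verifiable:
        # Coding, data analysis, etc.: safer to delegate.
        return "delegate"
    if task.quality_matters:
        # Writing, design, etc.: no single correct answer and a high bar.
        return "you lead at the start and end, AI helps in the middle"
    # Assumption: the rule of thumb does not cover this case.
    return "experiment and learn by doing"


print(suggest_mode(Task("data analysis", verifiable=True, quality_matters=True)))
print(suggest_mode(Task("conference talk", verifiable=False, quality_matters=True)))
```

Real tasks rarely reduce to two yes/no answers, so treat the output as a starting point rather than a verdict.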

A worked example

Imagine you’re writing a conference talk. This is high risk. People are coming to hear your expertise, and if it’s weak it reflects on you, so the quality bar is as high as it gets.

You could ask AI to create the whole thing: research, narrative, slides, speaker notes. It’ll produce something, but it won’t be as good as what you’d make, because it’s doing every sub-task whether it’s good at it or not. You’ll end up on stage presenting ideas that aren’t quite yours and people will know something is off.

Instead, what you could do is:

You dictate your ideas for the talk and it pulls out the key arguments.
You use it as a sounding board to expand and test your ideas.
You craft the narrative yourself, because that’s the high-value part, and let AI write an outline.
Once the slides are drafted it can review the deck against your audience and flag what’s missing.
When you rehearse, it listens to your run-through and tells you which points you skipped or misrepresented.

The key is to assign each task to whoever is best placed to do it, whether that’s you or AI. If you just delegate everything, you force AI to do the parts it’s not good at.

When you delegate important decisions, you get slop

I love this quote from Anu Atluru:

Slop is the absence of decisions and, more critically, discernment.

When you produce any work, you make hundreds of tiny decisions. For any part of the work that you hand over to AI, you are asking it to make those decisions for you. When we hand over too much decision-making to AI for work that demands a high quality bar, people notice and it comes across as ‘slop’: insincere, mediocre and average.

Learning when to use AI is therefore an important skill to develop. People talk about a ‘human-in-the-loop’, and who that human is matters a lot. It’s tempting to think this is all about domain expertise, but expertise isn’t enough on its own. You can be brilliant at your craft and still get mediocre output from AI, because you haven’t developed the judgement of when to use it and when not to.
