← Back to field notes
No. 03 Field note

Do it all at once: parallel tool calls and how to build them

When an agent needs a few independent lookups, there's no reason to do them one at a time. Parallel tool calls, what the model is actually deciding, and the cheapest speed-up there is in an agent.

26 April 2026 · agents · tool-use · fundamentals · 6 min read

Most agents have to look a few things up before they can answer. Not one thing, several. And a lot of the time those lookups have nothing to do with each other: a record out of the database, a search result, a document pulled from somewhere else. The agent needs all of it before it can respond, but no single piece depends on any of the others.

The slow way to do that is in order. Fire off the first request, wait for it to come back, fire the second, wait again, then the third. Three trips out to the network, stacked one behind the other, each one sitting idle while the one before it finishes. The step ends up taking as long as all three added together, when it only ever needed to take as long as the slowest one. For a while I didn’t even clock it as a problem. The agent worked, it just felt sluggish, and the reason was sitting in the logs the whole time.

The fix has a name. Parallel tool calls.

Picking up from the loop

If you’ve read the ReAct piece, you know an agent runs a loop: a Thought, then an Action (one tool call), then an Observation, round and round. It’s easy to assume each pass does exactly one thing. One thought, one tool, one result.

Parallel tool calls is what happens when a single Action step fires off more than one tool call at the same time. The loop doesn’t change. What changes is how much can happen inside one turn of it. If the loop itself still feels fuzzy, go back to the ReAct piece first, because this sits right on top of it.

What it actually is

The idea is almost too simple to need a name. When an agent needs a few bits of information that don’t depend on each other, there’s no reason to go and get them one at a time. The database read doesn’t need anything from the search. The search doesn’t care about the document. So you send all three off together and pick the results up once they’re all back.

That’s it. The only thing standing between the slow version and the fast one is whether independent work waits in a queue or runs at the same time.

Sequential calls stack end to end; parallel calls overlap and finish in the time of the slowest one.

What the model actually does

Here’s the bit that caught me out the first time I looked properly. The parallelism isn’t something you build around the model. The model decides to do it.

In an ordinary turn, the model’s reply contains one tool call. In a parallel turn, that single reply contains several tool calls at once, packaged into the one message. The runtime sees there’s more than one, runs them at the same time instead of one after another, waits for all of them to land, and then hands every result back together as the observations for the next turn. The model reads the whole lot in one go and carries on from there.

So the call the model is making is really just one question: are these independent? If it reckons three lookups don’t rely on each other, it sends all three out together. If one of them needs the answer from another first - say it has to look up a user’s ID before it can pull that user’s orders - then it can’t send them together, because the second call doesn’t even exist yet until the first one comes back. In that case it goes one at a time, which is the right thing to do. Parallel calls only help when the work is genuinely independent, and it’s the model drawing that line, not you.

One model turn carrying several tool_use blocks, run in parallel, with their results funnelled back into the next turn's context.

How to build it

There’s less to do here than you’d think, because most of the work is the model’s. With the Anthropic API you don’t ask for parallel calls directly. You define your tools clearly, and the model comes back with several tool_use blocks in one response whenever it judges the calls to be independent. The part that’s on you is noticing there’s more than one, running them at the same time rather than looping through them, and sending all the results back together in the next message.

The snippet below shows that shape: one response carrying a few tool_use blocks, run together, with the results collected back into the conversation in a single turn. Watch what happens in STEP 2 of the run - two check_calendar calls land inside a single assistant turn, get run together, and feed back as one batch of observations.

parallel_tools.py · py
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

import anthropic
import json
from datetime import date

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

# --- Tools -------------------------------------------------------------------

tools = [
    {
        "name": "list_emails",
        "description": (
            "Lists the emails of everyone in the team. Use this to look up an "
            "email before checking a calendar or creating an event. No parameters."
        ),
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "check_calendar",
        "description": (
            "Checks the calendar of a specific person by email. You MUST call "
            "this for every attendee before creating a calendar event."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "format": "email"},
            },
            "required": ["email"],
        },
    },
    {
        "name": "create_calendar_event",
        "description": (
            "Creates a calendar event. Takes a title, ISO start/end times, and "
            "an optional list of attendee emails."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "title":     {"type": "string"},
                "start":     {"type": "string", "format": "date-time"},
                "end":       {"type": "string", "format": "date-time"},
                "attendees": {
                    "type": "array",
                    "items": {"type": "string", "format": "email"},
                },
            },
            "required": ["title", "start", "end"],
        },
    },
]

# --- Fake infra --------------------------------------------------------------

EMAILS = {
    "James Jones":  "james@abc.com",
    "Sarah Smith":  "sarah@abc.com",
    "Trish Taylor": "trish@abc.com",
}

CALENDAR = {
    "james@abc.com": [
        {"title": "Monthly Review",          "start": "2026-05-25T10:00:00Z", "end": "2026-05-25T11:00:00Z"},
        {"title": "Project Alignment Pitch", "start": "2026-05-26T14:00:00Z", "end": "2026-05-26T15:30:00Z"},
        {"title": "1-on-1 with Manager",     "start": "2026-05-28T09:00:00Z", "end": "2026-05-28T09:45:00Z"},
    ],
    "sarah@abc.com": [
        {"title": "Client Onboarding",    "start": "2026-05-25T13:00:00Z", "end": "2026-05-25T14:00:00Z"},
        {"title": "Marketing Brainstorm", "start": "2026-05-27T10:30:00Z", "end": "2026-05-27T12:00:00Z"},
    ],
    "trish@abc.com": [
        {"title": "Daily Standup",   "start": "2026-05-26T08:30:00Z", "end": "2026-05-26T09:00:00Z"},
        {"title": "Sprint Planning", "start": "2026-05-29T11:00:00Z", "end": "2026-05-29T13:00:00Z"},
    ],
}

def run_tool(name, tool_input):
    if name == "list_emails":
        return EMAILS
    if name == "check_calendar":
        return CALENDAR.get(tool_input["email"], [])
    if name == "create_calendar_event":
        return {"event_id": "evnt_123", "status": "success"}
    return {"error": f"Tool {name!r} not found"}

# --- Claude call -------------------------------------------------------------

SYSTEM = (
    "You are a personal assistant who manages the user's calendar. "
    f"Today's date is {date.today().isoformat()}."
)

def ask(messages):
    return client.messages.create(
        model=MODEL,
        system=SYSTEM,
        max_tokens=1024,
        tools=tools,
        thinking={"type": "adaptive"},
        tool_choice={"type": "auto"},
        messages=messages,
    )

# --- Output helpers ----------------------------------------------------------

def show(label, body):
    print(f"─── {label} " + "─" * max(0, 54 - len(label)))
    if body:
        print("  " + body.replace("\n", "\n  "))

def show_thoughts_and_text(step, blocks):
    for b in blocks:
        if b.type == "thinking" and b.thinking:
            show(f"STEP {step}  ·  THOUGHT", b.thinking.strip())
        elif b.type == "text" and b.text:
            show(f"STEP {step}  ·  SAYS", b.text.strip())

# --- The ReAct loop ----------------------------------------------------------

messages = [{
    "role": "user",
    "content": (
        "Create me a meeting event with James and Sarah please for next "
        "Wednesday at 2pm for 1 hour."
    ),
}]

show("SYSTEM", SYSTEM)
show("USER", messages[0]["content"])

response = ask(messages)

step = 1
while response.stop_reason == "tool_use":
    # 1. Show whatever the model said this turn
    show_thoughts_and_text(step, response.content)

    # 2. Run every tool call the model emitted, in order, collecting results
    tool_uses = [b for b in response.content if b.type == "tool_use"]
    tool_results = []
    for j, tu in enumerate(tool_uses, start=1):
        show(f"STEP {step}  ·  ACTION {j}",
             f"{tu.name}({json.dumps(tu.input)})")
        result = run_tool(tu.name, tu.input)
        show(f"STEP {step}  ·  OBSERVATION {j}", json.dumps(result))
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tu.id,
            "content": json.dumps(result),
        })

    # 3. Feed every result back in a single user message
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})

    response = ask(messages)
    step += 1

# Final turn
show_thoughts_and_text("FINAL", response.content)
print(f"─── stop_reason: {response.stop_reason} " + "─" * 18)
Listing 01 - a ReAct loop that supports parallel tool calls
› Run ran in 5.1s
$ python parallel_tools.py
─── SYSTEM ─────────────────────────────────────────────────
  You are a personal assistant who manages the user's calendar. Today's date is 2026-05-21.
─── USER ───────────────────────────────────────────────────
  Create me a meeting event with James and Sarah please for next Wednesday at 2pm for 1 hour.
─── STEP 1  ·  THOUGHT ─────────────────────────────────────
  The user wants to create a meeting with James and Sarah for next Wednesday
  (2026-05-27) at 2pm for 1 hour. I need to first get their emails using
  the list_emails tool, then check their calendars, and then create the event.

  Let me start by getting the email list.
─── STEP 1  ·  SAYS ────────────────────────────────────────
  Sure! Let me first look up James and Sarah's emails right away!
─── STEP 1  ·  ACTION 1 ────────────────────────────────────
  list_emails({})
─── STEP 1  ·  OBSERVATION 1 ───────────────────────────────
  {"James Jones": "james@abc.com", "Sarah Smith": "sarah@abc.com", "Trish Taylor": "trish@abc.com"}
─── STEP 2  ·  SAYS ────────────────────────────────────────
  Got their emails! Now let me check both James and Sarah's calendars at the same time to confirm they're available!
─── STEP 2  ·  ACTION 1 ────────────────────────────────────
  check_calendar({"email": "james@abc.com"})
─── STEP 2  ·  OBSERVATION 1 ───────────────────────────────
  [{"title": "Monthly Review", "start": "2026-05-25T10:00:00Z", "end": "2026-05-25T11:00:00Z"}, {"title": "Project Alignment Pitch", "start": "2026-05-26T14:00:00Z", "end": "2026-05-26T15:30:00Z"}, {"title": "1-on-1 with Manager", "start": "2026-05-28T09:00:00Z", "end": "2026-05-28T09:45:00Z"}]
─── STEP 2  ·  ACTION 2 ────────────────────────────────────
  check_calendar({"email": "sarah@abc.com"})
─── STEP 2  ·  OBSERVATION 2 ───────────────────────────────
  [{"title": "Client Onboarding", "start": "2026-05-25T13:00:00Z", "end": "2026-05-25T14:00:00Z"}, {"title": "Marketing Brainstorm", "start": "2026-05-27T10:30:00Z", "end": "2026-05-27T12:00:00Z"}]
─── STEP 3  ·  THOUGHT ─────────────────────────────────────
  Next Wednesday is 2026-05-27. The user wants a meeting at 2pm for 1 hour,
  so 2:00pm - 3:00pm.

  Let me check both calendars for conflicts on 2026-05-27 at 2pm (14:00):

  James: No events on 2026-05-27, so he's free.
  Sarah: Has "Marketing Brainstorm" from 10:30 to 12:00 on 2026-05-27, so she's free at 2pm.

  Both are available! Let me create the event.

  I'll set the meeting for 14:00:00Z to 15:00:00Z on 2026-05-27, using UTC since no specific timezone was mentioned.
─── STEP 3  ·  SAYS ────────────────────────────────────────
  Both James and Sarah are free next Wednesday at 2pm! Let me go ahead and create the meeting now.
─── STEP 3  ·  ACTION 1 ────────────────────────────────────
  create_calendar_event({"title": "Meeting", "start": "2026-05-27T14:00:00Z", "end": "2026-05-27T15:00:00Z", "attendees": ["james@abc.com", "sarah@abc.com"]})
─── STEP 3  ·  OBSERVATION 1 ───────────────────────────────
  {"event_id": "evnt_123", "status": "success"}
─── FINAL  ·  SAYS ─────────────────────────────────────────
  All done! Here's a summary of the event that's been created:

  - 📅 **Title:** Meeting
  - 📆 **Date:** Wednesday, 27th May 2026
  - 🕑 **Time:** 2:00pm - 3:00pm
  - 👥 **Attendees:** James Jones & Sarah Smith
─── stop_reason: end_turn ──────────────────
exit 0

Why it’s worth doing

Two reasons, both of them concrete. The first is latency. Three independent calls run together take about as long as the slowest one, not the sum of all three. On an agent that makes a handful of lookups per task, that’s the gap between something that feels quick and something that feels like it’s grinding. The second is that you get it for almost nothing. You’re not writing some clever orchestration layer. You’re defining tools well enough that the model can tell what’s independent, and then running what it gives you at the same time instead of in a loop.

It’s about the cheapest speed-up there is in an agent. The model does most of the thinking. You just have to stop putting independent work in a queue.

- Ben