Tool Calling with the Foundation Models Framework: Building Agentic iOS Apps
A language model that can only generate text is like WALL-E stuck on Earth — it can observe and reason about the world around it, but it can’t actually reach out and change anything. Tool calling changes that equation. It lets the on-device model invoke your Swift functions, pull live data from MapKit or WeatherKit, query a local database, or trigger any side effect your app exposes — then fold the results back into its reasoning before responding to the user.
This post covers the Tool protocol in Apple’s Foundation Models framework end-to-end: defining tools, wiring up
@Generable arguments, handling results, chaining multiple tools in a single inference pass, and designing the error
boundaries that keep agentic features production-safe. We won’t cover prompt engineering strategies or @Generable
fundamentals — those are handled in Designing Prompts for On-Device AI and
Foundation Models: Structured Output with @Generable respectively.
Note: Tool calling requires iOS 26+ and macOS 26+. All code in this post requires @available(iOS 26, *) annotations or an appropriate deployment target. The Foundation Models framework was introduced at WWDC25 (see Meet the Foundation Models framework and Build agentic apps with the Foundation Models framework).
Contents
- The Problem
- How Tool Calling Works
- Defining a Tool
- Providing Tools to a Session
- Returning Structured Results
- Multiple Tools and Parallel Execution
- Advanced Usage
- Performance Considerations
- When to Use (and When Not To)
- Summary
The Problem
Consider an on-device assistant that helps users plan movie nights. The user asks: “What Pixar movies came out after 2020, and what’s the weather like tonight for an outdoor screening?” Without tool calling, your code has to anticipate every possible question, pre-fetch data, and stuff it all into the prompt context:
```swift
// The brute-force approach — pre-fetching everything into context
@available(iOS 26, *)
func answerMovieNightQuestion(_ question: String) async throws -> String {
    let allMovies = try await fetchPixarFilmography()   // Entire catalog
    let weather = try await fetchWeatherForecast()      // Current conditions
    let nearbyTheaters = try await fetchTheaters()      // Location data

    let megaContext = """
    Available Pixar films: \(allMovies.map(\.title).joined(separator: ", "))
    Tonight's weather: \(weather.summary)
    Nearby theaters: \(nearbyTheaters.map(\.name).joined(separator: ", "))
    """

    let session = LanguageModelSession(instructions: megaContext)
    let response = try await session.respond(to: question)
    return response.content
}
```
This approach has three compounding problems:
- Wasted context window. You’re burning tokens on data the model may never need. If the user asks about a single movie, the entire filmography and theater list are dead weight — and on-device context windows are limited.
- Stale data. Everything is fetched once at session creation. If the user asks a follow-up question 20 minutes later, the weather data is outdated and you have no clean way to refresh it.
- Rigid coupling. Every new data source requires modifying the session setup code. Adding a new capability means touching the orchestration layer rather than declaring a self-contained unit of functionality.
Tool calling inverts this model entirely. Instead of pushing data into the context, you declare capabilities that the model can pull when it determines they’re relevant.
How Tool Calling Works
The flow has four steps, all managed by LanguageModelSession:
- You register one or more tools with the session.
- The user sends a prompt.
- The model analyzes the prompt and decides which tools (if any) to call, generating structured arguments for each.
- The framework invokes your tool’s call(arguments:) method, feeds the result back to the model, and the model produces a final response that incorporates the tool output.
Steps 3 and 4 happen transparently — from your calling code’s perspective, respond(to:) still returns a single
response. The session orchestrates the tool invocation loop internally.
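In code, a tool has roughly this shape. This is a simplified sketch for orientation, not Apple's exact declaration: the real protocol constrains the argument type to generable types and supplies default implementations for some requirements.

```swift
// Simplified sketch of the Tool protocol's shape. Not Apple's exact
// declaration: the real protocol constrains Arguments and provides
// defaults (e.g. a name derived from the type name).
protocol ToolSketch {
    associatedtype Arguments          // structured input the model generates
    var name: String { get }          // identifier the model sees
    var description: String { get }   // drives the model's selection decision
    func call(arguments: Arguments) async throws -> String
}
```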
Apple Docs:
Tool — FoundationModels
Defining a Tool
A tool is any type conforming to the Tool protocol. The protocol requires three things: a name (which has a default derived from the type name, so the examples below omit it), a description the model uses to decide when to invoke the tool, and a call(arguments:) method that performs the actual work.
Here’s a tool that looks up Pixar films by release year:
```swift
import FoundationModels

@available(iOS 26, *)
struct PixarFilmLookup: Tool {
    // The model reads this description to decide when to invoke the tool
    let description = "Looks up Pixar films filtered by release year range"

    // The arguments type must conform to @Generable so the model can
    // produce structured input via constrained decoding
    @Generable
    struct Arguments {
        @Guide(description: "Earliest release year to include")
        var fromYear: Int
        @Guide(description: "Latest release year to include")
        var toYear: Int
    }

    // The return type is a plain String the model incorporates into its response
    func call(arguments: Arguments) async throws -> String {
        let films = pixarCatalog.filter {
            $0.year >= arguments.fromYear && $0.year <= arguments.toYear
        }
        guard !films.isEmpty else {
            return "No Pixar films found between \(arguments.fromYear) and \(arguments.toYear)."
        }
        return films.map { "\($0.title) (\($0.year))" }.joined(separator: "\n")
    }
}
```
A few things to notice:
- Arguments conforms to @Generable. This is the bridge between the model’s reasoning and your Swift function. The model uses constrained decoding to produce a valid Arguments instance — the same mechanism covered in Foundation Models: Structured Output with @Generable. The @Guide descriptions are critical: they tell the model what each parameter means and how to populate it.
- description guides tool selection. Write this as if you’re explaining the tool to a colleague — what it does, what kind of questions it answers. The model uses this text to decide whether this tool is relevant to the user’s prompt.
- call(arguments:) is async throws. Your tool can do real work: hit the network, query a database, read from disk. The framework awaits your result before continuing inference.
Providing Tools to a Session
Once you have a tool, register it with a LanguageModelSession using the tools parameter:
```swift
@available(iOS 26, *)
func createMovieAssistant() -> LanguageModelSession {
    LanguageModelSession(
        tools: [PixarFilmLookup()],
        instructions: """
        You are a Pixar movie expert embedded in the "Pixar Vault" app.
        Use available tools to look up film data rather than guessing.
        If a tool returns no results, say so honestly.
        """
    )
}
```
From this point, any call to respond(to:) on this session can trigger PixarFilmLookup if the model decides the
user’s question warrants it:
```swift
@available(iOS 26, *)
func askAboutRecentFilms() async throws -> String {
    let session = createMovieAssistant()
    let response = try await session.respond(
        to: "Which Pixar films came out between 2022 and 2025?"
    )
    return response.content
}
```
The model sees the prompt, determines that PixarFilmLookup is relevant, generates
Arguments(fromYear: 2022, toYear: 2025) via constrained decoding, the framework calls your call(arguments:) method,
and the model weaves the result into a natural-language response. Your calling code doesn’t need to know whether a tool
was invoked — it gets a final response.content either way.
Tip: The instructions prompt matters for tool-calling sessions. Explicitly tell the model to use its tools rather than guessing. On-device models are smaller than cloud frontier models and benefit from direct guidance about when to reach for tools.
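Before creating a tool-calling session at all, it is worth checking that the on-device model is actually usable: Apple Intelligence can be off, the device may be ineligible, or the model may still be downloading. A sketch using SystemLanguageModel; the availability cases follow the WWDC25 examples, so verify the exact case shapes against the current docs:

```swift
import FoundationModels

// Sketch: gate tool-calling features on model availability.
// Availability case shapes follow the WWDC25 examples; verify against docs.
@available(iOS 26, *)
func makeAssistantIfAvailable() -> LanguageModelSession? {
    switch SystemLanguageModel.default.availability {
    case .available:
        return createMovieAssistant()
    case .unavailable(let reason):
        // e.g. device not eligible, Apple Intelligence disabled, model downloading
        print("Model unavailable: \(reason)")
        return nil
    }
}
```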
Returning Structured Results
Tool results don’t have to be plain strings. When your tool returns rich data, define a @Generable result type so the
model has clear structure to work with:
```swift
@available(iOS 26, *)
struct FilmDetail: Encodable {
    let title: String
    let year: Int
    let director: String
    let tomatoScore: Int
    let synopsis: String
}

@available(iOS 26, *)
struct DetailedFilmLookup: Tool {
    let description = "Retrieves detailed information about a specific Pixar film by title"

    @Generable
    struct Arguments {
        @Guide(description: "The title of the Pixar film to look up")
        var filmTitle: String
    }

    func call(arguments: Arguments) async throws -> String {
        guard let film = pixarCatalog.first(where: {
            $0.title.localizedCaseInsensitiveContains(arguments.filmTitle)
        }) else {
            return "Film not found: \(arguments.filmTitle)"
        }
        // Encode as JSON so the model gets structured data to reason over
        let encoder = JSONEncoder()
        encoder.outputFormatting = .prettyPrinted
        let data = try encoder.encode(film)
        return String(data: data, encoding: .utf8) ?? "Encoding failed"
    }
}
```
The call(arguments:) method returns String, but encoding your data as JSON gives the model a structured
representation it can reference precisely. When the user asks “Who directed Inside Out 2?”, the model can extract
exactly the director field from the JSON rather than paraphrasing a free-text blob.
Tip: Keep tool return values concise. Every character the tool returns consumes context window tokens. Return only the fields the model needs to answer the user’s likely questions. For large datasets, filter and paginate in the tool rather than returning everything.
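Because a session retains its transcript, follow-up questions can build on earlier tool results or trigger fresh calls as needed. A hypothetical two-turn exchange against one session:

```swift
@available(iOS 26, *)
func movieNightConversation() async throws {
    let session = LanguageModelSession(tools: [DetailedFilmLookup()])

    // First turn: the model likely invokes DetailedFilmLookup
    let first = try await session.respond(to: "Who directed Inside Out 2?")
    print(first.content)

    // Follow-up: the answer may come from the transcript, or the model may
    // call the tool again; your calling code doesn't need to care which
    let second = try await session.respond(to: "And what's its Rotten Tomatoes score?")
    print(second.content)
}
```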
Multiple Tools and Parallel Execution
Real-world assistants need multiple capabilities. The movie night assistant from our opening example needs film data and weather information. Define each tool independently and pass them all to the session:
```swift
import FoundationModels
import WeatherKit

@available(iOS 26, *)
struct WeatherForecastTool: Tool {
    let description = "Gets the current weather forecast for outdoor screening conditions"

    @Generable
    struct Arguments {
        @Guide(description: "City name for the weather forecast")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // In production, geocode the city name to coordinates first
        let weatherService = WeatherService.shared
        let location = try await geocode(city: arguments.city)
        let forecast = try await weatherService.weather(for: location)
        let current = forecast.currentWeather
        return """
        Conditions: \(current.condition.description)
        Temperature: \(current.temperature.formatted())
        Wind: \(current.wind.speed.formatted())
        Outdoor screening recommendation: \(current.temperature.value > 15 ? "Suitable" : "Too cold")
        """
    }
}

@available(iOS 26, *)
func createMovieNightAssistant() -> LanguageModelSession {
    LanguageModelSession(
        tools: [PixarFilmLookup(), WeatherForecastTool()],
        instructions: """
        You are a movie night planning assistant for the "Pixar Under the Stars" app.
        Use the film lookup tool for movie questions and the weather tool for screening conditions.
        Combine information from multiple tools when the user's question spans both domains.
        """
    )
}
```
When the user asks “What Pixar movies came out after 2020, and is tonight good for an outdoor screening in Austin?”, the model recognizes that two tools are relevant. The framework can execute both tool calls, collect their results, and produce a single unified response.
Apple Docs:
WeatherService — WeatherKit
Tool Selection Is Model-Driven
An important design principle: you don’t decide which tools to call — the model does. This means your tool descriptions need to be precise enough for the model to make correct selection decisions. Vague descriptions lead to tools being invoked when they shouldn’t be, or skipped when they should be called.
Compare these descriptions:
```swift
// Vague — the model may call this for any movie-related question
let description = "Gets movie information"

// Precise — the model knows exactly when this tool is useful
let description = "Looks up Pixar films filtered by release year range"
```
Write tool descriptions the way you’d write a function’s documentation comment: state the input domain, the output type, and the specific use case.
Advanced Usage
Validating Arguments Before Execution
The model generates arguments via constrained decoding, but that doesn’t guarantee the values are semantically valid. A
year of 9999 is a valid Int but not a valid release year. Validate arguments at the top of call(arguments:):
```swift
@available(iOS 26, *)
struct SafeFilmLookup: Tool {
    let description = "Looks up Pixar films filtered by release year range (1995 to present)"

    @Generable
    struct Arguments {
        @Guide(description: "Earliest release year (1995 or later)")
        var fromYear: Int
        @Guide(description: "Latest release year (up to current year)")
        var toYear: Int
    }

    func call(arguments: Arguments) async throws -> String {
        let currentYear = Calendar.current.component(.year, from: .now)

        // Validate semantic correctness beyond type safety
        guard arguments.fromYear >= 1995,
              arguments.toYear <= currentYear,
              arguments.fromYear <= arguments.toYear else {
            return "Invalid year range. Pixar films span 1995 to \(currentYear)."
        }

        let films = pixarCatalog.filter {
            $0.year >= arguments.fromYear && $0.year <= arguments.toYear
        }
        guard !films.isEmpty else {
            return "No Pixar films found between \(arguments.fromYear) and \(arguments.toYear)."
        }
        return films.map { "\($0.title) (\($0.year))" }.joined(separator: "\n")
    }
}
```
Return a descriptive error string rather than throwing — the model receives the error text as the tool result and can
communicate the issue to the user naturally. Throwing from call(arguments:) propagates the error to the caller and
terminates the inference pass, which is appropriate for infrastructure failures (network down, database corrupted) but
too aggressive for validation issues the model can explain to the user.
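For the infrastructure failures that do warrant throwing, the call site can distinguish tool failures from other generation errors. The framework wraps errors thrown from a tool; the sketch below assumes the LanguageModelSession.ToolCallError type described in Apple's docs, so verify its exact shape before relying on it:

```swift
// Sketch: assumes LanguageModelSession.ToolCallError exposes the failing
// tool. Verify the error type's exact properties against the docs.
@available(iOS 26, *)
func respondWithToolErrorHandling(_ session: LanguageModelSession,
                                  prompt: String) async -> String {
    do {
        return try await session.respond(to: prompt).content
    } catch let error as LanguageModelSession.ToolCallError {
        // A tool threw: surface a friendly message instead of failing the feature
        return "Sorry, the \(error.tool.name) tool hit a problem. Please try again."
    } catch {
        return "Something went wrong generating a response."
    }
}
```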
Stateful Tools
Tools can hold state. An actor-backed tool can track invocation history, cache results, or accumulate data across
multiple calls within a session:
```swift
@available(iOS 26, *)
actor FilmComparisonTool: Tool {
    let description = "Compares two Pixar films and remembers previous comparisons in this session"

    @Generable
    struct Arguments {
        @Guide(description: "Title of the first Pixar film")
        var filmA: String
        @Guide(description: "Title of the second Pixar film")
        var filmB: String
    }

    private var previousComparisons: [(String, String)] = []

    func call(arguments: Arguments) async throws -> String {
        previousComparisons.append((arguments.filmA, arguments.filmB))
        guard let a = pixarCatalog.first(where: { $0.title.localizedCaseInsensitiveContains(arguments.filmA) }),
              let b = pixarCatalog.first(where: { $0.title.localizedCaseInsensitiveContains(arguments.filmB) }) else {
            return "One or both films not found in catalog."
        }
        var result = """
        \(a.title) (\(a.year)) vs \(b.title) (\(b.year))
        Directors: \(a.director) vs \(b.director)
        Scores: \(a.tomatoScore)% vs \(b.tomatoScore)%
        """
        if previousComparisons.count > 1 {
            result += "\n\nPrevious comparisons this session: \(previousComparisons.count - 1)"
        }
        return result
    }
}
```
Warning: Stateful tools accumulate data for the lifetime of the session. If your tool caches results or tracks history, make sure that state doesn’t grow unbounded. Clear or cap it when the session resets.
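One way to honor that warning is to cap the history inside the actor so it can never grow past a fixed size. A minimal sketch of the capping pattern, separate from any framework API:

```swift
// Sketch: cap accumulated state so a long session can't grow it unbounded
actor CappedHistory {
    private let maxEntries = 20
    private var entries: [(String, String)] = []

    func record(_ pair: (String, String)) {
        entries.append(pair)
        // Drop the oldest entries once the cap is exceeded
        if entries.count > maxEntries {
            entries.removeFirst(entries.count - maxEntries)
        }
    }

    func reset() {
        entries.removeAll()   // call this when the session resets
    }
}
```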
Integrating with MapKit
Tools shine when they bridge the model to system frameworks. Here’s a tool that finds nearby movie theaters using MapKit:
```swift
import FoundationModels
import MapKit

@available(iOS 26, *)
struct NearbyTheatersTool: Tool {
    let description = "Finds movie theaters near a given city for Pixar screenings"

    @Generable
    struct Arguments {
        @Guide(description: "City name to search for theaters")
        var city: String
        @Guide(description: "Maximum number of results to return")
        var limit: Int
    }

    func call(arguments: Arguments) async throws -> String {
        let searchRequest = MKLocalSearch.Request()
        searchRequest.naturalLanguageQuery = "movie theater"
        searchRequest.region = try await regionForCity(arguments.city)

        let search = MKLocalSearch(request: searchRequest)
        let response = try await search.start()
        let theaters = response.mapItems.prefix(arguments.limit)

        return theaters.map { item in
            "\(item.name ?? "Unknown") — \(item.placemark.title ?? "No address")"
        }.joined(separator: "\n")
    }
}
```
Apple Docs:
MKLocalSearch — MapKit
The model doesn’t know MapKit exists. It knows that a tool called “Finds movie theaters near a given city” exists and that it needs to provide a city name and a result limit. The implementation details are entirely yours.
Performance Considerations
Tool calling adds latency to the inference cycle. Each tool invocation means the model must: generate structured
arguments (one constrained decoding pass), wait for your call(arguments:) to return, then process the result and
continue generating the final response. With multiple tools, this compounds.
Practical guidance for keeping tool-enhanced sessions responsive:
- Keep tool execution fast. The model is suspended while your tool runs. Network calls, disk I/O, and database queries all add wall-clock time. Cache aggressively and set tight timeouts.
- Limit the number of registered tools. Each tool’s description consumes context window tokens as part of the system prompt. Registering 15 tools when the user’s feature only needs 3 wastes context and can confuse the model’s selection logic. Create purpose-specific sessions with curated tool sets.
- Keep tool responses concise. A tool that returns 5,000 characters of JSON eats context that the model needs for reasoning and response generation. Filter, summarize, and paginate at the tool level.
- Measure with Instruments. Profile your tool’s call(arguments:) execution time with os_signpost. The inference latency you observe in development will be the latency your users feel.
```swift
import os
import FoundationModels

private let toolLog = OSLog(subsystem: "com.pixar.vault", category: "ToolExecution")

@available(iOS 26, *)
struct InstrumentedFilmLookup: Tool {
    let description = "Looks up Pixar films by year range with performance logging"

    @Generable
    struct Arguments {
        @Guide(description: "Earliest release year")
        var fromYear: Int
        @Guide(description: "Latest release year")
        var toYear: Int
    }

    func call(arguments: Arguments) async throws -> String {
        let signpostID = OSSignpostID(log: toolLog)
        os_signpost(.begin, log: toolLog, name: "FilmLookup", signpostID: signpostID)
        defer { os_signpost(.end, log: toolLog, name: "FilmLookup", signpostID: signpostID) }

        let films = pixarCatalog.filter {
            $0.year >= arguments.fromYear && $0.year <= arguments.toYear
        }
        return films.map { "\($0.title) (\($0.year))" }.joined(separator: "\n")
    }
}
```
Apple Docs:
os_signpost — os
For streaming sessions, tool calls happen during the stream. The user sees partial text generation, then a pause while tools execute, then the response resumes with tool-informed content. If your tool takes more than a second, consider showing a loading indicator in the UI during the tool execution phase.
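A streaming sketch for that flow: streamResponse(to:) yields cumulative snapshots, so the UI re-renders the whole partial text on each iteration. The element type of the stream and the updateUI(with:) hook are assumptions here, not confirmed API; check the ResponseStream docs.

```swift
// Sketch: stream a tool-enabled response. The stream element type and
// updateUI(with:) are assumptions, not confirmed framework API.
@available(iOS 26, *)
func streamAnswer(_ session: LanguageModelSession, prompt: String) async throws {
    for try await partial in session.streamResponse(to: prompt) {
        // Tool calls pause the stream mid-generation; if the gap exceeds
        // about a second, surface a loading indicator in the UI.
        updateUI(with: partial)   // hypothetical UI hook
    }
}
```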
When to Use (and When Not To)
| Scenario | Recommendation |
|---|---|
| Model needs live data (weather, location, stock prices) | Use tools — the model’s training data is static |
| Model needs to query the user’s local data (Core Data, SwiftData) | Use tools — they bridge the model to your persistence layer |
| Model should trigger app actions (navigate, add to cart, create a reminder) | Use tools — but validate actions before executing side effects |
| All required data is already in the prompt context | Skip tools — direct context is faster and simpler |
| Single-shot classification or extraction from provided text | Skip tools — @Generable alone is sufficient |
| High-frequency calls (every keystroke, real-time autocomplete) | Skip tools — the latency overhead of tool invocation is too high |
| Actions with irreversible consequences (delete data, send payments) | Use tools with extreme caution — require user confirmation before execution |
The decision framework is straightforward: if the model needs information it doesn’t have, or needs to cause an effect in the world, tools are the right abstraction. If the model already has everything it needs in the prompt and system instructions, tools add latency without adding value.
Tip: For actions with side effects (creating reminders, modifying user data), consider a two-phase approach: the tool returns a preview of the action, the model presents it to the user, and a second confirmation step (outside the model) actually executes it. This keeps humans in the loop for anything consequential.
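A sketch of that two-phase shape: the tool only stages the action and returns a human-readable preview, and actual execution happens in ordinary app code after the user confirms. ReminderDraft and PendingActionStore are hypothetical app types, not framework API.

```swift
import Foundation
import FoundationModels

// Hypothetical app types for the two-phase pattern, not framework API
struct ReminderDraft {
    let title: String
    let date: Date
}

actor PendingActionStore {
    private(set) var drafts: [ReminderDraft] = []
    func stage(_ draft: ReminderDraft) { drafts.append(draft) }
}

@available(iOS 26, *)
struct CreateReminderTool: Tool {
    let description = "Stages a reminder for the user to confirm; does NOT create it directly"

    @Generable
    struct Arguments {
        @Guide(description: "Reminder title")
        var title: String
        @Guide(description: "ISO 8601 date for the reminder")
        var isoDate: String
    }

    let pendingStore: PendingActionStore   // app-owned; the UI observes it

    func call(arguments: Arguments) async throws -> String {
        guard let date = ISO8601DateFormatter().date(from: arguments.isoDate) else {
            return "Could not parse date: \(arguments.isoDate)"
        }
        // Phase 1: stage only. The UI shows a confirmation sheet; a user tap
        // (outside the model) performs the real write in phase 2.
        await pendingStore.stage(ReminderDraft(title: arguments.title, date: date))
        return "Staged reminder '\(arguments.title)' for \(arguments.isoDate). Awaiting user confirmation."
    }
}
```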
Summary
- The Tool protocol bridges the on-device model to your app’s capabilities — live data, system frameworks, local databases, and app actions.
- Tool arguments use @Generable with @Guide descriptions so the model produces structured input via constrained decoding. Write precise descriptions — they directly affect argument quality.
- Tool descriptions drive the model’s selection logic. Be specific about what each tool does, what inputs it expects, and when it’s useful.
- Return concise, structured results from tools. Every character consumes context window tokens. Encode rich data as JSON for the model to reference precisely.
- Register only the tools a session actually needs. Excess tools waste context and degrade selection accuracy.
- Validate arguments semantically inside call(arguments:) and return error strings rather than throwing for non-infrastructure failures.
- Profile tool execution with os_signpost — tool latency is user-facing latency.
Tool calling transforms Foundation Models from a text generator into an agent that can reason about and interact with the real world through your app’s APIs. The natural complement is designing the prompts that guide these agentic interactions effectively — see Designing Prompts for On-Device AI for patterns and anti-patterns specific to on-device models, and App Intents: From Siri to Interactive Snippets if you want to expose your app’s tools to Siri and the system intelligence layer as well.