Integrating Core ML Models in SwiftUI: Image Classification, NLP, and Custom Models
Your app can identify Pixar characters in photos, classify film genres from text descriptions, and flag inappropriate content — all on-device, in milliseconds, without touching the network. Core ML is Apple’s ML inference framework, and it has been shipping since iOS 11. While Foundation Models handles open-ended language tasks, Core ML is your tool when you have a specific, well-scoped prediction task and a model trained to solve it.
This guide covers the full Core ML integration path: adding .mlmodel files to Xcode, running image classification with
Vision, classifying text with Natural Language, training custom models with Create ML, and configuring compute units for
performance. We won’t cover the Foundation Models framework for LLM tasks — that has its own
dedicated post.
Contents
- The Problem: Cloud Vision and NLP APIs
- Core ML Architecture
- Image Classification with Vision and Core ML
- Text Classification with Natural Language
- Training Custom Models with Create ML
- Integrating Inference in SwiftUI
- Advanced Usage
- Performance Considerations
- When to Use (and When Not To)
- Summary
The Problem: Cloud Vision and NLP APIs
Calling a cloud vision API for image classification looks straightforward until you account for the failure modes:
// Cloud vision API call — works great until it doesn't
func classifyImageViaCloud(_ image: UIImage) async throws -> [String: Double] {
guard let imageData = image.jpegData(compressionQuality: 0.8) else {
throw ClassificationError.invalidImage
}
var request = URLRequest(url: URL(string: "https://vision.googleapis.com/v1/images:annotate?key=\(apiKey)")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
let body = ["requests": [["image": ["content": imageData.base64EncodedString()],
"features": [["type": "LABEL_DETECTION"]]]]]
request.httpBody = try JSONSerialization.data(withJSONObject: body)
let (data, _) = try await URLSession.shared.data(for: request)
let result = try JSONDecoder().decode(CloudVisionResponse.self, from: data)
return result.labelAnnotations.reduce(into: [:]) { dict, annotation in
dict[annotation.description] = annotation.score
}
}
The failure modes: the feature stops working offline, adds hundreds of milliseconds of latency (plus upload time for large images), costs money per request, and sends your user’s photos to a third-party server. For a feature like “identify characters in a photo,” none of that is acceptable.
Core ML solves all four problems simultaneously.
Core ML Architecture
Apple Docs:
Core ML— Apple Developer Documentation
Core ML operates on .mlmodel files — serialized model descriptions that Xcode compiles at build time into optimized
.mlmodelc bundles. When you add a .mlmodel to your Xcode project, the build system auto-generates a Swift class with
a typed API specific to that model's inputs and outputs.
For most image tasks, you pair Core ML with the Vision framework, which handles image format normalization, orientation correction, and request management. For text tasks, the Natural Language framework provides higher-level APIs built on Core ML models.
The execution path looks like this:
UIImage / CVPixelBuffer
↓ Vision framework normalizes input
VNCoreMLRequest
↓ Core ML runtime dispatches to hardware
MLComputeUnit (Neural Engine / GPU / CPU)
↓ inference result
VNClassificationObservation[]
Image Classification with Vision and Core ML
To run a Core ML image classifier, you need a .mlmodel file in your Xcode project. Apple’s
Model Gallery provides several production-ready classifiers.
Drag the .mlmodel file into your Xcode project navigator — Xcode generates the Swift interface automatically.
Apple Docs:
VNCoreMLRequest— Vision
Here’s a production-grade image classification function that wraps the Vision/Core ML pipeline in a clean async interface:
import Vision
import CoreML

enum ImageClassificationError: Error {
    case invalidImage
    case modelLoadFailed
    case requestFailed(Error)
    case noResults
}

@available(iOS 14, *)
func classifyPixarCharacter(in image: UIImage) async throws -> [(label: String, confidence: Double)] {
    guard let cgImage = image.cgImage else {
        throw ImageClassificationError.invalidImage
    }
    // Load the compiled model — throws if the .mlmodel is missing or corrupt
    let modelConfig = MLModelConfiguration()
    modelConfig.computeUnits = .cpuAndNeuralEngine
    let coreMLModel = try PixarCharacterClassifier(configuration: modelConfig).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)
    return try await withCheckedThrowingContinuation { continuation in
        let request = VNCoreMLRequest(model: visionModel) { request, error in
            if let error {
                continuation.resume(throwing: ImageClassificationError.requestFailed(error))
                return
            }
            guard let observations = request.results as? [VNClassificationObservation],
                  !observations.isEmpty else {
                continuation.resume(throwing: ImageClassificationError.noResults)
                return
            }
            // Vision returns observations sorted by confidence — keep the top 5
            let results = observations
                .prefix(5)
                .map { (label: $0.identifier, confidence: Double($0.confidence)) }
            continuation.resume(returning: results)
        }
        // imageCropAndScaleOption controls how Vision handles aspect ratio mismatches
        request.imageCropAndScaleOption = .centerCrop
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            continuation.resume(throwing: ImageClassificationError.requestFailed(error))
        }
    }
}
withCheckedThrowingContinuation bridges Vision's callback-based API into async/await. The imageCropAndScaleOption
setting matters — most classifiers are trained on square images, so .centerCrop usually produces better results than
letterboxing (.scaleFit) for non-square inputs.
VNClassificationObservation gives you an identifier (the label string) and a confidence float between 0 and 1.
Vision returns observations already sorted by confidence descending, so taking the first five with prefix(5) is the
standard top-5 pattern — the first element is the model's best guess.
Text Classification with Natural Language
Apple Docs:
Natural Language— Apple Developer Documentation
The Natural Language framework provides text classification through NLModel, which loads a Core ML text classifier.
Beyond custom models, NLTagger provides built-in capabilities — language detection, part-of-speech tagging, named
entity recognition — without any model files.
Here’s how to use NLTagger for named entity recognition to extract character and location names from a film script
excerpt:
import NaturalLanguage

func extractEntities(from scriptExcerpt: String) -> (characters: [String], locations: [String]) {
    var characters: [String] = []
    var locations: [String] = []
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = scriptExcerpt
    // The unit parameter controls the granularity — .word gives token-by-token analysis
    let range = scriptExcerpt.startIndex..<scriptExcerpt.endIndex
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType) { tag, tokenRange in
        guard let tag else { return true }
        let entity = String(scriptExcerpt[tokenRange])
        switch tag {
        case .personalName:
            characters.append(entity)
        case .placeName:
            locations.append(entity)
        default:
            break
        }
        return true // continue enumeration
    }
    return (characters: characters, locations: locations)
}

// Example output for "Woody and Buzz landed in Al's apartment in Tokyo."
// characters: ["Woody", "Buzz", "Al"]
// locations: ["Tokyo"]
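Language detection, mentioned above, needs no model file either. A minimal sketch using NLLanguageRecognizer:

```swift
import NaturalLanguage

// Built-in language detection: no model file required.
func dominantLanguage(of text: String) -> String? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.dominantLanguage?.rawValue // BCP-47 code, e.g. "en"
}
```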
For custom text classification — say, categorizing user reviews of Pixar films as “positive,” “negative,” or “neutral” —
you load an NLModel trained with Create ML:
import NaturalLanguage
import CoreML

// ClassificationError is assumed to declare a .modelNotFound case
func classifyFilmReview(_ reviewText: String) throws -> String {
    guard let modelURL = Bundle.main.url(forResource: "FilmReviewClassifier", withExtension: "mlmodelc") else {
        throw ClassificationError.modelNotFound
    }
    let compiledModel = try MLModel(contentsOf: modelURL)
    let nlModel = try NLModel(mlModel: compiledModel)
    return nlModel.predictedLabel(for: reviewText) ?? "unknown"
}
NLModel.predictedLabel(for:) returns the highest-confidence class label as a String. For multi-class confidence
scores, use predictedLabelHypotheses(for:maximumCount:) to get a dictionary of label-to-probability mappings.
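A sketch of the hypotheses variant, assuming an NLModel loaded the same way as in classifyFilmReview (the function name here is illustrative):

```swift
import NaturalLanguage

// Sketch: per-class probabilities instead of a single label.
func reviewSentimentScores(_ reviewText: String, model: NLModel) -> [String: Double] {
    // Up to three labels mapped to their predicted probabilities,
    // e.g. ["positive": 0.81, "neutral": 0.14, "negative": 0.05]
    model.predictedLabelHypotheses(for: reviewText, maximumCount: 3)
}
```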
Training Custom Models with Create ML
Apple Docs:
Create ML— Apple Developer Documentation
Create ML is Xcode’s built-in model training tool. You don’t need Python, TensorFlow, or a cloud GPU. For common tasks — image classification, text classification, tabular regression — Create ML trains models in hours on a Mac.
To train a Pixar film genre classifier:
- Open Xcode, go to Xcode → Open Developer Tool → Create ML.
- Choose Text Classifier as the template.
- Provide training data: a directory of .txt files organized into subdirectories by label (adventure/, comedy/, drama/), or a CSV with text and label columns.
- Configure training: algorithm (Maximum Entropy or Transfer Learning), validation split.
- Click Train. Create ML handles tokenization, feature extraction, and optimization.
- Export the trained model as a .mlmodel file.
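The same workflow can also be scripted with the CreateML framework in a macOS playground or command-line tool, which is handy for retraining on a schedule. A sketch, where the CSV path and column names are assumptions:

```swift
import CreateML
import Foundation

// Hypothetical paths; CreateML runs on macOS only.
let dataURL = URL(fileURLWithPath: "/path/to/film_reviews.csv")
let data = try MLDataTable(contentsOf: dataURL)

// Hold out 20% of rows for validation
let (training, validation) = data.randomSplit(by: 0.8, seed: 42)

let classifier = try MLTextClassifier(
    trainingData: training,
    textColumn: "text",
    labelColumn: "label"
)

// Check held-out accuracy before exporting
let metrics = classifier.evaluation(on: validation, textColumn: "text", labelColumn: "label")
print("Validation error: \(metrics.classificationError)")

try classifier.write(to: URL(fileURLWithPath: "/path/to/FilmReviewClassifier.mlmodel"))
```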
The resulting model file goes directly into your Xcode project. No conversion needed.
For image classifiers, the process is identical but you provide image directories instead of text files. Create ML supports transfer learning from Apple's built-in Vision Feature Print extractor — this means useful accuracy with hundreds of training images rather than tens of thousands.
Integrating Inference in SwiftUI
Wrapping Core ML inference in an @Observable class gives SwiftUI a clean interface with proper async lifecycle
management:
import SwiftUI
import Vision
import CoreML

@Observable
final class CharacterRecognitionViewModel {
    var classificationResults: [(label: String, confidence: Double)] = []
    var isClassifying: Bool = false
    var errorMessage: String?

    func classify(image: UIImage) async {
        isClassifying = true
        errorMessage = nil
        defer { isClassifying = false }
        do {
            classificationResults = try await classifyPixarCharacter(in: image)
        } catch ImageClassificationError.invalidImage {
            errorMessage = "Could not process this image."
        } catch ImageClassificationError.noResults {
            errorMessage = "No characters recognized."
        } catch {
            errorMessage = "Classification failed. Please try again."
        }
    }
}

struct CharacterRecognitionView: View {
    @State private var viewModel = CharacterRecognitionViewModel()
    let selectedImage: UIImage

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            Image(uiImage: selectedImage)
                .resizable()
                .scaledToFit()
            if viewModel.isClassifying {
                ProgressView("Identifying character...")
            } else if let error = viewModel.errorMessage {
                Text(error).foregroundStyle(.red)
            } else {
                ForEach(viewModel.classificationResults, id: \.label) { result in
                    HStack {
                        Text(result.label)
                        Spacer()
                        Text(String(format: "%.0f%%", result.confidence * 100))
                            .foregroundStyle(.secondary)
                    }
                }
            }
        }
        .padding()
        .task {
            await viewModel.classify(image: selectedImage)
        }
    }
}
.task is the right modifier here — it launches an async task tied to the view’s lifetime and cancels it if the view
disappears before inference completes.
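One caveat: the Vision work inside classifyPixarCharacter runs synchronously and won't abort mid-inference, so cancellation takes effect between steps. If you want to be explicit about never publishing results into a view that has disappeared, a sketch (the method name is illustrative):

```swift
import UIKit

// Hypothetical cancellation-aware variant of classify(image:).
extension CharacterRecognitionViewModel {
    func classifyRespectingCancellation(image: UIImage) async {
        isClassifying = true
        defer { isClassifying = false }
        do {
            let results = try await classifyPixarCharacter(in: image)
            try Task.checkCancellation() // view gone? discard the results
            classificationResults = results
        } catch is CancellationError {
            // SwiftUI cancelled the task; nothing to publish
        } catch {
            errorMessage = "Classification failed. Please try again."
        }
    }
}
```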
Advanced Usage
Compute Unit Configuration
MLModelConfiguration.computeUnits controls which hardware Core ML uses for inference:
let config = MLModelConfiguration()
// Default — Core ML chooses the best available hardware automatically
config.computeUnits = .all
// Explicitly use Neural Engine + CPU (good balance for most models)
config.computeUnits = .cpuAndNeuralEngine
// CPU only — slower but deterministic, useful for debugging unexpected results
config.computeUnits = .cpuOnly
Apple Docs:
MLModelConfiguration— Core ML
In practice, .all is appropriate for production. The Core ML runtime selects the most efficient hardware for each
layer of the model. Use .cpuAndNeuralEngine when you want to exclude the GPU — useful on older devices where GPU
memory pressure can cause issues, or when running many concurrent inference tasks.
Batch Predictions
For classifying multiple images simultaneously — a photo library scan, for example — use MLArrayBatchProvider instead
of processing images one at a time:
import CoreML
import CoreImage
func batchClassifyFilmPosters(_ images: [UIImage]) throws -> [String] {
    let modelConfig = MLModelConfiguration()
    modelConfig.computeUnits = .cpuAndNeuralEngine
    // Use the underlying MLModel — predictions(fromBatch:) is defined on MLModel,
    // not on the generated wrapper class
    let model = try FilmPosterClassifier(configuration: modelConfig).model
    let context = CIContext() // reuse one context; creating one per image is expensive
    let featureProviders: [MLFeatureProvider] = try images.compactMap { image in
        guard let ciImage = CIImage(image: image) else { return nil }
        // Resize to the model's expected input dimensions
        let resized = ciImage.transformed(by: CGAffineTransform(
            scaleX: 224.0 / ciImage.extent.width,
            y: 224.0 / ciImage.extent.height
        ))
        var pixelBuffer: CVPixelBuffer?
        CVPixelBufferCreate(kCFAllocatorDefault, 224, 224,
                            kCVPixelFormatType_32BGRA, nil, &pixelBuffer)
        guard let buffer = pixelBuffer else { return nil }
        context.render(resized, to: buffer)
        // "image" and "classLabel" must match the feature names in the .mlmodel
        return try MLDictionaryFeatureProvider(
            dictionary: ["image": MLFeatureValue(pixelBuffer: buffer)]
        )
    }
    let batchProvider = MLArrayBatchProvider(array: featureProviders)
    let predictions = try model.predictions(fromBatch: batchProvider)
    return (0..<predictions.count).compactMap { index in
        predictions.features(at: index).featureValue(for: "classLabel")?.stringValue
    }
}
Batch prediction amortizes the per-request overhead across all inputs, giving substantially better throughput than sequential single-image calls for large batches.
Model Encryption and On-Demand Resources
Large models increase your app's binary size. Apple supports two mitigation strategies: model encryption (Xcode encrypts
the .mlmodel with a key generated through your developer account, and Core ML fetches the key to decrypt the model the
first time it loads) and on-demand resources (hosting the .mlmodelc as an on-demand resource that downloads after
installation).
For models larger than a few megabytes, host the compiled model as an on-demand resource with NSBundleResourceRequest
rather than bundling it. This keeps your initial download size small and only downloads the model when the relevant
feature is first used.
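A sketch of that loading path, where the ODR tag is an assumption configured in Xcode's Resource Tags settings:

```swift
import Foundation
import CoreML

// Hypothetical ODR tag "character-classifier"; the resource request
// must stay alive for as long as the downloaded resources are in use.
final class OnDemandModelLoader {
    private let request = NSBundleResourceRequest(tags: ["character-classifier"])

    func loadModel() async throws -> MLModel {
        try await request.beginAccessingResources()
        guard let url = Bundle.main.url(forResource: "PixarCharacterClassifier",
                                        withExtension: "mlmodelc") else {
            throw CocoaError(.fileNoSuchFile)
        }
        return try MLModel(contentsOf: url)
    }

    deinit { request.endAccessingResources() }
}
```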
Performance Considerations
The Neural Engine can be 10–100x faster than the CPU for typical ML inference workloads, and it's the right target for production. A MobileNet-style image classifier runs in under 10ms on the Neural Engine on iPhone 15 hardware. The same model on CPU takes 50–200ms.
Key performance guidance:
- Load and compile your MLModel once — it's expensive. Store it as a property, not a local variable in an inference function.
- Use .cpuAndNeuralEngine rather than .all if you observe GPU memory pressure warnings.
- For models in the 10MB+ range, measure inference time and memory with Instruments' Core ML template before shipping.
- VNImageRequestHandler is not thread-safe. Create one per inference call — it's designed to be disposable.
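The load-once guidance can be sketched as a shared provider, reusing the PixarCharacterClassifier class generated earlier:

```swift
import CoreML
import Vision

// Load and wrap the model once; every inference call reuses it.
final class ClassifierProvider {
    static let shared = ClassifierProvider()

    // lazy: the expensive load happens once, on first access
    lazy var visionModel: VNCoreMLModel? = {
        let config = MLModelConfiguration()
        config.computeUnits = .all
        guard let coreML = try? PixarCharacterClassifier(configuration: config).model,
              let model = try? VNCoreMLModel(for: coreML) else {
            return nil
        }
        return model
    }()
}
```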
Apple Docs:
Improving Your Model's Accuracy— Core ML
When to Use (and When Not To)
| Scenario | Recommendation |
|---|---|
| Image content classification (objects, scenes, faces) | Core ML + Vision — purpose-built for this task |
| Language detection or named entity recognition | Natural Language framework on top of Core ML |
| Custom classification with your own training data | Core ML via Create ML — train on Mac, no Python |
| Open-ended text generation or chat | Foundation Models — Core ML is not designed for generative tasks |
| Model from TensorFlow or PyTorch ecosystem | Convert with coremltools Python package, then use Core ML |
| Task requires knowledge of events after model training date | Not a fit for Core ML; use a cloud API with retrieval augmentation |
| Real-time video analysis (60fps) | Vision’s VNSequenceRequestHandler with explicit frame budgeting |
| Recommendation system with tabular data | Core ML tabular classifier via Create ML |
Summary
- Core ML enables on-device ML inference with no network dependency, no per-inference cost, and no user data leaving the device.
- Pair Core ML with Vision for image tasks — Vision handles normalization, orientation, and the request lifecycle.
- The Natural Language framework provides named entity recognition, language detection, and POS tagging out of the box, with NLModel for custom text classifiers.
- Create ML trains production-quality image and text classifiers directly in Xcode — no Python required.
- Configure MLModelConfiguration.computeUnits deliberately: .all in production, .cpuAndNeuralEngine if GPU memory pressure is a concern. Load your MLModel once and reuse it.
- For generative language tasks, Core ML is not the right tool — see Apple's Foundation Models Framework.
Core ML gives you a reliable, high-performance inference runtime. The quality of your results, however, depends heavily on how well your model is defined and how its inputs are prepared — which connects directly to prompt and input design. See Designing Prompts for On-Device AI for the patterns that apply when Foundation Models is in the loop alongside Core ML.