Integrating Core ML Models in SwiftUI: Image Classification, NLP, and Custom Models
Your app can identify Pixar characters in photos, classify film genres from text descriptions, and flag inappropriate content — all on-device, in milliseconds, without touching the network. Core ML is Apple’s ML inference framework, and it has been shipping since iOS 11. While Foundation Models handles open-ended language tasks, Core ML is your tool when you have a specific, well-scoped prediction task and a model trained to solve it.
This guide covers the full Core ML integration path: adding .mlmodel files to Xcode, running image classification with
Vision, classifying text with Natural Language, training custom models with Create ML, and configuring compute units for
performance. We won’t cover the Foundation Models framework for LLM tasks — that has its own
dedicated post.
Contents
- The Problem: Cloud Vision and NLP APIs
- Core ML Architecture
- Image Classification with Vision and Core ML
- Text Classification with Natural Language
- Training Custom Models with Create ML
- Integrating Inference in SwiftUI
- Advanced Usage
- Performance Considerations
- When to Use (and When Not To)
- Summary
The Problem: Cloud Vision and NLP APIs
Calling a cloud vision API for image classification looks straightforward until you account for the failure modes:
// Cloud vision API call — works great until it doesn't
func classifyImageViaCloud(_ image: UIImage) async throws -> [String: Double] {
guard let imageData = image.jpegData(compressionQuality: 0.8) else {
throw ClassificationError.invalidImage
}
var request = URLRequest(url: URL(string: "https://vision.googleapis.com/v1/images:annotate?key=\(apiKey)")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
let body = ["requests": [["image": ["content": imageData.base64EncodedString()],
"features": [["type": "LABEL_DETECTION"]]]]]
request.httpBody = try JSONSerialization.data(withJSONObject: body)
let (data, _) = try await URLSession.shared.data(for: request)
let result = try JSONDecoder().decode(CloudVisionResponse.self, from: data)
return result.labelAnnotations.reduce(into: [:]) { dict, annotation in
dict[annotation.description] = annotation.score
}
}
The failure modes: the feature stops working offline, adds hundreds of milliseconds of latency (plus upload time for large images), costs money per request, and sends your user’s photos to a third-party server. For a feature like “identify characters in a photo,” none of that is acceptable.
Core ML solves all four problems simultaneously.
Core ML Architecture
Apple Docs:
Core ML— Apple Developer Documentation
Core ML operates on .mlmodel files — serialized model descriptions that Xcode compiles at build time into optimized
.mlmodelc bundles. When you add a .mlmodel to your Xcode project, the build system auto-generates a Swift class with
a typed API specific to that model's inputs and outputs.
For most image tasks, you pair Core ML with the Vision framework, which handles image format normalization, orientation correction, and request management. For text tasks, the Natural Language framework provides higher-level APIs built on Core ML models.
The execution path looks like this:
UIImage / CVPixelBuffer
↓ Vision framework normalizes input
VNCoreMLRequest
↓ Core ML runtime dispatches to hardware
MLComputeUnit (Neural Engine / GPU / CPU)
↓ inference result
VNClassificationObservation[]
Image Classification with Vision and Core ML
To run a Core ML image classifier, you need a .mlmodel file in your Xcode project. Apple’s
Model Gallery provides several production-ready classifiers.
Drag the .mlmodel file into your Xcode project navigator — Xcode generates the Swift interface automatically.
Apple Docs:
VNCoreMLRequest— Vision
Here’s a production-grade image classification function that wraps the Vision/Core ML pipeline in a clean async interface:
import Vision
import CoreML

enum ImageClassificationError: Error {
    case invalidImage
    case modelLoadFailed
    case requestFailed(Error)
    case noResults
}

@available(iOS 14, *)
func classifyPixarCharacter(in image: UIImage) async throws -> [(label: String, confidence: Double)] {
    guard let cgImage = image.cgImage else {
        throw ImageClassificationError.invalidImage
    }
    // Load the compiled model — throws if the .mlmodel is missing or corrupt
    let modelConfig = MLModelConfiguration()
    modelConfig.computeUnits = .cpuAndNeuralEngine
    let coreMLModel = try PixarCharacterClassifier(configuration: modelConfig).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)
    return try await withCheckedThrowingContinuation { continuation in
        let request = VNCoreMLRequest(model: visionModel) { request, error in
            if let error {
                continuation.resume(throwing: ImageClassificationError.requestFailed(error))
                return
            }
            guard let observations = request.results as? [VNClassificationObservation],
                  !observations.isEmpty else {
                continuation.resume(throwing: ImageClassificationError.noResults)
                return
            }
            // Vision returns observations sorted by confidence — keep the top 5
            let results = observations
                .prefix(5)
                .map { (label: $0.identifier, confidence: Double($0.confidence)) }
            continuation.resume(returning: results)
        }
        // imageCropAndScaleOption controls how Vision handles aspect ratio mismatches
        request.imageCropAndScaleOption = .centerCrop
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            continuation.resume(throwing: ImageClassificationError.requestFailed(error))
        }
    }
}
withCheckedThrowingContinuation bridges Vision's callback-based API into async/await. The imageCropAndScaleOption
setting matters — most classifiers are trained on square images, so .centerCrop usually produces better results than
letterboxing (.scaleFit) for non-square inputs.
VNClassificationObservation gives you an identifier (the label string) and a confidence float between 0 and 1.
Vision returns observations already sorted by confidence descending, so taking the first five with prefix(5) is the
standard top-5 pattern — the first element is the model's best guess.
Text Classification with Natural Language
Apple Docs:
Natural Language— Apple Developer Documentation
The Natural Language framework provides text classification through NLModel, which loads a Core ML text classifier.
Beyond custom models, NLTagger provides built-in capabilities — language detection, part-of-speech tagging, named
entity recognition — without any model files.
Here’s how to use NLTagger for named entity recognition to extract character and location names from a film script
excerpt:
import NaturalLanguage

func extractEntities(from scriptExcerpt: String) -> (characters: [String], locations: [String]) {
    var characters: [String] = []
    var locations: [String] = []
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = scriptExcerpt
    // The unit parameter controls the granularity — .word gives token-by-token analysis
    let range = scriptExcerpt.startIndex..<scriptExcerpt.endIndex
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType) { tag, tokenRange in
        guard let tag else { return true }
        let entity = String(scriptExcerpt[tokenRange])
        switch tag {
        case .personalName:
            characters.append(entity)
        case .placeName:
            locations.append(entity)
        default:
            break
        }
        return true // continue enumeration
    }
    return (characters: characters, locations: locations)
}

// Example output for "Woody and Buzz landed in Al's apartment in Tokyo."
// characters: ["Woody", "Buzz", "Al"]
// locations: ["Tokyo"]
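Language detection, mentioned above, needs no model file either. A minimal sketch using NLLanguageRecognizer:

```swift
import NaturalLanguage

// Built-in language detection: no model file required.
func dominantLanguage(of text: String) -> String? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.dominantLanguage?.rawValue // BCP-47 code, e.g. "en"
}
```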
For custom text classification — say, categorizing user reviews of Pixar films as “positive,” “negative,” or “neutral” —
you load an NLModel trained with Create ML:
import NaturalLanguage
import CoreML

// ClassificationError is assumed to declare a .modelNotFound case
func classifyFilmReview(_ reviewText: String) throws -> String {
    guard let modelURL = Bundle.main.url(forResource: "FilmReviewClassifier", withExtension: "mlmodelc") else {
        throw ClassificationError.modelNotFound
    }
    let compiledModel = try MLModel(contentsOf: modelURL)
    let nlModel = try NLModel(mlModel: compiledModel)
    return nlModel.predictedLabel(for: reviewText) ?? "unknown"
}
NLModel.predictedLabel(for:) returns the highest-confidence class label as a String. For multi-class confidence
scores, use predictedLabelHypotheses(for:maximumCount:) to get a dictionary of label-to-probability mappings.
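A sketch of the hypotheses variant, assuming an NLModel loaded the same way as in classifyFilmReview (the function name here is illustrative):

```swift
import NaturalLanguage

// Sketch: per-class probabilities instead of a single label.
func reviewSentimentScores(_ reviewText: String, model: NLModel) -> [String: Double] {
    // Up to three labels mapped to their predicted probabilities,
    // e.g. ["positive": 0.81, "neutral": 0.14, "negative": 0.05]
    model.predictedLabelHypotheses(for: reviewText, maximumCount: 3)
}
```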
Training Custom Models with Create ML
Apple Docs:
Create ML— Apple Developer Documentation
Create ML is Xcode’s built-in model training tool. You don’t need Python, TensorFlow, or a cloud GPU. For common tasks — image classification, text classification, tabular regression — Create ML trains models in hours on a Mac.
To train a Pixar film genre classifier:
- Open Xcode, go to Xcode → Open Developer Tool → Create ML.
- Choose Text Classifier as the template.
- Provide training data: a directory of .txt files organized into subdirectories by label (adventure/, comedy/, drama/), or a CSV with text and label columns.
- Configure training: algorithm (Maximum Entropy or Transfer Learning), validation split.
- Click Train. Create ML handles tokenization, feature extraction, and optimization.
- Export the trained model as a .mlmodel file.
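The same workflow can also be scripted with the CreateML framework in a macOS playground or command-line tool, which is handy for retraining on a schedule. A sketch, where the CSV path and column names are assumptions:

```swift
import CreateML
import Foundation

// Hypothetical paths; CreateML runs on macOS only.
let dataURL = URL(fileURLWithPath: "/path/to/film_reviews.csv")
let data = try MLDataTable(contentsOf: dataURL)

// Hold out 20% of rows for validation
let (training, validation) = data.randomSplit(by: 0.8, seed: 42)

let classifier = try MLTextClassifier(
    trainingData: training,
    textColumn: "text",
    labelColumn: "label"
)

// Check held-out accuracy before exporting
let metrics = classifier.evaluation(on: validation, textColumn: "text", labelColumn: "label")
print("Validation error: \(metrics.classificationError)")

try classifier.write(to: URL(fileURLWithPath: "/path/to/FilmReviewClassifier.mlmodel"))
```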
The resulting model file goes directly into your Xcode project. No conversion needed.
For image classifiers, the process is identical but you provide image directories instead of text files. Create ML supports transfer learning from Apple's built-in Vision Feature Print extractor — this means useful accuracy with hundreds of training images rather than tens of thousands.
Integrating Inference in SwiftUI
Wrapping Core ML inference in an @Observable class gives SwiftUI a clean interface with proper async lifecycle
management:
import SwiftUI
import Vision
import CoreML

@Observable
final class CharacterRecognitionViewModel {
    var classificationResults: [(label: String, confidence: Double)] = []
    var isClassifying: Bool = false
    var errorMessage: String?

    func classify(image: UIImage) async {
        isClassifying = true
        errorMessage = nil
        defer { isClassifying = false }
        do {
            classificationResults = try await classifyPixarCharacter(in: image)
        } catch ImageClassificationError.invalidImage {
            errorMessage = "Could not process this image."
        } catch ImageClassificationError.noResults {
            errorMessage = "No characters recognized."
        } catch {
            errorMessage = "Classification failed. Please try again."
        }
    }
}

struct CharacterRecognitionView: View {
    @State private var viewModel = CharacterRecognitionViewModel()
    let selectedImage: UIImage

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            Image(uiImage: selectedImage)
                .resizable()
                .scaledToFit()
            if viewModel.isClassifying {
                ProgressView("Identifying character...")
            } else if let error = viewModel.errorMessage {
                Text(error).foregroundStyle(.red)
            } else {
                ForEach(viewModel.classificationResults, id: \.label) { result in
                    HStack {
                        Text(result.label)
                        Spacer()
                        Text(String(format: "%.0f%%", result.confidence * 100))
                            .foregroundStyle(.secondary)
                    }
                }
            }
        }
        .padding()
        .task {
            await viewModel.classify(image: selectedImage)
        }
    }
}
.task is the right modifier here — it launches an async task tied to the view’s lifetime and cancels it if the view
disappears before inference completes.
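One caveat: the Vision work inside classifyPixarCharacter runs synchronously and won't abort mid-inference, so cancellation takes effect between steps. If you want to be explicit about never publishing results into a view that has disappeared, a sketch (the method name is illustrative):

```swift
import UIKit

// Hypothetical cancellation-aware variant of classify(image:).
extension CharacterRecognitionViewModel {
    func classifyRespectingCancellation(image: UIImage) async {
        isClassifying = true
        defer { isClassifying = false }
        do {
            let results = try await classifyPixarCharacter(in: image)
            try Task.checkCancellation() // view gone? discard the results
            classificationResults = results
        } catch is CancellationError {
            // SwiftUI cancelled the task; nothing to publish
        } catch {
            errorMessage = "Classification failed. Please try again."
        }
    }
}
```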
Advanced Usage
Compute Unit Configuration
MLModelConfiguration.computeUnits controls which hardware Core ML uses for inference:
let config = MLModelConfiguration()
// Default — Core ML chooses the best available hardware automatically
config.computeUnits = .all
// Explicitly use Neural Engine + CPU (good balance for most models)
config.computeUnits = .cpuAndNeuralEngine
// CPU only — slower but deterministic, useful for debugging unexpected results
config.computeUnits = .cpuOnly
Apple Docs:
MLModelConfiguration— Core ML
In practice, .all is appropriate for production. The Core ML runtime selects the most efficient hardware for each
layer of the model. Use .cpuAndNeuralEngine when you want to exclude the GPU — useful on older devices where GPU
memory pressure can cause issues, or when running many concurrent inference tasks.
Batch Predictions
For classifying multiple images simultaneously — a photo library scan, for example — use MLArrayBatchProvider instead
of processing images one at a time:
import CoreML
import CoreImage
func batchClassifyFilmPosters(_ images: [UIImage]) throws -> [String] {
    let modelConfig = MLModelConfiguration()
    modelConfig.computeUnits = .cpuAndNeuralEngine
    // Use the underlying MLModel — predictions(fromBatch:) is defined on MLModel,
    // not on the generated wrapper class
    let model = try FilmPosterClassifier(configuration: modelConfig).model
    let context = CIContext() // reuse one context; creating one per image is expensive
    let featureProviders: [MLFeatureProvider] = try images.compactMap { image in
        guard let ciImage = CIImage(image: image) else { return nil }
        // Resize to the model's expected input dimensions
        let resized = ciImage.transformed(by: CGAffineTransform(
            scaleX: 224.0 / ciImage.extent.width,
            y: 224.0 / ciImage.extent.height
        ))
        var pixelBuffer: CVPixelBuffer?
        CVPixelBufferCreate(kCFAllocatorDefault, 224, 224,
                            kCVPixelFormatType_32BGRA, nil, &pixelBuffer)
        guard let buffer = pixelBuffer else { return nil }
        context.render(resized, to: buffer)
        // "image" and "classLabel" must match the feature names in the .mlmodel
        return try MLDictionaryFeatureProvider(
            dictionary: ["image": MLFeatureValue(pixelBuffer: buffer)]
        )
    }
    let batchProvider = MLArrayBatchProvider(array: featureProviders)
    let predictions = try model.predictions(fromBatch: batchProvider)
    return (0..<predictions.count).compactMap { index in
        predictions.features(at: index).featureValue(for: "classLabel")?.stringValue
    }
}
Batch prediction amortizes the per-request overhead across all inputs, giving substantially better throughput than sequential single-image calls for large batches.
Model Encryption and On-Demand Resources
Large models increase your app's binary size. Apple supports two mitigation strategies: model encryption (Xcode encrypts
the .mlmodel with a key generated through your developer account, and Core ML fetches the key to decrypt the model the
first time it loads) and on-demand resources (hosting the .mlmodelc as an on-demand resource that downloads after
installation).
For models larger than a few megabytes, host the compiled model as an on-demand resource with NSBundleResourceRequest
rather than bundling it. This keeps your initial download size small and only downloads the model when the relevant
feature is first used.
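A sketch of that loading path, where the ODR tag is an assumption configured in Xcode's Resource Tags settings:

```swift
import Foundation
import CoreML

// Hypothetical ODR tag "character-classifier"; the resource request
// must stay alive for as long as the downloaded resources are in use.
final class OnDemandModelLoader {
    private let request = NSBundleResourceRequest(tags: ["character-classifier"])

    func loadModel() async throws -> MLModel {
        try await request.beginAccessingResources()
        guard let url = Bundle.main.url(forResource: "PixarCharacterClassifier",
                                        withExtension: "mlmodelc") else {
            throw CocoaError(.fileNoSuchFile)
        }
        return try MLModel(contentsOf: url)
    }

    deinit { request.endAccessingResources() }
}
```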
Performance Considerations
The Neural Engine can be 10–100x faster than the CPU for typical ML inference workloads, and it's the right target for production. A MobileNet-style image classifier runs in under 10ms on the Neural Engine on iPhone 15 hardware. The same model on CPU takes 50–200ms.
Key performance guidance:
- Load and compile your MLModel once — it's expensive. Store it as a property, not a local variable in an inference function.
- Use .cpuAndNeuralEngine rather than .all if you observe GPU memory pressure warnings.
- For models in the 10MB+ range, measure inference time and memory with Instruments' Core ML template before shipping.
- VNImageRequestHandler is not thread-safe. Create one per inference call — it's designed to be disposable.
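The load-once guidance can be sketched as a shared provider, reusing the PixarCharacterClassifier class generated earlier:

```swift
import CoreML
import Vision

// Load and wrap the model once; every inference call reuses it.
final class ClassifierProvider {
    static let shared = ClassifierProvider()

    // lazy: the expensive load happens once, on first access
    lazy var visionModel: VNCoreMLModel? = {
        let config = MLModelConfiguration()
        config.computeUnits = .all
        guard let coreML = try? PixarCharacterClassifier(configuration: config).model,
              let model = try? VNCoreMLModel(for: coreML) else {
            return nil
        }
        return model
    }()
}
```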
Apple Docs:
Improving Your Model's Accuracy— Core ML
When to Use (and When Not To)
| Scenario | Recommendation |
|---|---|
| Image content classification (objects, scenes, faces) | Core ML + Vision — purpose-built for this task |
| Language detection or named entity recognition | Natural Language framework on top of Core ML |
| Custom classification with your own training data | Core ML via Create ML — train on Mac, no Python |
| Open-ended text generation or chat | Foundation Models — Core ML is not designed for generative tasks |
| Model from TensorFlow or PyTorch ecosystem | Convert with coremltools Python package, then use Core ML |
| Task requires knowledge of events after model training date | Not a fit for Core ML; use a cloud API with retrieval augmentation |
| Real-time video analysis (60fps) | Vision’s VNSequenceRequestHandler with explicit frame budgeting |
| Recommendation system with tabular data | Core ML tabular classifier via Create ML |
Summary
- Core ML enables on-device ML inference with no network dependency, no per-inference cost, and no user data leaving the device.
- Pair Core ML with Vision for image tasks — Vision handles normalization, orientation, and the request lifecycle.
- The Natural Language framework provides named entity recognition, language detection, and POS tagging out of the box, with NLModel for custom text classifiers.
- Create ML trains production-quality image and text classifiers directly in Xcode — no Python required.
- Configure MLModelConfiguration.computeUnits deliberately: .all in production, .cpuAndNeuralEngine if GPU memory pressure is a concern. Load your MLModel once and reuse it.
- For generative language tasks, Core ML is not the right tool — see Apple's Foundation Models Framework.
Core ML gives you a reliable, high-performance inference runtime. The quality of your results, however, depends heavily on how well your model is defined and how its inputs are prepared — which connects directly to prompt and input design. See Designing Prompts for On-Device AI for the patterns that apply when Foundation Models is in the loop alongside Core ML.