Create ML: Training Custom On-Device Models Without a Data Science Background
You have hundreds of user-submitted photos that need classifying, review text that needs sentiment scoring, or ambient sounds your app should recognize — and you do not have a machine learning team. Core ML handles inference beautifully, but where do the models come from? Apple’s answer is Create ML: a framework (and companion macOS app) that lets you train production-quality models using transfer learning, without writing a single line of Python.
This post covers training image classifiers, text classifiers, and sound classifiers with Create ML’s Swift API and the Create ML app. We will not cover Core ML model integration (that is its own dedicated post) or Apple’s Foundation Models framework for generative AI.
Contents
- The Problem
- Create ML at a Glance
- Training an Image Classifier
- Training a Text Classifier
- Training a Sound Classifier
- Advanced Usage
- Performance Considerations
- When to Use (and When Not To)
- Summary
The Problem
Suppose you are building an app that lets users catalog their Pixar movie poster collection. They snap a photo, and the app should recognize which film it belongs to — Toy Story, Finding Nemo, Inside Out, and so on. You could call a cloud vision API, but that means network latency, per-request cost, and sending user photos off-device.
A naive approach might look like this:
import UIKit
// The decoded response from the hypothetical cloud endpoint
struct ClassificationResult: Decodable { let label: String }
func classifyPoster(_ image: UIImage) async throws -> String {
// Ship the image to a cloud endpoint
let url = URL(string: "https://api.example.com/classify")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.httpBody = image.jpegData(compressionQuality: 0.8)
let (data, _) = try await URLSession.shared.data(for: request)
let result = try JSONDecoder().decode(ClassificationResult.self, from: data)
return result.label // "Toy Story", "Finding Nemo", etc.
}
This works, but it requires a backend, ongoing server cost, internet connectivity, and raises privacy concerns. You want the classification to happen entirely on-device, with a model you trained yourself from a folder of labeled images. That is exactly what Create ML provides.
Create ML at a Glance
Create ML is Apple’s framework for training machine learning models on macOS. It ships with two interfaces:
- Create ML app — A visual tool bundled with Xcode (open it via Xcode > Open Developer Tool > Create ML). Drag and drop your data, pick a model type, hit Train, and export a .mlmodel file.
- Create ML Swift API — A programmatic interface for scripting training pipelines. Run it in a Swift Playground, a macOS command-line tool, or a test target.
Both produce the same output: a compiled .mlmodel file that you drop into your Xcode project and use via Core ML.
Under the hood, Create ML uses transfer learning — it takes a pre-trained base model (already skilled at recognizing general features like edges, textures, and shapes) and fine-tunes the final layers on your specific data. This is why you can train a useful image classifier with as few as 10 images per category.
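To make the idea concrete, here is a toy sketch of the same principle in plain Swift: the "base model" plays the role of a frozen feature extractor, and training only fits a lightweight classifier (here, per-label centroids) on top. Everything in this sketch is illustrative; Create ML's real base models produce high-dimensional embeddings and train a proper classifier head.

```swift
/// A toy stand-in for transfer learning: the features are assumed to come
/// from a frozen extractor, and only the per-label centroids (the "final
/// layer") are learned from your labeled data.
struct CentroidClassifier {
    private var centroids: [String: [Double]] = [:]

    /// "Fine-tuning": compute one mean feature vector per label.
    mutating func train(_ samples: [(features: [Double], label: String)]) {
        let grouped = Dictionary(grouping: samples, by: { $0.label })
        for (label, group) in grouped {
            let dim = group[0].features.count
            var mean = [Double](repeating: 0, count: dim)
            for sample in group {
                for i in 0..<dim { mean[i] += sample.features[i] }
            }
            centroids[label] = mean.map { $0 / Double(group.count) }
        }
    }

    /// Prediction: pick the label whose centroid is nearest.
    func predict(_ features: [Double]) -> String? {
        centroids.min(by: {
            distance($0.value, features) < distance($1.value, features)
        })?.key
    }

    private func distance(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).reduce(0) { $0 + ($1.0 - $1.1) * ($1.0 - $1.1) }
    }
}
```

Because only the centroids are stored, the "model" stays tiny no matter how large the frozen extractor is, which mirrors why exported Create ML models are so small.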
Note: Create ML training runs on macOS only. The resulting .mlmodel file deploys to iOS, iPadOS, watchOS, tvOS, visionOS, and macOS.
Training an Image Classifier
Let us train a model that identifies Pixar movie posters. The workflow has three phases: organize data, train, and export.
Organizing Training Data
Create ML expects a folder structure where each subfolder name is a label:
PixarPosters/
├── Training/
│ ├── ToyStory/
│ │ ├── poster_001.jpg
│ │ ├── poster_002.jpg
│ │ └── ... (10+ images)
│ ├── FindingNemo/
│ │ └── ...
│ ├── InsideOut/
│ │ └── ...
│ └── Coco/
│ └── ...
└── Testing/
├── ToyStory/
│ └── ...
├── FindingNemo/
│ └── ...
└── ...
Aim for at least 10 images per category in the training set and 2-5 in the testing set. More data generally means better accuracy, but transfer learning is remarkably effective even with small datasets.
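Before kicking off a training run, it can help to verify the folder layout programmatically. Below is a minimal sketch using Foundation; the helper name, the image extensions, and the threshold are my own choices, not part of Create ML:

```swift
import Foundation

/// Counts image files per label subfolder under `root` and warns about
/// labels that fall below a minimum sample count. Illustrative sketch:
/// adjust the extensions and threshold for your own dataset.
func validateDataset(at root: URL, minimumPerLabel: Int = 10) throws -> [String: Int] {
    let fm = FileManager.default
    let imageExtensions: Set<String> = ["jpg", "jpeg", "png", "heic"]
    var counts: [String: Int] = [:]
    for entry in try fm.contentsOfDirectory(at: root, includingPropertiesForKeys: nil) {
        // Each label is a subdirectory; skip stray files at the top level
        var isDirectory: ObjCBool = false
        guard fm.fileExists(atPath: entry.path, isDirectory: &isDirectory),
              isDirectory.boolValue else { continue }
        let images = try fm.contentsOfDirectory(at: entry, includingPropertiesForKeys: nil)
            .filter { imageExtensions.contains($0.pathExtension.lowercased()) }
        counts[entry.lastPathComponent] = images.count
        if images.count < minimumPerLabel {
            print("Warning: \(entry.lastPathComponent) has only \(images.count) images")
        }
    }
    return counts
}
```

Running this against the Training folder before calling MLImageClassifier catches empty or misnamed label folders early, which is the most common cause of confusing training failures.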
Training with the Swift API
Create a macOS command-line target or Swift Playground and use
MLImageClassifier:
import CreateML
import Foundation
// Point to the organized folder structure
let trainingDir = URL(fileURLWithPath: "/Users/you/PixarPosters/Training")
let testingDir = URL(fileURLWithPath: "/Users/you/PixarPosters/Testing")
// Configure the training parameters
let parameters = MLImageClassifier.ModelParameters(
maxIterations: 25,
augmentation: [.crop, .blur, .exposure, .flip, .rotation]
)
// Train — this blocks until training completes
let classifier = try MLImageClassifier(
trainingData: .labeledDirectories(at: trainingDir),
parameters: parameters
)
// Evaluate against the held-out test set
let evaluation = classifier.evaluation(
on: .labeledDirectories(at: testingDir)
)
print("Training error: \(classifier.trainingMetrics.classificationError)")
print("Validation error: \(classifier.validationMetrics.classificationError)")
print("Test error: \(evaluation.classificationError)")
// Export the compiled model
let modelURL = URL(fileURLWithPath: "/Users/you/PixarPosterClassifier.mlmodel")
try classifier.write(to: modelURL)
The augmentation parameter tells Create ML to generate synthetic variations of your training images (cropping,
blurring, flipping, rotating). This dramatically improves generalization when your dataset is small.
Tip: The training process prints progress to the console. On an M1 Mac or later, expect a 5-category image classifier with 50 images per category to train in under two minutes.
Using the Trained Model
Once you drop PixarPosterClassifier.mlmodel into your Xcode project, Xcode auto-generates a Swift class. Inference is
straightforward with the Vision framework:
import Vision
import UIKit
enum ClassificationError: Error { case invalidImage }
func classifyPoster(_ image: UIImage) throws -> String {
guard let cgImage = image.cgImage else {
throw ClassificationError.invalidImage
}
let model = try PixarPosterClassifier(configuration: .init())
let visionModel = try VNCoreMLModel(for: model.model)
var bestLabel = "Unknown"
let request = VNCoreMLRequest(model: visionModel) { request, _ in
guard let results = request.results as? [VNClassificationObservation],
let topResult = results.first else { return }
bestLabel = topResult.identifier // "ToyStory", "FindingNemo", etc.
}
// perform(_:) runs the request synchronously, so bestLabel is set before returning
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
return bestLabel
}
No network calls. No server. Classification happens in milliseconds on the Neural Engine.
Training a Text Classifier
Create ML is not limited to images. Suppose your Pixar fan app has a reviews section and you want to automatically flag the sentiment of each review — positive, negative, or neutral.
Preparing Text Data
Text classifiers accept a JSON or CSV file with text and label columns:
[
{ "text": "Toy Story 3 made me cry. Beautiful.", "label": "positive" },
{ "text": "Cars 2 was disappointing.", "label": "negative" },
{ "text": "The animation in Coco was impressive.", "label": "neutral" },
{ "text": "Inside Out changed how I think.", "label": "positive" },
{ "text": "Brave felt generic, not Pixar.", "label": "negative" }
]
You need at least 100 labeled examples for reasonable accuracy. For production use, aim for 500+ per category.
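If you assemble that JSON programmatically, a small Codable model keeps the column names consistent between data generation and training. A sketch, assuming the column names from the example above; the ReviewSample type and helper are mine, not part of Create ML:

```swift
import Foundation

/// One labeled example. The "text" and "label" keys must match the
/// textColumn/labelColumn arguments you pass to MLTextClassifier.
struct ReviewSample: Codable {
    let text: String
    let label: String
}

/// Writes samples in the JSON shape Create ML reads, and returns how many
/// examples each label has so class imbalance is visible before training.
func writeTrainingFile(_ samples: [ReviewSample], to url: URL) throws -> [String: Int] {
    try JSONEncoder().encode(samples).write(to: url)
    return samples.reduce(into: [:]) { counts, sample in
        counts[sample.label, default: 0] += 1
    }
}
```

Checking the returned per-label counts against the 500-per-category guideline above is a quick sanity gate before handing the file to MLDataTable.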
Training the Classifier
Use MLTextClassifier to train:
import CreateML
import Foundation
let trainingFile = URL(fileURLWithPath: "/Users/you/reviews_training.json")
let testingFile = URL(fileURLWithPath: "/Users/you/reviews_testing.json")
let trainingData = try MLDataTable(contentsOf: trainingFile)
let testingData = try MLDataTable(contentsOf: testingFile)
let classifier = try MLTextClassifier(
trainingData: trainingData,
textColumn: "text",
labelColumn: "label"
)
let metrics = classifier.evaluation(
on: testingData,
textColumn: "text",
labelColumn: "label"
)
print("Evaluation accuracy: \(1.0 - metrics.classificationError)")
let modelURL = URL(fileURLWithPath: "/Users/you/ReviewSentiment.mlmodel")
try classifier.write(to: modelURL)
At inference time, the generated ReviewSentiment class exposes a simple prediction(text:) method:
import CoreML
let model = try ReviewSentiment(configuration: .init())
let prediction = try model.prediction(
text: "Up is the most heartwarming Pixar film ever made."
)
print(prediction.label) // "positive"
Tip: Create ML’s text classifier uses transfer learning on top of Apple’s built-in word embeddings. You do not need to handle tokenization, stemming, or any NLP preprocessing — the framework handles it.
Training a Sound Classifier
Your Pixar fan app could also identify iconic movie sounds — Buzz Lightyear’s laser, Nemo’s bubbles, WALL-E’s boot-up
chirp. MLSoundClassifier works the same way.
Preparing Audio Data
Organize audio clips the same way you organize images — one folder per label:
PixarSounds/
├── Training/
│ ├── BuzzLaser/
│ │ ├── clip_001.wav
│ │ └── ...
│ ├── NemoBubbles/
│ │ └── ...
│ └── WallEBoot/
│ └── ...
└── Testing/
└── ...
Each clip should be 1-10 seconds long. Create ML resamples audio to 16 kHz mono internally.
Training and Exporting
import CreateML
import Foundation
let trainingDir = URL(
fileURLWithPath: "/Users/you/PixarSounds/Training"
)
let testingDir = URL(
fileURLWithPath: "/Users/you/PixarSounds/Testing"
)
let parameters = MLSoundClassifier.ModelParameters(
maxIterations: 50
)
let classifier = try MLSoundClassifier(
trainingData: .labeledDirectories(at: trainingDir),
parameters: parameters
)
let evaluation = classifier.evaluation(
on: .labeledDirectories(at: testingDir)
)
print("Classification error: \(evaluation.classificationError)")
try classifier.write(
to: URL(fileURLWithPath: "/Users/you/PixarSoundClassifier.mlmodel")
)
At runtime, pair the compiled model with
SNClassifySoundRequest from the
SoundAnalysis framework for real-time audio classification from the microphone or an audio file.
Advanced Usage
Custom Training with MLJob for Async Workflows
For long-running training sessions, MLImageClassifier.train returns an
MLJob that publishes progress via Combine. This is
especially useful in macOS apps that wrap a training UI:
import CreateML
import Combine
var cancellables = Set<AnyCancellable>()
let job = try MLImageClassifier.train(
trainingData: .labeledDirectories(at: trainingDir),
parameters: parameters
)
job.progress
.publisher(for: \.fractionCompleted)
.sink { fraction in
print("Training progress: \(Int(fraction * 100))%")
}
.store(in: &cancellables)
job.result
.sink(receiveCompletion: { completion in
if case .failure(let error) = completion {
print("Training failed: \(error)")
}
}, receiveValue: { classifier in
// modelURL as defined in the earlier image classifier example
try? classifier.write(to: modelURL)
print("Model saved successfully")
})
.store(in: &cancellables)
Warning: MLJob emits on background threads. If you are updating UI, dispatch back to the main actor.
Model Updates with MLUpdateTask
Starting with iOS 13, Core ML supports on-device model personalization. You can ship a base model trained with Create ML, then fine-tune it on the user’s device with their personal data. This is powerful for apps where each user’s classification needs diverge — say, a Pixar memorabilia collector whose personal items the base model has never seen.
The workflow involves marking specific layers as updatable when exporting from Create ML (or using coremltools in
Python), then running MLUpdateTask on-device:
import CoreML
let updatableModelURL = Bundle.main.url(
forResource: "PixarPosterClassifier",
withExtension: "mlmodelc"
)!
// userImages and userLabels: the user's collected samples (not shown here)
let trainingData = try MLArrayBatchProvider(dictionary: [
"image": userImages,
"label": userLabels
])
let updateTask = try MLUpdateTask(
forModelAt: updatableModelURL,
trainingData: trainingData,
configuration: nil,
completionHandler: { context in
let fileManager = FileManager.default
let appSupportURL = fileManager.urls(
for: .applicationSupportDirectory,
in: .userDomainMask
).first!
let personalizedModelURL = appSupportURL
.appendingPathComponent("PersonalizedClassifier.mlmodelc")
try? context.model.write(to: personalizedModelURL)
}
)
updateTask.resume()
Note: On-device updates use the Neural Engine and typically complete in seconds for small datasets. The original model in the app bundle is never modified — the personalized version is stored separately.
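Because the personalized copy lives in Application Support and the base model in the bundle, model loading reduces to a fallback check at launch. A minimal sketch; the helper is mine, with file names carried over from the example above:

```swift
import Foundation

/// Prefer the personalized model written by MLUpdateTask if one exists;
/// otherwise fall back to the base model shipped in the app bundle.
func resolveModelURL(personalized: URL, bundled: URL) -> URL {
    FileManager.default.fileExists(atPath: personalized.path) ? personalized : bundled
}
```

Pass the resolved URL to MLModel(contentsOf:) when the app starts, so a freshly personalized model takes effect the next time the classifier loads.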
Feature Extraction for Custom Pipelines
Sometimes you do not want an end-to-end classifier. You want embeddings — feature vectors — that you feed into your own logic. MLImageClassifier.FeatureExtractorType controls this:
let parameters = MLImageClassifier.ModelParameters(
featureExtractor: .scenePrint(revision: 2),
maxIterations: 25
)
scenePrint is Apple’s built-in feature extractor optimized for scene and object recognition. Revision 2 (available
since macOS 12) provides improved embeddings and is the recommended choice for most image classification tasks.
Performance Considerations
Training Time
Training time depends on three factors: dataset size, model type, and hardware. Here are rough benchmarks on an M2 MacBook Pro:
| Task | Dataset Size | Training Time | Model Size |
|---|---|---|---|
| Image classifier | 500 images, 5 labels | ~90 seconds | ~17 KB |
| Text classifier | 2,000 samples | ~30 seconds | ~200 KB |
| Sound classifier | 300 clips, 5 labels | ~3 minutes | ~500 KB |
The exported .mlmodel files are remarkably small because Create ML uses transfer learning — only the fine-tuned layers
are stored, not the full base model. The base model ships as part of the OS.
Inference Time
On-device inference with Core ML uses the Neural Engine (on A11 or later and all Apple Silicon Macs). For image classification, expect sub-10ms inference per image. Text classification is even faster at sub-1ms. Sound classification operates in real time on streaming audio.
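To check those figures against your own model, average the latency over repeated predictions rather than timing a single call. A generic sketch using the standard library's ContinuousClock (Swift 5.7+); the closure would wrap your model's prediction call, ideally after one warm-up pass to absorb one-time model-loading cost:

```swift
/// Runs `work` repeatedly and returns the average wall-clock time in
/// milliseconds. Illustrative harness, not part of Core ML.
func averageLatencyMS(iterations: Int, _ work: () throws -> Void) rethrows -> Double {
    let clock = ContinuousClock()
    var total = Duration.zero
    for _ in 0..<iterations {
        // measure(_:) returns the elapsed Duration for one invocation
        total += try clock.measure { try work() }
    }
    // Duration decomposes into whole seconds plus attoseconds (1e-18 s)
    let (seconds, attoseconds) = total.components
    let totalMS = Double(seconds) * 1000 + Double(attoseconds) / 1e15
    return totalMS / Double(iterations)
}
```

Averaging over, say, 100 iterations smooths out scheduler noise and makes the sub-10ms claim easy to verify on your target hardware.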
Apple Docs: MLImageClassifier — Create ML
Model Size vs. Accuracy Trade-Off
Create ML supports different feature extractors with different accuracy/size trade-offs. For image classification, you
can choose between scenePrint (smaller, faster, good for most tasks) and full neural network transfer learning
(larger, more accurate for complex tasks). Start with the defaults and only increase complexity if your validation
metrics justify it.
Tip: Use the Preview tab in the Create ML app to test your model with drag-and-drop images before exporting. This saves the round-trip of integrating into your Xcode project just to check accuracy.
When to Use (and When Not To)
| Scenario | Recommendation |
|---|---|
| Classifying images into known categories | Create ML image classifier is ideal. Transfer learning excels here. |
| Sentiment or topic classification on text | Create ML text classifier works well. For granular NLP, see the Natural Language framework. |
| Recognizing specific sounds | Create ML sound classifier. For general audio, consider SoundAnalysis built-in classifiers first. |
| Generating text, images, or creative content | Not a Create ML use case. See Foundation Models. |
| Object detection with bounding boxes | Create ML supports MLObjectDetector, but you need bounding-box annotations. Consider Vision’s built-in detectors first. |
| Fewer than 10 samples per category | Results will be unreliable. Gather more data or use Apple’s built-in classifiers. |
| Real-time on-device training | Use MLUpdateTask for personalization. Full training still requires macOS. |
Tip: Before training a custom model, check if Apple already provides a built-in classifier for your use case. The Vision framework includes animal, plant, food, and scene classifiers out of the box. The SoundAnalysis framework ships with a pre-trained classifier covering 300+ everyday sounds.
Summary
- Create ML lets you train image, text, and sound classifiers on macOS using transfer learning — no Python, no data science background required.
- The Create ML app provides a visual training interface; the Swift API enables scripted and automated pipelines.
- Transfer learning means small datasets (as few as 10 images per category) can produce surprisingly accurate models.
- Exported .mlmodel files are compact and run on the Neural Engine, delivering sub-10ms image classification on modern devices.
- On-device model updates via MLUpdateTask let you personalize models with user data without shipping it to a server.
If your classification needs go beyond static labels and into generative AI territory, explore Apple’s Foundation Models Framework for on-device LLM capabilities. For richer text analysis beyond classification, the Natural Language Framework offers tokenization, entity recognition, and sentiment analysis out of the box.