Create ML: Training Custom On-Device Models Without a Data Science Background


You have hundreds of user-submitted photos that need classifying, review text that needs sentiment scoring, or ambient sounds your app should recognize — and you do not have a machine learning team. Core ML handles inference beautifully, but where do the models come from? Apple’s answer is Create ML: a framework (and companion macOS app) that lets you train production-quality models using transfer learning, without writing a single line of Python.

This post covers training image classifiers, text classifiers, and sound classifiers with Create ML’s Swift API and the Create ML app. We will not cover Core ML model integration (that is its own dedicated post) or Apple’s Foundation Models framework for generative AI.

The Problem

Suppose you are building an app that lets users catalog their Pixar movie poster collection. They snap a photo, and the app should recognize which film it belongs to — Toy Story, Finding Nemo, Inside Out, and so on. You could call a cloud vision API, but that means network latency, per-request cost, and sending user photos off-device.

A naive approach might look like this:

import UIKit

func classifyPoster(_ image: UIImage) async throws -> String {
    // Ship the image to a cloud endpoint
    let url = URL(string: "https://api.example.com/classify")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.httpBody = image.jpegData(compressionQuality: 0.8)

    let (data, _) = try await URLSession.shared.data(for: request)
    let result = try JSONDecoder().decode(ClassificationResult.self, from: data)
    return result.label // "Toy Story", "Finding Nemo", etc.
}

This works, but it requires a backend, ongoing server cost, internet connectivity, and raises privacy concerns. You want the classification to happen entirely on-device, with a model you trained yourself from a folder of labeled images. That is exactly what Create ML provides.

Create ML at a Glance

Create ML is Apple’s framework for training machine learning models on macOS. It ships with two interfaces:

  1. Create ML app — A visual tool bundled with Xcode (open it via Xcode > Open Developer Tool > Create ML). Drag-and-drop your data, pick a model type, hit Train, and export a .mlmodel file.
  2. Create ML Swift API — A programmatic interface for scripting training pipelines. Run it in a Swift Playground, a macOS command-line tool, or a test target.

Both produce the same output: a compiled .mlmodel file that you drop into your Xcode project and use via Core ML.

Under the hood, Create ML uses transfer learning — it takes a pre-trained base model (already skilled at recognizing general features like edges, textures, and shapes) and fine-tunes the final layers on your specific data. This is why you can train a useful image classifier with as few as 10 images per category.

Note: Create ML training runs on macOS only. The resulting .mlmodel file deploys to iOS, iPadOS, watchOS, tvOS, visionOS, and macOS.

Training an Image Classifier

Let us train a model that identifies Pixar movie posters. The workflow has three phases: organize data, train, and export.

Organizing Training Data

Create ML expects a folder structure where each subfolder name is a label:

PixarPosters/
├── Training/
│   ├── ToyStory/
│   │   ├── poster_001.jpg
│   │   ├── poster_002.jpg
│   │   └── ... (10+ images)
│   ├── FindingNemo/
│   │   └── ...
│   ├── InsideOut/
│   │   └── ...
│   └── Coco/
│       └── ...
└── Testing/
    ├── ToyStory/
    │   └── ...
    ├── FindingNemo/
    │   └── ...
    └── ...

Aim for at least 10 images per category in the training set and 2-5 in the testing set. More data generally means better accuracy, but transfer learning is remarkably effective even with small datasets.
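
Before kicking off a training run, it can be worth sanity-checking the folder layout. Here is a minimal Foundation sketch (using the hypothetical PixarPosters path from above) that counts images per label and flags under-populated categories:

```swift
import Foundation

// Count training images per label to catch under-populated categories
// before spending time on a training run.
let trainingDir = URL(fileURLWithPath: "/Users/you/PixarPosters/Training")
let fileManager = FileManager.default

let labelDirs = try fileManager.contentsOfDirectory(
    at: trainingDir,
    includingPropertiesForKeys: [.isDirectoryKey]
).filter { (try? $0.resourceValues(forKeys: [.isDirectoryKey]).isDirectory) == true }

for labelDir in labelDirs.sorted(by: { $0.lastPathComponent < $1.lastPathComponent }) {
    let images = try fileManager.contentsOfDirectory(at: labelDir, includingPropertiesForKeys: nil)
        .filter { ["jpg", "jpeg", "png", "heic"].contains($0.pathExtension.lowercased()) }
    let warning = images.count < 10 ? "  <- fewer than 10, add more" : ""
    print("\(labelDir.lastPathComponent): \(images.count) images\(warning)")
}
```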

Training with the Swift API

Create a macOS command-line target or Swift Playground and use MLImageClassifier:

import CreateML
import Foundation

// Point to the organized folder structure
let trainingDir = URL(fileURLWithPath: "/Users/you/PixarPosters/Training")
let testingDir = URL(fileURLWithPath: "/Users/you/PixarPosters/Testing")

// Configure the training parameters
let parameters = MLImageClassifier.ModelParameters(
    maxIterations: 25,
    augmentation: [.crop, .blur, .exposure, .flip, .rotation]
)

// Train — this blocks until training completes
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingDir),
    parameters: parameters
)

// Evaluate against the held-out test set
let evaluation = classifier.evaluation(
    on: .labeledDirectories(at: testingDir)
)
print("Training error: \(classifier.trainingMetrics.classificationError)")
print("Validation error: \(classifier.validationMetrics.classificationError)")
print("Evaluation error: \(evaluation.classificationError)")

// Export the compiled model
let modelURL = URL(fileURLWithPath: "/Users/you/PixarPosterClassifier.mlmodel")
try classifier.write(to: modelURL)

The augmentation parameter tells Create ML to generate synthetic variations of your training images (cropping, blurring, flipping, rotating). This dramatically improves generalization when your dataset is small.

Tip: The training process prints progress to the console. On an M1 Mac or later, expect a 5-category image classifier with 50 images per category to train in under two minutes.

Using the Trained Model

Once you drop PixarPosterClassifier.mlmodel into your Xcode project, Xcode auto-generates a Swift class. Inference is straightforward with the Vision framework:

import Vision
import UIKit

func classifyPoster(_ image: UIImage) throws -> String {
    guard let cgImage = image.cgImage else {
        throw ClassificationError.invalidImage
    }

    let model = try PixarPosterClassifier(configuration: .init())
    let visionModel = try VNCoreMLModel(for: model.model)

    var bestLabel = "Unknown"
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let topResult = results.first else { return }
        bestLabel = topResult.identifier // "ToyStory", "FindingNemo", etc.
    }

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return bestLabel
}

No network calls. No server. Classification happens in milliseconds on the Neural Engine.

Training a Text Classifier

Create ML is not limited to images. Suppose your Pixar fan app has a reviews section and you want to automatically flag the sentiment of each review — positive, negative, or neutral.

Preparing Text Data

Text classifiers accept a JSON or CSV file with text and label columns:

[
  { "text": "Toy Story 3 made me cry. Beautiful.", "label": "positive" },
  { "text": "Cars 2 was disappointing.", "label": "negative" },
  { "text": "The animation in Coco was impressive.", "label": "neutral" },
  { "text": "Inside Out changed how I think.", "label": "positive" },
  { "text": "Brave felt generic, not Pixar.", "label": "negative" }
]

You need at least 100 labeled examples for reasonable accuracy. For production use, aim for 500+ per category.

Training the Classifier

Use MLTextClassifier to train:

import CreateML
import Foundation

let trainingFile = URL(fileURLWithPath: "/Users/you/reviews_training.json")
let testingFile = URL(fileURLWithPath: "/Users/you/reviews_testing.json")

let trainingData = try MLDataTable(contentsOf: trainingFile)
let testingData = try MLDataTable(contentsOf: testingFile)

let classifier = try MLTextClassifier(
    trainingData: trainingData,
    textColumn: "text",
    labelColumn: "label"
)

let metrics = classifier.evaluation(
    on: testingData,
    textColumn: "text",
    labelColumn: "label"
)
print("Evaluation accuracy: \(1.0 - metrics.classificationError)")

let modelURL = URL(fileURLWithPath: "/Users/you/ReviewSentiment.mlmodel")
try classifier.write(to: modelURL)
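
If all your labeled reviews live in a single file, you do not need to maintain separate training and testing files by hand — MLDataTable can split one table for you. A sketch (the file path is hypothetical):

```swift
import CreateML
import Foundation

// Load every labeled review from one JSON file.
let allReviews = try MLDataTable(
    contentsOf: URL(fileURLWithPath: "/Users/you/reviews_all.json")
)

// Hold out 20% of rows for evaluation; a fixed seed keeps the split reproducible.
let (trainingData, testingData) = allReviews.randomSplit(by: 0.8, seed: 42)

let classifier = try MLTextClassifier(
    trainingData: trainingData,
    textColumn: "text",
    labelColumn: "label"
)
```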

At inference time, the generated ReviewSentiment class exposes a simple prediction(text:) method:

import CoreML

let model = try ReviewSentiment(configuration: .init())
let prediction = try model.prediction(
    text: "Up is the most heartwarming Pixar film ever made."
)
print(prediction.label) // "positive"

Tip: Create ML’s text classifier uses transfer learning on top of Apple’s built-in word embeddings. You do not need to handle tokenization, stemming, or any NLP preprocessing — the framework handles it.

Training a Sound Classifier

Your Pixar fan app could also identify iconic movie sounds — Buzz Lightyear’s laser, Nemo’s bubbles, WALL-E’s boot-up chirp. MLSoundClassifier works the same way.

Preparing Audio Data

Organize audio clips the same way you organize images — one folder per label:

PixarSounds/
├── Training/
│   ├── BuzzLaser/
│   │   ├── clip_001.wav
│   │   └── ...
│   ├── NemoBubbles/
│   │   └── ...
│   └── WallEBoot/
│       └── ...
└── Testing/
    └── ...

Each clip should be 1-10 seconds long. Create ML resamples audio to 16 kHz mono internally.

Training and Exporting

import CreateML
import Foundation

let trainingDir = URL(
    fileURLWithPath: "/Users/you/PixarSounds/Training"
)
let testingDir = URL(
    fileURLWithPath: "/Users/you/PixarSounds/Testing"
)

let parameters = MLSoundClassifier.ModelParameters(
    maxIterations: 50
)

let classifier = try MLSoundClassifier(
    trainingData: .labeledDirectories(at: trainingDir),
    parameters: parameters
)

let evaluation = classifier.evaluation(
    on: .labeledDirectories(at: testingDir)
)
print("Classification error: \(evaluation.classificationError)")

try classifier.write(
    to: URL(fileURLWithPath: "/Users/you/PixarSoundClassifier.mlmodel")
)

At runtime, pair the compiled model with SNClassifySoundRequest from the SoundAnalysis framework for real-time audio classification from the microphone or an audio file.
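
As a sketch of what that pairing looks like for a single audio file (the model class and file path are assumed from the example above):

```swift
import SoundAnalysis
import CoreML

// Observer that receives classification results as analysis proceeds.
final class SoundObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("\(top.identifier) (\(Int(top.confidence * 100))%)")
    }
}

// Wrap the Create ML-trained model in a sound classification request.
let model = try PixarSoundClassifier(configuration: .init()).model
let request = try SNClassifySoundRequest(mlModel: model)

let analyzer = try SNAudioFileAnalyzer(
    url: URL(fileURLWithPath: "/Users/you/clip.wav")
)
let observer = SoundObserver() // keep a strong reference while analyzing
try analyzer.add(request, withObserver: observer)
analyzer.analyze() // blocks until the file has been fully analyzed
```

For live microphone input, the same request plugs into SNAudioStreamAnalyzer, fed with buffers from an AVAudioEngine tap.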

Advanced Usage

Custom Training with MLJob for Async Workflows

For long-running training sessions, MLImageClassifier.train returns an MLJob that publishes progress via Combine. This is especially useful in macOS apps that wrap a training UI:

import CreateML
import Combine

var cancellables = Set<AnyCancellable>()

let job = try MLImageClassifier.train(
    trainingData: .labeledDirectories(at: trainingDir),
    parameters: parameters
)

job.progress
    .publisher(for: \.fractionCompleted)
    .sink { fraction in
        print("Training progress: \(Int(fraction * 100))%")
    }
    .store(in: &cancellables)

job.result
    .sink(receiveCompletion: { completion in
        if case .failure(let error) = completion {
            print("Training failed: \(error)")
        }
    }, receiveValue: { classifier in
        let modelURL = URL(fileURLWithPath: "/Users/you/PixarPosterClassifier.mlmodel")
        try? classifier.write(to: modelURL)
        print("Model saved successfully")
    })
    .store(in: &cancellables)

Warning: MLJob emits on background threads. If you are updating UI, dispatch back to the main actor.
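
One way to keep UI updates safe is to hop to the main queue inside the Combine pipeline. A short sketch, where `progressBar` is a hypothetical NSProgressIndicator:

```swift
job.progress
    .publisher(for: \.fractionCompleted)
    .receive(on: DispatchQueue.main) // deliver on the main thread for UI work
    .sink { fraction in
        progressBar.doubleValue = fraction
    }
    .store(in: &cancellables)
```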

Model Updates with MLUpdateTask

Starting with iOS 13, Core ML supports on-device model personalization. You can ship a base model trained with Create ML, then fine-tune it on the user’s device with their personal data. This is powerful for apps where each user’s classification needs diverge — say, a Pixar memorabilia collector whose personal items the base model has never seen.

The workflow involves marking specific layers as updatable when exporting from Create ML (or using coremltools in Python), then running MLUpdateTask on-device:

import CoreML

let updatableModelURL = Bundle.main.url(
    forResource: "PixarPosterClassifier",
    withExtension: "mlmodelc"
)!

// `userImages` and `userLabels` stand in for arrays of MLFeatureValue
// built from the user's own photos and their labels.
let trainingData = try MLArrayBatchProvider(dictionary: [
    "image": userImages,
    "label": userLabels
])

let updateTask = try MLUpdateTask(
    forModelAt: updatableModelURL,
    trainingData: trainingData,
    configuration: nil,
    completionHandler: { context in
        let fileManager = FileManager.default
        let appSupportURL = fileManager.urls(
            for: .applicationSupportDirectory,
            in: .userDomainMask
        ).first!
        let personalizedModelURL = appSupportURL
            .appendingPathComponent("PersonalizedClassifier.mlmodelc")
        try? context.model.write(to: personalizedModelURL)
    }
)

updateTask.resume()

Note: On-device updates use the Neural Engine and typically complete in seconds for small datasets. The original model in the app bundle is never modified — the personalized version is stored separately.
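
On subsequent launches, prefer the personalized model when one has been saved and fall back to the bundled model otherwise. A minimal sketch, with paths matching the update handler above:

```swift
import CoreML
import Foundation

func loadClassifier() throws -> MLModel {
    let fileManager = FileManager.default
    let appSupportURL = fileManager.urls(
        for: .applicationSupportDirectory,
        in: .userDomainMask
    ).first!
    let personalizedURL = appSupportURL
        .appendingPathComponent("PersonalizedClassifier.mlmodelc")

    // Use the personalized model if an update has been saved; otherwise
    // fall back to the compiled model shipped in the app bundle.
    if fileManager.fileExists(atPath: personalizedURL.path) {
        return try MLModel(contentsOf: personalizedURL)
    }
    let bundledURL = Bundle.main.url(
        forResource: "PixarPosterClassifier",
        withExtension: "mlmodelc"
    )!
    return try MLModel(contentsOf: bundledURL)
}
```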

Feature Extraction for Custom Pipelines

Sometimes you do not want an end-to-end classifier. You want embeddings — feature vectors — that you feed into your own logic. MLImageClassifier.FeatureExtractorType controls this:

let parameters = MLImageClassifier.ModelParameters(
    featureExtractor: .scenePrint(revision: 2),
    maxIterations: 25
)

scenePrint is Apple’s built-in feature extractor, optimized for scene and object recognition. Revision 2, available in newer macOS releases, provides improved embeddings and is the recommended choice for most image classification tasks.

Performance Considerations

Training Time

Training time depends on three factors: dataset size, model type, and hardware. Here are rough benchmarks on an M2 MacBook Pro:

| Task | Dataset Size | Training Time | Model Size |
| --- | --- | --- | --- |
| Image classifier | 500 images, 5 labels | ~90 seconds | ~17 KB |
| Text classifier | 2,000 samples | ~30 seconds | ~200 KB |
| Sound classifier | 300 clips, 5 labels | ~3 minutes | ~500 KB |

The exported .mlmodel files are remarkably small because Create ML uses transfer learning — only the fine-tuned layers are stored, not the full base model. The base model ships as part of the OS.

Inference Time

On-device inference with Core ML uses the Neural Engine (on A11 or later and all Apple Silicon Macs). For image classification, expect sub-10ms inference per image. Text classification is even faster at sub-1ms. Sound classification operates in real time on streaming audio.

Apple Docs: MLImageClassifier — Create ML

Model Size vs. Accuracy Trade-Off

Create ML supports different feature extractors with different accuracy/size trade-offs. For image classification, you can choose between scenePrint (smaller, faster, good for most tasks) and full neural network transfer learning (larger, more accurate for complex tasks). Start with the defaults and only increase complexity if your validation metrics justify it.

Tip: Use the Preview tab in the Create ML app to test your model with drag-and-drop images before exporting. This saves the round-trip of integrating into your Xcode project just to check accuracy.

When to Use (and When Not To)

| Scenario | Recommendation |
| --- | --- |
| Classifying images into known categories | Create ML image classifier is ideal. Transfer learning excels here. |
| Sentiment or topic classification on text | Create ML text classifier works well. For granular NLP, see the Natural Language framework. |
| Recognizing specific sounds | Create ML sound classifier. For general audio, consider SoundAnalysis built-in classifiers first. |
| Generating text, images, or creative content | Not a Create ML use case. See Foundation Models. |
| Object detection with bounding boxes | Create ML supports MLObjectDetector, but you need bounding-box annotations. Consider Vision’s built-in detectors first. |
| Fewer than 10 samples per category | Results will be unreliable. Gather more data or use Apple’s built-in classifiers. |
| Real-time on-device training | Use MLUpdateTask for personalization. Full training still requires macOS. |

Tip: Before training a custom model, check if Apple already provides a built-in classifier for your use case. The Vision framework includes animal, plant, food, and scene classifiers out of the box. The SoundAnalysis framework ships with a pre-trained classifier covering 300+ everyday sounds.

Summary

  • Create ML lets you train image, text, and sound classifiers on macOS using transfer learning — no Python, no data science background required.
  • The Create ML app provides a visual training interface; the Swift API enables scripted and automated pipelines.
  • Transfer learning means small datasets (as few as 10 images per category) can produce surprisingly accurate models.
  • Exported .mlmodel files are compact and run on the Neural Engine, delivering sub-10ms image classification on modern devices.
  • On-device model updates via MLUpdateTask let you personalize models with user data without shipping it to a server.

If your classification needs go beyond static labels and into generative AI territory, explore Apple’s Foundation Models Framework for on-device LLM capabilities. For richer text analysis beyond classification, the Natural Language Framework offers tokenization, entity recognition, and sentiment analysis out of the box.