Getting Started with visionOS: Spatial Computing Concepts for iOS Developers


Your TabView, NavigationStack, List, and Text views all run on visionOS without modification. The problem is that spatial computing adds an entirely new dimension — literally — and the mental models you use to design flat screens break down fast when the “screen” is the room itself.

This guide maps your existing SwiftUI knowledge onto visionOS’s spatial model: what a Window, Volume, and Immersive Space are, how RealityKit slots into SwiftUI, how gestures and hand tracking work in three dimensions, and how to add a visionOS destination to an existing iOS app. We won’t cover ARKit scene reconstruction or enterprise passthrough workflows — those deserve their own deep-dives.

Note: All spatial APIs discussed in this post require visionOS 1.0 or later. Conditional compilation with #if os(visionOS) lets you share code with iOS targets.

The Problem: No Screen to Design For

An iOS developer opening their first visionOS project in Xcode faces an immediate disorientation: the simulator renders an infinite room. There is no phone outline, no bezels, no clear boundary for where the UI lives.

Here is what a naive port looks like. You launch your Pixar film catalog app, and it floats in space as a flat window — which actually works fine. But the moment you try to add a 3D model of the WALL-E robot next to it, you have no idea where to put it. You try pushing it back with an offset inside a ZStack, but offset only moves views in two dimensions. You look for a “scene origin.” There isn’t one in the UIKit sense.

// ❌ This is the iOS instinct — it doesn't model spatial placement correctly
struct ContentView: View {
    var body: some View {
        ZStack {
            FilmListView()
            // Where does the 3D model go? There's no z-axis here.
            WallEModelView()
                .offset(x: 200, y: 0) // This only moves it in 2D
        }
    }
}

The confusion stems from a fundamental shift: visionOS does not have a single coordinate space tied to a screen. Instead, it has three distinct environment types, each with different rules about what lives where and how it interacts with the rest of the world.

The Three Environments

Apple Docs: Immersive experiences — visionOS

visionOS organizes app presentation into three levels of immersion. Understanding which environment to use is the first architectural decision you make for any visionOS feature.

Shared Space

The Shared Space is visionOS’s default environment, analogous to the desktop on macOS. Your app’s windows float alongside windows from other apps. The user can position them freely in the room.

This is where most of your existing SwiftUI views run unmodified. The Pixar film catalog’s list, navigation, and forms work exactly as on iOS. The trade-off is that you share the visual field with other apps and have no control over the surrounding environment.

Full Space

A Full Space hides all other apps and gives your app exclusive access to the visual field. You can render windows and 3D content anywhere, but the real world (passthrough) is still visible. This is appropriate for productivity apps or immersive storytelling experiences like a Pixar short film viewer that dims everything else.
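In SwiftUI terms, an app enters a Full Space by opening an ImmersiveSpace scene (covered in depth later in this guide); the .mixed immersion style keeps passthrough fully visible. A minimal sketch, where the scene id and placeholder views are illustrative:

```swift
import SwiftUI

@available(visionOS 1.0, *)
@main
struct ShortFilmApp: App {
    var body: some Scene {
        WindowGroup {
            Text("Pixar Shorts Catalog")
        }

        // Opening this space hides other apps, but the room stays visible
        ImmersiveSpace(id: "short-film-stage") {
            Text("Short film stage content goes here")
        }
        .immersionStyle(selection: .constant(.mixed), in: .mixed)
    }
}
```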

Immersive Space

An Immersive Space is the most extreme option: you can optionally reduce or disable passthrough entirely, transporting the user into a fully rendered environment — a virtual Pixar studio lot, for instance. This requires the most deliberate UX design and should be reserved for experiences where total immersion is the explicit goal.

Windows and Volumes

Apple Docs: WindowGroup — SwiftUI

The App protocol in visionOS uses the same scene-based architecture as macOS and iOS, but adds new scene types.

WindowGroup: Flat 2D Windows

The familiar WindowGroup creates a flat 2D window that floats in the Shared Space. Your existing SwiftUI views drop in without changes.

@main
struct PixarVisionApp: App {
    var body: some Scene {
        // Flat 2D window — works exactly like iOS/macOS
        WindowGroup {
            FilmCatalogView()
        }
    }
}

Volumetric Windows

A WindowGroup with a .volumetric style creates a three-dimensional bounding box — a “volume” — in which RealityKit entities can live. The volume has defined width, height, and depth measured in meters.

@main
struct PixarVisionApp: App {
    var body: some Scene {
        WindowGroup {
            FilmCatalogView()
        }

        // A 3D volume for RealityKit content
        WindowGroup(id: "character-viewer") {
            CharacterVolumeView()
        }
        .windowStyle(.volumetric)           // ← Enables 3D content
        .defaultSize(width: 0.4, height: 0.4, depth: 0.4, in: .meters)
    }
}

The volume acts as a self-contained 3D canvas. Entities inside it are clipped at the volume boundaries, so users can pick up and reposition the whole volume like a snow globe containing a WALL-E diorama.

@available(visionOS 1.0, *)
struct CharacterVolumeView: View {
    var body: some View {
        RealityView { content in
            // Load a 3D WALL-E model placed at the volume's origin
            if let wallE = try? await Entity(named: "WALL-E", in: .main) {
                wallE.position = [0, -0.15, 0] // Slightly below center
                content.add(wallE)
            }
        }
    }
}

Tip: Use centimeter-scale offsets (e.g., [0, -0.15, 0] for 15 cm below the origin) when positioning entities inside a volume. The default coordinate system is 1 unit = 1 meter.

SwiftUI in visionOS: What’s the Same, What’s Different

What Works Without Changes

The vast majority of SwiftUI Just Works on visionOS:

  • NavigationStack, TabView, List, Form
  • Text, Button, Toggle, Picker
  • @State, @Observable, @Environment (as well as the legacy @StateObject and @EnvironmentObject)
  • NavigationLink, sheets, alerts, and confirmations
  • Custom Shape implementations and Canvas

Ornaments

Apple Docs: Ornaments — visionOS

Ornaments are UI elements that attach to a window but float slightly in front of it in 3D space, detached from the window plane. They are the visionOS equivalent of a toolbar or bottom bar, but exist in their own spatial layer.

@available(visionOS 1.0, *)
struct FilmDetailView: View {
    let film: PixarFilm

    var body: some View {
        FilmContentView(film: film)
            .ornament(attachmentAnchor: .scene(.bottom)) {
                // This toolbar floats below the window in 3D space
                HStack(spacing: 20) {
                    Button("Watch Trailer") { /* ... */ }
                    Button("Add to Watchlist") { /* ... */ }
                }
                .padding()
                .glassBackgroundEffect() // Frosted glass material
            }
    }
}

Hover Effects

Apple Docs: hoverEffect(_:) — SwiftUI

On visionOS, the primary pointing mechanism is gaze (looking at something) combined with a pinch gesture. The .hoverEffect() modifier provides visual feedback when the user looks at an interactive element.

@available(visionOS 1.0, *)
struct FilmCardView: View {
    let film: PixarFilm

    var body: some View {
        VStack {
            AsyncImage(url: film.posterURL)
                .frame(width: 160, height: 240)
            Text(film.title)
                .font(.headline)
        }
        .padding()
        .hoverEffect(.highlight) // Subtle glow when the user looks at this card
        .onTapGesture {
            // Triggered by looking + pinching
        }
    }
}

Depth Modifiers

SwiftUI on visionOS gains a Z-axis. You can push views toward or away from the viewer using depth modifiers.

@available(visionOS 1.0, *)
struct DepthLayeredBadgeView: View {
    var body: some View {
        ZStack {
            // Background panel sits flat
            RoundedRectangle(cornerRadius: 12)
                .fill(.regularMaterial)
                .frame(width: 200, height: 120)

            // Title floats 8 points toward the viewer
            Text("Toy Story")
                .font(.title2.bold())
                .offset(z: 8)

            // Badge floats even further forward — emphasizes importance
            Image(systemName: "star.fill")
                .foregroundStyle(.yellow)
                .offset(z: 20)
        }
    }
}

RealityKit Basics: Entity-Component Architecture

Apple Docs: RealityKit — Apple Developer

RealityKit uses an Entity Component System (ECS) architecture. Unlike UIKit’s view hierarchy, 3D scenes are composed of:

  • Entities — objects in 3D space (a WALL-E model, a light source, an invisible collision shape)
  • Components — data and behavior attached to entities (ModelComponent, CollisionComponent, InputTargetComponent, PhysicsBodyComponent)

An entity is meaningless without components; components are inert without an entity to attach to. This separation makes it straightforward to add physics to an existing 3D model without changing its visual representation.
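The snippet below sketches that idea: an already-loaded model gains physics behavior purely by attaching components, leaving its ModelComponent untouched. The box shape and mass values are illustrative assumptions.

```swift
import RealityKit

@available(visionOS 1.0, *)
func addPhysics(to entity: Entity) {
    // A rough collision shape approximating the model's bounds (illustrative size)
    let shape = ShapeResource.generateBox(size: [0.2, 0.3, 0.2])
    entity.components.set(CollisionComponent(shapes: [shape]))
    // A dynamic body falls and collides under the simulated physics
    entity.components.set(PhysicsBodyComponent(shapes: [shape], mass: 1.0, mode: .dynamic))
}
```

The visual representation never changes; the entity simply gains two more components.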

RealityView: The SwiftUI Bridge

Apple Docs: RealityView — RealityKit

RealityView is the SwiftUI view type that hosts RealityKit content. It has two closures: a make closure (called once to build the initial scene) and an optional update closure (called when SwiftUI state changes).

@available(visionOS 1.0, *)
struct PixarSceneView: View {
    @State private var isAnimating = false

    var body: some View {
        RealityView { content in
            // make closure — runs once on first render
            guard let woody = try? await Entity(named: "Woody", in: .main) else {
                return
            }
            woody.position = [0, 0, -1.5]  // 1.5 meters in front
            woody.scale = [0.5, 0.5, 0.5]  // Scale to 50% of asset size
            // Targeted gestures require collision shapes and an input target
            woody.generateCollisionShapes(recursive: true)
            woody.components.set(InputTargetComponent())
            content.add(woody)
        } update: { content in
            // update closure — called when SwiftUI state changes
            guard let woody = content.entities.first else { return }
            if isAnimating {
                // Play all available animations on the entity
                woody.availableAnimations.forEach { woody.playAnimation($0) }
            }
        }
        .gesture(
            TapGesture()
                .targetedToAnyEntity()
                .onEnded { _ in
                    // Tapping directly on the entity flips the state;
                    // the update closure then starts the animations
                    isAnimating = true
                }
        )
    }
}

The .targetedToAnyEntity() modifier on the gesture is critical — it restricts the gesture to entities that can receive input and delivers the hit entity in the gesture value. Without it, taps anywhere in the view trigger the handler, and you get no entity information at all.

Custom Components

When Apple’s built-in components don’t cover your use case, you can define your own by conforming to Component. Register a custom component once (typically at app launch) with registerComponent() before attaching it to entities.

@available(visionOS 1.0, *)
struct CharacterMetadataComponent: Component {
    var filmTitle: String
    var releaseYear: Int
    var isProtagonist: Bool
}

// Attaching the custom component to an entity
extension Entity {
    func configureAsPixarCharacter(film: String, year: Int, protagonist: Bool) {
        components[CharacterMetadataComponent.self] = CharacterMetadataComponent(
            filmTitle: film,
            releaseYear: year,
            isProtagonist: protagonist
        )
    }
}
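A quick usage sketch of the registration-then-query pattern (the function names here are illustrative):

```swift
import RealityKit

// Register once, typically from your App's init, before any entity uses it
@available(visionOS 1.0, *)
func registerCustomComponents() {
    CharacterMetadataComponent.registerComponent()
}

// Reading the component back, e.g. from a tapped entity
@available(visionOS 1.0, *)
func describeCharacter(_ entity: Entity) -> String {
    guard let meta = entity.components[CharacterMetadataComponent.self] else {
        return "No metadata attached"
    }
    return "\(meta.filmTitle) (\(meta.releaseYear))"
}
```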

Spatial Gestures and Hand Tracking

Apple Docs: Spatial input — visionOS

visionOS input works through gaze targeting combined with hand gestures. The system uses eyes to determine what you’re interacting with and hands to determine how.

Standard SwiftUI Gestures on visionOS

Most SwiftUI gestures work on visionOS flat windows without changes. The .onTapGesture, DragGesture, and LongPressGesture all function as expected on 2D content.
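For example, a plain DragGesture can move a 2D poster card around its window exactly as it would on iPadOS (the view and state names are illustrative):

```swift
import SwiftUI

struct DraggablePosterView: View {
    @State private var dragOffset: CGSize = .zero

    var body: some View {
        Text("Luxo Jr.")
            .padding()
            .background(.regularMaterial, in: RoundedRectangle(cornerRadius: 12))
            .offset(dragOffset)
            .gesture(
                DragGesture()
                    .onChanged { value in dragOffset = value.translation }
                    .onEnded { _ in dragOffset = .zero } // Snap back on release
            )
    }
}
```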

Spatial Tap Gesture

For entities in a RealityView, use SpatialTapGesture to receive the 3D tap location.

@available(visionOS 1.0, *)
struct CharacterStageView: View {
    var body: some View {
        RealityView { content in
            if let buzz = try? await Entity(named: "BuzzLightyear", in: .main) {
                buzz.generateCollisionShapes(recursive: true) // Required for tap targeting
                buzz.components.set(InputTargetComponent())   // Also required for targeted gestures
                buzz.position = [0, 0, -1]
                content.add(buzz)
            }
        }
        .gesture(
            SpatialTapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    let tapLocation = value.location3D // 3D position in scene space
                    let tappedEntity = value.entity
                    print("Tapped \(tappedEntity.name) at \(tapLocation)")
                }
        )
    }
}

Warning: Entities must have both a CollisionComponent (or call .generateCollisionShapes(recursive: true)) and an InputTargetComponent before they can receive targeted gestures. Without them, a gesture using .targetedToAnyEntity() will never match the entity.

Drag Gesture in 3D

DragGesture targeted to entities provides a translation3D property on visionOS, letting you move entities in all three dimensions.

@available(visionOS 1.0, *)
struct DraggableCharacterView: View {
    @State private var characterOffset: SIMD3<Float> = .zero

    var body: some View {
        RealityView { content in
            if let rex = try? await Entity(named: "Rex", in: .main) {
                rex.generateCollisionShapes(recursive: true)
                rex.components.set(InputTargetComponent()) // Required for targeted gestures
                rex.position = characterOffset
                content.add(rex)
            }
        } update: { content in
            content.entities.first?.position = characterOffset
        }
        .gesture(
            DragGesture()
                .targetedToAnyEntity()
                .onChanged { value in
                    // translation3D is measured in points; a rough points-to-meters
                    // scale is fine for a demo. For production, prefer the gesture
                    // value's convert(_:from:to:) API to map into scene coordinates.
                    let t = value.translation3D
                    characterOffset = SIMD3<Float>(
                        Float(t.x) * 0.001,
                        Float(t.y) * -0.001, // SwiftUI's y-axis points down; RealityKit's points up
                        Float(t.z) * 0.001
                    )
                }
        )
    }
}

Immersive Spaces

Apple Docs: ImmersiveSpace — SwiftUI

An ImmersiveSpace scene type transitions the user from the Shared Space into a Full or Immersive Space. You define it in your App, then open it programmatically using the openImmersiveSpace environment action.

Defining an Immersive Space

@available(visionOS 1.0, *)
@main
struct PixarVisionApp: App {
    var body: some Scene {
        WindowGroup {
            FilmCatalogView()
        }

        // Full immersive experience — Pixar movie theater environment
        ImmersiveSpace(id: "theater") {
            TheaterImmersiveView()
        }
        .immersionStyle(selection: .constant(.full), in: .full)
    }
}

Opening and Dismissing

The openImmersiveSpace and dismissImmersiveSpace environment values are the correct way to transition between environments. Do not attempt to dismiss spaces by navigating or popping views.

@available(visionOS 1.0, *)
struct FilmDetailView: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
    @State private var isInTheater = false

    var body: some View {
        VStack {
            Text("Toy Story")
                .font(.largeTitle)

            Button(isInTheater ? "Exit Theater" : "Enter Theater") {
                Task {
                    if isInTheater {
                        await dismissImmersiveSpace()
                    } else {
                        await openImmersiveSpace(id: "theater")
                    }
                    isInTheater.toggle()
                }
            }
        }
    }
}

Warning: Only one ImmersiveSpace can be open at a time. Attempting to open a second immersive space while one is already open will fail silently. Always dismiss the current space before opening a new one.
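Because openImmersiveSpace returns a result, you can make that failure visible instead of silent. A defensive sketch (the state handling is one illustrative pattern, not the only one):

```swift
import SwiftUI

@available(visionOS 1.0, *)
struct TheaterEntryButton: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @State private var isInTheater = false

    var body: some View {
        Button("Enter Theater") {
            Task {
                // Only flip state if the space actually opened
                switch await openImmersiveSpace(id: "theater") {
                case .opened:
                    isInTheater = true
                default:
                    // .userCancelled or .error — another space may already be open
                    isInTheater = false
                }
            }
        }
    }
}
```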

Porting an iOS App to visionOS

Adding a visionOS destination to an existing iOS app takes minutes in Xcode and requires surprisingly little code change.

Step 1: Add the visionOS Destination

In Xcode, select your app target, go to General > Supported Destinations, and click + to add Apple Vision Pro. Xcode adds the required platform entry to your project.

Step 2: Conditional Code with #if os(visionOS)

Most of your existing code compiles on visionOS without changes. For features that require platform-specific behavior, use conditional compilation.

@main
struct FilmCatalogApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }

        #if os(visionOS)
        // Only available on visionOS — 3D character viewer
        WindowGroup(id: "character-viewer") {
            CharacterVolumeView()
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)
        #endif
    }
}

Step 3: Audit Deprecated UIKit Patterns

Some UIKit patterns have no direct visionOS equivalent:

  • UINavigationController push/pop — use NavigationStack (works on visionOS)
  • UITabBarController — use TabView (works on visionOS)
  • Manual UIWindow management — replace with SwiftUI scene declarations
  • Proximity sensor and accelerometer input — not available on Vision Pro

Step 4: Design for the Input Model

The biggest behavioral change is the input model. On visionOS there is no touch screen. Interactive elements must:

  1. Be large enough to gaze-target (Apple’s visionOS guidance calls for a hit region of at least 60×60 pt)
  2. Respond to .hoverEffect() to confirm gaze focus
  3. Support the pinch-to-tap gesture rather than assuming a touch

// ✅ visionOS-friendly interactive element
Button(action: { playFilm() }) {
    Label("Play", systemImage: "play.fill")
        .frame(minWidth: 60, minHeight: 60) // Meets the minimum target size
}
.hoverEffect(.highlight)

// ❌ Too small for reliable gaze targeting — and because the frame is
// applied after the tap gesture, the hit area is only the tiny glyph
Image(systemName: "play.fill")
    .onTapGesture { playFilm() }
    .frame(width: 20, height: 20)

Advanced: Scene Understanding and World Anchors

For apps that need to place content relative to the physical room — placing Woody on the actual coffee table, for instance — visionOS provides ARKit scene understanding and world anchors.

Apple Docs: ARKit on visionOS — ARKit

World Anchors

WorldAnchor lets you pin content to a fixed position in the real world that persists across sessions. The anchor is stored in a WorldTrackingProvider and survives app restarts — Woody stays on the coffee table even after you close and reopen the app.

@available(visionOS 1.0, *)
func anchorWoodyToTable(at transform: simd_float4x4) async throws {
    let session = ARKitSession()
    let worldTracking = WorldTrackingProvider()

    try await session.run([worldTracking])

    // Create a persistent anchor at the given world transform
    let anchor = WorldAnchor(originFromAnchorTransform: transform)
    try await worldTracking.addAnchor(anchor)

    // Store the anchor ID to restore the placement next session
    UserDefaults.standard.set(anchor.id.uuidString, forKey: "woodyAnchorID")
}

Plane Detection

PlaneDetectionProvider identifies horizontal and vertical surfaces — floors, tables, walls — in the user’s environment. This lets you snap entities to real surfaces rather than floating them arbitrarily in space.

@available(visionOS 1.0, *)
func detectSurfaces() async throws {
    let session = ARKitSession()
    let planeDetection = PlaneDetectionProvider(alignments: [.horizontal])

    try await session.run([planeDetection])

    for await update in planeDetection.anchorUpdates {
        switch update.event {
        case .added, .updated:
            let plane = update.anchor
            // plane.geometry gives you the mesh of the detected surface
            print("Found \(plane.alignment) surface at \(plane.originFromAnchorTransform)")
        case .removed:
            break
        }
    }
}

Note: ARKit scene understanding requires the NSWorldSensingUsageDescription key in your Info.plist and explicit user authorization via ARKitSession.requestAuthorization(for:).
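A small sketch of that authorization check, assuming you gate provider startup on the result:

```swift
import ARKit

@available(visionOS 1.0, *)
func ensureWorldSensingAccess(session: ARKitSession) async -> Bool {
    // Prompts the user if authorization hasn't been determined yet
    let results = await session.requestAuthorization(for: [.worldSensing])
    return results[.worldSensing] == .allowed
}
```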

When to Use (and When Not To)

  • Standard app UI (lists, forms, navigation) — Shared Space WindowGroup: existing SwiftUI works as-is, and the user can multitask.
  • Inline 3D model display (product viewer, character showcase) — Volumetric WindowGroup: a self-contained 3D canvas the user can reposition.
  • Focused single-app experience (presentation, film viewer) — Full Space: reduces visual noise; other apps remain accessible via the Digital Crown.
  • Fully immersive environment (game, virtual studio, theater) — ImmersiveSpace with .full style: maximum immersion; use sparingly and always provide a clear exit.
  • Real-world surface placement (furniture, art, characters) — Shared Space + ARKit anchors: lets content coexist with the physical world.
  • Performance-critical 3D rendering (complex scenes, particle systems) — ImmersiveSpace: exclusive GPU access; no compositing with other apps.

Warning: Entering an ImmersiveSpace is a significant UX transition. Always give users a clear, discoverable way to exit. Pressing the Digital Crown always exits immersive experiences, but in-app exit controls reduce friction.

Summary

  • visionOS has three environments: Shared Space (alongside other apps), Full Space (exclusive but passthrough visible), and Immersive Space (optionally fully immersive).
  • WindowGroup creates flat 2D windows. Adding .windowStyle(.volumetric) creates a 3D bounding box for RealityKit content.
  • Most SwiftUI views work unchanged. visionOS-specific additions include ornaments, .hoverEffect(), depth modifiers like .offset(z:), and RealityView.
  • RealityKit uses Entity-Component architecture. RealityView bridges RealityKit into the SwiftUI view hierarchy.
  • Spatial gestures like SpatialTapGesture and 3D DragGesture require entities to have CollisionComponent attached.
  • ImmersiveSpace scenes are opened and dismissed with environment values, not navigation. Only one can be open at a time.
  • Adding visionOS to an existing iOS app is straightforward — most SwiftUI code compiles without changes. Use #if os(visionOS) for platform-specific scenes.

Spatial computing’s input model — gaze + pinch — is the most significant behavioral difference from iOS, and it should inform every interactive element you design. From here, exploring SwiftUI animations will give you the motion vocabulary that makes spatial UI feel alive rather than static.