Getting Started with visionOS: Spatial Computing Concepts for iOS Developers
Your TabView, NavigationStack, List, and Text views all run on visionOS without modification. The problem is
that spatial computing introduces three entirely new dimensions — literally — and the mental models you use to design
flat screens break down fast when the “screen” is the room itself.
This guide maps your existing SwiftUI knowledge onto visionOS’s spatial model: what a Window, Volume, and Immersive Space are, how RealityKit slots into SwiftUI, how gestures and hand tracking work in three dimensions, and how to add a visionOS destination to an existing iOS app. We won’t cover ARKit scene reconstruction or enterprise passthrough workflows — those deserve their own deep-dives.
Note: All spatial APIs discussed in this post require visionOS 1.0 or later. Conditional compilation with #if os(visionOS) lets you share code with iOS targets.
Contents
- The Problem: No Screen to Design For
- The Three Environments
- Windows and Volumes
- SwiftUI in visionOS: What’s the Same, What’s Different
- RealityKit Basics: Entity-Component Architecture
- Spatial Gestures and Hand Tracking
- Immersive Spaces
- Porting an iOS App to visionOS
- Advanced: Scene Understanding and World Anchors
- When to Use (and When Not To)
- Summary
The Problem: No Screen to Design For
An iOS developer opening their first visionOS project in Xcode faces an immediate disorientation: the simulator renders an infinite room. There is no phone outline, no bezels, no clear boundary for where the UI lives.
Here is what a naive port looks like. You launch your Pixar film catalog app, and it floats in space as a flat window —
which actually works fine. But the moment you try to add a 3D model of the WALL-E robot next to it, you have no idea
where to put it. You try a ZStack with an offset, but that only shuffles views within the window plane. You look for a “scene
origin.” There isn’t one in the UIKit sense.
// ❌ This is the iOS instinct — it doesn't model spatial placement correctly
struct ContentView: View {
var body: some View {
ZStack {
FilmListView()
// Where does the 3D model go? There's no z-axis here.
WallEModelView()
.offset(x: 200, y: 0) // This only moves it in 2D
}
}
}
The confusion stems from a fundamental shift: visionOS does not have a single coordinate space tied to a screen. Instead, it has three distinct environment types, each with different rules about what lives where and how it interacts with the rest of the world.
The Three Environments
Apple Docs: Immersive experiences — visionOS
visionOS organizes app presentation into three levels of immersion. Understanding which environment to use is the first architectural decision you make for any visionOS feature.
Shared Space
The Shared Space is visionOS’s default environment, analogous to the desktop on macOS. Your app’s windows float alongside windows from other apps. The user can position them freely in the room.
This is where most of your existing SwiftUI views run unmodified. The Pixar film catalog’s list, navigation, and forms work exactly as on iOS. The trade-off is that you share the visual field with other apps and have no control over the surrounding environment.
Full Space
A Full Space hides all other apps and gives your app exclusive access to the visual field. You can render windows and 3D content anywhere, but the real world (passthrough) is still visible. This is appropriate for productivity apps or immersive storytelling experiences like a Pixar short film viewer that dims everything else.
Immersive Space
An Immersive Space is the most extreme option: you can reduce or even disable passthrough entirely, transporting the user into a fully rendered environment — a virtual Pixar studio lot, for instance. It requires the most deliberate UX design and should be reserved for experiences where total immersion is the explicit goal.
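Which of these you get is governed by the immersion style declared on an ImmersiveSpace scene rather than by separate scene types. A minimal sketch (StudioLotView is a hypothetical content view; the style names are real SwiftUI values):

```swift
import SwiftUI

// Sketch: one ImmersiveSpace that supports all three immersion styles.
// `.mixed` keeps passthrough (Full Space behavior), `.progressive` lets the
// Digital Crown dial immersion in and out, and `.full` replaces the room.
@available(visionOS 1.0, *)
struct StudioLotScene: Scene {
    // The currently selected style; the system updates this binding
    @State private var style: ImmersionStyle = .progressive

    var body: some Scene {
        ImmersiveSpace(id: "studio-lot") {
            StudioLotView() // Hypothetical RealityView-based content
        }
        // Restrict the space to the styles it is designed to handle
        .immersionStyle(selection: $style, in: .mixed, .progressive, .full)
    }
}
```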
Windows and Volumes
Apple Docs: WindowGroup — SwiftUI
The App protocol in visionOS uses the same scene-based architecture as macOS and iOS, but adds new scene types.
WindowGroup: Flat 2D Windows
The familiar WindowGroup creates a flat 2D window that floats in the Shared Space. Your existing SwiftUI views drop in
without changes.
@main
struct PixarVisionApp: App {
var body: some Scene {
// Flat 2D window — works exactly like iOS/macOS
WindowGroup {
FilmCatalogView()
}
}
}
Volumetric Windows
A WindowGroup with a .volumetric style creates a three-dimensional bounding box — a “volume” — in which RealityKit
entities can live. The volume has defined width, height, and depth measured in meters.
@main
struct PixarVisionApp: App {
var body: some Scene {
WindowGroup {
FilmCatalogView()
}
// A 3D volume for RealityKit content
WindowGroup(id: "character-viewer") {
CharacterVolumeView()
}
.windowStyle(.volumetric) // ← Enables 3D content
.defaultSize(width: 0.4, height: 0.4, depth: 0.4, in: .meters)
}
}
The volume acts as a self-contained 3D canvas. Entities inside it are clipped at the volume boundaries, so users can pick up and reposition the whole volume like a snow globe containing a WALL-E diorama.
@available(visionOS 1.0, *)
struct CharacterVolumeView: View {
var body: some View {
RealityView { content in
// Load a 3D WALL-E model placed at the volume's origin
if let wallE = try? await Entity(named: "WALL-E", in: .main) {
wallE.position = [0, -0.15, 0] // Slightly below center
content.add(wallE)
}
}
}
}
Tip: Use centimeter-scale offsets (e.g., [0, -0.15, 0] for 15 cm below the origin) when positioning entities inside a volume. The default coordinate system is 1 unit = 1 meter.
SwiftUI in visionOS: What’s the Same, What’s Different
What Works Without Changes
The vast majority of SwiftUI Just Works on visionOS:
- NavigationStack, TabView, List, Form
- Text, Button, Toggle, Picker
- @State, @Observable, @Environment (as well as the legacy @StateObject and @EnvironmentObject)
- NavigationLink, sheets, alerts, and confirmations
- Custom Shape implementations and Canvas
Ornaments
Apple Docs: Ornaments — visionOS
Ornaments are UI elements that attach to a window but float slightly in front of it in 3D space, detached from the window plane. They are the visionOS equivalent of a toolbar or bottom bar, but exist in their own spatial layer.
@available(visionOS 1.0, *)
struct FilmDetailView: View {
let film: PixarFilm
var body: some View {
FilmContentView(film: film)
.ornament(attachmentAnchor: .scene(.bottom)) {
// This toolbar floats below the window in 3D space
HStack(spacing: 20) {
Button("Watch Trailer") { /* ... */ }
Button("Add to Watchlist") { /* ... */ }
}
.padding()
.glassBackgroundEffect() // Frosted glass material
}
}
}
Hover Effects
Apple Docs: hoverEffect(_:) — SwiftUI
On visionOS, the primary pointing mechanism is gaze (looking at something) combined with a pinch gesture. The
.hoverEffect() modifier provides visual feedback when the user looks at an interactive element.
@available(visionOS 1.0, *)
struct FilmCardView: View {
let film: PixarFilm
var body: some View {
VStack {
AsyncImage(url: film.posterURL)
.frame(width: 160, height: 240)
Text(film.title)
.font(.headline)
}
.padding()
.hoverEffect(.highlight) // Subtle glow when the user looks at this card
.onTapGesture {
// Triggered by looking + pinching
}
}
}
Depth Modifiers
SwiftUI on visionOS gains a Z-axis. You can push views toward or away from the viewer using depth modifiers.
@available(visionOS 1.0, *)
struct DepthLayeredBadgeView: View {
var body: some View {
ZStack {
// Background panel sits flat
RoundedRectangle(cornerRadius: 12)
.fill(.regularMaterial)
.frame(width: 200, height: 120)
// Title floats 8 points toward the viewer
Text("Toy Story")
.font(.title2.bold())
.offset(z: 8)
// Badge floats even further forward — emphasizes importance
Image(systemName: "star.fill")
.foregroundStyle(.yellow)
.offset(z: 20)
}
}
}
RealityKit Basics: Entity-Component Architecture
Apple Docs: RealityKit — Apple Developer
RealityKit uses an Entity-Component (EC) architecture. Unlike UIKit’s view hierarchy, 3D scenes are composed of:
- Entities — objects in 3D space (a WALL-E model, a light source, an invisible collision shape)
- Components — data and behavior attached to entities (ModelComponent, CollisionComponent, InputTargetComponent, PhysicsBodyComponent)
An entity is meaningless without components; components are inert without an entity to attach to. This separation makes it straightforward to add physics to an existing 3D model without changing its visual representation.
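As a concrete sketch of that separation, here is how one might bolt physics onto an already-loaded model without touching its visual components (the entity name, radius, and material values are assumptions):

```swift
import RealityKit

// Attach collision and physics to an existing entity; its ModelComponent
// (the visual mesh and materials) is never modified.
@available(visionOS 1.0, *)
func addPhysics(to luxoBall: Entity) {
    // A sphere roughly matching the model's bounds (radius in meters)
    let shape = ShapeResource.generateSphere(radius: 0.1)
    luxoBall.components.set(CollisionComponent(shapes: [shape]))

    // A dynamic body so gravity and collisions affect the entity
    luxoBall.components.set(PhysicsBodyComponent(
        shapes: [shape],
        mass: 0.5, // kilograms
        material: .generate(friction: 0.8, restitution: 0.6),
        mode: .dynamic
    ))
}
```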
RealityView: The SwiftUI Bridge
Apple Docs: RealityView — RealityKit
RealityView is the SwiftUI view type that hosts RealityKit content. It has two closures: a make closure (called once
to build the initial scene) and an optional update closure (called when SwiftUI state changes).
@available(visionOS 1.0, *)
struct PixarSceneView: View {
@State private var isAnimating = false
var body: some View {
RealityView { content in
// make closure — runs once on first render
guard let woody = try? await Entity(named: "Woody", in: .main) else {
return
}
woody.position = [0, 0, -1.5] // 1.5 meters in front
woody.scale = [0.5, 0.5, 0.5] // Scale to 50% of asset size
woody.generateCollisionShapes(recursive: true) // Needed for targeted gestures
woody.components.set(InputTargetComponent()) // Marks the entity as a gesture target
content.add(woody)
} update: { content in
// update closure — called when SwiftUI state changes
guard let woody = content.entities.first else { return }
if isAnimating {
// Play all available animations on the entity
woody.availableAnimations.forEach { woody.playAnimation($0) }
}
}
.gesture(
TapGesture()
.targetedToAnyEntity()
.onEnded { value in
// Tapping directly on the entity triggers this
value.entity.availableAnimations.forEach { value.entity.playAnimation($0) }
isAnimating = true
}
)
}
}
The .targetedToAnyEntity() modifier on the gesture is critical — without it, taps anywhere in the view trigger the
handler, not just taps on specific entities.
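When only one entity should respond, .targetedToEntity(_:) narrows the gesture to that entity and its descendants. A sketch building on the example above (keeping a stable root entity as a targetable reference is an assumption about app structure):

```swift
import SwiftUI
import RealityKit

@available(visionOS 1.0, *)
struct SingleCharacterView: View {
    // A stable root entity so the gesture has something to target
    @State private var woodyRoot = Entity()

    var body: some View {
        RealityView { content in
            if let woody = try? await Entity(named: "Woody", in: .main) {
                woody.generateCollisionShapes(recursive: true)
                woody.components.set(InputTargetComponent())
                woodyRoot.addChild(woody)
            }
            content.add(woodyRoot)
        }
        .gesture(
            TapGesture()
                .targetedToEntity(woodyRoot) // Only this subtree fires the handler
                .onEnded { value in
                    print("Tapped \(value.entity.name)")
                }
        )
    }
}
```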
Custom Components
When Apple’s built-in components don’t cover your use case, you can define your own by conforming to Component.
@available(visionOS 1.0, *)
struct CharacterMetadataComponent: Component {
var filmTitle: String
var releaseYear: Int
var isProtagonist: Bool
}
// Attaching the custom component to an entity
extension Entity {
func configureAsPixarCharacter(film: String, year: Int, protagonist: Bool) {
components[CharacterMetadataComponent.self] = CharacterMetadataComponent(
filmTitle: film,
releaseYear: year,
isProtagonist: protagonist
)
}
}
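Reading the component back later, say when a tap lands on a character, is a dictionary-style lookup. A small sketch using the component defined above:

```swift
import RealityKit

@available(visionOS 1.0, *)
func describeCharacter(_ entity: Entity) -> String {
    // Components are keyed by their type
    guard let meta = entity.components[CharacterMetadataComponent.self] else {
        return "Unknown character"
    }
    let role = meta.isProtagonist ? "protagonist" : "supporting character"
    return "\(entity.name): \(role) in \(meta.filmTitle) (\(meta.releaseYear))"
}
```

If the component should also be recognized when loading Reality Composer Pro scenes, register it once at launch with CharacterMetadataComponent.registerComponent().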
Spatial Gestures and Hand Tracking
Apple Docs: Spatial input — visionOS
visionOS input works through gaze targeting combined with hand gestures. The system uses eyes to determine what you’re interacting with and hands to determine how.
Standard SwiftUI Gestures on visionOS
Most SwiftUI gestures work on visionOS flat windows without changes. The .onTapGesture, DragGesture, and
LongPressGesture all function as expected on 2D content.
Spatial Tap Gesture
For entities in a RealityView, use
SpatialTapGesture to receive the 3D tap
location.
@available(visionOS 1.0, *)
struct CharacterStageView: View {
var body: some View {
RealityView { content in
if let buzz = try? await Entity(named: "BuzzLightyear", in: .main) {
buzz.generateCollisionShapes(recursive: true) // Required for tap targeting
buzz.components.set(InputTargetComponent()) // Marks the entity as a gesture target
buzz.position = [0, 0, -1]
content.add(buzz)
}
}
.gesture(
SpatialTapGesture()
.targetedToAnyEntity()
.onEnded { value in
let tapLocation = value.location3D // 3D position in scene space
let tappedEntity = value.entity
print("Tapped \(tappedEntity.name) at \(tapLocation)")
}
)
}
}
Warning: Entities must have a CollisionComponent (add one manually or call .generateCollisionShapes(recursive: true)) and an InputTargetComponent before they can receive targeted gestures. Without them, the gesture’s .targetedToAnyEntity() modifier will never fire.
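Collision shapes need not come from the mesh. A hand-tuned box is often cheaper to hit-test, and on visionOS the entity also needs an InputTargetComponent to receive spatial input at all. A sketch with assumed dimensions:

```swift
import RealityKit

@available(visionOS 1.0, *)
func makeTapTarget(_ entity: Entity) {
    // A simple box roughly enclosing the character (width, height, depth in meters)
    let box = ShapeResource.generateBox(size: [0.3, 0.5, 0.2])
    entity.components.set(CollisionComponent(shapes: [box]))
    // Marks the entity as eligible to receive spatial input
    entity.components.set(InputTargetComponent())
}
```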
Drag Gesture in 3D
DragGesture targeted to entities provides a translation3D property on visionOS, letting you move entities in all
three dimensions.
@available(visionOS 1.0, *)
struct DraggableCharacterView: View {
@State private var characterOffset: SIMD3<Float> = .zero
var body: some View {
RealityView { content in
if let rex = try? await Entity(named: "Rex", in: .main) {
rex.generateCollisionShapes(recursive: true)
rex.components.set(InputTargetComponent()) // Required for gesture targeting
rex.position = characterOffset
content.add(rex)
}
} update: { content in
content.entities.first?.position = characterOffset
}
.gesture(
DragGesture()
.targetedToAnyEntity()
.onChanged { value in
// translation3D gives full 3D movement vector
let t = value.translation3D
characterOffset = SIMD3<Float>(
Float(t.x) * 0.001,
Float(t.y) * -0.001,
Float(t.z) * 0.001
)
}
)
}
}
Immersive Spaces
Apple Docs: ImmersiveSpace — SwiftUI
An ImmersiveSpace scene type transitions the user from the Shared Space into a Full or Immersive Space. You define it
in your App, then open it programmatically using the openImmersiveSpace environment action.
Defining an Immersive Space
@available(visionOS 1.0, *)
@main
struct PixarVisionApp: App {
var body: some Scene {
WindowGroup {
FilmCatalogView()
}
// Full immersive experience — Pixar movie theater environment
ImmersiveSpace(id: "theater") {
TheaterImmersiveView()
}
.immersionStyle(selection: .constant(.full), in: .full)
}
}
Opening and Dismissing
The openImmersiveSpace and dismissImmersiveSpace environment values are the correct way to transition between
environments. Do not attempt to dismiss spaces by navigating or popping views.
@available(visionOS 1.0, *)
struct FilmDetailView: View {
@Environment(\.openImmersiveSpace) private var openImmersiveSpace
@Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
@State private var isInTheater = false
var body: some View {
VStack {
Text("Toy Story")
.font(.largeTitle)
Button(isInTheater ? "Exit Theater" : "Enter Theater") {
Task {
if isInTheater {
await dismissImmersiveSpace()
} else {
await openImmersiveSpace(id: "theater")
}
isInTheater.toggle()
}
}
}
}
}
Warning: Only one ImmersiveSpace can be open at a time. Attempting to open a second immersive space while one is already open fails (the open action returns an error result). Always dismiss the current space before opening a new one.
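A defensive pattern follows from that warning: dismiss whatever is open, then open the new space and check the result, since openImmersiveSpace reports whether the space actually appeared. A sketch:

```swift
import SwiftUI

@available(visionOS 1.0, *)
struct TheaterSwitcherView: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
    @State private var activeSpaceID: String?

    var body: some View {
        Button("Enter Theater") {
            Task {
                // Only one space may be open, so close the current one first
                if activeSpaceID != nil {
                    await dismissImmersiveSpace()
                    activeSpaceID = nil
                }
                switch await openImmersiveSpace(id: "theater") {
                case .opened:
                    activeSpaceID = "theater"
                case .userCancelled, .error:
                    break // State unchanged; an alert could be surfaced here
                @unknown default:
                    break
                }
            }
        }
    }
}
```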
Porting an iOS App to visionOS
Adding a visionOS destination to an existing iOS app takes minutes in Xcode and requires surprisingly little code change.
Step 1: Add the visionOS Destination
In Xcode, select your app target, go to General > Supported Destinations, and click + to add Apple Vision Pro. Xcode adds the required platform entry to your project.
Step 2: Conditional Code with #if os(visionOS)
Most of your existing code compiles on visionOS without changes. For features that require platform-specific behavior, use conditional compilation.
struct FilmCatalogApp: App {
var body: some Scene {
WindowGroup {
ContentView()
}
#if os(visionOS)
// Only available on visionOS — 3D character viewer
WindowGroup(id: "character-viewer") {
CharacterVolumeView()
}
.windowStyle(.volumetric)
.defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)
#endif
}
}
Step 3: Audit Deprecated UIKit Patterns
Some UIKit patterns have no direct visionOS equivalent:
- UINavigationController push/pop — use NavigationStack (works on visionOS)
- UITabBarController — use TabView (works on visionOS)
- Manual UIWindow management — replace with SwiftUI scene declarations
- Proximity sensor and accelerometer input — not available on Vision Pro
Step 4: Design for the Input Model
The biggest behavioral change is the input model. On visionOS there is no touch screen. Interactive elements must:
- Be large enough to gaze-target (Apple’s visionOS guidance recommends at least 60×60 pt)
- Respond to .hoverEffect() to confirm gaze focus
- Support the pinch-to-tap gesture rather than assuming a touch
// ✅ visionOS-friendly interactive element
Button(action: { playFilm() }) {
    Label("Play", systemImage: "play.fill")
        .frame(minWidth: 60, minHeight: 60) // Meets the recommended target size
}
.hoverEffect(.highlight)
// ❌ Too small for reliable gaze targeting
Image(systemName: "play.fill")
    .onTapGesture { playFilm() }
    .frame(width: 20, height: 20)
Advanced: Scene Understanding and World Anchors
For apps that need to place content relative to the physical room — placing Woody on the actual coffee table, for instance — visionOS provides ARKit scene understanding and world anchors.
Apple Docs: ARKit on visionOS — ARKit
World Anchors
WorldAnchor lets you pin content to a fixed position in
the real world that persists across sessions. The anchor is stored in a WorldTrackingProvider and survives app
restarts — Woody stays on the coffee table even after you close and reopen the app.
@available(visionOS 1.0, *)
func anchorWoodyToTable(at transform: simd_float4x4) async throws {
let session = ARKitSession()
let worldTracking = WorldTrackingProvider()
try await session.run([worldTracking])
// Create a persistent anchor at the given world transform
let anchor = WorldAnchor(originFromAnchorTransform: transform)
try await worldTracking.addAnchor(anchor)
// Store the anchor ID to restore the placement next session
UserDefaults.standard.set(anchor.id.uuidString, forKey: "woodyAnchorID")
}
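Restoring the placement on the next launch is the other half of persistence. The provider replays persisted anchors through its anchorUpdates stream; matching the stored ID lets you re-place the entity. A sketch (the entity and provider are assumed to come from your session setup):

```swift
import ARKit
import Foundation
import RealityKit

@available(visionOS 1.0, *)
func restoreWoody(on woody: Entity, using worldTracking: WorldTrackingProvider) async {
    guard let stored = UserDefaults.standard.string(forKey: "woodyAnchorID"),
          let savedID = UUID(uuidString: stored) else { return }

    // Runs for the life of the session, following the anchor as tracking refines
    for await update in worldTracking.anchorUpdates {
        guard update.anchor.id == savedID else { continue }
        switch update.event {
        case .added, .updated:
            // Re-apply the persisted world transform to the entity
            woody.transform = Transform(matrix: update.anchor.originFromAnchorTransform)
        case .removed:
            break
        }
    }
}
```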
Plane Detection
PlaneDetectionProvider identifies horizontal and vertical surfaces — floors, tables, walls — in the user’s
environment. This lets you snap entities to real surfaces rather than floating them arbitrarily in space.
@available(visionOS 1.0, *)
func detectSurfaces() async throws {
let session = ARKitSession()
let planeDetection = PlaneDetectionProvider(alignments: [.horizontal])
try await session.run([planeDetection])
for await update in planeDetection.anchorUpdates {
switch update.event {
case .added, .updated:
let plane = update.anchor
// plane.geometry gives you the mesh of the detected surface
print("Found \(plane.alignment) surface at \(plane.originFromAnchorTransform)")
case .removed:
break
}
}
}
Note: ARKit scene understanding requires the NSWorldSensingUsageDescription key in your Info.plist and explicit user authorization via ARKitSession.requestAuthorization(for:).
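In practice, it is worth requesting authorization up front rather than letting the first provider run trigger the prompt. A sketch using requestAuthorization(for:):

```swift
import ARKit

// Returns true only if the user granted world-sensing access
@available(visionOS 1.0, *)
func ensureWorldSensingAccess(_ session: ARKitSession) async -> Bool {
    let results = await session.requestAuthorization(for: [.worldSensing])
    return results[.worldSensing] == .allowed
}
```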
When to Use (and When Not To)
| Content Type | Recommended Scene | Reason |
|---|---|---|
| Standard app UI (lists, forms, navigation) | Shared Space WindowGroup | Existing SwiftUI works as-is; user can multitask |
| Inline 3D model display (product viewer, character showcase) | Volumetric WindowGroup | Self-contained 3D canvas; repositionable by the user |
| Focused single-app experience (presentation, film viewer) | Full Space | Reduces visual noise; other apps still accessible via Digital Crown |
| Fully immersive environment (game, virtual studio, theater) | ImmersiveSpace with .full style | Maximum immersion; use sparingly and always provide a clear exit |
| Real-world surface placement (furniture, art, characters) | Shared Space + ARKit anchors | Lets content coexist with the physical world |
| Performance-critical 3D rendering (complex scenes, particle systems) | ImmersiveSpace | Exclusive GPU access; no compositing with other apps |
Warning: Entering an ImmersiveSpace is a significant UX transition. Always give users a clear, discoverable way to exit. The Digital Crown always exits immersive experiences, but in-app exit controls reduce friction.
Summary
- visionOS has three environments: Shared Space (alongside other apps), Full Space (exclusive but passthrough visible), and Immersive Space (optionally fully immersive).
- WindowGroup creates flat 2D windows. Adding .windowStyle(.volumetric) creates a 3D bounding box for RealityKit content.
- Most SwiftUI views work unchanged. visionOS-specific additions include ornaments, .hoverEffect(), depth modifiers like .offset(z:), and RealityView.
- RealityKit uses an Entity-Component architecture. RealityView bridges RealityKit into the SwiftUI view hierarchy.
- Spatial gestures like SpatialTapGesture and 3D DragGesture require entities to have a CollisionComponent (and InputTargetComponent) attached.
- ImmersiveSpace scenes are opened and dismissed with environment values, not navigation. Only one can be open at a time.
- Adding visionOS to an existing iOS app is straightforward — most SwiftUI code compiles without changes. Use #if os(visionOS) for platform-specific scenes.
Spatial computing’s input model — gaze + pinch — is the most significant behavioral difference from iOS, and it should inform every interactive element you design. From here, exploring SwiftUI animations will give you the motion vocabulary that makes spatial UI feel alive rather than static.