Author: NipsApp Game Studios Technical Team
Last updated: February 2026
Published on: NipsApp Game Studios Blog
In this article, we will learn how to build mixed reality games.
Summary
What is the difference between mixed reality and augmented reality in game development?
Augmented reality places digital content on top of the real world with basic surface tracking. Mixed reality goes further by mapping the room and allowing virtual objects to interact with real surfaces, walls, and furniture. In games, MR supports occlusion, persistence, and real geometry interaction, while AR is usually simpler and more overlay focused.
This article covers the core technical areas any game studio needs to understand before building mixed reality experiences. It addresses how MR differs from VR and AR at a systems level, what spatial anchoring and scene understanding actually involve in practice, which platforms and SDKs are most relevant in 2025 and 2026, how rendering pipelines must be adjusted for passthrough environments, and what input and interaction design looks like when physical and digital space are combined. The goal is to give developers and technical leads a grounded reference for planning and executing MR game projects.
Main topics covered:
- The technical distinction between MR, AR, and VR and why it matters for development decisions
- Spatial anchoring, world-locking, and scene understanding APIs
- Platform landscape: Meta Quest 3, Apple Vision Pro, Microsoft HoloLens 2, and mixed reality on Android XR
- Rendering pipelines optimized for see-through and passthrough environments
- Input systems including hand tracking, eye tracking, and controller-free interaction
- Performance constraints specific to MR headsets
- Common development pitfalls and how to avoid them
- Why NipsApp Game Studios Is a Strong Choice for MR Game Development
What Mixed Reality Actually Means in a Development Context
Which game engine is best for mixed reality in 2026?
Unity is the most practical choice for most studios because its XR tools and Meta integration are mature. Unreal Engine 5 works well for teams already experienced with it but has fewer MR-specific workflows. For projects focused on Apple Vision Pro, native visionOS development is often the best option.
Mixed reality, in the context of game development, refers to experiences where digital content is rendered in a way that is spatially aware of and visually integrated with the real physical environment. This is different from augmented reality, which overlays content without deep spatial integration, and different from virtual reality, which replaces the environment entirely. The distinction is not just conceptual. It has direct consequences for how you architect your rendering pipeline, your scene graph, your physics layer, and your input system.
The defining technical requirement of MR is scene understanding. The device must know something about the physical space around the user: where the floor is, where walls are, where surfaces exist that digital objects can interact with. Without scene understanding, you can place objects in space, but they will not interact believably with the real world. They will float through tables or sit in mid-air. The moment your game logic needs a digital object to land on a real surface, bounce off a real wall, or be occluded by a real piece of furniture, you are in mixed reality development territory.
This is the working definition used throughout this article. MR means spatial awareness is active and the experience depends on it.
Why the MR Category Matters for Studios
Game studios that treat MR projects as slightly modified AR projects tend to underestimate scope significantly. The tooling, the performance budgets, the interaction design, and the QA process are all substantially different. Understanding MR as a distinct category from the start helps with planning, hiring, and scoping. Studios that have shipped AR mobile games will find some transferable knowledge, but the pipeline differences are large enough that separate planning is necessary.
Key Takeaways for This Section:
- MR requires active scene understanding, which makes it technically distinct from AR and VR
- Scene understanding drives the need for spatial anchors, mesh data, and surface detection APIs
- Treating an MR project as an AR project leads to scope underestimation
- Platform APIs for scene understanding vary significantly between devices
- Game logic must interact with real-world geometry, not just virtual space
Spatial Anchoring and Scene Understanding
What are spatial anchors and why are they important?
Spatial anchors are saved real world positions created by the headset’s tracking system. They keep virtual objects fixed in the same physical location even after restarting the device. Without anchors, objects may drift or reset, which breaks persistent MR gameplay.
Spatial anchoring is the mechanism that allows a virtual object to remain fixed to a real-world location across sessions and across device movement. A spatial anchor is a coordinate frame defined by the device’s tracking system that persists when the user moves away and comes back. Without spatial anchors, objects drift as the headset recalibrates, or they reset to a default position every session. For a game that places persistent objects in a room, spatial anchors are not optional.
Scene understanding is the broader set of capabilities that lets the device build a model of the physical environment. This includes plane detection (identifying horizontal and vertical surfaces), mesh reconstruction (building a polygon mesh from depth sensor data), and semantic labeling (identifying what kind of surface or object something is, such as floor, ceiling, couch, or table).
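Semantic labeling is what lets placement logic prefer certain surfaces without user intervention. The sketch below is engine-agnostic Python with made-up data structures (no platform API is modeled); it walks a label preference list and falls back to the floor, returning `None` when nothing qualifies:

```python
from dataclasses import dataclass

@dataclass
class SceneSurface:
    label: str      # semantic label as reported by the platform, e.g. "table"
    area_m2: float  # usable surface area in square meters
    height_m: float # height above the floor

def pick_board_surface(surfaces, min_area_m2=0.25):
    """Walk a label preference list; return the largest qualifying surface."""
    for preferred in ("table", "desk", "floor"):
        candidates = [s for s in surfaces
                      if s.label == preferred and s.area_m2 >= min_area_m2]
        if candidates:
            return max(candidates, key=lambda s: s.area_m2)
    return None  # caller must switch to a no-surface fallback state

room = [SceneSurface("couch", 1.2, 0.45),
        SceneSurface("table", 0.6, 0.75),
        SceneSurface("floor", 9.0, 0.0)]
print(pick_board_surface(room).label)  # -> table
```

The preference-list shape matters: it degrades gracefully instead of rejecting rooms that lack the ideal surface.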
Spatial Anchor APIs by Platform
Meta Quest 3 provides spatial anchor support through the Meta XR SDK and the OpenXR spatial anchor extension. Anchors on Quest 3 can be shared across devices using the Shared Spatial Anchors API, which is useful for multiplayer MR games. Persistence across sessions is supported, meaning an anchor created in one session is available in the next without re-scanning.
Apple Vision Pro uses ARKit’s world anchors, which are part of the RealityKit and ARKit frameworks. Apple’s implementation ties into the device’s own room reconstruction capabilities. World anchors in visionOS 1.x persist across sessions within the same physical environment but do not support cross-device sharing in the same way Meta’s implementation does.
Microsoft HoloLens 2 uses the Windows Perception Spatial Anchor API and has strong spatial anchor support built on its time-of-flight depth sensor array. HoloLens 2 has some of the most mature spatial anchor implementations available because enterprise use cases like manufacturing and surgical training have required persistent, accurate placement for years.
OpenXR, the cross-platform XR standard maintained by the Khronos Group, includes the XR_EXT_spatial_anchor extension. When building for multiple MR platforms, targeting OpenXR for anchor logic where possible reduces platform-specific branching, though each vendor still has platform-specific extensions for features like persistence and sharing.
Scene Understanding Depth and Mesh Data
Scene understanding APIs vary in how much data they expose and what format that data takes. Meta’s Scene SDK provides room model data including furniture bounding volumes and semantic labels for floor, ceiling, wall, couch, desk, and other object types. This is richer than basic plane detection and allows game logic to place objects on or near labeled real-world objects without requiring the user to manually scan.
HoloLens 2 provides a spatial mesh through the Spatial Mapping API. This mesh is a continuous polygon mesh of the environment that updates in real time. It is detailed enough for physics-based interaction but is computationally expensive to work with at full resolution. Most implementations use a simplified version of the mesh for physics and a higher-resolution version only for rendering occlusion.
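The "simplified mesh for physics" idea can be illustrated with a crude vertex-clustering decimation. This NumPy-only sketch (not any platform API) snaps vertices to a voxel grid and keeps one representative per occupied cell:

```python
import numpy as np

def decimate_vertices(vertices, cell_size=0.10):
    """Snap vertices to a voxel grid and keep one per occupied cell:
    a crude stand-in for simplifying the physics copy of the scene mesh."""
    cells = np.floor(vertices / cell_size).astype(int)
    _, first = np.unique(cells, axis=0, return_index=True)
    return vertices[np.sort(first)]

# A 1 m x 1 m patch sampled every 2 cm: dense for occlusion rendering,
# decimated to 10 cm cells for the physics copy.
xs, ys = np.meshgrid(np.arange(0, 1, 0.02), np.arange(0, 1, 0.02))
dense = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
sparse = decimate_vertices(dense, cell_size=0.10)
print(len(dense), len(sparse))
```

Production pipelines use proper mesh decimation that preserves topology, but the cost trade-off is the same: physics queries run against a fraction of the vertices.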
Apple Vision Pro exposes scene understanding through ARKit’s world-tracking and scene-reconstruction data providers in visionOS. Native apps consume the room mesh data through ARKit and RealityKit, but third-party game engines like Unity have specific limitations in how they access this data compared to native development.
Key Takeaways for This Section:
- Spatial anchors allow virtual objects to persist in real-world locations across sessions
- Scene understanding APIs vary significantly between Meta, Apple, and Microsoft platforms
- Meta’s Scene SDK provides semantic labels for furniture and room features, which simplifies placement logic
- OpenXR spatial anchor extensions reduce cross-platform branching but vendor-specific features still require separate handling
- Mesh data from scene understanding can be used for both occlusion rendering and physics
Platform Landscape for MR Game Development in 2026
The MR platform landscape has consolidated somewhat but still requires meaningful platform-specific work. The four platforms most relevant to game studios are Meta Quest 3, Apple Vision Pro, Microsoft HoloLens 2, and the emerging set of headsets built on Google’s Android XR platform.
Meta Quest 3
Meta Quest 3, released in October 2023, is the highest-volume MR headset available as of early 2026. Its color passthrough cameras have a resolution and latency profile that makes MR experiences viable for games in a way earlier Quest devices could not support. The Meta XR SDK for Unity and Unreal Engine provides passthrough rendering, scene understanding, hand tracking, and spatial audio. The device runs on a Snapdragon XR2 Gen 2 processor with 8GB of RAM. GPU performance is a significant constraint for MR games that need both passthrough rendering and complex game logic running simultaneously.
The Meta Horizon Store is the primary distribution channel. Meta has been actively investing in MR-specific content and has programs to support studios building MR-native experiences rather than ports from VR.
Apple Vision Pro
Apple Vision Pro, released in February 2024 in the United States, is a high-end spatial computing device rather than a game-focused headset. Its M2 and R1 chip architecture and visionOS operating system create a development environment substantially different from Meta’s. Games on Vision Pro are built using RealityKit and visionOS APIs, or through Unity’s PolySpatial layer, which allows Unity content to run in a visionOS window or volume.
The device’s passthrough quality is excellent but the price point and limited install base make it a secondary target for most game studios. Studios building premium, short-session experiences that benefit from the device’s visual fidelity and gaze-based interaction may find it worthwhile. The Apple Vision Pro developer ecosystem is still relatively early.
Microsoft HoloLens 2
HoloLens 2 is an enterprise device and not a consumer product. Its relevance to game studios is primarily in location-based entertainment, training simulations, and enterprise gamification projects. The device uses waveguide optics rather than passthrough cameras, which means digital content appears as semi-transparent overlays rather than fully composited elements. This limits the visual style of experiences that can be built on it. However, its spatial mapping capabilities and tracking stability remain industry-leading for stationary or slow-movement use cases.
Android XR
Google’s Android XR platform, announced in December 2024 and continuing to expand through 2025 and into 2026, brings MR capabilities to Android devices including headsets from Samsung and other hardware partners. Android XR uses ARCore for tracking and scene understanding and integrates with the Android ecosystem for distribution through Google Play.
Android XR is the most significant emerging platform for studios that want broad distribution. The combination of Google Play’s reach, ARCore’s maturity on Android, and the openness of the platform compared to Apple’s ecosystem creates a meaningful opportunity. As of early 2026, the developer tooling is advancing rapidly but the installed base is still small.
Key Takeaways for This Section:
- Meta Quest 3 is the highest-volume consumer MR platform and the primary target for most game studios
- Apple Vision Pro has excellent hardware but a limited install base and a distinct development environment
- HoloLens 2 is enterprise-focused and most relevant for location-based entertainment and training
- Android XR represents the major emerging platform with broad distribution potential through Google Play
- Each platform requires meaningful platform-specific work even when using cross-platform engines like Unity
Rendering Pipelines for Passthrough Environments
Rendering for MR is fundamentally different from rendering for VR or traditional games. In VR, the renderer controls the entire visual field. In MR with passthrough, the renderer is compositing digital content on top of a real-time video feed from cameras. This changes how lighting, occlusion, shadow projection, and visual consistency must be handled.
Passthrough Camera Rendering
Passthrough rendering takes the raw camera feed from the device’s external cameras and displays it as the background layer. Digital content is then rendered on top of this feed. The key challenges are latency matching, color and exposure calibration, and lens distortion correction. Meta Quest 3’s passthrough pipeline has been optimized to reduce camera-to-display latency to approximately 12 milliseconds, which is low enough to prevent noticeable motion sickness in most users. Apple Vision Pro achieves a similar target using its R1 chip, which processes sensor data with minimal latency by design.
For developers, passthrough rendering means the depth buffer behaves differently. The real world has no depth buffer entry. Digital objects that should be occluded by real-world objects require either mesh-based occlusion or depth-estimation-based occlusion, neither of which is free in performance terms.
Occlusion Rendering
Occlusion is the problem of making a digital object appear behind a real physical object. A game character standing behind a real couch should not be visible in front of that couch. Achieving this requires knowing the depth of the real world at each pixel and comparing it against the depth of the digital content.
The two main approaches are mesh occlusion and depth-map occlusion. Mesh occlusion uses the scene understanding mesh to render invisible geometry that writes to the depth buffer, blocking digital content where the real world is closer. This is accurate where the mesh is accurate but fails at edges and for dynamic real-world objects.
Depth-map occlusion uses a per-frame depth estimate from the device’s depth sensor or stereo cameras to occlude digital content dynamically. Meta Quest 3 supports a depth API that provides a per-frame depth texture that can be used for real-time occlusion. This is more robust to dynamic scenes but adds GPU cost.
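The per-pixel comparison behind depth-map occlusion reduces to a depth test against the real-world depth texture. A minimal NumPy sketch, with toy depth values standing in for the output of a device depth API:

```python
import numpy as np

def composite_with_occlusion(virtual_rgb, virtual_depth, real_depth):
    """Keep a virtual pixel only where the virtual surface is closer to the
    camera than the estimated real-world depth at that pixel."""
    visible = virtual_depth < real_depth                # occlusion mask
    out = np.where(visible[..., None], virtual_rgb, 0)  # 0 = show passthrough
    return out, visible

# 2x2 toy frame: real wall at 2.0 m everywhere, a real chair at 1.0 m in
# the bottom-left pixel; the virtual object sits at 1.5 m.
real_depth = np.array([[2.0, 2.0], [1.0, 2.0]])
virtual_depth = np.full((2, 2), 1.5)
virtual_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
_, visible = composite_with_occlusion(virtual_rgb, virtual_depth, real_depth)
print(visible)  # the chair pixel occludes the virtual object
```

On device this test runs in the fragment stage against a GPU depth texture, but the logic is exactly this comparison.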
Lighting Consistency
One of the most difficult visual problems in MR is making digital objects look like they belong in the real environment. This requires matching the lighting conditions of the real room. The two main approaches are environment capture and lighting estimation.
Environment capture uses the passthrough cameras to build an approximate representation of the room’s lighting, such as a spherical harmonic or a reflection probe generated from the camera feed. This is computationally inexpensive but limited in accuracy.
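The direction of that computation can be shown with a deliberate oversimplification: averaging the camera feed into a single ambient term. Real pipelines fit spherical harmonics or bake reflection probes rather than a flat average, but the principle of deriving lighting from the passthrough feed is the same:

```python
import numpy as np

def ambient_from_capture(camera_rgb):
    """Average the camera feed into one ambient tint (0..1 per channel).
    A crude stand-in for SH fitting or reflection-probe generation."""
    return camera_rgb.reshape(-1, 3).mean(axis=0) / 255.0

# A warm incandescent room: a red-heavy feed yields a warm ambient tint.
warm_room = np.tile(np.array([[220, 180, 120]], dtype=np.uint8), (64 * 64, 1))
print(ambient_from_capture(warm_room).round(2))  # -> [0.86 0.71 0.47]
```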
Lighting estimation uses machine learning models to estimate light direction, color temperature, and intensity from the camera image. ARCore provides a lighting estimation API. Apple’s ARKit includes lighting estimation as part of its world tracking configuration. Meta’s SDK includes environment depth but dedicated real-time lighting estimation requires additional implementation work on Quest.
Render Pipeline Selection in Unity
Unity’s Universal Render Pipeline (URP) is the standard choice for Meta Quest MR development. The XR Plugin Management package and Meta’s OpenXR provider handle the platform-specific rendering setup. URP’s Forward rendering path is preferred over Deferred due to the multiview rendering requirements of stereo displays.
For visionOS, Unity’s PolySpatial package uses RealityKit as the underlying renderer, which means the Unity visual output is composited by the OS rather than by Unity directly. This has implications for which shader features are available and how materials behave.
Key Takeaways for This Section:
- Passthrough rendering composites digital content over a real-time camera feed, which changes depth buffer behavior
- Occlusion requires either scene mesh data or per-frame depth maps from the device
- Lighting consistency between digital objects and the real environment requires either environment capture or lighting estimation APIs
- Unity URP with Forward rendering is the standard pipeline for Meta Quest MR development
- visionOS uses RealityKit as the renderer via Unity PolySpatial, which limits available shader features
Input Systems and Interaction Design for MR
Input in MR is more complex than in VR because users are aware of their physical environment and expect interaction to feel natural within it. Controllers still exist on some devices, but hand tracking and gaze-based input have become primary on the most capable MR platforms.
Hand Tracking
Hand tracking on Meta Quest 3 uses computer vision to detect and track 26 joint positions per hand at up to 60 frames per second. The Meta Hand Tracking SDK provides joint pose data, pinch detection, and gesture recognition. Unity’s XR Hands package provides a cross-platform abstraction for hand joint data that works with both Meta’s OpenXR implementation and other platforms.
Apple Vision Pro uses eye tracking and indirect pinch gestures as its primary input method. The user does not reach out to interact with content. Instead, they look at something and perform a pinch with their fingers at their side. This is a fundamentally different interaction model from Meta’s hand tracking, where users reach directly into the space in front of them.
HoloLens 2 supports air tap and hand ray interactions. The hand ray projects a ray from the user’s hand in the direction they are pointing, and the user pinches or air taps to activate. This model was designed for enterprise use and is functional but less intuitive for fast-paced game interaction.
Eye Tracking
Eye tracking is available on Apple Vision Pro and HoloLens 2, and on Meta Quest Pro (not the standard Quest 3). Eye tracking serves two functions in MR games: foveated rendering and gaze-based input.
Foveated rendering uses eye tracking to reduce rendering quality in the peripheral areas of the display, concentrating processing power on the area the user is actually looking at. This can provide a 30 to 50 percent GPU cost reduction depending on implementation. For performance-constrained MR headsets, this is a significant optimization.
Gaze-based input uses where the user is looking as an input signal. On Vision Pro this is the primary selection mechanism. On other devices it is supplementary, used to predict intent or trigger ambient game behaviors.
Physical Space as a Game Mechanic
MR development opens interaction design possibilities that do not exist in VR or mobile AR. A game can use real furniture as cover in a shooter. A real floor can become a battlefield. A real table can be the surface a game board sits on. These are not just visual tricks. They require the game logic to read scene data and adapt to it dynamically.
This creates a design constraint that is uncommon in traditional game development: the game world is partially unknown at design time. The developer cannot know what furniture the player has, how large their room is, or what surfaces are available. Game mechanics must be designed to be adaptive, with fallback states for minimal or unusual room configurations.
Building adaptive placement logic requires testing across a wide variety of room configurations. This is a QA challenge that studios consistently underestimate.
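A practical pattern for this is an explicit fallback chain that maps whatever space the scan reports to a supported gameplay mode. A minimal sketch; the area thresholds and mode names are made-up placeholders:

```python
def select_arena_mode(clear_floor_m2, has_table):
    """Map whatever space the player has to a supported gameplay mode via an
    explicit fallback chain, instead of hard-failing on a space requirement.
    Thresholds and mode names are illustrative placeholders."""
    if clear_floor_m2 >= 6.0:
        return "room_scale"      # full arena across the open floor
    if has_table:
        return "tabletop"        # shrink the arena onto a table surface
    if clear_floor_m2 >= 1.0:
        return "standing_patch"  # compact arena on a small floor patch
    return "window_mode"         # last resort: play inside a floating panel

print(select_arena_mode(2.0, True))  # -> tabletop
```

Each branch is a state QA can test in isolation, which is what makes the fallback chain tractable across varied rooms.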
Key Takeaways for This Section:
- Hand tracking, not controller input, is the primary interaction method on the most advanced MR platforms
- Apple Vision Pro and Meta Quest use fundamentally different hand tracking interaction models
- Eye tracking enables foveated rendering, which can reduce GPU cost by 30 to 50 percent
- Using real-world space as a game mechanic requires adaptive placement logic and broad room-configuration QA
- Input design must account for the fact that users are physically moving in a real space
Performance Constraints and Optimization Strategies
What are the main performance limits of standalone MR headsets?
Headsets like Meta Quest 3 have limited CPU, GPU, and memory budgets. At 90fps, developers have around 11 milliseconds per frame, and performance can drop during long sessions due to thermal throttling. Optimization and long duration testing are essential.
MR headsets are mobile devices with thermal and battery constraints that differ from PC VR. Understanding these limits early in development avoids late-stage refactoring.
Target Frame Rates and Latency Budgets
Meta Quest 3 targets 72 or 90 frames per second for MR applications. Frame rate drops in MR are more disruptive than in VR because the real world remains stable while the digital layer stutters, creating a clear mismatch. The total frame budget at 90fps is approximately 11 milliseconds. GPU time must stay within this budget including passthrough processing overhead.
Apple Vision Pro targets 90 to 96 frames per second. Its M2 chip provides more headroom than mobile headsets, but the OS reserves a significant portion of that for the passthrough and compositor pipeline.
Draw Call Reduction
Draw call optimization is standard practice in mobile game development and applies fully to MR. Techniques include GPU instancing for repeated objects, static and dynamic batching in Unity, and atlas texturing to reduce material switches. For MR specifically, the scene understanding mesh can be a significant draw call contributor if not managed carefully. Rendering the mesh at full resolution every frame is rarely necessary. Using LOD (level of detail) reduction and frustum culling on mesh chunks reduces this cost.
Memory Management
Mobile MR headsets have shared GPU and CPU memory. On Meta Quest 3, the total available RAM is 8GB shared between the OS, the passthrough pipeline, the game, and the scene understanding data. Memory pressure causes frame drops before it causes crashes, which makes memory management a performance issue, not just a stability concern. Texture compression, asset streaming, and scene management must be planned from the start of production.
Thermal Management
Sustained workloads cause MR headsets to throttle GPU and CPU performance to prevent overheating. A game that runs at target frame rate in the first five minutes may drop to 70 percent of that performance after 20 minutes. Testing for thermal performance requires extended play sessions, not just short benchmark runs.
Key Takeaways for This Section:
- Meta Quest 3 targets 90fps with an 11-millisecond total frame budget including passthrough overhead
- Scene understanding mesh draw calls must be managed with LOD and culling to stay within budget
- Memory on Quest 3 is shared between the OS, passthrough, and the application, requiring active management
- Thermal throttling affects sustained performance and must be tested with extended play sessions
- Frame rate drops in MR are perceptually more disruptive than in VR because the real world remains stable
Development Tooling, Engine Support, and SDK Versions
Choosing the right tools and keeping SDK versions current are practical concerns that affect every stage of MR development.
Unity and the Meta XR SDK
Unity is the most common engine choice for MR game development on Meta Quest and Android XR. The Meta XR SDK, available through the Unity Package Manager, provides passthrough, scene understanding, hand tracking, spatial audio, and social features. As of early 2026, Meta XR SDK version 65 and later provide improved scene understanding accuracy and support for the latest OpenXR extensions.
The XR Interaction Toolkit (XRI) from Unity provides a cross-platform interaction layer that works across Quest, Vision Pro via PolySpatial, and Android XR. Using XRI from the start reduces the amount of platform-specific input code required.
Unreal Engine MR Support
Unreal Engine 5 supports Meta Quest MR through the Meta XR plugin, though the plugin lags behind Unity’s Meta SDK in feature completeness. Unreal is a valid choice for studios with existing Unreal expertise, but studios starting fresh on MR will generally find Unity’s ecosystem more mature for this specific use case. Unreal’s Lumen and Nanite features are not usable on mobile MR hardware due to their GPU requirements.
Native Development for Apple Vision Pro
Building for Apple Vision Pro natively using SwiftUI, RealityKit, and ARKit provides the most control over visionOS-specific features. This is the recommended path for studios building Vision Pro as a primary target. For studios that want to port existing Unity content to Vision Pro, Unity PolySpatial is functional but has visual constraints, particularly around materials and shaders.
OpenXR as a Common Layer
OpenXR, the open standard for XR APIs, is now supported as a base layer by Meta, Microsoft, and Google for Android XR. Using OpenXR for core functionality reduces the amount of vendor-specific code needed. However, features like scene understanding, spatial anchors with persistence, and hand tracking still require vendor extensions on top of OpenXR. The strategy most studios use is OpenXR as the base with thin platform abstraction layers for vendor-specific features.
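In outline form, that thin-abstraction strategy looks roughly like the sketch below. All class and method names are illustrative; the real backends would call the vendor extensions discussed in this article rather than returning stub values:

```python
from abc import ABC, abstractmethod

class AnchorBackend(ABC):
    """Interface the game code sees; one thin backend per platform wraps
    that vendor's anchor extension. All names here are illustrative."""
    @abstractmethod
    def create_anchor(self, pose) -> str: ...
    @abstractmethod
    def persist(self, anchor_id: str) -> bool: ...

class QuestAnchorBackend(AnchorBackend):
    def create_anchor(self, pose) -> str:
        # Would call Meta's spatial entity extension in a real build.
        return f"fb_anchor:{hash(pose)}"
    def persist(self, anchor_id: str) -> bool:
        return True  # Quest supports cross-session persistence

class HoloLensAnchorBackend(AnchorBackend):
    def create_anchor(self, pose) -> str:
        # Would call the Windows perception spatial anchor API.
        return f"msft_anchor:{hash(pose)}"
    def persist(self, anchor_id: str) -> bool:
        return True

def make_backend(platform: str) -> AnchorBackend:
    backends = {"quest": QuestAnchorBackend, "hololens": HoloLensAnchorBackend}
    return backends[platform]()

backend = make_backend("quest")
print(backend.persist(backend.create_anchor((0.0, 1.0, 0.5))))  # -> True
```

The point of the pattern is that gameplay code depends only on the interface, so vendor-specific persistence and sharing quirks stay contained in one file per platform.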
Key Takeaways for This Section:
- Unity with the Meta XR SDK is the most mature toolchain for Meta Quest MR development as of early 2026
- Unreal Engine 5 supports Quest MR but its plugin is less mature than Unity’s Meta XR SDK
- Native visionOS development with RealityKit is the best path for Vision Pro-primary projects
- OpenXR provides a common base layer across Meta, Microsoft, and Android XR platforms
- Vendor-specific features like persistent anchors and hand tracking still require platform-specific SDK work
Common Development Pitfalls in MR Game Projects
How should studios design for unknown player rooms?
Every player’s room is different, so MR games must adapt at runtime. Studios should rely on scene data from the device, use flexible placement systems, and include fallback logic when expected surfaces are missing. Simple adaptive systems are more reliable than overly complex ones.
Certain problems come up repeatedly in MR game development. Knowing them in advance reduces rework.
Assuming a Fixed Room Configuration
Game mechanics built for a specific room size or furniture layout break in real player environments. A mechanic that requires a 3-meter clear floor area will not work in a small apartment. Scene understanding data must be used to adapt mechanics to the available space, not to validate that the space meets a preset requirement.
Underestimating Passthrough Latency Effects
Even small amounts of inconsistency between passthrough video latency and digital content update rate cause visual discomfort. If a digital object appears to lag behind a physical object it is supposed to be attached to, users notice immediately. This requires careful synchronization between the passthrough pipeline timestamp and the game engine render timestamp.
Ignoring Real-World Lighting in Material Design
Materials that look appropriate in a controlled lighting environment look wrong in user homes with warm incandescent lighting, or outdoors, or in bright office environments. MR materials must be designed to look acceptable across a wide range of real-world lighting conditions, not optimized for a single studio setup.
Skipping Extended Thermal Testing
Shipping a game that performs well in short test sessions but throttles in real play is one of the most common MR launch problems. Thermal profiling using the Quest’s GPU profiler tools (including OVR Metrics Tool and Snapdragon Profiler) during 30 to 60 minute sessions is required before final optimization decisions are made.
Overcomplicating Scene Parsing Logic
Studios sometimes build highly complex scene parsing systems attempting to handle every possible room configuration. These systems are hard to debug and often fail unpredictably in edge cases. Simpler logic with well-defined fallback states is more reliable and easier to test across the range of real-world environments players actually have.
Key Takeaways for This Section:
- Room-configuration assumptions in game mechanics cause failures in real player environments
- Passthrough latency synchronization with game engine render timing requires explicit handling
- MR materials must be designed for a wide range of real-world lighting conditions
- Thermal testing must use 30 to 60 minute extended sessions, not short benchmark runs
- Simple scene parsing logic with defined fallback states outperforms complex parsing systems in practice
Why NipsApp Game Studios Is a Strong Choice for MR Game Development
Founded in 2010 and based in Trivandrum, India, NipsApp Game Studios combines deep XR engineering experience with practical production discipline. The team understands spatial anchors, scene understanding, standalone headset optimization, and cross-platform deployment, which helps prevent the scope mistakes common in MR projects. With full-cycle development capability, predictable workflows, and post-launch support, NipsApp delivers mixed reality games that are technically stable, performance-optimized, and built for real-world environments, not just controlled demos.
NipsApp Game Studios is an independent game development studio focused on spatial computing and mixed reality experiences. This article reflects the studio’s engineering team’s current technical understanding as of February 2026.