Apple unveils new model that recreates 3D objects with realistic lighting
Photo by Amanz (unsplash.com/@amanz) on Unsplash
While most AI tools can only infer a shape from a photo, Apple’s new model actually rebuilds the object in 3D with consistent reflections and highlights, 9to5Mac reports.
Key Facts
- Key company: Apple
Apple's research team unveiled a new AI pipeline called LiTo (Surface Light Field Tokenization), which encodes a single RGB-depth photograph into a compact 3D latent vector and then decodes it into a full-geometry model with view-dependent lighting, 9to5Mac reports. The key breakthrough is the joint representation of shape and surface light field, which allows the system to reproduce specular highlights, Fresnel reflections, and other illumination cues that change with viewpoint. Prior methods either reconstructed geometry without accurate lighting or assumed diffuse, view-independent appearance; LiTo sidesteps both limitations by treating the RGB-depth image as a sample of the object's surface light field and learning a unified embedding that captures how light interacts with the material from any angle.
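To make the "surface light field sample" framing concrete, here is a minimal Python sketch of how an RGB-depth image can be read as a set of (surface point, view direction, color) samples. The pinhole back-projection is standard computer vision; the function name and intrinsics are illustrative and not taken from Apple's paper.

```python
# Hypothetical sketch: interpreting an RGB-D image as samples of an
# object's surface light field. Each pixel back-projects to a 3D surface
# point; paired with the camera's view direction and observed color it
# forms one radiance sample.
import numpy as np

def rgbd_to_light_field_samples(rgb, depth, fx, fy, cx, cy):
    """rgb: (H, W, 3) floats in [0, 1]; depth: (H, W) metres;
    fx, fy, cx, cy: pinhole camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each pixel into camera space via the pinhole model.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # View direction: from the surface point toward the camera origin.
    dirs = -points / (np.linalg.norm(points, axis=-1, keepdims=True) + 1e-8)
    colors = rgb.reshape(-1, 3)
    # One row per pixel: (x, y, z, dx, dy, dz, r, g, b)
    return np.concatenate([points, dirs, colors], axis=-1)
```

Stacked across many captures of the same object, rows like these are the light-field samples that the training procedure described below subsamples.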
The architecture follows a familiar encoder-decoder paradigm but operates in a multi-dimensional latent space rather than raw pixel space. An encoder compresses the input image into a set of latent vectors that encode both the object's spatial structure and its photometric response. A decoder then expands those vectors back into a full 3D mesh together with a parametric description of view-dependent reflectance, effectively "painting" the model with realistic highlights as the virtual camera moves. 9to5Mac explains that this approach reproduces complex lighting effects that would otherwise require multiple images taken from different perspectives.
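The article does not disclose the architecture's specifics, but the described split, an encoder that produces latent tokens from an RGB-depth input and a decoder that predicts view-dependent appearance, maps onto a well-known pattern. The PyTorch sketch below is an assumption-laden illustration of that pattern; the layer sizes, token count, and mean-pooled conditioning are invented for brevity and are not Apple's design.

```python
# Minimal encoder-decoder sketch in the spirit of the description above.
# All names and dimensions are hypothetical.
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Compress a 4-channel RGB-D image into a small set of latent tokens."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),               # -> (B, 128, 8, 8)
        )
        self.proj = nn.Linear(128, dim)

    def forward(self, rgbd):                       # rgbd: (B, 4, H, W)
        feats = self.conv(rgbd)                    # (B, 128, 8, 8)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 64, 128)
        return self.proj(tokens)                   # (B, 64, dim)

class ViewDependentDecoder(nn.Module):
    """Predict RGB at 3D surface points for given viewing directions."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim + 6, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Sigmoid(),       # view-dependent colour
        )

    def forward(self, tokens, xyz, view_dir):
        # tokens: (B, T, dim); xyz, view_dir: (B, N, 3)
        z = tokens.mean(dim=1, keepdim=True).expand(-1, xyz.shape[1], -1)
        return self.mlp(torch.cat([z, xyz, view_dir], dim=-1))  # (B, N, 3)
```

Mean-pooling the tokens into one conditioning vector discards spatial detail; a real system would more plausibly attend over the token set. The point here is the interface: latent tokens in, view-conditioned color out.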
Training the model relied on large collections of RGB‑depth pairs that sample the surface light field across diverse materials and lighting conditions. By randomly subsampling these pairs and feeding them into the encoder, the system learns to infer the underlying latent representation that can later regenerate the missing views. The researchers note that the resulting latent space is compact enough to be stored and queried efficiently, opening the door to real‑time applications such as AR object insertion or on‑device 3D scanning without the computational overhead of traditional multi‑view reconstruction pipelines.
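A hedged sketch of the training loop that paragraph implies, reusing the encoder and decoder classes from the previous snippet: one RGB-depth view is encoded, and the loss is measured on color samples drawn from held-out views of the same object. The batch field names and the plain MSE loss are assumptions.

```python
# Hypothetical training step for the subsampling scheme described above.
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, batch, optimizer):
    # batch["rgbd"]: (B, 4, H, W) randomly chosen input view.
    # batch["xyz"], batch["dirs"], batch["rgb"]: (B, N, 3) surface points,
    # view directions, and observed colours from held-out views of the
    # same objects (field names are illustrative).
    tokens = encoder(batch["rgbd"])                      # (B, T, dim)
    pred = decoder(tokens, batch["xyz"], batch["dirs"])  # (B, N, 3)
    loss = F.mse_loss(pred, batch["rgb"])                # reconstruction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervision comes from views the encoder never sees, the latent representation is pushed to capture the full view-dependent appearance rather than memorizing the input image.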
Because LiTo operates from a single image, it dramatically lowers the barrier for developers and end‑users who need 3D assets. Apple’s paper suggests potential integration points across its ecosystem: developers could capture a product photo on an iPhone and instantly obtain a manipulable 3D model with accurate lighting for use in ARKit experiences, while designers might leverage the technology for rapid prototyping without dedicated photogrammetry rigs. The company has not announced a commercial product yet, but the research aligns with Apple’s broader push to embed advanced generative AI directly into its hardware and software stack.
The study also contributes to the academic discourse on latent-space modeling of visual data. By extending the concept of embeddings (traditionally applied to text tokens like "king" or "queen") to encompass both geometry and illumination, Apple demonstrates that multi-modal latent representations can bridge the gap between static image analysis and dynamic scene synthesis. As 9to5Mac points out, this unified approach could inspire future work that combines other sensory modalities, such as audio or tactile feedback, within a single latent framework, further blurring the line between perception and generation in AI-driven visual computing.
Sources
- 9to5Mac