Google has published an overview on its AI Blog that details how it developed the technology that powers the Portrait Light feature found on its newer Pixel smartphones.
While the entire process runs on-device using machine learning, not everything about it is synthetic: Google explains how it used real-world examples to train the two machine learning models that ‘help create attractive lighting at any moment for every portrait — all on your mobile device.’
The first is Automatic Light Placement. This machine learning model attempts to replicate a photographer’s job of assessing the lighting in a scene and compensating with artificial light accordingly to achieve the best possible portrait.
To do this, Google says it first ‘estimate[s] a high dynamic range, omnidirectional illumination profile for a scene based on an input portrait’ using technology Google researchers detailed in a white paper earlier this year. The model ‘infers the direction, relative intensity, and color of all light sources in the scene coming from all directions, considering the face as a light probe,’ and the result is visualized as a sphere. Google also estimates the subject’s head pose using MediaPipe Face Mesh, a neural network-powered tool that tracks the position of a subject’s face in real time using 468 3D ‘face landmarks.’
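The MediaPipe Face Mesh model mentioned above is publicly available, so the landmark step can be reproduced outside Google's pipeline. The sketch below is a minimal illustration using the `mediapipe` and `opencv-python` packages and a hypothetical `portrait.jpg` input; it only shows how the 468 landmarks are obtained, not how Portrait Light uses them on-device.

```python
# Minimal sketch: extract MediaPipe Face Mesh landmarks from a portrait.
# Assumes the `mediapipe` and `opencv-python` packages and a local
# "portrait.jpg" file (hypothetical example input).
import cv2
import mediapipe as mp

image_bgr = cv2.imread("portrait.jpg")
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

with mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                     max_num_faces=1) as face_mesh:
    results = face_mesh.process(image_rgb)

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark  # 468 3D points
    # Each landmark carries normalized x, y image coordinates and a relative z depth.
    print(len(landmarks), landmarks[0].x, landmarks[0].y, landmarks[0].z)
```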
Google uses this information, along with real-world studio portrait examples, to determine where the synthetic light should be positioned. Specifically, Google says it tries to recreate ‘a classic portrait look, enhancing any pre-existing lighting directionality in the scene while targeting a balanced, subtle key-to-fill lighting ratio of about 2:1.’
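To make the 2:1 figure concrete: the key light (the dominant light on the subject) should end up roughly twice as strong as the fill (the ambient light softening the shadows). The arithmetic below is purely illustrative and is not Google's placement algorithm; the function name and inputs are hypothetical.

```python
# Illustrative arithmetic only (not Google's code): how much synthetic key light
# would be needed to reach a ~2:1 key-to-fill ratio in a given scene.
def synthetic_key_strength(existing_key: float, fill: float,
                           target_ratio: float = 2.0) -> float:
    """Return extra light so that (existing_key + extra) / fill ≈ target_ratio."""
    return max(0.0, target_ratio * fill - existing_key)

# Example: the scene's dominant light is barely brighter than the ambient fill.
extra = synthetic_key_strength(existing_key=1.1, fill=1.0)
print(extra)  # 0.9 -> adding this brings the key-to-fill ratio to about 2:1
```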
Using all of the data Portrait Light has parsed thus far, Google then uses the second tool, Data-Driven Portrait Relighting, to ‘add the illumination from a directional light source to the original photograph.’
To make the lighting look as realistic as possible, Google trained the machine learning model using ‘millions of pairs of portraits both with and without extra light.’ To create this dataset, Google used the Light Stage computational illumination system, pictured below, which is a ‘spherical lighting rig [that] includes 64 cameras with different viewpoints and 331 individually-programmable LED light sources.’
Google photographed multiple subjects with varying face shapes, genders, skin tones, hairstyles and more to create a diverse dataset for the neural network to work with. As for how each person was photographed using Light Stage, Google breaks down the process:
‘We photographed each individual illuminated one-light-at-a-time (OLAT) by each light, which generates their reflectance field — or their appearance as illuminated by the discrete sections of the spherical environment. The reflectance field encodes the unique color and light-reflecting properties of the subject’s skin, hair, and clothing — how shiny or dull each material appears. Due to the superposition principle for light, these OLAT images can then be linearly added together to render realistic images of the subject as they would appear in any image-based lighting environment, with complex light transport phenomena like subsurface scattering correctly represented.’
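The superposition principle described in the quote is simple to express in code: a relit image is a weighted sum of the OLAT frames, with each light's weight taken from the target lighting environment. The NumPy sketch below is a hypothetical illustration of that linear combination with assumed array shapes, not the Light Stage processing pipeline itself.

```python
# Illustrative NumPy sketch of OLAT superposition (not Google's pipeline).
# olat_images: (331, H, W, 3) array, one frame per Light Stage LED.
# env_weights: (331, 3) array, per-light RGB intensity sampled from the
#              target image-based lighting environment (hypothetical inputs).
import numpy as np

def relight(olat_images: np.ndarray, env_weights: np.ndarray) -> np.ndarray:
    # Weighted sum over the light axis; each light's RGB weight scales its
    # OLAT frame before the frames are accumulated.
    return np.einsum("lhwc,lc->hwc", olat_images, env_weights)

# Toy example with random data just to show the shapes involved.
olat = np.random.rand(331, 4, 4, 3)
weights = np.random.rand(331, 3)
print(relight(olat, weights).shape)  # (4, 4, 3)
```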
To ensure the process of applying lighting effects to the image is as efficient and effective as possible, the team trained the relighting model to output a low-resolution quotient image. This makes the process less resource-intensive and ‘encourages only low-frequency lighting changes, without impacting high-frequency image details.’
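A quotient (ratio) image works by predicting a per-pixel multiplier rather than the relit pixels themselves: the low-resolution prediction is upsampled and multiplied into the full-resolution photo, so fine detail comes from the original image and only the smooth lighting change comes from the network. The sketch below shows that final compositing step under assumed shapes and value ranges; it is not Google's implementation.

```python
# Illustrative sketch of applying a low-resolution quotient image
# (hypothetical shapes and [0, 1] value range; not Google's implementation).
import numpy as np
import cv2

def apply_quotient(original: np.ndarray, quotient_lowres: np.ndarray) -> np.ndarray:
    """original: (H, W, 3) float image; quotient_lowres: (h, w, 3) predicted ratio."""
    h, w = original.shape[:2]
    # Upsample the smooth, low-frequency lighting ratio to full resolution.
    quotient = cv2.resize(quotient_lowres, (w, h), interpolation=cv2.INTER_LINEAR)
    # Multiply: high-frequency detail stays in `original`; the lighting change
    # comes entirely from the quotient.
    return np.clip(original * quotient, 0.0, 1.0)
```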
All of this information alone, though, doesn’t account for various parts of a subject’s face being closer to or further from the synthetic light source. To keep the synthetic lighting as realistic as possible, Google applies Lambert’s law to the input image to create a ‘light visibility map’ for the desired synthetic lighting direction, which results in a final product that more accurately represents studio lighting.
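Lambert’s law says the light a surface receives falls off with the cosine of the angle between the surface normal and the light direction. Assuming per-pixel surface normals for the face are available (a hypothetical input here), a light visibility map for the chosen synthetic light direction reduces to a clamped dot product, as in the sketch below; this is an illustration of the principle, not Google's code.

```python
# Illustrative sketch of a Lambertian light visibility map (not Google's code).
import numpy as np

def light_visibility_map(normals: np.ndarray, light_dir: np.ndarray) -> np.ndarray:
    """normals: (H, W, 3) unit surface normals; light_dir: (3,) vector pointing
    from the surface toward the synthetic light."""
    light_dir = light_dir / np.linalg.norm(light_dir)
    # Lambert's cosine law: clamp negatives (pixels facing away from the light).
    return np.clip(normals @ light_dir, 0.0, 1.0)

# Example: a hypothetical light up and to the left of the subject.
normals = np.random.rand(4, 4, 3)
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
print(light_visibility_map(normals, np.array([-0.5, 0.5, 0.7])).shape)  # (4, 4)
```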
Google notes the entire process results in a model capable of running on mobile devices that takes up just under 10MB, which is impressive considering how much is going on behind the scenes.