How Spectacles Use Voice, Object, and Gesture For Seamless AR Experiences

James Torres  |  VR | AR

Augmented and Virtual Reality experiences are reshaping the way potential customers interact with products. We believe that the XR product marketing space will continue to grow for years to come.

Introduced in May 2021, Snap Spectacles are a wearable spatial computing device that combines audio, video, a camera, and a microphone to create seamless experiences. Leveraging SnapML, the device can track voice, gestures, and external objects for interaction.

Still in their early phases, the Spectacles offer only 30 minutes of battery life, which means they are geared toward short experiences. As the technology improves, however, Snap Spectacles will support a wider variety of use cases over longer sessions.

Some of the Snap Spectacles’ possible use cases include:

  • AR product experiences
  • Voice-controlled spatial games
  • Visual search on real-world objects
  • Object, pet, and car detection

Object Recognition

Sources: Lens Studio, Pexels, JT

The Snap Spectacles use computer vision through an RGB camera to track objects in the external environment. A built-in machine learning model can recognize more than 1,000 common objects.

For example, users looking at a chair can see “chair” appear as a label on their screen if object recognition and classification are enabled. This feature can even distinguish between different plants and animals.
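To make the pattern concrete, here is a minimal TypeScript sketch of that flow: take the classifier’s predictions, pick the most confident one, and render it as an on-screen label. The Classification interface, the confidence threshold, and the showLabel callback are illustrative stand-ins, not the literal Lens Studio API.

```typescript
// Sketch: turn ML classification output into an on-screen label.
// All types and callbacks here are hypothetical stand-ins for the
// classification output a SnapML-powered Lens would expose.

interface Classification {
  label: string;      // e.g. "chair"
  confidence: number; // 0..1 score from the ML model
}

// Only show labels the model is reasonably sure about (assumed threshold).
const CONFIDENCE_THRESHOLD = 0.6;

function onClassificationResult(
  results: Classification[],
  showLabel: (text: string) => void
): void {
  if (results.length === 0) {
    return; // nothing recognized this frame
  }
  // Pick the highest-confidence prediction from the model's output.
  const best = results.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  if (best.confidence >= CONFIDENCE_THRESHOLD) {
    showLabel(best.label); // e.g. renders "chair" next to the object
  }
}

// Example usage with a fake result set:
onClassificationResult(
  [{ label: "chair", confidence: 0.91 }, { label: "stool", confidence: 0.42 }],
  (text) => console.log(`Label on screen: ${text}`)
);
```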

You can also import your own custom ML model for recognizing objects, which makes the possibilities for object detection practically endless. Snap has already begun partnering with brands to help consumers identify products in the real world, distinguishing a Ferrari from a Ford Escape, for example.

Sources: Pexels, Lens Studio

Snap Spectacles make it easier than ever to use visual search, in which you look up information about an object simply by pointing a camera at it. Already, over 400 million people worldwide use visual search. This effectively makes the real world “shoppable,” without needing a full store.

Companies can pay for preferred placement in visual search results, much as they do with text search results. The market for paid visual search results is projected to reach $1.9B by 2025. As more and more people get comfortable with AR, visual search may become the default way to search for certain kinds of information.

Voice Recognition

Snap Spectacles use a built-in microphone and ML software to recognize spoken keywords. As with object recognition, this feature recognizes many default words, and you can augment it with your own set of keywords.

Built-in Snap commands include keywords like “change,” “down,” and “back.” Developers have nearly limitless possibilities here; for example, you could add the keyword “bigger” to make a virtual product appear larger on screen.
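As a rough illustration, the sketch below maps detected keywords to handler functions, the way a Lens script might scale a product when it hears “bigger.” The onKeyword entry point and the handler map are hypothetical; real Lens Studio voice events are wired differently.

```typescript
// Sketch: dispatch recognized voice keywords to Lens actions.
// The handler map and onKeyword entry point are illustrative,
// not the actual Lens Studio voice API.

type KeywordHandler = () => void;

// Scale factor applied to a virtual product each time "bigger" is heard.
let productScale = 1.0;

const handlers: Record<string, KeywordHandler> = {
  bigger: () => {
    productScale *= 1.25; // grow the on-screen product by 25%
    console.log(`Product scale is now ${productScale.toFixed(2)}`);
  },
  back: () => console.log("Returning to previous view"),
  down: () => console.log("Moving selection down"),
};

// Called whenever the ML voice model reports a detected keyword.
function onKeyword(keyword: string): void {
  const handler = handlers[keyword.toLowerCase()];
  if (handler) {
    handler();
  }
}

// Example: the microphone pipeline detects the user saying "bigger" twice.
onKeyword("bigger");
onKeyword("bigger");
```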

Game developers can use this feature to create voice-command games. Liquid IV is one such game, which Fishermen Labs created using MLVoice. Users move a bottle to catch falling water using commands like “Start,” “Left,” and “Right.”

Voice commands on computers have a long history of enabling people with motor impairments: instead of pushing a button, they can speak a command. Snap Spectacles take this a step further by making voice a seamlessly integrated part of the device ecosystem. Since there’s no built-in keyboard, users interact with voice by default.

Gesture Recognition

Snap Spectacles can track a user’s hands to interpret unique gestures. They do this by tracking 25 different finger joints with the camera to build an internal model of the user’s hands. An ML model then identifies distinct gestures like thumbs-up, pointing, or “Spock.” As with custom ML models for keyword and object detection, developers can create custom gestures that align with a specific brand.

Source: Lens Studio

For example, an AR experience could be created for Verizon that uses a “telephone” gesture to launch a world game Lens. Gestures can also trigger any unique event, like controlling objects, enabling artistic visual effects, or playing sound effects.
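In pattern form, that wiring might look like the TypeScript sketch below: each gesture name the model can emit maps to an event, with a simple debounce so a held gesture fires only once. The gesture names and dispatch plumbing are illustrative assumptions, not the exact Lens Studio API.

```typescript
// Sketch: route recognized hand gestures to Lens events.
// Gesture names and the trigger/dispatch plumbing are hypothetical;
// they illustrate the pattern rather than the real hand-tracking API.

type GestureName = "thumbs_up" | "point" | "spock" | "telephone";

// Map each gesture the ML model can emit to the event it should trigger.
const gestureActions = new Map<GestureName, () => void>([
  ["telephone", () => console.log("Launching the branded world game Lens")],
  ["thumbs_up", () => console.log("Playing confirmation sound effect")],
  ["point", () => console.log("Selecting the object under the fingertip")],
]);

// Debounce so one held gesture doesn't fire repeatedly every frame.
let lastGesture: GestureName | null = null;

function onGestureDetected(gesture: GestureName): void {
  if (gesture === lastGesture) {
    return; // still holding the same gesture; ignore
  }
  lastGesture = gesture;
  gestureActions.get(gesture)?.();
}

// Example: the hand-tracking model reports gestures across frames.
onGestureDetected("telephone");
onGestureDetected("telephone"); // ignored: same gesture still held
onGestureDetected("thumbs_up");
```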

Gesture recognition can also empower people who have difficulty speaking. And some people with hearing impairments may already prefer communicating in sign language, making Snap Spectacles an intuitive transition.

The Future of AR Brand Experiences

Snap Spectacles, although early in development, represent a step toward comprehensive AR brand experiences. They take the AR experiences that millions of people use on their phones and translate them into a fuller, more immersive environment.

Brands will be able to use Snap Spectacles to create shopping without shops. Consumers wearing Snap Spectacles will see brands all around them, and can find more information without even needing to push a button. Brands can also leverage their capital to compete in the rapidly growing domain of visual search.

Sources: Hello Fresh, Pexels, Lens Studio

Consumers will benefit from immediate product information, comparisons, and recommendations. They will be able to “try out” virtual products from home and manipulate them with gestures.

And brands will be able to accommodate a wider range of neurodiversity and physical ability. While traditional computers rely on a keyboard and mouse, Snap Spectacles provide a wide range of interaction options, including voice and gesture.

If your company is looking to leverage XR to promote your products or brand in an innovative and experiential way, please contact us at Fishermen Labs to see how we can help you!