How Facebook Is Using AI To Help Bring Photos To Life For The Blind

New AI technology can interpret the content of photos and provide more context than was possible before. But there’s still a long way to go.

Trevor Thomas is a professional hiker. Heading out on some of America’s most famous trails with his dog, Tennille, for treks of up to six months is par for his course, and as he blows through the miles, he regularly updates his many fans and followers with photos and videos of his exploits.

There’s just one catch: Thomas is blind.

“When people first started talking about Facebook, being blind, I didn’t understand it,” Thomas recalls in a new Facebook Stories video about his use of the social network. “Why would I use this? I can’t see the photos. I don’t care. But now, I can’t understand what I would do without it.”

A new Facebook Stories video spotlights how the social network is helping people like blind hiker Trevor Thomas.

With varying degrees of success, many Internet companies have made it possible for blind people to interact with their services, mainly by converting text to speech for what are called screen readers.

Four years ago, Facebook launched an Accessibility initiative, spearheaded by project lead Jeff Wieland. Over that time, the team has worked on ways to improve Facebook’s usefulness for the blind, the deaf, and people with other disabilities. When it comes to helping the blind navigate Facebook, a main focus has been translating the service’s menu and button structure for screen readers.

“We want to make accessibility part of everything we build,” Wieland tells Fast Company. “We worked closely with [our] Infrastructure team to make sure [core components like menus, buttons, and links] are designed with accessibility in mind.”

Dominated By Visual Content

When Facebook first launched in 2004, it was largely about sharing text-based information. Over the years, though, it has become the world’s largest repository of photographs, and most of Facebook’s 1.5 billion users’ news feeds are now dominated by photos and videos. Until recently, there was nothing even the most sophisticated screen readers could do with that visual content.

Now, Facebook has begun using artificial intelligence to enrich blind users’ experiences, developing algorithms that automatically interpret some photos and videos into spoken words that offer blind users context that had not been possible before.

For example, while a sighted person would have no trouble understanding the impact of a photo of a stunning sunset over San Francisco Bay, a blind person would get nothing out of it without additional context. With Facebook’s AI technology, however, their screen reader could tell them this: “This image may contain ‘nature, outdoor, cloud, grass, horizon, plant, [or a] tree.’”

Another example is of a couple with their child outside a Solvang, California restaurant and its famous windmill. The AI system offered this context for the photo: “This image may contain 3 people, smiling, outdoor.”

These interpretations obviously don’t tell the whole story, but they do fill in some of the blanks without manual help. For someone like Thomas or Matt King, Facebook’s first blind engineer and a member of its Accessibility team, that may make a world of difference.

“For the longest time, if you posted a photo, [Facebook] wouldn’t do anything for me,” says Thomas, whose Facebook Stories video is tied to his recent hike along the Appalachian Trail. “Now, if someone posts a photo to Facebook…all I have to do is click on it and it describes it. That’s so empowering, not to have to ask somebody to interpret what somebody has sent me.”

As a Facebook employee, King obviously has a bias. But as a blind user of Facebook himself, as well as a World Wide Web Consortium (W3C) editor and authoring-practices task force chair for the Accessible Rich Internet Applications (ARIA) technical specification, he has an interest in the experience getting better, and he lauded the application of AI to accessibility.

“It takes it from zero percent to maybe at least half of the level of enjoyment,” King says, “which is enough to…enable me to get engaged in ways that were not previously possible to do.”

Connecting the world, including the 20% who are disabled

If you’ve ever heard Facebook founder Mark Zuckerberg speak about his company’s mission—connecting the entire world—applying AI to Facebook’s efforts to bring its service to the blind and other disabled people fits right in.

“For 20% of the world, if you do not make things accessible, they will not connect,” says King. “That’s over a billion people, and that’s totally counter [to our mission]. To not include those people would be a big mistake.”

To be sure, there’s a business case for Facebook and other Internet companies to improve their services for the blind, given that doing so would likely bring in many more users, and dramatically increase potential revenue. Wieland acknowledges that the goals of his Accessibility team are not solely altruistic.

To King, though, that doesn’t matter.

“It’s so mission-driven,” King says. “I don’t think anybody can say the primary driver is business, or the primary driver is altruism. Because making the world more open and more connected is [Facebook’s] core mission, you really can’t separate accessibility from that mission.”

Or, he added, “said another way, you couldn’t possibly fulfill that mission without working on accessibility.”

Lean more on the social side

Not everyone is enamored of Facebook’s ongoing accessibility efforts.

“There are constant accessibility barriers, and constant screen reader incompatibilities,” says Joshua Miele, director of the Video Description Research and Development Center at the Smith-Kettlewell Eye Research Institute. “There are little features that you either can’t use, or which are extremely difficult to use, things that don’t behave in an expected manner and features you simply can’t get to with a screen reader. That’s a much bigger problem than using sexy AI technology…to label videos and images.”

Asked about Miele’s concerns about the consistency of Facebook’s accessibility features, Wieland replied that the company continuously works to make its screen reader experience better and offers a range of navigation features, such as semantic headings, ARIA landmarks, and keyboard shortcuts for Apple and top Windows screen readers.

“We are aware of areas where screen reader navigation and interaction are not yet as simple as we would like them to be,” Wieland says, “and have a roadmap for creating the same level of enjoyment for all people who want to connect with our products.”

Miele, blinded in a tragic childhood physical attack, thinks the real power of Facebook is in its social features and its massive community, and argues that the nuance of, say, a photograph of a beaming couple at their wedding, can’t yet be captured by technology. He thinks Facebook would do its blind users a much bigger service by working to improve the ways users add descriptions or request descriptions, or even have discussions about photo descriptions.

“What we’re talking about is taking advantage of this social platform,” Miele says, “to be smart about how descriptions are generated. You don’t want an automated system generating descriptions for these social artifacts….You want the crowd, the social consumers, and the friends of the media owner to be offering comments and descriptions.”

King says Facebook would love to have more users providing more descriptive context about the visual content they or their friends post.

“We have studied other solutions that crowdsource this type of service,” King says, cautioning that other solutions are needed as well. “While such approaches have their own advantages, there are significant limitations on how they could be implemented and used as well as to how well they could provide accessibility to content at Facebook’s scale.”

While it’s by no means perfect at this point, King adds, artificial intelligence technology “that can provide automated captioning is now at a point where we are confident it complements the social tools we provide and is one of the most promising keys to a future where all content is fully accessible.”

For his part, Thomas appreciates the freedom Facebook’s algorithms offer him given that the thing that’s most “taken away when you’re blind is your independence.”

In his video, Thomas notes that being able to post photos and videos and make comments on Facebook without having to ask anyone for help is a highly empowering experience.

“If there wasn’t any Facebook, and I didn’t have fans and followers that cared about our expeditions,” Thomas says in his video, “I wouldn’t get to keep challenging myself. So Facebook is fuel for the engine to get me to my goals.”

Source: Fast Company, Daniel Terdiman

Photo: Facebook Accessibility Team, including Matt King, the company’s first blind engineer (in red).

King, who joined Facebook in June after more than two decades at IBM, most recently as senior technical lead for IT Accessibility for IBM Workplace, knows firsthand how “extremely isolating” a disability can be.