Searches for images will soon get a lot smarter, with Google unveiling new technology that can understand entire scenes with a high level of accuracy.
A research collaboration between the internet giant and Stanford University is producing software that can describe the entire scene portrayed in a picture, not just the individual objects in it.
Algorithms written by the team attempt to explain what’s happening in images in language that actually makes sense to the average reader, producing captions such as “a group of young people playing a game of frisbee” or “a person riding a motorcycle on a dirt road”.
The machine-learning software developed by Google uses two neural networks – one handles image recognition, the other natural language processing.
A neural network is a computational model that mimics some of the architecture of the brain: a series of interconnected neurons that can take in information from a variety of sources and learn from it.
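As a rough illustration of how those two networks fit together, the sketch below pairs a small convolutional image encoder with a recurrent language decoder. This mirrors the general encoder-decoder approach the researchers describe, not Google’s actual model – every layer size, name and parameter here is hypothetical.

```python
# A minimal encoder-decoder captioning sketch (illustrative only).
import torch
import torch.nn as nn

class CaptionNet(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Image-recognition network: a tiny CNN standing in for the
        # large pretrained vision model a real system would use.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Natural-language network: an LSTM that emits the caption word by word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Encode the image, then prepend its feature vector to the embedded
        # caption tokens so the language network sees the image first.
        img_feat = self.encoder(images).unsqueeze(1)        # (B, 1, E)
        tokens = self.embed(captions)                       # (B, T, E)
        seq = torch.cat([img_feat, tokens], dim=1)          # (B, T+1, E)
        out, _ = self.lstm(seq)
        return self.to_vocab(out)  # word scores at each step

model = CaptionNet(vocab_size=10000)
dummy_images = torch.randn(2, 3, 64, 64)           # two 64x64 RGB images
dummy_captions = torch.randint(0, 10000, (2, 12))  # twelve-token captions
print(model(dummy_images, dummy_captions).shape)   # torch.Size([2, 13, 10000])
```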
The system was the work of four Google scientists – Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan.
“A picture may be worth a thousand words,” they wrote on the Google Research blog.
“But sometimes it’s the words that are the most useful – so it’s important we figure out ways to translate from images to words automatically and accurately.”
Two years ago, Google researchers created image-recognition software and showed it 10 million images taken from YouTube videos. After three days, the programme had taught itself to pick out pictures of cats.
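For a sense of how software can teach itself from images without labels, here is a toy sketch in the same spirit: an autoencoder that learns to reconstruct unlabelled images and, in doing so, discovers recurring visual features. It is a minimal illustration under assumed sizes and random stand-in data – Google’s 2012 experiment ran on a vastly larger network trained on real YouTube thumbnails.

```python
# Toy unsupervised feature learning via an autoencoder (illustrative only).
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 128), nn.ReLU(),  # encoder: compress the image
    nn.Linear(128, 32 * 32 * 3),             # decoder: reconstruct it
)
optimiser = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

unlabelled = torch.rand(64, 3, 32, 32)  # stand-in for video thumbnails
for step in range(100):
    recon = autoencoder(unlabelled)
    loss = loss_fn(recon, unlabelled.flatten(1))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
# After training, the 128 hidden units respond to patterns that recur in the
# data – the mechanism by which "cat detectors" emerged at far greater scale.
```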