Making Applications Truly Intelligent
Question: what do you get when you combine a learner capable of classifying objects and actions in a photo with a learner capable of understanding naturally phrased questions?
Answer: a system capable of answering questions about the contents of a photo. In other words, Facebook’s new toy.
Layering multiple specialized learners into a single system is the next great frontier of machine learning. Why? Because a learner can serve as an interface between other learners and the human beings trying to derive insights from them. It is somewhat analogous to the way SQL lets database developers quickly gather insights from millions of rows of tabular data.
For example, a computer vision learner that excels at determining whether a certain picture contains a cat or a dog has no idea that the entities it is differentiating between are called “cat” and “dog” until we assign those labels to its output. Even after we tell it that one neuron firing strongly means “dog” and another neuron firing means “cat”, the learner has no clue what those words mean in the linguistic sense. The features learned by the neural net that enable it to differentiate cats and dogs so well are radically different from the features needed to understand that “dog” and “cat” are nouns in the English language and are used in particular ways.
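To make the labeling point concrete, here is a minimal sketch. The activation values, the `LABELS` list, and the `label_for` helper are all made up for illustration; no real trained network is involved. The point is only that the net emits numbers and the words are bolted on afterwards by us.

```python
# Hypothetical labels for the two output neurons of a cat/dog classifier.
# These are assigned by a human; the net itself never sees the words.
LABELS = ["cat", "dog"]

def label_for(activations, labels=LABELS):
    """Return the human-assigned label of the most strongly firing output neuron."""
    strongest = max(range(len(activations)), key=lambda i: activations[i])
    return labels[strongest]

# Made-up activations: output neuron 1 fires most strongly.
print(label_for([0.12, 0.88]))
```

Swapping `LABELS` for `["chat", "chien"]` changes nothing about what the net learned, which is exactly why it cannot be said to understand the words.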
So couldn’t we make a single neural net capable of learning the features needed to do both? Yes, but that would mean making it much larger (which costs computational resources) and more complicated to implement. Right now, it’s easier, and in practice better, to stack a net capable of understanding natural language queries on top of a net capable of identifying objects in photos. Facebook takes this one step further by giving its learners a contextual memory that allows them to understand basic cause and effect. Thus, given a picture of a dog with a frisbee in its mouth, the system can answer “frisbee” when asked what game the dog is playing.
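The stacking idea can be sketched in a few lines. This is a toy under loud assumptions, not Facebook's system: `vision_learner` and `language_learner` are hand-written stand-ins for trained nets, a "photo" is just a dict of pre-computed labels, and `MEMORY` is a hard-coded lookup playing the role of contextual memory.

```python
def vision_learner(photo):
    """Stand-in for a trained vision net: returns the labels found in the photo.
    Assumption: a 'photo' is a dict that already carries its detected labels."""
    return set(photo["detected"])

def language_learner(question):
    """Stand-in for a trained language net: maps a question to a query type."""
    words = question.lower().rstrip("?").split()
    if "game" in words:
        return "game"
    if "animal" in words:
        return "animal"
    return "unknown"

# Toy "contextual memory": which detected objects can answer which query type.
MEMORY = {
    "game": {"frisbee", "ball"},
    "animal": {"dog", "cat"},
}

def answer(photo, question):
    """Stack the learners: the language stage decides what to look for,
    the vision stage supplies what is actually in the photo."""
    query = language_learner(question)
    objects = vision_learner(photo)
    matches = sorted(objects & MEMORY.get(query, set()))
    return matches[0] if matches else "I don't know"

photo = {"detected": ["dog", "frisbee", "grass"]}
print(answer(photo, "What game is the dog playing?"))
```

Neither stage knows anything about the other's internals. Each could be swapped for a real model without touching `answer`, which is the appeal of layering over building one giant net.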
Someday, applications will be able to determine for themselves which learners to apply to a particular problem, and in what order to apply them. Throw in a few orders of magnitude more processing power and storage and we’ll probably be very close to achieving artificial general intelligence.