
Google open-sourced TensorFlow, its elegant and powerful artificial intelligence engine. Google uses this machine learning software internally to add capabilities like speech recognition and object detection to its products. Now, it’s available for everyone to use. What will this mean for the design of artificial intelligence systems? As wonderful as TensorFlow is, I fear that it may accelerate the design of AI systems that are hard to understand and hard to communicate with. I think it will focus our attention on experimenting with mathematical tricks, rather than on understanding human thought processes.
TensorFlow is aimed at the development of machine learning systems that require heavy numerical computation, like artificial neural networks (ANNs). The trouble with these systems is that they consist of millions of numbers—too many for people to sift through and make sense of.
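To make that scale concrete, here is a minimal sketch using TensorFlow’s Keras API (the architecture is invented for illustration): even a modest image classifier contains millions of individual weights, none of which means anything on its own.

```python
import tensorflow as tf

# A small, made-up image classifier: two convolutional layers and two
# dense layers, operating on 224x224 RGB images.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g., cat / not-cat
])

# Prints the total number of trainable weights: millions of them,
# even for this toy network.
print(model.count_params())
```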
Suppose we train an ANN to recognize cats. When it recognizes a previously unseen cat in an image, it can’t explain to us why or how it did this. And if the ANN fails to recognize a spotted cat, it’s hard for us to fix the problem. We’re not going to tell it something like, “change element 341375’s value from 0.3265 to 0.4271, element 1954236’s value from 0.9218 to 0.8612, …” That would be a long list, and we don’t even know which numbers to change to get the desired result (end users certainly don’t, and neither do the researchers who develop these systems). More likely, we’ll either ignore the error, retrain the ANN with better cat data, or modify the training algorithm. These are blunt tools, because they don’t operate in the domain of interest, namely cats. Rather, they operate in the domain of ANNs. It would be better if we could simply tell the system that cats can be spotted.
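For contrast, here is a hedged sketch of the kind of discrete, human-readable representation I have in mind (the frame and its fields are invented for this article). Telling the system that cats can be spotted becomes a single edit that anyone can read and verify.

```python
# A hypothetical frame for "cat". Every field is human-readable, and a
# person can repair the system by editing the frame directly.
CAT_FRAME = {
    "is_a": "mammal",
    "legs": 4,
    "coat_patterns": {"solid", "striped"},  # "spotted" is missing
}

# The fix operates in the domain of interest (cats), not the domain of ANNs:
CAT_FRAME["coat_patterns"].add("spotted")

def pattern_consistent_with_cat(pattern):
    """Check whether a coat pattern is consistent with the cat frame."""
    return pattern in CAT_FRAME["coat_patterns"]

print(pattern_consistent_with_cat("spotted"))  # True
```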
I’d rather see us design AI systems that are understandable and communicative.
AI systems are going to be increasingly involved in our lives, as we turn to them more and more for important decision-making tasks. When AI systems make bad decisions (as they’ve done before, and inevitably will again), we need to be able to understand why they made those decisions and communicate with them to fix the problem.
AI Designed for the Real World
How do we get an AI system to engage in these sorts of interactions? Earlier this year, Google unveiled a chatbot that could debate the meaning of life or help a human user troubleshoot internet connectivity problems. The bot is impressive, but its responses are disconnected from the real world. For instance, when the chatbot tells a human something like “seems the mail is not loading,” it’s making this up. It hasn’t actually observed whether the mail has loaded or not. The machine’s responses are based solely on its training data and the history of the conversation at hand.
Similarly, I don’t think that the Google chatbot will be able to reliably implement interactions like the credit limit example given above, because the search space for an ANN representation of potential financial transactions is too large. I think that implementing these interactions reliably requires discrete, human-readable representations like equations, logical formulas, rules, frames, models, and diagrams. These representations avoid the added complexity of ANNs, so the search space is more tractable.
How do we implement the credit limit example? The AI system needs to be able to query the user’s financial information. A natural language parser would interpret incoming utterances, a rule-based dialogue manager would map the parsed utterances to the appropriate database queries, and a natural language generator would produce the responses.
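Here is a minimal sketch of that pipeline. The intent patterns, the stand-in account table, and the response templates are all invented for illustration; a real system would use a full parser and a live database.

```python
import re

# Stand-in for the user's financial database (invented data).
ACCOUNTS = {"alice": {"credit_limit": 5000, "balance": 1250}}

# Rule-based mapping from utterance patterns to database fields.
RULES = [
    (re.compile(r"\bcredit limit\b"), "credit_limit"),
    (re.compile(r"\bbalance\b"), "balance"),
]

def parse(utterance):
    """Toy natural language parser: normalize the utterance."""
    return utterance.lower().strip()

def manage_dialogue(parsed, user):
    """Rule-based dialogue manager: match a rule, run the query."""
    for pattern, field in RULES:
        if pattern.search(parsed):
            return field, ACCOUNTS[user][field]
    return None, None

def generate(field, value):
    """Toy natural language generator: fill a response template."""
    if field is None:
        return "Sorry, I didn't understand that."
    return "Your {} is ${:,}.".format(field.replace("_", " "), value)

field, value = manage_dialogue(parse("What is my credit limit?"), "alice")
print(generate(field, value))  # -> Your credit limit is $5,000.
```

Unlike the chatbot’s free-floating replies, every response here is grounded in an actual query against the (stand-in) database, and every rule is something a person can read and fix.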
How do we acquire human-readable representations? Acquisition needs to be a multifaceted process. An AI system can acquire human-readable representations as it interacts with people. We can also use machine learning to suggest human-readable representations, although the representations it suggests often look questionable to humans.
For example, here are some of the parts of a building as learned by one machine learning system: rubble, floor, facade, basement, roof, atrium, exterior, tenant, rooftop, and wreckage. Parts like rubble and wreckage seem like strange additions to this list, because buildings are not in ruins most of the time. Here are some paraphrases of X asks Y, as learned by another machine learning system: X tells Y, X meets with Y, X informs Y, X contacts Y, and X writes to Y. These are certainly related to X asks Y, but they are not synonymous in all contexts. And here is an event sequence for cooking as learned by yet another machine learning system: A boil B, A slice B, A peel B, A saute B, A cook B, A chop B. To a human cook, many of these tasks appear to be out of order (one typically peels before slicing, and chops before cooking). These are top-ranked results generated by state-of-the-art machine learning systems. Lower-ranked results are even worse.
Representations suggested by machine learning need to be vetted by humans, and not just because they contain errors. We need to examine machine-learned representations with a critical eye because, as humans, it’s up to us to decide what we want our world to look like.
We don’t have to accept incomprehensible and uncommunicative AI systems. We can build understandable and communicative systems that (1) learn human-understandable representations through interaction with users as well as manual curation of knowledge and (2) maintain human-understandable representations of the states of users and the world. It’s hard work, but it can be done.