Everything I Know About Training A Machine-Learning Model I Learned From My Kids
“You who are on the road must have a code that you can live by . . . teach your children well . . . and feed them on your dreams.”
Graham Nash was spot-on, and not just about parenting: training a well-adjusted machine-learning model that won’t get in trouble and embarrass you in public requires the same assiduous guidance, supervision, and ongoing engagement it takes to raise a successful, ethical child. Feed them good training data, validate their predictions, and give them a set of business rules (a moral code) to prevent catastrophic errors in judgment, and you will be well on your way to raising a machine-learning model (or child) you can be proud of.
Machine-learning models, like children, are endowed with enormous potential: potential to do great things that benefit society, and potential to do the opposite. However, the major machine-learning gaffes of recent years highlight how far we still have to go in developing these models responsibly. Take, for example, Microsoft’s now-infamous Tay bot. Designed to mimic the style and vernacular of a teenager, Tay was a short-lived AI chatbot whose goal was to learn from interacting with users on Twitter and offer playful responses. Within a day of corresponding with internet trolls, however, the friendly Tay was spouting racial epithets and other inflammatory rhetoric and had to be hastily shut down by its creators. It’s easy to blame the users who corrupted Tay, but the humans who created the bot were just as culpable. Had they approached Tay’s development the way they would raise a child, they would have taken steps that prevented the corruption, or at least made it much harder.
Expose Them To The Right Information
All parents intuitively know that not all training data is good data. Raising a wholesome, well-balanced child requires careful curation and adaptation of the information they receive. No parent would dare let the darkest corners of the internet educate their child about the Holocaust or 9/11. Just as you wouldn’t expose your 5-year-old to that kind of data without supervision, so must you protect your machine-learning algorithm. Anticipate harmful information before it becomes an issue, and adopt a strategy for handling it before it’s too late. For algorithms, that means, at a minimum, validating your training data and telling the algorithm how to identify and deal with outliers, as well as how to handle missing values.
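To make that concrete, here is a minimal sketch of what such a hygiene pass might look like in Python, assuming a tabular training set in a pandas DataFrame. The validate_training_data helper, its median imputation, and its IQR-based outlier rule are illustrative choices, not a prescribed recipe; your own data will dictate the right thresholds and treatments.

```python
import numpy as np
import pandas as pd

def validate_training_data(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Basic hygiene pass over a training set: impute missing values, flag outliers."""
    cleaned = df.copy()
    for col in numeric_cols:
        # Handle missing data: fill gaps with the column median instead of silently dropping rows.
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())

        # Identify outliers with a simple interquartile-range rule; downstream code can
        # inspect, cap, or drop the flagged rows before training.
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        cleaned[f"{col}_is_outlier"] = ~cleaned[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return cleaned

# Toy example: one missing age and one implausible one.
raw = pd.DataFrame({"age": [25, 31, np.nan, 27, 240]})
print(validate_training_data(raw, ["age"]))
```

None of this is sophisticated, and that is the point: even a few lines of deliberate curation beat handing the algorithm whatever the internet serves up.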
Teach Them Right From Wrong
Wouldn’t it be wonderful if our kids learned right from wrong by themselves? Much as we want our kids to avoid inheriting our misconceptions, giving them no moral guidance and insisting they develop their moral compass entirely from their own experience is a recipe for disaster. It would doom them to stumble from one painful mistake to the next before arriving, if they ever do, at a moral code that lets them thrive in society. Instead, we take a shortcut: we teach them rules and morals to live by (for example, don’t lie, steal, or kill) that override the unacceptable inputs and actions they may pick up on their own.
The same holds true for machine-learning models. Designers should implement business rules to overrule inputs and override actions that could lead to a harmful outcome, such as broadcasting racist or sexist content or making any other catastrophically bad decision. What we decide to broadcast to the world must be carefully curated. You wouldn’t put your child on the world stage and let them broadcast freely to millions of people without first vetting what they would say. So why would you risk tarnishing your brand’s reputation by unleashing your AI in such a manner?
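As a thought experiment, here is a toy sketch in Python of what such a guardrail layer might look like: a hand-written rule set that gets the final word over whatever the model proposes. The pattern list, the fallback response, and the apply_business_rules function are all hypothetical placeholders; a real deployment would source its rules from legal, brand, and ethics stakeholders and test them far more rigorously.

```python
import re

# Hypothetical business rules: patterns the published output must never match.
BLOCKED_PATTERNS = [
    re.compile(r"\bexample_slur\b", re.IGNORECASE),              # stand-in for a vetted blocklist
    re.compile(r"\b(hate|attack)\s+all\s+\w+", re.IGNORECASE),   # stand-in for a harmful-intent rule
]

FALLBACK_RESPONSE = "Sorry, I'd rather not comment on that."

def apply_business_rules(model_output: str) -> str:
    """Publish the model's output only if it passes every rule; otherwise override it."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return FALLBACK_RESPONSE  # the rule, not the model, has the final word
    return model_output

# The learned model proposes; the hand-written moral code disposes.
print(apply_business_rules("Have a great day!"))   # passes through unchanged
print(apply_business_rules("I hate all humans"))   # overridden by the fallback
```

The point is not the specific regexes, which are deliberately crude here, but the architecture: the model never speaks directly to the world, and a layer you fully control always stands between its raw output and your brand.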
Parenting Is For Life
Much like parenting, training a machine-learning model is never done. There will always be a need for better data, ongoing guidance, and supervision to validate predictions (no, you can’t stay home because it’s your birthday). As 2018 comes to a close, I, like the rest of my colleagues at Forrester, am looking forward to spending some time relaxing (make that parenting) over the holidays. As I pivot my attention from algorithmic brain-children to human ones, I am constantly reminded that raising a virtuous, ethical, and well-balanced individual, just like training a machine-learning model, requires tremendous effort and forethought.
Happy holidays!
(With many thanks to Jeremy Vale, who coauthored this post.)