An Executive's Guide to Demystifying AI and Machine Learning
Few senior executives have received formal data science training, so the inner workings of AI and Machine Learning might seem mysterious. But successful AI solutions may be neither as mysterious nor as easy to achieve as you might expect.
Many organizations are looking to implement Artificial Intelligence (AI) solutions, yet most C-level executives are not data scientists with significant AI experience. Although AI is a well-established concept, only during the last decade has AI made the advances that have brought it within the grasp of thousands of organizations. Today, when we speak to our mobile device and it replies, when we receive an individualized recommendation from our streaming service or online shopping portal, or when a radiologist reviews an image to determine a diagnosis, we are experiencing AI-enabled solutions.
AI-enhanced navigation apps were science fiction 30 years ago; now they are a centerpiece of our lives. What might be next?
- In another decade or two, will robots be as ubiquitous as today’s nav apps and voice assistants?
- Will our vehicles be self-driving for some trips, with augmented piloting available for other trips?
- Will our faces or our retinas become our universal ID?
- Will understanding of language extend to simultaneous translation of natural conversations during video conferences?
Over the last four years, we have helped organizations implement hundreds of AI solutions. During these engagements, we have observed that while many senior executives are enthusiastic about what AI can help them accomplish, few can spare the time to master the fundamentals of how AI works. Demystifying AI can help senior execs make better decisions about the strategies, solutions, and resources that underpin their investments in AI initiatives.
The following glossary is geared toward non-practitioners of data science who want an introduction to AI, Machine Learning, and the foundational terminology and processes that characterize many AI solutions.
Strategic Implications of AI
Before we begin, it’s useful to address a question that typically arises when balancing organizational priorities: why do we need AI?
AI is useful to the extent that it advances an organization’s strategy. Creating a successful AI culture may entail a transformation of many processes related to systems of record and systems of intelligence. Such a transformation often requires that a company reorganize itself around its data while thoughtfully considering the strategies, business architecture, data hygiene, and potential impact of its AI objectives and the use cases tied to success. In an AI world, data is the fuel that powers every success.
Upon embarking on an AI initiative, it’s beneficial to begin with the end in mind. In addition to Michael Porter’s classic business strategies that most organizations pursue – cost leadership, differentiation, and focus – there are additional organizational strategies that AI solutions are advancing:
- Collective Intelligence
- Data-driven decision culture
- Lower costs of predictions
Of the newer strategies enabled by AI, the transformational objectives of collective intelligence and adoption of a data-driven decision culture are most frequently cited by senior executives as the intentional consequences of the AI-powered solutions they seek to implement. Cultural transformation often rides into an organization on the horseback of a successful AI use case. Many use cases can create value, but the art of the possible for AI also requires the practicality of engineering and the experienced scrutiny of the data scientist.
Each of the concepts explained in the following glossary will be encountered by senior execs along their journey to AI success. Let’s start by defining AI, then consider a few technical and data science aspects as we go deeper.
* * *
Artificial Intelligence (AI) is a broad umbrella term that encompasses multiple disciplines and technologies, including some of the terms listed below. MIT’s Computer Science AI Laboratory defines AI, at its simplest, as “machines acting intelligently.”
The Harvard Business Review defines AI as “... the science and engineering of making intelligent machines. This includes intelligence that is programmed and rules-based, as well as more advanced techniques such as machine learning, deep learning, and neural networks.” The term itself was coined in 1956 by the founding fathers of AI, a group of academic researchers and computer scientists, as a result of the Dartmouth Summer Research Project on Artificial Intelligence organized by John McCarthy and Marvin Minsky. McCarthy subsequently defined AI as “the science and engineering of making intelligent machines,” without reference to the advanced techniques that had not yet been conceived in 1956.
Modern uses of artificial intelligence include many tasks that ordinarily require human perception: seeing, listening, visualizing, interpreting, sensing, and responding according to logical rules that are derived from observations and interactions with data sources. These use cases often feature computer vision, language processing, and high-speed interactive transactions that can be automated.
Machine Learning (ML) is the means of achieving AI without complex programming, rules, and human constructed decision trees. Instead, with machine learning, data—often in very large amounts—is fed into an algorithm so the algorithm can train itself and learn.
Machine learning most frequently takes one of three forms: supervised learning, unsupervised learning, and reinforcement learning.
When people refer to ML, they are often referring to supervised machine learning. Supervised ML entails supplying the ML algorithms with data—usually lots of it. Supervised ML teaches the machine by giving it information on the parameters of the desired categories and letting the algorithms learn how to classify new examples accordingly. In this context, “supervised” means that this type of ML requires plenty of human input and monitoring, especially in terms of the governance of the datasets that power its algorithms. Data scientists refer to supervised ML as the task of inferring a classification or regression from labeled training data. Labeled data examples might include credit card data tagged by card number or transaction type; images tagged as chickens, ducks, or ostriches; or any other source data that a human identifies for purposes of initial model training so that the resulting ML algorithm might then “learn” on its own. When you file an auto insurance claim using a photo taken from your mobile device, your insurer’s model may quickly compare your photo to a supervised ML model trained on photos of vehicles of the same make, model, and year as yours. The model can then compare the expected value of your repair claim to a repository of similar claims, immediately granting approval or escalating to a claims adjuster.
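To make the supervised pattern concrete, here is a minimal sketch: a nearest-neighbor classifier "trained" on a handful of labeled examples, which then classifies new, unlabeled measurements by similarity. The bird measurements and labels are hypothetical, chosen only to illustrate the learn-from-labeled-examples idea.

```python
# Labeled training data: (weight_kg, height_cm) -> species label.
# A human supplied these labels, which is what makes this "supervised."
training_data = [
    ((2.0, 30.0), "chicken"),
    ((3.5, 60.0), "duck"),
    ((100.0, 250.0), "ostrich"),
    ((110.0, 240.0), "ostrich"),
]

def classify(features):
    """Predict the label of the closest labeled training example (1-NN)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(training_data, key=lambda pair: distance(pair[0], features))
    return nearest[1]

print(classify((2.5, 35.0)))    # near the chicken example -> "chicken"
print(classify((95.0, 245.0)))  # near the ostrich examples -> "ostrich"
```

Real supervised models generalize far beyond lookup-by-similarity, but the workflow is the same: labeled data in, a predictive function out.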
In contrast, unsupervised ML does not require training data to learn. This makes it more complex and currently less common, though adoption of this technique is increasing. Unsupervised machine learning is already being used or is under development for applications such as image recognition, cancer detection, music composition, robot navigation, autonomous driving, and many other innovations. Data scientists refer to unsupervised ML as the task of drawing inferences from datasets consisting of input data without labeled responses. Unsupervised ML is useful for categorizing unknown data, dividing recognized entities from anomalies without labeling an initial training set. Examples of how unsupervised ML is used include identification of outliers, identification of suspicious financial services activities, or segmentation of similar buying behaviors across a population in order to provide recommendations for subsequent purchases.
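The grouping-without-labels idea can be sketched with a toy k-means routine: unlabeled transaction amounts are clustered on their own, separating typical values from outliers. The amounts and starting centers below are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around k centers; returns final centers and groups."""
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center's group.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# No labels anywhere: the algorithm discovers the structure itself.
amounts = [12.0, 15.0, 14.0, 980.0, 1010.0, 13.0]
centers, groups = kmeans_1d(amounts, centers=[0.0, 500.0])
print(centers)  # one center near typical amounts, one near the outliers
```

In a fraud-detection setting, the small cluster of unusually large amounts is exactly the kind of anomaly an analyst would want flagged for review.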
Reinforcement learning (RL) is the area of machine learning that deals with sequential decision-making. A key aspect of RL is that an agent – the RL model – learns “good,” or rewarded, behavior. This means that it modifies or acquires new behaviors and skills incrementally. Another important aspect of RL is that it uses trial-and-error experience (as opposed to dynamic programming, which assumes full knowledge of the environment a priori). Thus, the RL agent does not require complete knowledge or control of the environment; it only needs to be able to interact with the environment and collect information. Data scientists refer to reinforcement learning as the task of learning how agents ought to take sequences of actions in an environment in order to maximize cumulative rewards. Reinforcement learning is often used for online ad services, in-game modifications, complex routing challenges, and other highly dynamic interactive situations.
Modern Reinforcement learning often relies on a “digital twin” or simulator of the actual physical data environment. The advantage of using a simulator is that rare events (black swans) can be generated much more easily than by actually collecting physical data. For instance, the appearance of a cat jumping down from a tree while a vehicle is approaching would be a rare event unlikely to be captured via collection from autonomous vehicle data. By using simulated data, it becomes possible to build robust models to handle rare situations (e.g. avoiding running over the cat).
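The trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest RL algorithms. In this toy environment an agent starts at one end of a five-state corridor and earns a reward only at the far end; purely by interacting and collecting rewards, it learns to prefer moving right. The environment, rewards, and parameters are all illustrative.

```python
import random

random.seed(0)
n_states, actions = 5, ["left", "right"]
q = {(s, a): 0.0 for s in range(n_states) for a in actions}  # value table

def step(state, action):
    """Environment dynamics: move one cell; reward 1.0 upon reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else min(n_states - 1, state + 1)
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration
for episode in range(300):
    state = 0
    while state != n_states - 1:
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        nxt, reward = step(state, action)
        best_next = max(q[(nxt, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# The greedy policy after training should move right toward the reward.
policy = [max(actions, key=lambda a: q[(s, a)]) for s in range(n_states - 1)]
print(policy)
```

Note that the agent was never told which action is correct; the reward signal alone shaped its behavior, which is the defining trait of RL.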
Deep Learning is a subset of machine learning. Deep Learning can be thought of as a way to use a set of more complex algorithms to enable a machine to mimic the human brain. Deep Learning consists of hundreds, thousands, conceptually even millions of artificial neurons – called “units” – that perform calculations designed to provide a statistical evaluation of massive data volumes.
Each layer of artificial neurons builds up higher and higher levels of abstraction, culminating in the desired output (e.g., recognizing a dog in an image). Due to the enormous computational requirements of deep neural networks (see below), accelerated compute capabilities are often used to expedite neural network model training. Graphics Processing Units (GPUs) are frequently used to accelerate the training of models built with Deep Learning techniques.
A neural network is a series of algorithms that works to recognize the underlying relationships in a set of data through a process that mimics the way the human brain operates. However, the educational process, or training, of a neural network is unlike our own process. Unlike our brains, where any neuron can connect to any other neuron within a physical distance, artificial neural networks have separate layers, connections, and directions of data flow. This process within a neural network is known as ‘propagation.’
When training a neural network, training data is put into the first layer of the network, and individual neurons assign a weighting to the input — how correct or incorrect it is — based on the task being performed.
A typical neural network has anything from a few dozen to hundreds, thousands, or - conceptually - millions of artificial neurons called ‘units’ arranged in a series of layers, each of which connects to the layers on either side. Some of them, known as input units, are designed to receive various forms of information from data sources that the neural network will attempt to learn about, recognize, or otherwise process.
Other units sit on the opposite side of the network and signal how it responds to the information it's learned; those are known as output units. In between the input units and output units are one or more layers of hidden units, which, together, form the majority of the artificial brain.
Yes, it’s complicated, which is why neural networks require vast amounts of data and tremendous computational power to achieve meaningful confidence levels.
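The layered flow just described can be sketched as a tiny forward pass: two input units feed a hidden layer of two units, which feeds one output unit. The weights here are hypothetical and fixed; a real network learns its weights from training data, and real networks have vastly more units.

```python
import math

def sigmoid(x):
    """Squash a unit's weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """Propagate inputs through the hidden layer to the single output unit."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

hidden_weights = [[0.5, -0.6], [-0.4, 0.9]]  # one weight list per hidden unit
output_weights = [1.2, -0.8]

score = forward([1.0, 0.0], hidden_weights, output_weights)
print(round(score, 3))  # a value between 0 and 1
```

Training consists of nudging those weight numbers, over millions of examples, until the output scores line up with the desired answers, which is where the vast data and compute requirements come from.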
Robotics is a branch of AI engineering that involves the conception, design, manufacture, and operation of robots. This field frequently overlaps with electronics, computer science, mechatronics, nanotechnology and bioengineering. Science-fiction author Isaac Asimov is often given credit for being the first person to use the term robotics in a short story he composed in the 1940s. In the story, Asimov suggested three principles to guide the behavior of robots and smart machines. Asimov's Three Laws of Robotics, as they are called, have survived to the present:
- Robots must never harm human beings.
- Robots must follow instructions from humans without violating rule 1.
- Robots must protect themselves without violating the other rules.
Science fiction is frequently more interesting than reality. Most robots used today perform manual tasks that are highly routine and narrowly defined, such as in manufacturing or warehouse processes. Robots are now also used to clean and service public spaces of large office buildings and hotels. There are even robots that deliver room service.
However, Artificial General Intelligence (AGI) has not yet been achieved, and the concept of self-aware robots is still confined to Hollywood films and the novels of Asimov, Philip K. Dick, and Arthur C. Clarke. Robots are currently neither conscious nor inherently threatening to humans. That is, robots are non-threatening unless humans have specifically designed them to be, as with a number of policing and military robotic solutions that are certainly threatening to their targets.
Robotic Process Automation (RPA):
RPA is the use of software bots to automate repetitive, rules-based processes. The bots used in RPA are virtual robots with no physical form; they exist only as software. RPA is a form of business process automation that allows anyone to define a set of instructions for a robot, or ‘bot,’ to perform. RPA bots are capable of mimicking many human-computer interactions to carry out tasks error-free, at high volume and speed. There is disagreement in the field as to whether RPA should be categorized as a form of AI: MIT’s Computer Science AI Laboratory (the nursery of AI) does not include it, while the Harvard Business Review asserts that RPA qualifies under its definition of AI. When selecting tasks for RPA applications to address, organizations should consider whether those tasks touch core intellectual property and strategic assets, or whether the target area is a cost center or non-core function where friction and overhead costs can be reduced via RPA.
Explainable AI is an AI system whose outcomes or results can be understood in proper context by humans. This differs from the “black box” model of AI, in which humans cannot trace or understand how the AI arrived at a particular result. Explainable AI can be equated to ‘showing your work’ in a math problem. There is considerable debate today focused on ensuring that AI decision-making and machine learning don’t take place in a black box that cannot be examined and understood by human practitioners. These factors relate to issues of AI bias, overfitting, and model deterioration over time. Best practices in model management and data governance are helpful means of developing explainable AI systems.
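As a sketch of ‘showing your work,’ consider a simple linear scoring model that reports each feature’s contribution to its final score, so a human can see exactly why the model decided as it did. The feature names and weights are hypothetical; real explainability tooling handles far more complex models, but the goal is the same.

```python
# Hypothetical per-feature weights for a simple credit-style scoring model.
weights = {"income": 0.4, "debt": -0.6, "years_employed": 0.3}

def score_with_explanation(applicant):
    """Return a score plus a per-feature breakdown of how it was reached."""
    contributions = {name: weights[name] * applicant[name] for name in weights}
    return sum(contributions.values()), contributions

score, why = score_with_explanation(
    {"income": 5.0, "debt": 2.0, "years_employed": 4.0})

print(round(score, 2))  # the decision
for feature, contribution in sorted(why.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {contribution:+.2f}")  # the explanation
```

A reviewer can immediately see that, in this toy example, debt pulled the score down while income pulled it up, which is precisely what a black-box model cannot offer.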
Natural Language Processing (NLP):
Natural language processing (NLP) pertains directly to the growing number of human-to-machine voice interactions in both professional and personal contexts. NLP is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding. Voice-to-text applications, Siri, Baidu, Alexa, and many other voice assistants and chat bots use NLP to understand language, including context and slang. NLP has also shown promise in determining the emotional state of communications.
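One classic NLP technique, related to the emotional-state detection mentioned above, can be sketched in a few lines: bag-of-words sentiment scoring, which simply counts positive and negative words in a message. Production NLP systems use far richer models that handle context and slang; the word lists here are tiny and hypothetical.

```python
# Hypothetical sentiment lexicons; real lexicons contain thousands of entries.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "angry"}

def sentiment(text):
    """Label text 'positive', 'negative', or 'neutral' from word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great service"))       # positive
print(sentiment("terrible wait and bad support"))   # negative
```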
If NLP is largely about how machines hear (and respond to) humans, computer vision could be thought of as how machines “see”—not just humans, but potentially any image or even video stream. Computer vision is an interdisciplinary field that deals with how computers can learn to gain high-level understanding from digital images or videos.
Computer vision applications often seek to automate tasks that human vision can do. Unmanned aerial vehicles (drones), security applications, and quality assurance for manufacturing all use versions of this technique. Computer vision is likely to continue drawing attention not just for its productive applications, but also for potential risks such as AI bias and other concerns related to facial recognition, object detection, and autonomous vehicles.
Inference is the process of using a trained machine learning algorithm to make a prediction. In the human world, we receive training so that we can do a job or perform some other task, like cooking, where we apply what we have learned about the cooking process to the new ingredients at hand. In machine learning, models are trained on vast amounts of data using neural networks to formulate rules (algorithms) that reflect the training data. The step known as inference applies knowledge from a trained neural network model and uses it to infer a result based on new data that it encounters.
Based upon the accuracy of its inference performance over time, many ML models can then continually improve themselves by learning from the new data. This process differs from traditional data science methods; previously, a human was required to continually revise the algorithms using advanced mathematical modeling techniques and specialized algorithmic coding. In machine learning environments – particularly with reinforcement learning models - the model adjusts itself to improve performance, using each inference event as a learning moment.
Inference applied at scale is an exceptional creator of value for ML adopters – the models learn from each event, never forget, continually become more accurate at their predictions, and apply what they have learned collectively to more accurately execute their tasks in independent situations. Autonomous vehicle navigation and retail recommendation engines are two examples of inference working to generate guidance – predictions that recommend or automate actions - from a trained model to support decisions in complex scenarios.
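The division of labor between training and inference can be sketched with the simplest possible model, a least-squares line fit: the model is trained once on historical (input, outcome) pairs, and inference then applies the learned rule to inputs it has never seen. The data points are hypothetical.

```python
def train(xs, ys):
    """Training: least-squares fit of slope a and intercept b on labeled data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return a, mean_y - a * mean_x

def infer(model, x):
    """Inference: apply the trained model to data it has not seen before."""
    a, b = model
    return a * x + b

# Training phase: learn from past observations.
model = train([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.1, 7.9])
# Inference phase: predict the outcome for a new input.
print(round(infer(model, 5.0), 2))  # close to 9.9
```

Neural network inference follows the same pattern at enormously greater scale: the expensive learning happens once (or periodically), while predictions on new data are comparatively cheap and fast.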
Some AI solutions require a degree of autonomy from centralized models, typically because real-time performance of a service is required in order to provide a viable solution. Autonomous vehicles, for example, need to respond to an ever-changing driving environment immediately and correctly – each and every time. The success of autonomous vehicles will be tied to their ability to enhance the human driving experience rather than to put humans at greater risk. For this reason, models are trained centrally from multiple data sources and versions of guidance models are provided to each vehicle (the edge) that permit autonomous predictions made by the edge device (automobile). Rather than reporting back to the central model for guidance for each traffic decision, the edge AI model performs its predictions (stop, slow down, turn, accelerate, change lanes) independently of the centralized guidance model while a vehicle is in operation. At regular intervals, the inference results provided by each of the autonomous vehicles serve to update the centralized guidance model, which in turn aggregates and learns from each inference event from every vehicle – success/fail or new object scenario encountered - and then resupplies each autonomous vehicle with updated guidance to improve performance at the edge.
Likewise, edge AI solutions are now deployed in some surgical operating rooms to augment a surgeon’s view inside of living organs as surgery is performed. By enabling a 3-dimensional view of a patient’s heart, surgeons can improve patient outcomes and reduce damage to nearby tissue. As one might imagine, a patient has a great deal of interest in having an accurate model at the surgeon’s immediate disposal rather than waiting for a centralized model many miles and milliseconds away to generate a visual representation of a beating heart or a breathing lung.
Edge AI solutions require high-speed computational capability, typically in small form factors, in order to perform. Many chip makers, telecommunications carriers, and technology companies are actively working to provide the elements for edge AI solutions.
When we think of bias, we often think of synonyms such as discrimination, prejudice, unfairness, or injustice. But when data scientists refer to bias, they may be speaking about several different issues that impact model accuracy and variance.
AI bias can occur in several ways –
Model deterioration – deterioration in the forecasting accuracy of predictive models is a risk if models become stale. A consequence of model deterioration (aka model drift) is a drop-off in predictive performance when the model is applied against new conditions. To guard against model deterioration, organizations should monitor the frequency and quality of the data upon which models are trained.
Overfitting – a risk of having too little data upon which to train a model. Overfitting means that a model provides poor predictions of outcomes because of the limited dataset and patterns upon which it was trained. For example, if a dog facial recognition model is trained only on beagles, it will have difficulty recognizing Dobermans. Until more data pertaining to other breeds is supplied to the model, it will overfit its predictions to beagles. Generally, there is less overfitting when the volume and variety of data is large and representative of the universe of outcomes the model is intended to address.
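Overfitting can be sketched by contrasting a model that merely memorizes its narrow training set with one trained on broader, more representative data. The breeds, measurements, and decision rule below are hypothetical, chosen only to illustrate the beagle example above.

```python
# Tiny training set containing only beagles: (height_cm, ear_length) -> breed.
training = {(30.0, 10.0): "beagle", (33.0, 11.0): "beagle"}

def overfit_predict(features):
    """Memorizes its training data and knows nothing beyond it."""
    return training.get(features, "beagle")  # has only ever seen beagles

def general_predict(features):
    """Stand-in for a model trained on many breeds (hypothetical rule)."""
    height, _ = features
    return "beagle" if height < 45.0 else "doberman"

new_dog = (68.0, 12.0)  # a Doberman-sized dog, unseen during training
print(overfit_predict(new_dog))  # still says "beagle": overfitted
print(general_predict(new_dog))  # "doberman": generalizes
```

The first model scores perfectly on its own training data yet fails on anything new, which is the hallmark of overfitting.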
Model complexity – the more complex the model, the greater the risk of the model underperforming when applied to multi-dimensional, highly variable behaviors. For example, some predictive models used during the COVID pandemic have been criticized for imperfectly predicting the spread of the disease. Because these models are complex and significantly based upon variables of human behavior – mask wearing, compliance with CDC guidelines, estimates of asymptomatic carriers, availability of testing – their forecasting accuracy is more limited the farther into the future their forecasts are targeted. Like weather forecasts, complex models are considerably more accurate when applied to scenarios with less variance; hurricane forecasts are more accurate two days before landfall than two weeks before landfall because the variability of direction, speed, and intensity is more limited within the narrower timeframe.
Objectionable outcomes - as a result of the AI techniques used to develop models, there is a risk that the model will narrowly focus on the outcome it is trained to deliver based solely on its training data. In one example, if a job applicant selection model is trained on a dataset that only contains information about the attributes of current employees, it will prioritize a selection of job applicants that are most like current employees without consideration of, or sensitivity to, potential discrimination factors such as race, gender, age, disability, or other attributes. In another example, facial recognition models trained only on one ethnicity will return poor results for other ethnicities. To simplify, just as textbooks may express the bias of their authors based on what information they include as well as what they may omit, datasets also have authors since the data used to train models is collected by people. Bias can thus be introduced via the choice of data used to train the model.
AI bias has also been discussed in the context of voice recognition assistants such as Siri or Alexa, whose default female voices raise questions about how gender and authority roles are represented. However, the choice of a voice is an application decision made by human software developers, not by biased ML or Deep Learning models. Generally, AI bias should be addressed both at the level of appropriate data governance – the data upon which the models are trained should reflect the tactical, organizational, or social outcomes that the AI solution will serve – and via continual supervision by humans with the responsibility to revise the frameworks upon which the models are built.
AIOps specifically refers to IT operations. AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination. AIOps is about using AI and data to automatically identify or handle issues that would have once depended on a human to perform manually. AIOps is a specific AI-driven category of the broader automation trend in IT, but not all forms of automation would be considered AIOps. Use cases for AIOps include automated network monitoring for reliability, security, and management; diagnostics and predictive detection of likely network failure; and a number of cyber security use cases that assure network integrity.
MLOps is a mashup of “machine learning” and “operations.” MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage the production ML (or Deep Learning) lifecycle. Similar to a DevOps approach, MLOps looks to increase automation and improve the quality of production ML while also focusing on business, governance, and regulatory requirements.
MLOps started as a set of best practices but it is quickly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
For organizations seeking to place their ML or Deep Learning models into working solutions – a process sometimes referred to as “productionalization” – a repeatable, agile, well-governed process for model development and refinement is crucial to scaling AI success. To accelerate time-to-value of your AI initiatives, think about MLOps in the context of how your organization currently develops, refines, deploys, scales, and audits your ML models and the business impact they provide.
* * *
These are just a few of the concepts that business leaders and strategists will encounter as they develop, deploy, and scale solutions that leverage AI effectively. Meanwhile, significant innovation and important AI research advances are being made with startling speed. New companies have emerged that provide specialized industry solutions, functional applications, useful data science tools, and powerful computational technologies to enable greater efficiency for data scientists and AI solution developers.
But AI excellence is not only a data science or technology challenge – success at scale requires thoughtful use case prioritization and continual collaboration between business leadership, data science, IT, software development, and security. Although there are many sources to turn to for data science and technology information, the strategic considerations, processes and resources required for significant AI achievements are intrinsic to success and deserving of greater dialogue.
AI success has a thousand parents, but immediate success with AI solutions is not assured. Experience teaches us that committed and fully engaged senior executive sponsorship, combined with an appreciation of the solution outcomes, timelines, resources, and data governance best practices are required for success and strategic value creation.
For more information about AI strategy, research, testing, and practical roadmaps for success, access WWT resources or request a WWT briefing to discuss your AI objectives.
# # #
The author would like to thank WWT chief data scientist Jason Lu, PhD; WWT data scientist Patrick McDermott, PhD; WWT managing director Brian Vaughan, PhD; and WWT analyst Tre Moore for their contributions to this article.
1. Porter, Michael E. (1980). Competitive Strategy. Free Press. ISBN 0-684-8418-7
2. Leimeister, Jan Marco (2010) "Collective Intelligence," Business & Information Systems Engineering: Vol. 2: Iss. 4, 245-248. Available at: https://aisel.aisnet.org/bise/vol2/iss4/6
3. Salge, Christoph and Polani, Daniel (2017) “Empowerment as Replacement for the Three Laws of Robotics.” Frontiers in Robotics and AI: Vol. 4. Available at: https://www.frontiersin.org/articles/10.3389/frobt.2017.00025
4. Kozyrkov, Cassie (2019) Google Head of Decision Intelligence. “What is AI Bias?” in Towards Data Science Available at: https://towardsdatascience.com/what-is-ai-bias-6606a3bcb814.