Top 6 Data Science Trends in 2023
Some of 2022's greatest technology advancements were driven by data science, and the industry has evolved at such a rapid pace that it can be hard to keep up. To help you navigate it, we have compiled the key data science trends of 2022 from a practitioner's perspective: why they are trending and how they may continue in 2023.
In 2023, we anticipate an increased need for AIOps and MLOps so that companies can remain competitive while having access to smart, real-time data and analytics. As conventional cloud computing becomes increasingly underequipped to manage copious amounts of data, edge computing will become crucial. Generative models, meanwhile, will allow companies to produce fresh content across industries. At the same time, companies are facing increasing pressure to ensure that their data is safe and secure and that its use mitigates harm. Data governance, ethical AI and sustainable AI will continue to be central themes that companies prioritize throughout the year.
The first noteworthy trend of 2022 was Artificial Intelligence for IT Operations (AIOps), a term coined by Gartner in 2016 that refers to applying AI and related technologies, such as machine learning (ML) and natural language processing (NLP), to traditional IT team activities and tasks. AIOps is an umbrella term encompassing both domain-centric solutions, such as log, application and network monitoring, and domain-agnostic solutions, which operate across clouds and on-premises infrastructure. While the practice is not new, many businesses continued to lean on it in 2022 because of its recently advanced analytics capabilities.
The year 2022 shone a light on the capabilities of AIOps, such as allowing companies to automatically spot and react to issues in real time and proactively alerting them to potential issues. We therefore foresee strong demand for AIOps throughout 2023. More generally, we foresee continued demand for automation over the next few years, so that businesses can become smarter and more predictive. An industry study points the same way: 70 percent of respondents said their organizations were at least piloting automation technologies in 2022, up from 66 percent in 2020 and 57 percent in 2018. 2023 will be a year of maturity for AIOps.
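To make "automatically spotting issues in real time" concrete, here is a minimal sketch of one common approach: flagging a metric value that deviates sharply from its recent history. This is an illustration under our own assumptions, not how any particular AIOps product works; the function name, the window size, the threshold and the sample latencies are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate from the trailing-window mean
    by more than `threshold` standard deviations (a rolling z-score)."""
    history = deque(maxlen=window)  # keeps only the most recent `window` values
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows, where sigma is zero and the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 450, 101, 99]
spikes = detect_anomalies(latencies_ms)
print(spikes)  # index of the 450 ms spike
```

A real deployment would also need to handle seasonality and keep flagged values from inflating the baseline, which this sketch does not attempt.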
In 2022, AI paved the way for many generative models to create content such as blogs, poetry and artwork. While Generative AI is not new, the recent convergence of several trends has made it possible to productize generative models and bring them to everyday applications. Even domain-oriented use cases gained attention – from converting x-ray or CT scans into photo-realistic images in healthcare to converting satellite images into map views in the transportation industry.
The trend started in 2014 with the introduction of Generative Adversarial Networks (GANs), which could generate face images from random noise. Along with Variational Autoencoders (VAEs), they ushered in the trend of generating deepfakes. In 2017, we saw the introduction of transformers, the deep learning architecture behind large language models (LLMs) such as GPT-3. In 2021, OpenAI released the text-to-image generator DALL-E; together with its 2022 successor, DALL-E 2, it popularized techniques such as CLIP and diffusion for generating high-resolution images with stunning details. Finally, at the end of 2022, we were introduced to ChatGPT, which brought the power of an LLM to a conversational chatbot.
We anticipate increased mainstream adoption of Generative AI, not only across small businesses but also within Big Tech companies (e.g., Microsoft is already actively investing in OpenAI with an eye to its own search products and to implementing it in its Office suite). Generative AI also enables synthetic data generation, which could improve model performance and save time and costs in ML deployments. However, generative technology also brings ethical complications, such as deepfakes being used to spread misinformation.
We're shifting from an era of generating blurry, fingernail-sized black-and-white images of faces to a time where lone amateurs can copy an artist's style and companies are selling AI-generated prints that are explicit knock-offs of living designers. Questions of legality and ethics will inevitably become more pressing and more important. In the coming years, organizations will need to ensure the responsible and ethical use of these algorithms.
Humankind has always been inspired by nature. Over the years, our understanding has grown from the laws governing the physical world to the inner workings of the human genome. We are now closer to putting that understanding to work in AI, which opens up numerous new industrial applications.
AI methods like deep neural networks learn from large datasets containing examples of real-world behavior. For vision-based applications, where such data can be gathered, these methods work well, but they fail to learn complex behaviors for which adequate data is lacking. Physics-informed AI refers to a new set of AI methods that attempt to infuse models with physical knowledge of our world, so that known laws can compensate for scarce data. These approaches bring AI to scientific problems at a new scale and extend the range of possible industrial applications into the metaverse. The digital twin is one such tool: a virtual replica of a real environment, updated in real time. Companies are creating "digital twins" of warehouses and factories, for example, to optimize their layout and logistics. Companies such as NVIDIA have already been working on frameworks to support this kind of work.
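One way to picture "infusing physical knowledge" is a loss function that scores a model both on how well it fits the data and on how well it obeys a known law. The sketch below is a toy illustration under our own assumptions, not a description of any specific framework: the law is free fall (the second derivative of height equals minus g), the candidate models are plain Python functions rather than neural networks, and all names and numbers are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def second_derivative(f, t, h=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def physics_informed_loss(model, observations, collocation_times, weight=1.0):
    """Data misfit plus a penalty for violating y''(t) = -G."""
    # Data term: mean squared error on the (few) measured points.
    data_loss = sum((model(t) - y) ** 2 for t, y in observations) / len(observations)
    # Physics term: mean squared residual of the law at unlabeled time points.
    physics_loss = sum(
        (second_derivative(model, t) + G) ** 2 for t in collocation_times
    ) / len(collocation_times)
    return data_loss + weight * physics_loss


# Only two measurements of a falling object: far too few to pick a curve.
observations = [(0.0, 10.0), (1.0, 5.095)]
collocation_times = [0.25, 0.5, 0.75]


def parabola(t):
    return 10.0 - 0.5 * G * t * t  # obeys the law


def straight_line(t):
    return 10.0 - 4.905 * t  # fits both points but ignores the law


parabola_loss = physics_informed_loss(parabola, observations, collocation_times)
line_loss = physics_informed_loss(straight_line, observations, collocation_times)
print(parabola_loss < line_loss)
```

Both candidates pass through the two measurements, so a purely data-driven loss cannot tell them apart; the physics term is what separates them. In practice the model would be a trainable network and the loss would be minimized by gradient descent.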
Composite AI, also known as Multidisciplinary AI, is defined as the combined application of different AI techniques to improve the efficiency of learning and broaden the level of knowledge representations. Organizations face more complex problems than ever before, involving vast quantities and multiple types of data. Solving them requires a combination of AI solutions, ranging from traditional machine learning approaches, NLP and graph techniques to traditional rule-based approaches. An example is using predictive AI, forecasting techniques and conversational AI in the retail industry to identify consumer preferences and offer incentives for customers to make additional purchases during the return process.
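As a rough sketch of the retail example, the snippet below combines a hand-written business rule with a very simple predictive step. Everything in it is hypothetical: the function names, the eligibility policy, the discount wording and the "most frequent category" stand-in for a real preference model are assumptions made for illustration only.

```python
from collections import Counter


def predict_preferred_category(purchase_history):
    """Toy predictive component: the customer's most frequent past category."""
    return Counter(purchase_history).most_common(1)[0][0]


def is_eligible_for_incentive(return_value, returns_this_year):
    """Toy rule-based component: a hand-written business policy."""
    return return_value >= 20.0 and returns_this_year <= 3


def recommend_during_return(purchase_history, return_value, returns_this_year):
    """Combine both components: rules gate the offer, prediction targets it."""
    if not is_eligible_for_incentive(return_value, returns_this_year):
        return None
    category = predict_preferred_category(purchase_history)
    return f"10% off your next purchase in {category}"


offer = recommend_during_return(["shoes", "jackets", "shoes"], 45.0, 1)
no_offer = recommend_during_return(["shoes", "jackets", "shoes"], 45.0, 5)
print(offer)
print(no_offer)
```

The design point is the division of labor: the rule encodes policy that must hold regardless of data, while the learned or statistical part handles what is hard to write down by hand.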
According to Gartner, by 2025, 70 percent of organizations will shift their focus from big data to "small and wide data" (i.e., using fewer and more diverse data sources to drive insights). Composite AI can help by combining human expertise with approaches like few-shot learning and synthetic data generation. This will not only embed more reasoning and intelligence but will also expand the scope and quality of AI applications.
Edge computing occurs close to where data is collected, enabling real-time decision-making based on data collected from internet-connected sensors on factory floors, transport networks, retail outlets and remote locations. By 2025, Gartner predicts that more than 50 percent of enterprise-managed data will be created and processed outside the data center or cloud.
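The idea of deciding close to where data is collected can be sketched in a few lines. In this hypothetical example, an edge device checks sensor readings against a limit, acts locally, and prepares only a compact summary for upstream systems instead of the raw stream; the function name, the temperature limit and the readings are all assumptions for illustration.

```python
from statistics import mean


def process_on_edge(readings, limit):
    """Make an immediate local decision and build a compact summary
    that could be forwarded upstream in place of the raw readings."""
    alerts = [r for r in readings if r > limit]
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "alerts": len(alerts),
    }
    shut_down = bool(alerts)  # decided on the device, with no round trip to the cloud
    return shut_down, summary


# Hypothetical machine temperatures in degrees Celsius.
temperatures = [70.0, 71.0, 69.0, 95.0]
shut_down, summary = process_on_edge(temperatures, limit=90.0)
print(shut_down, summary)
```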
We are seeing continuous growth in edge analytics, especially in use cases where reduced latency is required to act upon real-time data. To facilitate this, edge is now getting "foggy." Fog computing, as it's known, is "edge to the edge": it extends the edge with more computation, storage and communication, facilitated by "micro" data centers. As Meta continues to invest in network infrastructure for the metaverse, it has seen how edge computing can take pressure off the public cloud and thus provide a faster customer experience.
In the next two to five years, chips and operating systems will be optimized for the edge, and companies will adopt ultralow-latency 5G networks. Both are core components of the Industrial Internet of Things (IIoT). This will help accelerate the next generation of automation and will open up a variety of new AI use cases.
2022 was also the year of data governance, or the data policies that apply to how data is gathered, stored, processed and disposed of, and by whom. Similarly, Ethical AI ensures that AI follows well-defined ethical guidelines regarding privacy, non-discrimination, individual rights and non-manipulation. Mitigating harm is also a strong goal of sustainable AI, a movement to drive change in the development, implementation and governance of AI products toward reducing carbon footprints and minimizing the impact on the environment.
The share of executives who consider AI ethics important grew from 50 percent in 2018 to 75 percent in 2021, and in 2023, ethical AI will be at the forefront in helping companies operate more efficiently while preventing misuse and mitigating potential harm.
As the need for companies to report on sustainability increases in 2023, they will need to better understand the ramifications of the software they use. Last year showed a growing need for data protection and privacy, and for organizations to become more sustainable while ensuring their data is trustworthy and not misused. Effective data governance will need to be implemented in 2023. Data governance will also be driven by more governments introducing laws to regulate the use of personal and other types of data.
A related topic that will gain attention in the next five to 10 years is "AI TRiSM," a framework that supports AI model governance, trustworthiness, fairness, reliability, robustness, efficacy and privacy. By 2026, organizations that operationalize AI transparency, trust and security will see their AI models achieve a 50 percent improvement in adoption, business goals and user acceptance.
Finally, an overarching trend that shook the data science space in 2022 was Machine Learning Operations (MLOps): a set of best practices for the processes, people and technologies needed to deploy and maintain machine learning models reliably and efficiently. Model management has become a center of gravity for ML because it ensures that a model is set up for success, and more companies are looking to expand their MLOps skill sets. Shifting from model-driven to data-driven MLOps has been a central focus for organizations seeking to maximize the business value of AI/ML models by using data-improving techniques (e.g., consistent labeling, ground truthing, spot-checking and increasing the size of datasets).
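One of the data-improving techniques mentioned above, checking for consistent labeling, lends itself to a short sketch. The example below groups labeled records by their normalized text and reports any item that received more than one distinct label. It is a minimal illustration under our own assumptions; the function name, the normalization (trimming and lowercasing) and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def find_inconsistent_labels(records):
    """Return items that were given more than one distinct label,
    mapped to how often each label was used."""
    labels_by_item = defaultdict(Counter)
    for text, label in records:
        # Normalize so trivially different copies of the same text are grouped.
        labels_by_item[text.strip().lower()][label] += 1
    return {
        item: dict(counts)
        for item, counts in labels_by_item.items()
        if len(counts) > 1
    }


# Hypothetical sentiment-labeling data with one conflicting pair.
records = [
    ("Great product", "positive"),
    ("great product ", "negative"),
    ("Too slow", "negative"),
    ("too slow", "negative"),
]
conflicts = find_inconsistent_labels(records)
print(conflicts)
```

Items flagged this way are natural candidates for spot-checking by a human reviewer before the dataset is used for training.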
As the data science space pivots from understanding data science to scaling data science, we forecast more businesses will center around model management and scalability. Going forward, we anticipate an increased focus on ModelOps, an expansion of MLOps focusing on the operationalization and governance of all AI and decision models.