Partner POV | Data Sovereignty and AI: Why You Need Distributed Infrastructure
In this article
- Understanding data sovereignty
- How data sovereignty influences AI infrastructure decisions
- Country and regional-specific considerations
- Industry-specific drivers
- Infrastructure strategies that support data sovereignty
- Distributed infrastructure use cases for data sovereignty
- Implementing data sovereignty solutions
Article written by Jon Lin, Chief Business Officer, Equinix.
The volume of data that enterprises need to manage continues to grow exponentially, while regulations around data locality, residency and sovereignty simultaneously continue to multiply across jurisdictions worldwide. Companies must be vigilant about keeping up with rapidly evolving national and regional policies around who can access specific data; how it's collected, processed and stored; and where it's accessed from or transferred to. It's getting more complicated every day.
Effective data governance is essential to ensuring AI transparency and compliance with emerging regulations. This means considering what's required to access the data and knowing exactly which path data will follow to its destination. Before they can establish data governance policies, businesses need to understand local laws and how those laws impact where they can generate, collect and store data. In McKinsey's findings from their State of AI survey, 70% of respondents said they've experienced difficulties with data, including defining processes for data governance.
Further complicating data management is the massive amount of data that companies are sourcing to train their AI models. Not only do they need to ensure their data doesn't get used by the wrong AI models, but they must also ensure their models use the right data in the right places. In order to meet global data sovereignty laws and regulations, companies must also carefully consider where they'll store their AI data. Distributed infrastructure and a future-proof AI data strategy can help companies navigate and manage the complexities of data sovereignty in an AI-driven world.
Understanding data sovereignty
Data sovereignty is the principle that data collected or stored in a specific locality, country or region is subject to that governing entity's laws and regulations. Many jurisdictions have created and are enforcing rules around how data is accessed, stored, processed and moved within their borders.
Data stored within specific borders is governed by that jurisdiction's legal framework, regardless of the company's headquarters location or ownership. For instance, a company based in California that gathers data from individuals or businesses in multiple countries must follow each country's data sovereignty and localization laws, even though the company is in the U.S.
Some laws set conditions around cross-border transfers, while others prohibit them altogether. For instance, in some jurisdictions, companies need to demonstrate a legal requirement to move the data, retain a local copy of the data for compliance reasons, or both. Other regulations govern whether companies can access data stored in a region, generate insights and then export those insights to HQ for further analysis or model training.
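The transfer conditions described above can be modeled as a small rule table. The following is a minimal sketch in Python, not a compliance tool. The jurisdiction names (`region_a`, `region_b`, `region_c`), the `TransferRule` structure and the `may_transfer` function are all hypothetical, and the rules are placeholders that do not represent any real law.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMode(Enum):
    ALLOWED = "allowed"          # no special conditions
    CONDITIONAL = "conditional"  # allowed only if conditions are met
    PROHIBITED = "prohibited"    # data may not leave the jurisdiction


@dataclass(frozen=True)
class TransferRule:
    mode: TransferMode
    needs_legal_basis: bool = False  # must demonstrate a legal requirement to move the data
    needs_local_copy: bool = False   # must retain a local copy for compliance


# Hypothetical rule table; real rules must come from legal review.
RULES = {
    "region_a": TransferRule(TransferMode.ALLOWED),
    "region_b": TransferRule(
        TransferMode.CONDITIONAL, needs_legal_basis=True, needs_local_copy=True
    ),
    "region_c": TransferRule(TransferMode.PROHIBITED),
}


def may_transfer(origin: str, has_legal_basis: bool, keeps_local_copy: bool) -> bool:
    """Return True if data collected in `origin` may be moved across the border."""
    rule = RULES.get(origin)
    if rule is None:
        # Unknown jurisdiction: fail closed until it has been reviewed.
        return False
    if rule.mode is TransferMode.PROHIBITED:
        return False
    if rule.needs_legal_basis and not has_legal_basis:
        return False
    if rule.needs_local_copy and not keeps_local_copy:
        return False
    return True
```

Failing closed for unknown jurisdictions is a design choice in this sketch: a transfer is blocked until someone has explicitly recorded a rule for the origin.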
Data residency and data localization are subsets of data sovereignty, each covering a narrower aspect of data management. Data residency refers to the physical (geographic) location where a business stores its data. Businesses may select a specific region for regulatory compliance, security or performance optimization. Data localization refers to legal requirements that certain data be stored or processed within a jurisdiction's borders. Many industries, including finance, healthcare and government, may be required to store data in specific jurisdictions to comply with local laws.
It's important to note that storing data in a particular country does not necessarily mean it's governed only by that country's laws. Companies may still be subject to foreign legal obligations based on their country of incorporation or contractual agreements. Further, a governing entity can enforce strict data security, access control and localization requirements, which could include controlling access to data by users or companies based outside its borders. Some laws also grant government agencies access to data without the owner's consent.
Companies that make data sovereignty compliance an essential part of their AI strategies are better positioned to prioritize continuous monitoring for new or changing laws.
How data sovereignty influences AI infrastructure decisions
Companies must adapt their data management practices for compliance and ensure they have the right AI infrastructure in the right locations. Understanding your entire data estate (what data you own, where it came from and how it's structured) can reveal the privacy or regulatory risk associated with that data.
Then there's the matter of where to store data. While choosing a public cloud provider may seem convenient, it often means relinquishing some level of control, such as knowing exactly where the data is stored. Importantly, companies can't rely on cloud providers to enforce data sovereignty requirements on their behalf. Knowing the exact geographic location of the infrastructure in question is crucial to ensure it aligns with relevant data sovereignty rules. Expanding beyond a single cloud provider or incorporating private infrastructure may make sense to avoid vendor lock-in and data-related costs.
Consider what would happen if your cloud provider needed to fail over from a cloud in London to another in Amsterdam. Would the network path go directly from the U.K. to the Netherlands, or would it pass through other countries, bringing additional data sovereignty regulations into play? If the data you're transmitting is highly regulated, it's especially important to have visibility into the underlying infrastructure, and you can typically only get that level of visibility if you own the infrastructure.
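The failover question above amounts to checking each hop of a network path against the jurisdictions you have approved for that data. Here is a minimal sketch in Python, assuming you can already obtain the list of countries a path crosses (for example, from your provider or your own network documentation). The function name `disallowed_hops` and the intermediate countries in the example are hypothetical.

```python
def disallowed_hops(path, approved):
    """Return the countries on `path` that are not in `approved`.

    The result preserves path order and lists each country once.
    """
    seen = set()
    result = []
    for country in path:
        if country not in approved and country not in seen:
            seen.add(country)
            result.append(country)
    return result


# Jurisdictions approved for this data set (illustrative).
approved = {"GB", "NL"}

# A direct London-to-Amsterdam path and a hypothetical indirect one.
direct_path = ["GB", "NL"]
indirect_path = ["GB", "FR", "BE", "NL"]

print(disallowed_hops(direct_path, approved))
print(disallowed_hops(indirect_path, approved))
```

An empty result means every hop stays within approved jurisdictions. Any country in the result is a place where additional regulations may apply and that warrants review before the path is used for regulated data.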
While much of the responsibility for complying with data sovereignty regulations falls to the company that owns the data, cloud service providers and storage solution vendors can help. They can be transparent by providing details about where specific data is stored and disclosing how they manage data transfer paths in the case of cloud failovers or other outages.
To enable interconnected, distributed AI infrastructure, it's essential to establish secure connectivity with the ability to rapidly connect to (or disconnect from) many different services and locations and respond to any changes or additions to the regulatory landscape. Doing so allows companies to access data quickly, transfer data securely and exchange data seamlessly with ecosystem participants.
It's crucial to have complete transparency into what your distributed infrastructure looks like and how it's all connected. You need to be able to attest to how your data is being handled all the way through, from collection to storage to processing to transfer. Understanding and documenting this across the entire value chain will set you up for maximum compliance with data sovereignty regulations. You can't afford not to be thorough when performing due diligence on your distributed infrastructure. Otherwise, you risk incurring significant penalties and damaging your reputation.
Country and regional-specific considerations
In addition to local or national data sovereignty laws, certain regions have issued regulations that apply to more than one country. Take the EU's General Data Protection Regulation (GDPR). It requires any business storing or processing the personal data of EU residents to follow strict privacy rules, regardless of whether that data is stored inside or outside the EU. If a U.S.-based company stores EU residents' personal data on servers in the U.S., it must comply with both U.S. law and GDPR.
Another EU-driven regulation, the European Union Artificial Intelligence Act, sets new data governance requirements around datasets used to train models, technical redundancy systems and technical solutions to address AI-specific vulnerabilities.
The Cyberspace Administration of China introduced the Provisions on Promoting and Regulating the Cross-border Flow of Data, easing the stringent requirements for cross-border data transfers. While these provisions allow global companies to manage their data more efficiently while maintaining compliance, there are still many instances when data must remain within China.
One trend gaining momentum is that individual countries are developing sovereign clouds to help them govern their data privacy, protection and storage regulations, without interference from other countries. Germany is one of the first countries to do so. In other countries, governments are partnering with enterprises to develop cloud sovereignty.
A region-specific data storage strategy can be helpful for managing compliance. Select cloud providers offer compliance certifications and dedicated data centers to ensure data stays within the specific jurisdictions. This allows businesses to use local cloud infrastructure and data residency solutions that, together, help prevent unintentional cross-border transfers and enable compliance with regional data sovereignty laws.
Industry-specific drivers
As mentioned earlier, certain industries are more heavily regulated than others. This adds a layer of complexity to complying with data sovereignty laws.
In healthcare, maintaining patient privacy and confidentiality when transferring data internationally requires strict data handling practices. Sensitive patient data is protected and governed by the laws of the organization's home country. In the U.S., this means complying with the Health Insurance Portability and Accountability Act (HIPAA). With HIPAA, patient data is subject to a broad set of security protocols and safeguards, minimizing the risk of unauthorized access, data breaches and cyber threats. Complying with HIPAA strengthens healthcare organizations' ability to comply with other data governance regulations.
The financial services industry has similar regulations for protecting customer data. The Digital Operational Resilience Act (DORA) is a comprehensive EU financial regulation that sets uptime standards and includes specific data protection and sovereignty regulations. It's a more robust set of regulations that goes beyond what's included in the GDPR. As with HIPAA in healthcare, financial services companies will be better prepared to meet data sovereignty requirements if they've already had to comply with other data governance regulations.
Infrastructure strategies that support data sovereignty
To ensure compliance with these stringent regulations, businesses typically find that maintaining regulated and sensitive data on equipment and hardware they control in locations they can access is the most reliable approach. They can surround that data with the appropriate governance and security infrastructure and make it accessible by implementing a dedicated private storage environment, which we at Equinix call an Authoritative Data Core. It allows you to make the data available for consumption in-house or on SaaS and public cloud workloads. It also reduces the number of copies of regulated data that exist across corporate infrastructure boundaries, lowering the risk of leaked or stolen data.
Bringing applications to your data, rather than moving data to applications and workloads, is another option. Taking this approach allows you to focus on ensuring robust governance and control, through direct monitoring. We're seeing more and more regulated companies choose dedicated private storage to maintain data lakes and repositories, including archival data. The increased capabilities of malicious actors combined with a rapidly changing regulatory environment creates a level of risk that requires more control over regulated data.
Companies are also using federated AI to build AI models that meet governance requirements. This involves training smaller models locally at various edge sites, near the data source, which removes the need to transfer raw data. Instead, companies transfer only the model weights (the parameters that reflect what the local models learned during the training process) and aggregate them to form a global model.
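To make the weight aggregation step concrete, here is a minimal sketch in Python. It assumes one common aggregation choice, a mean of the local weights weighted by each site's training sample count, in the style of federated averaging. The article does not specify an aggregation method, so this is an illustration rather than a description of any particular product. The function name `aggregate_weights` and the example numbers are hypothetical.

```python
def aggregate_weights(site_updates):
    """Combine local model weights into one set of global weights.

    `site_updates` is a list of (weights, sample_count) pairs, where
    `weights` is a list of floats with the same length at every site.
    Each site's contribution is proportional to its sample count.
    Only weights and counts are shared; raw data never leaves a site.
    """
    if not site_updates:
        raise ValueError("no site updates to aggregate")
    total = sum(count for _, count in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    length = len(site_updates[0][0])
    if any(len(weights) != length for weights, _ in site_updates):
        raise ValueError("all sites must report the same number of weights")

    global_weights = [0.0] * length
    for weights, count in site_updates:
        for i, value in enumerate(weights):
            global_weights[i] += value * count / total
    return global_weights


# Two hypothetical edge sites: the second trained on three times as much data.
updates = [
    ([1.0, 2.0], 100),
    ([3.0, 4.0], 300),
]
print(aggregate_weights(updates))
```

In this example the second site's weights count three times as much as the first site's, so the global weights land closer to the second site's values. A production system would typically repeat this over many training rounds and may add protections such as secure aggregation, which this sketch omits.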
Centralizing AI development efforts into AI centers of excellence is another approach many companies take for cost and governance reasons. They want to understand and manage how different groups across their companies use external data and models to ensure compliance with data sovereignty. For instance, they require that data owners scrub personal data before sending it to a central location for AI development work. Then the developed models can be distributed for use across different organizations.
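The scrubbing requirement mentioned above can be sketched as a filter that data owners run before records leave their location. This is a minimal Python illustration under stated assumptions: the field names in `PERSONAL_FIELDS` and the function `scrub_record` are hypothetical, and removing direct identifiers alone may not meet the anonymization standard of a given regulation.

```python
# Hypothetical set of fields treated as personal data in this example.
PERSONAL_FIELDS = frozenset({"name", "email", "phone"})


def scrub_record(record, personal_fields=PERSONAL_FIELDS):
    """Return a copy of `record` without the listed personal fields.

    The original record is left unchanged so the data owner keeps
    the full version locally.
    """
    return {key: value for key, value in record.items() if key not in personal_fields}


local_record = {
    "name": "Example Person",
    "email": "person@example.com",
    "region": "region_a",
    "purchase_total": 42.5,
}

# Only the scrubbed copy would be sent to the central AI team.
print(scrub_record(local_record))
```

Returning a new dictionary rather than modifying the input keeps the local copy intact, which matters when a jurisdiction requires the original data to remain in place.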
Distributed infrastructure use cases for data sovereignty
We've been working with customers on how they can meet the various data storage and processing requirements of the jurisdictions where they collect and process data. Here are three examples of how they're deploying distributed infrastructure for data sovereignty compliance:
- AI inference: A communications technology company established storage nodes worldwide to meet regulations requiring that they store and process data within the regions where it's collected.
- Data processing and storage: An auto manufacturer needs to deploy distributed infrastructure to store and process data in multiple locations across several regions.
- Deploying AI clusters: A graphic design software company will use AI clusters to meet regulations for storing and processing customer data in the region where it's generated.
With Equinix, companies can deploy the specific compute, power and storage capabilities they need in each location and connect those locations with physical and virtual connectivity that's fast, private and secure.
Implementing data sovereignty solutions
Protecting and managing data in a world of evolving regulations, technologies and risks requires collaboration. Equinix is dedicated to partnering with our customers to develop distributed infrastructure solutions that set them up to meet data sovereignty requirements. Equinix is the connection point of distributed data sources, wherever they live: on-premises, on our colocation infrastructure or in the cloud. This is especially important for AI. But this task is too big for just two companies to solve. We rely on our digital ecosystem partners and providers to help us stitch together solutions to address these challenges.
Equinix AI-ready data centers are strategically located in the world's most connected markets and provide a scalable infrastructure foundation that enterprises can use to advance their AI capabilities. With 260+ data centers in 74 metros around the world, our global portfolio of Equinix IBX® data centers allows you to place, interconnect and securely govern your data anywhere in the world. We offer the only global platform with different types of data centers, a dense ecosystem of clouds and service providers and on-demand virtual interconnection services.
As the global leader in cloud on-ramps, Equinix provides connections to all the major cloud providers in our data center locations throughout the world. Enterprises can incorporate a dedicated storage environment into their hybrid multicloud architecture, helping them maintain control over their data and meet data privacy and sovereignty requirements.