Is Microsoft using your data to train Copilot AI – the facts

Social media is full of allegations that Microsoft is spying on customers’ Word and Excel documents to train its Copilot AI system. Here are the facts and the ‘gotchas’ that Microsoft leaves unanswered.

In this article we’ll look into the possibility of Microsoft ‘spying’ on customers. We’ll start with a summary, then dig into the details with citations of our sources, something missing from attention-grabbing social media posts.

As you’ll see in this article, Microsoft’s assurances about a “misunderstanding” and other public statements leave too much unclear or unstated.

Over 25 years we’ve learned to be cynical about Microsoft’s actions, statements and motives. The company talks about customer privacy a lot but doesn’t always practice what it preaches. Any company statement needs to be read carefully for exactly what it says and doesn’t say. Office Watch is no shill for Microsoft, uncritically repeating their press releases.

The social media explanations are incomplete or just plain wrong. They started with a single, totally wrong post and have spun off from there. See Word ‘AI free’ cure is worse than the disease

What is AI training?

Modern AI systems need huge amounts of real-world data to learn facts, writing styles and more. The result is called an LLM (Large Language Model).

Feeding that information into the model is called “AI training”.

There’s been some controversy and litigation over what’s been grabbed by all the major AI players (OpenAI, Google, Meta, Microsoft and many others).

Are my documents, emails and data in Microsoft AI system?

Does Microsoft use data from Word, Excel, PowerPoint or emails to train Copilot AI?

Microsoft says NO, as we’ll explain below.

Regular Office Watch readers know we’re always skeptical about Microsoft promises. In this case, we’re inclined to believe them because it’s too risky legally and commercially.

Why? If Microsoft used private data in their AI systems, they’d eventually get caught when Copilot gave results that could have only been sourced from confidential or private sources. In some countries, using the data would be illegal. 

Maybe you don’t trust Microsoft but you can trust Microsoft to follow its own interests.

On the other hand, Microsoft could make this a LOT clearer. The company has to accept some blame for the misinformation that’s been put out and is spreading.

Their explanations are buried deep in FAQs which aren’t entirely clear on some points. The statements are not included in the company’s Terms and Conditions so they aren’t enforceable and could be changed at any time.  As you’ll see, they admit that’s a possibility.

Many of the online comments point to the “Connected Services” options in modern Office as ‘proof’ that data is being used for AI training. However, those options are NOT related to Copilot and AI training. And they aren’t recent or ‘slyly added’ having been in Office since 2019. See Word ‘AI free’ cure is worse than the disease

What data is NOT used to train Copilot AI

Microsoft says what is NOT used to train Copilot AI, in short:

We do not train Copilot on data from:

  • Our commercial customers, or any data from users logged into an organizational M365/EntraID account
  • Users logged in with M365 personal or family subscriptions
  • Users who are not logged into Copilot using a Microsoft Account or other third-party authentication
  • Authenticated users under the age of 18

Source: FAQ on Microsoft.com; see below for the full list of exclusions.

In other words, all Microsoft 365 customers (individuals, businesses etc.) do NOT have their Office documents or emails fed into Copilot.

Could be clearer

Some elements of Microsoft’s statements could be clearer. For example, this mention of customers who do NOT have their data used for AI training.

“Our commercial customers, or any data from users logged into an organizational M365/EntraID account”

It doesn’t explicitly mention education, schools, government or non-profits, and it should for reassurance and clarity. Instead, Microsoft seems to hope that people understand that an “organizational M365/EntraID account” includes those groups (at least we hope it does).

But there are catches

There are always catches with Microsoft. We’ve noticed five, and there might be more. These are points that Microsoft could and should clarify.

Office 2024, 2021 etc

Not in Microsoft’s exclusions are users of the perpetual license versions of Microsoft Office: Office 2024, Office 2021 and earlier. Only Microsoft 365 is mentioned in the AI training exclusions.

Those versions of Office don’t support Copilot but there are other “Connected Services” (Microsoft’s term).

Since those customers aren’t explicitly excluded, it’s possible their data is being fed to Microsoft’s AI training system. Or customer documents saved on OneDrive could be scraped into Copilot.

Office users without a Microsoft 365 plan

Another notable omission is the web-based Office apps for Word, Excel and PowerPoint. These are available to anyone; all you need is a free Microsoft account.

The exclusion list only mentions Microsoft 365 Family and Personal plans – no mention of non-paying customers.

Microsoft 365 Basic

Microsoft 365 Basic (which features more OneDrive and Outlook.com space) isn’t on the exclusion list either.

Like ‘free’ Office users, it’s possible that Microsoft 365 Basic customer data is available for Copilot AI training.

OneDrive / SharePoint

There’s no mention of Microsoft’s two online storage services, OneDrive and SharePoint. There should be, to reassure skeptics.

Can change at any time

This is the big one. This paragraph caught our eye …

“We may eventually expand model training and opt-out controls to users in certain countries where we do not currently use consumer data for model training (see What data is excluded from model training?).

But we will do so gradually, to ensure we get this right for consumers and so we comply with local privacy laws around the world.”

Source: Microsoft FAQ

The phrase “get this right for consumers” is open to many interpretations.

There’s no mention of notifying customers of any change in the use of “consumer data”, just that it will be done “gradually”.

Even the term “consumers” is open to interpretation.  Normally Microsoft uses the term “consumers” to mean individual customers, for example Microsoft 365 Family and Personal are called “consumer” plans.  In this context “consumer” could mean all customers including businesses and organizations.

All the people excluded from Copilot AI data grab

Here’s Microsoft’s full list of customers/groups that are excluded from AI model training. Some people will be covered by more than one exclusion.

“We do not train Copilot on data from:

  • Our commercial customers, or any data from users logged into an organizational M365/EntraID account
  • Users logged in with M365 personal or family subscriptions
  • Users who are not logged into Copilot using a Microsoft Account or other third-party authentication
  • Authenticated users under the age of 18
  • Users who have opted out of training
  • Users in certain countries including: Austria, Belgium, Brazil, Bulgaria, Canada, China, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Latvia,  Liechtenstein, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Nigeria, Poland, Portugal, Romania, Slovakia, Slovenia, South Korea, Spain, Sweden, Switzerland, the United Kingdom, and Vietnam. This includes the regions of Guadeloupe, French Guiana, Martinique, Mayotte, Reunion Island, Saint-Martin, Azores, Madeira, and the Canary Islands. This means that AI offerings will be available in those markets, but no user data will be used for generative AI model training in those locations until further notice.

We limit the data we use for training. We do not train AI models on personal account data like your Microsoft account profile data or email contents. If any images are included in your AI conversations, we take steps to de-identify them such as removing metadata or other personal data and blurring images of faces.”

Source: FAQ on Microsoft.com.

Opt out of training

Microsoft’s instructions for excluding your Copilot use from AI training are unclear. See How to turn off Copilot model training for the full explanation.

“Users in certain countries” – what does it mean?

Some countries have privacy laws which stop companies from using customer data for AI training.

Notable countries with those laws are: Canada, the UK, France, Germany and all the Scandinavian countries (see the full list above).

Countries that do NOT have any such protections for their citizens include the USA, Australia, New Zealand and India, among many others.

What does “users in certain countries” specifically mean? Customers can use Microsoft software anywhere in the world with an account based in one country. Are customers based in an excluded country protected everywhere in the world, or only when they are in their home location? What about customers based in an unprotected country (say, the USA) who move to Canada (with stricter privacy laws) for a time? Is data created in Canada protected from AI training or not?

We suspect that the customer’s base location in their Microsoft account is what matters, not the user’s current location. This is another point that Microsoft should make clear to its paying customers.

About this author

Office-Watch.com

Office Watch is the independent source of Microsoft Office news, tips and help since 1996. Don't miss our famous free newsletter.