Discovering and Managing Data with Microsoft Purview (Part 1)

With more organisations wanting to explore AI assistants such as Copilot for Microsoft 365, one of the first questions they need to answer concerns their data governance and security. Having helped several organisations implement Microsoft Purview, both in general and to answer these questions, we wanted to share our approach.

With cloud-based storage locations such as OneDrive and Teams, permissions and document sprawl can become a challenge. Because employees are able to share files themselves, IT teams are no longer in the loop.

We are not suggesting that these modernisations should be removed and everything should go back to being controlled centrally. The speed and flexibility they provide are fantastic for productivity and have enhanced workflows and collaboration. But the right controls do need to be in place.

What is Microsoft Purview?

Microsoft Purview is a suite of tools for data governance, data security and compliance, including tools you might have heard of such as Data Loss Prevention, Information Protection and Insider Risk Management.

Whilst the scope and complexity of implementing Microsoft Purview can seem overwhelming when you start exploring all it has to offer, it can be simplified into three high-level steps: discovering, classifying and protecting your data.

For this article we will be focusing on discovering data, with part two looking at the classification and protection steps.

How can we identify information?

Although it can seem daunting to gather insights across your organisational data estate, Microsoft Purview may already be doing some of the groundwork for you. Within the Microsoft Purview portal, the built-in classifiers are already being searched for across your cloud estate in locations such as Email, SharePoint, OneDrive and Teams. So, what do we mean by classifiers?

Trainable classifiers

Trainable classifiers are trained models that identify a particular type of document. Microsoft Purview searches for that document type across your organisation, and the classifier can then be included within protections. Microsoft has over 110 built-in options to get you started, with some examples shown below.

You are also able to create your own custom trainable classifiers to suit your organisation's needs, allowing you to train Microsoft Purview to search for the types of documents that are important to your organisation.

Sensitive Info Types (SITs)

The second type of classifier you can use is Sensitive Info Types (SITs), which, unlike trainable classifiers, are designed around specific patterns or words within files or communications. In our experience these SITs have been incredibly quick to use and invaluable in helping organisations understand where sensitive files or communications are being stored. Again, Microsoft provides over 300 built-in examples that you can use right now, with some examples shown below.

So how do they work? As a simple example, if we are searching for credit card numbers, the SIT would look for 14 to 19 digits in a sequence. On top of this, each time a SIT is found, the match can be graded as Low, Medium or High confidence. So, if you are looking for credit cards and find a 16-digit number with the word Visa or Mastercard next to it, your confidence that the match is correct could be considered higher than normal.
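To make the idea of pattern plus supporting evidence more concrete, here is a minimal Python sketch. It is not how Microsoft Purview is implemented: the function names, the 50-character proximity window, the use of a Luhn checksum as a second signal and the way signals map to Low, Medium and High are all assumptions chosen purely for illustration.

```python
import re

# Illustrative assumptions only: these names, keywords and thresholds are
# hypothetical and do not reflect Purview's internal logic.
CARD_PATTERN = re.compile(r"(?<!\d)\d{14,19}(?!\d)")  # 14 to 19 digits in a sequence
KEYWORDS = ("visa", "mastercard")
PROXIMITY = 50  # characters either side of the match to search for keywords
LEVELS = ("Low", "Medium", "High")


def passes_luhn(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[tuple[str, str]]:
    """Return (number, confidence) pairs for each candidate card number."""
    results = []
    lowered = text.lower()
    for match in CARD_PATTERN.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        window = lowered[start:match.end() + PROXIMITY]
        has_keyword = any(keyword in window for keyword in KEYWORDS)
        # Each supporting signal raises the confidence by one level.
        signals = int(passes_luhn(match.group())) + int(has_keyword)
        results.append((match.group(), LEVELS[signals]))
    return results
```

For example, calling `find_card_numbers("Paid by Visa 4111111111111111 today")` would grade the match as High in this sketch, because the number passes the checksum and a keyword sits nearby, while a bare 16-digit reference number with neither signal would be graded Low.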

You can build your own custom SITs on many things: simple number strings, complex regex-based patterns, keywords and more.
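As a rough illustration of what a custom SIT combines, the Python sketch below models one as a primary pattern with optional supporting keywords. The `CustomSIT` class, the "Payroll reference" and "Employee ID" definitions and their formats are all invented for this example; in practice you would define custom SITs within Microsoft Purview itself rather than in code like this.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomSIT:
    """Hypothetical stand-in for a custom SIT: a pattern plus optional keywords."""
    name: str
    pattern: str
    keywords: list[str] = field(default_factory=list)

    def detect(self, text: str) -> list[str]:
        """Return pattern matches, requiring a supporting keyword if any are defined."""
        lowered = text.lower()
        if self.keywords and not any(k.lower() in lowered for k in self.keywords):
            return []
        return re.findall(self.pattern, text)


# Invented examples: a simple number string, and a regex pattern backed by keywords.
payroll_ref = CustomSIT(name="Payroll reference", pattern=r"\b\d{8}\b")
employee_id = CustomSIT(
    name="Employee ID",
    pattern=r"\bEMP-\d{6}\b",
    keywords=["employee", "staff"],
)
```

Requiring a keyword alongside the pattern is one way to cut down on false positives: in this sketch, `employee_id.detect("Order EMP-004217 shipped")` returns nothing because no supporting keyword is present.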

We know what we are looking for, but where is it?

Now that you understand what the classifiers are, you can dive into Data explorer, which lets you see where they might have already been found within your cloud environment. In the example below we have searched for the trainable classifier for HR documents, and we can see that these are found 44 times in OneDrive and 16 times in SharePoint sites.

When given the appropriate permissions, you can then expand these locations and review exactly where the documents are being stored.

The experience is the same for SITs, as we can see below, where we are reviewing where the SIT “All Medical Terms and Conditions” has been detected, not only within SharePoint and OneDrive but also in Email, Teams and even Copilot interactions.

So, we found what we want, now what?

You now know what information you are looking for and where it is being stored. Next, you can work with the wider organisation to decide whether these locations are suitable or whether the information should be stored elsewhere. You can also decide if the information saved is still valid or whether it should be archived or removed, and ensure that any important data is suitably protected.

In part two of this blog we will cover how to classify information and how that classification can then provide protection to files both in and outside your Microsoft 365 environment.

