# Data Quality

Data Quality plugins inspect the transformed data and provide a list of data quality issues. These can be anything from extra newlines, to decimals with too many places, to the presence of special characters.

{% hint style="success" %}

* Need to modify the data before it's ever touched by the Transforms? Use [File IO](https://docs.perigee.software/transform-sdk/authoring-plugins/file-io-process).
* Need to modify the data, maps, options, etc. after Transforms has successfully loaded the data into a table? Use the [Transform Process](https://docs.perigee.software/transform-sdk/authoring-plugins/transform-process).
* Need to generate data quality reporting? Use [Data Quality](#authoring-the-plugin).
  {% endhint %}

## Use Cases

* A company file doesn't allow decimals with more than 2 places.
* We need to verify that every group of transactions in the file sums to a positive amount.
* The combined length of multiple fields cannot exceed n characters.
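
As a rough illustration of the logic behind these three use cases, here is a minimal sketch in plain C# that works directly on a `DataTable`. It does not use any Perigee APIs; the class name `UseCaseSketches`, its method names, and the column names passed in are all hypothetical and exist only for this example. Inside a real plugin, checks like these would sit in the validation callback described later on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helpers for illustration only; not part of the Perigee SDK.
public static class UseCaseSketches
{
    // True when the value has non-zero digits beyond the second decimal place.
    // Trailing zeros (for example 1.500m) are not counted as extra places.
    public static bool HasMoreThanTwoDecimals(decimal value) =>
        decimal.Round(value, 2) != value;

    // Returns the keys of every group whose amounts sum to zero or less.
    // Rows with a null amount are skipped.
    public static List<string> GroupsNotPositive(DataTable table, string groupColumn, string amountColumn) =>
        table.Rows.Cast<DataRow>()
            .Where(r => r[amountColumn] != DBNull.Value)
            .GroupBy(r => r[groupColumn]?.ToString() ?? "")
            .Where(g => g.Sum(r => Convert.ToDecimal(r[amountColumn])) <= 0m)
            .Select(g => g.Key)
            .ToList();

    // True when the combined text length of the given columns exceeds maxLength.
    // Null cells contribute a length of zero.
    public static bool CombinedLengthExceeds(DataRow row, int maxLength, params string[] columns) =>
        columns.Sum(c => row[c] == DBNull.Value ? 0 : (row[c].ToString() ?? "").Length) > maxLength;
}
```

Each helper returns a simple result rather than reporting anything itself, so the caller decides how to record the issue.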

## Creating a plugin project

{% hint style="success" %}
If you would like to actually create a plugin library (`dll` project), follow these steps first and we'll put our code here. Otherwise, [skip this step](#authoring-the-plugin), and create the code directly within your project.
{% endhint %}

1. Create a new DLL project, and for the time being, set the framework to **`net6.0`**.
2. Install the latest version of Perigee using `install-package perigee` - OR use the NuGet Package Manager.
3. Open the `.csproj` file by double clicking on the DLL project in your code editor. The XML should look like the example below.
4. The two changes you need to make are:
   * Add **`<EnableDynamicLoading>true</EnableDynamicLoading>`** to the `PropertyGroup` tag
   * For the `PackageReference` entries, add **`<Private>false</Private>`** and **`<ExcludeAssets>runtime</ExcludeAssets>`**

```xml
<Project Sdk="Microsoft.NET.Sdk">

	<PropertyGroup>
		<EnableDynamicLoading>true</EnableDynamicLoading>
		<TargetFramework>net6.0</TargetFramework>
		<ImplicitUsings>enable</ImplicitUsings>
		<Nullable>enable</Nullable>
	</PropertyGroup>

	<ItemGroup>
		<PackageReference Include="perigee" Version="24.6.1.1">
			<Private>false</Private>
			<ExcludeAssets>runtime</ExcludeAssets>
		</PackageReference>
	</ItemGroup>

</Project>

```

That's it! You've created a new DLL project that, when built, will produce a `plugin.dll` that Transforms is able to hot reload and run dynamically at runtime.

## Authoring the plugin

The plugin can contain many Data Quality checks. Each check is defined by a class that implements `IDataQualityValidator` and carries a `[DataQualityCheck]` attribute. Here's what a new check for **`AmountCannotBeZero`** looks like:

```csharp
[DataQualityCheck(true, "Amount Zero", "A check to determine if any AMOUNT columns are zero", validateAtTable: true, partition: "yardi")]
public class DQ_AmountCannotBeZero : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        
    }
}
```

### Attribute

The <mark style="color:orange;">**`[attribute]`**</mark> tells the system several important things. In the order shown above, they are:

1. <mark style="color:purple;">**Active?**</mark> - Should the plugin loader use this plugin? Set this to `false` while the check is in development or otherwise unavailable.
2. <mark style="color:purple;">**Name**</mark> - What name is this plugin given? This is shown in the data quality report and should be short and descriptive.
3. <mark style="color:purple;">**Description**</mark> - May be used in the report to further explain the check.
4. <mark style="color:purple;">**Validate At**</mark> - Validating at the table level is the most common use case. It's also the most performant. The other option relates to validating Set-level transforms.
5. <mark style="color:red;">**Partition Keys**</mark> - This is a very important field to fill out. It specifies for which files (partitions) the data quality check runs, so you can limit a check to certain types of files. You may provide multiple keys in a comma-separated list like so: <mark style="color:green;">`"yardi, finance, FinanceFileA"`</mark>
   * It's either blank, <mark style="color:red;">**""**</mark> - which means it can always run.
   * It has the [DataTableName (TransformGroup)](https://docs.perigee.software/transforms/the-mapping-document#transformgroup) - which can automatically be selected when running that specific map.
   * It has a generic key (like `yardi`, `custom`, `finance`, etc.), and you specify during the transform process which keys you'd like to run. See the [MapTo](https://docs.perigee.software/transform-sdk/mapto) section for more info on running with partition keys.
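
To picture how the partition rules above fit together, here is a minimal sketch in plain C#. It is a mental model only and is not Perigee's actual implementation: the class `PartitionSketch`, its methods, and the choice of case-insensitive matching are all assumptions made for this example. The `activeKeys` argument stands in for whatever keys apply to a given run, such as the map's DataTableName plus any generic keys you asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of partition matching; not the Perigee implementation.
public static class PartitionSketch
{
    // Splits "yardi, finance" into ["yardi", "finance"], trimming whitespace.
    public static string[] ParseKeys(string? partition) =>
        (partition ?? "").Split(',', StringSplitOptions.RemoveEmptyEntries)
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .ToArray();

    // A blank partition always runs; otherwise at least one key must be active.
    // Case-insensitive comparison is an assumption of this sketch.
    public static bool ShouldRun(string? partition, IEnumerable<string> activeKeys)
    {
        var keys = ParseKeys(partition);
        if (keys.Length == 0) return true;

        var active = new HashSet<string>(activeKeys, StringComparer.OrdinalIgnoreCase);
        return keys.Any(k => active.Contains(k));
    }
}
```

Under this model, a check with the partition `"yardi, finance"` would run for a request that includes `finance`, and a check with a blank partition would run for every request.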

Other optional attribute values you can supply are:

* <mark style="color:purple;">**IsPostTransform**</mark> <mark style="color:purple;">**(false|true)**</mark> - This is typically `true`, meaning the check runs after the transformation occurs.
* <mark style="color:purple;">**IsPreTransform**</mark> <mark style="color:purple;">**(false|true)**</mark> - This is typically `false`. When `true`, the check runs before the transformation occurs. This is less common, as you typically validate the data after it has been modified and mapped.

### Interface

The <mark style="color:orange;">**`IDataQualityValidator`**</mark> interface gives the method all of the data it needs to process the file.

The main method you'll use in the <mark style="color:orange;">**`TransformDataQualityContext`**</mark> is the **`Process`** method. This method automatically processes the entire dataset in parallel and provides an easy-to-use callback for adding validation rows.

The end result of any callback should be a new **`DataQualityValidationRow`** for every quality issue that is found. Finishing the implementation for our amount zero check, we'll flag every row whose **AMOUNT** value is not null and converts to `0.0m`.

```csharp
[DataQualityCheck(true, "Amount Zero", "A check to determine if any AMOUNT columns are zero", validateAtTable: true, partition: "yardi")]
public class DQ_AmountCannotBeZero : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        data.Process(result, () => data.RequiredColumns("AMOUNT"), (header, row, indx, bag) =>
        {
            var amount = row["AMOUNT"];
            if (amount != DBNull.Value && Convert.ToDecimal(amount) == 0.0m)
                bag.Add(new DataQualityValidationRow("AMOUNT", amount.ToString(), indx));

        }, 1);
    }
}
```
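
One thing to keep in mind with the check above: `Convert.ToDecimal` throws a `FormatException` when it is given a string that is not numeric. If the **AMOUNT** column could arrive as text, a more defensive comparison may be worth considering. The sketch below is plain C# with no Perigee dependencies; the class `AmountParsing` and the method `IsZeroAmount` are hypothetical names, and parsing with the invariant culture is an assumption about how your source data is formatted.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of the Perigee SDK.
public static class AmountParsing
{
    // Returns true only when the value can be read as a number and equals zero.
    // Null, DBNull, and non-numeric text all return false instead of throwing.
    public static bool IsZeroAmount(object? value)
    {
        if (value is null || value is DBNull) return false;
        if (value is decimal d) return d == 0m;

        // Invariant culture is an assumption; adjust it to match your data.
        var text = Convert.ToString(value, CultureInfo.InvariantCulture);
        return decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out var parsed)
            && parsed == 0m;
    }
}
```

With a helper like this, the condition inside the callback could become `AmountParsing.IsZeroAmount(amount)`. Whether unparseable values should be ignored or reported as their own issue is a design choice left to the check's author.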

#### One more example:

Here's another example: a check that validates that no newline characters are present. It follows exactly the same pattern. We use the helper method **`ColumnsOfType`** to find every <mark style="color:red;">string</mark> column, then iterate over those columns and report any cell that contains a newline.

```csharp
[DataQualityCheck(true, "Newline Columns", "If the cell contains two or more lines separated by a newline", validateAtTable: true, partition: "yardi")]
public class DQ_NewlineColumns : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        var dc = data.ColumnsOfType(typeof(string));
        var nlc = new char[] { '\r', '\n' };
        data.Process(result, () => dc.Any(), (header, row, indx, bag) =>
        {
            foreach (var col in dc)
            {
                if (row[col] == null || row[col] == DBNull.Value) continue;
                if (row[col]?.ToString()?.IndexOfAny(nlc) != -1) bag.Add(new DataQualityValidationRow(col.ColumnName, row[col]?.ToString(), indx));

            }
        }, dc.Count);
    }
}
```

## SDK

To see all of the available methods, properties, and helpers, check out the SDK page:

{% content-ref url="../sdk-reference/dataqualitycontext" %}
[dataqualitycontext](https://docs.perigee.software/transform-sdk/sdk-reference/dataqualitycontext)
{% endcontent-ref %}

## Running Data Quality Manually (SDK)

If you're running DQ as part of a transform, it's baked right into the process. See [MapTo](https://docs.perigee.software/transform-sdk/mapto).

If you want to run DQ modules outside of this process, here's an example of running them manually:

```csharp
//Get data and map spec first
var sourceData = new DataTable();
var mapSpec = Transformer.GetMapFromFile("map.xlsx", out var mrpt).FirstOrDefault();

//Iterate modules defined in assembly
foreach (var dqTable in DataQuality.GetModules().Where(f => f.dataQualityCheckAttribute.Active))
{
    var dqresult = dqTable.RunForInstance(dqTable.Instance, 
        new TransformDataQualityContext(dqTable.dataQualityCheckAttribute, sourceData, mapSpec, null));

    if (!dqresult.Ignored)
    {
        //Do whatever you like with the results
    }
}
```

## Installation in Client App

If you created a `plugin.dll` project: compile the project and drop the `.dll` into the <mark style="color:purple;">**`Plugins/DQ`**</mark> folder.

If you wrote the check in the same project you're running, the plugin loader will automatically scan the assembly and make the plugin available for use.
