Data Quality



Data Quality plugins inspect the transformed data and provide a list of data quality issues. These can be anything from extra newlines, to decimals with too many places, to the presence of special characters.

  • Need to modify the data before it's ever touched by Transforms? Use the File IO Process.

  • Need to modify the data, maps, options, etc. after Transforms has successfully loaded the data into a table? Use the Transform Process.

  • Need to generate data quality reporting? Use Data Quality (this page).

Use Cases

  • A company file doesn't allow for decimals with more than 2 places (see the sketch after this list)

  • We need to verify that every group of transactions in the file sum to positive amounts

  • The combined length of multiple fields cannot be greater than n characters.
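
As a minimal sketch of the first use case, a "no more than 2 decimal places" rule boils down to a single comparison. HasTooManyDecimalPlaces is a hypothetical helper name for illustration, not part of the Perigee SDK:

static bool HasTooManyDecimalPlaces(decimal value, int maxPlaces = 2)
{
    //Rounding to the allowed precision only changes the value
    //when extra decimal places are present
    return decimal.Round(value, maxPlaces) != value;
}

//HasTooManyDecimalPlaces(10.25m)  -> false
//HasTooManyDecimalPlaces(10.255m) -> true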

Creating a plugin project

If you would like to create a plugin library (dll project), follow these steps first and we'll put our code there. Otherwise, skip this step and create the code directly within your project.

  1. Create a new DLL project, and for the time being, set the framework to net6.0.

  2. Install the latest version of Perigee using install-package perigee - OR use the NuGet Package Manager.

  3. Open the .csproj file by double clicking on the DLL project in your code editor. You should see the project XML shown below.

  4. The two changes you need to make are:

    • Add <EnableDynamicLoading>true</EnableDynamicLoading> to the PropertyGroup tag

    • For the PackageReferences, add <Private>false</Private> and <ExcludeAssets>runtime</ExcludeAssets>. This keeps the Perigee assemblies out of the plugin's build output, so the host application's copies are used at runtime.

<Project Sdk="Microsoft.NET.Sdk">

	<PropertyGroup>
		<EnableDynamicLoading>true</EnableDynamicLoading>
		<TargetFramework>net6.0</TargetFramework>
		<ImplicitUsings>enable</ImplicitUsings>
		<Nullable>enable</Nullable>
	</PropertyGroup>

	<ItemGroup>
		<PackageReference Include="perigee" Version="24.6.1.1">
			<Private>false</Private>
			<ExcludeAssets>runtime</ExcludeAssets>
		</PackageReference>
	</ItemGroup>

</Project>

That's it! You've created a new DLL project that, when built, will produce a plugin .dll that Transforms is able to hot reload and run dynamically at runtime.

Authoring the plugin

The plugin can contain many Data Quality checks. Each check is defined by a class with a Validate method and a [DataQualityCheck] attribute. Here's what a new check for AmountCannotBeZero looks like:

[DataQualityCheck(true, "Amount Zero", "A check to determine if any AMOUNT columns are zero", validateAtTable: true, partition: "yardi")]
public class DQ_AmountCannotBeZero : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        //Validation logic goes here - implemented below
    }
}

Attribute

The [attribute] tells the system several important things. In the order shown above, they are:

  1. Active? - Should the plugin loader use this plugin? Set this to false while the check is in development or unavailable.

  2. Name - What name is this plugin given? This is shown in the data quality report and should be short and descriptive

  3. Description - May be used in the report to further explain the check

  4. Validate At - Validating at the table level is going to be the most common use case, and it's also the most performant. The other option relates to validating Set level transforms.

  5. Partition Keys - This is a very important field to fill out. It specifies under which files (partitions) to run the data quality checks, so you can restrict a check to only certain types of files. You may provide multiple keys in a comma separated list like so: "yardi, finance, FinanceFileA". A key takes one of three forms:

    • It's either blank, "" - which means it can always run.

    • It has the DataTableName (TransformGroup) - which can automatically be selected when running that specific map.

    • It has a generic key (like yardi, custom, finance, etc), for which you can specify during the transform process which keys you'd like to run. See MapTo for more info on running with partition keys.

Other optional attribute values you can supply are:

  • IsPostTransform (false|true) - This is typically true, meaning this process is run after the transformation occurs.

  • IsPreTransform (false|true) - This is typically false. When true, the process is run before the transformation occurs. This is less common, as you typically validate the data after it's been modified and mapped.
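
Putting the pieces together, here's the attribute from the example above annotated against the numbered list, with a multi-key partition. The extra "finance" key is illustrative only, and the optional IsPostTransform/IsPreTransform values are left at their defaults:

[DataQualityCheck(
    true,                                                   //1. Active
    "Amount Zero",                                          //2. Name (shown in the report)
    "A check to determine if any AMOUNT columns are zero",  //3. Description
    validateAtTable: true,                                  //4. Validate At (table level)
    partition: "yardi, finance")]                           //5. Partition Keys (comma separated)
public class DQ_AmountCannotBeZero : IDataQualityValidator { /* ... */ }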

Interface

The IDataQualityValidator interface supplies the Validate method with all of the data it needs to process the file.

The main method you'll use in the TransformDataQualityContext is the Process method. It automatically parallel processes the entire dataset and provides an easy-to-use callback for adding validation rows.

The end result of any callback should be to add a new DataQualityValidationRow for every quality issue found. Finishing the implementation of our amount zero check, we'll flag any AMOUNT values that are non-null and convert to 0.0m.

[DataQualityCheck(true, "Amount Zero", "A check to determine if any AMOUNT columns are zero", validateAtTable: true, partition: "yardi")]
public class DQ_AmountCannotBeZero : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        //Only run when the required AMOUNT column is present; rows are processed in parallel
        data.Process(result, () => data.RequiredColumns("AMOUNT"), (header, row, indx, bag) =>
        {
            //Report a validation row for every non-null amount that converts to zero
            var amount = row["AMOUNT"];
            if (amount != DBNull.Value && Convert.ToDecimal(amount) == 0.0m)
                bag.Add(new DataQualityValidationRow("AMOUNT", amount.ToString(), indx));

        }, 1);
    }
}

One more example:

Here's another example of a check that validates no newline characters are present. The exact same pattern is followed; we just use the helper method ColumnsOfType to find the string columns, then iterate those and report.

[DataQualityCheck(true, "Newline Columns", "If the cell contains two or more lines separated by a newline", validateAtTable: true, partition: "yardi")]
public class DQ_NewlineColumns : IDataQualityValidator
{
    public void Validate(TransformDataQualityContext data, ref DataQualityValidationResult result)
    {
        //All string columns are candidates for newline issues
        var dc = data.ColumnsOfType(typeof(string));
        var nlc = new char[] { '\r', '\n' };

        //Only run when at least one string column exists
        data.Process(result, () => dc.Any(), (header, row, indx, bag) =>
        {
            foreach (var col in dc)
            {
                if (row[col] == null || row[col] == DBNull.Value) continue;

                //Report any cell containing a carriage return or line feed
                if (row[col]?.ToString()?.IndexOfAny(nlc) != -1) bag.Add(new DataQualityValidationRow(col.ColumnName, row[col]?.ToString(), indx));
            }
        }, dc.Count);
    }
}

SDK

To see all of the available methods, properties, and helpers, check out the DataQualityContext SDK page.

Running Data Quality Manually (SDK)

If you want to run DQ modules outside of this process, here's an example of running them manually:

//Get data and map spec first
var sourceData = new DataTable();
var mapSpec = Transformer.GetMapFromFile("map.xlsx", out var mrpt).FirstOrDefault();

//Iterate modules defined in assembly
foreach (var dqTable in DataQuality.GetModules().Where(f => f.dataQualityCheckAttribute.Active))
{
    var dqresult = dqTable.RunForInstance(dqTable.Instance, 
        new TransformDataQualityContext(dqTable.dataQualityCheckAttribute, sourceData, mapSpec, null));

    if (!dqresult.Ignored)
    {
        //Do whatever you like with the results
    }
}

Installation in Client App

If you created a plugin.dll project: Compile the project and drop the .dll into the Plugins/DQ folder.
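
For the plugin route, the client app's folder might end up looking like this (the plugin file name is just a placeholder; Plugins/DQ is the drop location named above):

Plugins/
└─ DQ/
   └─ MyDataQualityChecks.dll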

If you wrote the process in the same project you're running, the plugin loader will automatically scan the assembly and the plugin is available for use.

If you're running DQ as part of a transform, it's baked right into the process. See MapTo.
