🟢 Transform Process

The Transform Process sits between the other two processes: it runs AFTER the File IO processes and BEFORE the Data Quality processes. The key takeaway is that the source data has already been loaded into a DataTable by the time this process runs. This means you can modify known source column names, change the mapping specification, calculate or concatenate fields, etc.

  • Need to modify the data before it's ever touched by Transforms? Use File IO.

  • Need to modify the data, maps, options, etc. after Transforms has successfully loaded the data into a table? Use the Transform Process.

  • Need to generate data quality reporting? Use Data Quality.

Use Cases

Transform Process is perfect when:

  • The incoming data is already loaded and you'd like to modify known source column names.

  • There may be unique logic to change the mapping specification before running the transform.

  • You need to add or remove data, concatenate fields, or calculate something before the transform.
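To make these use cases concrete, here is a minimal sketch of the kind of DataTable edits such a process might perform. It uses only System.Data and is not part of the Perigee SDK; the class name, the helper names, and the column names (First, Last, FullName) are hypothetical and chosen for illustration only.

```csharp
using System;
using System.Data;

// Hypothetical helpers, not part of Perigee: plain System.Data operations
// that a transform process could apply to an already-loaded table.
public static class PreTransformSketch
{
    // Rename a known source column, but only if it exists and the
    // target name is not already taken.
    public static void RenameColumn(DataTable table, string from, string to)
    {
        var column = table.Columns[from];
        if (column != null && !table.Columns.Contains(to))
            column.ColumnName = to;
    }

    // Add a new string column that concatenates two existing columns.
    // DBNull values format as empty strings, so Trim() tidies the result.
    public static void AddFullName(DataTable table, string first, string last, string target)
    {
        table.Columns.Add(target, typeof(string));
        foreach (DataRow row in table.Rows)
            row[target] = $"{row[first]} {row[last]}".Trim();
    }
}
```

In a real plugin, the same logic would live inside ProcessTable and operate on the table exposed by the TransformDataContext.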

Creating a plugin project

If you would like to create a separate plugin library (DLL project), follow these steps first; the plugin code will go in that project. Otherwise, skip this step and write the code directly within your own project.

  1. Create a new DLL project, and for the time being, set the framework to net6.0.

  2. Install the latest version of Perigee using Install-Package perigee, or use the NuGet Package Manager.

  3. Open the .csproj file by double-clicking the DLL project in your code editor. It should look similar to the XML shown below.

  4. The two changes you need to make are:

    • Add <EnableDynamicLoading>true</EnableDynamicLoading> to the PropertyGroup tag.

    • For the PackageReferences, add <Private>false</Private> and <ExcludeAssets>runtime</ExcludeAssets>.

<Project Sdk="Microsoft.NET.Sdk">

	<PropertyGroup>
		<EnableDynamicLoading>true</EnableDynamicLoading>
		<TargetFramework>net6.0</TargetFramework>
		<ImplicitUsings>enable</ImplicitUsings>
		<Nullable>enable</Nullable>
	</PropertyGroup>

	<ItemGroup>
		<PackageReference Include="perigee" Version="24.6.1.1">
			<Private>false</Private>
			<ExcludeAssets>runtime</ExcludeAssets>
		</PackageReference>
	</ItemGroup>

</Project>

That's it! You've created a new DLL project that, when built, produces a plugin.dll that Transforms is able to hot reload and run dynamically at runtime.

Authoring the plugin

The plugin can contain many Transformation processes. Each process is defined by a method, and an attribute. Here's what a new process for NewlineRemover looks like:

//Remove all newlines from string columns. They aren't allowed anywhere
[TransformationProcess(false, -99, "Whitespace Remover", "cleanup", IsPreTransform = true)]
public class TR_NewlineRemover : ITransformationProcessTable
{
    public void ProcessTable(TransformDataContext data)
    {
        //Process
    }
}

Attribute

The [attribute] tells the system a few important things. In the order shown above, they are:

  1. Active (bool) - Should the plugin loader use this plugin? Set this to false while the plugin is in development or otherwise unavailable.

  2. SortOrder (int) - When multiple steps are defined and activated, which order (ascending) are they run in?

  3. Name - What name is this plugin given? Although it may not be shown anywhere immediately when running locally, the name is used for debugging and shown in certain log messages.

  4. Partition Keys - This is a very important field to fill out. It specifies for which files (partitions) this process should run, so you can restrict a process to certain types of files. You may provide multiple keys in a comma-separated list like so: "yardi, finance, FinanceFileA"

    • It's either blank, "" - which means it can always run.

    • It has the DataTableName (TransformGroup) - which can automatically be selected when running that specific map.

    • It has a generic key (like yardi, custom, finance, etc.) - you can then specify during the transform process which keys you'd like to run. See the MapTo section for more info on running with partition keys.

  5. IsPreTransform (false|true) - This is typically true, meaning this process is run before the transformation occurs. If you're writing a process to modify the transformed results, then set this to false.
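The partition key rules in item 4 can be pictured with the small sketch below. This is an illustration only and not Perigee's implementation: the class PartitionKeySketch, the method ShouldRun, and its parameters are hypothetical, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the partition key rules described above.
// Not Perigee code; matching details (such as case-insensitivity) are assumed.
public static class PartitionKeySketch
{
    public static bool ShouldRun(string partitionKeys, string dataTableName, IEnumerable<string> requestedKeys)
    {
        // Split the comma-separated list, trimming whitespace and dropping empties.
        var keys = (partitionKeys ?? "")
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        // A blank key list means the process can always run.
        if (keys.Length == 0) return true;

        // Otherwise the process runs if any of its keys matches either the
        // DataTableName of the map or one of the explicitly requested keys.
        var wanted = new HashSet<string>(
            requestedKeys ?? Enumerable.Empty<string>(), StringComparer.OrdinalIgnoreCase);
        if (!string.IsNullOrEmpty(dataTableName)) wanted.Add(dataTableName);

        return keys.Any(wanted.Contains);
    }
}
```

Under these assumptions, a process keyed "yardi, finance" would run when "finance" is requested, while a process keyed only "yardi" would not.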

Interface

The ITransformationProcessTable interface gives the method all of the required data it needs to process the file.

Here's a quick example of the powerful toolset provided by this interface. This checks every string column and removes any newline characters from it. If any are found, it will add a ProcessExecution type line item to the transformation report.

//Remove all newlines from string columns. They aren't allowed anywhere
[TransformationProcess(false, -99, "Whitespace Remover", "cleanup", IsPreTransform = true)]
public class TR_NewlineRemover : ITransformationProcessTable
{
    public void ProcessTable(TransformDataContext data)
    {
        var strCols = data.ColumnsOfType(typeof(string));

        data.ProcessRows(() => strCols.Any(), null, (row, indx) =>
            data.EachColumn<string>(row, strCols, (name, str, n) =>
            {
                if (str.IndexOfAny(new char[] { '\r', '\n' }) != -1)
                {
                    row[name] = str.Replace("\n", "").Replace("\r", "");
                    data.report.TransformationLines.Add(new TransformationReport.TransformationLine()
                    {
                        Column = name,
                        Message = $"Removed newlines from source value",
                        RowIndex = indx,
                        Severity = TransformationReport.TransformationItemSeverity.Warning,
                        Type = TransformationReport.TransformationItemType.ProcessExecution,
                        SourceValue = str,
                        DataObjectID = data.map.DataObjectID
                    });
                }
            }, true));

    }
}

SDK

To see all of the available methods, properties, and helpers, check out the SDK page:

๐ŸŽ›๏ธpageTransformDataContext

Running Transform Process Manually (SDK)

If you're trying to run transform processes manually, make sure to restrict the modules to the proper pre- or post-transform type. Get the map and the data, then iterate over the modules like so (this example selects the active post-transform processes):

var sourceData = new DataTable();
var mapSpec = Transformer.GetMapFromFile("map.xlsx", out var mrpt).FirstOrDefault();
var report = new TransformationReport() { Name = "Custom report" };

foreach (var trProcess in TransformationProcess.GetModules()
    .Where(f => f.transformAttribute.Active && f.transformAttribute.IsPreTransform == false && f.TableType != null)
    .OrderBy(f => f.transformAttribute.Order)) {

    var tdc = new TransformDataContext(null, sourceData, mapSpec, report, false);
    tdc.attribute = trProcess.transformAttribute;
    trProcess.RunTableInstance(trProcess.TableInstance, tdc);
}

Installation in Client App

If you created a plugin.dll project: Compile the project and drop the .dll into the Plugins/Transform folder.

If you wrote the process in the same project you're running, the plugin loader will automatically scan the assembly and make the plugin available for use.
