🦾The Mapping Document
Mapping is an integral part of any data transformation process. It allows you to specify a target format with data restrictions like length, precision, requirements, and more.
The mapping document also allows you to specify certain rules on how that target field is created. Is it simply a Copy from the source? Is it Defaulted to a value? Do you Map it to another lookup value?
Let's take a look at the mapping document and how easy it is to get started.
Let's ask Transforms to generate a map of a student for us.
This generation method lets us take an idea or concept for a map as a starting point and have AI generate the map for us. This is one of many ways to create a map when you don't already have one. 👍
Here's the map, unmodified, from the site:
Using the student data that we have, we need to make some tweaks to this map:
Let's add the Major column, as it's missing
We'd like to null out (or make empty) the DateOfBirth, as our target doesn't need that information.
To do that (a scripted sketch of these edits follows the steps below):
Copy the LastName or FirstName row, as it's close to the data type and information we need.
Change the TargetFieldName to Major.
Change the Ordinal to 6.
Change the TransformType of DateOfBirth to EMPTY.
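In practice these are just row edits in the mapping document itself. As a rough illustration only, here's how the same tweaks could be scripted with pandas, assuming the map was exported as `student_map.csv` (the file name is an assumption; TargetFieldName, Ordinal, and TransformType are the column names used on this page):

```python
import pandas as pd

# Load the generated mapping document (file name is an assumption).
map_df = pd.read_csv("student_map.csv")

# Copy the LastName row as a template for the new Major column.
major_row = map_df[map_df["TargetFieldName"] == "LastName"].iloc[0].copy()
major_row["TargetFieldName"] = "Major"   # point the copy at the new target field
major_row["Ordinal"] = 6                 # place it at position 6
map_df = pd.concat([map_df, major_row.to_frame().T], ignore_index=True)

# Null out DateOfBirth by switching its TransformType to EMPTY.
map_df.loc[map_df["TargetFieldName"] == "DateOfBirth", "TransformType"] = "EMPTY"

map_df.to_csv("student_map.csv", index=False)
```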
And just like that, we're ready to produce a valid Student file! Let's use Transforms again, but this time let's take our map AND our sample data from above and let Transforms auto-assign the source fields to the destination fields.
Head over to Prepare -> AutoMap. Simply drop in your data file and your map, then re-download the mapping document.
And just like that, we've got ourselves a fully ready-to-use map! In this case, all of the source fields perfectly matched the target fields, so it would be easy for us to just copy those over. However, in many live scenarios this won't be the case. The AutoMap process uses fuzzy logic to intelligently assign source columns from your input data, trying to match them to the TargetFieldName values in the map.
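Transforms handles this matching for you, but conceptually it's similar to fuzzy string matching between the input headers and the map's TargetFieldName values. Here's a minimal illustration using Python's standard library (not the actual AutoMap algorithm; the sample column names are made up):

```python
import difflib

# Headers from a sample student file vs. target fields from the map.
source_columns = ["first_name", "last_name", "DOB", "student_major"]
target_fields = ["FirstName", "LastName", "DateOfBirth", "Major"]

def normalize(name):
    """Lower-case and strip separators so 'first_name' and 'FirstName' compare cleanly."""
    return name.lower().replace("_", "").replace(" ", "")

def best_source_for(target, candidates):
    """Pick the source column whose normalized name is most similar to the target field."""
    scored = [
        (difflib.SequenceMatcher(None, normalize(target), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, column = max(scored)
    return column if score >= 0.4 else None  # threshold is arbitrary for this sketch

for target in target_fields:
    print(f"{target} <- {best_source_for(target, source_columns)}")
```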
This is the ETL name or "Transform Group Name". It's a unique specifier that tells the system the name of this map.
This is the column name of the source data (the data that is to be processed by Transforms). It should identically match the input data column name.
This is the name of the column in the final transformed file. This should match what your database, target system, or specification has for the name.
This tells the map at what position (left to right, 0-n) this column should appear.
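Put together, a couple of rows of the mapping document might look like the sketch below. Only TargetFieldName, Ordinal, TransformType, and SourceField are names taken from this page; the other keys and values are illustrative placeholders, so use whatever your generated map actually contains.

```python
# A hypothetical slice of a student mapping document, represented as rows.
student_map_rows = [
    {"TransformGroupName": "Student", "SourceField": "first_name",
     "TargetFieldName": "FirstName", "Ordinal": 0, "TransformType": "COPY"},
    {"TransformGroupName": "Student", "SourceField": "last_name",
     "TargetFieldName": "LastName", "Ordinal": 1, "TransformType": "COPY"},
]
```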
The following types can be used in the TransformType column:
TransformType | Description |
---|---|
COPY | This copies the data from the assigned source field. It's still held to the length and precision requirements defined. |
EMPTY | An empty value; no operation is taken on this field. |
DEFAULT | This assigns the default value defined in the map (described below) to every row. |
MAP | A special type that, when a lookup file is supplied, uses the MapKey (described below) to translate source values into target values. |
SEQ | A sequence of numbers, starting from 1. Because ordering is done post-process, you can supply a list of TargetFields in the SourceField column to sort by (see below). |
RANK | A dense rank sequence of numbers, starting from 1. Because ordering is done post-process, you can supply a list of TargetFields in the SourceField column to sort by (see below). |
GUID | Every row will have a newly generated GUID in it. |
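To make the simpler types concrete, here's a rough sketch of what each one does to a single output cell (illustrative only, not the engine's actual implementation):

```python
import uuid

def transform_cell(transform_type, source_value=None, default_value=None):
    """Illustrative-only behavior of the row-level TransformTypes for a single cell."""
    if transform_type == "COPY":
        return source_value        # copy straight from the assigned source field
    if transform_type == "EMPTY":
        return ""                  # no operation; the cell is left empty
    if transform_type == "DEFAULT":
        return default_value       # use the default value defined in the map
    if transform_type == "GUID":
        return str(uuid.uuid4())   # a newly generated GUID for every row
    raise ValueError(f"{transform_type} is handled elsewhere (MAP, SEQ, RANK)")

print(transform_cell("COPY", source_value="Smith"))    # -> Smith
print(transform_cell("DEFAULT", default_value="N/A"))  # -> N/A
```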
The SEQ and RANK types allow you to sort the file while supplying the ordering. This is a post-process step, so for these types the SourceField column of the map holds a list of TargetFields to sort by.
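As a rough sketch of that behavior (the sort key and data below are made up): the output is first ordered by the listed TargetFields, then SEQ numbers the rows 1..n while RANK gives rows with equal sort keys the same dense rank.

```python
# Hypothetical output rows, ordered by the TargetFields listed in the
# map's SourceField column for the SEQ/RANK field (here: LastName).
rows = [
    {"LastName": "Smith"},
    {"LastName": "Adams"},
    {"LastName": "Smith"},
]

ordered = sorted(rows, key=lambda r: r["LastName"])

# SEQ: a plain 1..n sequence after sorting.
for seq, row in enumerate(ordered, start=1):
    row["Seq"] = seq

# RANK: a dense rank -- equal sort keys share the same number.
rank, previous_key = 0, object()
for row in ordered:
    if row["LastName"] != previous_key:
        rank += 1
        previous_key = row["LastName"]
    row["Rank"] = rank

print(ordered)
# [{'LastName': 'Adams', 'Seq': 1, 'Rank': 1},
#  {'LastName': 'Smith', 'Seq': 2, 'Rank': 2},
#  {'LastName': 'Smith', 'Seq': 3, 'Rank': 2}]
```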
This key is required when the TransformType is set to MAP. It tells the system which subset of lookup values to pass your source data through. The example lookups file in the table below shows exactly what this MapKey is tied to; a small sketch of how the subset is applied follows the table:
MapKey | SourceField | TargetField |
---|---|---|
StateCode | AZ | Arizona |
StateCode | NC | North Carolina |
PropertyCode | A12 | AB001200 |
PropertyCode | Z19 | AB001900 |
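As a hedged sketch (not the engine itself), a MAP transform with MapKey = StateCode would only consider the StateCode rows of that lookup file when translating a source value:

```python
import csv
from io import StringIO

# The example lookups file from the table above.
lookups_csv = """MapKey,SourceField,TargetField
StateCode,AZ,Arizona
StateCode,NC,North Carolina
PropertyCode,A12,AB001200
PropertyCode,Z19,AB001900
"""

def build_lookup(lookup_rows, map_key):
    """Keep only the rows for the given MapKey and index them by SourceField."""
    return {
        row["SourceField"]: row["TargetField"]
        for row in lookup_rows
        if row["MapKey"] == map_key
    }

rows = list(csv.DictReader(StringIO(lookups_csv)))
state_lookup = build_lookup(rows, "StateCode")
print(state_lookup.get("NC"))   # -> North Carolina
print(state_lookup.get("A12"))  # -> None (belongs to the PropertyCode subset)
```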
The default value is used when the TransformType is set to DEFAULT. No transformation is applied to this field; the value is simply "copied" over.
This is the datatype used for this field. It uses MS-SQL data types to define the input types.
This is the max length for strings, and precision for decimals.
This only applies when a data type with scale is used, like a decimal. This is the scale; for example, a decimal(18,2) has a length (precision) of 18 and a scale of 2.
This field is only set to true when we specify it's required for a specialized data quality check or transformation process. Behind the scenes, it creates a copy of that input column and allows the transformation engine to access the original, unmodified cell data during the process.
Most of the time this field should be ignored and left set to "no".
This field indicates whether the field is active and should be used. If set to false, the field will be ignored.
Transformations occur whether fields are missing or not. What this changes, however, is how the transformation report indicates issues and their severity: missing "required" columns generate a higher-severity line in the transformation report than "non-required" columns.
When loading into a system like Yardi Voyager, you'll need the CSV Topline above the header rows. This column indicates what that line should be if the CSV topline is written.
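For illustration only (the actual topline content comes from your target system's spec, and the values below are hypothetical), writing such a file just means emitting the topline before the header row:

```python
import csv

# Hypothetical topline, header, and data; the real values come from your map and target spec.
topline = ["ETL_Student", "", ""]
header = ["FirstName", "LastName", "Major"]
rows = [["Ada", "Lovelace", "Mathematics"]]

with open("students_export.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(topline)   # CSV topline written above the header row
    writer.writerow(header)
    writer.writerows(rows)
```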
If the file has a specific name to be assigned on export, you can supply that here. The output file will be given this name if it exists.
This is a unique ID that specifically identifies the map. It can be used during the auto-identification process when it is unclear what type of input file was given to the system.