The Mapping Document
Last updated
Mapping is an integral part of any data transformation process. It allows you to specify a target format with data restrictions like length, precision, requirements, and more.
The mapping document also allows you to specify certain rules on how that target field is created. Is it simply a Copy from the source? Is it Defaulted to a value? Do you Map it to another lookup value?
Let's take a look at the mapping document and how easy it is to get started.
Let's ask Transforms to generate a map of a student for us.
Here's the map, unmodified, from the site:
Using the student data that we have, we need to make some tweaks to this map:
Let's add the Major column, as it's missing
We'd like to null out (or make empty) the DateOfBirth, as our target doesn't need that information.
To do that:
Copy the LastName or FirstName row, as it's close to the data type and information we need.
Change the TargetFieldName to Major.
Change the Ordinal to 6.
Change the TransformType of DateOfBirth to EMPTY.
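As a rough sketch, the tweaks above could be applied programmatically if you treat map rows as records. Only TargetFieldName, Ordinal (6 for Major), and TransformType come from this document; the starting rows and their ordinals below are purely illustrative:

```python
import copy

# Illustrative starting map rows (the real map has more columns and
# different ordinals; these three are placeholders).
map_rows = [
    {"TargetFieldName": "FirstName", "Ordinal": 1, "TransformType": "COPY"},
    {"TargetFieldName": "LastName", "Ordinal": 2, "TransformType": "COPY"},
    {"TargetFieldName": "DateOfBirth", "Ordinal": 3, "TransformType": "COPY"},
]

# 1. Copy the LastName row and repurpose it as Major at Ordinal 6.
major = copy.deepcopy(next(r for r in map_rows if r["TargetFieldName"] == "LastName"))
major.update(TargetFieldName="Major", Ordinal=6)
map_rows.append(major)

# 2. Null out DateOfBirth by switching its TransformType to EMPTY.
next(r for r in map_rows if r["TargetFieldName"] == "DateOfBirth")["TransformType"] = "EMPTY"
```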
And just like that, we're ready to produce a valid Student file! Let's use Transforms again, but this time let's take our map AND our sample data from above and let Transforms auto-assign the source fields to the destination fields.
Head over to Prepare -> AutoMap. Simply drop in your data file and your map, then re-download the mapping document.
And just like that, we've got ourselves a fully ready-to-use map! In this case, all of the source fields perfectly matched the target fields, so it would have been easy to copy those over by hand. In many live scenarios, however, that won't be the case. The AutoMap process uses fuzzy logic to intelligently assign source columns from your input data by matching them against the TargetFieldName values in the map.
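The fuzzy-matching idea can be sketched in a few lines. This illustrates the concept only; it is not Transforms' actual algorithm, and the normalization rules here are assumptions:

```python
import difflib

def _norm(name):
    """Normalize a column name so e.g. 'first_name' and 'FirstName' compare equal."""
    return name.lower().replace("_", "").replace(" ", "")

def auto_map(source_columns, target_field_names, cutoff=0.6):
    """Fuzzy-assign a source column to each TargetFieldName (concept sketch)."""
    normalized = {_norm(c): c for c in source_columns}
    assignments = {}
    for target in target_field_names:
        hits = difflib.get_close_matches(_norm(target), normalized, n=1, cutoff=cutoff)
        assignments[target] = normalized[hits[0]] if hits else None
    return assignments
```

A target with no sufficiently close source column is left unassigned (None), which mirrors a map you would then finish by hand.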
This is the ETL name or "Transform Group Name". It's a unique specifier to tell the system what the name of this map is.
This is the column name of the source data (the data that is to be processed by Transforms). It should identically match the input data column name.
This is the name of the column in the final transformed file. This should match what your database, target system, or specification has for the name.
This tells the map at what location (left to right, 0-n) this column should be.
The following types can be used in the TransformType column:
COPY
This copies the data from the assigned source field. It's still held to the length and precision requirements defined.
EMPTY
This is an empty value, specifying that no operation is taken on this field.
DEFAULT
This assigns the value of the "DefaultValue" column in the map.
MAP
This is a special type that, when a lookup file is supplied, uses the "MapKey" column to perform a lookup of the source field, through the defined "MapKey", to a new value.
GUID
Every row will have a newly generated GUID in it.
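A minimal sketch of how the five TransformTypes behave, assuming map rules and source rows are dictionaries. This is an illustration only; the real engine also enforces the length, precision, and data type rules described below, which are omitted here:

```python
import uuid

def transform_cell(rule, row, lookups=None):
    """Apply one map rule to one source row (concept sketch of the TransformTypes)."""
    t = rule["TransformType"]
    if t == "COPY":
        # Copy the value from the assigned source field.
        return row.get(rule["SourceFieldName"], "")
    if t == "EMPTY":
        # No operation; the output cell is empty.
        return ""
    if t == "DEFAULT":
        # Use the map's DefaultValue column verbatim.
        return rule["DefaultValue"]
    if t == "MAP":
        # Look the source value up within the rule's MapKey group.
        return (lookups or {}).get((rule["MapKey"], row.get(rule["SourceFieldName"])))
    if t == "GUID":
        # Fresh GUID for every row.
        return str(uuid.uuid4())
    raise ValueError(f"unknown TransformType: {t}")
```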
This key is required when the TransformType is set to MAP. This tells the system which subset of lookup values to pass your source data through.
Taking a look at the example lookups file in the table below shows exactly what this MapKey is tied to:
MapKey        Lookup Key  Lookup Value
StateCode     AZ          Arizona
StateCode     NC          North Carolina
PropertyCode  A12         AB001200
PropertyCode  Z19         AB001900
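Conceptually, the lookups file behaves like a dictionary keyed by the pair (MapKey, source value), which is why a single file can hold both the StateCode and PropertyCode groups without collisions. A sketch, using the example rows above:

```python
def load_lookups(rows):
    """Group lookup rows by MapKey so the same source value can mean
    different things under different keys (concept sketch)."""
    table = {}
    for map_key, src, dst in rows:
        table[(map_key, src)] = dst
    return table

lookups = load_lookups([
    ("StateCode", "AZ", "Arizona"),
    ("StateCode", "NC", "North Carolina"),
    ("PropertyCode", "A12", "AB001200"),
    ("PropertyCode", "Z19", "AB001900"),
])
```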
The default value is used when the TransformType is set to DEFAULT. No transformation is applied to this field; the value is simply "copied" over.
This is the datatype used for this field. It uses MS-SQL data types to define the input types.
This is the max length for strings, and precision for decimals.
This only applies when a scaled data type is used, like a decimal. This is the scale.
This field is only set to true when we specify that it's required for a specialized data quality check or transformation process. Behind the scenes, it creates a copy of that input column and allows the transformation engine to access the original, unmodified cell data during the process.
Most of the time this field should be ignored and kept at "no".
This field indicates whether the column is active. If set to false, the field will be ignored.
Transformations occur whether or not fields are missing. What this changes, however, is how the transformation report indicates issues and their severity. Missing "required" columns generate a higher-severity line in the transformation report than "non-required" columns.
When loading into a system like Yardi Voyager, you'll need the CSV Topline above the header rows. This column indicates what that line should be if the CSV topline is written.
If the file has a specific name to be assigned on export, you can supply that here. The output file will be given this name if it exists.
This is a unique ID that should specifically identify the map. It can be used during the auto identification process when it is unclear what type of input file was given to the system.
Uses an Fx expression to calculate the value of a column. This is done AFTER all other transformations/conversions (as a post-process).
To see the functions and available features, check out the link below
Don't forget to check out the Additional Transforms Fx Features
Uses an Fx expression to filter rows. The expression MUST return a boolean, meaning a comparison operation. Any row for which it returns true is eliminated.
To see the functions and available features, check out the link below
Don't forget to check out the Additional Transforms Fx Features
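Note the inverted semantics: rows where the expression is true are removed, not kept. A sketch of that behavior, using a hypothetical amount comparison as the predicate:

```python
def apply_filter(rows, predicate):
    """Filter-expression semantics sketch: a row is ELIMINATED when the
    expression returns True, and kept otherwise."""
    return [row for row in rows if not predicate(row)]

rows = [{"Amount": 30}, {"Amount": 75}]
# Removes rows whose Amount exceeds 50, keeping the rest.
kept = apply_filter(rows, lambda r: r["Amount"] > 50)
```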
You can order the resulting file with a multi-sort by using this column.
Simply supply the order and the direction as such:
FirstName  2.asc
LastName   1.desc
OrgID      4.asc
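One way to read the "2.asc" notation: the number is the sort priority (1 sorts first) and the suffix is the direction. A sketch of that multi-sort, assuming rows are dictionaries and relying on Python's stable sort:

```python
def multi_sort(rows, spec):
    """Multi-column sort sketch: spec maps column -> 'order.direction',
    e.g. {'LastName': '1.desc'}; lower order numbers sort first."""
    keys = sorted(spec.items(), key=lambda kv: int(kv[1].split(".")[0]))
    # With a stable sort, applying the lowest-priority key first and the
    # highest-priority key last yields the combined multi-sort.
    for column, rule in reversed(keys):
        descending = rule.endswith(".desc")
        rows = sorted(rows, key=lambda r: r[column], reverse=descending)
    return rows
```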
Transforms enables additional F(x) features that make providing F(x) expressions inside a map easier.
Aggregate functions allow you to aggregate values, and you may also optionally supply order by, partition by, and rows between clauses.
Here are a few examples showing how to use the partition, order, and rows clauses. They are very similar, if not identical, to MSSQL's syntax.
Here's the list of aggregate functions:
sum
min
max
avg
count
first_value
last_value
row_number
rank
dense_rank
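As an illustration of what a partition by clause does to an aggregate like sum, here is a rough Python equivalent. This is only a sketch of the windowing concept, not Transforms' own implementation:

```python
from itertools import groupby

def partitioned_sum(rows, partition_by, value_col):
    """Sketch of sum(...) with a partition by clause: every row receives
    the total for its partition, like a windowed aggregate."""
    out = []
    # groupby needs the rows sorted by the partition column first.
    rows = sorted(rows, key=lambda r: r[partition_by])
    for _, group in groupby(rows, key=lambda r: r[partition_by]):
        group = list(group)
        total = sum(r[value_col] for r in group)
        out.extend({**r, "sum": total} for r in group)
    return out
```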
trannum()
- Used for Yardi-type transaction numbering. The columns must include Book, PostMonth, and Person.
lookup(mapKey, value)
- The same function to lookup a value from an associated lookup values map. The mapKey is the group and the value is the key.
[$this]
- This selects the current TargetName. If the expression is on a field called Amount, you could use: [$this] > 50 as a way to filter amounts greater than 50.
[!<col>]
- If you use an exclamation mark, you can actually specify a SourceField column to use as the input to a function or calculation. This is incredibly powerful, as you can reference unmapped columns before any modification, mapping, or conversion has taken place. Example: [!firstName].
[$src]
- Source is the equivalent of selecting the source column from the current row. Just like [$this], but for the [!sourcecol].
This generation method allows us to take an idea or concept for a map as a starting place and let AI generate the map for us. This is one of many ways to create a map without having one.