Most of the SSIS data flow components allow customers to log the rows with errors to a dedicated error output, which can help customers to investigate and diagnose the root cause of the error. Error output has two 4-byte int columns (ErrorCode and ErrorColumn), which indicate the error code and the id of the error column.
For almost ten years, customers have been asking for improvement of the ErrorColumn in the error output. There has not been any easy way to get the column name according to the int column ID stored in ErrorColumn, which makes ErrorColumn useless in most cases. In this blog, I'll focus on the improvements that we have made to solve the ErrorColumn issue. We now allow customers to easily get the error column name in SSIS designer and execution log. We also provide API so that customers can programmatically get the column name according to an int column ID.
Improvement on SSIS Designer
Advanced Editor
In SSIS data flow, there is a concept named "lineage ID", which is associated with an input or output column. In general, the lineage ID of a column represents the ID of the original upstream output column, here I just call this upstream output column “lineage column”. The lineage ID allows customers to tell where the data of current column comes from. Actually, most of the components will put the lineage id of the error column to its error output. However, it’s very hard for customers to identify or locate the exact lineage column name based on an int lineage ID.
If you open the Advanced Editor of a data flow component, you can see column properties such as ID, IdentificationString, LineageID in the Input and Output Properties tab, where the property ID is associated with the property IdentificationString, a human readable string that uniquely identifies an input or output column. An example of identification string looks like: “OLE DB Source.Outputs[OLE DB Source Output].Columns[UserID]”. However, currently there is not a similar property that is associated with the property LineageID, which makes it hardly to know the upstream lineage column of a column.
To address the problem, a new property LineageIdentificationString has been introduced. According to this property, developers can easily identify the lineage column of any input or output columns, which will be useful in pipeline designing.
Data Viewer
Data Viewer allows customers to see the data rows that come from the upstream output and is often used for debugging errors in SSIS. And if it is enabled on an error output, in addition to all regular columns including ErrorCode and ErrorColumn, there will be another virtual column ErrorCode - Description, which contains the error message translated from the value of the column ErrorCode. However, previously there is not a similar virtual column that can translate the value of ErrorColumn to the identification string of the error column.
With the error column improvements, we have added a new virtual column ErrorColumn - Description in Data Viewer. This column contains the identification string of the error column. Thanks to the new introduced virtual column, now customers can know not only “why” the error happens, but also “where” the error happens!
Improvement on Execution Log
Although the improvement on Data Viewer can help customers to know the information of error columns in designer, we still need another story on diagnosing pipeline errors after the package has been deployed to SQL Server, because there is no designer to help at that time. Obviously the most common way for diagnosing on SQL Server is leveraging execution log.
Our error column improvement now allows SSIS runtime to write the int column ID to identification string map of all pipelines in XML format to the package level event DiagnosticEx, so that customer can find out the exact error column information by looking up the XML document. Here is a sample of the XML document:
The XML document above can be logged via log providers. To achieve that, you need to enable DiagnosticEx event in log configuration window of SSIS designer. However, currently there is a known limitation on the built-in SQL Server log provider: the content of the XML document will be truncated to 2048 bytes, because the type of the column message for SQL Server log provider is nvarchar(2048) instead of nvarchar(max). So you’d better NOT choose SQL Server log provider if you want to see the entire XML content.
In addition to log providers, if the package has been deployed to SSIS catalog, the XML document above can also be written to SSISDB. You need to set appropriate logging level to make SSIS catalog log the XML document. There are two choices:
- Simply use the built-in logging level Verbose;
- Create a new customized logging level, and at least enable DiagnosticEx event.
Please note that customized logging level is also a new feature introduced in SQL 2016 CTP 2.3. Later there should be another blog talking about more details about this feature.
Improvement on Public API
We also provide related public APIs so that customers can build their own related solutions on top of the APIs. First, a new method is added to the interface IDTSComponentMetaData100 to get the indetification string based on the int column ID:
string GetIdentificationStringByID(int lID);
It can be accessed both in the component base class PipelineComponent and Script Component, so that both 3rd components developer and SSIS package developer can leverage this method to get the identification string according to a column ID.
In addition, a new read only string property LineageIdentificationString is added in the interfaces IDTSInputColumn100 and IDTSOutputColumn100. This property is also exposed in Advanced Editor.
Below is an example to showcase how to leverage the new method GetIdentificationStringByID in Script Component to get the identification string of the error column.
- Before creating the new Script Component, configure an upstream component in the data flow to redirect rows to its error output when an error or truncation occurs. For testing purposes, you may want to configure a component in a manner that ensures that errors will occur. For example, by configuring a Lookup transformation between two tables where the lookup will fail.
- Add a new Script Component to the Data Flow designer surface and configure it as a transformation.
- Connect the error output from the upstream component to the new Script Component.
- In the Script Transformation Editor, on the Input Columns page, select the ErrorColumn column.
- On the Inputs and Outputs page, add a new output column of type String named “ErrorColumnDescription”. Increase the default length of the new column to 255 to support long identification string.
- Open the Script Transformation Editor, and on the Script page, for the ScriptLanguage property, select the script language.
- Click Edit Script to open the Microsoft Visual Studio Tools for Applications (VSTA) IDE and add the sample code shown below.
- Close VSTA.
- Close the Script Transformation Editor.
- Attach the output of the Script Component to a suitable destination. A Flat File destination is the easiest to configure for ad hoc testing.
- Run the package.
public class ScriptMain : UserComponent
{
……
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Row.ErrorColumnDescription =
this.ComponentMetaData.GetIdentificationStringByID(Row.ErrorColumn);
}
}
Conclusion
ErrorColumn is an important information in pipeline error diagnostic. However, in previous versions of SSIS, customers don’t have any easy way to make use of such information. This blog has introduced several improvements we have made on ErrorColumn, which have covered design time and runtime diagnostic with ErrorColumn, and we have also provided public APIs to help customers easily implement their own scenarios related to ErrorColumn.