IBM Information Server 8.X (DataStage): Parallel Transformer Stage

DataStage: What is the Transformer Stage?

DataStage provides multiple stages for extracting, transforming, and loading data into data warehouses or data marts. The stages are grouped into categories such as General, Database, Development and Debugging, Archive, Processing, and Real Time. Each stage is also classified as either an active or a passive stage.

The Transformer stage is a processing stage.

It lets you create transformations to apply to your data based on given business rules.

It can have a single input and any number of outputs. It can also have a reject link that takes rows that were not written to any of the output links because of a write error, an expression evaluation error, or a null-handling rejection.

The Transformer stage editor is divided into two areas:

1. Links area

  • Define column derivations
  • Define stage variables

2. Metadata area

  • Define column metadata for input and output links

Output Links:

  1. Pass some data straight through the Transformer stage unaltered.
  2. Modify the derivation by entering a transformation expression.
  3. Specify constraints that operate on entire output links.
  4. You can also specify an otherwise link, which is an output link that carries all data not output on the other links, that is, rows that have not met the criteria.

A constraint is an expression that specifies the criteria that data must meet before it can be passed to the output link.
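The routing behavior described above can be modeled outside DataStage. The following is a minimal Python sketch, not DataStage code: real constraints are written in the Transformer expression editor, and the names `route_rows`, `small`, `large`, and `other` are hypothetical. It assumes links are evaluated in order and that an otherwise link is marked by having no constraint.

```python
def route_rows(rows, links):
    """Model constraint-based routing in a Transformer-like stage.

    `links` is an ordered list of (link_name, constraint) pairs. A constraint
    of None marks an otherwise link, which receives a row only if no earlier
    link in the list accepted it.
    """
    outputs = {name: [] for name, _ in links}
    for row in rows:
        accepted_earlier = False
        for name, constraint in links:
            if constraint is None:
                # Otherwise link: catch rows that met no previous constraint.
                if not accepted_earlier:
                    outputs[name].append(row)
            elif constraint(row):
                # A row may be written to every link whose constraint it meets.
                outputs[name].append(row)
                accepted_earlier = True
    return outputs


# Hypothetical sample data and link definitions.
rows = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 500},
    {"id": 3, "amount": -5},
]
links = [
    ("small", lambda r: 0 <= r["amount"] < 100),
    ("large", lambda r: r["amount"] >= 100),
    ("other", None),  # otherwise link
]
routed = route_rows(rows, links)
```

Here row 1 lands on `small`, row 2 on `large`, and row 3, which meets neither constraint, is carried by the otherwise link `other`.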

Reject link:

You can also specify a separate link that takes rows that were not written to any other link because of a write error or an expression evaluation error. You set this up outside the stage by adding a link and converting it to a reject link. All records that are discarded because of null handling are also written to the reject link.
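As a rough illustration of the reject idea, the sketch below applies a derivation to each row and diverts rows whose evaluation fails. It is a Python model only: Python exceptions stand in for DataStage expression evaluation errors and null-handling rejections, and DataStage's actual treatment of a specific error (for example, division by zero) may differ. The names `apply_derivations` and `unit_price` are hypothetical.

```python
def apply_derivations(rows, derivations):
    """Apply output derivations; rows whose derivation fails go to rejects."""
    output, rejects = [], []
    for row in rows:
        try:
            out_row = {col: fn(row) for col, fn in derivations.items()}
        except (TypeError, ValueError, ZeroDivisionError, KeyError):
            # Stand-in for an evaluation error or an unhandled null.
            rejects.append(row)
        else:
            output.append(out_row)
    return output, rejects


# Hypothetical sample data: the second row divides by zero, the third has a null.
rows = [
    {"qty": 4, "total": 20.0},
    {"qty": 0, "total": 5.0},
    {"qty": None, "total": 9.0},
]
derivations = {"unit_price": lambda r: r["total"] / r["qty"]}
output, rejects = apply_derivations(rows, derivations)
```

Only the first row produces an output row; the other two end up in `rejects` with their original input values intact.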

If runtime column propagation is enabled, no metadata is required for the outputs.

The find and replace facility lets you find a particular string within an expression, search for column names, or locate empty expressions.

Defining output column derivations:

  • Use drag and drop or copy and paste to copy a column from the input link to an output link.
  • Use the automatic column matching facility to set output column derivations from matching input columns.

Automatic column matching

  1. From the drop-down list, choose the output link whose columns you want to match with the input link.
  2. In the Match type area, choose one of the following:
    • Location Match – Sets the output column derivations to the input link columns in the equivalent positions.
    • Name Match – Sets the output column derivations to the input link columns with matching names.
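The difference between the two match types can be sketched in a few lines of Python. This is an illustrative model rather than what the editor runs internally: the function names and column names are hypothetical, and exact, case-sensitive name comparison is an assumption.

```python
def location_match(input_cols, output_cols):
    """Pair each output column with the input column at the same position."""
    return {out: inp for out, inp in zip(output_cols, input_cols)}


def name_match(input_cols, output_cols):
    """Pair each output column with the input column of the same name."""
    available = set(input_cols)
    # Output columns with no same-named input column are left underived.
    return {out: out for out in output_cols if out in available}


# Hypothetical column lists for one input link and one output link.
input_cols = ["cust_id", "name", "city"]
output_cols = ["id", "name", "region"]

by_location = location_match(input_cols, output_cols)
by_name = name_match(input_cols, output_cols)
```

With these lists, location matching derives all three output columns by position (`id` from `cust_id`, `region` from `city`), while name matching derives only `name`.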

Constraints and Otherwise/Log

  • Click the Otherwise/Log field so that a check mark appears, and leave the Constraint field blank. The link then catches rows that have not met the constraints on any of the previous output links.
  • Checking the Otherwise/Log field and also defining a constraint records the number of rows written to that link (that is, rows that satisfy the constraint) in the job log as a warning message.

In addition, you can define local stage variables, use system variables, and set partitioning methods and sort operations.
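One common use of a stage variable is to carry a value from one row to the next, for example to detect when a key changes in sorted input. The Python sketch below models that pattern under stated assumptions: `prev_key` plays the role of a stage variable, the input is assumed to be already sorted on the key, and the names `flag_group_starts` and `is_first_in_group` are hypothetical.

```python
_UNSET = object()  # sentinel: no previous row has been seen yet


def flag_group_starts(rows, key):
    """Mark the first row of each run of equal key values."""
    prev_key = _UNSET  # behaves like a stage variable kept between rows
    flagged = []
    for row in rows:
        is_first = row[key] != prev_key
        flagged.append({**row, "is_first_in_group": is_first})
        prev_key = row[key]  # update the variable for the next row
    return flagged


# Hypothetical input, assumed sorted on "cust".
rows = [
    {"cust": "A", "amount": 10},
    {"cust": "A", "amount": 20},
    {"cust": "B", "amount": 5},
]
flagged = flag_group_starts(rows, "cust")
```

The flag is true for the first `A` row and the `B` row, and false for the second `A` row. The result depends on the sort order, which is why the sort and partitioning settings matter for this kind of logic.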
