Configure Data Source Tables

 

Configure Data Source Tables

Expand the <//Solution : name / Data Sources / Data Source : namefolder and select the Data Source Table to be configured.
Select the table to present the editor used to edit Data Source Tables.
The main goal here is to make a mapping from source system data into data warehouse data source tables.

The Procedure   
Right click any Data Source or Data Source Folder node. Click New Data Source Table.
Define the fields presented then click Finish.
The next step is to define Column Mapping.
Define and configure the Source Columns and mapping to the corresponding Data Source Table-Column, then click Save to save the Column Mapping definitions.
The resulting Data Source Table is previewed by clicking on the Preview Table button.
The Data Warehouse is updated by right clicking a Data Source Table and clicking on Process. Alternatively select the Data Source Table and click the Process button. 
View the Data Source Table in the Data Warehouse by right clicking a Data Source Table and clicking on Browse. Alternatively select the Data Source Table and hit the F5 key. 

Data Source Table - Right Click Options

New Script

Click to open the form that allows for definition of a script.

Save As Template

Data Source Tables can be saved to the Template library for use in future projects. Select a Dimension Table and click. The displayed dialog gives the user the opportunity to either Save the Table as New or to Overwrite an existing Table.

Copy

A Data Source Table can be copied to clipboard by clicking.

Delete

Click to delete the selected Data Source Table. An "Are you sure?" dialog requiring a confirmation is displayed before the delete is performed.

Data Source Table - Command Buttons

Save

Click to save the information defined in the Details and the Column Mapping panes to the Solution database.

Reset

Click to undo any edits and or deletes and redisplays the original configuration as of the most recent Save.

Process / Edit Data

The "Process" button will load source data using the previously used load option.
The "Edit Data" button is visible if the table is in an editable state. The user can then edit the data warehouse table data directly.

Tasks and Data

Data Warehouse Table Management.

The "Tasks and Data" button will allow for specifying reload options for an ad hoc reload or perform delete operations.

Load Options:
- Reload Days (Number of days, using Transdate column)
- Reload Months (Number of months, using Transdate column) 
- Reload From Date (using Transdate column)
- Reload From and To Date (using Transdate column) 
- Custom Reload (using Source Filter and Delete Filter)
- Incremental Reload (MERGE using Primary Key)
- Reload All 
- Auto Update (optionally utilizing auto update table or auto update filter) 

Manual updates to the Data Warehouse are performed using the command buttons in the Tasks and Data form. The following commands are available:

Transfer Data (Generate if view)

Click to retrieve data from the Data Source and update the Data Warehouse.

Auto Update

Click to automatically update necessary data.

Browse Data

Click to view the data in the Data Warehouse.

Delete and Transfer Data (Delete and Generate if view)

Click to delete the data from the Data Warehouse before retrieving data from the Data Source and update the Data Warehouse.

Delete

Click to delete the data from the Data Warehouse.

Save SSIS Package (SSIS only)

Click to Save the SSIS package to file.

Validate and Save SSIS Package (SSIS only)

Click to Validate and then Save the SSIS package to the SQL Server.

Delete SSIS Package (SSIS only)

Click to Delete the SSIS package on the SQL Server.

Abort

Click to stop the execution of a Transfer.


Main Settings

Name

Name of the Data Source Table

Type

Type of Data Source Table

Mode Data Source Table modes include Default, Linked data and Staging.
- Default mode require mappings to defined data warehouse column names and data types (Name Definitions)
- Linked data mode indicates loosely linked data the user should be aware of. Source queries or scripts can be defined to browse source data. Also an integrated script can be added to allow for interaction with the linked data.
- Staging mode allows for transferring a staging table to the data warehouse without mapping to pre-defined data warehouse column names and data types.

Tag type (Tag Value tables only)

All Values (Automatically fetch all values)
Multiple Values (Allow user to select from multiple Values)
Number List (Allow user to select from multiple values to populate a number list)
Single Value (Allow user to select a single value)
String List (Allow user to select from multiple values to populate a string list)

Get Table Row Counts (Table Metadata tables only)

Turns on table by table row count (SELECT COUNT(*) FROM tableprefix.tablename is executed and the resulting row count is inserted into the MetadataTableRowCount field)

Get View Row Counts (Table Metadata tables only)

Turns on view by view row count (SELECT COUNT(*) FROM tableprefix.viewname is executed and the resulting row count is inserted into the MetadataTableRowCount field)

Get Primary Key Info (Field Metadata tables only)

Turns on gathering of primary key information (the SQL defined in the Primary Key Info field is executed and the resulting primary key information is inserted into the MetadataFieldKeyInfo field)

Add source tables as staging tables (Processed Table Metadata tables only)

Will add selected source tables as staging tables.

Add source tables as linked data tables (Processed Table Metadata tables only)

Will add selected source tables as linked data tables.

Usage (Tag Value tables only)

Will perform a scan in all data source tables within the data source and display a list of all contexts where it is used.

Description

Optional

Comments

Optional


Source Mappings

Source settings - Command Buttons

Edit Source Query Click to view or define a SQL query or a Power Shell script.

View Source Table

Click to display the Data Source Table and Column names and the results of the Source Query. Using drag and drop, column headings can be copied into the Source Column cells of the Column Mapping table.

View File (File and Excel only)

Click to view the file chosen in the Table Name pull down list.

Source Settings

Source location

Select from the pull down list. Source is set to either External or Data Warehouse depending on whether the data will stem from the source system or existing data warehouse data.

Source data

Table or view - Select a table or view from the "Table name" pull down list.
SQL source query - Click on the "Edit Source Query" button to define a SQL query.
Edit table - Click on the "Edit Data" button in the topmost toolbar to manually update the data. If the table has not yet been transferred or created it will be created on-the-fly when the "Edit Data" button is pushed.
PoSh source script - Click on the "Edit Source Query" button to view or edit a PowerShell source script.

Table name / SQL expression / File name

Select the Source System table name from the pull down list. Alternatively specify a SQL Expression.

Table filter

Define a filter on the rows in the source table, i.e. the Where Clause. For example, the Table Row Filter: SHIPMTHD = ‘GROUND’ will return only the rows with the GROUND shipping method.

Apply main filter

Indicate whether the main filter applies to this data source table or not. (Main filter is set on Data Source page)

Source settings - Edit Buttons

Table Name… (SQL Expression)

Click the button to display the Edit SQL Expression Editor. See SQL Expression Editor below for details.

Table Filter…

Click the button to display the Edit Filter Editor.

SQL Expression Editor

The SQL Expression Editor is accessed by clicking the button on the left side of the Table Name pull down list.

The SQL Expression Editor is a very powerful tool for browsing the Data Source, defining simple or complex SQL Expressions, and mapping Data Source Tables-Columns to a Dimension Table Column.

The Editor has three (3) functional Panes. They are the Expression Pane, Preview Pane and Objects Pane.

Expression Pane

The Expression Pane displays the SQL Expression that retrieves information from the Data Source and updates the Dimension Table in the Data Warehouse. Expressions are automatically constructed by the system when one table is the target. When more complex expressions are required the user can create the expression in the editor, or copy and paste expressions from other editors. The Expression Pane includes the following commands:

Save and Close

Saves the expression and closes the editor display.

View Expression Data (SQL only)

Executes the SQL Expression and displays the results in the Preview Pane.

Execute (PowerShell only)

Executes the PowerShell script (often transferring to a staging table)

View Staging Table (PowerShell only)

View the resulting staging table.

View Replacement Values (PowerShell only)

View tag, friendly name and macro replacement values.

Comment (PowerShell only)

Comment selected lines.

Uncomment (PowerShell only)

Uncomment selected lines.

Timeout (PowerShell only)

Timeout for Execute or View operation.

Expression Builder / View SQL Expression (SQL)

Shows either the Expression Builder or the SQL Expression Editor.

Update Column Definitions (SQL Expression Builder only)

Refreshes the Column Definitions for all Tables used in the Expression Builder.

View Tag Values (SQL only, not Expression Builder)

Display the actual data for Tags and freindly names in the SQL Expression.

Show / Hide

Shows or hides the Editor panes. 

Preview Pane

The Preview Pane displays the SQL Expression results or displays Data Source Table-Column data when browsing. Column headers can be selected, dragged and dropped into the Column Mapping table.

Objects Pane

The Object Pane is used to browse the Tables and Columns of a Data Source. There are three (3) tabs in the Object Pane provides the following capabilities.

Source Tab 

The Source Tab provides the following functions:

Load

Load All Table Names
Load Table Names with Row Count > 0 -> Sort Alphabetically
Load Table Names with Row Count > 0 -> Sort by Row Count
Find Field Name...

View Table Data

The selected table in the Objects Pane is displayed in the Preview Pane. Allows the user to drag & drop a Data Source Table Name into the SQL Expressions; and / or into a Source Column cell in the Column Mapping pane.

Functions Tab 

Displays a library of commonly used SQL Functions and allows the user to drag & drop Functions into SQL Expressions.

Tags Tab

Displays a list of the defined Tags and allows the user to drag & drop Tags into SQL Expressions. 


Column Mapping

The Column Mapping pane defines the Data Source Columns and their mapping to the corresponding Source Table Columns. The Data Source Column Names are the Profitbase Standard Base Format, i.e. the metadata names that are used in the data warehouse and in the OLAP Cubes. 
All data source tables must have source data and column mappings. 
Depending on the data source table type there are different set of columns available. 

Fact Tables typically consist of DimensionID's and measures. Also more unusually fact tables can contain Drillthrough columns like Description, but this is generally not recommended because it can compromise performance.
(Wide) Dimension Tables typically consist of a DimensionID defined as key attribute and Dimension Properties and/or DimensionID's defined as additional dimension attributes. In addition to that it is not unusual to also define _Name and _Name_LANG columns as attribute names and attribute name translations.
Parent / Child Dimension Tables typically consist of a key column and a description column. Additionally it is not unusual to have DimensionID's as Parents and Dimension Properties as attributes. UnaryOperator and MemberFormula columns are also available for Parent/Child Dimension Tables.
Other Tables are typically used as mapping tables consisting of two DimensionID's indicating a from and to ID. 

Data Source Column

The Data Source Column is defined by first clicking, then clicking in the Data Source Column cell of the newly added row. Select the desired Data Source Column using the pull down menu. Hint: Type all or part of the Dimension Column name to quickly find items in the pull down list.

Primary Key / Index (Type if Parent / Child tables)

Define if column should be (part of) primary key or index.

Source Expression

There are several ways to enter the name of the Source Column. They are:
1. Type the name in the field. The format is typically: LTRIM(RTRIM(sourcecolumn))
2. Click to display the table of raw data via the View Source Table command button; then Drag & Drop the desired column header to the Source Column cell. The system will automatically format the field.
3. Select the Source tab in the Source Mapping pane to display the columns from the Table Name defined in the Details pane. Drag & Drop the desired Source Column name to the Source Column cell or position the cursor in the desired Source Column cell then double-click on the desired Source Column name. The system will automatically format the field appropriately.
4. Select the cell. Click Edit Column Expression to display the Edit Column Expression Editor.

Comments 

Optional

Column Mapping - Command Buttons

New

Click to insert a new row in the Column Mapping table.

Delete

Click to delete the selected row immediately. Click Save to apply the Delete(s) or click Reset to undo the Delete(s).

Map Source Columns

Click to automatically map unused source columns.

Multi (Single) Column PK/Index

Enable using a multi-column primary key and/or a multi-column non clustered index. After checking this option you must go to the "Primary Key / Index" drop down to set the sequence of the primary key and/or index columns.

Edit Column Expression

Click to display the Edit Column Expression form that allows for a more complex mapping expression rather than a simple data source column reference.

View SQL Expression

Click to display the SQL Expression that the system constructs to extract the data from the Data Source.

Preview Table / View Source Table / View Datawarehosue Table (F5)

Preview result of Column Mappings. The context sensitive F5 key show either source/mapping or dw data.

Show/Hide Source Mapping Panel

The Show Source Mapping Panel command is used to make the Source Mapping Panel visible. Use Hide Source Mapping Panel as a means to make more space available for the Column Mapping grid. 

View Local Tags (Source Mapping Panel -> Tags tab)

Click to view the custom tags specific to this Data Source.

View System Tags (Source Mapping Panel -> Tags tab)

Click to view the tags Profitbase Studio automatically maintain the values of. (soime may be local to the data source (i.e. DATABASE tag gets value from connection Database field) while other tags are solution wide (i.e. STARTDATE tag gets value from Solution Start Date field)

Reload Settings

Reload Modes

Transdate reload, Allow for date based reload options. User must specify a Date Column in the source table.
Incremental reload, Allow for incremental load of data. Data is inserted or updated based on a Primary Key. User must specify a Reload Source Filter.
Custom reload, Allow for a fully customized reload of data. User specifies Reload Source Filter and Delete Filter.

Reload Options

The following reload options are available:

Maintain ModifiedDate ModifiedDate will be maintained by Profitbase Studio on a row level. ModifiedDate column will then not be part of change check.
Maintain RecordID Maintain RecordID as identity column and Primary Key/Clustered Index. RecordID column will then not be part of change check.
Maintain RELOADVALUE Enable RELOADVALUE tag. You need to supply a SQL expression returning the updated value for the RELOADVALUE tag. This SQL Expression will be executed after each Transfer / Generate.

Data changed trigger

Enable a data changed trigger. A SQL expression can be specified to make sure the system picks up when the view or table has updated data.

Data warehouse filter Enable an "Insert Filter". This can be used in combination with the RELOADVALUE tag to avoid filtering on non-indexed source columns.

Skip DimensionID scan

No distinct table scan will be performed for this table.

Skip null and empty handling

Null or empty string values in DimensionID columns will not be updated with unknown member values (*NULL and *Empty for strings, -1 for numeric datatypes. 01/01/1900 for date datatypes and 0 for the bit data type).

Skip full reload of distinct tables

Full reload (drop/create/insert into) will not be executed on distinct tables if All (Full Reload) is used. Instead only new items will be added to distinct table.
This will usually give better performance, but downsides can be more fragmentation and seemingly "un-removable" unknown members.

 

 

Options

Storage

The Storage settings often affect performance while allowing compressed use of the allocated disk space.

Use page compression

The table will be stored using page compression. This can give better query performance. The Estimate Compression Savings function in Data Source context will present the best candidates for compression.

Use memory-optimized table

SQL 2016 supports memory-optimized tables. 
Durable
Non-durable
Index bucket count 

Use columnstore index

The table will be stored using column store indexing. This will in most cases give better query performance.
Clustered
Nonclustered 

Use partitioning

Specify whether table should be partitioned. Select existing partition from drop down list. Solution Partitions are maintained in the Solution Management part of the application.
If the selected partition is not a SQL partition. the partition scheme is only used to track changes per partition. For more details. See tutorial here
 

Skip partitioning of Distinct Tables (visible if Use partitioning is selected)

By default also distinct tables will be partitioned to allow for tracking data changes by partition. If this is not wanted the user can avoid partitioning of distinct table(s).

Process settings

Process modes

The process modes group contains the following options:

All (full reload)

If unchecked. The Reload all (full reload) option will be disabled.

View

Indicates whether the data source table should be persisted as a view instead of a data warehouse table.

Editable

User will enter the data himself instead of extracting it from a source system. The table will be prevented from being transferred again when in this state.
The "Process" button changes text to "Edit Data" once the table is set to the editable state. The user can edit the data warehouse table data via this "Edit Data" button. 

Archive

The table will be prevented from being transferred again.

Hide (as Fact),

Determines whether the table will be available in other contexts (modules, measure groups). The table will not be visible in other contexts than within this data source and internal data sources.

Incremental Table

Indicates whether the table is an incremental table or not. (read-only)

Add Incremental Table (link button visible for fact tables if no attached incremental table exist)

Allows the user to add a new incremental table.
Add Data Flow for Incremental Table (link button visible for incremental tables) Allows the user to create an automatically generated data flow containing the selected incremental table, its connected scripts and 

Load options

The following load options are available:

Force uniqueness

Rows resulting in duplicate keys will be removed to ensure a unique Primary Key. (Only most popular row version for key will be transferred.)

Enhanced (Enabled if force uniqueness is checked)

Most commonly used properties (non-empty values) will be filled in overriding the most commonly used row version.

Keep existing data if no rows loaded

Existing data will not be replaced until one or more rows are transferred as part of the reload.

Abort if rows decrease

The transfer will abort with an error message if reload results in a decreased number of rows in the data warehouse table.

Skip change checking

Transferred rows will not be checked for changes. (By default Profitbase Studio will check if data has changed by comparing new data to existing data. (exception is Fact and Other type tables are not checked for changes if Reload All (Full Reload))).

Skip datatype cast

Source column data types will not be cast into new Name Definition data types. Also COLLATE clause will not be added if generated as view.

Keep temp tables

The temp/staging tables will not be deleted.

Use Script Extensions to load source data

Enable scripting of transfer phase between source and temp or data warehouse table (depending on Reload Method and Source Type). This option is to be used in combination with a script extension distributing data from source table to the correctly named and formatted temp table(s).

Use Script from temp to data warehouse table

Enable scripting of transfer phase between temp and data warehouse table. This option is to be used in combination with a script extension distributing data from temp table to data warehouse table(s).

Use Script Extensions for primary keys and indexes

Enable scripting of primary keys and indexes. This option is to be used in combination with a script extension maintaining primary keys and indexes. 

 

SSIS Settings (SSIS data sources only)

SSIS source settings

The following SSIS source settings are available:

Use custom source connection string For File, Excel and Access data sources one can customize the source connection string.
Specify a customized source connection string. Usually one will add the IMEX=1 setting or change ACE Version from 12.0 to 15.0. 
NB! The Connection string must end With ";Data Source=". Users cannot override the Data Source part of the connection string. 

Use default code page

For Oracle SSIS connections it might be necessary to manually set the default code page. 
For some SSIS connections (i.e. Oracle) it might be necessary to manually set the default code page.  

SSIS destination settings

The following SSIS destination settings are available:

Destination adapter

OLEDB Destination is more flexible.
SQL Server Destination can give better performance but has more limitations.

Max insert commit size (OLEDB Destination only)

Manually set the batch insert commit size. 
The FastLoadMaxInsertCommitSize property specifies the number of rows to commit as a batch.

Fast load options (OLEDB Destination only)

Manually set the fastloadoptions property. 
These fast load options are stored in a comma-separated list.  Default values are TABLOCK, CHECK_CONSTRAINTS, ROWS_PER_BATCH=1000. The value 1000 indicates that the destination is configured to use batches of 1000 rows.

 

Auto Update Overrides

Skip auto update

Check to indicate that this table will not be a part of auto update reloads.

Maintain auto update table

Specify whether auto update on the data warehouse module level should add missing DimensionID's in a TABLENAME_AutoUpdate table. Applies to Wide and Parent/Child Dimension tables.

Auto update filter

The user can specify a separate filter for auto update reloads. This filter can for example make use of the TABLENAME_AutoUpdate table if the "Maintain auto update table" is checked.

  

Partition Schemes

Partition Schemes - Command Buttons

Refresh

Click to Refresh partition information after Processing.

Partition Scheme Fields

Sequence

The partition range sequence number.

Boundary

The boundary values for the partition range.

ID

ID of the partition range.

Last Data Change

The last time partition range data was changed.

Rows

Number of rows in partition range.

 

General

For Name, Description, Comments editing click here. 

For use of status tab click here. 

For shortcut keys click here.

 

See also

Script Extensions

Using Script Extensions from Content