Tuning an Informatica ETL Session End-to-End

Let’s face it, some Informatica sessions can be stubborn and slow… right?  When this happens, it’s important to have a methodical, documented approach to fixing them.  A checklist is a great way to organize your thoughts and begin reviewing the session areas that could be slowing it down.  Below, I’ve listed a few points that have helped me in the past.

First, break the session up into three parts, like the following, and review the busy percentages of the session run, if available; these are usually a helpful indicator of where to start your tuning efforts:

1.) Source(s) – Ensure the source(s) are optimal by timing how long they take to run into a #temp table as a quick check (see the sketch just after this list).  If the session takes 20 minutes to complete and it takes 18 minutes to create and populate a temp table with the data, this is your guy.

2.) Target(s) – Make sure the target is ready to load data. Are indexes where you need them (or don’t want them)?

3.) ETL logic – Minimize the number of ETL Transformations and pay attention to how they are used.  Scrutinize the ones you do need and keep them as efficient as possible.
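For that quick source check, something like the pattern below works well; the table and column names here are only placeholders for your own source query:

SET STATISTICS TIME ON;

-- Run the same SELECT the session uses, but land it in a temp table
SELECT OrderID, CustomerID, OrderDate, Amount
INTO   #SourceCheck
FROM   dbo.Orders
WHERE  OrderDate >= '2016-01-01';

SET STATISTICS TIME OFF;
DROP TABLE #SourceCheck;

If the elapsed time here accounts for most of the session’s run time, the source query is where to spend your effort.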


SOURCE(s) – SQL Override

1.) If the session takes a while to start moving any rows, check out the SQL Override (if one is used).  In my experience this accounts for 80% or more of the performance problems I see.  SQL Overrides can get complicated quickly; ensure you are moving the smallest amount of data possible, both in the number of rows and in the width of the columns selected.

2.) Viewing the execution plan (estimated and/or actual) can be more than helpful by providing insight into how the query is acquiring the data.  Fundamentally, if you are moving a lot of data you should expect hash or merge join operators and minimal, if any, loop joins… especially if there are index scans on the outer part of the join.  If you are anticipating moving very few rows, as in a CDC-style mapping, you should expect few scans and few hash or merge joins, with the plan opting instead for nested loop joins and index seeks.  Additionally, check out Grant Fritchey’s (Twitter | Blog) free ebook SQL Server Execution Plans.
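In SSMS you can capture either flavor of plan with a couple of SET options; the query shown is only a stand-in for your actual SQL Override:

-- Estimated plan: the query is compiled but NOT executed
-- (SET SHOWPLAN_XML must be the only statement in its batch, hence the GOs)
SET SHOWPLAN_XML ON;
GO
SELECT OrderID, Amount FROM dbo.Orders WHERE OrderDate >= '2016-01-01';
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the query runs and the plan is returned alongside the results
SET STATISTICS XML ON;
SELECT OrderID, Amount FROM dbo.Orders WHERE OrderDate >= '2016-01-01';
SET STATISTICS XML OFF;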

3.) Tune the SQL override with “SET STATISTICS IO ON” to view the logical I/Os; these are pages read from the buffer cache (memory).  Fewer is better here, as they indicate how much work SQL Server needs to do to complete the requested query.  Review the pages read from each of the tables and look to improve these numbers by using or altering existing indexes, or even creating new ones if the situation calls for it.  *Note* additional indexes may affect other queries (positively or NEGATIVELY), so proceed with caution and ask your friendly neighborhood DBA for his/her thoughts.
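A minimal example, again with placeholder table and column names; run the override wrapped like this in SSMS and read the Messages tab:

SET STATISTICS IO ON;

SELECT OrderID, Amount
FROM   dbo.Orders
WHERE  OrderDate >= '2016-01-01';

SET STATISTICS IO OFF;

-- The Messages tab reports something along these lines:
-- Table 'Orders'. Scan count 1, logical reads 1420, physical reads 0, ...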

4.) Ensure proper statistics are in place.  The SQL Server Query Optimizer is a cost-based optimizer, so cardinality estimates need to be as accurate as possible to produce an optimal execution plan.  I have found it helpful to leverage indexes and even drop/create (or update) statistics on the fly, before the actual Select statement runs, so the optimizer expects a realistic row count.  An additional and less invasive approach is to prebuild a temp table with the rows from your “driver” table, the table carrying the targeted Where clause.  This is also beneficial in cases where there are multiple filters in the Where clause that may confuse the optimizer into choosing a clustered index scan to evaluate all of the filters at once.  In situations like this, I have found it helpful to apply the most selective filter first, by leveraging an index on the search Date or Control Sequence Number for example, into a temp table, and then apply the remaining un-indexed filters to the temp table.  This approach can dramatically reduce the number of pages read by performing a single seek on the larger physical table and then applying the rest of the filters to the much smaller temp table via a scan.
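Here is a rough sketch of that driver/temp table pattern, assuming OrderDate is the indexed, most selective filter and Region/Status are the un-indexed ones (all names are illustrative):

-- One index seek against the large physical table
SELECT OrderID, CustomerID, Amount, Region, Status
INTO   #Driver
FROM   dbo.Orders
WHERE  OrderDate >= '2016-01-01';

-- Remaining filters applied to the much smaller temp table via a scan
SELECT OrderID, CustomerID, Amount
FROM   #Driver
WHERE  Region = 'EAST'
  AND  Status = 'OPEN';

DROP TABLE #Driver;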


TARGET(s)

1.) Just as with the Select statements in the SQL Overrides, we can (and should) review the execution plans of the INSERT/UPDATE/DELETE statements that run against the target(s).  Execution plans can quickly reveal potential performance issues.

2.) Foreign Keys can certainly break the back of an otherwise perfectly performing session.  With Foreign Keys in place, execution plans will show lookups to the parent tables to ensure the changes still maintain referential integrity.  For example, before a new row is inserted, its foreign key value must exist as a key in the referenced parent table.  This can be costly, and careful thought should be given to disabling the FKs before the data load.  One thing to note is that Informatica automatically disables the “check” on Foreign Keys when running in Bulk Mode.  The check should be re-enabled following the successful completion of the data movement; Post SQL is a great place for this.
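If you do decide to disable a Foreign Key around the load, the Pre SQL / Post SQL pair could look something like this (the table and constraint names are made up):

-- Pre SQL: stop checking the constraint during the load
ALTER TABLE dbo.OrderDetail NOCHECK CONSTRAINT FK_OrderDetail_Orders;

-- Post SQL: re-enable the constraint AND re-validate the rows that were loaded
ALTER TABLE dbo.OrderDetail WITH CHECK CHECK CONSTRAINT FK_OrderDetail_Orders;

The WITH CHECK on the second statement matters; without it the constraint is re-enabled but left untrusted by the optimizer.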

3.) Indexes can also impact target performance.  Changes to the table (inserts/updates/deletes) also need to be reflected in the indexes.  This extra work can be expensive, not to mention the index page splits required to maintain the index’s B-tree structure.  On the other hand, a missing index can be just as bad as, or worse than, having extra ones.  For sessions that perform updates and/or deletes, an index is needed on the predicate (the condition to update or delete on), such as a key value.  If there isn’t an index on the field the condition is looking for, SQL Server must scan an index where the field does exist, most of the time the clustered index… for each row.  Ouch!  For more information on indexes, check out SQL Server Central’s Stairway to SQL Server Indexes.
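As a simple illustration with hypothetical names, an index on the key used in the session’s update condition turns a per-row clustered index scan into a seek:

-- Support the update/delete predicate with a nonclustered index
CREATE NONCLUSTERED INDEX IX_OrderDetail_OrderKey
    ON dbo.OrderDetail (OrderKey);

-- The update the session issues for each row can now seek instead of scan
UPDATE dbo.OrderDetail
SET    Status = 'CLOSED'
WHERE  OrderKey = 12345;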

4.) Locking and blocking.  In order to maintain consistency, SQL Server needs to lock objects while changes are being made.  This matters for Informatica because sessions running in Bulk Mode may lock not by page (8 KB) but by extent (64 KB), or even the entire table, which in some cases means the session can block itself if there are lookups or additional Select statements trying to reference this “locked” data.  In such cases, it’s helpful to separate the read of the data first, using a temp table, and then perform the actual Insert.
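A sketch of that separation, assuming a #Staging table already holds the incoming rows and the target is dbo.OrderDetail (both names are hypothetical):

-- Step 1: read the reference data before the load takes any locks
SELECT OrderKey
INTO   #ExistingKeys
FROM   dbo.OrderDetail;

-- Step 2: perform the actual Insert, comparing against the temp copy
INSERT INTO dbo.OrderDetail (OrderKey, Status)
SELECT s.OrderKey, s.Status
FROM   #Staging AS s
LEFT JOIN #ExistingKeys AS e ON e.OrderKey = s.OrderKey
WHERE  e.OrderKey IS NULL;   -- only keys that do not already exist

DROP TABLE #ExistingKeys;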


ETL (Transformations and settings)

1.)    Understand and set appropriate commit points for the sessions.  Generally, 500 rows works for CDC-type mappings with smaller data movement, while 1 million rows or more is better for bulk loads.  For Bulk mode style sessions, consider an “all or nothing” approach by setting the commit interval higher than the expected number of rows.

2.)    Remove any unused Transformations and try to keep the data movement as sleek as possible.  Additional transformations are additional work for Informatica to consider and process.  Ensuring that only the data manipulations you actually need are included in the mapping will help with efficiency.

3.)    Beware of lookups.  Leverage lookups only when they are required.  Remember to look up the smallest amount of data possible, not the entire table, especially if the table is large.  If a large table is truly needed for your lookup, consider leveraging a Joiner transformation instead, or join directly in the SQL Override if the table is local to your source server (see the example below).  Understand and test your performance with Lookup Caching enabled and disabled; having this setting right is important for the scalability of the mapping.  Enable Performance Statistics to see if rows are spilling to disk instead of staying in memory.
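For example, instead of a Lookup transformation that caches an entire Customers table, the SQL Override could return just the one looked-up column via a join (names here are illustrative only):

SELECT o.OrderID,
       o.Amount,
       c.CustomerName              -- the single "looked up" value
FROM   dbo.Orders    AS o
JOIN   dbo.Customers AS c
       ON c.CustomerID = o.CustomerID
WHERE  o.OrderDate >= '2016-01-01';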

4.)    Work with your Informatica Administrators to determine a suitable Buffer Size, Block Size, and other Session and Workflow settings, as these can affect the session run time by orders of magnitude.  These settings help determine the memory usage on the Informatica app server and, in turn, make better use of the resources on the database server.

In conclusion, there are several things to consider when looking for better performance in Informatica sessions; having a list to review is a helpful way to make sure nothing is overlooked.

What’s on your list?  What have I missed on mine?  I am happy to include your thoughts…

Best of Luck,
~Tim
