FACT Extracts from IRI Workbench
Eclipse Plug-In Unloads & Loads 'Big Data'

Users of IRI's FAst extraCT (FACT) tool can now design and launch large table extracts from the IRI Workbench, an Eclipse Plug-In that already supports CoSort® SortCL transformation and FieldShield® data masking functions.

The new job wizard, syntax-aware configuration file editor, and graphical execution options for FACT in the workbench provide all the front-end support necessary for rapidly off-loading table data into portable flat files. In addition to creating and launching the FACT configuration (.ini) file, the Workbench now supports target table creation and loader control file specification for faster loading.
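By way of illustration, a FACT configuration file is a plain-text .ini file that pairs connection details with the SQL SELECT that drives the extract. The sketch below is only a guess at the shape of such a file; the section and key names are illustrative placeholders, not FACT's actual keywords:

```
; Hypothetical FACT job configuration -- section and key names are
; illustrative only; see the FACT documentation for the real keywords.
[DATABASE]
TYPE=ORACLE
USER=scott
PASSWORD=tiger

[EXTRACT]
; Standard SQL SELECT syntax defines what to unload
QUERY=SELECT * FROM SALES WHERE REGION = 'EU'
OUTPUT=/data/sales_eu.dat
```

The Workbench wizard and syntax-aware editor generate and maintain this file, so the exact keywords never need to be memorized.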


Fast extracts from, and pre-CoSorted loads to, very large database (VLDB) tables play key performance roles in: high-volume data warehouse and operational data store (ODS) acquisition, off-line reorgs, database migration and replication, archive and retrieval, data franchising (data preparation for BI tools), ad hoc reporting, and search-based applications (SBAs). FACT uses proprietary connection protocols, multiple threads, and standard SQL select syntax to extract data from Oracle, DB2, Sybase, SQL Server, and Altibase tables on Unix, Linux and Windows.

To support subsequent (batch) or concurrent (piped) transformation and load operations, FACT automatically creates the extract file metadata in both CoSort/SortCL Data Definition File (DDF) and the database's load utility metadata formats. Using FACT in the IRI Workbench lets you utilize that functionality in a broader visual and operational context, where you can:
  • see and work with data in source and target tables via the Data Source Explorer 
  • use the Data Definition File (DDF) metadata FACT creates in CoSort SortCL and FieldShield jobs
  • run CoSort SortCL data transformations and reports, plus direct path loads, in-line with FACT (batch/piped ETL)
  • feed FACT, CoSort or FieldShield output to other Eclipse plug-ins like BIRT for advanced reporting
  • work on unload, load, reorg, and ETL projects in teams with version control
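As an example of the DDF hand-off, the metadata FACT creates describes each extracted column in SortCL field syntax, along these lines (the file and field names here are hypothetical, and the exact attribute keywords may differ):

```
/FILE=sales_eu.dat
/FIELD=(CUST_ID, TYPE=ASCII, POS=1, SEP=',')
/FIELD=(ORDER_DATE, TYPE=ASCII, POS=2, SEP=',')
/FIELD=(AMOUNT, TYPE=ASCII, POS=3, SEP=',')
```

Because this layout is shared metadata, the same DDF can be referenced by CoSort SortCL transformation jobs and FieldShield masking jobs without re-describing the data.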
Additional details on, and screen images from, FACT within the IRI Workbench can be found here. If you have any questions about FACT, or would like to arrange a webinar or obtain an on-site evaluation copy, please email fact@iri.com.


RowGen Development Update
IRI's Test Data Tool is Being Upgraded

Along with FieldShield, IRI's data masking and encryption solution, RowGen can be part of data loss prevention and privacy law compliance initiatives by replacing the need for production data in testing, outsourcing, and application development. Using realistic, referentially correct test data is also a safe way to prototype ETL and database operations, and to benchmark new hardware and software platforms.

The current RowGen release, 2.11, uses the same syntax as the CoSort SortCL program to create big data in custom formats suitable for testing. On the Windows side, the product ships with a data model-parsing interface created by RapidACE LLC.

IRI is now in the process of re-developing its data model parsers using both newer technology from RapidACE and the Eclipse Data Tools Plug-In (DTP) expressed through the Data Source Explorer window in the IRI Workbench. In addition to more ergonomic RowGen job script creation in the GUI, IRI Workbench users will be able to send pre-sorted test data directly to ODBC-connected tables and bulk database load utilities.

IRI intends to add the updated RowGen functionality in the IRI Workbench next quarter. Meanwhile, if you have specific feature/function requests for the next RowGen release, please email rowgen@iri.com.



CoSort Expands in Hospitality Sector
Mereo in France Leverages IRI's Big Data Engine


Mereo is a leading provider in the field of revenue optimization and business intelligence (BI) for Hospitality, Entertainment and Travel sector clients.

Since 2000, Mereo has supported these efforts -- from profit assessment to process improvement, tools integration, implementation, maintenance and support, as well as staff training. Based in Paris, the company has assembled an expert consulting and engineering team experienced in yield and revenue management solution implementations in the travel, leisure and media industries.

Mereo has worked with IRI and CoSort Solutions France to integrate CoSort into its core applications. With high volumes of client sales data to analyze, Mereo uses CoSort to manipulate and manage its data sets, and to calculate different sets of key performance indicators (KPIs) from that data.

CoSort was selected because its SortCL program suited Mereo's data transformation requirements perfectly, and ran across the various Unix, Linux and Windows platforms Mereo's customers use. Mereo's integration of SortCL in ETL-related operations is typical, because data warehouse and BI architects can easily leverage its sort, join, aggregation, cross-calculation, and reformatting functions through simple 4GL job scripts.
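To give a flavor of that 4GL style, a minimal SortCL job that orders delimited sales records by a key field might look like the sketch below (the file and field names are hypothetical, and the syntax is patterned on the Tech Tip example elsewhere in this issue):

```
/INFILE=sales.csv
     /FIELD_PREDICATE=(SEP=',')
     /FIELD=(Region)
     /FIELD=(Product)
     /FIELD=(Amount)
/SORT
     /KEY=(Region)
/OUTFILE=sorted_sales.csv
     /FIELD_PREDICATE=(SEP=',')
     /FIELD=(Region)
     /FIELD=(Product)
     /FIELD=(Amount)
```

Joins, aggregations, and cross-calculations follow the same declarative pattern, which is what makes the scripts approachable for data warehouse and BI architects.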

For more information on the data transformation functions that SortCL can perform in a single pass for high volume data staging environments, click here.

SortCL can perform similar bulk data preparation for BI tools (also known as data franchising); SortCL creates CSV and XML file targets, as well as ODBC row inserts, as hand-offs to those tools. For more information on the business intelligence functions SortCL can perform natively as a report generator, or as a data franchising tool, click here.



Tech Tip - Field Predicate Feature
A New Short-Cut for SortCL Script Writers

For those CoSort users moving to version 9.5.1 and still using text editors to create and modify job scripts (rather than the IRI Workbench GUI that automates script creation), SortCL's new /FIELD_PREDICATE statement can help reduce the size and complexity of job scripts -- and thus the time needed to manually create and edit them -- by making field definitions as simple as a field name.

The predicate allows you to specify one or more repeating attributes of input or output fields only once at the beginning of the /INFILE or /OUTFILE section of a job script, rather than having to specify those attributes in every /FIELD statement. Once specified, SortCL will use the same attributes in every field statement that follows the predicate statement until the predicate attribute is either manually overridden in a specific field, or replaced by a subsequent predicate statement.

In addition to storing one or more repeating field attributes, the predicate will also automatically calculate byte offsets for fixed-position fields and augment the ordinal positions of delimited fields. This makes it easier to add or remove fields because their positions will be re-calculated automatically, and eliminates the possibility of specifying a wrong position number.

The more field attributes that are specified in the predicate statement, the smaller each subsequent field statement needs to be.

By way of example, consider this input file, called 'addresses':

Dick Jones,1234 Maple St.,Philadelphia,Pennsylvania
Sam Henderson,1400 Highway A1A,Satellite Beach,Florida
Harry James,50 Elm Ave.,Boston,Massachusetts
Sarah Smith,300 Thornton Rd.,Frankfurt,Kentucky


This SortCL job script with predicate statements on input and output:

/INFILE=addresses
     /FIELD_PREDICATE=(SEP=',')
     # delimited field, starting at position 1
     /FIELD=(Name)
     /FIELD=(Address)
     /FIELD=(City)
     /FIELD=(State)
/REPORT
/OUTFILE=fixed-output.txt
     /LENGTH=67
     /FIELD_PREDICATE=(SIZE=15,FILL='^')
     # ASCII output assumed, uniform field width
     /FIELD=(Name,POS=1)
     /FIELD=(Address,POS=17)
     /FIELD_PREDICATE=(SIZE=17,FILL='x')
     # new SIZEs and FILL characters follow
     /FIELD=(City,POS=33)
     /FIELD=(State,POS=51)


produces the output file 'fixed-output.txt':

Dick Jones^^^^^ 1234 Maple St.^ Philadelphiaxxxxx Pennsylvaniaxxxxx
Sam Henderson^^ 1400 Highway A1 Satellite Beachxx Floridaxxxxxxxxxx
Harry James^^^^ 50 Elm Ave.^^^^ Bostonxxxxxxxxxxx Massachusettsxxxx
Sarah Smith^^^^ 300 Thornton Rd Frankfurtxxxxxxxx Kentuckyxxxxxxxxx

The first two output fields have a fixed size of 15 and pad out with the ^ character, while the next two fields are both 17 bytes and pad with an 'x'.

If you have specific questions or feedback on this SortCL feature, or are interested in testing the latest release of CoSort, please email cosort@iri.com.


Season's Greetings!

All of us in Melbourne, Florida, extend our warmest wishes to IRI customers and resale partners around the world for a safe holiday and prosperous year ahead.

Database Reorgs
Why They Matter, and the Difference Between On-Line and Off-Line Reorgs
 
Over time, data in large RDBMS tables becomes fragmented. Tables and indexes grow as records become distributed over more data pages, and the extra page reads during query execution slow query responses. To reclaim the wasted space, improve database uptime, and speed data access (query responses), consider a strategy for reorganizing your database objects. There are two types of reorgs for table, index and tablespace objects: on-line (in place) and off-line (classic).

On-line reorgs work incrementally by moving rows within the existing table to re-establish clustering, reclaim free space, and eliminate overflow rows. Objects are unavailable only for a short time near the end, not during the reload and rebuild phases, which can be protracted for large objects. Applications can stay connected to the database during an on-line reorg, but their performance often slows, and lock waits can occur.

Off-line reorgs are faster, but can take the database off-line (if the database's reorg utility is used). With this method, data is exported from the database into a dump file (unload), the extracted records are typically re-ordered on the primary key (sort), and they are then returned to the same tablespace (load), where indexes are restored implicitly (rebuild).

IRI customers use FACT for the unload, which creates a portable flat file that can be CoSorted on the primary index key of the reorganized table. With this approach, other transformation and reporting operations can occur, and the database remains on-line. Pre-sorted, direct path loads also bypass the sort (overhead) of the database loader.
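For example, CoSorting the FACT unload on the primary index key before the direct path load could be scripted along these lines (the file and field names are hypothetical; in practice the field layout would come from the FACT-generated DDF metadata):

```
/INFILE=orders_unload.dat
     # flat file created by the FACT unload
     /FIELD_PREDICATE=(SEP=',')
     /FIELD=(Order_ID)
     /FIELD=(Order_Date)
     /FIELD=(Amount)
/SORT
     # pre-sort on the primary index key so the loader can skip its own sort
     /KEY=(Order_ID)
/OUTFILE=orders_sorted.dat
     /FIELD_PREDICATE=(SEP=',')
     /FIELD=(Order_ID)
     /FIELD=(Order_Date)
     /FIELD=(Amount)
```

The sorted output is then handed to the database's bulk loader in direct path mode, and the flat file can be deleted (or archived) once the load completes.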

Holding a "shadow" copy of the data in the file system for each table should not be unduly onerous because once the flat file is sorted and re-loaded, it can be deleted. At the same time, having the reorg data externalized and available to CoSort also allows the possibility for other uses of the data, including archival, reporting, protection, and migration to other database, BI tool, and application targets. The caveat of course is that during the unload, other system users can read and may update the table space, so any updates during this time could miss the re-load and create inconsistencies in the target. It is therefore recommended that off-line reorgs be performed when updates are not occurring.

 
IRI Professional Services Open
Custom Data Manipulation and Protection Solutions Provided

In response to customer requests for bespoke applications across a broad range of data processing requirements, IRI now has an in-house business unit for the implementation of custom data transformation, ad hoc reporting, and data security solutions.

Leveraging CoSort's SortCL functionality and the ability to integrate specialized input and output procedures, complex field-level transformations, and third-party APIs, IRI engineers can develop, test, and deliver packaged software and solutions for:
  • data warehouse (ETL) acceleration
  • legacy sort and data migration
  • data encryption and masking
  • test data generation
  • VLDB load / unload
  • data validation and comparison
  • changed data capture
  • data franchising (staging for BI)
  • SQL procedure, shell and 3GL program replacement
IRI developers can also work with system integrators and value-added resellers to build new, vertical market offerings, like a Hadoop-based telco data warehouse to process large CDR volumes across a grid, a HIPAA-compliant medical database marketing product, and PCI-compliant credit card and core banking operations. Click here for more information.