FieldShield v3
Applying Common Protections Across Tables


IRI's data-centric security software, FieldShield, is being updated to v3.1. This is a major new release that allows DBAs using FieldShield in the IRI Workbench to apply common masking, encryption, pseudonymization or other data privacy functions to similar columns with personally identifying information (PII) across multiple tables within one or more database schemas. This profiling functionality will be extended to CoSort ETL users in the IRI Workbench wishing to apply common data transformation rules to many sources at once.

FieldShield's data protection rule engine allows users to specify a regular expression identifying the columns in selected tables that will receive a common security function. For example, the same format-preserving encryption algorithm and encryption key can be applied to every social security number column in every table, even if the column names are slightly different. A single job definition for encryption and decryption across an entire database is not only convenient, but also preserves referential integrity by keeping the inter-table data encrypted, and thus linked, in the same way.
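
The rule idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not FieldShield syntax or its encryption: the column-name pattern, the table contents, the key, and the helper names `protect` and `apply_rule` are all hypothetical, and a keyed hash (HMAC-SHA256) stands in for format-preserving encryption simply because it is deterministic and available in the standard library.

```python
import hashlib
import hmac
import re

# Hypothetical rule: any column whose name looks like a social security number field.
SSN_COLUMN = re.compile(
    r"^(ssn|soc(ial)?_?sec(urity)?(_?(no|num|number))?)$", re.IGNORECASE
)


def protect(value: str, key: bytes) -> str:
    """Deterministic keyed token; a stand-in for the real protection function."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def apply_rule(tables: dict, pattern: re.Pattern, key: bytes) -> dict:
    """Apply the same function and key to every matching column in every table."""
    protected = {}
    for table_name, rows in tables.items():
        protected[table_name] = [
            {col: protect(val, key) if pattern.match(col) else val
             for col, val in row.items()}
            for row in rows
        ]
    return protected


# Two hypothetical tables that name the same kind of column differently.
tables = {
    "employees": [{"name": "Ann", "SSN": "123-45-6789"}],
    "payroll": [{"soc_sec_no": "123-45-6789", "salary": "50000"}],
}
masked = apply_rule(tables, SSN_COLUMN, key=b"demo-key")
```

Because the same function and key are used everywhere, the protected values in `employees.SSN` and `payroll.soc_sec_no` remain equal to each other, which is what keeps joins between the tables working after protection.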

In addition to rule-based protections, FieldShield v3 features new protection functions, such as:

  • FIPS-compliant OpenSSL and 3DES encryption
  • AES-128 (in addition to AES-256 with or without format preservation)
  • Random selection or data generation
  • 256-bit hashing
  • Custom data masks
  • Field shifting and more string manipulations

Other features, including row- and column-level filters, reversible and non-reversible pseudonymization, and near-value lookup file logic, further enhance FieldShield's ability to protect PII in static and changing dimensions. If you would like to test FieldShield v3, click here to send your request with the details needed. 



IRI Expands in Eurasia
New Partner Offices in Russia and Turkey

New resale and support partners will expand the reach of IRI's growing software product line:

Aksioma Group was founded in 2010 by a team of IT professionals to deliver complex IT projects for major companies in Russia. Today the Moscow-based company focuses on consulting, infrastructure solutions, and information security, making it a perfect fit for offering licensing and implementation support for IRI's big data processing and data-centric protection tools. Aksioma's first joint customer with IRI and Sabre, Russian Railways, uses CoSort to accelerate database loads and analyze traffic data to discover the most profitable, brand-friendly mix of scheduling, price and service. Using CoSort in applications like these will continue to add business value to the products and projects that Aksioma and its partners develop in Russia.

Asteroit Teknoloji provides companies in Turkey with short- and long-term consulting services for data warehouses, business intelligence, data and ICT services. Based in Istanbul, Asteroit has expertise in data modeling, DW design and development, CRM, Oracle DB management and performance tuning, and GSM technologies for telco and mobile carriers. FACT and CoSort will help Asteroit customers accelerate the extraction, transformation, and loading (ETL) of CDR and other big data in Oracle and beyond. IRI's David Friedland is visiting Asteroit this month to provide training on the IRI Workbench, currently supporting CoSort 9.5.2, FACT 2.5.1, FieldShield 3.1.0 (beta), and RowGen 3.0.1 (alpha).



NextForm Feature Advice for v3
What Are Your Data Migration Challenges?

For the last several years, IRI has offered a basic file-format interchange, data type conversion, and record layout reformatting tool called NextForm. NextForm is another commercial spin-off of the CoSort Sort Control Language (SortCL) program and will be part of the IRI Workbench (Eclipse GUI). From the information IRI has gleaned from thousands of free trial downloads, most NextForm users have been interested in high-volume LDIF-to-CSV and COBOL-text file conversion, EBCDIC-to-ASCII, and packed decimal-to-numeric data type translations. 
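
To make two of the conversions named above concrete, here is a small Python sketch of EBCDIC-to-ASCII text conversion and packed decimal-to-numeric translation. It is not NextForm syntax. The function names are hypothetical, the code page `cp037` is an assumption (mainframe files use several EBCDIC variants), and the packed-decimal layout assumed is the common one with two digits per byte and a sign in the final nibble.

```python
from decimal import Decimal


def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes to text; cp037 is an assumed code page."""
    return raw.decode(codepage)


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Convert a packed decimal field to a number.

    Each byte holds two decimal digits, except the last byte, whose low
    nibble is the sign (0xD means negative; other sign nibbles are treated
    as positive). `scale` is the number of implied decimal places.
    """
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed decimal field")
    value = int("".join(str(d) for d in digits))
    if raw[-1] & 0x0F == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


text = ebcdic_to_ascii(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6]))
amount = unpack_packed_decimal(bytes([0x12, 0x34, 0x5C]), scale=2)
debit = unpack_packed_decimal(bytes([0x12, 0x3D]))
```

Here `text` decodes to the word HELLO, `amount` is 123.45, and `debit` is -123.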

The next release of NextForm in 2013 will introduce the ability to switch between big and little endian designations at both the file and field level, convert between native CJK data types and Unicode, and migrate between databases and certain database versions. 
IRI now needs to know from those with data migration challenges which additional structured data conversions are of interest, and which specific database migrations are planned. 

Anyone who has evaluated NextForm in the past, whether or not they have licensed the product, will be entitled to a free copy of the next release in exchange for feedback before the end of 2012. Think about your current and future data migration requirements, file formats and database platforms, and what specifically you would like to have in a purpose-built tool for converting big data on Unix, Linux, and Windows. Please email nextform@iri.com with your specific requirements and suggestions for NextForm and get a free upgrade!


 

Tech Tip - Slowly Changing Dimensions
Reporting with Fuzzy Logic Features in SortCL

The purpose of reporting on Slowly Changing Dimensions (SCD) is to find a value that can satisfy an other-than-equal relationship to the search argument. CoSort's SortCL program accomplishes this with new fuzzy table lookup logic. This expands the search of a set file, wherein the input field provides the key for the search. 

Consider, for example, a transaction file with non-uniform intervals of increasing dates, each associated with a product price change. With or without a specific date entry, you can find the price of the product before, on, or after that date. Given the transaction file "days.in",

 
20120519
20120511
20120522
20120528
20120610

and the tab-delimited dimensions file "butter.set" providing the slowly changing dimensions of the price of butter on specific days,
 
20120514     2.95
20120521     3.05
20120528     2.90

this SortCL script finds the active price for each date in "days.in": 

/INFILE=days.in
   /FIELD=(Day,POSITION=1,SIZE=10)
/REPORT # do not sort
/OUTFILE=buttercost.out
   /FIELD=(Day,POSITION=1,SIZE=8)
   /FIELD=(Price, SET GE butter.set[Day] \
           default = "XX", POSITION=14, SIZE=4)

The target file, buttercost.out, shows the effective cost on each transaction date:

20120519      2.95
20120511      XX
20120522      3.05
20120528      2.90
20120610      2.90

Notice that when comparing the transaction (fact) table with the limited dimensional (price) information,
  • the price of butter on May 19 is from May 14
  • there was no value to look up on May 11
  • the price of butter on May 22 is from May 21
  • the price of butter on May 28 is a match
  • the price of butter on June 10 is from May 28
By default, where there is no information, no supposition is made. However, SortCL will use the last piece of available information.
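
For readers who want to see the lookup logic outside of SortCL, the following Python sketch reproduces the result shown above. It is an illustration of the on-or-before search only, not of how SortCL implements it; the helper names `load_set` and `lookup_on_or_before` are hypothetical, and the data is copied from "days.in" and "butter.set".

```python
from bisect import bisect_right


def load_set(lines):
    """Split tab-delimited set-file lines into sorted keys and their values."""
    pairs = sorted(tuple(line.split()) for line in lines if line.strip())
    keys = [key for key, _ in pairs]
    values = [value for _, value in pairs]
    return keys, values


def lookup_on_or_before(keys, values, day, default="XX"):
    """Return the value of the latest key that is on or before `day`."""
    index = bisect_right(keys, day)
    return values[index - 1] if index else default


butter_set = ["20120514\t2.95", "20120521\t3.05", "20120528\t2.90"]
days_in = ["20120519", "20120511", "20120522", "20120528", "20120610"]

keys, values = load_set(butter_set)
report = [(day, lookup_on_or_before(keys, values, day)) for day in days_in]

for day, price in report:
    print(day, price)
```

The dates compare correctly as plain strings because they are fixed-width YYYYMMDD values. The binary search means each lookup stays fast even when the set file holds many price changes.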


Slowly Changing Dimensions

Slowly changing dimensions, or SCD, is the way in which data warehouse users track changes in the values (facts) of a datum. “Slowly” implies time but not necessarily “slow” time; the concepts are the same if changes occur in seconds or centuries. The interval between changes need not be consistent. The search argument(s) must be unique, and the resultant value is discrete.

Wikipedia and other sites discuss known techniques for storing and accessing SCD data. Users can ignore changes; overwrite the existing fact; expand the stored record; or, create additional records (tuple-versioning) using surrogate keys. This is often a complex process in ETL tools or SQL.

IRI took a fresh approach to reporting on SCD in CoSort’s SortCL program by using high speed, fuzzy logic searching to find fact data in set files. SortCL users can query for discrete values based on changing information like date and time. Given an arbitrary search date, for example, SortCL will find and display the address that was in effect before, on, or after that date.

In the examples that the sites mentioned above use, a company changes its location from time to time. In the IRI model, the company record would carry a location table reference rather than a location fact. This means that there is only one fixed record per company; it is not expanded or duplicated when the location value changes.
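
The table-reference idea can be sketched as follows. This is a Python illustration of the model only, not SortCL syntax; the company, the location history, the dates, and the `find` helper with its `mode` argument are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical location history, keyed by a table reference and sorted by date.
LOCATIONS = {
    "acme_loc": [
        ("20050101", "Austin"),
        ("20090615", "Denver"),
        ("20120301", "Miami"),
    ],
}

# One fixed record per company: it carries a reference, not the location itself.
COMPANIES = [{"name": "Acme", "location_ref": "acme_loc"}]


def find(history, day, mode):
    """Return the location in effect relative to `day`, or None if there is none.

    mode "on":     the entry dated exactly `day`
    mode "before": the latest entry dated strictly before `day`
    mode "after":  the earliest entry dated strictly after `day`
    """
    keys = [key for key, _ in history]
    if mode == "on":
        i = bisect_left(keys, day)
        return history[i][1] if i < len(keys) and keys[i] == day else None
    if mode == "before":
        i = bisect_left(keys, day)
        return history[i - 1][1] if i else None
    if mode == "after":
        i = bisect_right(keys, day)
        return history[i][1] if i < len(keys) else None
    raise ValueError(f"unknown mode: {mode}")


history = LOCATIONS[COMPANIES[0]["location_ref"]]
before = find(history, "20100101", "before")
after = find(history, "20100101", "after")
exact = find(history, "20090615", "on")
```

When the company moves again, only a new line is added to the location history; the company record itself does not change.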

By evaluating SCDs at the field level, different fields can change at different times and the returned value can be determined by more than one search argument. Thus, IRI's technology is basic at its core, but offers opportunities for simplicity, reduced storage, speed, and increased capability. Specifically, IRI’s file system approach to SCD has these advantages:

  • the ability to find any number of changed values
  • very fast look-up performance
  • searches on any increasing values
  • complex, multi-level search criteria
  • finding values on, before, or after a given date
  • simple job script maintenance and sharing
  • new values quickly applied and integrated
  • support for built-in comments
  • no need for database overhead, reorgs, etc.

By using SortCL for SCD reporting, you can also integrate sorting, expression evaluation, aggregation, formatting, encrypting, etc. -- all in the same job script and I/O pass.

IRI is developing a white paper called “Dimensions of Slowly Changing Dimensions” to address the issue in detail and provide examples of SortCL solutions, which are also available in the CoSort manual starting in Version 9.5.2. The most basic example is shown in the Tech Tip article in this issue.

The CoSort manual also contains an example of performing SCD lookups on a four-column set file with three search parameters, which shows how far this CoSort feature can be extended for business intelligence purposes. Contact support@iri.com for help implementing this capability in your environment.