IRI's FieldShield package is uniquely capable of protecting personally identifiable information (PII) at the field level across multiple databases and flat files, with functions in ten distinct categories that are applied according to business rules and specified data conditions.
Big Test Data - Automatically!
IRI's uniquely powerful test data synthesizer, RowGen, is being updated to v3.1. This is a major new release that allows DBAs and data architects to rapidly create and load the quantity and quality of test data necessary to populate a database or enterprise data warehouse (EDW) -- all while uniquely integrating the rich data transformation and formatting capabilities of CoSort's SortCL program that ensure the most intelligent, production-like data values and formats are created.
RowGen v3 will run from the IRI Workbench GUI built on Eclipse, on the command line, or from batch programs. RowGen Control Language (.rcl) files specify the generation of test data in database table, flat-file, and custom report formats. For test database creation, RowGen's 'DB Test Data' wizard in the IRI Workbench automates these steps:
Parse - by selecting the schema and tables you want to populate, RowGen v3 translates the DB table descriptions and integrity constraints into .rcl scripts that specify the source structure, dependent sets, and data creation, in the order necessary to populate the tables in the right format, and with all primary and foreign key relationships intact (regardless of complexity).
Generate - by building and running the .rcl scripts to create one test file per table that can be bulk loaded, and/or saved for future use.
Populate - by bulk loading the target tables in the right order with the pre-sorted test data ... test data that is not only structurally and referentially correct, but is safe for outsourcing (compliant with data privacy laws), and truly intelligent; i.e. realistic, business-rule-conforming, and robust enough to stress-test applications with the right data value ranges and row volumes.
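The "right order" requirement in the Parse and Populate steps can be illustrated with a dependency sort. This is only a minimal sketch, not RowGen's implementation: the table names and the `foreign_keys` map are hypothetical, and the code assumes a parent table must be loaded before any table whose foreign keys reference it.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of parent tables
# that its foreign keys reference.
foreign_keys = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def load_order(fk_map):
    """Return the tables ordered so every parent precedes its children."""
    return list(TopologicalSorter(fk_map).static_order())


order = load_order(foreign_keys)
print(order)
```

A circular foreign key relationship would raise `graphlib.CycleError` here; a real tool would need a strategy for that case, such as deferring constraints.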
Another RowGen v3 wizard creates ad hoc test data targets in custom flat-file, report, and table formats. RowGen job scripts are based on, and compatible with, the CoSort SortCL program, and as such, can leverage many of the same data manipulation and formatting functions for customizing the look and feel of the test targets. Support is also provided for all and valid (joined) pairs, weighted distributions, and the inclusion of randomized real data to enhance the appearance of test data without compromising security.
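To make the ideas of all-pairs generation and weighted distributions concrete, here is a small sketch in plain Python. It does not use RowGen or .rcl syntax; the function names, value lists, and weights are all illustrative assumptions.

```python
import itertools
import random


def all_pairs(left, right):
    """Every combination of one value from each list."""
    return list(itertools.product(left, right))


def weighted_values(values, weights, count, seed=None):
    """Draw `count` values so that higher-weighted values appear more often."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=count)


# Hypothetical test values
pairs = all_pairs(["US", "IN"], ["gold", "silver", "bronze"])
tiers = weighted_values(["gold", "silver", "bronze"], [1, 3, 6], 1000, seed=42)
print(len(pairs), tiers[:5])
```

Passing a fixed seed makes the generated set repeatable, which helps when the same test data has to be recreated later.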
IRI Software Reach Growing in India, Indonesia
Serving Domestic Users and MNC BPOs
Though based in the United States where most of its customers are, IRI is strategically focused on growing its business abroad through partners who serve big data users concerned about price-performance, functional versatility, and ease-of-use. IRI is proud to announce and welcome these new distributors:
Unisoft Infotech Pvt Ltd. is a Bangalore-based solutions provider best known for its SAP expertise and clientele around the world. Unisoft is bringing IRI software to users in India's sizable domestic markets for big data processing and data-centric protection solutions, by leveraging its offices throughout the country and its partnerships with business process outsourcers (BPOs) that develop, implement, and support applications for use in India and by multinational corporations (MNCs). Satellite offices in Chennai and Delhi can also provide licensing and support for CoSort, FACT, FieldShield and RowGen.
PT Cybertrend Intrabuana and PT Sinar Surya Teknologi (SST) are two more authorized IRI resellers in Jakarta that have a base of customers and prospects in telco, government, and financial services institutions. Cybertrend has particular expertise in open source BI technologies like Pentaho, where IRI tools like CoSort add value as a data franchising (preparation) engine. SST has a more Oracle-centric focus, indicating a need for Fast Extract (FACT) for Oracle and CoSort in ETL situations. SST also has a competency in DB security and DLP where FieldShield and RowGen are good fits.
These and other partnerships are forming in South and Southeast Asia, as well as other parts of the world. Watch this space for ongoing appointments and contact email@example.com if you need to work with, or are interested in becoming, an IRI software representative.
Using FieldShield in DB Apps
... and Protecting Every Row
Database applications that update and query tables may need to secure data going into, or being retrieved from, tables. The data must be protected on the way into the table, undergo protection on the way out, or remain protected in the database. In each case, the goal is to prevent unauthorized access to sensitive information.
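As a rough illustration of protecting data "on the way in" and "on the way out", the sketch below uses only the Python standard library. It is not the FieldShield API: the `pseudonymize` and `mask` helpers, the key, the table, and the sample values are all hypothetical.

```python
import hashlib
import hmac
import sqlite3

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for this sketch


def pseudonymize(value: str) -> str:
    """Protect on the way in: store a keyed hash instead of the real value."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Protect on the way out: reveal only the last `visible` characters."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, ssn_token TEXT, card TEXT)")
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?)",
    ("Ann Example", pseudonymize("123-45-6789"), "4111111111111111"),
)

name, ssn_token, card = conn.execute(
    "SELECT name, ssn_token, card FROM customers"
).fetchone()
masked_card = mask(card)
print(name, ssn_token[:8], masked_card)
```

Here the SSN never reaches the table in clear text, while the card number is stored as-is and masked only when it is retrieved. Which approach fits depends on who may read the table directly.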
This blog article lists options for using FieldShield in your programs.
Consider improving the performance of your applications by protecting even less data:
Contact firstname.lastname@example.org if you are interested in incorporating one or more FieldShield protections into your application environment.
Tech Tip - CoSort Speed Tuning
Modify System and 'cosortrc' Parameters
The availability of resources and how they are allocated to CoSort jobs can have a profound impact on the efficiency and throughput of big data transformations like sorting, joining, and aggregation.
First, consider the number of system threads you want to use for parallelism. The thread_max value in the cosortrc file sets the maximum number of computational threads you can assign to each job. Even if your system can apportion more threads than it has physical CPU cores, IRI recommends licensing only as many threads as there are physical cores, so that each thread's work is tied to a physical resource.
Next is memory. Increasing MEMORY_MAX will improve performance when the data being processed can fit in RAM. When processing large data sets, temporary work files are used and the performance improvement from a large MEMORY_MAX setting is less significant.
Proper HDD (work_area) specification is important because jobs too large to fit in memory will use temporary files on those drives. During transformation, the overflow data is written to the work areas you specify, then read from there and merged onto the target drive. Because of this flow, the source and target drives are never accessed at the same time, so it is fine if these are on the same physical drive. But because the work areas are accessed at the same time as the source and target drives, they should be on separate physical drives (and I/O controllers) from the source and target drives -- and even from each other if you use multiple threads and have other available drives (since each thread can write to a separate temp drive).
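The overflow flow described above (write sorted runs to the work areas, then read them back and merge them to the target) is the classic external merge sort pattern. The following is a simplified sketch of that general technique, not CoSort's actual implementation; `external_sort`, `max_in_memory`, and `work_dirs` are hypothetical names, and the result is returned as a list only to keep the example short.

```python
import heapq
import os
import tempfile


def external_sort(lines, max_in_memory, work_dirs):
    """Sort text lines, spilling sorted runs to work_dirs when memory fills."""
    run_paths = []
    buffer = []

    def spill():
        # Rotate across work areas so runs are spread over the available drives.
        work_dir = work_dirs[len(run_paths) % len(work_dirs)]
        fd, path = tempfile.mkstemp(dir=work_dir, suffix=".run")
        with os.fdopen(fd, "w") as run:
            run.writelines(sorted(buffer))
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line if line.endswith("\n") else line + "\n")
        if len(buffer) >= max_in_memory:
            spill()

    if not run_paths:
        return sorted(buffer)  # everything fit in memory; no work files needed
    if buffer:
        spill()

    # Merge phase: read every sorted run back and merge into one sorted stream.
    runs = [open(path) for path in run_paths]
    try:
        return list(heapq.merge(*runs))
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)


with tempfile.TemporaryDirectory() as area1, tempfile.TemporaryDirectory() as area2:
    result = external_sort(["pear", "apple", "fig", "kiwi", "date"], 2, [area1, area2])
print(result)
```

During the spill phase the program reads the source while writing the work files, and during the merge phase it reads the work files while writing the target. This is why the work areas benefit from being on different physical drives than the source and target.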
Most of the time used by a large sort job is for file I/O. Even in the most complex SortCL scripts, the majority of processing time is spent reading and writing data from and to the disks. Thus, anything you can do to increase memory and accelerate I/O on your existing systems will speed up big data transformation throughput. One of the biggest performance improvements can come from a dedicated, high-speed I/O channel for overflow; two 6Gb/second SATA drives in a striped configuration (RAID 0), on a dedicated controller, work well. If you must use the same drive for source, work and target files, however, increasing the BLOCKSIZE parameter can help.
Moving big data transformations to in-memory databases or appliances is not usually a necessary, much less cost-effective, alternative; and staging data in the DB or BI layer, a la ELT, is not an efficient one.
Finally, note that there are a number of other, more esoteric memory and behavior-related parameters in the cosortrc file you can adjust, and that you can make these settings apply at a global, user, and job level as warranted. For more information, see Section D of the Appendix chapter in your CoSort manual, and contact email@example.com with any questions.