CoSort Update Coming Soon
CoSort Version 9.5.2
IRI's flagship product for manipulating big data - CoSort - is being updated to v9.5.2. This is a minor new release that adds new features to CoSort's SortCL program since the release of v9.5.1 last year, including:
Fuzzy Logic Look-Up. An expansion of the set file search algorithm to find the value of an entry whose key satisfies an other-than-equal relationship to the search argument. Prior releases would find the value only when the search argument was equal to the key.
Slowly Changing Dimensions. An application of the fuzzy logic feature that finds the value for a search argument that is closest to a table (set file) entry. As an example, consider rows with non-uniform, ordered dates, and gas prices that have changed over time. With or without an entry for 2012-06-28, the SCD search can find the price of gas before, on, or after 2012-06-28.
Literal Values. Constants can now be named and specified in /FIELD statements for derived and other field functions, conditions, cross-calculations, and reports. Manipulate and display static character and numeric strings along with input data; for example, apply a fixed rate to principal values in a source table.
String Analysis. Like the SQL 'instr' function in Oracle, the new SortCL function looks for characters in a field and displays the offset number for the specified occurrence of the string in that field.
Thai Characters. SortCL can now collate (sort) and generate test data in Thai Unicode and single-byte Thai formats, and convert between these sets.
Random Field Data. SortCL can create random data in fields per the data type and min/max size range specified. It can also randomly select data from an existing list of real SET file values. Both approaches allow users to add random data to existing input sources as new column values.
More Encryption. FIPS-compliant OpenSSL, 3DES, and AES-128 libraries are now built-in, adding more protections for fields containing PII.
Other Protections. In addition to new randomization and encryption functions, the new CoSort (and next FieldShield) release will feature 256-bit hashing, custom data masking, and field shifting.
If you would like to test CoSort v9.5.2, click here to send your request with the details needed.
IRI Expands in S.E. Asia
New Offices in Thailand and Indonesia
New resale and support partners will expand the reach of IRI's growing software product line in Bangkok and Jakarta:
Bara Advanced InfoTech Co., Ltd., (BARA), part of Thai conglomerate "Bara Windsor & Company", was established in 1994 to support hardware and applications software for rapidly growing companies. BARA is a top-ranked supplier for HP (COMPAQ PC) and IBM eServer/pSeries (RS/6000) systems in Thailand and has a strong Oracle practice. BARA personnel recently received initial training on CoSort, Fast Extract (FACT), and FieldShield in the IRI Workbench environment, and can assist in proofs of concept, evaluation and licensing, and customer technical support. If you are interested in IRI Software in Thailand, click here to reach BARA.
PT. Emerio Indonesia, a subsidiary of Emerio Corp. in Singapore (an NTT Communications Company), creates and optimizes data warehouse ETL and BI solutions for companies in the financial services and telco industries. Emerio is becoming versed in the language of SortCL to provide faster, and more affordable solutions for big data staging, capturing and reporting on changed and slowly changing data, and optimizing database performance without the need for costly hardware. Click here for Emerio's contact details in Indonesia and afield.
Hadoop Development Underway
CoSort to Run on Private Grids
Long before Hadoop became synonymous with big data, CoSort was. As one of the original 'big data' companies, IRI has been keen to distribute CoSort operations across secure, mission-critical customer grids that can handle petabytes of data. Over the years, IRI looked at Infiniband, Oracle Parallel Server, Platform Computing, Beowolf Clusters, and others. However, it was not until Hadoop emerged as a widely adoptable open source standard that IRI began to make related R&D investments.
Tech Tip - Pivot Transform
The object of IRI's efforts in Hadoop is to implement SortCL executions in MapReduce code, breaking up huge data transformation jobs across available nodes and merging the results onto the specified target platform. CoSort is already running in a major telco grid in Asia using Hadoop, but a general-purpose release awaits additional development in both MapReduce and CoSort code, along with feedback from IRI's big data users interested in this development.
Therefore, if you have specific requirements for, or interest in, exploiting CoSort in the Hadoop environment, email firstname.lastname@example.org.
Transpose Rows to Columns in SortCL
One of the many, more advanced ways in which CoSort performs SQL and DW functions on sequential data is shown in pivoting transformations, where rows can be rotated into columns and vice versa. The following example might be used to consolidate numeric sales data.
The input file below, transactions.dat, contains three-column rows: a store code, department code, and transaction amount for that department. The object of this transformation is to create one record for each store containing a column for each store, and then total transaction amounts for each department.
In the CoSort Sort Control Language (SortCL) script below:
The INPUT section defines the layouts of the source file, transactions.dat
The INREC section reformats the input records so that the transaction amount is in the correct column for the department, and puts a zero where there is no transaction amount for a given department
The SORT order on StoreCode allows us to group the transaction amounts by store code
The OUTPUT section summarizes the transaction amounts in each department, grouped by store code, and defines the target layout.
# Change the input record format so that TransAmount
# goes into the correct column for the DeptCode, and
# place zeros in the other columns
IF DeptCode == "1" THEN TransAmount ELSE 0)
IF DeptCode == "2" THEN TransAmount ELSE 0)
IF DeptCode == "3" THEN TransAmount ELSE 0)
# Collapse the columns with SUM statements to get a
# Unique StoreCode and TransAmount for each DeptCode
/SUM DC1_sum FROM DeptCode1 BREAK StoreCode
/SUM DC2_sum FROM DeptCode2 BREAK StoreCode
/SUM DC3_sum FROM DeptCode3 BREAK StoreCode
The target, StoreSums.out, contains:
The original input format was flattened to rows based on the StoreCode field, which was also used as the 'unary change' break key when summing. For those familiar with the Microsoft SSIS platform, DeptCode is the Pivot Key (top row value), StoreCode is the Set Key (left column value), and TransAmount is the Pivot Value (table values).