CoSort 9.5 is Here!
New Release Supports Unicode Data and ODBC in Eclipse IDE
 
After more than two years of development since the release of CoSort® 9.1.3, IRI is now shipping CoSort 9.5.1 on Unix, Linux and Windows.

CoSort uses a fourth-generation data definition and manipulation language called SortCL to transform, convert, protect, and report on "big data." SortCL is used for:

  • legacy sort and data migrations
  • data warehouse staging (ETL) operations
  • data franchising (preparation) for BI tools
  • database reorg and load pre-sorting
  • detail, summary and delta (CDC) reporting
  • field-level encryption and data masking
In the new release, SortCL users can directly address relational table data along with very large sequential (flat) files, as well as a wide range of multi-byte characters in both native and Unicode formats. The new version also features an integrated development environment (IDE) built on Eclipse for SortCL job design and execution. This "CoSort 9.5 Workbench" is a free option and replaces the former 'gui2scl' and 'guiagent' applications.

To summarize, the new features in CoSort 9.5.1 are:

  • ODBC source and target data processing. CoSort users can source and target data in Oracle, DB2, SQL Server, SAP, MySQL, Sybase, etc.
  • Selection, transformation, and test data generation for native multi-byte forms and Unicode UTF-16
  • Expanded support for numeric data types and field formatting attributes like implied decimal
  • Graphical job wizards, form editors, dialogs and syntax-aware script editing for SortCL job design
  • Graphical tools to discover, import and populate file and table metadata for use in SortCL jobs
  • Graphical, automated and bulk metadata conversion of legacy sort parameters, such as those used by SyncSort
  • Data lineage and team-based version control for SortCL job scripts
If you would like to try CoSort 9.5 and its Eclipse Workbench, call 1-800-333-SORT, email info@iri.com, or visit www.iri.com/products/freetrial.


CoSort 9.5 Chiphopper Certification
IBM Linux Systems Compatibility Assured

IRI has partnered with IBM® to validate CoSort 9.5.1 for Linux® x86 on IBM Systems and middleware platforms. This extensive porting/rehosting and testing verification process ensures that CoSort continues to employ industry coding and reliability standards. Compatibility with Linux means CoSort 9.5 is highly portable and easy to support across multiple platforms.

 
Product Support Expansion
IRI Announces New Agents in Central Europe

IRI is pleased to welcome two new resale and support offices for CoSort, FACT, FieldShield and RowGen. These authorized IRI distributors can provide licensing, value-added implementation, customization, training, and support services from Slovenia and the Czech Republic.

Result
Bravničarjeva 11
1000 Ljubljana
Slovenia
Tel. +386 1 542 17 80 
www.result.si
 
Result was established in 1989 and is located in Ljubljana, Slovenia.  Result provides legacy migration services and specializes in IT consulting, design, development, and implementation of complex information and business intelligence (BI) systems in the Balkans.

Per4Mance
Fišova 3
602 00 Brno
Czech Republic
Phone:  +420 545 215 400
www.per4mance.cz

Founded in 1995, Per4Mance is a systems integrator offering comprehensive solutions for information systems, including project management, quality control, and implementation of communication components in the Czech and Slovak republics. Like IRI, Per4Mance is focused on solutions for open systems (UNIX and Windows servers) and large Oracle database sites.

 

Tech Tip -  Migrating Mainframe Data
Translating a COBOL Record Sequential File

In CoSort 9.5.1, the Sort Control Language (SortCL) program offers more features than ever before for legacy dataset conversion. Consider the example of a COBOL variable sequential (VS) file with EBCDIC sales territory data and packed decimal sales amounts that must be converted to ASCII text for use on x86 Windows or Linux. The hexadecimal representation of the source file is:

1  2  3  4  5  6  7  8 
 
00 06 d4 e3 00 45 00 1d
00 06 d5 e6 00 25 08 2c
00 06 d5 c5 00 21 11 8c
00 06 d4 e6 00 30 20 2c
00 06 d5 c5 01 43 29 2c

 
Note that:
  1. Typical of a VS file, columns 1 and 2 are a "SHORT" integer specifying the length of the record that immediately follows; 00 06 means each record is 6 bytes long. On mainframes this number is stored in 'Big Endian' format, with the most significant byte first. A 'Little Endian' x86 machine reads these two bytes in the reverse order, wrongly indicating a record length of 1536 bytes!
  2. Columns 3 and 4 are the hexadecimal representation of a 2-byte EBCDIC character field that gives the sales territories.
  3. Columns 5-8 are the Packed Decimal sales amounts for the territory. When converting to ASCII numeric, the size of the target field must be at least double the size of the source field. Because an implied decimal is not carried in the source data, steps must be taken to place and size the decimal in the target field.
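
The byte-order problem in note 1 can be reproduced in a few lines. The following is a minimal Python sketch, not part of CoSort; it only assumes the two prefix bytes 00 06 from the hex listing above and reads them both ways.

```python
import struct

# The two-byte record-length prefix (columns 1 and 2 of the hex listing).
length_prefix = bytes.fromhex("0006")

# ">H" reads an unsigned 16-bit integer in big-endian (mainframe) order.
(big_endian_length,) = struct.unpack(">H", length_prefix)

# "<H" reads the same two bytes in little-endian (x86) order.
(little_endian_length,) = struct.unpack("<H", length_prefix)

print(big_endian_length)     # 6
print(little_endian_length)  # 1536
```

Read in the wrong order, 0x0006 becomes 0x0600, which is where the figure of 1536 comes from.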
The SortCL job script below addresses these data and platform migration issues. The /PROCESS statement's ENDIAN=BIG parameter accounts for the endianness of the SHORT. Field positions start at 1 with the first byte after the SHORT, so they need not account for it. IMPLIED_DECIMAL=2 tells SortCL that the last 2 decimal digits of the amount field should go to the right of the decimal in the output.

/INFILE=amount_with_territory_vs.dat
  /PROCESS=VS,ENDIAN=BIG
  /FIELD=(territory,POSITION=1,SIZE=2,EBCDIC)
  /FIELD=(amount,POSITION=3,SIZE=4, \
          IMPLIED_DECIMAL=2,MF_CMP3)
/REPORT    # do not sort
/OUTFILE=amount_with_territory.dat
  /FIELD=(territory,POSITION=1,SIZE=2,ASCII)
  /FIELD=(amount,POSITION=4,SIZE=9, \
          PRECISION=2,NUMERIC)


Here is the translated data.

MT   -450.01
NW    250.82
NE    211.18
MW    302.02
NE   1432.92
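
For readers who want to check the translation by hand, here is a rough Python sketch of the same three steps: reading the big-endian length prefix, decoding the EBCDIC territory, and unpacking the packed decimal amount. It illustrates the data formats only and is not how SortCL is implemented. The function names are made up for this example, and the choice of code page cp037 for EBCDIC is an assumption; the source file may use a different EBCDIC variant.

```python
import struct
from decimal import Decimal

# The five source records, copied from the hex listing in this article.
SOURCE_HEX = """
00 06 d4 e3 00 45 00 1d
00 06 d5 e6 00 25 08 2c
00 06 d5 c5 00 21 11 8c
00 06 d4 e6 00 30 20 2c
00 06 d5 c5 01 43 29 2c
"""

def unpack_packed_decimal(raw, implied_decimals):
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()        # 0xC positive, 0xD negative
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    if sign_nibble in (0x0B, 0x0D):    # 0xB is a less common negative sign
        value = -value
    # Shift the implied decimal point into place.
    return Decimal(value).scaleb(-implied_decimals)

def convert(stream):
    """Walk the variable-sequential stream and return ASCII text lines."""
    lines = []
    offset = 0
    while offset < len(stream):
        # Big-endian SHORT giving the length of the record that follows.
        (record_length,) = struct.unpack_from(">H", stream, offset)
        record = stream[offset + 2 : offset + 2 + record_length]
        offset += 2 + record_length
        territory = record[0:2].decode("cp037")  # assumed EBCDIC code page
        amount = unpack_packed_decimal(record[2:6], implied_decimals=2)
        # Territory in positions 1-2, amount right-aligned in positions 4-12.
        lines.append(f"{territory} {amount:9.2f}")
    return lines

source = bytes.fromhex("".join(SOURCE_HEX.split()))
for line in convert(source):
    print(line)
```

Each 4-byte packed field holds seven digits plus a sign nibble, which is why the ASCII target field has to be at least twice as wide as the source.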

What is Unicode?
 
First Unicode Logo in June 1991

 
As the electronic and international exchange of information grew, Unicode began in 1987 as a project between Apple and Xerox engineers. They saw a need for an international standard that could represent every character in all major languages of the world and be read on any platform running any program. Before Unicode, the predominant ASCII coding scheme and its 8-bit extensions allowed for no more than 256 characters.

These early Unicode pioneers discovered that there were about 27,000 characters in the modern world, and this resulted in a 16-bit fixed-length character code which allowed for 65,536 characters, enough even for future expansion. Joe Becker, one of the Xerox engineers, coined the term “Unicode” from their requirements for a “universal,” “uniform,” and “unique” bit sequence to represent characters.

The initial success of Unicode naturally relied on its adoption by other companies. Early in its development, major computer manufacturers, networking and software companies began making significant contributions to the design. In addition to Xerox and Apple, participating companies included Metaphor, Claris, Research Libraries Group, Sun, Microsoft, SHARE, IBM, Pacific Rim, Aldus, NeXT, and Novell.

By 1991, Unicode, Inc. was incorporated with the original purpose to “standardize, extend, and promote the Unicode character encoding…” The original release date of Unicode was in October of that year. Version 1.0.0 contained codes for 7,161 characters. The most recent version, 6.0.0, was released in October 2010 and provided codes for 109,449 characters from the world’s alphabets, ideograph sets, and symbol collections.


IRI first began developing support for Unicode data in CoSort Version 7.5, and in Version 9.5 carried out a major redesign to support the since-updated character set.

CoSort's Sort Control Language (SortCL) program supports Unicode files and fields which may be mapped to database tables. SortCL can collate (sort), merge, join or convert Unicode characters and numerals in delimited or fixed-position fields.

Conversion between Unicode and single-byte (e.g. ASCII) or native multi-byte characters (e.g. Chinese GBK/Big5, Japanese, and Korean) is supported, as well as conversion between a variety of numeric data formats and Unicode digits.
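
To make these conversions concrete, the following Python sketch shows what moving text between a native multi-byte encoding and Unicode UTF-16 involves, and how Unicode digits map to ASCII digits. It is an illustration only and says nothing about how SortCL performs them. The helper name transcode and the sample strings are invented for this example; the codec names are those of Python's standard library.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one character encoding into another."""
    return data.decode(source_encoding).encode(target_encoding)

sample = "中文"  # two Chinese characters

# The same text as native multi-byte bytes and as Unicode UTF-16.
gbk_bytes = sample.encode("gbk")      # Simplified Chinese
big5_bytes = sample.encode("big5")    # Traditional Chinese
utf16_bytes = sample.encode("utf-16-be")

# Native multi-byte to Unicode, and back again.
from_gbk = transcode(gbk_bytes, "gbk", "utf-16-be")
back_to_big5 = transcode(utf16_bytes, "utf-16-be", "big5")

print(from_gbk == utf16_bytes)     # True
print(back_to_big5 == big5_bytes)  # True

# Unicode digits (full-width forms here) converted to ASCII digits.
fullwidth_digits = "１２３４"
ascii_digits = str(int(fullwidth_digits))
print(ascii_digits)                # 1234
```

The byte values differ in each encoding even though the characters are the same, which is why the source and target encodings must both be stated for a conversion.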

 
 
IRI Joins DAMA
 

 
IRI recently became a corporate member in the Florida chapter of DAMA, a non-profit, vendor-neutral association of technical and business professionals with chapters throughout the USA and around the world.

DAMA has become a key resource for IT practitioners, and influences data-related practices, education, and certifications. DAMA has published two books: DAMA Dictionary of Data Management and the DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK).