De Montfort University – Assembler Program Transformation
You may not realise it, but at some point in your day you may have interacted with a system that uses Assembler code – a low-level programming language specific to a particular computer architecture.
Assembler was widely adopted by large companies who needed to manage complex mainframes, from banks to government agencies. Of course, time passes and systems evolve, and many firms start to consider a change.
In the past, transforming reams of Assembler code into more efficient C code was a nightmare. Drawing on decades of research by Dr Martin Ward at Durham and De Montfort Universities, Software Migrations Ltd (SML) offered a less gruelling solution.
“There’s a lot of assembler out there”, says Dr Ward. “Every time you do a credit card transaction, it probably goes through an IBM mainframe running code in assembler. It’s doing a lot of important work, but the assembler programs cannot be moved to new computer architectures without recoding.
“However, it’s quite an esoteric language now. Students don’t get taught it in schools, and it’s more difficult to find people who can maintain your code.
“A lot of companies have these core algorithms that run their businesses, and nobody really knows how they work and what they’re actually doing.
“Say you’re selling holidays and you want to introduce new deals and discounts, that might involve updating your software. If you can’t do that, your whole company gets held back. If you can migrate that code to a high-level language, it’s understandable, maintainable and you can build all of your modern processes on top of it.”
However, migrating code from Assembler isn’t a picnic. A frequently referenced quote from an Allen Eastwood paper in 1992 observes that software reengineering is “about as easy as reconstructing a pig from a sausage”.
“When you’re transforming it without maths behind you, it’s very easy to put bugs in the system that don’t show up until the end”, says Dr Ward. “Even if you’re as much as 99.99% correct, the chance of it still working at the end is only 30–40%. The only way to get to 100% is to have a mathematical proof of correctness.
“In addition, it’s a big project. For example, a company invested 40 man-years of effort in re-writing 750,000 lines of Assembler code written by hand, and then had to give up without producing a useful result. We managed to migrate the system automatically in under 18 months with a team of four, and not all of them were working full-time.”
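The article does not say how the 30–40% figure is derived. One plausible reading, offered here only as an assumption, is that 99.99% is the chance of each individual transformation step being correct, and that a migration involves thousands of independent steps. Under that assumption the arithmetic can be sketched as follows (the function names and the step counts are illustrative, not taken from SML):

```python
import math


def chance_all_steps_correct(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps is correct."""
    return per_step_accuracy ** steps


def steps_for_target(per_step_accuracy: float, target: float) -> float:
    """Number of independent steps at which overall success drops to `target`."""
    return math.log(target) / math.log(per_step_accuracy)


if __name__ == "__main__":
    # Hypothetical step counts; the real number depends on the system migrated.
    for steps in (1_000, 10_000, 100_000):
        p = chance_all_steps_correct(0.9999, steps)
        print(f"{steps:>7} steps: {p:.1%} chance of a fully correct result")
```

With 10,000 such steps the chance of a fully correct result is roughly 37%, which sits inside the range quoted above. Small per-step error rates compound quickly, which is why a proof of correctness for each step matters more than a high success rate.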
Dr Ward’s solution dates back to the 1980s, when he was working on a theory of “program transformation”. This area of research focused on how to transform a program while still proving that the end product was semantically equivalent to the original, even if it took different steps.
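As a rough illustration of what “semantically equivalent, even if it took different steps” means, the sketch below shows two versions of the same small calculation: one written in a jump-driven, Assembler-like style and one in a structured style. Both functions are hypothetical examples written for this article and are not output from FermaT. The check at the end only tests a range of inputs; the point of program transformation theory is to replace that kind of testing with a proof.

```python
def sum_to_n_jumps(n: int) -> int:
    """Assembler-like style: an explicit program counter and conditional jumps."""
    acc, i, pc = 0, 1, 0
    while True:
        if pc == 0:        # LOOP: if i > n, jump to DONE
            pc = 3 if i > n else 1
        elif pc == 1:      # acc = acc + i
            acc += i
            pc = 2
        elif pc == 2:      # i = i + 1, then jump back to LOOP
            i += 1
            pc = 0
        else:              # DONE: return the accumulated total
            return acc


def sum_to_n_structured(n: int) -> int:
    """Structured style: the same calculation as a simple loop."""
    acc = 0
    for i in range(1, n + 1):
        acc += i
    return acc


if __name__ == "__main__":
    # Testing gives evidence of equivalence on these inputs, not a proof.
    for n in range(-3, 200):
        assert sum_to_n_jumps(n) == sum_to_n_structured(n)
    print("Both versions agree on every tested input")
```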
“I was at Durham at the time, and we had a visit from a man from IBM”, he says. “They had funding for blue-skies research, and we talked about taking the theory and applying it to Assembler. That turned out to be a lot harder than I thought as there were a lot of features to analyse.”
By the time he left Durham for De Montfort in 1999, he had developed an initial version of the FermaT transformation engine, which forms the basis of SML’s software and services. At that point, SML employed around six staff.
Dr Ward continued his research into the theory and application of “program slicing”, which breaks down a program into chunks to check that each transformed element performs the same function as the original. For example, a “backwards slice” investigates how a variable came to a particular value, while a “forwards slice” looks at what would happen if the program were changed.
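The two kinds of slice can be sketched on a toy example. The code below is a minimal illustration that assumes a straight-line program with no branches or loops, where each statement records the variable it defines and the variables it uses. The `Stmt` type, the sample program and both functions are hypothetical; slicing in a real tool has to handle control flow, memory and much else.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    text: str           # the statement as written
    defines: str        # the variable it assigns
    uses: frozenset     # the variables it reads


# A hypothetical holiday-pricing fragment.
PROGRAM = [
    Stmt("price = 100", "price", frozenset()),
    Stmt("rate = 20", "rate", frozenset()),
    Stmt("count = 3", "count", frozenset()),
    Stmt("discount = price * rate / 100", "discount", frozenset({"price", "rate"})),
    Stmt("total = price - discount", "total", frozenset({"price", "discount"})),
    Stmt("label = count + 1", "label", frozenset({"count"})),
]


def backward_slice(program, variable):
    """Indices of statements that contribute to `variable` at the end."""
    relevant = {variable}
    kept = []
    for index in range(len(program) - 1, -1, -1):
        stmt = program[index]
        if stmt.defines in relevant:
            kept.append(index)
            relevant.discard(stmt.defines)  # this definition explains the value
            relevant |= stmt.uses           # now explain what it was built from
    return sorted(kept)


def forward_slice(program, start):
    """Indices of statements affected if statement `start` were changed."""
    affected = {program[start].defines}
    kept = [start]
    for index in range(start + 1, len(program)):
        stmt = program[index]
        if stmt.uses & affected:
            kept.append(index)
            affected.add(stmt.defines)
        elif stmt.defines in affected:
            affected.discard(stmt.defines)  # overwritten with an unaffected value
    return kept


if __name__ == "__main__":
    for index in backward_slice(PROGRAM, "total"):
        print("backward:", PROGRAM[index].text)
    for index in forward_slice(PROGRAM, 1):
        print("forward: ", PROGRAM[index].text)
```

In this fragment the backward slice on `total` leaves out the `count` and `label` statements, because they play no part in how `total` gets its value. The forward slice from `rate = 20` picks up only the `discount` and `total` statements, which are the ones a change to the rate would reach.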
In 2001, SML was bought by Mike Dowd, who played a key role in commercialising this research. Between 2004 and 2006, Dowd invested more than £5m to help SML’s research meet the requirements of industry, which included being able to successfully convert Assembler into Cobol, and later Java. The tools reached the marketplace in 2007.
SML has worked with a number of globally recognised companies to address their “Assembler problem”, and is working with the world’s largest user of Assembler to transform more than 10 million lines of code.
However, it isn’t just the transformation that proves challenging.
“The big problem is often the politics that bubbles away in big organisations”, says Dr Ward. “Our biggest competitor is the temptation to do nothing, and to live with the existing system for another year until the CEO retires. In my view, that’s really short-term thinking.
“Nowadays, PCs and workstations are very cheap and powerful, but if a system is written in IBM Assembler code it only runs on an IBM mainframe and that can be expensive to maintain. If you can take some of your processing off a mainframe and onto a PC, you can save hundreds of thousands of pounds.”
“We’ve seen all kinds of horror stories involving Assembler. It can really hold you back when you’re implementing enhancements or adding new functionality to a system. We even heard about two companies who were about to sign on the dotted line for a merger, but found that their two Assembler systems were so incompatible that the whole deal fell through.”
“We’re looking at how you take a procedural program and turn it into an object-oriented program”, he says. “There’s a lot of interest in Java now, and you can generate procedural Java, but the holy grail is to be able to produce object-oriented Java as well, with pieces of code and pieces of data that operate together.
“I’d never have imagined all this when I was doing my PhD. I remember the first time we put a piece of Assembler through. It was about 200 to 300 lines long. The UNIX workstation chuntered away, and three days later it fell over in a heap trying to translate one module. These days it takes around seven seconds on average per module. That’s down to the efficiency of the transformation process, and the quality of the machines running the software.
“That’s what research is all about. You push the boundaries of knowledge, and sometimes you don’t know what’s going to have an impact and what isn’t.”