Harnessing a global computer grid
Tuesday 09 October, 2001

Rajkumar Buyya wants to bring supercomputers to the masses. The 30-year-old computer science PhD student is a world authority on the democratised supercomputer, the "cluster". From his windowless office at the Caulfield campus of Monash University, Buyya spearheads an international coalition of computer scientists who want to make access to processing cycles as easy as turning on a power point.

Supercomputers have fascinated people since Seymour Cray created the Cray 1 in 1976. They are synonymous with weather forecasting and simulation modelling, which is used, for example, in nuclear weapon testing and code breaking.

The expensive machines, dubbed "Big Iron", would crunch any highly complex problem. But the falling price of ever more powerful semiconductor processors ("Moore's Law"), faster networking technologies and adaptable software, such as the free operating system GNU/Linux, are putting such power in the hands of business, researchers and governments everywhere.

In his book, High Performance Cluster Computing, Buyya defines a cluster as "a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource".

The cluster has found a place as the poor man's supercomputer, often running free software on relatively inexpensive hardware. Clusters are one of a complementary collection of technologies knitting together a global computational fabric.

That collection also includes specific-purpose distributed computing systems, sometimes called peer-to-peer ("P2P"), such as the famous Seti@Home and a less well-known application of the human genome project, the Folding@Home project. High-speed networks, or "grids", connecting diverse computational, storage and remote imaging resources, such as radio telescopes, are planned.

Even traditional supercomputers and massive parallel processing computers can be plugged into this emerging global grid.

The challenge is to harness numerous resources including processing, storage, memory and sensory devices and provide them to the user as a single, contiguous computer. This virtual supercomputer must be secure, flexible and easy to operate by the casual user.

One goal is to break the computing equivalent of the sound barrier, a petaflop (a thousand trillion floating point operations a second), by the end of the decade. The world's fastest computer, ASCI White, is a cluster built by IBM for the US Department of Energy. It is capable of 12.3 teraflops: about 1 per cent of the desired speed.
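The gap between ASCI White and the petaflop goal can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the article.
PETAFLOP = 1e15   # a thousand trillion floating point operations a second
TERAFLOP = 1e12   # a trillion floating point operations a second

asci_white_flops = 12.3 * TERAFLOP   # figure quoted for ASCI White

# Share of the petaflop goal that ASCI White already delivers.
fraction_of_goal = asci_white_flops / PETAFLOP

# How many times faster a petaflop machine would need to be.
speedup_needed = PETAFLOP / asci_white_flops

print(f"ASCI White reaches {fraction_of_goal:.2%} of a petaflop")
print(f"A petaflop machine would be about {speedup_needed:.0f} times faster")
```

In other words, the "1 per cent" figure is a rounding of roughly 1.2 per cent, and the goal implies a speed-up of around 80 times.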

In the past 10 years, the Top500 Supercomputer list has reflected a growing trend towards these distributed processing architectures.

In an editorial on the recently inaugurated Top500 Clusters website, Dr Thomas Sterling, of the California Institute of Technology and the NASA Jet Propulsion Laboratory, writes that clusters will be the dominant architecture within three years. Already 146 of the Top500 supercomputers are clusters, or clusters-of-clusters called "constellations".

"From the largest (US Department of Energy) ASCI cluster ... to the widely prevalent, albeit more modest, PC-based Beowulf-class clusters ... the impact of commodity clusters is enormous," Sterling writes.

Buyya is working on an automated broker that sits between the demand and supply sides of the equation. Tell the system what you want computed and when, and it tells you the price based on the resources available.

"(Users) need something where they can define deadlines and then how much they want to spend," Buyya says.
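The idea of a broker that weighs deadline against budget can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Buyya's actual system: the `Resource` and `Quote` types, the `cheapest_quote` function, the units and the three providers are all hypothetical, and a real broker would also have to handle scheduling, queueing and changing prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    speed: float           # hypothetical unit: work units completed per hour
    price_per_hour: float  # hypothetical unit: dollars per hour


@dataclass
class Quote:
    resource: str
    hours: float
    cost: float


def cheapest_quote(work_units: float, deadline_hours: float, budget: float,
                   resources: List[Resource]) -> Optional[Quote]:
    """Return the cheapest offer that meets both deadline and budget, or None."""
    feasible = []
    for r in resources:
        hours = work_units / r.speed      # how long this resource would take
        cost = hours * r.price_per_hour   # what that run would cost
        if hours <= deadline_hours and cost <= budget:
            feasible.append(Quote(r.name, hours, cost))
    return min(feasible, key=lambda q: q.cost, default=None)


# Made-up providers, purely for illustration.
providers = [
    Resource("campus-cluster", speed=50.0, price_per_hour=4.0),
    Resource("big-iron", speed=400.0, price_per_hour=60.0),
    Resource("idle-desktops", speed=10.0, price_per_hour=0.5),
]

# A job of 1000 work units, due within 24 hours, with $200 to spend.
quote = cheapest_quote(1000, deadline_hours=24, budget=200, resources=providers)
print(quote)
```

With these made-up numbers the idle desktops are cheapest but miss the deadline, so the broker falls back to the campus cluster. Tightening the deadline to four hours would push the job onto the faster, dearer machine.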

"People don't have to install software on their machines, they can access on demand and they don't have to buy it. They can just rent ... software and resources on demand and I expect this to be a future industry. I expect this industry to take off in another four or five years."

Some software makers are already considering grid formation when designing their applications, Buyya says. But it will be at least three years before a menu option to offload processing to a computation resource provider is included in off-the-shelf packages, he says.

All forms of distributed computing have problems. One is security: granting the correct level of access to a person or network requesting resources. If the provider has a network of desktop PCs, that means granting only spare capacity so there's no disruption to users seated at those desktops.

Another problem is slow connections between computers on the grid. Understandably, makers of traditional Big Iron systems are not fans of the virtual supercomputers.

"Good quality high-performance grid computing is the next best thing to having your own resources," says Cray Australia national manager John Henderson. "I don't think it will ever be as good an approach as having your own resources."

Henderson's view is backed by Professor Greg Egan, director of the Centre for Telecommunications and Information Engineering at Monash University. Cray is lending Monash's Clayton campus a 16-processor SV1 supercomputer, capable of 20 gigaflops, for the next two years.

"I'm not a fan of grid computing," Egan says. "At the end of the day, the critical question is one of latency when you're talking about network grids.

"As far as researchers are concerned, the important thing is the time between when you push the carriage return key and when you get a picture back on your screen that tells you what's happening physically, that's what matters. And if you start spreading stuff across a network and bring all this stuff together, latency is going to increase."

A commercial implementation of grid computing on millions of desktops, nevertheless, may be less than a year away.

"(Microsoft's next generation architecture) .NET is grid," says Microsoft researcher Gordon Bell, who has worked on the grid problem with colleague Jim Gray. "Let's face it, it's a commercial version of grid, so to me, the grid is quite interesting because it's the ultimate in distributed computing."

Bell, a US native who studied as a Fulbright Scholar at the University of NSW in 1957-'58, did some of his early parallel computing work in this country. Bell says traditional supercomputers, such as the Cray SV1, will continue for the foreseeable future because they are better suited for certain applications. They work much more efficiently on problems where the result of one operation is closely related to the input of the next. On distributed systems, these highly complex problems suffer from latency.
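Why tightly coupled problems suffer on a grid can be seen in a rough timing model. The sketch below is an assumption-laden illustration, not a benchmark: the two functions and every number in it (step count, compute time, latencies, node count) are hypothetical. It assumes a dependent chain pays the network delay on every step, while independent work pays it once per node.

```python
def dependent_chain_time(steps: int, compute_s: float, latency_s: float) -> float:
    """Each step needs the previous result, so every step pays the network delay."""
    return steps * (compute_s + latency_s)


def independent_batch_time(steps: int, compute_s: float, latency_s: float,
                           nodes: int) -> float:
    """Steps are independent: split them across nodes, paying one delay per node."""
    steps_per_node = -(-steps // nodes)  # ceiling division
    return steps_per_node * compute_s + latency_s


# Hypothetical figures chosen only to show the shape of the problem.
STEPS = 1_000_000
COMPUTE_S = 1e-6         # one microsecond of work per step
LOCAL_LATENCY_S = 1e-7   # delay inside a single machine
GRID_LATENCY_S = 0.05    # delay across a wide-area network

local_chain = dependent_chain_time(STEPS, COMPUTE_S, LOCAL_LATENCY_S)
grid_chain = dependent_chain_time(STEPS, COMPUTE_S, GRID_LATENCY_S)
grid_batch = independent_batch_time(STEPS, COMPUTE_S, GRID_LATENCY_S, nodes=100)

print(f"dependent chain, one machine: {local_chain:.2f} s")
print(f"dependent chain, over a grid: {grid_chain / 3600:.1f} h")
print(f"independent work, over a grid: {grid_batch:.2f} s")
```

Under these assumptions the same million steps take about a second on one machine, hours when each step waits on the network, and a fraction of a second when the steps are independent and can be farmed out.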

The US and UK Governments are pressing ahead with their own distributed supercomputing facilities. Last August, the US National Science Foundation awarded $US53 million (about $A106 million) to four labs to build the Distributed Terascale Facility (DTF), a "teragrid" capable of 13.6 teraflops when networked next year over a high-speed optical fibre connection. The clusters will run GNU/Linux on IBM servers driven by several thousand Intel Itanium processors. IBM is also involved in the UK National Grid, a similar project also using GNU/Linux that links nine centres.

Bill Appelbe, head of the computer science department at the Royal Melbourne Institute of Technology, was in the US last week talking about cluster and grid computing. As the chief executive officer of the Victorian Partnership for Advanced Computing (VPAC), Appelbe is developing a commercial web for the spread of high-performance computing in the state. He says the VPAC consortium, which has 170 customers including six state university backers, is growing and will form the basis of a national computational grid.

"At the moment we're mostly acting as a service centre, meaning people dial into our machine from any of the six universities. They have free access to it and they download the resource to their machine," Appelbe says. "They run the program on our machine, that generates data that is downloaded to their machine where it can be visualised."

Most of the applications running on VPAC's 128-processor Compaq Alpha cluster, which runs Tru64 Unix, are scientific, engineering or business-related.

"There are three parts to that equation: how do you connect these computers together securely ... how do you provide services on top of this and how do you provide enough bandwidth to make it work," Appelbe says. "The holy grail of the grid is to make computers like the telephone system; you can dial into a phone anywhere in the world and talk as fast as you like."



This story was found at: http://it.mycareer.com.au/news/2001/10/09/FFXMLM06JSC.html