Much of the underlying architecture is based on the Information Technology Infrastructure Library, a set of common practices used across areas such as change management, configuration management, and release management.
The deployment of autonomic-computing capabilities over the past year has let Carey Capaldi, product manager for the content-management system at Technicolor Creative Services, cut by 40% the time he spends manually digging through system-failure logs to understand why a problem happened. It also has let him create an automated way to redeploy jobs that otherwise would stall for hours.
Technicolor Creative Services provides content-management capabilities to other business units within Technicolor, a major manufacturer and distributor of videotapes and DVDs, and resells those capabilities externally. Technicolor Creative Services--a subsidiary of Thomson, which provides technology and services to the entertainment and media industries--offers services such as managing media files, including reels of film; encoding pay-per-view movies; and creating DVDs.
When Capaldi assigns jobs, a variety of events can trigger a failure and, historically, that has resulted in suspension of the job. In the majority of cases, once the failure is detected, the job can be restarted manually from the suspended queue and finished without further incident. However, many of the jobs had to run overnight, and if there was a disruption then, they could remain suspended until the problem was discovered the following day.
IBM contacted Capaldi and asked him to be a guinea pig in its autonomic-computing effort, using IBM's Autonomic Management Engine framework and Common Base Event, which together monitor system resources, correlate information from various infrastructure components concurrently, and automatically determine the root causes of failures.
Using a log and trace analyzer tool, Capaldi can instantly gain access to custom logs that provide a detailed look at why failures happened. Taking such a look across his jobs to see specific points of failure saves time, he says. The real autonomic feature, however, is that the system can now resubmit the stalled job under specific criteria without Capaldi or his staff intervening.
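The resubmission behavior described above can be illustrated with a minimal sketch. It is not IBM's or Technicolor's implementation: the `Job` class, the failure-reason names, and the retry limit are all hypothetical, chosen only to show what "resubmit under specific criteria" might look like in code.

```python
from dataclasses import dataclass

# Hypothetical criteria: which failure reasons are considered safe to retry,
# and how many times a job may be resubmitted before a person must look at it.
RETRYABLE_REASONS = {"network_timeout", "storage_unavailable"}
MAX_ATTEMPTS = 3


@dataclass
class Job:
    name: str
    failure_reason: str  # reason recorded when the job was suspended
    attempts: int = 0    # automatic resubmissions so far


def should_resubmit(job: Job) -> bool:
    """Return True if the suspended job meets the (assumed) retry criteria."""
    return job.failure_reason in RETRYABLE_REASONS and job.attempts < MAX_ATTEMPTS


def process_suspended(queue: list[Job]) -> tuple[list[Job], list[Job]]:
    """Split the suspended queue into jobs to resubmit and jobs held for staff."""
    resubmitted, held = [], []
    for job in queue:
        if should_resubmit(job):
            job.attempts += 1          # count this automatic resubmission
            resubmitted.append(job)
        else:
            held.append(job)           # leave for manual review
    return resubmitted, held
```

In a sketch like this, an overnight job suspended for a retryable reason would be restarted without waiting for staff, while jobs that fail for other reasons, or that keep failing, stay in the suspended queue for manual review.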
Technicolor Creative Services traditionally has written a lot of in-house software to aid in its effort to archive and manage the large amounts of digital content it handles, Capaldi says. In the future, he plans to create specific log files in new software that can be optimized to work with IBM's autonomic tools.
"Right now, a lot of this has to be tailored to exactly how you work as a company, and it would be nice if it was more off-the-shelf," he says.
Capaldi is ready to move further down the autonomic path. "In a heartbeat," he says. "I think there's a ton of potential that hasn't been tapped yet. Over the years, I've worked with a lot of bleeding-edge technology that eventually just didn't go anywhere, but this is an industry where you need to push the envelope."
The president and chief executive of LAN Solutions Inc., Victor Kellan, agrees. The company, which provides network-management services, saw growing opportunities to provide remote monitoring to customers as a managed service. Through trial and error, it built a network operations center. But as LAN Solutions grew, it experienced difficulties in quickly scaling the center to handle growing amounts of data going through the system.
When a problem happened, depending on its type, location, and complexity, it could take experts from several different areas to parse through thousands of log entries from databases, applications, Web servers, operating systems, or other network devices to find the problem's starting point and then determine a course of action. Typically, problem resolution was a time-consuming task accomplished by several people, each familiar with a specific type of log file.
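To make the manual task concrete, the sketch below merges entries from several logs into one timeline and picks out the earliest error as a candidate starting point. It is only an illustration: the `LogEntry` fields, the `"ERROR"` level name, and the assumption that each log is already parsed and sorted by time are hypothetical, and real root-cause analysis involves far more than finding the first error.

```python
import heapq
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    # Hypothetical normalized record; real logs differ in format per source.
    timestamp: datetime
    source: str   # e.g. "database", "web", "os"
    level: str    # e.g. "INFO", "ERROR"
    message: str


def merge_logs(*logs: Iterable[LogEntry]) -> list[LogEntry]:
    """Merge per-source logs (each already sorted by time) into one timeline."""
    return list(heapq.merge(*logs, key=lambda entry: entry.timestamp))


def first_error(timeline: Iterable[LogEntry]) -> Optional[LogEntry]:
    """Return the earliest ERROR entry in the timeline, or None if there is none."""
    for entry in timeline:
        if entry.level == "ERROR":
            return entry
    return None
```

Putting every source on a single timeline is what lets one person, rather than several specialists, see which component reported trouble first.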