Bill Gates, Microsoft's chairman and chief software architect, spoke with InformationWeek about the companywide effort to improve the quality and security of Microsoft's products.
Bill Gates, Microsoft's chairman and chief software architect, was interviewed May 10 by InformationWeek's editor/print, John Foley, and executive editor of features, Chris Murphy. Following is an excerpt in which Gates discusses Microsoft's companywide effort, prompted by his January memo on "trustworthy computing," to improve the quality and security of Microsoft's products.
INFORMATIONWEEK: To what extent does the effort to develop better software go beyond the walls of Microsoft to become an industrywide cooperative effort? For example, a new group called the Sustainable Computing Consortium, spearheaded by professors at Carnegie Mellon University, has just been formed to address some of these issues.
GATES: There's several aspects that go beyond just what Microsoft needs to do. First of all, we've been promoting and pushing universities to do research on software testing and software reliability for eight years now, and we've had some success in that, but in some ways it wasn't the sexy topic. Actually figuring out how to do something is sexier than figuring out how to make it very reliable. But there have been contributions coming from the universities. The Carnegie Mellon effort you mentioned. David Patterson at Berkeley--he and John Hennessy did the original explanation of why RISC was important--his big focus now is this notion of recovery, that if something goes wrong in a system, how do you make it basically trivial to get back to the recovered state? How much time does it take? How easy is it in terms of personnel? And it's very insightful stuff. I'd encourage you to look at it because his slides are actually very approachable and yet very deep.
And there are many things that are industry standards that need to be enhanced in the face of this challenge. For example, take the E-mail protocol SMTP. There's no way to know if a message comes from who it appears to come from. It's almost as if everybody could ring the fire alarm whenever they felt like it because the mail protocol doesn't verify, just doesn't include enough information to verify that the sender is a valid sender. So you've got to have an industry-standard protocol that's much stronger there.
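The SMTP weakness Gates describes can be illustrated with a minimal Python sketch. It only builds a message and the client side of a basic SMTP exchange in memory; nothing is sent. The addresses, host name, and function names are hypothetical, chosen for illustration. The point is that both the `From` header and the envelope sender in `MAIL FROM` are plain strings supplied by the client, and the base protocol gives the receiver nothing to check them against.

```python
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str, body: str) -> EmailMessage:
    """Build a message whose From header is whatever the caller claims."""
    msg = EmailMessage()
    msg["From"] = claimed_sender  # never checked by the base protocol
    msg["To"] = recipient
    msg["Subject"] = "Fire alarm"
    msg.set_content(body)
    return msg


def smtp_client_commands(envelope_sender: str, msg: EmailMessage) -> list[str]:
    """Return the commands a client would issue in a basic SMTP session.

    The envelope sender is independent of the From header, and it is
    also just a string chosen by the client.
    """
    return [
        "HELO client.example",  # hypothetical client host name
        f"MAIL FROM:<{envelope_sender}>",
        f"RCPT TO:<{msg['To']}>",
        "DATA",
    ]


if __name__ == "__main__":
    message = build_message(
        "fire.marshal@example.com", "staff@example.org", "Evacuate now"
    )
    for line in smtp_client_commands("anyone@example.net", message):
        print(line)
    print(message["From"])
```

In this sketch the header sender and the envelope sender do not even agree with each other, and nothing in the exchange would flag that.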
A key point to make on this is that what the industry is doing, what Microsoft and the industry have to do in terms of the levels of reliability and security we need to achieve, they're unprecedented. These are things that were never solved in the past in any domain--not mainframe, Unix, you name it. Computer systems were inward facing. It's when you get the outward-facing pieces--E-mail, home and remote access, and your Web site--that now the system is out there and strangers can come in and do valid things, but they can also come in and attempt to do invalid things.
And so how have we dealt with these things? First of all, we have authentication. Authentication is done with passwords. Passwords are a great weak link. People pick guessable passwords, they don't change their passwords, and it's just not an adequate approach. So there will be a transition (that won't happen overnight) but a transition to where IT authentication both when you're in your office, but particularly when you're remote, will be like a cash machine where you have the physical smart card and then a short password to go with it. And that increases the reliability a great deal.
Another industrywide thing is we have these things called firewalls. People think I've got my firewalls up, I must be safe. Well, that's not right because the thing that's punched through the firewall is HTTP. Well, all the things that you were blocking in the past that came through some non-HTTP thing are all coming through on HTTP because you can always take a protocol and embed it inside another protocol. So HTTP is the problem now because it's the only thing that firewalls let through, so people just use that.
So there's a need to take these firewalls and make them both far more sophisticated and allow applications to educate the firewall on what's allowed so you don't just say, "OK, let's read HTTP"; what you say is, "OK, we have this sales order out and certain types of HTTP requests are valid for it." And actually, as part of developing the application, you get the module that should run in the firewall and do the filtering in the firewall. So it's a big architectural advance that we're promoting--in what we call distributed firewall and firewall management--that has to be far better than it is today. And so the standard that we've got out there is this Firewall Control Protocol that does the things I'm describing, and we propose to put that into the standards groups.
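The idea of an application "educating" the firewall can be sketched in a few lines of Python. This is not the Firewall Control Protocol Gates mentions, whose details are not given in the interview. It is a hypothetical illustration in which an application registers the request shapes it considers valid, and the filter rejects every other HTTP request instead of passing all HTTP traffic. The class names, application name, and URL patterns are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One request shape an application declares as valid (hypothetical)."""
    method: str        # HTTP method, e.g. "GET"
    path_pattern: str  # regular expression the whole path must match


class AppAwareFilter:
    """Deny by default; allow only what registered applications declare."""

    def __init__(self) -> None:
        self._rules: dict[str, list[Rule]] = {}

    def register(self, app_name: str, rules: list[Rule]) -> None:
        # The application ships its rule set alongside its own code.
        self._rules[app_name] = list(rules)

    def allows(self, method: str, path: str) -> bool:
        method = method.upper()
        return any(
            rule.method == method
            and re.fullmatch(rule.path_pattern, path) is not None
            for rules in self._rules.values()
            for rule in rules
        )


if __name__ == "__main__":
    firewall = AppAwareFilter()
    firewall.register(
        "sales-orders",
        [Rule("GET", r"/orders/\d+"), Rule("POST", r"/orders")],
    )
    print(firewall.allows("GET", "/orders/123"))   # a declared request shape
    print(firewall.allows("POST", "/admin/shell"))  # not declared by any app
```

The design choice worth noting is the default: a request that no application has declared is refused, which is the reverse of a firewall that lets anything through as long as it is HTTP.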
Another thing that we did that wasn't very visible at the time we did it--it was about three years ago--the guys inside the company said that automatic software verification had advanced, that this one company had advanced enough to really make it worthwhile. And they said: Let's just spend $50 million, buy these guys, and totally combine them with the Microsoft research guys doing a similar thing and really get it to critical mass, and use it for a number of years on our own stuff to improve it, and then put it out as a product. And this is where you "prove" programs--can they have a buffer overrun? can they have an invalid condition?--which has been a Holy Grail.
When I was a computer science student, which is now 25 years ago, we thought we were on the verge of proving programs at Harvard. I thought, oh, I'm going to do this commercial thing, I'm going to miss all these nice advances in proving programs in AI. But these proved to be very tough problems. Anyway, so we bought this software tool called Prefix, and we have used it internally and really refined this thing, and now we're getting it out into our tools. So there are a lot of these things that we need to get industry standards, industry practices.
And then a lot of it is taking the big R&D budget we have and making sure that--we're in a position--we're really the only company where we will be able to say that our operating system is secure because--partly because of these methodological breakthroughs and partly because the amount of outside exercise against it is very extreme. And that's a good thing, because 99% of that is benign stuff and we think it's great. As long as people share what they've learned, that's good. And we're 24 hours a day ready for anything that comes up.
We've been very focused on that, but the notion that we really needed to secure everything against malicious attack and how complex that was, it's really been in the last two years that there have been a number of groups in the company saying that people have good ideas about it, talking about how you do it.
So my memo is sort of like my Internet memo, that is, it's not the first memo on this subject. People don't get this memo and say, oh, security, how do you spell that? How many letters in that? Rather, it's empowering the people who have been championing that and making it clear that they're right about where it should be prioritized and highlighting some of the progress we've made. Then, we direct some of the product groups so that security expertise isn't just for the 15% of the group that likes to think about that stuff. It's 100% of the engineers. So that was a change in terms of taking one of the books that Microsoft wrote and really teaching the methodologies and sort of the paranoia about design and the verification techniques and how you make your software so you can do these either automatic or manual verification of its security properties.
INFORMATIONWEEK: When we were on Microsoft's campus in March, it became clear that some people there think they've been developing quality software for years; in other words, it's not a new concept to them. And we got a sense that a few people were a little bit defensive about it.
GATES: Well, let's separate out quality from security. Microsoft in terms of this quality stuff--we have as many testers as we have developers. And testers spend all their time testing, and developers spend half their time testing. We're more of a testing, a quality software organization than we're a software organization.
When we do a new release of Windows, which is, say, a billion-dollar effort, over half that is going into the quality. And these guys know that every release we've made of Windows--both by moving up to an NT kernel in the secure design, and by more advanced test methodologies--we've been raising the bar on quality every year, and we love to have people compare our quality to other people's quality. We will win in that any day. But our stuff is so broad that every delta of improvement is a big deal.
Now, one of the challenges we face in terms of how it all comes together is when you're running software on a PC there are pieces. There's a BIOS, there's device drivers, there's Windows, and there are third-party applications. And we run across very diverse hardware, peripherals, applications ... the most diverse thing in the world. And that's incredibly valuable. When you buy a PC you can buy it knowing that, hey, if it runs Windows, it runs your applications. The benefit to the customer is the openness of the PC hardware space. Nobody needs to come to us for permission to release an application or a device driver or a peripheral. They just do it. But to some degree that variety makes it so that sometimes when those pieces come together they don't all work perfectly together.
Now, we solve that problem or addressed that problem pretty dramatically with DataCenter Server, where we give up some of the flexibility. We require that it be certified drivers that are tested to a very precise methodology. It has to be among a very small set of hardware systems. With DataCenter, we've made that trade-off.
The thing we did with Windows XP is we put in this reporting system where whenever an application gets a system error or whenever the system gets a system error, it offers to report that back to Microsoft and people can choose to or not; about 70% of people do. Then we can see exactly what's going on. And so within a month of having this out there it was clear that the quality of video drivers was a problem. Every two or three days, they wouldn't work.
So now we're getting these reports that are very statistical. The way it works is: We get information about what the configuration of that system was, and then if that's not enough for us to track down the problem, what we do is when the report comes in we look at the information and then we can ask the system that's reporting, oh, because this is type X, we need more information and it goes and gets that information. There's all these privacy things we do to make sure it's not invasive in any way.
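The statistical use of error reports can be pictured with a small Python sketch. The report fields and sample data are hypothetical and do not reflect the actual format of Microsoft's reporting system. The sketch groups incoming reports by the component that failed, so that a recurring culprit such as a video driver rises to the top of the list.

```python
from collections import Counter

# Hypothetical reports; field names are assumptions for illustration.
SAMPLE_REPORTS = [
    {"os": "Windows XP", "faulting_component": "video_driver"},
    {"os": "Windows XP", "faulting_component": "video_driver"},
    {"os": "Windows XP", "faulting_component": "word_processor"},
    {"os": "Windows XP", "faulting_component": "video_driver"},
    {"os": "Windows XP", "faulting_component": "network_driver"},
]


def bucket_reports(reports):
    """Count reports per faulting component."""
    return Counter(report["faulting_component"] for report in reports)


def top_offenders(reports, n=3):
    """Return the n components with the most reports, most frequent first."""
    return bucket_reports(reports).most_common(n)


if __name__ == "__main__":
    for component, count in top_offenders(SAMPLE_REPORTS):
        print(component, count)
```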
We were able to make a huge difference within a few months of Windows XP being out, because now we have a complete feedback loop where we see this and we have this Windows update thing in the lower right-hand corner of Windows, every week or so it will say, hey, I've got updates, and you can just say, do I want them? Do I just want the reliability ones and security ones, or do I also want the new feature ones? Whatever your preference is. And because systems are overwhelmingly Internet connected, you've completed the feedback loop. So somebody has a problem, we see it, we fix it, that goes out to the client machines.
We're also doing that for servers. It's a tiny bit more complicated in terms of IT policy and making sure that it works exactly right. So this statistical notion of, OK, to what degree are third-party drivers not perfect because they're in a hurry to get that driver out? They're not thinking of the overall PC ecosystem quite like we do or maybe Intel does. To win the benchmark they say, maybe if I did this it would be faster on the benchmark. OK, fine. I'm not saying they're irresponsible, although some of them are, because there hasn't been a mechanism for catching this where it all comes together. So now there is.
INFORMATIONWEEK: You kind of reiterated what we heard a couple of months ago, which is that Microsoft people feel they already develop quality software and when there are problems they're not always your problems. But at the same time, would you agree that there's room for improvement in the quality of the products that Microsoft puts out?
GATES: On an absolute basis, yes. Our software products are the most-used products in the world, and the beauty of that is even selling them at very low prices we can afford to put $5 billion a year of R&D into them, and a substantial part of that R&D goes to making those products higher quality and higher security. And ours is the only company you can go to and you'll find research people doing research on testing breakthroughs. This data mine is wonderful for the research guys, too, to look at different approaches and where they would pay out. But yes, no doubt. Part of the appeal of Windows XP is it is a step up, it's quite a step up in that dimension because the kernel, the so-called Windows NT kernel, was always designed to have a level of separation between things that leads to a more reliable system.
INFORMATIONWEEK: What would you point to as evidence that Microsoft's products are getting better? How do you measure quality?
GATES: There's a million ways to do it. You look at server uptime. Do they stay up a year if they're doing certain things? If somebody reboots, why did they reboot? Do they get system crashes? Do they get app crashes? Do they get unexpected behavior?
By using the software on the Internet, we want to have a red button, yellow button, green button. Red button is what we're already doing. If an app doesn't work, it gets a system error, then we get the report. So we've got a great feedback loop from that, and people are going to be very surprised at the benefits of that. Yellow button is where your system kind of frustrates you, and you want to push it and say, "Hey, I'd like this to be better." And then green button, which some people will never push, is when you really like something, you can tell us that, too.
INFORMATIONWEEK: If we were to ask you in January of next year, what do you have to show the world in terms of measurables for your efforts?
GATES: Oh, we'll have tons of statistics about server uptime, client uptime. We'll have objective statistics like that. We'll have subjective data surveying users about how they view their experience. We'll have subjective data about talking to IT people about their experience.
Part of the reason this is so urgent is not only is it important for what people are doing today, but it's critical for what we're asking them to do in the future. That is, this whole Web-services thing is the notion of the digital enterprise, where all your collaboration and activity, the state of your business transactions, is in the computer, and to ask people to do that, they have to view it as a very reliable system.
I make the analogy in the trustworthy computing memo to phone system or electricity system, where you have a huge dependence on those things. And no, they're not 100% reliable, but they're so reliable that you basically plan your day and your actions without thought that New York is going to have another blackout or something. And computing will only achieve its potential if our systems, both in reality and perception, have that level of capability.
INFORMATIONWEEK: On a scale of one to 10, where one is unsatisfactory and 10 is high satisfaction, how would you rate the overall software quality of your company's products today?
GATES: It's a very subjective number. I mean, is it as good as people want? One. Is it good compared to other people's software or what we were doing three or four years ago? I'll give us a nine. But it's the most subjective question in the world, and, hey, the customer is always right. Are they going to rank us any worse on that number now than they did four years ago? No. Are we doing a dramatically better job on this stuff now than four years ago? Yes.
But are people more ambitious about how they're using their systems? Take E-mail. If E-mail isn't working, it's a real drain on productivity. In some ways it's one of the most mission-critical systems. That's one of these ironies that people say: Oh, we've got to be careful about mission-critical systems. Well, you're running all your knowledge-worker productivity--shared files, E-mail, you name it--they're running against our infrastructure. That's mission critical. If that stuff can't be relied on there, that's real cost.
Most everything is mission critical now. And most everything has to run 24 hours a day. So all the things that Tandem and Stratus had, a commodity Windows server platform now has. So the answer to that question is going to depend on if you ask the guy 10 minutes after his PC just crashed, he'll tell you what he thinks. Then if he thinks back to: You know, XP, wow, this is a lot better than I've ever had before, and Microsoft the way they're propagating these updates now so that I don't have to pull and get updates.
Part of the irony of last year's virus attacks was that the fixes were all there, but because we weren't automatically distributing the updates, they weren't applied. That doesn't shift the responsibility away from us. It just says we've got to make sure the updates propagate to these servers automatically. It's just like the Outlook attachment. We made it so that when you open an executable attachment, you get this big warning that says, "Be careful, be careful, be careful, don't open this," and people just ignored that. So that was our solution three years ago; people didn't like that.
So our solution two years ago was, hey, strip out executables. Outlook Exchange will not pass through executable things. That works. But does it work to just have the updates be on our site and assume people come get them? We thought it would but the nature of the problem, the nature of behavior is such that's just not good enough. And that's why this Windows update infrastructure is pretty important.
So if you get a customer who has now got the new Windows update infrastructure and thinks, "Microsoft really did the mea culpa on that," they might give us a high rating. So you're going to get a high variance when you ask a question like that.
INFORMATIONWEEK: We asked our readers if they felt that the products from their most strategic software providers were trustworthy, somewhat trustworthy, or untrustworthy. Only 37% said trustworthy.
GATES: I wouldn't have guessed it would have been that high. I would have guessed like 25%. But it's very subjective. If you know that you're about to put all your business transactions there, and you say, "Whoa, do I really know?" And of course these systems are complex enough that it's not an IT person's job to get out the source code and look at it. But they have to believe in our methodology and our process and our quality assurance. That's billions of dollars that can't be replicated anywhere else. And the way that people want to use these systems, it's just raising the bar and is the industry and Microsoft at that bar? No. That's what the memo is about. Certainly, Web services won't happen the way we want unless we raise the bar.