I just got back from MIX08 in Vegas where I gave a talk called “Everything You Need To Know About Debugging and Diagnostics with IIS7.” It was a fun talk. There were a few IT administrators and a whole lot of developers. When we have talked about this topic with customers in the past, we have focused on the IT administrator perspective. This time, we took it from a different angle, which is to think through when and how you would leverage the diagnostic platform as a developer.
At the end of the day, both developers and IT professionals use most of the same tools (with a few differences to be called out in this post). The primary difference is the environment and the dev lifecycle in which the tool is used.
Advice From the Front Lines (Or, What I Learned From Bret, our IIS Escalation Engineer)
Before I went to MIX, I talked to Bret, the IIS Escalation Engineer who works closely with our product team, asking for his input on how he diagnoses problems with IIS apps. I figured that he has a great perspective on this, since he has seen all sorts of problems in all sorts of configurations. Bret provided a list of diagnostic patterns that I included in my presentation at MIX, namely:
Don’t assume something is happening. Do get facts. (Using features like performance counters, event counters, FREB logs, RSCA)
Don’t go down the “switch this setting and see” path. Do find the root cause, and then resolve the problem there. (Again, FREB, detailed errors, RSCA)
Do isolate the problem and get a good problem statement. (ditto on tools)
When you think about it, these three concepts are both critical to effective troubleshooting and amazingly hard sometimes when you are in the thick of figuring out why your application just won’t work the way you want it. It all comes down to figuring out the root cause of your problem. Bret talked about one case where someone he worked with once spent 2 hours trying to troubleshoot a server unavailable error by getting netmon traces and performing a whole host of other diagnostics. The root cause was that the w3svc was not started.
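The w3svc story is a good argument for checking the boring facts first. Here is a minimal sketch in Python of that first check, assuming you have captured the text that `sc query w3svc` prints and that it uses the usual English-language layout with a `STATE` line. The function name and the sample output are mine, for illustration only.

```python
import re

# Sample text in the layout that `sc query w3svc` typically prints.
# This is an illustrative sample, not output captured from a real server.
SAMPLE_SC_OUTPUT = """
SERVICE_NAME: w3svc
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
"""


def service_state(sc_query_output):
    """Return the state name from the STATE line, or None if there is none."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_query_output, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    state = service_state(SAMPLE_SC_OUTPUT)
    if state != "RUNNING":
        print("w3svc is", state, "- start the service before reaching for netmon")
```

On a real server you would feed the function the captured output of the command rather than the sample string.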
Once you have determined the root cause of the problem, you can isolate it even further to figure out what you need to fix. Bret gave another example: if a customer is having problems with Windows Auth, try using NTLM only. Does that work? If so, then you know the problem is with the Negotiate and Kerberos config.
That’s what the IIS7 platform diagnostic tools are all about – narrowing down to root cause and isolating the problem. Our diagnostics are built into IIS and integrate with ASP.NET Trace events and System.Diagnostics.
2 Very Useful Diagnostic Tools in the Dev LifeCycle
Detailed Errors and Failed Request Tracing are two features that apply all the time. Detailed Errors you just “get” for free with the Web server. Failed Request Tracing requires a tiny amount of work to use – namely, setting up rules and looking at logs.
When you are developing a web app, you are constantly checking to see if it works in the browser. Me, I am prone to configuration errors, especially if I get myself in trouble messing around in the applicationhost.config or web.config file directly – I stay in safer waters when I use the IIS Manager or appcmd to modify configuration. (AppCmd backup has become a reflexive twitch for me before I start to fiddle with an app.) I tend to see a lot of 500.19 errors with wrong configuration syntax. Detailed Errors are smart enough to tell me the exact place in the configuration file that is not working.
Failed Request Tracing is incredibly useful to pinpoint what is failing where in the request pipeline. You can set up rules that trace a range of errors (like 400-500). Combined, the two tools are pretty handy for the quick “gut check” when you are developing your code. Since ASP.NET Trace events and System.Diagnostics data are also integrated in the Failed Request Tracing logs, you can use the single log file to view the data that you get from HTTP errors, ASP.NET Traces, and System.Diagnostics calls.
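Once you have a folder full of Failed Request Tracing logs, a quick tally by URL and status code is often the fastest way to spot a pattern. The sketch below is in Python and works on XML text. It assumes the root element of each log carries `url` and `statusCode` attributes, so check those names against one of your own log files before relying on it. The sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up stand-ins for log files; real logs contain far more detail.
SAMPLE_LOGS = [
    '<failedRequest url="http://localhost/app/default.aspx" statusCode="500" />',
    '<failedRequest url="http://localhost/app/default.aspx" statusCode="500" />',
    '<failedRequest url="http://localhost/app/missing.gif" statusCode="404" />',
]


def summarize_failed_requests(xml_documents):
    """Count failed requests per (url, status code) pair."""
    counts = Counter()
    for text in xml_documents:
        root = ET.fromstring(text)
        # Assumed attribute names; verify against a real log file.
        url = root.get("url", "<unknown>")
        status = root.get("statusCode", "<unknown>")
        counts[(url, status)] += 1
    return counts


if __name__ == "__main__":
    summary = summarize_failed_requests(SAMPLE_LOGS)
    for (url, status), hits in summary.most_common():
        print(hits, status, url)
```

To run it over files instead of strings, `ET.parse(path).getroot()` gives you the same root element for each log file.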
When you move to functional testing, you can add on a couple more layers of diagnostics, namely WCAT, orphaning processes, and some interesting performance counters.
Functional Testing with Web Capacity Analysis (WCAT)
The Web Capacity Analysis Tool (WCAT) is for stress testing. It’s amazing how few people know about WCAT. This is the tool that we use internally for our own performance testing. It lets you throw all kinds of load against your dev box (or production server) to simulate high load. And it’s free. You can get it from the IIS6 Resource Kit or download it from the www.iis.net Download Center.
Orphaning Failed Processes For Debugging
You might want to orphan failed processes, which basically means making sure the server doesn’t kill off a worker process that hangs. By default, the Web server automatically recycles a failed process by ending it and starting a replacement. If you enable process orphaning, the Web server leaves the failed worker process running and starts a new process in its place.
You can also configure the orphaned process so that the server can run a command, like launching a debugging tool, on the orphaned process. (The Executable attribute is how you could attach a command to the orphaned process in the UI and orphanActionExe is the config setting.)
Monitoring Performance Counters
Performance counters are handy for functional testing. Mukhtar, who does our performance testing for IIS, looks at different data for managed and unmanaged requests.
For unmanaged requests, Mukhtar looks at "Processor(_Total)\% Processor Time". If this counter falls well short of 100%, Mukhtar starts digging. He also looks at MBits/sec in the wcat log summary section to see if the network was maxed. Another counter he checks out is "System\Context Switches/sec", which is basically Context Switches/Request in the wcat log summary section, to learn if the problem is scaling. And of course, "Memory\Available MBytes" is important because it tells you if you ran out of RAM.
When working with managed requests, Mukhtar pays close attention to all GC-related counters to see if there are any unusual spikes.
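If you collect these counters during every test run, you can turn Mukhtar's rules of thumb into a small check. This is a sketch in Python under my own assumptions: the samples arrive as a dictionary keyed by counter path, and the two thresholds are hypothetical numbers that I picked for the example, not guidance from the perf team.

```python
CPU_COUNTER = r"Processor(_Total)\% Processor Time"
MEMORY_COUNTER = r"Memory\Available MBytes"


def review_counters(samples, cpu_floor=90.0, min_available_mb=256.0):
    """Return plain-text findings for one set of counter samples.

    cpu_floor and min_available_mb are hypothetical thresholds chosen
    for illustration; tune them for your own hardware and workload.
    """
    findings = []
    cpu = samples.get(CPU_COUNTER)
    if cpu is not None and cpu < cpu_floor:
        # Under stress load the CPU should be close to saturated.
        findings.append(f"CPU at {cpu:.0f}% under load: look for a bottleneck elsewhere")
    available = samples.get(MEMORY_COUNTER)
    if available is not None and available < min_available_mb:
        findings.append(f"Only {available:.0f} MB available: the box may be out of RAM")
    return findings


if __name__ == "__main__":
    run = {CPU_COUNTER: 55.0, MEMORY_COUNTER: 100.0}
    for finding in review_counters(run):
        print(finding)
```

The check only makes sense for samples taken while the stress load is running, since an idle server will always show a low CPU number.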
In addition to these counters, there are also IIS-specific counters that are useful to monitor. I would check out the IIS6 perf counters to watch (http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/71490aae-f444-443c-8b2a-520c2961408e.mspx?mfr=true), which are still valid, as well as some of the newer ones in IIS7 that I talked about back in January: http://blogs.iis.net/mailant/archive/2008/01/10/new-worker-process-performance-counters-in-iis7.aspx.
Other Tools in Production
Once you are running your code in production, all the same diagnostic tools apply plus additional tools like the Runtime Status and Control API (RSCA). RSCA lets you see requests that are in-flight and taking more than 0 seconds to execute.
You can view RSCA data through the IIS7 Manager or using appcmd (“appcmd list requests”). Unless you are using WCAT for stress testing (or demoing, in my case), you typically don’t use RSCA when you are developing and in functional test mode because you aren’t generating enough load to see where a request is getting bogged down in the pipeline.
Another tool that you can use in production is Event Tracing for Windows (ETW). You can use our LogParser (also in the IIS6 resource pack) to parse the verbose logs that you get with ETW. Bret, our IIS escalation engineer, recommends using ETW for complicated issues with a specific request. He also recommends DebugDiag, a tool in the IIS6 resource pack that lets you find memory leaks, process terminations, and slow responses.
Best Practices
For the developing phase (which includes functional testing), I called out:
Use failed request log files to find error patterns and offending URLs, along with detailed errors
Consider using process orphaning
Use ASP.NET health monitoring to receive configurable alerts about errors
Use WCAT to stress the application before production
When you are tuning, you should consider:
Use design patterns in the Performance Tuning whitepaper (here’s the link that I forgot to add to my deck and promised the audience that I would put in this blog: http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/Perf-tun-srv.docx). This isn’t just about tuning IIS, though, it includes all of Windows Server 2008.
Enable Output Caching for semi-dynamic pages. In my demo, we saw an enormous difference in performance by moving all content with a .php extension to kernel mode caching.
Set 32 bit IIS worker processes in Wow64 mode in per-AppPool settings (there was a typo in the deck that didn’t include the very important 32 bit part, sorry)
If you * script-mapped all requests to ASP.NET in IIS6, Integrated Pipeline is much faster than an IIS6 * scriptmap solution. Try it together with IIS7 URL Authorization.
Move your highly trafficked default document to the top of the Default Documents list so that the Web server doesn’t always go down the list to find it.
While doing production-level troubleshooting, it is good to:
Use Failed Request Tracing to capture hard-to-repro errors. Keep the rules on in production to make sure that you’re able to react quickly.
Set fine-grained Failed Request Tracing rules to keep your log history valid. By default, the limit to Failed Request Tracing logs is 50. After that, they start to recycle. You can expand that limit to a higher number if you would like. Best practice, though, is to build the right rule set.
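To see why the rule set matters more than the limit, here is a small Python simulation of a capped log folder. It assumes the oldest log is the one that gets dropped once the limit of 50 is reached, and the traffic mix (one 500 error for every hundred requests, the rest 404s) is invented for the example.

```python
from collections import deque

MAX_LOG_FILES = 50  # the default limit mentioned above


def retained_logs(requests, rule_matches, limit=MAX_LOG_FILES):
    """Return the logs still on disk after all requests have been processed."""
    logs = deque(maxlen=limit)  # appending to a full deque drops the oldest entry
    for request_id, status in requests:
        if rule_matches(status):
            logs.append((request_id, status))
    return list(logs)


# Invented traffic: request 0, 100, 200, ... fail with 500, the rest are 404s.
REQUESTS = [(i, 500 if i % 100 == 0 else 404) for i in range(1000)]

broad = retained_logs(REQUESTS, lambda status: 400 <= status <= 599)
narrow = retained_logs(REQUESTS, lambda status: status == 500)

if __name__ == "__main__":
    print("broad rule keeps", sum(1 for _, s in broad if s == 500), "of the 500s")
    print("narrow rule keeps", len(narrow), "of the 500s")
```

With the broad rule, the 404 noise pushes every 500 out of the folder. With the narrow rule, all ten 500s are still there when you come looking for them.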
Check key performance counters for clues on application health, which I mentioned earlier.
One of the things that I talked about today is the new modular platform for IIS7. You can replace one of your modules with your own, or extend the platform by building in new functionality. That’s what we are doing with our IIS extensions like secure FTP Server for Windows Server 2008. When you build on the new IIS7 platform, you get the benefit of other features in the platform like remote delegated administration (!) and the diagnostics functionality.
If you write your own module that enables virtual directory hosting, you can use Failed Request Tracing and all of the other diagnostics that I talk about in this blog with it. For FastCGI-compliant application frameworks like PHP and Ruby, there are a couple of diagnostics idiosyncrasies to note:
In Detailed Errors you would see potential errors from the FastCGI module and not a particular application.
Your PHP application will not generate a Failed Request Tracing event unless the event occurs before entering the PHP engine or after leaving the PHP engine. Same thing with ETW. Since the PHP engine doesn’t emit Failed Request Tracing events or ETW events, the app will be pretty much silent while being processed by the PHP engine itself.
Questions from the Audience
The Q&A time after the session was a lot of fun. I know that I am not capturing all of the questions here, sorry about that -- in particular, I can’t remember all the questions that came up during the presentation itself. But here is a list of some of the questions that I caught at the end of the talk:
How do I migrate my old IIS6 applications to IIS7? Try using the CTP for Web Deploy, downloadable here. Make sure you grab the walkthroughs (under the Documentation link on the download page) – the walkthroughs are basically discrete labs on common tasks with Web Deploy.
Are you doing anything with PowerShell? Yes, we are working on a Technical Preview for a PowerShell provider for our configuration system.
Can you use kernel mode caching on content that requires authorization? No, if the content requires authorization, you need to use user mode caching, which can be configured on the same page as kernel mode caching.
I have a lot of old ADSI scripts that create web sites in my hosted environment. When the scripts create the new sites, are they IIS7 sites or IIS6 sites since it’s using a legacy technology? If the script is creating a new site, it should be an IIS7 site. To get the scripts to run, though, you will need to install the IIS legacy components which are an optional install in setup.
Why does IIS always bring along the legacy components? The IIS web server doesn’t require the legacy components for its core functionality. Other functionality will bring it along, though. For example, if you are using SMTP with your IIS server, you will need to install the legacy components for IIS so that SMTP will work (it needs the metabase).
Is there any migration between the metabase and the new XML configuration system? No, we do not have a conversion tool that transforms the metabase into the right config files. We will focus our efforts on a great Web application and server migration story that takes the config settings from one store to the other, rather than trying to convert the storage itself.
What is the relationship between the error dialogs that I pop in my app and Detailed Errors? There is actually no correlation between the app error dialogs and Detailed Errors. Detailed Errors are all about HTTP error codes and mapping the error code to the probable cause. Your error dialogs are likely to be application logic-specific.
Can I view Detailed Errors remotely? No, Detailed Errors can only be viewed on the local server.
Can IIS6 applications work on IIS7? Yes. ISAPI apps will run in Classic Mode without modification.
If you are a developer and you get a feature delegated remotely to you, do you have a limited view scoped down to just what you are allowed to do?
If there are any of the MIX08 IIS diagnostics talk attendees out there, thanks for coming and listening! You were a great crowd. I will try to get the video posted once that is out.