FastCGI: Debugging "The FastCGI process exited unexpectedly"
FastCGI: Debugging "The FastCGI process exited unexpectedly"
Update 17-Oct: Added .lines, fixed some spelling mistakes, added pageheap Here is the sequence of events for two back-to-back FastCGI requests to a freshly started worker process:- Request 1: HTTP GET /hello.php
- IIS (iiscore.dll) looks up handler in system.webServer/handlers, find FastCGI module
- The FastCGI module (iisfcgi.dll) looks up handlers@scriptProcessor in fastCgi/applications
- The FastCGI module spawns a new FastCGI process
- The module then starts communicating with the process, according to the FastCGI protocol
- The FastCGI process generates and returns a response
- Request 2: HTTP GET /hello.php
- (same as above, except request 1's process is re-used)
As you can see there are many places where the FastCGI module expects to have a FastCGI process to communicate with. If the FastCGI process disappears at any of these steps, the "The FastCGI process exited unexpectedly" error occurs. If this happens frequently Rapid Failure Protection may be enabled.
Here are some common causes for this error:
- The FastCGI process calls some sort of ExitProcess API, e.g. the C exit() function
- An unhandled exception occurs, e.g. Access Violation
ExitProcess:
- If the FastCGI is a script interpreter, the scripting language may support some sort of ExitProcess function and the script may call it. (though PHP's exit() function does not seem to exit the process)
- FastCGI processes often voluntarily exit after performing a certain number of requests. (e.g. PHP 5.2.1 does so at about 250 and can be configured) If the FastCGI module is expecting (fastCgi/application@instanceMaxRequests) the FastCGI process to perform more requests then this can also be a source of this error. This error can only happen in the time between when the next request arrives and the module detects the process' exit. This happens most often on busy servers, where requests are queuing for a FastCGI process.
Access Violations are more difficult to debug. The first step is to create a simple repro, here are repros sorted from simplest to most challenging:
- Command line, without FastCGI. (e.g. \php\php.exe \inetpub\wwwroot\hello.php)
- Command line, with a web environment. (e.g. Ruby's Webrick, or my fake FastCGI server)
- Single HTTP request. (e.g. recycle AppPool, GET /hello.php)
- Only under load. (e.g. Only when wcat is hitting the site)
- Randomly. (e.g. Only in production)
-
Attach the debugger:
- For command-line failures, prefix the debugger onto the command. (e.g. c:\debuggers\ntsd -g -G \php\php.exe \inetpub\wwwroot\hello.php)
- For AppPool debugging, you'll need to attach after starting the AppPool, e.g.
D:\>net start w3svc The World Wide Web Publishing Service service is starting. The World Wide Web Publishing Service service was started successfully. D:\>start http://localhost/dne D:\>\Windows\system32\inetsrv\appcmd list wps WP "4012" (applicationPool:DefaultAppPool) D:\>c:\debuggers\ntsd -o -g -G -p 4012
- Reproduce the Access Violation, you should see something like this in the debugger:
If you don't see an access violation occur in the debugger, then you don't have a reliable repro. Go back to creating a small simple repro. (very rarely certain AVs only happen when NOT under a debugger).Executable search path is: ModLoad: 00370000 00374000 av.exe ModLoad: 77380000 774a6000 ntdll.dll ModLoad: 76a80000 76b5b000 D:\Windows\system32\kernel32.dll ModLoad: 75eb0000 75f5a000 D:\Windows\system32\msvcrt.dll (ce4.d30): Access violation - code c0000005 (first chance) First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=00000000 ebx=00000000 ecx=00372374 edx=00000000 esi=00000001 edi=00372374 eip=00371190 esp=0014fe24 ebp=0014fe28 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 *** ERROR: Module load completed but symbols could not be loaded for av.exe av+0x1190: 00371190 c700adbaadba mov dword ptr [eax],0BAADBAADh ds:0023:00000000=??? ????? 0:000>
-
Setup symbols:
If symbols are working, then, after "k" (stack back trace), "lml" (list loaded modules) will not show "no symbols" or "export symbols". The Microsoft symbol server only gets Microsoft symbols. You'll also need symbols for the application, e.g. If you're debugging PHP use the PHP Snapshot php bits, and then use the "Debug pack" symbols (.sympath+ \path\to\debugpack; .reload). If you're using Ruby, you'll probably need to build from source and then ".sympath+ " to wherever ruby puts the .pdb files.0:000> .sympath SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols Symbol search path is: SRV*c:\localsymbols*http://msdl.microsoft.com/download/sy mbols 0:000> .lines Line number information will be loaded 0:000> k *** ERROR: Symbol file could not be found. Defaulted to export symbols for ntdl l.dll - ChildEBP RetAddr WARNING: Stack unwind information not available. Following frames may be wrong. 0014fe28 00371312 av+0x1190 *** ERROR: Symbol file could not be found. Defaulted to export symbols for D:\W indows\system32\kernel32.dll - 0014fe6c 76ac1d02 av+0x1312 0014fe78 773b85db kernel32!BaseThreadInitThunk+0x12 0014feb8 773b85ae ntdll!RtlInitializeExceptionChain+0x64 0014fed0 00000000 ntdll!RtlInitializeExceptionChain+0x37 0:000> lml start end module name 00370000 00374000 av (no symbols) 7c800000 7c8f5000 kernel32 (pdb symbols) c:\symbols\kernel32.pdb\CE 65FAF896A046629C9EC86F626344302\kernel32.pdb 7c900000 7c9b0000 ntdll (pdb symbols) c:\symbols\ntdll.pdb\36515 FB5D04345E491F672FA2E2878C02\ntdll.pdb
- Share out the stack backtrace (the "k" command's output) with whoever you think may own it.
- Take a memory dump. (".dump /ma \av.dmp") and then share out the av.dmp file and any symbols you did not get from the Microsoft symbol server.
- The Windows heap has a built-in memory-checking feature called pageheap. It makes debugging certain memory-usage errors extremely easy. On the downside, it slows the process down substantially and uses massive amounts of memory, so use caution before enabling it and remember to turn it off before perf testing. Here is a sample usage:
- ensure no ruby.exe processes exist
- gflags /p /enable ruby.exe /full
- start ruby.exe, cause av, debug
- gflags /p /disable ruby.exe
- gflags /p (should say "no application has page heap enabled")