How to Troubleshoot Live Smooth Streaming Issues? – Part 5 (Client Manifest)
4. Client Manifest
As we discussed earlier, a Smooth Streaming client starts streaming by first requesting the client manifest from the server using the URL template http://{serverName}/{PublishingPointPath}/{PublishingPointName}.isml/manifest. The client manifest contains information such as stream types, parameters, bitrates and fragment timestamps. By simply examining the client manifest, you can get useful information for troubleshooting Live Smooth Streaming issues. In IIS Media Services 4.0, we made some tweaks to the client manifest format for better efficiency. The discussion below is based on the IIS Media Services 4.0 release.
In general, if your Smooth Streaming player is able to start playback with both video and audio, chances are the “static information” (stream types, parameters, bitrates, codec data, etc.) is correct. In that case, the most important information in the client manifest for debugging is the timestamps. Here is a short excerpt of the timestamp section from a client manifest:
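If you want to pull the manifest down yourself for inspection, a minimal Python sketch could look like the one below. The server, path and publishing point names are hypothetical placeholders, and the helper names `manifest_url` and `fetch_manifest` are mine, not part of any SDK. The fetch helper returns raw bytes so it makes no assumption about the manifest's text encoding.

```python
import gzip
import urllib.request


def manifest_url(server_name, publishing_point_path, publishing_point_name):
    """Fill in the client manifest URL template described above."""
    return (f"http://{server_name}/{publishing_point_path}/"
            f"{publishing_point_name}.isml/manifest")


def fetch_manifest(url, timeout=10):
    """Download the manifest as bytes, un-gzipping it if the server compressed it."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return body


# Hypothetical names, for illustration only.
print(manifest_url("myserver", "live", "event1"))
```

Saving the result of `fetch_manifest` to a file every few seconds during a live event gives you a history of manifests to compare when something goes wrong.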
……
<StreamIndex …. >
<QualityLevel ……… />
<QualityLevel ……… />
<c t="0" d="20000000" />
<c d="20000000" />
<c d="20000000" />
……
</StreamIndex>
The timestamp entries are located in the <StreamIndex> element. The <StreamIndex> element includes all bitrates for a media type (e.g. video). Each bitrate is specified as a <QualityLevel> entry. As you may have noticed, the timestamp entries (<c> elements) are not defined as child elements of a particular <QualityLevel> but sit directly under the <StreamIndex> element, which means that every timestamp listed here represents, and SHOULD be available for, ALL bitrates. The implication is twofold:
1. The server will try to make sure that it only publishes a timestamp to the manifest after all bitrates have reported it.
2. However, if a certain bitrate is terribly late or has stopped working, the server will still release timestamps that were reported by only a subset of the bitrates after a timeout window. This means:
2.1 It is possible that the server has only some bitrates available for a particular timestamp but not others.
2.2 The client has logic to automatically try other bitrates if a fragment request fails for a particular timestamp.
In terms of the format definition for the <c> entry, the “t” attribute is the absolute timestamp of a fragment and the “d” attribute is its duration. In IIS Media Services 4.0, “d” (duration) is mandatory while “t” is optional for each <c> entry. It might sound a little backwards, but there is a good reason for it: compressibility. In most Smooth Streaming manifests, the duration of a stream type is a fixed value (as in the example above). By listing only the “d” values, the manifest becomes much easier to compress. For example, if you simply enable gzip compression on the IIS server for the manifest response (by enabling “Dynamic content compression”), you get a much better compression ratio with this new format (the Silverlight Smooth Streaming player handles gzip-compressed manifests automatically). Better yet, in version 2.2 of the manifest format we defined a new “repeat” tag that collapses duplicated durations in place (see “clientManifestVersion” attribute in this blog). For example, the three <c> entries in the example above can be collapsed into a single line:
<c t="0" d="20000000" r="3" />
If you use an encoder that supports fixed durations and you pick the right encoding settings, it is possible to collapse the 2.2 version of the client manifest into just a few lines, where the only thing that keeps changing over time is the value of the repeat tag (the “r” attribute) for each stream. So basically your live manifest is just a few hundred bytes and it almost never grows! (This feature is supported by Smooth Streaming Client SDK 1.5 on the client side.)
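To make the collapsing concrete, here is a small Python sketch that run-length encodes a list of fragment durations into <c> lines. It is only an illustration of the idea, not the server's actual code. It assumes the stream has no gaps, and it treats “r” as the total number of consecutive fragments sharing a duration, which is how the three-entry example above reads. The function name `collapse_durations` is made up for this example.

```python
from itertools import groupby


def collapse_durations(durations, start=0):
    """Run-length encode consecutive equal durations into <c> lines.

    Assumes a contiguous stream (no gaps), so only the first line needs "t".
    """
    lines = []
    for index, (duration, group) in enumerate(groupby(durations)):
        count = len(list(group))
        attrs = []
        if index == 0:
            attrs.append(f't="{start}"')  # base timestamp on the first entry
        attrs.append(f'd="{duration}"')
        if count > 1:
            attrs.append(f'r="{count}"')  # "r" assumed to be the total count
        lines.append("<c " + " ".join(attrs) + " />")
    return lines


# Three 2-second fragments collapse into one entry.
print(collapse_durations([20000000, 20000000, 20000000]))
```

With a fixed-duration encoder, the list handed to a function like this is one long run, so the output stays a single line no matter how long the event lasts.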
So now you might ask: how does the client calculate the original timestamp (“t”) value, since that’s the one that really matters to the client media pipeline? Good question, and it’s not hard. First of all, there will always be a “t” value in the first <c> entry to set the base (if it’s not there, it’s assumed to be zero). From there, the client calculates each following “t” by adding the current “t” and “d” values. Conversely, this becomes a requirement for the server: it may omit the “t” attribute only if the current timestamp can be precisely calculated by adding the previous “t” and “d”. OK, then you might ask, what if they don’t add up? This is quite possible in scenarios like encoder failover leaving gaps in the stream. In that case, the server has no choice but to explicitly output a new “t” attribute. For example:
<c t="0" d="20000000" />
<c d="20000000" />
<c t="60000000" d="20000000" />
In this example, the first fragment has t=0. The next one doesn’t have “t”, so we can calculate its “t” as 0 + 20000000 = 20000000. The next “t” should then be 20000000 + 20000000 = 40000000. But here that fragment is somehow missing and the server only has t=60000000. In this case, the server explicitly outputs a new “t” attribute for timestamp 60000000 to tell the client the right timestamp and to set the right base for the next calculation. Using this knowledge, we arrive at this conclusion:
If you see a <c> entry with a “t” attribute in the client manifest (other than the first one), it indicates a gap in the stream. In most cases it’s caused by an ingest data interruption on the server, but sometimes it can also be caused by improper encoder implementations.
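The timestamp arithmetic and the gap rule can be sketched in a few lines of Python. This is a minimal illustration, not what the client SDK actually does. It assumes the <c> entries are wrapped in a bare <StreamIndex> element, that “r” (when present) is the total count of fragments with that duration, and the function name `walk_fragments` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<StreamIndex>
  <c t="0" d="20000000" />
  <c d="20000000" />
  <c t="60000000" d="20000000" />
</StreamIndex>
"""


def walk_fragments(stream_index_xml):
    """Return (timestamps, gaps) for the <c> entries of one <StreamIndex>.

    gaps is a list of (expected_t, actual_t) pairs, one for each explicit
    "t" that does not equal previous t + d.
    """
    root = ET.fromstring(stream_index_xml)
    timestamps, gaps = [], []
    expected = None  # next timestamp implied by the previous entry
    for c in root.iter("c"):
        duration = int(c.get("d"))
        repeat = int(c.get("r", "1"))  # assumption: total fragment count
        explicit_t = c.get("t")
        if explicit_t is not None:
            start = int(explicit_t)
            if expected is not None and start != expected:
                gaps.append((expected, start))
        else:
            # A missing "t" on the first entry is assumed to be zero.
            start = 0 if expected is None else expected
        for i in range(repeat):
            timestamps.append(start + i * duration)
        expected = start + repeat * duration
    return timestamps, gaps


timestamps, gaps = walk_fragments(SAMPLE)
print(timestamps)  # [0, 20000000, 60000000]
print(gaps)        # [(40000000, 60000000)]
```

Each reported pair tells you which timestamp the client would have expected and which one the server actually published, which is handy to line up against the encoder and server logs.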
Another type of problem that sometimes shows up with new Smooth Streaming encoders is that the timestamps across all bitrates are not correctly aligned. As we discussed above, each <c> line that contains timestamp and duration information applies to all the available bitrates. So fragment timestamps across all bitrates for a particular media type (e.g. video) must be precisely aligned. If they’re not, you might see a strange <c> list in the client manifest like this:
<c t="0" d="20000000" />
<c d="20000000" />
<c d="10000000" />
<c t="55000000" d="20000000" />
<c d="20000000" />
<c d="6666000" />
In this example, even though the encoder is supposed to always output 2-second fragments, it sometimes outputs odd durations (10000000 and 6666000 here) that cause unaligned fragments. In this case, the client could get many HTTP 404 errors because of the misalignment, and the client’s heuristic logic could also get confused. So here is another piece of advice:
Examine the duration values in the manifest and check if there are any anomalies, which can help you identify encoder problems.
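Scanning a long manifest by eye gets tedious, so here is a rough Python sketch of that check. It treats the most common “d” value as the nominal fragment duration and flags everything else. That heuristic, the function name `odd_durations`, and the bare <StreamIndex> wrapper are my assumptions for illustration. Note that the last fragment of a finished stream can legitimately be shorter, so a flagged entry is a hint, not proof of a bug.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """
<StreamIndex>
  <c t="0" d="20000000" />
  <c d="20000000" />
  <c d="10000000" />
  <c t="55000000" d="20000000" />
  <c d="20000000" />
  <c d="6666000" />
</StreamIndex>
"""


def odd_durations(stream_index_xml):
    """Return (nominal_duration, [(entry_index, duration), ...]).

    The nominal duration is taken to be the most common "d" value;
    every <c> entry with a different duration is reported.
    """
    root = ET.fromstring(stream_index_xml)
    durations = [int(c.get("d")) for c in root.iter("c")]
    if not durations:
        return None, []
    nominal, _ = Counter(durations).most_common(1)[0]
    odd = [(i, d) for i, d in enumerate(durations) if d != nominal]
    return nominal, odd


nominal, odd = odd_durations(SAMPLE)
print(nominal)  # 20000000
print(odd)      # [(2, 10000000), (5, 6666000)]
```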
These approaches saved us a lot of time when we were debugging strange streaming issues that turned out to be caused by encoder bugs. In the next post, I’ll focus more on the fragment request/response between the server and the client.