Conventional video playback (also known as Progressive) involves a single video file at a single quality that is transferred as it is being played. If playback catches up with the portion of the video that has been downloaded, the player pauses and buffers. YouTube subscribes to this method of playback but offers different quality levels that you select manually, so you only watch a single quality unless you switch it yourself.
With adaptive media streaming, a high quality base video source (often called a Mezzanine) is converted into a set of video files of varying qualities. This process is known as encoding. For example, you can take a mezzanine file and encode low, medium, high, and ultra quality versions of a video. These encoded files are then stored for distribution on a Server or Content Delivery Network (CDN).
When the user attempts to play a video adaptively, they are given a Manifest file that lists information for all these different video qualities. Adaptive streaming technologies then switch between the different qualities (bitrates) as the user’s connection varies during playback, in order to minimize buffering. To start playback as soon as possible, adaptive streaming technologies usually begin at the lowest quality and then scale upwards after a few seconds. You may have noticed this happening when you start watching a movie or episode on Netflix.
A video player (often referred to as a Client) that supports an adaptive media streaming technology will handle this process of switching between qualities automatically without a user’s involvement.
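The switching decision described above can be sketched in a few lines. This is only an illustration, not how any particular player works: the ladder values, the `Rendition` type, the `choose_rendition` function and the 0.8 safety factor are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_bps: int  # average encoded bitrate in bits per second


# Hypothetical ladder; real ladders depend on the content and encoder settings.
LADDER = [
    Rendition("low", 400_000),
    Rendition("medium", 1_200_000),
    Rendition("high", 3_000_000),
    Rendition("ultra", 6_000_000),
]


def choose_rendition(ladder, measured_bps, safety=0.8):
    """Pick the highest rendition whose bitrate fits within a safety margin
    of the measured throughput; fall back to the lowest one otherwise."""
    ordered = sorted(ladder, key=lambda r: r.bitrate_bps)
    budget = measured_bps * safety
    chosen = ordered[0]  # lowest quality is always the fallback
    for rendition in ordered:
        if rendition.bitrate_bps <= budget:
            chosen = rendition
    return chosen


if __name__ == "__main__":
    for throughput in (300_000, 2_000_000, 10_000_000):
        print(throughput, choose_rendition(LADDER, throughput).name)
```

A real client would also take buffer level and recent switching history into account; the sketch only looks at a single throughput measurement.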
What are some adaptive streaming technologies?
The two biggest adaptive streaming technologies I’ve worked with in my time at Digiflare are Apple’s HTTP Live Streaming (HLS) and Microsoft’s Smooth Streaming (MSS). These technologies differ in the video and audio formats they support, as well as in how they go about delivering the video content optimally.
Streaming – What do HLS, HDS and MPEG-DASH mean?
These are all ‘chunked HTTP’ streaming protocols. They work by breaking the content into small chunks (a few seconds each) that can be delivered as separate files rather than as a constant stream of content. The advantage of this method is that it allows the client to make use of the ‘bursty’ nature of the internet and does not rely on a constant bandwidth being available.
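As a rough illustration of the chunking idea, the sketch below splits a piece of content into fixed-length chunks and builds one URL per chunk. The function names and the URL naming scheme are made up for the example; each protocol defines its own addressing.

```python
def segment_ranges(total_ms, chunk_ms):
    """Return (start_ms, duration_ms) pairs covering the whole content.

    Times are integers in milliseconds; the last chunk may be shorter.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    ranges = []
    start = 0
    while start < total_ms:
        ranges.append((start, min(chunk_ms, total_ms - start)))
        start += chunk_ms
    return ranges


def segment_urls(base_url, quality, count):
    # Hypothetical naming scheme: <base>/<quality>/segment_00000.ts
    return [f"{base_url}/{quality}/segment_{i:05d}.ts" for i in range(count)]


if __name__ == "__main__":
    chunks = segment_ranges(10_000, 4_000)
    print(chunks)
    print(segment_urls("https://example.com/video", "low", len(chunks)))
```

Because every chunk is an ordinary file behind an ordinary URL, the client can fetch each one with a plain HTTP GET and pick a different quality for the next chunk whenever conditions change.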
Apple’s HTTP Live Streaming (HLS)
HLS stands for HTTP Live Streaming and was developed by Apple to serve its iOS and Mac OS devices. It is also widely available on other devices, notably Android. Apple made the specification public by publishing it as an IETF Internet-Draft. HLS usually makes use of MPEG-2 transport stream technology, which carries a separate licensing cost that deters some manufacturers from implementing it in their devices. It is a simple protocol that is quite easy to implement.
- Manifest: M3U8 playlist
- Video: H.264
- Audio: MP3 or HE-AAC
- Container: MPEG-2 Transport Stream
- Server: No special server software
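To make the M3U8 manifest more concrete, here is a minimal sketch that reads the variant streams out of a master playlist. The sample playlist, its URIs and the `parse_master_playlist` helper are invented for illustration, and only the `#EXT-X-STREAM-INF` tag is handled; a production parser would need to cover the rest of the specification.

```python
import re

# Hand-written sample; bandwidths and URIs are hypothetical.
SAMPLE_MASTER = """
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360,CODECS="avc1.42e01e,mp4a.40.2"
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
medium/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
high/index.m3u8
"""

# Attribute values may be quoted and contain commas, so a plain split(",")
# is not enough.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text):
    """Return one dict per variant stream listed in a master playlist."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an M3U8 playlist")
    variants = []
    pending = None  # attributes of the most recent EXT-X-STREAM-INF tag
    for line in lines[1:]:
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_text = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attr_text)}
        elif not line.startswith("#") and pending is not None:
            # The URI of a variant follows its EXT-X-STREAM-INF tag.
            variants.append({
                "bandwidth": int(pending["BANDWIDTH"]),
                "resolution": pending.get("RESOLUTION"),
                "codecs": pending.get("CODECS"),
                "uri": line,
            })
            pending = None
    return variants


if __name__ == "__main__":
    for variant in parse_master_playlist(SAMPLE_MASTER):
        print(variant)
```

Each variant URI points to a further playlist that lists the individual chunks for that quality level.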
Microsoft’s Smooth Streaming (MSS)
Microsoft’s Smooth Streaming technology also involves encoding a mezzanine into various quality levels but MSS supports slightly different formats in the encoding process. Video can be encoded using H.264 or VC-1 and audio is encoded to AAC or WMA. The encoded quality level video is wrapped in an MP4 container with a *.ismv or *.isma file extension.
During the encoding process, XML manifest files are also generated. An *.ism file is generated for use by the server in describing the available bitrates while a *.ismc file is used by the client to inform it of available bit rates and other information required in presenting the content. One such piece of information is the chunk duration.
Unlike HLS, Microsoft’s Smooth Streaming doesn’t encode the individual qualities into a series of chunks. Instead, the server cuts the full content into chunks as it’s being delivered. This requires a specially set up server using Microsoft’s Internet Information Services (IIS).
- Manifest: XML file with *.ism/ismc file extension
- Video: VC-1 or H.264
- Audio: AAC or WMA
- Container: MP4 (with *.ismv/isma file extension)
- Server: IIS (Internet Information Services) server
- Additional: Only one file per quality level is stored; the server virtually splits it into chunks at playback
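The client manifest and the server-side chunking can be illustrated with a short sketch that turns a client manifest into chunk request URLs. The manifest below is a simplified, hand-written sample and `fragment_requests` is a hypothetical helper; real *.ismc files carry more attributes, and the exact layout should be checked against Microsoft’s documentation.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written client manifest (assumption, not real output).
SAMPLE_ISMC = """
<SmoothStreamingMedia MajorVersion="2" MinorVersion="0"
                      TimeScale="10000000" Duration="60000000">
  <StreamIndex Type="video" Chunks="3" QualityLevels="2"
               Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="1200000" FourCC="H264" />
    <QualityLevel Index="1" Bitrate="400000" FourCC="H264" />
    <c d="20000000" />
    <c d="20000000" />
    <c d="20000000" />
  </StreamIndex>
</SmoothStreamingMedia>
""".strip()


def fragment_requests(manifest_xml, stream_type, bitrate):
    """Return (relative_url, duration_seconds) for each chunk of a stream."""
    root = ET.fromstring(manifest_xml)
    timescale = int(root.get("TimeScale", "10000000"))
    for stream in root.findall("StreamIndex"):
        if stream.get("Type") != stream_type:
            continue
        template = stream.get("Url")
        requests = []
        start = 0
        for chunk in stream.findall("c"):
            # An explicit "t" attribute overrides the running start time.
            start = int(chunk.get("t", start))
            duration = int(chunk.get("d"))
            url = (template
                   .replace("{bitrate}", str(bitrate))
                   .replace("{start time}", str(start)))
            requests.append((url, duration / timescale))
            start += duration
        return requests
    raise ValueError(f"no stream of type {stream_type!r}")


if __name__ == "__main__":
    for url, seconds in fragment_requests(SAMPLE_ISMC, "video", 400000):
        print(url, seconds)
```

The client only ever asks for a bitrate and a start time; it is the server that maps that request onto a byte range inside the stored quality file.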
Adobe’s HTTP Dynamic Streaming (HDS)
HDS stands for HTTP Dynamic Streaming and was developed by Adobe to serve its Flash platform. The BBC uses this protocol for its desktop browser presentations using the BBC Standard Media Player (SMP), which implements the Flash playback client. Adobe has published the HDS protocol to registered developers. It is a more complex protocol and is harder to implement than HLS.
MPEG Dynamic Adaptive Streaming over HTTP (DASH)
MPEG-DASH stands for Moving Picture Experts Group Dynamic Adaptive Streaming over HTTP. This is a new, completely open standard that is just starting to be adopted by content producers and client implementations. It has the simplicity of HLS whilst being free of additional licensing other than that required by the codecs.
Unlike HLS, HDS and Smooth Streaming, DASH is audio/video codec agnostic. One or more representations (i.e., versions at different resolutions or bit rates) of multimedia files are typically available, and selection can be made based on network conditions, device capabilities and user preferences, enabling adaptive bitrate streaming and QoE (Quality of Experience) fairness.
- Manifest: Media Presentation Description (MPD)
- Video: Codec agnostic
- Audio: Codec agnostic
- Container: MP4 or MPEG-2
MPEG DASH is the result of a collaborative effort from some of the biggest players in adaptive bitrate streaming (e.g. Adobe, Apple, and Microsoft). From a bird’s eye view it functions similarly to the technologies previously described, but differs in the details of its delivery to end users.
In DASH, the entirety of an available stream, made up of a media portion and a metadata manifest, is known as a Media Presentation. The manifest portion of this is called a Media Presentation Description (MPD). Much like an M3U8 or Smooth Streaming manifest, an MPD contains metadata for the media available.
The media portion of a presentation is divided in time into one or more Periods. A period is a set of time-aligned contents (audio, video, captions, etc.) covering one interval of the presentation. Each period consists of a collection of different media forms, each known as an Adaptation (an Adaptation Set in the specification). So a period may consist of a separate video adaptation and audio adaptation. Each encoding of a particular adaptation, that is, each quality level, is known as a Representation. Each representation is split into short chunks, dubbed segments. Using the terminology at hand, the entire stream consists of a set of periods, where each period contains an adaptation for each type of media being delivered, and the player picks one representation of each adaptation at any given moment. Adaptive playback is facilitated by choosing the representation of appropriate quality for each segment as it is downloaded, while playback is taking place and connection speed is being monitored.
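In the MPD schema these terms map onto nested XML elements: Period, then AdaptationSet, then Representation. The sketch below walks that hierarchy in a small hand-written MPD. The sample document, its ids and the `list_representations` helper are assumptions for illustration, and segment addressing is left out entirely.

```python
import xml.etree.ElementTree as ET

# Namespace used by MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Minimal hand-written MPD; ids, bandwidths and codecs are hypothetical.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT60S">
  <Period id="1">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v-low" bandwidth="400000"
                      width="640" height="360" codecs="avc1.42e01e"/>
      <Representation id="v-high" bandwidth="3000000"
                      width="1920" height="1080" codecs="avc1.640028"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="a-main" bandwidth="128000" codecs="mp4a.40.2"/>
    </AdaptationSet>
  </Period>
</MPD>
""".strip()


def list_representations(mpd_xml):
    """Return (period id, mime type, representation id, bandwidth) tuples."""
    root = ET.fromstring(mpd_xml)
    rows = []
    for period in root.findall("mpd:Period", NS):
        for adaptation in period.findall("mpd:AdaptationSet", NS):
            for rep in adaptation.findall("mpd:Representation", NS):
                rows.append((
                    period.get("id"),
                    adaptation.get("mimeType"),
                    rep.get("id"),
                    int(rep.get("bandwidth")),
                ))
    return rows


if __name__ == "__main__":
    for row in list_representations(SAMPLE_MPD):
        print(row)
```

A player would pick one representation from the video adaptation and one from the audio adaptation, and could change either choice from segment to segment.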
As confusing as that may have been to sort out, there is a significant theoretical advantage to this approach of combining different adaptations to build up a period versus the approaches previously described for MSS and HLS. This advantage is the codec agnostic nature of DASH. The media is served in either an MP4 or MPEG-2 container using whatever video and audio formats are chosen, and the onus is put on players to be able to decode and render the video, audio, captions and so on. This reduces the effort for content creators and distributors to prepare their content for adaptive streaming and also removes a lot of restrictions associated with proprietary solutions. That includes the IIS server set up for MSS and the proprietary encoding software for HLS.
However, this large scope of supported codecs does make for more complex player development. Communities have banded together to provide a plethora of player framework options for developing for DASH on a variety of platforms and for an assortment of codecs. These frameworks vary in their supported platforms and features, so a good amount of investigation must be done in advance to find the right fit for the feature requirements of the player as well as the platform.
This is where subscribing to MPEG DASH as a solution may become problematic on more obscure platforms, and even on some of the more popular ones. As a result, MPEG DASH is not yet the answer to the fragmentation that exists in adaptive bitrate streaming.
Sample data flow of MS video streaming service