Learn how to successfully and cost-effectively validate the user experience of OTT video streaming services on your network.
Testing native OTT video streaming applications Learn about Infovista’s TEMS generic framework for OTT application testing and Video Streaming Quality Index (VSQI)
Contents

Enablers of today's OTT video streaming
Insights on OTT video streaming quality evaluation
    Metrics required for testing user-perceived OTT video streaming quality
    Main testing challenges
    A pragmatic testing solution
Infovista OTT video streaming testing
    Generic framework for testing OTT video streaming applications
    Video Streaming Quality Index (VSQI)
    Trusting VSQI performance
Take away
WHITEPAPER
Enablers of today's OTT video streaming

The video streaming market is expected to continue to grow at over 18% Compound Annual Growth Rate (CAGR)¹ from 2021 to 2030. Major Content Delivery Network (CDN) players (broadcast, cable, streaming) contributing to this include Netflix (223 million paid subscribers), Disney+ (164 million), Amazon Prime Video (117 million), HBO Max (48 million) and Hulu (47 million). YouTube chose to no longer be a CDN, but rather to focus on funding existing YouTubers who post exclusively to the platform. When it comes to OTT video streaming over mobile networks, Netflix and YouTube represent almost 50% of the viewership, split approximately equally between them.

Such growth is expected considering the user demand for OTT video streaming. However, it could not have been achieved without three key enablers: the evolution of mobile network technologies, the sophistication of smartphones and the complexity of video streaming technologies. All three rely significantly on ML/AI techniques.

The evolution of mobile network technologies. 5G NSA deployments, and lately 5G SA, have enabled mobile networks to deliver high bandwidths (100 Mbit/s on average), enabling download speeds up to 20 times faster than 4G, and higher resolutions such as 4K (about 50 Mbit/s at low frame rates below 30 fps, or about 80 Mbit/s at high frame rates above 60 fps) and 8K (about 250 Mbit/s), while helping to reduce video quality issues such as slow start-up times, rebuffering, resolution switches and frame rate degradation. Mobile access also enables the ubiquity of OTT video streaming applications. Therefore, not only do OTT video streaming applications represent most of the mobile data traffic, but their consumption by users is omnipresent.

The sophistication of smartphones. Today's smart mobile devices benefit greatly from sophisticated display technologies such as high pixel density, high contrast ratios that create brighter whites and deeper blacks, and in-plane switching technology that improves the angles from which the screen can be viewed, all embedded for example in the Retina displays used in iPhones. Video resolutions of up to 4K at 60 fps have therefore become the new norm for OTT video streaming applications.

The complexity of video streaming technologies. As with the network and device technology evolutions, video streaming technology and its delivery protocols have gone through significant transformation. Video codecs using smart compression algorithms to support high-resolution video without increasing bandwidth or sacrificing speed, such as the new H.266 VVC and AV1 codecs, can cut bitrates in half, making 4K and even 8K resolutions feasible. ML/AI-based OTT video clients, which can sense network performance conditions and know the device's capabilities, can increase the size of the video buffer and use optimized buffer pre-filling (initial buffering) to significantly reduce or eliminate rebuffering events at video start.
1. Precedence Research: Video Streaming Market Size, Trends, and Growth Report, 2030
Compared to just using Adaptive Bit Rate technology and/or standard clients, ML/AI-based OTT video clients help ensure higher video quality, including minimized playback errors and start-up delays. Continuously growing numbers of edge CDN servers, accommodating the most downloaded video content for a specific area, can significantly reduce streaming latency. At the same time, high quality in a video streaming session, delivered with significant bandwidth efficiency, is ensured by the streaming protocols: adaptive HTTP streaming over TCP, which uses advanced congestion control mechanisms, and the smart QUIC protocol, built on top of UDP, which significantly reduces the TCP latency caused by multiple handshakes.

Enabled by the evolution of mobile networks, devices and video streaming technologies, the popularity of OTT video streaming applications among subscribers keeps growing. As a result, operators increasingly face the challenge of ensuring the user experience of demanding OTT video streaming applications at minimal operational cost. This paper shows how operators can achieve this by using Infovista's OTT video streaming application testing. The metrics required to describe OTT video streaming quality, and the challenges in determining them, are presented. A pragmatic solution that copes with these challenges and is well suited to drive testing scenarios is discussed.

Insights on OTT video streaming quality evaluation

Metrics required for testing user-perceived OTT video streaming quality

As described in ETSI TR 103.488 (Guidelines on OTT Video Streaming; Service Quality Evaluation Procedures), a user's perceived quality of an OTT video streaming session has three dimensions: waiting time, video playback (also called presentation) quality and retainability (Figure 1). Evaluating the performance of any OTT video streaming application requires determining all three dimensions and understanding the impacting factors for each.
Each of these three dimensions of a user’s quality of experience is determined by different QoS parameters related to the video streaming session phases as defined by the user’s actions when using the OTT application.
[Figure: session timeline from the user action (request video clip: play, autoplay) through the streaming phases (request video URL and ID, load multiple HTML contents, buffering, displaying) to video end or user stop. The phases map to video preparation time, pre-playout buffering time and video playout duration, which in turn define the perceived quality dimensions: perceived waiting time (video access time), perceived video quality (video playback time) and perceived retainability.]

Figure 1. Typical OTT video streaming session (ETSI TR 103.488, YouTube example).
The video access time is defined by two factors, preparation time and pre-playout (also called initial) buffering, which depend on the CDN streaming performance and the OTT application configuration, for example client/buffer settings. The video presentation quality is defined by the resolution value, resolution switches, frame rate and playback interruptions (rebufferings) during playback, and by the video playout duration (e.g., streaming session cut-off). The factors impacting the quality are determined both by the configuration/construction of the OTT video service (e.g. client/buffer settings and schemes, codec configurations) and by the network performance (e.g. bandwidth/throughput, delay, jitter, loss).
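As a concrete illustration of these definitions, the waiting-time components can be derived from session event timestamps. Below is a minimal sketch; the event names and data structure are hypothetical illustrations of the definitions above, not an actual TEMS or ETSI data format.

```python
# Sketch: deriving perceived waiting-time metrics from session event
# timestamps. Event names are hypothetical, chosen to mirror the
# session phases of Figure 1.

def waiting_time_metrics(events: dict) -> dict:
    """events maps an event name to its timestamp in seconds."""
    # Video preparation time: from the user's request until the
    # client starts filling the buffer.
    preparation = events["buffering_start"] - events["video_requested"]
    # Pre-playout (initial) buffering: from buffer filling until
    # the first frame is displayed.
    initial_buffering = events["playback_start"] - events["buffering_start"]
    # Video access time is the sum of the two factors.
    return {
        "video_preparation_time": preparation,
        "initial_buffering_time": initial_buffering,
        "video_access_time": preparation + initial_buffering,
    }

m = waiting_time_metrics({
    "video_requested": 0.0,
    "buffering_start": 0.8,
    "playback_start": 2.3,
})
print(m["video_access_time"])  # ~2.3 s
```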
Last but not least, the original content's quality and dynamicity are factors impacting the video playback quality. High-resolution, high-quality and dynamic video content can be more sensitive to network performance degradation, and thus to its impact on the user experience, than lower-resolution, lower-quality and less dynamic (or stationary) video content.
Table 1 presents a map between the user experience video quality dimensions, their impacting factors, and their source.
| USER'S PERCEIVED VIDEO STREAMING QUALITY | IMPACTING FACTORS | SOURCE OF DEGRADATION**** |
|---|---|---|
| Perceived video quality (video presentation/playback) | Resolution, frame rate | Network instantaneous behavior (real time): bandwidth/throughput |
| | Rebuffering** | Network* long-term behavior (video session level): delay*, jitter, loss |
| | Video content | CDN/Service provider |
| | Video codec | OTT application |
| | Video client settings (pre-filling/initial buffering) | OTT application |
| Perceived waiting time | Video preparation time | CDN/OTT application, network behavior |
| | Video start failure*** | CDN/OTT application |
| Perceived retainability | Video streaming cut-off | RF, IP congestion (network) |

* It should be noted that delay is rooted not only in the network (generally between 5 ms and 2 min), but also in the encoding (50 ms-10 s) and decoding (15 ms-2 s) processes, thus in the OTT application itself. Due to the small values compared with the network delay, the delay from the OTT application is not considered in the discussion.
** Close to half (44%) of viewers say rebuffering is the most frustrating aspect of the streaming experience.
*** An important factor, since users are more likely to abandon viewing because of it.
**** Based on the consideration that the original content has the best quality.

Table 1. Map between the user-perceived video streaming quality dimensions, their impacting factors, and the source of degradation.
Main testing challenges

While evolving in complexity and sophistication to enable better quality, OTT video streaming applications come with additional levels of encryption of the delivery protocols (e.g., QUIC), as well as with non-standardized, proprietary OTT video codecs and clients. In addition, OTT applications are continuously and dynamically changing, both to improve the user experience and to protect the video content itself from piracy with sophisticated encryption schemes. This closeness and lack of transparency make it difficult to develop and deploy testing solutions which require access to the video stream to determine user-perceived quality metrics. The level of difficulty also depends on the device's operating system (OS).

5G networks created a favorable ecosystem for the CDNs to offer a multitude of OTT video streaming applications. The resulting variety and diversity come with different protocols, platforms and device operating systems, and non-standardized proprietary codecs/clients. Consequently, they all bring differing expected performance which needs to be tested. In the drive to continuously increase the quality perceived by users, each OTT application also runs continuous software version updates. All of these add levels of complexity for testing.

To cope with these challenges, the ETSI STQ-Mobile group developed and released TR 101.578 and TR 103.488, which offer guidance for testing OTT video streaming applications. The defined KPIs and their measurements refer to OTT video streaming access and retainability, as well as to the video streaming presentation quality during playback (see Table 1). However, when it comes to the latter, ETSI recommends describing it through several KPIs (Table 1), rather than a single QoE/MOS score.
ITU-T Study Group 12 also undertook extensive research to develop a series of models designed to estimate a user's subjective opinion (QoE/MOS) of video streaming playback quality². All these solutions, although accurate, have serious drawbacks when it comes to being implemented in a drive testing and/or on-device testing solution. Firstly, most of them require information elements embedded in the video bit stream as input parameters (e.g. knowledge of I and P video frames), which is generally highly encrypted. Secondly, even those solutions which rely only on transport parameters, which are easier to obtain, are generally trained to work for one application and for limited resolutions. Thirdly, the provided video QoE scores correspond to a minimum measurement granularity of 4-6 seconds (continuous scoring) and of about 60 seconds or more for per-session scoring. The 4-6 second granularity, although not optimal, could work to reflect network behavior in most drive test scenarios. However, the 60-second per-session scoring is less meaningful for drive testing, since possible network problems could be hidden and/or smoothed out.

Therefore, while ETSI provides exact guidance for determining the perceived waiting time and perceived retainability of video streaming sessions, the perceived video presentation (playback) quality remains largely unaddressed: neither the ETSI nor the ITU-T SG12 solutions are optimal for drive testing scenarios or for the variety of OTT video streaming applications.
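A toy numerical example illustrates why coarse per-session scoring can mask a localized network problem that 4-second continuous scoring reveals. The MOS values below are invented purely for illustration:

```python
# Sketch: a single degraded 4 s window inside a 60 s session.
# Fifteen 4 s continuous scores, one of which captures a network problem.
scores_4s = [4.5] * 7 + [1.5] + [4.5] * 7

# Per-session scoring collapses everything into one number...
session_score = sum(scores_4s) / len(scores_4s)
# ...while the continuous scores still expose the degraded window.
worst_window = min(scores_4s)

print(round(session_score, 2))  # 4.3 -- the problem is smoothed out
print(worst_window)             # 1.5 -- visible at 4 s granularity
```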
2. ITU-T P.1203.x, ITU-T P.1204.x; x=1-5
A pragmatic testing solution

Delivering on the expectation of a seamless user experience with OTT video streaming applications becomes a significant challenge for operators due to the applications' variety and diversity, as well as their lack of transparency for testing. Furthermore, all of the above must be correlated against a backdrop of increased complexity in mobile access technologies and the operator's need to minimize OPEX.

However, the continuously evolving complexity of OTT video streaming applications, with ML-based proprietary codecs/clients/delivery protocols, and the increasing sophistication of devices with Ultra High Definition (UHD) displays, mean that fewer and fewer QoE problems are observed, unless network problems such as degraded RF performance and/or traffic congestion, which can result in poor latency, jitter and loss, are experienced. The performance of the OTT applications is generally something that operators cannot control and/or manage. Therefore, operators need an OTT quality measurement which quantifies only the network's contribution to the overall quality. It should be noted that the network's impact is expected to be the most significant one, since the video content, the OTT application's performance and the device are expected to have a low impact considering the technology evolution in these areas, as discussed above.
There is therefore a need for an OPEX-efficient testing solution focused only on the network itself. The solution needs to enable operators to troubleshoot, optimize and benchmark their network to meet the minimum performance requirements for demanding OTT video streaming applications, while coping with their variety and diversity. This can be achieved with a pragmatic OTT video streaming testing solution which satisfies the following criteria:

• Run on-device, close to the user's perception of the streaming quality
• Support testing of a variety of OTT video streaming applications
• Be consistent, providing perceived waiting time, perceived retainability and perceived video quality measurements agnostic to the OTT application
• Provide video quality measurements with high granularity, suitable for troubleshooting and optimization based on drive testing
• Deliver video quality measurements which reliably reveal network-centric problems, independent of the OTT application's configuration (codec/client), the video content and the device's performance
Infovista OTT video streaming testing

With extensive experience in on-device testing, OTT voice and video QoE testing, and a highly qualified understanding of operators' network performance and concerns, Infovista has developed a pragmatic video quality testing solution for OTT video streaming, based on the criteria listed in section "A pragmatic testing solution".

Generic framework for testing OTT video streaming applications

As part of the generic testing strategy, Infovista developed an on-device generic framework for testing a variety of native OTT video streaming applications, such as YouTube, Netflix, TikTok, Facebook and others. Infovista's generic framework for native OTT application testing ensures consistency and efficiency through:

• Automated and fast testing, using one script to collect field data for several OTT applications in a single drive test
• A common set of KPIs for all tested native OTT applications, as defined by ETSI
• Generic definitions of the triggers for KPI measurements for all tested native OTT applications, as defined by ETSI

The generic framework comprises two testing steps:

• Scripting, which contains user interface (UI) actions and trigger points for events, sent to the on-device measurement (ODM) service to generate measurement events and KPIs per streaming application, with a shared ODM service and KPI set. Examples of scripting tasks are a video search or the generation of a 'Streaming Video IP Service Access Time' event.
• IP sniffing, working on commercial devices and embedded in the generic framework for ODM IP recording, to provide KPI triggers and payload information for throughput calculations.

The KPIs are defined by ETSI (ETSI TR 101.578, ETSI TS 102.250-2) and are reported per session; they refer to session establishment/set-up handling, streaming, up/downloading, posting and messaging, as well as to video quality during presentation (playback). In addition, they are common to all applications, and thus agnostic to the native OTT application. In the case of OTT video streaming, these KPIs can be used to determine the waiting time, session retainability and playback video quality perceived by the user (Figure 1). Examples of such ETSI-based KPIs are presented in Table 2.
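As an illustration of how the IP-sniffing step can trigger an ETSI-style KPI, the sketch below derives a 'Streaming Video IP Service Access Time' value from packet timestamps. The packet representation and trigger logic are simplified, hypothetical illustrations, not Infovista's actual ODM implementation.

```python
# Sketch: triggering an access-time KPI from a sniffed packet trace.
# Each packet is (timestamp_s, direction, is_media_payload); this
# structure is hypothetical, chosen only to illustrate the idea.

def ip_service_access_time(packets):
    """Time from the first uplink request to the first media payload."""
    t_request = next(t for t, d, _ in packets if d == "uplink")
    t_first_media = next(
        t for t, d, media in packets if d == "downlink" and media
    )
    return t_first_media - t_request

trace = [
    (10.00, "uplink", False),    # client requests the video URL/ID
    (10.04, "downlink", False),  # signalling response
    (10.35, "downlink", True),   # first media payload packet arrives
]
print(round(ip_service_access_time(trace), 2))  # 0.35
```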
| TYPE | KPIS | QOE DIMENSION |
|---|---|---|
| OTT video session quality | Streaming video play start time, failure; Streaming reproduction start delay, failure; Streaming video IP service access time, failure ratio; Streaming reproduction cut-off ratio | Perceived waiting time; Perceived retainability |
| OTT video quality | Streaming video interruption duration, percentage; Streaming aggregated average session resolution; Streaming video resolution changes; Number of positive/negative resolution changes; CDN download application throughput; Application layer DL throughput | Perceived video quality during presentation/playback |
| OTT application configuration | Video provider; Player type; Subscriber IP; CDN transport protocol; CDN media server IP address | n/a |

Table 2. Examples of OTT video streaming KPIs.
It should be noted that the encryption level can determine the number of KPIs available for measurement. It is therefore understood and agreed within ETSI that, depending on the OTT application, the set of available KPIs can become limited. In addition, depending on the level of encryption, information regarding the application's configuration may be available (Table 2).

Video Streaming Quality Index (VSQI)

As discussed above, ETSI defines a set of KPIs for describing the video quality during presentation (playback) as perceived by users. Even though this KPI set is valuable for troubleshooting and optimization, ETSI does not provide a single number (score) which can describe the overall playback video quality expressed as a MOS (Mean Opinion Score). ITU-T SG12 provides a series of models, but as of today none is suitable for native OTT application testing on devices in drive test scenarios. Infovista Network Testing has developed a model which aims to provide a video streaming quality index expressed in QoE terms (MOS), suitable for any on-device OTT application and meaningful for network-centric troubleshooting and optimization based on drive test data³. The evolved VSQI model's design considers various aspects related to its scope as a testing solution for today's OTT applications, implemented in drive testing tools.

Input parameters. The model relies on the fact, proven by extensive testing and analysis, that there are three main factors rooted in the network which impact video streaming quality during playback: resolution, frame rate and interruptions/buffering (Table 1). The resolution and frame rate are determined by dynamic changes in the available bandwidth and by possible limitations caused by network congestion and/or radio link RF quality. The video interruptions/buffering are caused by extended network delay/jitter and/or loss. Therefore, the model's input parameters are resolution, frame rate and playout state, the latter covering initial buffering, re-buffering and playing.

Calibration to MOS. The VSQI model aims to describe in a single number the video streaming quality as perceived by users during playback. Therefore, the VSQI model is based on mapping the model's input parameters to MOS target values for a broad range of 4G and 5G network conditions. The reference data source for the MOS target values was generated based on the ITU-T P.1203/P.1204 series and ITU-T Technical Report PSTR-PXNR (No-reference pixel-based video quality estimation algorithm, 2019).

Support of UHD video. VSQI values span the whole MOS scale for a large range of frame rates and resolutions, up to 8K.

Video content dependency. Video quality is highly dependent on the video content type, including its complexity, density and dynamicity. More intense video content, with increased complexity/density/dynamicity, is more sensitive to network errors. This means it is more challenging for codecs and for the network, with the consequence that human vision perceives degradations produced by network errors more easily than if the same type and level of degradation affected less intense video content. An example of the dependency of perceived video quality (MOS) on content at various resolutions is presented in Figure 2.

3. This model has been developed based on previous work described in Ascom Network Testing: Video Streaming Quality Measurement with VSQI, Technical Paper, 2009; and Ascom Network Testing: Evaluating Mobile Video Service Quality with Ascom TEMS, 2011.
[Chart: video quality index (MOS scale, 1-5) versus time index (steps of 4 s) for video resolutions from 144p to 1080p.]

Figure 2. Video quality index dependency on the video content for various resolutions.
Meaningful network performance evaluation, troubleshooting and optimization require a video quality index which exhibits variability with network errors and captures only the network's impact. It should show consistent results, independent of the video content, which, as described above, can change the results according to its type. It is nevertheless recommended that the video content be selected bearing in mind that stationary and less dense content is expected to produce higher scores while hiding possible existing network problems. Therefore, Infovista's VSQI model applies a normalization to the most sensitive video content in the mapping of the input parameters to the perceived video quality (MOS). Consequently, the VSQI output shows meaningful variability for detecting network problems, troubleshooting and optimization.

Suitability for drive testing scenarios: fine resolution and two quality indexes. The fine geographical/spatial resolution and short measurement time window characteristic of drive test data enable pinpointing with good geographical accuracy where a network problem caused degradation of the video quality. To best suit these drive test data characteristics, which are crucial for troubleshooting and optimization, the VSQI model uses short video sessions (30-40 s) to keep the video buffer small and thus ensure short delays between the moment a network problem occurs and the moment its impact affects the video quality. With the same aim of best suiting drive test data characteristics, the VSQI model is designed with two outputs: VSQIinstant and VSQIsession.
VSQIinstant values are calculated per second, to easily detect and reflect the real-time network quality and its impact on video streaming. VSQIinstant is defined by:

[Equation (1): definition of VSQIinstant]

where:

[Equation (2): the per-parameter term f(i, x), where the (i, x) variable pairs can be (resolution r, resolution value x) and (frame rate fr, frame rate value x)]

[Equation (3)]

As can be seen in equation (1), the VSQI dependency on the resolution and frame rate parameters is described by a sigmoid function. The sigmoidal ("S") shape has been selected because human perception is well known to show this behavior in relation to individual video quality parameters. In addition, extensive testing and analysis showed the multiplicative effect of the individual video quality parameters on the overall video quality.

VSQIsession values are calculated per video streaming session (one value after the first 30 s of video streaming time) and capture the long-term effects of video playout interruptions (rebufferings) as well as possible resolution changes. VSQIsession is based on all the VSQIinstant values during the first 30 seconds of video streaming time, weighted by the video interruption periods:

[Equation (4): definition of VSQIsession, where the weighting function W depends on rebuffering parameters, which can be length, percentage or count]
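The equation images themselves do not survive this extraction, but the surrounding text constrains their structure: a multiplicative combination of sigmoid terms, and an interruption-weighted session average. The LaTeX sketch below illustrates that structure only; the coefficients a_i, b_i, c_i, d_i and the weighting function W are hypothetical placeholders, not Infovista's proprietary model.

```latex
% Illustrative sketch only; coefficients and W are placeholders.
% Multiplicative effect of the per-parameter quality terms:
\mathrm{VSQI}_{\mathrm{instant}} = f(r, x_r) \cdot f(\mathit{fr}, x_{\mathit{fr}})
% Each term is a sigmoid ("S"-shaped) function of its parameter value:
f(i, x) = a_i + \frac{b_i}{1 + e^{-(x - c_i)/d_i}},
\qquad (i, x) \in \{(r, x_r),\ (\mathit{fr}, x_{\mathit{fr}})\}
% Session score: interruption-weighted mean of the instant scores
% over the first 30 s of streaming time:
\mathrm{VSQI}_{\mathrm{session}} = W(\mathrm{rebuffering}) \cdot
    \frac{1}{N}\sum_{t=1}^{N} \mathrm{VSQI}_{\mathrm{instant}}(t)
```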
Generic for mobile OTT applications. VSQI works for any OTT video streaming application running on devices with 6" and 7" displays. However, VSQI values are unique per application, and video quality results for different OTT applications cannot be compared to each other.

Trusting VSQI performance

VSQI training and tuning is based on a large set of simulated network conditions containing error patterns, including delay, jitter and loss, characteristic of 4G and 5G NR networks. VSQI performance has been tested on real-life network conditions ranging from poor to very good quality, so both the VSQI accuracy and its robustness are fully tested. As mentioned above, the MOS target values against which VSQI has been tested are based on the ITU-T P.1203/P.1204.x series. The performance results are presented in Figure 3. VSQIsession and MOS target values for a wide range of conditions and quality levels are compared after applying a 3rd-order polynomial mapping between the two data sets to remove any bias, as required by ITU-T P.1401 (Statistical evaluation of QoE models). The correlation coefficient, RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) as defined by ITU-T P.1401 are calculated, and all the performance statistics show values within the performance requirements for QoE video models (ITU-T P.1203/P.1204.x series), such as R > 80%, and RMSE and MAE < 0.5. The regression chart is presented in Figure 3a.

These results prove VSQI's trustworthy performance: although designed to quantify only the network's impact, VSQI provides high accuracy on the MOS scale, like a full QoE video quality model. A full QoE model reflects the impact of all components affecting video quality: network, video content, OTT application and device performance. This is largely expected, since the network weighs most on the quality, as described in section "A pragmatic testing solution".
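The P.1401-style statistical evaluation described above can be sketched as follows. The data here is synthetic, standing in for the unpublished VSQI test set; the 3rd-order polynomial mapping and the three statistics follow the text, everything else is illustrative.

```python
# Sketch of an ITU-T P.1401-style evaluation: fit a 3rd-order
# polynomial mapping to remove bias between a model's scores and the
# MOS targets, then compute R, RMSE and MAE. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
mos_target = rng.uniform(1.0, 4.8, 200)          # MOS targets (1-5 scale)
vsqi_raw = 0.9 * mos_target - 0.3 + rng.normal(0.0, 0.2, 200)  # biased model

# 3rd-order polynomial mapping (ITU-T P.1401) removes the bias.
coeffs = np.polyfit(vsqi_raw, mos_target, deg=3)
vsqi_mapped = np.polyval(coeffs, vsqi_raw)

r = np.corrcoef(vsqi_mapped, mos_target)[0, 1]
rmse = np.sqrt(np.mean((vsqi_mapped - mos_target) ** 2))
mae = np.mean(np.abs(vsqi_mapped - mos_target))

# Performance targets quoted in the text: R > 0.80, RMSE and MAE < 0.5
print(r > 0.80, rmse < 0.5, mae < 0.5)
```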
Figure 3a. Regression chart: VSQIsession (after 3rd-order polynomial mapping) versus MOS target values.
Figure 3b. Time snapshot of VSQIsession (raw and after 3rd-order polynomial mapping) and MOS target values.
Figure 3b shows a time snapshot of VSQIsession and MOS target values. The chart shows the raw VSQIsession values, as they would be displayed in field measurements, along with the VSQIsession values after the 3rd-order polynomial mapping and the MOS target values during the same time window. The 3rd-order polynomial values not only exhibit the same time distribution but are also very close to the MOS target values, as further proven by the performance statistics presented above (Figure 3a). The raw VSQIsession values show the same time distribution, as expected, but the displayed values are slightly lower than the MOS target values. This behavior is intended by design, for two main reasons. First, as mentioned in section "Video Streaming Quality Index (VSQI)", a normalization to the most sensitive video content has been applied to the VSQI model, to make the scoring independent of the content and to ensure sensitivity to network problems for easy detection, troubleshooting and optimization. Second, higher video resolutions and frame rates are expected to emerge with 5G Advanced; the resulting better video quality should be rankable against previous video quality on the same MOS scale, and the VSQI calibration to the MOS scale is designed to support large ranges of frame rates and resolutions up to 8K. Therefore, VSQI provides network-centric video quality with an accuracy characteristic of QoE video quality models and, unlike those models, supports high resolutions and frame rates.

Take away

The evolution of mobile network technologies, the sophistication of smartphone capabilities and the complexity of video streaming technologies sustain consistent customer demand for high-quality OTT video streaming applications. Supporting OTT video streaming applications with a seamless user experience becomes a significant challenge for operators due to the applications' variety and diversity, as well as their lack of transparency for testing. All this must then be correlated against a backdrop of increased complexity in mobile access technologies and under OPEX constraints. Since the performance of the OTT applications shows continuous improvement, while at the same time being something that operators cannot control and/or manage, a pragmatic testing approach is the most cost-efficient solution to address this task. Infovista developed such a solution based on a generic framework for testing a variety of native OTT applications with a common set of KPIs describing user-perceived waiting time, retainability and video quality during playback. Infovista empowers operators with a generic tool which enables consistency of testing across various native OTT applications. The benefit of the network-centric VSQI, with calibration to the most sensitive video content and dual scoring (instantaneous and per session), is two-fold: accurate troubleshooting with the fine resolution suited to drive testing, and benchmarking of the overall OTT video streaming session quality.
EUROPE HEADQUARTERS
Infovista S.A.S., 3 rue Christophe Colomb, 91300 Massy, France
Telephone: +33 1 64 86 79 00 | Fax: +33 1 64 86 79 79

AMERICAS HEADQUARTERS
Infovista Corporation, 20405 Exchange Street, Suite 300, Ashburn, VA 20147, USA
Telephone: +1 855 323 5757 | Fax: +1 703 707 1777

EASTERN EUROPE, ASIA, AND AFRICA HEADQUARTERS
PO Box 54753, Office 429, 4th Floor, Building 8WB, Dubai Airport Freezone
Telephone: +971 4256 7101

Contact us
For more information please visit www.infovista.com
For sales inquiries please email info@infovista.com

© Infovista - All rights reserved.