Technologies in a content delivery platform: NETFLIX
The Internet has become an integral part of modern life by transforming the way we communicate and interact with each other. It has opened new possibilities for entertainment, marketing, and networking. Many applications, including streaming platforms such as Amazon Prime, Netflix, Hulu, and Apple TV+, have been developed to exploit these new pathways.
Netflix is one of the pioneers among them. It has a subscription-based business model. To its 98 million paying users in 190 countries, Netflix streams 250 million hours of video content per day. At this scale, delivering top-notch entertainment to each consumer within a few seconds is critical. Netflix has kept the user experience streamlined and delightful by building world-class infrastructure at a scale that few other Internet services match.
Content Onboarding
When a video is onboarded, it has to be made available in several formats, because viewers have internet connections of different speeds. In addition, Netflix encodes each video at several resolutions for compatibility with subscriber devices: more than 2200 devices are supported, and each has its own format and resolution requirements. As a result, a single video exists as multiple copies in multiple codecs and resolutions. This transcoding process is time-consuming. Rather than giving the whole job to a single computer, which would be slow and could fail partway through, Netflix splits the original video into chunks. Each chunk is run through the transcoding process for each codec and resolution, so one combination of codec, resolution, and chunk forms one task. These tasks are processed in parallel, and the results are stored in Amazon S3.
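The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not Netflix's pipeline: the codec and resolution lists, the function names, and the output naming scheme are all assumptions, and transcode() is a stand-in that only builds a file name instead of calling a real encoder.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical values, chosen only for illustration.
CODECS = ["h264", "vp9"]
RESOLUTIONS = ["480p", "1080p"]


def build_tasks(chunk_ids, codecs, resolutions):
    """Create one task per (chunk, codec, resolution) combination."""
    return list(product(chunk_ids, codecs, resolutions))


def transcode(task):
    """Stand-in for a real encoder call; returns the output object name."""
    chunk_id, codec, resolution = task
    return f"chunk{chunk_id:04d}_{codec}_{resolution}.bin"


def transcode_all(chunk_ids, codecs=CODECS, resolutions=RESOLUTIONS, workers=4):
    """Run every task on a worker pool and map each task to its output."""
    tasks = build_tasks(chunk_ids, codecs, resolutions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so outputs line up with tasks.
        outputs = list(pool.map(transcode, tasks))
    return dict(zip(tasks, outputs))
```

Because every task is independent, a failed task can be retried on its own without repeating the work done for the rest of the video.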
Initially, the video was divided into chunks of equal duration. This seems attractive because every processor gets the same amount of work and all of them finish at about the same time. However, when a chunk boundary falls in the middle of a scene, retrieving the next chunk can cause lag and degrade the user experience. Therefore, instead of chunking by time, Netflix chunks video by scene, so that playback never has to switch chunks partway through a scene. That is, instead of splitting by large time ranges, the video is split into much finer-grained units called shots. At the end of processing, the shots are collated into scenes, so each scene consists of many smaller chunks.
When a viewer clicks on some point in a video, the video-serving algorithm treats the surrounding scene as one unit and fetches the entire block together, which improves the user experience. Netflix treats the entire movie as a set of chunks. During playback, Netflix prediction algorithms recognize the user's viewing pattern. If the user is watching continuously, a few chunks are fetched ahead predictively. Otherwise, each chunk is loaded on demand, only when the user clicks on a particular position.
Minimal buffering is essential for a streaming service. In addition to the chunking strategy, Netflix uses adaptive bitrate (ABR) streaming to address this issue: content is transmitted at a quality that depends on the bandwidth of the viewer's network. As a result, the picture quality changes dynamically, moving to a higher resolution when sufficient bandwidth is available and to a lower resolution under network congestion or excessive latency.
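The fetch-ahead behaviour can be illustrated with a small sketch. The function name, the boolean "continuous" flag, and the lookahead of three chunks are assumptions made for the example; the real prediction logic is not described in this article.

```python
def chunks_to_fetch(position, total_chunks, continuous, lookahead=3):
    """Return the chunk indices to request for the current playback position.

    position     -- index of the chunk the player needs right now
    total_chunks -- number of chunks in the title
    continuous   -- True if the viewer is watching without seeking
    lookahead    -- hypothetical number of extra chunks to prefetch
    """
    if not 0 <= position < total_chunks:
        raise ValueError("position out of range")
    if continuous:
        # Fetch the current chunk plus a few ahead, without running past the end.
        end = min(position + lookahead + 1, total_chunks)
    else:
        # After a seek, load only the chunk that was requested.
        end = position + 1
    return list(range(position, end))
```

For a viewer watching continuously at chunk 5, this requests chunks 5 to 8; after a seek to chunk 5, it requests chunk 5 alone.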
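As a rough sketch of the ABR idea, a player can pick the highest rendition whose bitrate fits within the measured throughput. The bitrate ladder, the safety margin, and the function name below are hypothetical and are not taken from Netflix's player.

```python
# Hypothetical bitrate ladder: (label, required kbit/s), lowest first.
LADDER = [("240p", 400), ("480p", 1200), ("720p", 3000), ("1080p", 5800)]


def pick_rendition(throughput_kbps, ladder=LADDER, safety=0.8):
    """Choose the best rendition that fits within a fraction of the throughput.

    The safety factor leaves headroom so that small bandwidth dips do not
    immediately stall playback. The lowest rendition is the fallback.
    """
    budget = throughput_kbps * safety
    chosen = ladder[0]
    for rendition in ladder:
        if rendition[1] <= budget:
            chosen = rendition
    return chosen[0]
```

Calling pick_rendition() again after each downloaded chunk is what makes the quality follow the network conditions over time.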
Open Connect CDN
Caching is one of the best ways to serve static content. Netflix has extended this concept to the edge of the internet through its own CDN, Open Connect. Open Connect, Netflix’s proprietary global content delivery network (CDN), is a network of servers dispersed across geographic regions, and it handles everything involving video streaming. Its caching servers, which Netflix calls Open Connect Appliances (OCAs), are located at internet exchange points (IXPs), where internet service providers (ISPs) can connect with Netflix. Netflix also gives ISP partners the option of installing OCAs inside their own networks. This design gives Netflix subscribers a high-quality, seamless viewing experience while saving ISPs money and reducing internet congestion. It has been taken so far that nearly 90% of Netflix traffic is served by Open Connect.
Netflix groups these edge caching endpoints into clusters to provide high availability and fault tolerance. Because the servers in those endpoints are heterogeneous, Netflix uses a modified version of consistent hashing, called heterogeneous cluster allocation (HCA), to distribute content within a cluster. HCA allocates content in two stages. This allows a small number of the most popular titles to be cached at ISPs, while a large number of less popular titles are stored at IXPs.
Furthermore, it is impossible to hold the full Netflix library in any one cluster of co-located servers, because each server has limited disk space and the catalogue as a whole is very large. Therefore, Netflix uses machine-learning algorithms to selectively cache the content that is most popular in each region.
Like other popular streaming services, Netflix streams video over TCP and simply buffers a few seconds of content rather than using UDP, since a small delay is not critical whereas lost frames would badly affect the user experience.
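The details of HCA are not given here, so the sketch below shows only the general idea it builds on: consistent hashing in which servers with more capacity receive proportionally more virtual nodes on the ring. It is a plain weighted hash ring, not Netflix's algorithm, and the class name, capacity units, and vnodes_per_unit parameter are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class WeightedHashRing:
    """Consistent-hash ring where bigger servers own more virtual nodes."""

    def __init__(self, capacities, vnodes_per_unit=50):
        # capacities: mapping of server name -> positive integer capacity units.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server, capacity in capacities.items()
            for i in range(capacity * vnodes_per_unit)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, content_id):
        """Return the server owning the first ring point after the content hash."""
        idx = bisect.bisect_right(self._points, _hash(content_id))
        return self._ring[idx % len(self._ring)][1]
```

With capacities such as {"small": 1, "large": 3}, roughly three quarters of content IDs land on the larger server, and adding or removing a server moves only the content that hashed to its ring segments.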
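Selective caching under a disk budget can be sketched as a greedy fill ordered by popularity. In the sketch below the popularity scores are simply given as input, standing in for whatever the machine-learning models predict; the function name, the score scale, and the sizes are hypothetical.

```python
def select_titles(popularity, sizes_gb, capacity_gb):
    """Pick titles for one cache cluster, most popular first.

    popularity  -- mapping of title -> predicted popularity score (assumed input)
    sizes_gb    -- mapping of title -> storage needed in GB
    capacity_gb -- total disk space available in the cluster
    """
    chosen, used = [], 0
    for title in sorted(popularity, key=popularity.get, reverse=True):
        # Skip titles that no longer fit; smaller ones further down still may.
        if used + sizes_gb[title] <= capacity_gb:
            chosen.append(title)
            used += sizes_gb[title]
    return chosen
```

Running this per region with region-specific scores would give each cluster its own subset of the catalogue.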
Microservice Architecture
Netflix has implemented its backend as a collection of services, an approach known as microservice architecture. This part of the architecture handles all tasks not related to video streaming, including adding and processing content, distributing it to servers around the globe, and controlling network traffic. Even though Amazon is its biggest competitor, Netflix has implemented this backend on AWS. Traffic to Netflix’s front-end services is routed via AWS Elastic Load Balancer (ELB). ELB uses a two-tier load-balancing strategy in which the load is distributed first across zones and then across instances.
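The two-tier scheme can be sketched as two nested selections: first a zone, then an instance inside that zone. Round-robin is assumed at both tiers purely for illustration, since the selection policy is not specified here, and the class and zone names are hypothetical. This does not use or model the actual ELB API.

```python
from itertools import cycle


class TwoTierBalancer:
    """Pick a zone first, then an instance within that zone (round-robin)."""

    def __init__(self, zones):
        # zones: mapping of zone name -> non-empty list of instance ids.
        self._zone_cycle = cycle(sorted(zones))
        self._instance_cycles = {
            zone: cycle(instances) for zone, instances in zones.items()
        }

    def route(self):
        """Return the (zone, instance) pair that should receive the next request."""
        zone = next(self._zone_cycle)
        return zone, next(self._instance_cycles[zone])
```

With {"zone-a": ["a1", "a2"], "zone-b": ["b1"]}, successive calls to route() alternate between the zones while cycling through the instances inside each one.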
Netflix caches frequently used data at the endpoints. It has built its own caching layer, EVCache, on top of Memcached. Because this is a custom solution, Netflix has been able to back the caching layer with SSDs rather than relying on RAM alone. Data is sharded across the nodes of a cluster within a zone, and multiple copies of the cache are kept by running clusters in different zones. Writes are replicated synchronously to ensure data consistency and availability, while reads are served by the nearest cluster.
Because of the need for ACID compliance, Netflix stores non-real-time data, such as payment and user details, in a MySQL RDBMS. MySQL is configured in a master-master setup using the InnoDB engine. Netflix has also set up cross-region read replicas to ensure high availability and scalability.
On the other hand, real-time data, such as user search history and viewing patterns, is stored in Cassandra, a distributed NoSQL database. Cassandra scales through data partitioning. According to Netflix, the write-to-read ratio of its real-time data is 9:1. Cassandra is optimized for write-heavy workloads, which makes it suitable for such a behavioural pattern.
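The write-everywhere, read-nearest pattern can be modelled with a toy in-memory class. This is not the EVCache client API: the class name, the method names, and the use of plain dictionaries in place of Memcached clusters are all assumptions made for the sketch.

```python
class ZoneReplicatedCache:
    """Toy model: each write goes to every zone, each read to the local zone."""

    def __init__(self, zones):
        # One dictionary per zone stands in for that zone's cache cluster.
        self._stores = {zone: {} for zone in zones}

    def set(self, key, value):
        """Write the value to the copy in every zone before returning."""
        for store in self._stores.values():
            store[key] = value

    def get(self, key, local_zone):
        """Read from the caller's own zone; return None on a miss."""
        return self._stores[local_zone].get(key)
```

Because set() updates every zone before returning, a read in any zone sees the latest value, which is the consistency property the synchronous writes are meant to provide.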
Data Processing
Netflix uses Apache Chukwa and Kafka to ingest data generated in other parts of the system, such as error logs, troubleshooting and diagnostic events, and video viewing activity. Chukwa writes events to a data lake implemented on S3 and also forwards traffic to Kafka. Kafka moves the data to various sinks, such as Elasticsearch and Spark. Netflix uses Elasticsearch for system error detection, customer service, and data visualisation. For personalisation and content recommendations, Apache Spark and Amazon EMR are used; most of the machine learning workloads run on those clusters.
The innovative approach to video processing and video serving keeps Netflix running at scale. Because it uses the AWS cloud for its backend microservices, Netflix has been able to expand its customer base without overburdening its servers. The AWS cloud has also provided more reliable infrastructure and allowed Netflix to focus on improving video delivery instead of on maintaining data centres. Netflix runs its own CDN, Open Connect, which reduces its reliance on third-party CDNs such as Akamai; third-party CDNs deliver all kinds of content and are not optimized specifically for video. Netflix relies on open-source technologies and implements its backend in a cloud-agnostic manner.
Conclusion
In summary, these are the technologies Netflix uses to build its content distribution system. The one that stands out among them is caching. The platform uses its own designs to cache content, and data is cached and replicated to make sure it is quickly available. Access speed-up techniques such as prefetching and precomputing are also used to reduce response latency for customers.
The platform has expanded its resources at the edge of the core network and runs servers there to provide low latency and quality of service (QoS) to its services and applications. The result is a new ecosystem that shifts computation, caching, and storage to the edge. Another common theme is the CDN (content delivery network), which is used to store larger objects, such as videos, and deliver them quickly to end users wherever they are located.
The platform is mainly driven by data. Machine learning is widely employed for predictions and recommendations. Effective data ingestion mechanisms collect data from various parts of the platform, and various machine-learning algorithms are applied to the data gathered. The platform strives to improve for two reasons. Firstly, analytics and delivery algorithms are crucial to keeping end users highly engaged: the platform continuously improves its algorithms to deliver high-quality content relevant to the taste of each end user, so that they spend more time on the platform. Secondly, it uses complex machine learning and analytics algorithms to grow its revenue, which, under its subscription-based model, comes chiefly from keeping subscribers on the platform.
Because the platform’s backend is implemented as microservices, it can evolve continuously with ease. The cloud is widely regarded as well suited to deploying microservice-based applications. Hence, the backend has been placed on public cloud infrastructure, such as that offered by AWS, MS Azure and Google Cloud, to leverage its power. In addition, the platform takes full advantage of cloud service features to remain scalable and fault-tolerant, among other advantages.
References
Netflix TechBlog. Available at: https://netflixtechblog.com/ (Accessed: December 19, 2022).
GeeksforGeeks: A computer science portal for geeks. Available at: https://www.geeksforgeeks.org/ (Accessed: December 19, 2022).