Designing YouTube | Netflix | Hulu

image

Questions upfront

  • Can we leverage existing cloud infrastructure from Amazon, Microsoft, or Google, or are we expected to build it ourselves?
    • Expect to hear that we can leverage it, because building everything in-house is unrealistic for most companies.
  • Cost optimization at the CDN level
    • Definitely worth discussing, because CDN traffic is expensive.

Functional Requirements

  • Video uploading flow

    • (Optional) Recommendation system based on your preferences (immediate changes, scheduled changes)
    • Bitrate and transcoding questions
  • Video streaming flow

    • (Optional) Prepared set of 100 videos on the main screen
    • (Optional) Offline streaming?
    • Switching between bitrates for a smooth user experience?
  • (Optional, it leads us to another system design) Live streaming: how a video is recorded and broadcast in real time. The notable differences are:

    • Live streaming has a stricter latency requirement, so it might need a different streaming protocol.
    • Live streaming has a lower requirement for parallelism because small chunks of data are already processed in real time.
    • Live streaming requires a different set of error-handling mechanisms. Any error handling that takes too much time is not acceptable.

Non-Functional Requirements

  • Ability to upload videos fast
  • Smooth video streaming
  • Ability to change video quality
  • Low infrastructure cost
  • High availability, scalability, and reliability requirements
  • Clients supported: mobile apps, web browser, and smart TV

DAU & Costs

  • Assume the product has 5 million daily active users (DAU).
  • Users watch 5 videos per day.
  • 10% of users upload 1 video per day.
  • Assume the average video size is 300 MB.
  • Total daily storage space needed: 5 million * 10% * 300 MB = 150TB
  • CDN cost: when a cloud CDN serves a video, you are charged for data transferred out of the CDN. Discussing how to reduce CDN cost can be very important in the interview.

  • Let us use Amazon’s CDN CloudFront for cost estimation. Assume 100% of traffic is served from the United States. The average cost per GB is $0.02. For simplicity, we only calculate the cost of video streaming.
  • 5 million * 5 videos * 0.3GB * $0.02 = $150,000 per day.
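The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the assumptions stated in this section; the constant and function names are illustrative, and 1 TB is treated as 1000 GB to match the figures above.

```python
# Back-of-envelope estimate using the assumptions listed in this section.
DAU = 5_000_000                  # daily active users
VIDEOS_WATCHED_PER_USER = 5      # videos watched per user per day
UPLOAD_RATIO = 0.10              # share of users uploading 1 video per day
AVG_VIDEO_SIZE_GB = 0.3          # 300 MB average video size
CDN_COST_PER_GB_USD = 0.02       # assumed average CloudFront price (US traffic)


def daily_storage_tb() -> float:
    """New storage needed per day, in TB (1 TB = 1000 GB)."""
    return DAU * UPLOAD_RATIO * AVG_VIDEO_SIZE_GB / 1000


def daily_cdn_cost_usd() -> float:
    """CDN data-transfer cost per day for streaming only, in USD."""
    return DAU * VIDEOS_WATCHED_PER_USER * AVG_VIDEO_SIZE_GB * CDN_COST_PER_GB_USD


if __name__ == "__main__":
    print(f"Storage per day: {daily_storage_tb():,.0f} TB")
    print(f"CDN cost per day: ${daily_cdn_cost_usd():,.0f}")
```

Changing a single constant (for example the average video size) shows how sensitive the CDN bill is to each assumption.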

High Level Design

image
  • CDN and blob storage are the cloud services we will leverage.

Video Uploading Flow

image
  • Load balancer: A load balancer evenly distributes requests among API servers.
  • API servers: All user requests go through API servers except video streaming.
  • Metadata DB: Video metadata are stored in Metadata DB. It is sharded and replicated to meet performance and high availability requirements.
  • Metadata cache: For better performance, video metadata and user objects are cached.
  • Original storage: A blob storage system is used to store original videos. Wikipedia defines a blob as follows: “A Binary Large Object (BLOB) is a collection of binary data stored as a single entity in a database management system” [6].
  • Transcoding servers: Video transcoding is also called video encoding. It is the process of converting a video format to other formats (MPEG, HLS, etc), which provide the best video streams possible for different devices and bandwidth capabilities.
  • Transcoded storage: It is a blob storage that stores transcoded video files.
  • CDN: Videos are cached in CDN. When you click the play button, a video is streamed from the CDN.
  • Completion queue: It is a message queue that stores information about video transcoding completion events.
  • Completion handler: This consists of a list of workers that pull event data from the completion queue and update metadata cache and database.
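A minimal sketch of how the completion queue and completion handler could fit together. It uses Python's standard `queue` and `threading` modules in place of a real message queue, and plain dictionaries in place of the metadata DB and cache. The event fields (`video_id`, `renditions`) and all names are assumptions made for illustration.

```python
import queue
import threading

# In-memory stand-ins for the real components (assumptions for this sketch).
completion_queue = queue.Queue()   # would be a message queue in production
metadata_db = {}                   # would be the sharded metadata database
metadata_cache = {}                # would be the metadata cache
_lock = threading.Lock()


def completion_worker() -> None:
    """Pull completion events and update both the database and the cache."""
    while True:
        event = completion_queue.get()
        try:
            if event is None:      # sentinel: stop this worker
                return
            record = {"status": "ready", "renditions": event["renditions"]}
            with _lock:
                metadata_db[event["video_id"]] = record
                metadata_cache[event["video_id"]] = record
        finally:
            completion_queue.task_done()


def process_completion_events(num_workers: int = 2) -> None:
    """Start workers, drain the queue, then shut the workers down."""
    workers = [threading.Thread(target=completion_worker) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    completion_queue.join()        # wait until every queued event is handled
    for _ in workers:
        completion_queue.put(None)
    for worker in workers:
        worker.join()
```

Usage: put events such as `{"video_id": "v1", "renditions": ["360p", "720p"]}` on `completion_queue`, then call `process_completion_events()`.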

Video Uploading Flow. Transcoding

Transcoding is computationally expensive and time-consuming.
Meta uses a DAG (Directed Acyclic Graph) programming model, which defines tasks in stages so they can be executed in parallel or sequentially.
image

  • Inspection: Make sure videos have good quality and are not malformed.
  • Video encodings: Videos are converted to support different resolutions, codecs, bitrates, etc. Figure 14-9 shows an example of encoded video files.
  • Thumbnail: Thumbnails can either be uploaded by a user or automatically generated by the system.
  • Watermark: An image overlay on top of the video that contains identifying information about it.
image
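The DAG idea can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names follow the list above, but the dependencies between them and the final `assemble` step are assumptions chosen for illustration, not the actual pipeline. Tasks in the same stage have no dependency on each other and could run in parallel; stages run one after another.

```python
from graphlib import TopologicalSorter

# task -> set of tasks that must finish first (hypothetical dependencies)
TRANSCODING_DAG = {
    "inspection": set(),
    "video_encoding": {"inspection"},
    "thumbnail": {"inspection"},
    "watermark": {"video_encoding"},
    "assemble": {"watermark", "thumbnail"},
}


def execution_stages(dag):
    """Group tasks into stages: each stage only depends on earlier stages."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()               # raises CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # tasks whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)                  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(TRANSCODING_DAG), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

With these assumed dependencies, `thumbnail` and `video_encoding` land in the same stage, which is where the parallelism comes from.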

Transcoding Architecture

image

Preprocessor

  1. Video splitting. The video stream is split, or further split, into smaller chunks aligned to Groups of Pictures (GOPs). A GOP is a group/chunk of frames arranged in a specific order. Each chunk is an independently playable unit, usually a few seconds in length.
  2. Some old mobile devices or browsers might not support video splitting. The preprocessor splits videos by GOP alignment for these old clients.
  3. DAG generation. The preprocessor generates the DAG based on configuration files that client programmers write. Figure 14-12 is a simplified DAG representation which has 2 nodes and 1 edge:
  4. Cache video segments. The preprocessor stores video segments and metadata in temporary storage for resiliency, so that a failed step can be retried from the stored data.
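To make GOP-aligned splitting concrete, here is a toy sketch. It assumes each GOP starts at a keyframe (I-frame) and represents a stream as a string of frame types (`I`, `P`, `B`). A real preprocessor would work on an actual video container with a media library, so treat the function name and the input format as illustrative assumptions.

```python
def split_into_gops(frame_types: str) -> list[str]:
    """Split a sequence of frame types into GOPs, cutting before each I-frame."""
    gops: list[str] = []
    for frame in frame_types:
        if frame == "I" or not gops:
            gops.append(frame)     # a keyframe opens a new GOP
        else:
            gops[-1] += frame      # P/B frames belong to the current GOP
    return gops


if __name__ == "__main__":
    print(split_into_gops("IPPBPIPPIBB"))   # ['IPPBP', 'IPP', 'IBB']
```

Because every chunk begins with a keyframe, each one can be decoded and encoded independently of the others.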

DAG Scheduler
image
Idea: split the video processing into independent tasks.

Resource Manager. Also known as Task Scheduler
image

  • Manages resource allocation and task priorities.
  • Finds the optimal worker for a given task.
  • Manages the job queue.
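The three responsibilities above can be sketched with a priority queue from Python's standard `heapq` module. This sketch assumes that a lower number means higher priority and that the "optimal" worker is the least-loaded one able to run the task kind. Both rules, and all class and field names, are assumptions for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str          # e.g. "encode" or "thumbnail"
    priority: int      # lower number = higher priority (assumption)


@dataclass
class Worker:
    name: str
    kinds: set         # task kinds this worker can run
    load: int = 0      # number of tasks currently assigned


class ResourceManager:
    def __init__(self, workers):
        self.workers = workers
        self._queue = []       # heap of (priority, sequence, task)
        self._sequence = 0     # keeps FIFO order among equal priorities

    def submit(self, task: Task) -> None:
        heapq.heappush(self._queue, (task.priority, self._sequence, task))
        self._sequence += 1

    def dispatch(self):
        """Pop the highest-priority task and assign the least-loaded capable worker."""
        if not self._queue:
            return None
        _, _, task = heapq.heappop(self._queue)
        candidates = [w for w in self.workers if task.kind in w.kinds]
        if not candidates:
            raise LookupError(f"no worker can run task kind {task.kind!r}")
        worker = min(candidates, key=lambda w: w.load)
        worker.load += 1
        return task.name, worker.name
```

A production resource manager would also track running tasks and release worker capacity when they finish; that part is omitted here.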

Task Workers
image

Encoded Video
The encoded video is the final output of the encoding pipeline.

Video Processing Optimizations.

image
  • Upload videos to a location close to the end user
  • High Parallelism

Video Streaming

Protocols

  • MPEG-DASH: Moving Picture Experts Group, Dynamic Adaptive Streaming over HTTP
  • Apple HLS: HTTP Live Streaming
  • Microsoft Smooth Streaming
  • Adobe HTTP Dynamic Streaming (HDS)

Naive video streaming diagram:
image

Recommendation: These protocols support different video encodings, so choose the streaming protocol that fits your business case.
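Adaptive protocols such as MPEG-DASH and HLS let the client switch between bitrates during playback, which is what the "smooth user experience" requirement relies on. The sketch below shows one simple client-side rule: pick the highest rendition whose bitrate fits within a fraction of the measured bandwidth. The bitrate ladder, the safety factor, and the function name are hypothetical, and real players use more elaborate heuristics.

```python
# Hypothetical bitrate ladder: rendition name -> required bitrate in kbps.
RENDITIONS_KBPS = {"240p": 400, "480p": 1000, "720p": 2500, "1080p": 5000}


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Return the best rendition that fits the measured bandwidth."""
    budget = measured_kbps * safety_factor   # leave headroom for fluctuations
    affordable = [
        (rate, name) for name, rate in RENDITIONS_KBPS.items() if rate <= budget
    ]
    if not affordable:
        # Nothing fits: fall back to the lowest bitrate instead of stalling.
        return min(RENDITIONS_KBPS, key=RENDITIONS_KBPS.get)
    return max(affordable)[1]
```

For example, with 4000 kbps measured the budget is 3200 kbps, so the function returns `720p`. Re-running it after each downloaded segment gives the bitrate switching behavior.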

Cost Optimizations

  • For less popular content, we may not need to store many encoded video versions. Short videos can be encoded on-demand.
  • Some videos are popular only in certain regions. There is no need to distribute these videos to other regions.
  • Build your own CDN like Netflix and partner with Internet Service Providers (ISPs) such as Comcast, AT&T, or Verizon. Building your own CDN is a giant project; however, it can make sense for large streaming companies.
image
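The first two optimizations above can be expressed as simple policies. This is a sketch under stated assumptions: the popularity threshold, the single fallback rendition, the minimum regional share, and the function names are all hypothetical values picked for illustration, not recommendations.

```python
ALL_RENDITIONS = ["240p", "480p", "720p", "1080p"]   # hypothetical ladder


def renditions_to_store(daily_views: int, popular_threshold: int = 1000) -> list[str]:
    """Popular videos keep every rendition; the rest keep one and encode on demand."""
    if daily_views >= popular_threshold:
        return list(ALL_RENDITIONS)
    return ["480p"]   # assumed fallback; other versions are encoded on demand


def regions_to_distribute(views_by_region: dict[str, int], min_share: float = 0.05) -> list[str]:
    """Push a video only to regions that account for a meaningful share of views."""
    total = sum(views_by_region.values())
    if total == 0:
        return []
    return sorted(
        region for region, views in views_by_region.items()
        if views / total >= min_share
    )
```

For example, a video with views `{"us": 900, "eu": 95, "apac": 5}` would be distributed to `eu` and `us` only under the assumed 5% threshold.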