Skip to Content

Top K Videos

A video service (like youtube) has many viewers watching videos. Given a stream of the video IDs that are being watched, we need to find the top K most viewed videos for different periods of time (1 hour, 1 day, 1 month, all time). For the top K videos returned, we also want the count of views during this period.

Functional requirements

Query top K most viewed videos (1000 is max value for k)

Accept a parameter for the time period (minute, hour, day, all time)

Sliding window

No arbitrary starting points

Non Functional Requirements

10-100ms read latency

Assuming massive viewer volume and number of videos, similar to youtube

..................

To tackle this system design problem, you'll need to think about scalability, real-time data processing, and maintaining low latency for high-read scenarios. With the high volume of users and videos, it's essential to distribute the load and ensure that queries for top-K videos can be processed quickly and accurately, even as new views come in.

A load balancer would be the first component to direct incoming requests to multiple servers, ensuring that no single server becomes a bottleneck. As new video views are streamed in, they can be processed by a streaming platform like Apache Kafka, which is well-suited for handling high-throughput data streams. Kafka allows for the real-time processing of incoming views and can distribute this data across multiple partitions to balance the load.

To compute and maintain the top-K most viewed videos, a separate Top-K Service could be set up. This service would ingest data from the Kafka stream, maintaining a set of counters (e.g., using Redis or a custom in-memory data structure) to track view counts for each video over various time periods (hour, day, month). For storing the top K items efficiently, a min-heap (or max-heap) could be used for each time window, allowing for quick updates as new data comes in. For example, for the top 1000 most viewed videos, the heap would ensure that only the top 1000 are retained in memory. This service should be optimized for low-latency updates, enabling real-time adjustments as view counts change.

Because time-based queries (hour, day, etc.) require aggregating views within specific windows, implementing sliding window logic is crucial. Kafka Streams, or similar processing frameworks like Apache Flink, can help handle these windowed aggregations and ensure that view counts for different time periods are kept up to date without arbitrary start and end times. To ensure data is not lost in case of failures, checkpointing mechanisms should be in place to periodically save the state of view counts to a durable storage, like HDFS or a relational database, enabling quick recovery.

To speed up reads and meet the latency requirements, caching layers like Redis or Memcached can store the results of recent top-K queries, particularly for frequent requests like "top K videos for the past hour." This will allow you to serve users without recalculating the top K from scratch every time. Additionally, services like ZooKeeper can manage distributed service coordination, such as ensuring that Kafka consumers don’t overlap and properly balance partition consumption.

By combining streaming data processing, efficient data structures, and caching, you can build a robust service that meets the real-time demands of a large video streaming platform while maintaining quick response times for top-K queries across different time periods.

Posted by grwgreg 7 days ago

Related Problems

Functional Requirements

1. As users type text in a search box, show the top 10 auto complete results with very low latency

2. Analytics will be collected on what the user types

Design a service with the following functional requirements

1. Users should be able to upload and download files

2. The files should be able to be shared with other users

3. Changes to the files should be pushed to other users with the content on their machine

4. There must be no risk of file corruption

5. Keep track of different versions of the files so they may be rolled back

6. Users should be able to edit files without an internet connection and the changes sync up when a connection becomes available

Design a social network website with the following functional requirements

1. Users should be able to post content with text, images or video 2. Users should be able to follow other users 3. Each user will have a relatively low latency feed which shows content posted by users they follow

Functional Requirements

The ability to set limits on the number of requests allowed within a specific timeframe

Keep performance and fault tolerance in mind