FEATURE: New `Video` API #1924

Ashp116 · 2025-07-30T19:29:49Z

Description

This PR introduces a new Video API that streamlines video processing and rendering workflows. It addresses both issues #1923 and #1929 by enabling more flexible backend support and improved audio-video synchronization.

With this update, the video processing function now supports multiple backends, including PyAV and OpenCV. Notably, PyAV is the only backend currently supporting audio rendering, which significantly improves output quality.

This PR requires the optional dependency PyAV for the PyAV video rendering backend.

Tags:
Fixes #1923
Fixes #1929

Type of change

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Please refer to #1923 and #1929

Any specific deployment considerations

Ensure that PyAV is installed in the environment to test the PyAV backend.

Docs

Docs updated? What were the changes

SkalskiP · 2025-07-31T15:32:54Z

Hi @Ashp116 👋🏻 Another great idea! Video processing is probably the oldest part of supervision, written over two years ago, and I’ve been wanting to update its API for a while. Would you be open to not only adding audio support but also helping me with the update?

Ashp116 · 2025-07-31T18:23:34Z

Hi @SkalskiP, yea, I would like to help update the API. I was thinking of changing how videos are written in process_video. The original compression is lost when annotations are added and the file is written to a target_path. But yea, I would like to help out with the update.

SkalskiP · 2025-08-01T09:54:27Z

Hi @Ashp116 I'm really glad you want to help me! Let's goooo! 🔥 🔥 🔥

I want the functionalities currently found in supervision.utils.video to be reorganized around a new Video class. Importantly, all features previously available in the old API must still be supported in the new one. Ideally, the new API should be more consistent and expressive.

get video info (works for files, RTSP, webcams)

import supervision as sv
 
# static video
sv.Video("source.mp4").info

# video stream
sv.Video("rtsp://...").info

# webcam
sv.Video(0).info

simple frame iteration (object is iterable)

import supervision as sv

video = sv.Video("source.mp4")
for frame in video:
    ...

advanced frame iteration (stride, sub-clip, on-the-fly resize)

import supervision as sv

for frame in sv.Video("source.mp4").frames(stride=5, start=100, end=500, resolution_wh=(1280, 720)):
    ...
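
A minimal sketch of how the `stride`, `start`, and `end` arguments could select frames, independent of any backend. The helper name `select_frames` is hypothetical, `end` is assumed to be exclusive, and the on-the-fly resize is left out:

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def select_frames(
    frames: Iterable[T],
    stride: int = 1,
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[T]:
    """Yield every `stride`-th frame with index in [start, end)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if start < 0 or (end is not None and end < start):
        raise ValueError("expected 0 <= start <= end")
    # islice consumes the source lazily, so skipped frames are never stored.
    return islice(frames, start, end, stride)


# Integers stand in for decoded frames here.
indices = list(select_frames(range(1000), stride=5, start=100, end=500))
```

With a real decoder, skipping the first `start` frames this way still decodes them; a backend with a `seek` method could jump ahead instead.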

process the video

import cv2
import supervision as sv

def blur(frame, i):
    return cv2.GaussianBlur(frame, (11, 11), 0)

sv.Video("source.mp4").save(
    "blurred.mp4",
    callback=blur,
    show_progress=True
)

overwrite target video parameters

import supervision as sv

sv.Video("source.mp4").save(
    "timelapse.mp4",
    fps=60,
    callback=lambda f, i: f,
    show_progress=True
)

complete manual control with explicit VideoInfo

from supervision import Video, VideoInfo

source = Video("source.mp4")
target_info = VideoInfo(width=800, height=800, fps=24)

with src.sink("square.mp4", info=target_info) as sink:
    for f in src.frames():
        f = cv2.resize(f, target_info.resolution_wh)
        sink.write(f)

multi-backend support decode/encode

import supervision as sv

video = sv.Video("source.mkv", backend="pyav")

video = sv.Video("source.mkv", backend="opencv")
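
One way the `backend="pyav"` / `backend="opencv"` strings could be resolved is a small name-to-class registry. This is only a sketch: `register_backend`, `get_backend`, and both placeholder classes are hypothetical names, not part of supervision:

```python
from typing import Callable, Dict

# Hypothetical registry: maps a lowercase backend name to a zero-argument factory.
_BACKENDS: Dict[str, Callable[[], object]] = {}


def register_backend(name: str) -> Callable[[type], type]:
    """Class decorator that registers a backend under a lowercase name."""

    def decorator(cls: type) -> type:
        _BACKENDS[name.lower()] = cls
        return cls

    return decorator


def get_backend(name: str) -> object:
    """Instantiate the backend registered under `name` (case-insensitive)."""
    try:
        factory = _BACKENDS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS))
        raise ValueError(
            f"Unknown backend {name!r}; expected one of: {known}"
        ) from None
    return factory()


@register_backend("opencv")
class OpenCVBackend:
    """Placeholder; a real class would wrap cv2.VideoCapture and cv2.VideoWriter."""


@register_backend("pyav")
class PyAVBackend:
    """Placeholder; a real class would wrap the optional `av` package."""
```

A registry like this keeps the optional PyAV import inside the class that needs it, so users who only have OpenCV installed are unaffected.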

suggested minimal protocol

class Backend(Protocol):
    def open(self, path: str) -> Any: ...
    def info(self, handle: Any) -> VideoInfo: ...

    def read(self, handle: Any) -> tuple[bool, np.ndarray]: ...
    def grab(self, handle: Any) -> bool: ...
    def seek(self, handle: Any, frame_idx: int) -> None: ...

    def writer(self, path: str, info: VideoInfo, codec: str) -> Writer: ...

class Writer(Protocol):
    def write(self, frame: np.ndarray) -> None: ...
    def close(self) -> None: ...
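
To make the protocol concrete, here is a toy in-memory implementation that could serve as a test double. It is a sketch under assumptions: `VideoInfo` below is a stand-in dataclass, frames are arbitrary Python objects rather than `np.ndarray`, and `InMemoryBackend` and `ListWriter` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class VideoInfo:
    """Stand-in for the real VideoInfo; only the fields used here."""

    width: int
    height: int
    fps: float
    total_frames: Optional[int] = None


class ListWriter:
    """Writer that appends frames to a list instead of encoding a file."""

    def __init__(self) -> None:
        self.frames: List[Any] = []
        self.closed = False

    def write(self, frame: Any) -> None:
        if self.closed:
            raise ValueError("writer is closed")
        self.frames.append(frame)

    def close(self) -> None:
        self.closed = True


class InMemoryBackend:
    """Toy backend: the handle is a dict holding the read cursor."""

    def __init__(self, frames: List[Any], info: VideoInfo) -> None:
        self._frames = frames
        self._info = info

    def open(self, path: str) -> dict:
        # The path is ignored; a real backend would open a file or stream.
        return {"pos": 0}

    def info(self, handle: dict) -> VideoInfo:
        return self._info

    def read(self, handle: dict) -> Tuple[bool, Any]:
        if handle["pos"] >= len(self._frames):
            return False, None
        frame = self._frames[handle["pos"]]
        handle["pos"] += 1
        return True, frame

    def grab(self, handle: dict) -> bool:
        # Advance the cursor without returning the frame.
        if handle["pos"] >= len(self._frames):
            return False
        handle["pos"] += 1
        return True

    def seek(self, handle: dict, frame_idx: int) -> None:
        handle["pos"] = max(0, min(frame_idx, len(self._frames)))

    def writer(self, path: str, info: VideoInfo, codec: str) -> ListWriter:
        return ListWriter()


backend = InMemoryBackend(["f0", "f1", "f2", "f3"], VideoInfo(2, 2, 30.0, 4))
handle = backend.open("ignored.mp4")
backend.seek(handle, 1)           # cursor now points at "f1"
backend.grab(handle)              # skips "f1"
ok, frame = backend.read(handle)  # reads "f2"
```

The split between `grab` and `read` matters for strided iteration: skipped frames can be advanced past without paying for a full decode in backends that support it.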

Ashp116 · 2025-08-02T06:52:16Z

Hi @SkalskiP,

I’ve addressed most of the features you mentioned, but I have some thoughts on a few aspects of the implementation:

.save Functionality
How would you handle .save for a video feed coming from a webcam or an RTSP stream? Currently, only video files can be saved.

Writer and Backend Classes
This is just my personal opinion, but should these classes be moved to separate modules? If we add more writers and backends in the future, keeping everything inside the main video module might become cluttered.

“Complete manual control with explicit VideoInfo” Functionality
```
from supervision import Video, VideoInfo

source = Video("source.mp4")
target_info = VideoInfo(width=800, height=800, fps=24)

with src.sink("square.mp4", info=target_info) as sink:
    for f in src.frames():
        f = cv2.resize(f, target_info.resolution_wh)
        sink.write(f)
```
I’m not fully clear on what this feature is intended to do. In this snippet, the Video instance source is created but never used afterward. Is src supposed to be source? Also, is the goal to create sinks for each backend? Could you please clarify the purpose and expected usage here?

Ashp116 · 2025-08-12T02:45:33Z

Hi @SkalskiP,

Thank you for reviewing the PR. I have addressed all the comments from the review. Could you please take a look at the following points?

render_audio parameter:
You mentioned the need for this parameter. I agree it’s necessary. Here is my response:

I think we do. pyAV’s default compression codec is h264, which produces much better quality than OpenCV’s mp4v. If a user wants to render only the video frames without audio using pyAV, this parameter allows that. I also suggest setting render_audio to None with the default as True.
Overall, ffmpeg’s default compression leads to better video outputs.

.show() feature:
I think this is a useful addition, and I have implemented it. However, there are some issues: currently, cv2.imshow is used to render the frame with a wait time of 1 ms. This causes the display to not match the correct FPS for a given video source. There are ways to address this, but they would require adding a dependency. Could you share your thoughts on this implementation approach?
EDIT: I added support for headless machines and notebooks. Solid points were mentioned here.
pyAV webcam bug:
In my previous review, I completely missed webcam support for the pyAV backend. I’ve added this in the current PR. Using a webcam with pyAV requires a different code path. Could you help test this feature on other devices? I’ve verified it works on a Windows machine.
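
On the .show() pacing question above, one dependency-free option is to compute the wait time per frame from the source FPS and the time already spent on the frame, instead of a fixed 1 ms. A rough sketch, where `wait_ms` is a hypothetical helper and the result is meant to be passed to `cv2.waitKey`:

```python
import time


def wait_ms(fps: float, frame_started_at: float, now: float) -> int:
    """Milliseconds left in the current frame's time budget of 1 / fps seconds.

    Never returns less than 1, because cv2.waitKey(0) blocks until a key press.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    budget = 1.0 / fps
    remaining = budget - (now - frame_started_at)
    return max(1, int(round(remaining * 1000)))


# Intended use inside a display loop (not executed here):
#
#     started = time.perf_counter()
#     cv2.imshow("video", frame)
#     cv2.waitKey(wait_ms(info.fps, started, time.perf_counter()))

delay = wait_ms(25, 0.0, 0.010)  # 40 ms budget, 10 ms already spent
```

This only approximates the target rate, since `waitKey` timing is not exact and slow frames cannot be made up for without dropping frames.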

Please let me know if you have any feedback or suggestions. Thanks

Ashp116 · 2025-09-01T17:14:27Z

Hi @SkalskiP,

It’s been a while! I’ve added better audio support. Previously, I manually manipulated audio packets, along with DTS and PTS values, to synchronize them with the video. Now, I’m using the atempo filter on audio streams, which matches the video much more cleanly.
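
A sketch of how a playback speed might be split into a chain of atempo factors. It assumes the conservative per-filter range of 0.5 to 2.0 and assumes the speed is the ratio of target FPS to source FPS; `atempo_chain` and `atempo_filter` are hypothetical names and this is not necessarily how the PR implements it:

```python
import math
from typing import List

ATEMPO_MIN, ATEMPO_MAX = 0.5, 2.0  # assumed per-filter limits


def atempo_chain(speed: float) -> List[float]:
    """Split `speed` into factors within [0.5, 2.0] whose product is `speed`."""
    if not math.isfinite(speed) or speed <= 0:
        raise ValueError("speed must be a positive, finite number")
    factors: List[float] = []
    remaining = speed
    while remaining > ATEMPO_MAX:
        factors.append(ATEMPO_MAX)
        remaining /= ATEMPO_MAX
    while remaining < ATEMPO_MIN:
        factors.append(ATEMPO_MIN)
        remaining /= ATEMPO_MIN
    factors.append(remaining)
    return factors


def atempo_filter(speed: float) -> str:
    """Build an ffmpeg-style filter string such as 'atempo=2,atempo=1.25'."""
    return ",".join(f"atempo={factor:g}" for factor in atempo_chain(speed))


# Example: a 12 fps source written at 60 fps plays 5x faster.
chain = atempo_filter(60 / 12)
```

Chaining keeps every individual filter inside its accepted range while the overall tempo change still equals the requested speed.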

I’ve included my Colab notebook showcasing the new .show() function. Next, I’ll be working on the documentation and unit tests for audio.

I’d love to hear your thoughts and get your feedback on my current implementation.

Thank you!

ryashry

Improve high quality video web

ADD: Added audio stream for process_video

7fba113

Ashp116 requested a review from SkalskiP as a code owner July 30, 2025 19:29

pre-commit-ci bot and others added 3 commits July 30, 2025 19:30

fix(pre_commit): 🎨 auto format pre-commit hooks

8947f77

REMOVE: Removed ffprobe

73b5836

Merge branch 'bug/process-video-audio' of https://github.com/Ashp116/…

e02d298

…supervision into bug/process-video-audio

Ashp116 changed the title ~~ADD: Added audio stream for process_video~~ BUG: Added audio stream for process_video Jul 30, 2025

Ashp116 and others added 16 commits August 1, 2025 22:51

UPDATE: Added a new Video class with OpenCV writer and backend

5e07794

Merge pull request #1 from Ashp116/update/video-core

46ec693

UPDATE: Added a new Video class with OpenCV writer and backend

fix(pre_commit): 🎨 auto format pre-commit hooks

b2096d0

Precommit

9fb7098

fix(pre_commit): 🎨 auto format pre-commit hooks

850a2c6

Precommit

46900f8

Merge branch 'bug/process-video-audio' of https://github.com/Ashp116/…

34cb9a1

…supervision into bug/process-video-audio

fix(pre_commit): 🎨 auto format pre-commit hooks

c700394

UPDATE: Fixed incomplete write closing

fce8ade

ADD: Docstrings

f86f4f2

fix(pre_commit): 🎨 auto format pre-commit hooks

2265977

UPDATE: Allow for ffmpeg error passthrough

bf67bfa

UPDATE: Writer and Backend abstract class

ec4bd01

Precommit

b9e7968

fix(pre_commit): 🎨 auto format pre-commit hooks

a96c3f0

Precommit

a6c91bc

Ashp116 changed the title ~~BUG: Added audio stream for process_video~~ FEATURE: Versatile Video class Aug 2, 2025

Ashp116 mentioned this pull request Aug 5, 2025

Reimplement video utils #1929

Open

Ashp116 added 2 commits August 6, 2025 16:21

UPDATE: Added manual control

d075e03

ADD: Added docstrings

7f078ff

Ashp116 requested a review from SkalskiP August 12, 2025 02:45

Ashp116 and others added 26 commits August 13, 2025 02:07

UPDATE: Add support for IPython display

11fc8a5

UPDATE: Added support for headless machines and notebook for sv.Video…

1dac635

…().show()

fix(pre_commit): 🎨 auto format pre-commit hooks

dedb68a

UPDATE: Updated error msg for IPython

035196a

fix(pre_commit): 🎨 auto format pre-commit hooks

a1218e8

Precommit

76d8145

fix(pre_commit): 🎨 auto format pre-commit hooks

32a5e2c

UPDATE: Fixed av module install

7680cae

UPDATE: Revert av error

68fb727

UPDATE: updated av module getter

3f403fe

fix(pre_commit): 🎨 auto format pre-commit hooks

c9badc1

UPDATE: Updated .show() with more configuration params

7a507e5

fix(pre_commit): 🎨 auto format pre-commit hooks

824fa98

UPDATE: Updated IPython import

8aa364c

fix(pre_commit): 🎨 auto format pre-commit hooks

2bec991

BUG: Frame iteration fix

43830f7

UPDATE: Updated audio stream to use atempo reflecting changes in fps

f827455

UPDATE: Updated docstrings

2d90915

fix(pre_commit): 🎨 auto format pre-commit hooks

e9ccca2

UPDATE: Changed backend type class and added ref to root

9f115c4

fix(pre_commit): 🎨 auto format pre-commit hooks

c67aad3

BUG: Appending fixes for VideoBackend error

ba9efc2

FIX: Merge conflicts

1dcbb4b

fix(pre_commit): 🎨 auto format pre-commit hooks

9c7a9ec

UPDATE: Decompose playback speed into valid atempo chain

fb2171c

fix(pre_commit): 🎨 auto format pre-commit hooks

c78e4f7

ryashry suggested changes Sep 5, 2025
