TorchCodec

TorchCodec is a Python library for decoding videos into PyTorch tensors. It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on videos, TorchCodec is how you turn those videos into data.

We achieve these capabilities through:

- Pythonic APIs that mirror Python and PyTorch conventions.
- Relying on FFmpeg to do the decoding. TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a mature library with broad coverage available on most systems. It is, however, not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used correctly and efficiently.
- Returning data as PyTorch tensors, ready to be fed into PyTorch transforms or used directly to train models.

Note

⚠️ TorchCodec is still at an early stage of development, and some APIs may change in future versions without a deprecation cycle, depending on user feedback. If you have any suggestions or run into problems, please let us know by opening an issue!

Using TorchCodec

Here's a condensed summary of what you can do with TorchCodec. For a more detailed example, check out our documentation!

```python
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("path/to/video.mp4")

decoder.metadata
# VideoStreamMetadata:
#   num_frames: 250
#   duration_seconds: 10.0
#   bit_rate: 31315.0
#   codec: h264
#   average_fps: 25.0
#   ... (truncated output)

len(decoder)  # == decoder.metadata.num_frames!
# 250
decoder.metadata.average_fps  # Note: instantaneous fps can be higher or lower
# 25.0

# Simple Indexing API
decoder[0]  # uint8 tensor of shape [C, H, W]
decoder[0 : -1 : 20]  # uint8 stacked tensor of shape [N, C, H, W]

# Iterate over frames:
for frame in decoder:
    pass

# Indexing, with PTS and duration info
decoder.get_frame_at(len(decoder) - 1)
# Frame:
#   data (shape): torch.Size([3, 400, 640])
#   pts_seconds: 9.960000038146973
#   duration_seconds: 0.03999999910593033

decoder.get_frames_in_range(start=10, stop=30, step=5)
# FrameBatch:
#   data (shape): torch.Size([4, 3, 400, 640])
#   pts_seconds: tensor([0.4000, 0.6000, 0.8000, 1.0000])
#   duration_seconds: tensor([0.0400, 0.0400, 0.0400, 0.0400])

# Time-based indexing with PTS and duration info
decoder.get_frame_played_at(pts_seconds=2)
# Frame:
#   data (shape): torch.Size([3, 400, 640])
#   pts_seconds: 2.0
#   duration_seconds: 0.03999999910593033
```
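
A common follow-up to the indexing API above is picking a fixed number of frames spread across a whole video. The sketch below is one possible way to do that and is not part of TorchCodec: `sample_evenly` is a hypothetical helper that relies only on `len()` and integer indexing, which the summary above shows `VideoDecoder` supporting. It is demonstrated on a plain list so it runs without a video file.

```python
def sample_evenly(frames, num_samples):
    """Return `num_samples` items spread evenly across an indexable sequence.

    `frames` can be any object that supports len() and integer indexing,
    for example a list, or (by assumption) a torchcodec VideoDecoder.
    """
    total = len(frames)
    if num_samples <= 0 or total == 0:
        return []
    if num_samples == 1:
        return [frames[0]]
    # Integer arithmetic keeps the first and last indices exact
    # and avoids floating-point rounding surprises.
    indices = [(i * (total - 1)) // (num_samples - 1) for i in range(num_samples)]
    return [frames[i] for i in indices]


# Stand-in for a 250-frame video: each "frame" is just its own index.
fake_video = list(range(250))
print(sample_evenly(fake_video, 5))  # [0, 62, 124, 186, 249]
```

With a real decoder, `sample_evenly(decoder, 5)` would be expected to return a list of five `[C, H, W]` tensors, which could then be combined with `torch.stack` if a single batch tensor is needed.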

You can use the following snippet to generate a video with FFmpeg and try out TorchCodec:

```bash
fontfile=/usr/share/fonts/dejavu-sans-mono-fonts/DejaVuSansMono-Bold.ttf
output_video_file=/tmp/output_video.mp4

ffmpeg -f lavfi -i \
    color=size=640x400:duration=10:rate=25:color=blue \
    -vf "drawtext=fontfile=${fontfile}:fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" \
    ${output_video_file}
```
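
The `duration=10` and `rate=25` options in the command above account for the numbers shown in the earlier usage summary. The arithmetic below is a rough sketch that assumes a constant frame rate and a first frame at 0 seconds; real videos can deviate from both, and the variable names are illustrative only.

```python
# Values taken from the ffmpeg command: duration=10, rate=25.
duration_seconds = 10
rate = 25

num_frames = duration_seconds * rate  # 250, matches metadata.num_frames
frame_duration = 1 / rate             # 0.04 s, matches duration_seconds per frame


def pts_of(index):
    """Presentation timestamp of a frame, assuming constant frame rate."""
    return index / rate


last_pts = pts_of(num_frames - 1)                  # 9.96, as in get_frame_at(len(decoder) - 1)
range_pts = [pts_of(i) for i in range(10, 30, 5)]  # frames 10, 15, 20, 25

print(num_frames, frame_duration, last_pts, range_pts)
```

The slightly different values printed by TorchCodec, such as `9.960000038146973`, are what you would expect if the timestamps pass through 32-bit floats at some point; that is an observation about the output shown, not a statement about the internals.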

Installing TorchCodec

1. Install the latest stable version of PyTorch following the official instructions. For other versions, refer to the table below for compatibility between versions of `torch` and `torchcodec`.
2. Install FFmpeg, if it's not already installed. Linux distributions usually come with FFmpeg pre-installed. TorchCodec supports all major FFmpeg versions from 4 through 7.

   If FFmpeg is not already installed, or you need a more recent version, an easy way to install it is to use conda:
   ```
   conda install ffmpeg
   # or
   conda install ffmpeg -c conda-forge
   ```
3. Install TorchCodec:
   ```
   pip install torchcodec
   ```

The following table indicates the compatibility between versions of torchcodec, torch and Python.

| `torchcodec` | `torch` | Python |
| --- | --- | --- |
| `main` / `nightly` | `main` / `nightly` | `>=3.9`, `<=3.12` |
| not yet supported | `2.5` | `>=3.9`, `<=3.12` |
| `0.0.3` | `2.4` | `>=3.8`, `<=3.12` |
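
Before comparing your setup against the table, it can help to print what is actually installed. The snippet below is a minimal sketch using only the Python standard library; `describe_environment` is a hypothetical helper, not a TorchCodec API. Note that it only checks whether an `ffmpeg` executable is on `PATH`, which is a rough proxy: it does not verify that the FFmpeg libraries TorchCodec loads are present or compatible.

```python
import shutil
import sys
from importlib import metadata


def describe_environment():
    """Collect the versions relevant to the compatibility table."""

    def installed_version(package):
        # Returns None instead of raising when the package is missing.
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None

    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Rough proxy only: an executable on PATH, not the shared libraries.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch": installed_version("torch"),
        "torchcodec": installed_version("torchcodec"),
    }


for name, value in describe_environment().items():
    print(f"{name}: {value}")
```

A value of `None` for `torch` or `torchcodec` means the package was not found in the current environment.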

Benchmark Results

The following was generated by running our benchmark script on a lightly loaded 56-core machine.

The top row is a Mandelbrot video generated with FFmpeg that has a resolution of 1280x720 at 60 fps and is 120 seconds long. The bottom row is a promotional video from NASA that has a resolution of 960x540 at 29.7 fps and is 206 seconds long. Both videos were encoded with libx264 and the yuv420p pixel format.

Planned future work

We are actively working on the following features:

- Audio decoding

Let us know if you have any feature requests by opening an issue!

Contributing

We welcome contributions to TorchCodec! Please see our contributing guide for more details.

License

TorchCodec is released under the BSD 3-Clause license.
