Hardware-Accelerated h264 Encoding on Synology NAS
Updated after publishing: I’ve got reports of verified support on DS218+ and DS418play. I’ve added Debian Stretch-specific instructions. Added a disclaimer. Opened a pull request #30 for homebridge-camera-ffmpeg-ufv to add support for VAAPI-based video transcoding.
Disclaimer: I know very little about ffmpeg and video encoding. I played around for several days to figure out how to make hardware video transcoding work and just wrote down my findings. I’d be happy if somebody who knows about these things would help me better understand why things behave the way they do.
Many Synology NAS models have an Intel CPU that supports hardware-accelerated h264 encoding, which Intel markets as QuickSync. You would get around 10x improvement and, most importantly, real-time video transcoding with low latency. Surprisingly, Synology does not seem to use it internally, but it’s possible to use it manually. This easily works from within Docker as well.
Synology NAS Models with Hardware h264
These instructions were verified on a Synology DiskStation DS718+, which uses an Intel Celeron J3455. It’s the same CPU as in the DS918+, so this should apply to that model as well. The similar Intel Celeron J3355 also has QuickSync, so all of this applies to DS218+ (verified by ArtisanalCollabo) and DS418play (verified by Arsen Vartapetov).
CPUs that do not support QuickSync and therefore do not support hardware acceleration:
- Intel Atom C3538 and Intel Atom C2538, which are used in the DS1517+, DS1618+, DS1817+, DS1819+, DS2415+, RS818+, RS1219+, RS2418+, RS2418RP+, RS2819RP+ Synology NAS models
- Intel Xeon D-1541, Intel Xeon D-1531, Intel Xeon D-1521, Intel Xeon D-1527, which are used in the RS18017xs+, RS3618xs, RS4017xs+, DS3617xs, RS1619xs+, RS3617RPxs, RS3617xs+ Synology NAS models
- Intel Pentium D1508, which is used in the FS1018 and DS3018xs Synology NAS models
All other Synology NASes as of 2018 use Realtek CPUs. I do not know whether they support it, but I lean heavily toward “no”.
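The model lists above can be turned into a quick check to run on the NAS itself. This is only a minimal sketch: `classify_cpu` and `cpu_model` are hypothetical helper names, the patterns cover only the CPUs named in this post, and anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Classify a CPU model string using only the model lists from this post.
# Prints "quicksync", "no-quicksync", or "unknown".
classify_cpu() {
  case "$1" in
    *J3455*|*J3355*) echo "quicksync" ;;
    *C3538*|*C2538*|*D-1541*|*D-1531*|*D-1521*|*D-1527*|*D1508*) echo "no-quicksync" ;;
    *) echo "unknown" ;;
  esac
}

# Print the model name of the first CPU listed in a cpuinfo-style file
# (defaults to /proc/cpuinfo).
cpu_model() {
  sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | head -n 1
}
```

Over SSH on the NAS this could be called as `classify_cpu "$(cpu_model)"`.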
Different Flavors of Hardware-Accelerated ffmpeg
ffmpeg supports many different types of accelerated encoding. Luckily for us, only libmfx, OpenCL, and VAAPI are supported by Intel CPUs on Linux. The OpenCL implementation does not support hardware encoding, and libmfx is very hard to use on Linux, which leaves us with only one possibility: VAAPI.
If you see `h264_qsv` recommended somewhere, it would use libmfx under the hood. I have not found a simple way to make it work.
Synology’s Own ffmpeg
I was surprised to find ffmpeg version 2.7, which was released back in 2015 and lacks some of the hardware acceleration implementations:
$ ffmpeg 2>&1 | head -n2
ffmpeg version 2.7.1 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.9.3 (crosstool-NG 1.20.0) 20150311 (prerelease)
This build does not support any hardware implementations:
$ ffmpeg -buildconf 2>/dev/null | grep 'vaapi\|hw'
--disable-vaapi
What’s weird is that the `/dev/dri/*` devices are present and initialized, which hints that Synology can somehow use hardware encoding. Most likely I’m just looking in the wrong place.
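To see which render nodes a given box exposes, something like the following could be used. It is a sketch under assumptions: `list_render_nodes` is a hypothetical helper name, and it only checks that the device files exist, not that the driver behind them works.

```shell
#!/bin/sh
# Print every DRM render node found in the given directory (default /dev/dri).
# Returns non-zero when no render node exists there.
list_render_nodes() {
  dir="${1:-/dev/dri}"
  found=1
  for node in "$dir"/renderD*; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -e "$node" ] || continue
    echo "$node"
    found=0
  done
  return "$found"
}
```

Running `list_render_nodes || echo "no render nodes found"` on the NAS would then either print paths such as `/dev/dri/renderD128` or report that none are present.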
Docker-based ffmpeg and VAAPI
Check the VAAPI documentation for all the internal details; I will only give a very short summary.
VAAPI is a magical API that allows ffmpeg to use hardware acceleration for different video-related operations and works across different hardware. It’s always one of the `/dev/dri/*` devices that can be used to talk to the underlying hardware. We only need one for our purposes: `/dev/dri/renderD128` (literally, `D128` is the same across platforms).
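Before touching ffmpeg, it can help to ask VAAPI directly whether the render node offers H.264 encoding. The sketch below assumes the `vainfo` tool from libva-utils is available, which this post does not cover, and `has_h264_encode` is a hypothetical helper name. It looks for an H.264 profile paired with an `VAEntrypointEncSlice` entrypoint in `vainfo`-style output on stdin.

```shell
#!/bin/sh
# Succeeds if vainfo-style output on stdin lists an H.264 profile
# together with an encode entrypoint (VAEntrypointEncSlice or its LP variant).
has_h264_encode() {
  grep -Eq 'VAProfileH264[A-Za-z]*[[:space:]]*:[[:space:]]*VAEntrypointEncSlice'
}
```

A possible call is `vainfo 2>/dev/null | has_h264_encode && echo "H.264 encode available"`. On a headless box `vainfo` may need to be pointed at the DRM device explicitly; whether that is necessary on DSM is untested here.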
Options to add to enable VAAPI:
- Enable VAAPI: `-hwaccel vaapi`
- Convert the frame buffer format to make the hardware codec happy: `-hwaccel_output_format vaapi`, or `-vf 'format=nv12,hwupload'`, or `-vf 'scale_vaapi=w=1280:h=720'`
- Actually use the h264 codec with VAAPI: `-c:v h264_vaapi`
The simplest command to verify your encoding performance, using the example video Big Buck Bunny:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
jrottenberg/ffmpeg:vaapi \
-hwaccel vaapi -hwaccel_output_format vaapi \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-c:v h264_vaapi \
/tmp/example.mp4
See the longer example in the performance section below.
Debian Stretch-based ffmpeg issues
Debian Stretch-based ffmpeg 3.2 differs from the Docker defaults and requires additional tweaks to make it work. I’ve only verified it from within a Debian-based Docker image with ffmpeg installed via `apt-get install ffmpeg`, so your results may differ.
- The VAAPI-based surface format is not supported, so we cannot use `-hwaccel_output_format vaapi` directly
- This means we need to download decoded frames into memory and upload them back via `-vf 'format=nv12,hwupload'`
- We need to explicitly specify the device to upload frames to via `-vaapi_device /dev/dri/renderD128`
- Overall it’s much slower than full-speed hardware encoding, but it’s still much faster than software encoding
sudo docker run --rm \
--device /dev/dri:/dev/dri \
debian:stretch-slim \
/bin/sh -c "
apt-get update
apt-get install --assume-yes ffmpeg
ffmpeg \
-hwaccel vaapi \
-vaapi_device /dev/dri/renderD128 \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'format=nv12,hwupload' \
-c:v h264_vaapi \
/tmp/example.mp4
"
- In some cases you can use this format without crashes: `-hwaccel_output_format vaapi -vf 'format=nv12|vaapi,hwupload'`. This variant has the same performance as the hardware variant, but I’m not sure how portable it is
sudo docker run --rm \
--device /dev/dri:/dev/dri \
debian:stretch-slim \
/bin/sh -c "
apt-get update
apt-get install --assume-yes ffmpeg
ffmpeg \
-hwaccel vaapi -hwaccel_output_format vaapi \
-vaapi_device /dev/dri/renderD128 \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'format=nv12|vaapi,hwupload' \
-c:v h264_vaapi \
/tmp/example.mp4
"
You may get this warning: Hardware accelerated decoding with frame threading is known to be unstable and its use is discouraged. The way I read it, specifying `-threads 1` should fix it, but it does not. This transcode was stable enough for my purposes, so I just ignored it.
Synology Configs & Docker
Normally I’d do something like this:
docker run --device /dev/dri:/dev/dri jrottenberg/ffmpeg:vaapi ...
Synology’s OS, DSM 6.x, uses its own configuration format for Docker and does not easily allow one to override the command-line parameters of `docker run`. I have not found a documented way to configure it, but if you configure a Docker container via the web UI and “export” the config into a file, you can add this to the plain JSON to configure the device mount:
"devices" : [
{
"CgroupPermissions": "rwm",
"PathInContainer": "\/dev/dri",
"PathOnHost": "\/dev\/dri"
}
],
Note: If you run your Dockerized app as a non-privileged user, don’t forget to give it access to your devices:
chmod 777 /dev/dri/renderD128
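Whether the permission change is needed at all can be checked from inside the container, as the user the app runs under. A minimal sketch, with `can_use_device` as a hypothetical helper name:

```shell
#!/bin/sh
# Succeeds if the current user can both read and write the given device node.
can_use_device() {
  [ -r "$1" ] && [ -w "$1" ]
}
```

For example, `can_use_device /dev/dri/renderD128 || echo "no access to render node"`. The check only covers read and write access, so a narrower mode than 777 may well be enough, although that is untested on DSM.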
There is no simple way of calling a Dockerized ffmpeg from another Docker image, but if you use a Debian-based Docker image, chances are it would be as easy as:
apt-get install ffmpeg
This may pull in an up-to-date version of ffmpeg with all the right bindings and devices.
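Since the installed build may or may not include VAAPI, it is worth confirming that the encoder is actually listed. The sketch below filters the output of `ffmpeg -encoders` read from stdin; `has_vaapi_encoder` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if `ffmpeg -encoders` style output on stdin lists the h264_vaapi encoder.
has_vaapi_encoder() {
  grep -Eq '[[:space:]]h264_vaapi([[:space:]]|$)'
}
```

A possible call is `ffmpeg -hide_banner -encoders 2>/dev/null | has_vaapi_encoder || echo "h264_vaapi is missing from this build"`.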
Transcoding Performance Results on DS718+
Big Buck Bunny: scaling from 1080p into 720p
| | fps | CPU% | fps/CPU core |
|---|---|---|---|
| Software: | 30 | 380% | 8 |
| Mixed1: | 40 | 70% | 60 |
| Mixed2:† | 60 | 70% | 85 |
| Hardware: | 110 | 85% | 130 |
| Improvement: | 3x | 5x | 15x |
† For some reason hardware-only surface formats are not supported on Debian, so one needs to copy data between the decoder and the encoder through main memory. This is not Debian-specific, but it only affected my Debian-based Docker images. I may be mistaken, and it could be that either the encoder or the decoder runs in software.
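The fps/CPU core column appears to be the frame rate divided by the number of cores in use, that is, fps / (CPU% / 100). This reading is an assumption based on the numbers in the table, and `fps_per_core` below is a hypothetical helper name.

```shell
#!/bin/sh
# Frames per second per fully used CPU core: fps / (cpu_percent / 100).
# Usage: fps_per_core <fps> <cpu_percent>
fps_per_core() {
  awk -v fps="$1" -v cpu="$2" 'BEGIN { printf "%.1f\n", fps / (cpu / 100) }'
}

fps_per_core 30 380   # software row, prints 7.9
fps_per_core 110 85   # hardware row, prints 129.4
```

These values land close to the rounded figures in the table (8 and 130).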
Software transcoding example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
jrottenberg/ffmpeg:vaapi \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'scale=1280:720' \
/tmp/example.mp4
Mixed1: Transcoding that uses the hardware decoder and encoder but copies data through main memory between them. This is what you get by default on Debian Stretch. Example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
debian:stretch-slim \
/bin/sh -c "
apt-get update
apt-get install --assume-yes ffmpeg
ffmpeg \
-hwaccel vaapi \
-vaapi_device /dev/dri/renderD128 \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'format=nv12,hwupload,scale_vaapi=w=1280:h=720' \
-c:v h264_vaapi \
/tmp/example.mp4
"
Mixed2: Just like Mixed1, but with one additional hack: `nv12|vaapi`. Example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
debian:stretch-slim \
/bin/sh -c "
apt-get update
apt-get install --assume-yes ffmpeg
ffmpeg \
-hwaccel vaapi -hwaccel_output_format vaapi \
-vaapi_device /dev/dri/renderD128 \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'format=nv12|vaapi,hwupload,scale_vaapi=w=1280:h=720' \
-c:v h264_vaapi \
/tmp/example.mp4
"
Hardware-accelerated transcoding example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
jrottenberg/ffmpeg:vaapi \
-hwaccel vaapi -hwaccel_output_format vaapi \
-i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf 'scale_vaapi=w=1280:h=720' \
-c:v h264_vaapi \
/tmp/example.mp4
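One way to get an fps figure out of any of the commands above is to take the last `fps=` value from ffmpeg's progress line, which it writes to stderr and rewrites in place using carriage returns. This is a sketch, not necessarily how the numbers in the table were collected, and `last_fps` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the last fps value found in ffmpeg progress output read from stdin.
last_fps() {
  # Progress updates are separated by carriage returns, so split them into lines first.
  tr '\r' '\n' | sed -n 's/.*fps= *\([0-9.][0-9.]*\).*/\1/p' | tail -n 1
}
```

Appending `2>&1 | last_fps` to one of the `docker run` commands would print only that final number once the transcode ends.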
UniFi Video: transcoding into Apple HomeKit via HomeBridge
| | fps | CPU% | fps/CPU core |
|---|---|---|---|
| Software: | 15 | 300% | 5 |
| Hardware: | 30† | 20% | 150 |
| Improvement: | 2x | 15x | 30x |
† The original real-time stream is 30fps and transcoding is done in real time, so it cannot go any faster than 30fps.
Software transcoding example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
jrottenberg/ffmpeg:vaapi \
-rtsp_transport http -re -i "$UNIFI_VIDEO_CAMERA_RSTP_URL?apiKey=$UNIFI_VIDEO_USER_API_KEY" \
-threads 0 -vcodec libx264 -an -pix_fmt yuv420p -r 30 -f rawvideo -tune zerolatency \
-vf 'scale=1920:1080' \
/tmp/example.mp4
Hardware-accelerated transcoding example command:
sudo docker run --rm \
--device /dev/dri:/dev/dri \
jrottenberg/ffmpeg:vaapi \
-hwaccel vaapi -hwaccel_output_format vaapi \
-rtsp_transport http -re -i "$UNIFI_VIDEO_CAMERA_RSTP_URL?apiKey=$UNIFI_VIDEO_USER_API_KEY" \
-threads 0 -vcodec libx264 -an -pix_fmt yuv420p -r 30 -f rawvideo -tune zerolatency \
-vf 'scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi \
/tmp/example.mp4