Niceties of FFmpeg


  • Memory Allocations

  • Hardware Acceleration

  • Display frame with SDL 2.x

  • Filters

Memory Allocations

There is no need to allocate data buffers for the destination frame before each call to avcodec_receive_frame(): the decoder allocates them itself. The same applies to the destination frame of sws_scale(): allocate a buffer of the appropriate size once and reuse it across frames (without freeing it in between).

AVFrame *pSrcFrame = av_frame_alloc(),
        *pDestFrame = av_frame_alloc();
// Query the buffer size required for the destination format (32-byte aligned)
int num_bytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24,
                                        width, height, 32);
// The actual one-time allocation of the destination data
uint8_t* frame_buffer = (uint8_t*)av_malloc(num_bytes);
// Point pDestFrame->data/linesize at the pre-allocated buffer;
// av_image_fill_arrays() itself does not allocate anything
av_image_fill_arrays(pDestFrame->data, pDestFrame->linesize, frame_buffer,
                    AV_PIX_FMT_RGB24,
                    width, height, 32);

while( true ) {
    // ...
    response = avcodec_receive_frame(pCodecContext, pSrcFrame);
    // response must be checked here: AVERROR(EAGAIN) means more packets
    // are needed, AVERROR_EOF means the decoder is fully flushed
    // Carry the metadata (pts, colorspace, ...) over to the destination
    response = av_frame_copy_props(pDestFrame, pSrcFrame);
    // ...
    // Convert into the same pre-allocated destination buffer on every iteration
    response = sws_scale(pSwsContext,
            (const uint8_t* const*)(pSrcFrame->data),
            pSrcFrame->linesize,
            0, height,
            pDestFrame->data, pDestFrame->linesize);
}
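
Free the buffer and both frames exactly once, after the loop has finished; a minimal teardown sketch using the names above:

// Teardown: the frames and the shared buffer are released only here
av_free(frame_buffer);
av_frame_free(&pSrcFrame);
av_frame_free(&pDestFrame);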

Hardware Acceleration

Hardware acceleration is used only if the hw_device_ctx field of AVCodecContext is set to a valid device reference. For example, for DirectX Video Acceleration (DXVA2):

const std::string hw_device_name = "dxva2";
AVHWDeviceType device_type = av_hwdevice_find_type_by_name(hw_device_name.c_str());
// AV_HWDEVICE_TYPE_NONE here means this device type is not available in the build

AVBufferRef* hw_device_ctx = NULL;
response = av_hwdevice_ctx_create(&hw_device_ctx, device_type,
                                NULL, NULL, 0);
if (response < 0)
    av_log(NULL, AV_LOG_WARNING, "%s", av_err2str(response));
else
    pVideoCodecContext->hw_device_ctx = av_buffer_ref(hw_device_ctx);
// Drop the local reference; the codec context keeps its own
av_buffer_unref(&hw_device_ctx);
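
Note that for many decoders you also need to tell FFmpeg to actually pick the hardware pixel format, via the get_format callback of AVCodecContext. A sketch following FFmpeg's own hw_decode.c example (hw_pix_fmt is hard-coded to the DXVA2 format here; in general it is obtained from avcodec_get_hw_config()):

// The hardware pixel format for this device type; AV_PIX_FMT_DXVA2_VLD
// corresponds to "dxva2" (normally discovered via avcodec_get_hw_config())
static enum AVPixelFormat hw_pix_fmt = AV_PIX_FMT_DXVA2_VLD;

static enum AVPixelFormat get_hw_format(AVCodecContext* ctx,
                                        const enum AVPixelFormat* pix_fmts)
{
    // Pick the HW format if the decoder offers it...
    for (const enum AVPixelFormat* p = pix_fmts; *p != AV_PIX_FMT_NONE; p++)
        if (*p == hw_pix_fmt)
            return *p;
    // ...otherwise report failure
    return AV_PIX_FMT_NONE;
}

// Set before avcodec_open2():
// pVideoCodecContext->get_format = get_hw_format;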

After that, decoding is performed on the acceleration device and the decoded frame's data is allocated on a HW surface (GPU memory). In particular, this means the data pointers are not valid CPU addresses and must not be dereferenced directly.

The only way to get such data is to copy it back to system memory with av_hwframe_transfer_data():

// With hw_device_ctx set as above, decoding is performed on the HW device
response = avcodec_receive_frame(pCodecContext, pFrame);
if( pFrame->hw_frames_ctx ) { // set only for frames on a HW surface
                              // (alternatively, test pFrame->format)
    AVFrame* sw_frame = av_frame_alloc();
    /* retrieve data from GPU to CPU */
    response = av_hwframe_transfer_data(sw_frame, /* destination */
                                        pFrame,   /* source */
                                        0);
}

With DXVA2, the data format of the frame after this copy should be AV_PIX_FMT_NV12 (planar YUV 4:2:0 with one plane for Y and one plane for interleaved UV: a U byte followed by a V byte).
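
Most display paths expect packed RGB, so the transferred frame typically goes through sws_scale() next. A minimal sketch, assuming sw_frame from above and a destination frame pre-allocated for BGR24 as in the first section:

// Convert the NV12 frame (now in system memory) to packed BGR24.
// The context can be created once and reused for every frame.
struct SwsContext* sws_ctx = sws_getContext(
        sw_frame->width, sw_frame->height, AV_PIX_FMT_NV12,
        sw_frame->width, sw_frame->height, AV_PIX_FMT_BGR24,
        SWS_BILINEAR, NULL, NULL, NULL);
sws_scale(sws_ctx,
          (const uint8_t* const*)sw_frame->data, sw_frame->linesize,
          0, sw_frame->height,
          pDestFrame->data, pDestFrame->linesize);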

Display AVFrame

It is worth reminding that UI updates can only be performed from the main thread. So assume you have a background thread that decodes frames and a main thread that displays them, with a shared, mutex-protected queue between the two.

We will show how to display the popped frame with SDL 2.x. (Note also an excellent SDL tutorial here.)

std::unique_lock<std::mutex> locker(m_mtx);
while (m_framesQueue.empty()) {
    m_cv.wait(locker);
}

AVFrame* pFrameBGR = m_framesQueue.front();
m_framesQueue.pop();
// Unlock early so the decoder thread is not blocked while we render
locker.unlock();

// ... display the frame here (extended below)

av_frame_free(&pFrameBGR);

This excerpt from the queue loop runs on the main thread; for simplicity, let's assume the received frame is in BGR24 format.
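
For completeness, the producer side on the decoder thread can be as simple as the following sketch (assuming the same m_mtx, m_cv and m_framesQueue members):

// Decoder thread: hand the frame over to the queue and wake the main thread
{
    std::lock_guard<std::mutex> locker(m_mtx);
    m_framesQueue.push(pFrameBGR);
}
m_cv.notify_one();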

To display it on the screen, first prepare the SDL Texture:

    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return SDL_GetError();

    // Size 0x0 is fine here: SDL_WINDOW_FULLSCREEN_DESKTOP takes the whole screen
    m_window = SDL_CreateWindow(window_name, SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
            0, 0,
            SDL_WINDOW_FULLSCREEN_DESKTOP | SDL_WINDOW_SHOWN);
    if (m_window == NULL)
        return SDL_GetError();

    // Create renderer for the window
    m_renderer = SDL_CreateRenderer(m_window, -1, SDL_RENDERER_ACCELERATED);
    if (m_renderer == NULL)
        return SDL_GetError();

    // Match the texture size to the desktop resolution
    SDL_DisplayMode DM;
    SDL_GetDesktopDisplayMode(0, &DM);
    auto width = DM.w;
    auto height = DM.h;

    m_texture = SDL_CreateTexture(m_renderer,
                                  SDL_PIXELFORMAT_BGR24,
                                  SDL_TEXTUREACCESS_STREAMING,
                                  width, height);
    if (m_texture == NULL)
        return SDL_GetError();

Now extend the loop:

    SDL_assert(pFrameBGR->format == AV_PIX_FMT_BGR24);

    // BGR24 is a packed format: one plane, so data[0]/linesize[0] suffice
    int response = SDL_UpdateTexture(m_texture, NULL,
                    pFrameBGR->data[0],
                    pFrameBGR->linesize[0]);
    if (response < 0)
        return response;

    response = SDL_RenderClear(m_renderer);
    if (response < 0)
        return response;

    response = SDL_RenderCopy(m_renderer, m_texture, NULL, NULL);
    if (response < 0)
        return response;

    SDL_RenderPresent(m_renderer);
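
One caveat: on most platforms the window stays responsive only if events are pumped on the same main thread, so the loop should poll them as well. A minimal sketch (m_quit is a hypothetical flag checked by the queue loop):

    // Drain pending window events; request loop exit on SDL_QUIT
    SDL_Event event;
    while (SDL_PollEvent(&event)) {
        if (event.type == SDL_QUIT)
            m_quit = true;
    }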

Filters

The basic filter syntax:

-vf "filter1=
        setting1=value1:
        setting2=value2,
    filter2=
        setting1=value1:
        setting2=value2
"

Setting names can be omitted. If they are, the values are assigned positionally, in the order the settings are listed in the documentation.
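
For example, the overlay filter documents x and y as its first two settings, so the following two forms are equivalent:

-vf "overlay=x=10:y=10"
-vf "overlay=10:10"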

Some examples:

ffmpeg -i pulp.mp4 -i google.png -filter_complex "[0:v][1:v]overlay=10:10" output.mp4

[0:v] - the video stream of the first input (pulp.mp4)

[1:v] - the video stream of the second input (google.png)

ffmpeg -i pulp.mp4 -vf thumbnail=n=30 output.mp4

It is a "thumbnail" filter that in this case select the most representative frame (thumbnail) from each 100 frames.

If you want multiple representative frames rather than a single one, write the filter output to individual image files instead of a video.
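
A sketch of such a command (the thumbnail filter emits one frame per batch, and -vsync vfr keeps each of them as a separate image):

ffmpeg -i pulp.mp4 -vf thumbnail=n=30 -vsync vfr thumb_%03d.png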