Opened 8 years ago

Closed 6 years ago

Last modified 6 years ago

#10922 closed Patch - Feature (fixed)

[PATCH] OpenGL: Speed up transfer of packed images to texture buffer

Reported by: Lawrence Rust <lvr@…>
Owned by: stuartm
Priority: minor
Milestone: 0.28
Component: MythTV - Video Playback
Version: Master Head
Severity: medium
Keywords: OpenGL texture optimize
Cc:
Ticket locked: no


I have observed that OpenGL TV playback can consume large amounts of CPU time, making it impractical on some systems.

Profiling playback shows that, for interlaced content, the main overhead is the function pack_yv12interlaced, which takes around 23 ms using MMX instructions on an Intel i5 661 @ 3.33 GHz with onboard graphics and Linux 3.4 KMS. With other processing overheads, the VideoLoop exceeds 40 ms, causing frames to be dropped.

Further profiling shows that most of that time is spent writing to the texture buffer, which is situated in graphics memory. This patch coalesces all writes to the texture buffer into one burst and reduces the time taken by pack_yv12interlaced to 12 ms using MMX and 15 ms without. With MMX that is roughly half of the original 23 ms (about a 1.9x speedup).

Using this patch, playback of interlaced content is smooth and no frames are dropped.
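
To illustrate the idea of coalescing writes, here is a minimal sketch. It is not the code from the attached patch. It assumes a simple UYVY output layout and 4:2:0 planar input, and the names pack_row_uyvy and pack_image_coalesced are hypothetical. Each row is packed into a staging buffer in ordinary RAM and then copied to the destination with one memcpy, so the destination (which in a real player might be a mapped pixel buffer object) sees one sequential burst per row instead of many small scattered stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not MythTV's pack_yv12interlaced): packs one row of
// planar samples into UYVY order. u and v each hold width/2 samples.
static void pack_row_uyvy(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                          uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; x += 2)
    {
        *out++ = u[x / 2];
        *out++ = y[x];
        *out++ = v[x / 2];
        *out++ = y[x + 1];
    }
}

// Packs a whole 4:2:0 image into dst (width * 2 bytes per row).
// All packing happens in a cacheable staging buffer; dst is only touched
// by one sequential memcpy per row.
void pack_image_coalesced(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                          uint8_t *dst, size_t width, size_t height)
{
    assert(width % 2 == 0);
    std::vector<uint8_t> staging(width * 2);

    for (size_t row = 0; row < height; ++row)
    {
        // With 4:2:0 subsampling, two luma rows share one chroma row.
        const uint8_t *urow = u + (row / 2) * (width / 2);
        const uint8_t *vrow = v + (row / 2) * (width / 2);

        pack_row_uyvy(y + row * width, urow, vrow, staging.data(), width);

        // Single burst write to the (possibly slow) destination memory.
        std::memcpy(dst + row * width * 2, staging.data(), staging.size());
    }
}
```

Whether a per-row burst or a larger one performs best depends on the hardware and driver, so any such change should be confirmed by profiling.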

Attachments (1)

0001-OpenGL-Optimize-writing-packed-images-to-pixel-buffe.patch (15.9 KB) - added by Lawrence Rust <lvr@…> 8 years ago.

Change History (5)

Changed 8 years ago by Lawrence Rust <lvr@…>

comment:1 Changed 8 years ago by beirdo

Owner: set to beirdo
Status: new → assigned

comment:2 Changed 7 years ago by stuartm

Milestone: unknown → 0.28
Owner: changed from beirdo to stuartm
Status: assigned → accepted

comment:3 Changed 6 years ago by Lawrence Rust <lvr@…>

Resolution: fixed
Status: accepted → closed

In 10d5820665ca01ed04e520651bdad4eae80da34f/mythtv:

OpenGL: Optimize writing packed images to pixel buffer objects (PBO)

pack_yv12interlaced() packs pixels and writes them to an OpenGL
pixel buffer object. This buffer may reside on the graphics card
so writes may be slower than to RAM. Combining writes to the buffer
can improve performance by 200-300%.

This patch also provides optimized non-mmx code and also tightens
const correctness.

Fixes #10922

Signed-off-by: Lawrence Rust <lvr@…>
Signed-off-by: Jean-Yves Avenard <jyavenard@…>

comment:4 Changed 6 years ago by JYA

I should add that I believe this could be optimised (mainly for DXVA and VAAPI) using the same USWC methods implemented in the MythUSWCCopy class.
