Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#5890 closed patch (fixed)

libmythsoundtouch: SIMD optimisation patch

Reported by: foobum@… Owned by: Janne Grunau
Priority: minor Milestone: 0.22
Component: mythtv Version: head
Severity: medium Keywords: soundtouch SIMD
Cc: mark@…, Rob Smith Ticket locked: no

Description

The MMX optimised routines in libmythsoundtouch appear to be disabled since ALLOW_MMX is never defined. Also, the MMX routines were explicitly disabled for x86_64 since they give garbled output on that arch. This patch:

  • Defines ALLOW_MMX in soundtouch if (arch == x86 && defined(MMX))
  • Adds integer SSE2 routines for x86_64

This results in a huge speed increase - enough to make 1.2x possible with 1080i mpeg2 and 6ch AC-3 audio on a 2.2Ghz amd64 with cycles to spare.

Tested on x86_64 only. Patch applies against fixes and trunk.

Attachments (5)

soundtouch.patch (12.5 KB) - added by foobum@… 10 years ago.
soundtouch-update.patch (11.3 KB) - added by foobum@… 10 years ago.
soundtouch-update-1.patch (13.1 KB) - added by foobum@… 10 years ago.
soundtouch-update-2.patch (14.5 KB) - added by foobum@… 10 years ago.
st-fix.patch (989 bytes) - added by foobum@… 10 years ago.

Download all attachments as: .zip

Change History (26)

Changed 10 years ago by foobum@…

Attachment: soundtouch.patch added

comment:1 Changed 10 years ago by jppoet@…

This patch does greatly improve timestretch performance under 64bit. Thank you for that!

I did notice, however, that it causes DD5.1 to fall back to DD2.0.

comment:2 in reply to:  1 ; Changed 10 years ago by foobum@…

Replying to jppoet@gmail.com:

I did notice, however, that it causes DD5.1 to fall back to DD2.0.

The patch doesn't cause this, timestretching 6 ch AC-3 has always given you 2 (reencoded) channels.

However, this patch will be followed by another that addresses this issue amongst a bunch of other things. Just tidying it up.

comment:3 in reply to:  2 ; Changed 10 years ago by jppoet@…

Replying to foobum@gmail.com:

Replying to jppoet@gmail.com:

I did notice, however, that it causes DD5.1 to fall back to DD2.0.

The patch doesn't cause this, timestretching 6 ch AC-3 has always given you 2 (reencoded) channels.

Actually, Mark Spieth added DD5.1 support 9 months ago in ticket #1104

However, this patch will be followed by another that addresses this issue amongst a bunch of other things. Just tidying it up.

Cool Thanks for your efforts.

comment:4 in reply to:  3 Changed 10 years ago by foobum@…

Replying to jppoet@gmail.com:

Actually, Mark Spieth added DD5.1 support 9 months ago in ticket #1104

Yep, but it downmixed DD5.1 to DD2.0 once timestretch was on. This is fixed in the next patch. Glad it's been useful to you.

comment:5 Changed 10 years ago by Rob Smith

Cc: Rob Smith added

comment:6 Changed 10 years ago by Janne Grunau

Milestone: unknown0.22
Owner: changed from Isaac Richards to Janne Grunau
Status: newassigned

how is the SSE asm related to sse_optimized.cpp in upstream soundtouch?

comment:7 in reply to:  6 Changed 10 years ago by foobum@…

Replying to janne:

how is the SSE asm related to sse_optimized.cpp in upstream soundtouch?

It's not.

comment:8 Changed 10 years ago by foobum@…

Think I missed the gist of your question. sse_optimized.cpp implements filters and calcCrossCorrStereo for float samples. The asm in the patch implements overlap and calcCrossCorr for stereo and multichannel and 16 bit int samples. The existing MMX filter code works fine so we use that for both x86 and x86_64.

comment:9 Changed 10 years ago by Janne Grunau

yes, I've seen that after looking at the code. Do you care enough to submit your asm to upstream soundtouch?

ALLOW_MMX is defined in STTypes.h, no need to define it in the project file. I also don't like the ARCH_X86_64 ifdefs. I've changed the preference of mmx and sse code in TDStretch.cpp

comment:10 Changed 10 years ago by Janne Grunau

Resolution: fixed
Status: assignedclosed

(In [19024]) Add SSE optimized routines for soundtouch. Fixes #5890

Speedup for X86_64 since MMX SIMD routines unfortunately don't work there.

patch by: <foobum gmail com>

comment:11 in reply to:  9 Changed 10 years ago by foobum@…

Replying to janne:

yes, I've seen that after looking at the code. Do you care enough to submit your asm to upstream soundtouch?

Perhaps, although I haven't looked at whether their MMX suffers the same problem.

ALLOW_MMX is defined in STTypes.h, no need to define it in the project file. I also don't like the ARCH_X86_64 ifdefs. I've changed the preference of mmx and sse code in TDStretch.cpp

Ah.. Looks fine. Ta.

Changed 10 years ago by foobum@…

Attachment: soundtouch-update.patch added

comment:12 Changed 10 years ago by foobum@…

An incremental patch that improves the asm, makes it x86 compatible and turns it on for x86 (where it should be noticeably faster than the mmx, unfortunately I can't test).

comment:13 Changed 10 years ago by Janne Grunau

Resolution: fixed
Status: closednew

comment:14 Changed 10 years ago by Janne Grunau

Status: newassigned

Which SSE version is do you use that?

iirc soundtouch has no runtime CPU detection. you just can't assume that a MMX capable cpu has also SSE?

comment:15 in reply to:  14 Changed 10 years ago by foobum@…

Replying to janne:

SSE2 and no, I guess we shouldn't assume that.. I'll add runtime cpu detection and update patch tomorrow.

Changed 10 years ago by foobum@…

Attachment: soundtouch-update-1.patch added

comment:16 Changed 10 years ago by foobum@…

Sorry, soundtouch does have runtime cpu detection. I have, however, renamed the define and the class SSE -> SSE2 and changed the rt cpu check to check explicitly for SSE2.

Changed 10 years ago by foobum@…

Attachment: soundtouch-update-2.patch added

comment:17 Changed 10 years ago by foobum@…

Latest (-2) makes the asm more 'correct' - the SSE2 now produces output exactly equal to that of the C (unlike the MMX). Now works properly again when compiled with -O3 too (oops).

comment:18 Changed 10 years ago by Janne Grunau

Status: assignedaccepted

comment:19 Changed 10 years ago by Janne Grunau

Resolution: fixed
Status: acceptedclosed

(In [19283]) soundtouch SSE update. Fixes #5890

bit exact to C code, works for X86_32 patch by: foobum gmail com

Changed 10 years ago by foobum@…

Attachment: st-fix.patch added

comment:20 Changed 10 years ago by foobum@…

Sorry! Last diff, honest.. Missed a couple of lines in OverlapMulti(). This makes OverlapMulti()'s output bit exact too.

comment:21 Changed 10 years ago by Janne Grunau

(In [19291]) make another soundtouch SSE function bit exact. Refs #5890

patch by: foobum gmail com

Note: See TracTickets for help on using tickets.