Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#12046 closed Patch - Bug Fix (Fixed)

Fix random SBE PlaybackSock timeout in MBE

Reported by: Cédric Schieli <cschieli@…> Owned by: JYA
Priority: major Milestone: unknown
Component: MythTV - General Version: 0.27-fixes
Severity: medium Keywords: SBE PlaybackSock timeout
Cc: Stuart Auchterlonie Ticket locked: no

Description

SBE PlaybackSocks in the master suffer from random disconnections, occurring after a 7000 ms timeout. It often happens when a frontend asks for the thumbnail of a program currently being recorded on a slave backend.

I have identified two problems causing this:

First, there is a race between MainServer::ProcessRequestWork and PlaybackSock::SendReceiveStringList. Even if callbacks are disabled during SendReceiveStringList execution, a ProcessRequestWork may already be running and can swallow the reply, leading to the timeout in SendReceiveStringList.

The second problem is that an invocation of ProcessRequestWork is fired for each block of data arriving on the socket (for example when a reply is long enough to be fragmented, e.g. GENERATED_PIXMAP), but this data is consumed all at once by a single worker, leaving the other workers with nothing to read. This also leads to the timeout in ReadStringList.

This patch fixes the first problem by ensuring that no worker reads from the socket while a SendReceiveStringList is running, and the second one by aborting a worker if there is no more data to read, but only once the lock has been acquired.
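A minimal sketch of the locking scheme described above (hypothetical names and stubbed I/O, not the actual patch or MythTV's classes): both the request workers and the synchronous request/reply path take the same per-socket read lock, and a worker that wins the lock but finds no data left simply returns.

    // Hedged sketch only; Socket, ProcessRequestWork and SendReceiveStringList
    // here are illustrative stand-ins, not MythTV's real API.
    #include <deque>
    #include <mutex>
    #include <string>
    #include <utility>
    #include <vector>

    struct Socket
    {
        std::mutex readLock;                           // serialises all reads on this socket
        std::deque<std::vector<std::string>> inbox;    // stands in for buffered network data

        bool dataAvailable() const { return !inbox.empty(); }

        std::vector<std::string> readStringList()
        {
            if (inbox.empty())
                return {};                             // nothing buffered (a real socket would wait)
            auto msg = std::move(inbox.front());
            inbox.pop_front();
            return msg;
        }

        void writeStringList(const std::vector<std::string>&) { /* send to peer */ }
    };

    // Worker fired for each "data ready" notification on the socket.
    void ProcessRequestWork(Socket& sock)
    {
        std::lock_guard<std::mutex> lock(sock.readLock);
        if (!sock.dataAvailable())
            return;   // reply already consumed by SendReceiveStringList or another worker
        auto request = sock.readStringList();
        // ... dispatch "request" here ...
    }

    // Synchronous request/reply used by the master to talk to a slave backend.
    std::vector<std::string> SendReceiveStringList(Socket& sock,
                                                   const std::vector<std::string>& msg)
    {
        std::lock_guard<std::mutex> lock(sock.readLock);  // no worker can steal the reply
        sock.writeStringList(msg);
        return sock.readStringList();
    }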

Attachments (1)

0001-Fix-random-SBE-PlaybackSock-timeout-in-MBE.patch (4.4 KB) - added by Cédric Schieli <cschieli@…> 10 years ago.
Fix random SBE PlaybackSock timeout in MBE


Change History (11)

Changed 10 years ago by Cédric Schieli <cschieli@…>

Fix random SBE PlaybackSock timeout in MBE

comment:1 Changed 10 years ago by Cédric Schieli <cschieli@…>

comment:2 Changed 10 years ago by stuartm

Owner: set to stuartm
Status: new → accepted

comment:3 Changed 10 years ago by Stuart Auchterlonie

Cc: Stuart Auchterlonie added

comment:4 Changed 10 years ago by JYA

it looks to me like the reference count on sock is increased too many times:

    sock->IncrRef(); 
    ReferenceLocker rlocker(sock); 

this will increase the sock's reference counter twice, but it only gets decremented by one once the function exits.

ultimately, the socket will never be destroyed and will leak
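
For context, a minimal sketch of the RAII pattern under discussion, using a hypothetical RefGuard rather than MythTV's actual ReferenceLocker (whose behaviour, per comment:9 below, was misread here): a guard that only decrements on destruction balances one explicit IncrRef(); a guard that also incremented in its constructor would leave the count one too high on exit, which is the leak suspected above.

    // Hypothetical sketch, not MythTV code: a scope guard that releases
    // exactly one reference when it goes out of scope. Pairing it with one
    // explicit IncrRef() keeps the count balanced; if the guard *also*
    // incremented in its constructor, the count would end up one too high.
    class RefCounted
    {
      public:
        void IncrRef() { ++m_refs; }
        void DecrRef() { if (--m_refs == 0) delete this; }

      private:
        ~RefCounted() = default;   // destroyed only via DecrRef()
        int m_refs { 1 };
    };

    class RefGuard                 // hypothetical stand-in for ReferenceLocker
    {
      public:
        explicit RefGuard(RefCounted* obj) : m_obj(obj) {}
        ~RefGuard() { m_obj->DecrRef(); }

      private:
        RefCounted* m_obj;
    };

    void Worker(RefCounted* sock)
    {
        sock->IncrRef();           // take a reference for the duration of this call
        RefGuard guard(sock);      // released exactly once when the function exits
        // ... use sock safely ...
    }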

comment:5 Changed 10 years ago by JYA

Owner: changed from stuartm to JYA
Status: accepted → assigned

comment:6 Changed 10 years ago by JYA

Not a consequence of this patch but a still existing problem.

But it seems to me that while the socket now won't be closed unnecessarily, a process request whose data has been consumed by another worker will find its request dismissed and ultimately ignored.

there should be a way to queue the request, or have ReadStringList not consume more data than it should (re-injecting discarded data into the socket)

comment:7 Changed 10 years ago by JYA

Actually, forget that last comment... Looking at MythSocket::ReadStringList, it only consumes its own data, not everything that is in the buffer.

so the comment "data is consumed all at once by one worker" is not actually correct. However, there could indeed be a race there...

comment:8 Changed 10 years ago by JYA

Resolution: Fixed
Status: assigned → closed

Fixed: commit c9395d7c96c06cc6f508b5cbbf87979ea2c5de0b
Author: Bradley Baetz <bbaetz@…>
Date: Thu May 1 09:43:24 2014 +1000

Fix random SBE PlaybackSock timeout in MBE.

comment:9 Changed 10 years ago by JYA

Apologies, I misread how the ReferenceLocker class worked, which introduced a massive regression.

When I re-applied the patch, I forgot to put back the original author...

comment:10 Changed 10 years ago by Cédric Schieli <cschieli@…>

Thanks a lot Jean-Yves for applying this patch.

I still think that all the data of one (fragmented) reply is read all at once by a single MythSocket::ReadStringList invocation. It loops until the announced size of the reply (i.e. the total size of all the fragments) has arrived in the buffer and then consumes it all. The other workers fired on each subsequent TCP data packet arrival for the same fragmented reply will never see that data.
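
A rough sketch of that read pattern (hypothetical names and a stubbed buffer, not MythSocket's actual implementation): the first caller reads the announced length and then loops, draining fragments until the whole payload has arrived, so workers woken by later fragments of the same reply find nothing left to read.

    // Hedged sketch only: Conn and ReadStringListSketch are illustrative
    // stand-ins, not MythTV's real API. The point is that one caller drains
    // every fragment of a length-prefixed reply.
    #include <algorithm>
    #include <cstddef>
    #include <string>

    struct Conn
    {
        std::string buffer;   // bytes already received (stands in for the socket buffer)

        size_t bytesAvailable() const { return buffer.size(); }

        std::string read(size_t n)
        {
            n = std::min(n, buffer.size());
            std::string out = buffer.substr(0, n);
            buffer.erase(0, n);
            return out;
        }

        bool waitForMoreData(int /*timeoutMs*/) { return false; }  // no more fragments in this stub
    };

    bool ReadStringListSketch(Conn& conn, std::string& payload, int timeoutMs)
    {
        // The protocol announces the total reply size up front
        // (an 8-byte decimal header here, purely illustrative).
        std::string header = conn.read(8);
        if (header.size() < 8)
            return false;                          // header not yet complete (simplified handling)
        size_t expected = std::stoul(header);

        payload.clear();
        while (payload.size() < expected)
        {
            if (conn.bytesAvailable() == 0 && !conn.waitForMoreData(timeoutMs))
                return false;                      // timed out waiting for the next fragment
            payload += conn.read(expected - payload.size());
        }
        return true;                               // the whole reply was consumed by this caller
    }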
