Opened 11 years ago
Closed 11 years ago
Last modified 11 years ago
#12046 closed Patch - Bug Fix (Fixed)
Fix random SBE PlaybackSock timeout in MBE
Reported by: | | Owned by: | JYA |
---|---|---|---|
Priority: | major | Milestone: | unknown |
Component: | MythTV - General | Version: | 0.27-fixes |
Severity: | medium | Keywords: | SBE PlaybackSock timeout |
Cc: | Stuart Auchterlonie | Ticket locked: | no |
Description
SBE (slave backend) PlaybackSocks in the master backend suffer from random disconnections, which occur after a 7000 ms timeout. This often happens when a frontend asks for the thumbnail of a program that is currently being recorded on a slave backend.
I could identify two problems causing this:
First, there is a race between MainServer::ProcessRequestWork and PlaybackSock::SendReceiveStringList. Even though callbacks are disabled while SendReceiveStringList executes, a ProcessRequestWork may already be running and can swallow the reply, which makes SendReceiveStringList time out.
The second problem is that a ProcessRequestWork invocation is fired for each block of data arriving on the socket (for example, when a reply is long enough to be fragmented, such as a GENERATED_PIXMAP reply), but all of that data is consumed at once by a single worker, leaving nothing for the other workers. This also leads to a timeout, in ReadStringList.
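A minimal single-threaded sketch of the second problem, under stated assumptions: it assumes MythTV's framing of an 8-character, space-padded decimal size followed by the payload, and `FakeSocketBuffer` and `StarvedWorkers` are hypothetical names, not MythTV code. One reply arrives in several fragments, each fragment queues a worker, and only the first worker finds anything to read.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical model, not MythTV code: a receive buffer holding framed
// replies of the form "<8-char size><payload>".
struct FakeSocketBuffer
{
    std::string data;

    // Mimics a reader that waits for the whole announced reply and then
    // consumes all of it. Returns false when the reply is not (or no
    // longer) buffered; in the real code the worker would keep waiting
    // until the 7000 ms timeout instead.
    bool ReadStringList(std::string &payload)
    {
        if (data.size() < 8)
            return false;
        std::size_t size = std::strtoul(data.substr(0, 8).c_str(), nullptr, 10);
        if (data.size() < 8 + size)
            return false;
        payload = data.substr(8, size);
        data.erase(0, 8 + size);  // the whole reply is taken in one go
        return true;
    }
};

// Simulates one reply split into TCP fragments. Each arrival queues one
// ProcessRequestWork-style worker. For simplicity the workers run after
// all fragments are in; a worker started earlier would wait for the rest
// with the same outcome. Returns the number of workers that found no data.
int StarvedWorkers(const std::vector<std::string> &fragments)
{
    FakeSocketBuffer sock;
    int pendingWorkers = 0;
    for (const std::string &frag : fragments)
    {
        sock.data += frag;
        ++pendingWorkers;  // one worker fired per data arrival
    }

    int starved = 0;
    for (int i = 0; i < pendingWorkers; ++i)
    {
        std::string payload;
        if (!sock.ReadStringList(payload))
            ++starved;  // this worker would block for data that never comes
    }
    return starved;
}
```

For example, the three fragments `"11      hel"`, `"lo "` and `"world"` yield 2 starved workers: the first worker takes the whole `hello world` reply and the other two are left waiting.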
This patch fixes the first problem by ensuring that no worker reads from the socket while a SendReceiveStringList is running, and the second one by aborting a worker if there is no more data to read, a check made only once the lock has been acquired.
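One way to express this fix, sketched with a single `std::mutex` guarding the socket. This is an assumption about the mechanism, not the patch itself; `PlaybackSockModel`, `Deliver`, `SendReceive` and `ProcessRequestWork` are hypothetical stand-ins for PlaybackSock::SendReceiveStringList and MainServer::ProcessRequestWork. The key point is ordering: the worker checks for data only after taking the lock, so a reply already consumed by the request/reply exchange makes it return immediately instead of timing out.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical model of the fix, not the actual patch: one lock guards
// the socket, and workers bail out if the buffer is already empty.
class PlaybackSockModel
{
  public:
    // Stands in for bytes sent to us by the slave backend.
    void Deliver(const std::string &bytes)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        m_buffer += bytes;
    }

    // SendReceiveStringList-style call: holds the lock for the whole
    // request/reply exchange, so no worker can read in between.
    // (Writing the request is omitted in this model.)
    std::string SendReceive()
    {
        std::lock_guard<std::mutex> guard(m_lock);
        std::string reply = m_buffer;
        m_buffer.clear();
        return reply;
    }

    // ProcessRequestWork-style worker: checks for data only after the
    // lock is held, and returns instead of blocking if another reader
    // has already consumed everything.
    bool ProcessRequestWork()
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_buffer.empty())
            return false;  // nothing left: abort rather than time out
        m_buffer.clear();  // consume the unsolicited message
        return true;
    }

  private:
    std::mutex m_lock;
    std::string m_buffer;
};
```

Delivering a reply, calling `SendReceive()` and then `ProcessRequestWork()` makes the worker return `false` rather than block. Checking for data before taking the lock would reopen the race, because the reply could be consumed between the check and the read.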
Attachments (1)
Change History (11)
Changed 11 years ago by
Attachment: | 0001-Fix-random-SBE-PlaybackSock-timeout-in-MBE.patch added |
---|
comment:2 Changed 11 years ago by
Owner: | set to stuartm |
---|---|
Status: | new → accepted |
comment:3 Changed 11 years ago by
Cc: | Stuart Auchterlonie added |
---|
comment:4 Changed 11 years ago by
It looks to me as if the reference count on sock is increased too many times:
sock->IncrRef(); ReferenceLocker rlocker(sock);
This increases sock's reference count twice, but decreases it only by one when the function exits.
Ultimately, the socket will never be destroyed and will leak.
comment:5 Changed 11 years ago by
Owner: | changed from stuartm to JYA |
---|---|
Status: | accepted → assigned |
comment:6 Changed 11 years ago by
This is not a consequence of this patch, but a problem that still exists.
It seems to me that, although the socket will no longer be closed unnecessarily, any process request whose data has already been consumed by another worker will find its request dismissed and ultimately ignored.
There should be a way to queue the request, or to have ReadStringList not consume more data than it should (re-injecting discarded data into the socket).
comment:7 Changed 11 years ago by
Actually, forget that last comment... Looking at MythSocket::ReadStringList, it only reads the data it needs, not everything that is in the buffer.
So the statement "data is consumed all at once by one worker" is not actually correct. However, there could indeed be a race there...
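To illustrate the point about consuming only what is needed, here is a sketch of a length-prefixed reader that takes exactly one message and leaves any following bytes in the buffer. It assumes MythTV's wire format (an 8-character, space-padded decimal size, then a payload whose items are separated by `[]:[]`); `ReadOneStringList` is a hypothetical helper working on a `std::string`, not the MythSocket API.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, not MythSocket code. Reads one framed string list
// from the front of `buffer` and removes only that message from it.
// Returns false, leaving the buffer untouched, if the message is incomplete.
bool ReadOneStringList(std::string &buffer, std::vector<std::string> &out)
{
    const std::size_t kHeader = 8;  // assumed size-field width
    if (buffer.size() < kHeader)
        return false;
    const std::size_t size =
        std::strtoul(buffer.substr(0, kHeader).c_str(), nullptr, 10);
    if (buffer.size() < kHeader + size)
        return false;

    const std::string payload = buffer.substr(kHeader, size);
    buffer.erase(0, kHeader + size);  // consume this message only

    // Split the payload on the assumed "[]:[]" item separator.
    const std::string sep = "[]:[]";
    out.clear();
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t pos = payload.find(sep, start);
        if (pos == std::string::npos)
        {
            out.push_back(payload.substr(start));
            break;
        }
        out.push_back(payload.substr(start, pos - start));
        start = pos + sep.size();
    }
    return true;
}
```

With two messages back to back in the buffer, the first call returns only the first list and the second message stays buffered for the next reader.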
comment:8 Changed 11 years ago by
Resolution: | → Fixed |
---|---|
Status: | assigned → closed |
Fixed: commit c9395d7c96c06cc6f508b5cbbf87979ea2c5de0b Author: Bradley Baetz <bbaetz@…> Date: Thu May 1 09:43:24 2014 +1000
Fix random SBE PlaybackSock timeout in MBE.
comment:9 Changed 11 years ago by
Apologies, I misread how the ReferenceLocker class works, which introduced a massive regression.
When I re-applied the patch, I forgot to put back the original author...
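Whether `sock->IncrRef(); ReferenceLocker rlocker(sock);` leaks depends on whether the locker's constructor takes a reference of its own; the retraction above suggests it does not. The model below assumes a locker that adopts the caller's reference and releases exactly one in its destructor. `RefCountedModel`, `AdoptingLocker` and `UseSocket` are hypothetical names, not MythTV's ReferenceCounter and ReferenceLocker.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted object.
class RefCountedModel
{
  public:
    int IncrRef() { return ++m_refs; }
    int DecrRef() { return --m_refs; }  // real code would delete at zero
    int Refs() const { return m_refs; }

  private:
    int m_refs = 1;  // the creator's reference
};

// Hypothetical guard: takes no reference in its constructor (assumption)
// and releases exactly one in its destructor.
class AdoptingLocker
{
  public:
    explicit AdoptingLocker(RefCountedModel *obj) : m_obj(obj) {}
    ~AdoptingLocker()
    {
        if (m_obj)
            m_obj->DecrRef();
    }
    AdoptingLocker(const AdoptingLocker &) = delete;
    AdoptingLocker &operator=(const AdoptingLocker &) = delete;

  private:
    RefCountedModel *m_obj;
};

// Mirrors "sock->IncrRef(); ReferenceLocker rlocker(sock);": one
// increment now, one decrement at scope exit, so the count is balanced.
void UseSocket(RefCountedModel *sock)
{
    sock->IncrRef();
    AdoptingLocker rlocker(sock);
    // ... work with sock while it is guaranteed to stay alive ...
}
```

Under that assumption the count returns to its starting value after `UseSocket()`. Dropping the `IncrRef()` call instead would make every call release a reference it never took.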
comment:10 Changed 11 years ago by
Thanks a lot Jean-Yves for applying this patch.
I still think that all the data of one (fragmented) reply is read at once by a single MythSocket::ReadStringList invocation. It loops until the announced size of the reply (that is, the total size of all the fragments) has arrived in the buffer, and then consumes all of it. The other workers, fired on each subsequent TCP packet arrival belonging to the same fragmented reply, will never see that data.