Opened 4 years ago
Closed 4 years ago
Last modified 3 years ago
#13598 closed Bug Report - Crash (Fixed)
mythbackend segfault when frontend and mythweb access program listing
Reported by: | Peter Bennett | Owned by: | Peter Bennett |
---|---|---|---|
Priority: | minor | Milestone: | 31.0 |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | no |
Description
This happened twice on my development master backend, when looking at program listings from the frontend and mythweb.
This is the log at the time
2020-02-15 14:39:26.894643 I MainServer: adding: andromeda(55971de35910) as a file transfer 2020-02-15 14:39:26.897840 I MainServer: adding: andromeda(55971de34680) as a file transfer 2020-02-15 14:39:26.897845 I FileTransfer sock(55971de37a70) disconnected Handling Segmentation fault Segmentation fault (core dumped)
Here is the relevant part of the backtrace. The segfault is in thread 1 in code that is called from QT event processors.
Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000055971c7a3bab in MainServer::connectionClosed (this=0x55971dd2ee00, socket=0x55971de37a70) at mainserver.cpp:7896 7896 (*ft)->DecrRef(); [Current thread is 1 (Thread 0x7ff91cbe9700 (LWP 10763))] Thread 1 (Thread 0x7ff91cbe9700 (LWP 10763)): #0 0x000055971c7a3bab in MainServer::connectionClosed(MythSocket*) (this=0x55971dd2ee00, socket=0x55971de37a70) at mainserver.cpp:7896 sock = 0x55971de37a70 ft = 0x0 __FUNCTION__ = "connectionClosed" cs = {i = {i = 0x7ff9698c2d20 <QSocketNotifier::setEnabled(bool)+144>}} #1 0x00007ff96c085fe8 in MythSocket::DisconnectHandler() (this=0x55971de37a70) at mythsocket.cpp:265 __FUNCTION__ = "DisconnectHandler" #2 0x00007ff96c1cf885 in MythSocket::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0x55971de37a70, _c=QMetaObject::InvokeMetaMethod, _id=4, _a=0x7ff91cbe88c0) at moc/moc_mythsocket.cpp:148 _t = 0x55971de37a70
This is the code around the segfault location [(*ft)->DecrRef() is the line which got the seg fault]
for (auto ft = m_fileTransferList.begin(); ft != m_fileTransferList.end(); ++ft) { MythSocket *sock = (*ft)->getSocket(); if (sock == socket) { LOG(VB_GENERAL, LOG_INFO, QString("FileTransfer sock(%1) disconnected") .arg(quintptr(socket),0,16) ); (*ft)->DecrRef(); m_fileTransferList.erase(ft); m_sockListLock.unlock(); UpdateSystemdStatus(); return; } }
In one backtrace ft was 0 and in the other it has a good looking value, but examining the FileTransfer structure it points to gives garbage, for example the FileTransfer m_sock object accesses an invalid location and some boolean variables have values like 114 instead of true or false.
My dilemma is: How can ft (a variable on the stack) be corrupted between the call to (*ft)->getSocket() and (*ft)->DecrRef()? Other threads should not have corrupted it because this thread's stack is not accessible to other threads. In any case, the call is surrounded by m_sockListLock.lockForWrite() and m_sockListLock.unlock() which ensures single threading through this code.
ft is type iterator of <FileTransfer * > and is effectively FileTransfer * *.
Attachments (2)
Change History (13)
Changed 4 years ago by
Attachment: | 20200215_backend_segfault.txt added |
---|
comment:1 Changed 4 years ago by
The loop construction in the code fragment looks similar to that in ticket #13571.
I am probably making a fool of myself but could it be that the "for (auto ft" does do something different from what the code used to do before the "auto" was introduced?
comment:2 Changed 4 years ago by
It does look like the problem may be around iterators perhaps combined with "auto".
Maybe changing it to use a normal subscript counter instead of an iterator and changing "auto" to the actual type may work around some obscure compiler bug that is causing this?
comment:3 Changed 4 years ago by
My guess at this moment is that changing the "auto" to the actual type would be enough. However, it would be good to find the root cause. Comparison of the generated assembly code for the "auto" and for the actual type should then give differences, but it is a long time since I've done things like that.
Problem for testing with ticket #13571 is that the reproducibility is low. I've added there a few log statements and since then it does not crash anymore.....
comment:4 Changed 4 years ago by
It seems that adding elements to m_fileTransferList in MainServer::HandleAnnounce? is no longer protected by a lock since ea5fdd45bdab01aa6f3d577db7e344a90aae98a6. I guess that could be the issue here.
comment:5 Changed 4 years ago by
Owner: | set to Peter Bennett |
---|---|
Status: | new → accepted |
comment:6 Changed 4 years ago by
Thank you Jonatan for finding that missing lock. I will start by fixing that and see if there are any recurrences. This bug is also difficult to reproduce.
comment:10 Changed 4 years ago by
Resolution: | → Fixed |
---|---|
Status: | accepted → closed |
Closing because on recurrence has been observed since the fix was made.
comment:11 Changed 3 years ago by
Milestone: | needs_triage → 31.0 |
---|
Backtrace 1