Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#13598 closed Bug Report - Crash (Fixed)

mythbackend segfault when frontend and mythweb access program listing

Reported by: Peter Bennett Owned by: Peter Bennett
Priority: minor Milestone: 31.0
Component: MythTV - General Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

This happened twice on my development master backend, when looking at program listings from the frontend and mythweb.

This is the log at the time

2020-02-15 14:39:26.894643 I  MainServer: adding: andromeda(55971de35910) as a file transfer
2020-02-15 14:39:26.897840 I  MainServer: adding: andromeda(55971de34680) as a file transfer
2020-02-15 14:39:26.897845 I  FileTransfer sock(55971de37a70) disconnected
Handling Segmentation fault
Segmentation fault (core dumped)

Here is the relevant part of the backtrace. The segfault is in thread 1 in code that is called from QT event processors.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055971c7a3bab in MainServer::connectionClosed (this=0x55971dd2ee00, socket=0x55971de37a70) at mainserver.cpp:7896
7896	            (*ft)->DecrRef();
[Current thread is 1 (Thread 0x7ff91cbe9700 (LWP 10763))]

Thread 1 (Thread 0x7ff91cbe9700 (LWP 10763)):
#0  0x000055971c7a3bab in MainServer::connectionClosed(MythSocket*) (this=0x55971dd2ee00, socket=0x55971de37a70) at mainserver.cpp:7896
        sock = 0x55971de37a70
        ft = 0x0
        __FUNCTION__ = "connectionClosed"
        cs = {i = {i = 0x7ff9698c2d20 <QSocketNotifier::setEnabled(bool)+144>}}
#1  0x00007ff96c085fe8 in MythSocket::DisconnectHandler() (this=0x55971de37a70) at mythsocket.cpp:265
        __FUNCTION__ = "DisconnectHandler"
#2  0x00007ff96c1cf885 in MythSocket::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0x55971de37a70, _c=QMetaObject::InvokeMetaMethod, _id=4, _a=0x7ff91cbe88c0) at moc/moc_mythsocket.cpp:148
        _t = 0x55971de37a70

This is the code around the segfault location [(*ft)->DecrRef() is the line which got the seg fault]

for (auto ft = m_fileTransferList.begin(); ft != m_fileTransferList.end(); ++ft)
{
    MythSocket *sock = (*ft)->getSocket();
    if (sock == socket)
    {
        LOG(VB_GENERAL, LOG_INFO, QString("FileTransfer sock(%1) disconnected")
            .arg(quintptr(socket),0,16) );
        (*ft)->DecrRef();
        m_fileTransferList.erase(ft);
        m_sockListLock.unlock();
        UpdateSystemdStatus();
        return;
    }
}

In one backtrace ft was 0 and in the other it has a good looking value, but examining the FileTransfer structure it points to gives garbage, for example the FileTransfer m_sock object accesses an invalid location and some boolean variables have values like 114 instead of true or false.

My dilemma is: How can ft (a variable on the stack) be corrupted between the call to (*ft)->getSocket() and (*ft)->DecrRef()? Other threads should not have corrupted it because this thread's stack is not accessible to other threads. In any case, the call is surrounded by m_sockListLock.lockForWrite() and m_sockListLock.unlock() which ensures single threading through this code.

ft is type iterator of <FileTransfer * > and is effectively FileTransfer * *.

Attachments (2)

20200215_backend_segfault.txt (89.0 KB) - added by Peter Bennett 4 years ago.
Backtrace 1
20200229_backend_segfault.txt (77.5 KB) - added by Peter Bennett 4 years ago.
Backtrace 2

Download all attachments as: .zip

Change History (13)

Changed 4 years ago by Peter Bennett

Backtrace 1

Changed 4 years ago by Peter Bennett

Backtrace 2

comment:1 Changed 4 years ago by Klaas de Waal

The loop construction in the code fragment looks similar to that in ticket #13571.

I am probably making a fool of myself but could it be that the "for (auto ft" does do something different from what the code used to do before the "auto" was introduced?

comment:2 Changed 4 years ago by Peter Bennett

It does look like the problem may be around iterators perhaps combined with "auto".

Maybe changing it to use a normal subscript counter instead of an iterator and changing "auto" to the actual type may work around some obscure compiler bug that is causing this?

comment:3 Changed 4 years ago by Klaas de Waal

My guess at this moment is that changing the "auto" to the actual type would be enough. However, it would be good to find the root cause. Comparison of the generated assembly code for the "auto" and for the actual type should then give differences, but it is a long time since I've done things like that.

Problem for testing with ticket #13571 is that the reproducibility is low. I've added there a few log statements and since then it does not crash anymore.....

comment:4 Changed 4 years ago by Jonatan Lindblad

It seems that adding elements to m_fileTransferList in MainServer::HandleAnnounce? is no longer protected by a lock since ea5fdd45bdab01aa6f3d577db7e344a90aae98a6. I guess that could be the issue here.

comment:5 Changed 4 years ago by Peter Bennett

Owner: set to Peter Bennett
Status: newaccepted

comment:6 Changed 4 years ago by Peter Bennett

Thank you Jonatan for finding that missing lock. I will start by fixing that and see if there are any recurrences. This bug is also difficult to reproduce.

comment:7 Changed 4 years ago by Peter Bennett <pbennett@…>

In 53e30ce4f/mythtv:

mythbackend: Fix missing lock on file transfer list

adding elements to m_fileTransferList in MainServer::HandleAnnounce?
was no longer protected by a lock since ea5fdd45bdab01aa6f3d577db7e344a90aae98a6.

Refs #13598

comment:8 Changed 4 years ago by Peter Bennett <pbennett@…>

In dea4cad656/mythtv:

mythbackend: Fix missing lock on file transfer list

adding elements to m_fileTransferList in MainServer::HandleAnnounce?
was no longer protected by a lock since ea5fdd45bdab01aa6f3d577db7e344a90aae98a6.

Refs #13598

(cherry picked from commit 53e30ce4f5d8182ffb02ec96fabe809cfc6927c7)

comment:9 Changed 4 years ago by Peter Bennett <pbennett@…>

In ba5cfd3f13/mythtv:

mythbackend: Fix missing lock on file transfer list

adding elements to m_fileTransferList in MainServer::HandleAnnounce?
was no longer protected by a lock since ea5fdd45bdab01aa6f3d577db7e344a90aae98a6.

Refs #13598

(cherry picked from commit 53e30ce4f5d8182ffb02ec96fabe809cfc6927c7)

comment:10 Changed 4 years ago by Peter Bennett

Resolution: Fixed
Status: acceptedclosed

Closing because on recurrence has been observed since the fix was made.

comment:11 Changed 3 years ago by Stuart Auchterlonie

Milestone: needs_triage31.0
Note: See TracTickets for help on using tickets.