Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#7603 closed defect (fixed)

mythfrontend can never connect to just waked up master backend

Reported by: bam <mybigspam@…> Owned by: danielk
Priority: major Milestone: 0.23
Component: MythTV - General Version: head
Severity: medium Keywords: wol
Cc: danielk Ticket locked: no

Description

If for some reason frontend has no connection to the backend program, the frontend tries to re-establish it by waking up server/restarting backend/etc, but it can never to re-connect to it, even if the backend was successfully back online.
The frontend's log when the master backend on the same machine is not running:

2009-11-18 20:54:45.388 Loading menu theme from /usr/share/mythtv/themes/defaultmenu//mainmenu.xml
2009-11-18 20:54:45.411 Found mainmenu.xml for theme 'Retro'
2009-11-18 20:54:46.656 MythContext: Connecting to backend server: 192.168.1.2:6543 (try 1 of 15)
master_wake
2009-11-18 20:54:46.795 MythContext: Connecting to backend server: 192.168.1.2:6543 (try 2 of 15)
2009-11-18 20:54:46.796 MythContext: Wake-On-Lan in progress, waiting...
2009-11-18 20:54:47.796 MythContext: Wake-On-Lan in progress, waiting...
2009-11-18 20:54:48.797 MythContext: Wake-On-Lan in progress, waiting...
2009-11-18 20:54:49.797 MythContext: Wake-On-Lan in progress, waiting...
2009-11-18 20:54:50.798 MythContext: Wake-On-Lan in progress, waiting...

...and so on.
I have the "WOLbackendCommand" setting set to "echo master_wake" for debug, but even if I restart the backend by hand, nothing is changed in the log.
The reason is that WaitForWOL() function from libs/libmyth/mythcontext.cpp file can never return:

bool MythContextPrivate::WaitForWOL(int timeout_in_ms)
{
    int timeout_remaining = timeout_in_ms;
    while (WOLInProgress && (timeout_remaining > 0))
    {
        VERBOSE(VB_GENERAL, LOC + "Wake-On-Lan in progress, waiting...");

        int max_wait = min(1000, timeout_remaining);
        WOLInProgressWaitCondition.wait(
            &WOLInProgressLock, max_wait);
        timeout_remaining -= max_wait;
    }

    return !WOLInProgress;
}

I think this is because there is no threads actually created for the WOL process, and the following code from mythcontext.cpp can never get control:

01679     if (we_attempted_wol)
01680     {
01681         QMutexLocker locker(&d->WOLInProgressLock);
01682         d->WOLInProgress = false;
01683         d->WOLInProgressWaitCondition.wakeAll();
01684     }

Attachments (1)

7603-pbb-misc-v1.patch (58.2 KB) - added by danielk 14 years ago.
Misc. PBB fixes, including some related to this ticket

Download all attachments as: .zip

Change History (20)

comment:1 Changed 14 years ago by bam <mybigspam@…>

I forgot to specify the version:

@me:~# mythfrontend --version
Please include all output in bug reports.
MythTV Version   : 22776
MythTV Branch    : branches/release-0-22-fixes
Network Protocol : 50
Library API      : 0.22.20091023-1
QT Version       : 4.5.3

backend is the same.

comment:2 Changed 14 years ago by bam <mybigspam@…>

This is a workaround:

Index: libs/libmyth/mythcontext.cpp
===================================================================
--- libs/libmyth/mythcontext.cpp        (revision 22942)
+++ libs/libmyth/mythcontext.cpp        (working copy)
@@ -1654,7 +1654,7 @@
                     continue;
                 }

-                d->WOLInProgress = we_attempted_wol = true;
+//                d->WOLInProgress = we_attempted_wol = true;
             }

             myth_system(WOLcmd);

comment:3 Changed 14 years ago by bam <mybigspam@…>

Also see #7696.

comment:4 Changed 14 years ago by danielk

Owner: changed from Isaac Richards to danielk
Status: newassigned

The problem is a little more complicated than just this problem. When this code was first ported to MythUI another thread should have been created since MythUI depends on the UI thread continuing to run while the reconnect is in progress. But reconnects are initiated by MythProto? activity which is often initiated by MythProto? calls in the UI thread.

One solution is to have two code paths within the reconnect code, one for when it is called from within the UI and another for when it is called from another thread. Another solution is do reconnects in a dedicated thread and when a MythProto? call is called while disconnected we wait while allowing the UI to update. A third solution is to use the new thread pool facilities within Qt4 instead of adding another dedicated thread and treating GetSetting?() calls like we would with a dedicated thread. Finally, a fourth solution is to not allow MythProto? calls in the UI thread; this one allows for the most elegant and least error prone code and the best UI responsiveness, but would require the greatest number of changes to the existing code.

All of these solutions are more complicated than I would like and there is a danger in allowing the UI to update while we are reconnecting since the user may initiate some action which results in more blocking calls or tries to use uninitialized data. My first preference is to use the threadpool, but there is an additional danger there that the threadpool may already be full of qRunnables which make MythProto? calls.

I'm not promising to work on this myself anytime soon, mostly due to lack of time. But I will assign it to myself as I've thought about these issues. When I last worked on the reconnect code my primary goal was to get slave backends to function correctly again, frontends can always be restarted in the worst case scenario -- you don't lose any recordings while it's dithering.

Please post any follow-on discussion in the mythtv-dev mailing list. I read all those messages even if I don't have time to reply right away.

comment:5 Changed 14 years ago by bam <mybigspam@…>

Thank you for assigning the bug, Danielk.
I just would like to emphasize the severity of the problem, since it is not so easy to restart mythfrontend in case of remote-only driven mythtv set-top-box.

comment:6 Changed 14 years ago by danielk

Milestone: unknown0.23
Priority: minormajor
Version: unknownhead

comment:7 Changed 14 years ago by danielk

(In [23009]) Refs #7603. This moves the QUERY_RECORDINGS mythproto calls out of the UI thread.

The idea is to move as many mythproto calls as possible out of the UI thread so that when a reconnect to the master backend is necessary we can show a dialog to that effect during the reconnect attempt even if using MythUI.

QUERY_RECORDINGS is really the major query that this screen does. But there are also some done when parsing the list returned by this call, which I'll tackle next. And various mythproto calls linked to various actions that don't require the return value, which I'll tackle as one task with a QRunnable dispatcher.

comment:8 Changed 14 years ago by danielk

(In [23028]) Refs #7603. Fixes #7807. Moves more MythProto? calls out of UI thread in PlaybackBox? using an event driven helper thread to handle various calls.

Also fixes various bugs in the playlist handling following the MythUI port and one caused by [23009]. This also adds TODO's for some of the bugs that were not fixed.

comment:9 Changed 14 years ago by danielk

(In [23031]) Refs #7603. Moves preview generation queue handling out of the UI thread into the PBB helper thread.

comment:10 Changed 14 years ago by danielk

(In [23035]) Refs #7603. Adds a couple checks before setting the m_previewImage in the preview image handler.

Changed 14 years ago by danielk

Attachment: 7603-pbb-misc-v1.patch added

Misc. PBB fixes, including some related to this ticket

comment:11 Changed 14 years ago by danielk

(In [23045]) Refs #7603. In the recgroup and playgroup changing code, when operating on playlists we used the currently selected item to determine the default group, but that item may very well not even be in the same group as any of the playlist items.

This changes the code to use the group of the first item in the playlist as the default.

comment:12 Changed 14 years ago by danielk

(In [23046]) Refs #7603. Simplifies update program info event handling.

Since the cache contains pointers to the same program info's used in the UI we can just update those. This is safe since only the UI thread touches the cache onced items m_programInfoCache::Refresh() has been called.

comment:13 Changed 14 years ago by danielk

(In [23047]) Refs #7603. Moves the available status checking to the helper thread and simplifies some of the popup code.

Note: This is the last of the MythProto? stuff in playbackbox.cpp aside from findArtworkFile(), themes not using findArtworkFile() should now feel a bit snappier. There are still the occational MythProto? accesses in the UI thread in other cpp's used by the PBB, and there are still a number of Get*Setting() calls which slow things down when the item is not in the cache and we need to reconnect the DB socket.

comment:14 Changed 14 years ago by danielk

(In [23048]) Refs #7603. This makes the reselection code when reloading a the full list a bit smarter and cleans up verbose error and warning macro usage.

The reselection code now attempts not just the previously selected item, which may have been deleted or moved to another recording group which is not currently visible, it also tries the following items in the same group, then the previous items in the same group, and if the group is gone it tries the following groups.

This should avoid a lot of the jumps to the top of the 'All Recordings' group that happen when the currently selected item is deleted, either locally or on another frontend.

comment:15 Changed 14 years ago by danielk

(In [23051]) Refs #7603. Fixes segfault in mythcontext popup handling.

This drops the old UI popups and uses only MythUI pop-ups. The MythUI pop-ups do not try to launch from the calling thread anymore but instead an event is dispatched to an object whose event handler is in the MythUI thread. This simplifies the code and avoids all the problems, including segfaults, which occur when you try to use MythUI outside the MythUI thread.

Note: When MythProto? is used within the MythUI thread, the popup will not be shown when it should be. This is pre-existing behaviour and is inherent in the design of MythUI. But a previous commit causes such usage of MythProto? to print a warning message: SendReceiveStringList?(...) called from UI thread These should be fixed, as the have potential to make the frontend appear to be hung for minutes at a time.

comment:16 Changed 14 years ago by danielk

Resolution: fixed
Status: assignedclosed

(In [23052]) Fixes #7603. Fixes the immediate wake-on-lan bug.

comment:17 Changed 14 years ago by danielk

(In [23053]) Refs #7603. Backports [23052]. This fixes the immediate problem bam experienced, but doesn't backport the related fixes which are too invasive for backport.

comment:18 Changed 14 years ago by bam <mybigspam@…>

Thank you very much, Danielk.

comment:19 in reply to:  16 Changed 14 years ago by bam <mybigspam@…>

Replying to danielk:

(In [23052]) Fixes #7603. Fixes the immediate wake-on-lan bug.

Sorry if I misunderstand something, but appears that on call of WaitForWOL() function

bool MythContextPrivate::WaitForWOL(int timeout_in_ms)
{
    int timeout_remaining = timeout_in_ms;
    while (WOLInProgress && (timeout_remaining > 0))
    {
        VERBOSE(VB_GENERAL, LOC + "Wake-On-Lan in progress, waiting...");

        int max_wait = min(1000, timeout_remaining);
        WOLInProgressWaitCondition.wait(
            &WOLInProgressLock, max_wait);
        timeout_remaining -= max_wait;
    }

    return !WOLInProgress;
}

the WOLInProgress variable is always false, so the loop inside that function can never take the control.

Note: See TracTickets for help on using tickets.