Opened 14 years ago
Closed 14 years ago
#7847 closed defect (fixed)
After resume from long S3 sleep, scheduling via EPG isn't refreshed on EPG
Reported by: | | Owned by: | cpinkham |
---|---|---|---|
Priority: | minor | Milestone: | 0.24 |
Component: | MythTV - User Interface Library | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description
I have an issue that is quite annoying and related to an FE using S3 sleep/resume. Symptom: a recording scheduled via the EPG isn't reflected in the EPG.
Scenario to reproduce:
1. Start system, launch FE.
2. Put the system into S3 sleep for longer than approx. 15 min.
3. Resume the system from S3 sleep.
4. Go to EPG.
5. Select a show and press Enter.
6. Choose the desired schedule options, press Save.
7. After returning to the EPG, the show's state is not refreshed.
EPG refresh works OK if the sleep is shorter than approx. 15 min, so it looks like sleeping longer than approx. 15 minutes somehow breaks FE-BE communication.
This issue is quite annoying and strongly decreases WAF, as scheduling via the EPG gives no feedback to the user (and my wife is complaining that myth is not recording her favourite shows...)
My system is a minimyth-based FE with 0.22-fixes SVN23069 + ticket7836
br
Attachments (8)
Change History (27)
comment:1 Changed 14 years ago by
Milestone: | 0.22 → 0.23 |
---|---|
Status: | new → infoneeded_new |
comment:2 Changed 14 years ago by
Hi, FE/BE (verbose all) logs attached. Generated with this scenario:
- clear all logs
- start BE
- start FE
- wait 20 min
- on FE go to schedule, schedule "record once". First refresh was OK. Second schedule was not (fe.log.zip).
- FE was suspended for 8 h
- after 8 h resume FE, go to EPG, schedule a show. No refresh on EPG. (fe.log.2.zip)
br
comment:3 Changed 14 years ago by
This will happen any time the BE uses the event socket while the FE is asleep; the length of sleep time is irrelevant. E.g. if one FE updates the schedule while another is sleeping, the BE tries and fails to update the sleeping FE, resulting in the BE closing its side of the event socket. When the FE wakes up, it doesn't know that the BE has closed the connection, so the FE forever listens on the dead socket.
[23397] will help, and with some interval tuning, it could be made to work reasonably well. But as Mark alluded, an application-level heartbeat on the event socket from the FE would be a more complete way to address this issue.
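The application-level heartbeat suggested in this comment could look roughly like the sketch below. It is not MythTV code: the class name `HeartbeatMonitor`, its methods, and the injected clock are assumptions made purely for illustration. The idea is that the listening side records when it last heard anything on the event socket and treats prolonged silence as a dead connection that must be reopened.

```python
import time


class HeartbeatMonitor:
    """Tracks when the peer was last heard from (hypothetical helper, not a MythTV class)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested without waiting
        self.last_seen = clock()

    def mark_alive(self):
        """Call whenever any message, including a heartbeat reply, arrives."""
        self.last_seen = self.clock()

    def is_dead(self):
        """True once the peer has been silent for longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout_s
```

A caller would send a small ping at a fixed interval, call `mark_alive()` on every reply or event, and reconnect when `is_dead()` returns true. Which clock to use is a design choice: a monotonic clock may not advance while the machine is suspended, in which case the timeout effectively starts counting again after resume.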
comment:4 follow-up: 5 Changed 14 years ago by
Hi,
Thx for quick response.
I applied this patch for testing purposes. Tested on 0.22-fixes 23426 + ticket 7836
Results:
After resume from sleep, with a sleep period longer than keep-alive-time+(keep-alive-interval*keep-alive-probes), the user sees the dialog "backend connection lost".
By this I would say changeset 23397 is a serious regression for setups with separate FE & BE where the user is using S3 on the FE.
I think this is the expected result, as keep-alive closes all TCP connections between BE and FE when the sleep time is longer than keep-alive-time+(keep-alive-interval*keep-alive-probes).
From this perspective, 23397 is a serious regression, as it causes FE-BE connection loss on sleeping FEs. In my case the user sees a "Backend connection lost" popup. When the user then still tries to select a recording to watch, the FE crashes (this crash is not a result of this patch; I also get it e.g. when I restart the BE while the FE sleeps).
Anyway, I think the scenario with separate BE and FE, where FEs use S3 for sleep, needs rethinking.
I would consider, e.g., using TCP for all connections initiated from FE to BE, and for long, persistent connections initiated from BE to FE I would consider switching from TCP to UDP. As UDP is stateless, maybe it will help solve the problem?
The TCP keep-alive solution is definitely a bad idea IMHO and should be reverted from trunk.
Regarding the heartbeat approach: I think it will not be so easy, as the BE will need a way to discover that an FE is in the sleeping state.
Without it, the BE will falsely treat sleeping FEs as dead FEs.
Probably a simpler approach would be keeping all FE->BE connections always-on, and changing persistent BE->FE connections to a stateless type (UDP).
br
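For reference, the keep-alive timing discussed in this comment can be worked through with a short sketch. The function names and the example values are assumptions for illustration; the values that [23397] actually sets are not stated in this ticket. The per-socket tuning options shown are platform-specific, so the sketch skips any that the platform does not provide.

```python
import socket


def keepalive_detection_time(idle_s, interval_s, probes):
    """Worst-case seconds before an unresponsive peer is declared dead."""
    return idle_s + interval_s * probes


def enable_keepalive(sock, idle_s, interval_s, probes):
    """Turn on TCP keep-alive and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are not available on every platform; skip missing ones.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With the commonly documented Linux defaults (7200 s idle, 75 s interval, 9 probes) the formula gives 7875 s, a little over two hours. With hypothetical tuned values of 60 s, 10 s and 3 probes it gives 90 s, so any sleep longer than that would end with the connection closed.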
comment:5 follow-up: 6 Changed 14 years ago by
Replying to warpme@…:
> By this I would say changeset 23397 is a serious regression for setups with separate FE & BE where the user is using S3 on the FE.
[23397] is not a regression; it just exposes two existing bugs:
- Backend reconnection fails inside SendReceiveStringList - I have submitted patch #8024 for this issue.
- Segfault related to the popup window - someone better versed in the UI will need to check this one.
comment:6 Changed 14 years ago by
Replying to Jeff Lu <jll544@…>:
> Replying to warpme@…:
> > By this I would say changeset 23397 is a serious regression for setups with separate FE & BE where the user is using S3 on the FE.
> [23397] is not a regression; it just exposes two existing bugs:
I have a question here. How will it behave once the mentioned bugs are fixed?
Today, in the scenario with a sleeping FE and a restarted BE (this scenario is a good simulation of keep-alive having closed all BE-FE connections), after FE resume the user sees the following:
- resume FE
- after a few seconds the main menu appears
- user picks Watch Recordings
- FE shows the select-group popup
- user selects a particular group
- FE halts for 20-30 sec
- FE pops up the dialog "Backend Connection Lost"
Can we assume that this 20-30 sec wait is also a result of the SendReceiveStringList bug, and that once this bug is fixed, a lost FE-BE connection will be re-established immediately?
If yes, then this is progress. If not, 23397 is for me a step back, as by design it will cause a 20-30 sec delay after FE resume.
> - Backend reconnection fails inside SendReceiveStringList - I have submitted patch #8024 for this issue.
> - Segfault related to the popup window - someone better versed in the UI will need to check this one.
Should I file a separate bug report for it?
br
comment:7 Changed 14 years ago by
I applied ticket 8024 with changeset 23397 and tickets 7836 & 7839.
Indeed, with ticket 8024 there is no "Backend Connection Lost", and the 20-30 sec delay after resume is now approx. a few seconds.
Unfortunately EPG auto-refresh still isn't working.
I'm attaching short logs from BE/FE.
br
comment:8 Changed 14 years ago by
Status: | infoneeded_new → new |
---|
comment:9 Changed 14 years ago by
Milestone: | 0.23 → 0.24 |
---|
comment:10 Changed 14 years ago by
Component: | MythTV - General → MythTV - User Interface Library |
---|---|
Owner: | changed from Isaac Richards to stuartm |
Status: | new → assigned |
comment:11 Changed 14 years ago by
Owner: | changed from stuartm to paulh |
---|
comment:12 Changed 14 years ago by
Owner: | paulh deleted |
---|---|
Status: | assigned → new |
Doesn't look like this is specific to the program guide but is a problem with our socket communication, which is held together with string at the best of times. Throwing back into the pool for someone more knowledgeable of that area of code to look at.
comment:13 Changed 14 years ago by
Owner: | set to cpinkham |
---|---|
Status: | new → assigned |
comment:14 Changed 14 years ago by
Status: | assigned → infoneeded |
---|
If you can reproduce this issue, please run your frontend and backend with "-v network,extra,socket" and paste the relevant portions of the logs so I can see what is going on over the wire when the issue occurs.
comment:15 Changed 14 years ago by
Chris,
Thanks for keeping an eye on this ticket.
Please find FE/BE logs attached.
Scenario:
1. Start backend
2. Boot FE
3. Sleep FE
4. Wait a few hours
5. Resume FE
6. Enter EPG
7. Schedule recording (no refresh)
8. Exit EPG
9. Enter EPG again. The schedule from step 7 is visible
System is:
BE: Arch, myth trunk 26331
FE: minimyth derivative, myth trunk 26375
br
Changed 14 years ago by
Attachment: | 7847_reconnect_event_socket.diff added |
---|
Reconnect the event socket when we lose the main BE connection
comment:16 Changed 14 years ago by
Can you try the attached 7847_reconnect_event_socket.diff patch and attempt to reproduce the issue again? We aren't currently reconnecting the event socket if it dies, since we never try to send to it. The attached patch will reconnect the event socket if the main command socket has to be reconnected. In the logs, you can see the message "Connection to backend server lost" when we detect that the command socket needs to be reconnected, but the only connection made after that is the command socket; the event socket is never reconnected, so you don't receive any more events after unsuspending. This means the FE misses the scheduler's event signalling that the schedule has changed and needs to be reloaded, so the FE never reloads the new list until after you exit and reenter the screen.
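The behaviour described above can be sketched as follows. This is not the content of 7847_reconnect_event_socket.diff, which is not reproduced in this ticket; `BackendLink`, `connect_command`, and `connect_event` are hypothetical names used only to show the idea of replacing the event socket whenever the command socket has to be reconnected.

```python
class BackendLink:
    """Holds a command and an event connection (hypothetical, not MythTV's class)."""

    def __init__(self, connect_command, connect_event):
        # Both factories are supplied by the caller and return connection
        # objects that offer send(), recv() and close().
        self._connect_command = connect_command
        self._connect_event = connect_event
        self.command = connect_command()
        self.event = connect_event()

    def reconnect(self):
        """Drop both connections and open fresh ones.

        The event socket is only ever read from, so a dead peer goes unnoticed
        there; it is replaced whenever the command socket is found to be dead.
        """
        for conn in (self.command, self.event):
            try:
                conn.close()
            except OSError:
                pass  # already dead; nothing more to do
        self.command = self._connect_command()
        self.event = self._connect_event()

    def send_command(self, payload):
        """Send on the command socket, reconnecting both sockets once on failure."""
        try:
            self.command.send(payload)
        except OSError:
            self.reconnect()
            self.command.send(payload)
        return self.command.recv()
```

Reconnecting both sockets together is the blunt option: it costs an extra reconnect when the event socket happened to be healthy, but it avoids having to detect a dead read-only socket separately.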
comment:17 Changed 14 years ago by
Version: | 0.22-fixes → Trunk Head |
---|
comment:19 Changed 14 years ago by
Resolution: | → fixed |
---|---|
Status: | infoneeded → closed |
(In [26433]) If we lose the backend command socket, such as when we are suspended for a long period of time, also close the event socket before reconnecting to the master backend. Since we don't ever send to the event socket, we weren't detecting it as closed. We should be able to detect this and handle it in a better manner and only reconnect the event socket when needed, but this option is safer closer to a release.
Fixes #7847.
We need frontend and backend logs.