Opened 10 years ago

Closed 8 years ago

#7608 closed defect (Fixed)

Slave backend disconnects

Reported by: bob@… Owned by: cpinkham
Priority: minor Milestone: unknown
Component: MythTV - General Version: unknown
Severity: medium Keywords:
Cc: Ticket locked: no

Description

Since upgrading to MythTV 0.22-fixes from 0.21-fixes, my slave backend disconnects after a couple of recordings. I have attached the log for the slave backend. The backend is still running according to ps aux | grep myth:

Output from ps aux | grep myth | grep -v grep:

bob 1242 0.1 3.1 223016 32932 ? Sl Nov17 2:03 mythbackend
bob 4165 0.0 0.0 1772 480 ? S Oct27 0:00 sh -c /usr/local/bin/mythcommflag -j 35735 -V 3
bob 4166 0.0 2.5 84628 26364 ? Sl Oct27 0:00 /usr/local/bin/mythcommflag -j 35735 -V 3
bob 4897 0.0 0.0 1772 484 ? S 00:59 0:00 sh -c /usr/local/bin/mythcommflag -j 36448 -V 3
bob 4898 0.0 2.5 84632 26796 ? Sl 00:59 0:00 /usr/local/bin/mythcommflag -j 36448 -V 3
bob 5285 0.0 0.0 1772 484 ? S 02:00 0:00 sh -c /usr/local/bin/mythbackend --generate-preview 0x0 --chanid 2048 --starttime 20091118020000
bob 5286 0.0 2.5 85584 26868 ? Sl 02:00 0:00 /usr/local/bin/mythbackend --generate-preview 0x0 --chanid 2048 --starttime 20091118020000
bob 6654 1.3 12.9 281532 134188 pts/0 SLl+ Nov10 155:13 mythfrontend
bob 21970 0.0 0.0 1772 480 ? S Nov14 0:00 sh -c /usr/local/bin/mythcommflag -j 36322 -V 3
bob 21971 0.0 2.3 84592 24828 ? Sl Nov14 0:00 /usr/local/bin/mythcommflag -j 36322 -V 3

The backend status page shows the slave backend's mythcommflag job with a status of "starting" at 12:59 AM. Every time the slave backend disconnects, the list of processes looks similar. I tried asking on the mythtv-users mailing list but didn't get any response. I never had a problem with 0.21-fixes; both versions were compiled from svn.
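A quick way to confirm that the slave really has lost its connection, even though mythbackend is still in the process list, is to check the socket state on the slave (a rough sketch; 6543 is assumed to be MythTV's default backend protocol port, so adjust it if your setup differs):

# On the slave: list TCP connections involving the master's protocol port.
# 6543 is assumed to be the default MythTV protocol port.
netstat -tn | grep ':6543'
# If mythbackend is running but no ESTABLISHED line shows up here, the slave
# has dropped its connection to the master.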

Attachments (8)

slave backend.log (11.1 KB) - added by bob@… 10 years ago.
Slave backend log file
masterbackend.tar.gz (1.5 KB) - added by bob@… 10 years ago.
master backend log showing errors
mbe_first_minutes.log.gz (213.1 KB) - added by James Crow <crow.jamesm@…> 9 years ago.
MBE log after restart showing first writeStringList error
sbe_first_minutes.log.gz (123.7 KB) - added by James Crow <crow.jamesm@…> 9 years ago.
SBE log after restart showing first writeStringList error
mbe_2100.log.gz (165.2 KB) - added by James Crow <crow.jamesm@…> 9 years ago.
MBE log around 9PM showing many socket errors
sbe_2100.log.gz (155.2 KB) - added by James Crow <crow.jamesm@…> 9 years ago.
SBE log around 9PM showing many socket errors
master backend log.zip (12.6 KB) - added by bob@… 9 years ago.
master backend log
Slave backend log.zip (17.2 KB) - added by bob@… 9 years ago.
slave backend log


Change History (20)

Changed 10 years ago by bob@…

Attachment: slave backend.log added

Slave backend log file

comment:1 Changed 10 years ago by bob@…

I'm still seeing this with trunk, but it is now apparent that the master backend is having socket issues. The master backend logs show the slave as disconnected before the slave backend logs show the disconnection. The master backend also becomes unresponsive to socket requests in general: first mythfrontend can't talk to the backend, then port 6544 becomes unresponsive, and the backend stops recording. I have to restart the master backend at least once a day, so this is a pretty big issue for me. I have deleted all MythTV files, re-downloaded trunk, and recompiled, and I still have the issue. Attaching master backend logs.
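One way to pin down when the master stops answering is to poll its status port from another machine and note the timestamp (a sketch only; 6544 is MythTV's default status port and the hostname is a placeholder for your master):

# Poll the master's status page once a minute and log when it stops responding.
# 'mythbox' is a placeholder hostname; 6544 is MythTV's default status port.
while true; do
    date
    curl -sS --max-time 10 http://mythbox:6544/ > /dev/null \
        && echo "status page OK" || echo "status page unresponsive"
    sleep 60
done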

Changed 10 years ago by bob@…

Attachment: masterbackend.tar.gz added

master backend log showing errors

comment:2 Changed 9 years ago by robertm

Possible relation to #8496.

comment:3 Changed 9 years ago by anonymous

This is a good suggestion and it is appreciated. I noticed ticket #8496 when it was created, but I did not have my storage groups set up in the manner that reporter did. I reviewed some of the very thorough comments Michael T Dean has made about how storage groups should be set up on the master and slave backends, deleted my storage group settings, and re-did them, but there was no difference in the result. I have continued to try using the slave backend as updates have landed in 0.23-fixes, but have not seen any change. I compile the code myself and have gone as far as deleting the svn directory and re-downloading everything to make sure nothing was corrupt. I am open to any suggestions for tests I could run or logs I could provide to help track this down.

comment:4 Changed 9 years ago by robertm

Owner: changed from Isaac Richards to cpinkham
Status: new → assigned

comment:5 Changed 9 years ago by cpinkham

Status: assigned → infoneeded

I think that in order to debug this, we're going to need master and slave backend logs from the same exact time, so we can see what is going on on both systems. Please run both the master and slave with "-v network,extra,socket" and attach the compressed logs to the ticket. You can trim the logs to cut down on the size, but please include at least a minute before the issue happens and at least a minute after the problem resolves itself if it does.
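A minimal sketch of how those logs could be captured (the -v flags are as requested above; the log paths and install prefix are placeholders for your setup):

# On both the master and the slave: stop the running backend, then restart it
# with the requested verbosity and capture everything it prints.
/usr/local/bin/mythbackend -v network,extra,socket > /var/log/mythtv/mythbackend_debug.log 2>&1 &

# After the disconnect happens, trim the interesting window and compress it
# before attaching it to the ticket.
gzip -c /var/log/mythtv/mythbackend_debug.log > mythbackend_debug.log.gz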

comment:6 Changed 9 years ago by James Crow <crow.jamesm@…>

I am not the original poster, but I opened a ticket and it was closed as a dupe of this one. The closed ticket also pointed to ticket #8526, which has two deadlock patches. I was able to cleanly apply the first of them to 0.23.1-26231. The other I am not able to apply, and I cannot even find the right place in the code to try a manual application. Either way, with the one patch from 8526 applied I am still experiencing the same problems. I am now restarting the MBE multiple times each day. I will attach the log file from when I restarted the backend and the first writeStringList errors occurred. When I grepped the log I noticed a cluster of errors around 21:00 (9PM). There were errors at other times, but they were more sporadic. The log from 9PM is also attached.
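For reference, the patch was applied against the 0.23.1 checkout along these lines (a sketch only; the patch filename, directory layout, and -p level are illustrative, not taken from #8526):

# From the top of the MythTV source checkout: dry-run first, then apply.
# 'deadlock-1.patch' is a placeholder name for the first patch on #8526.
cd ~/src/mythtv
patch -p0 --dry-run < ~/patches/deadlock-1.patch && patch -p0 < ~/patches/deadlock-1.patch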

Changed 9 years ago by James Crow <crow.jamesm@…>

Attachment: mbe_first_minutes.log.gz added

MBE log after restart showing first writeStringList error

Changed 9 years ago by James Crow <crow.jamesm@…>

Attachment: sbe_first_minutes.log.gz added

SBE log after restart showing first writeStringList error

Changed 9 years ago by James Crow <crow.jamesm@…>

Attachment: mbe_2100.log.gz added

MBE log around 9PM showing many socket errors

Changed 9 years ago by James Crow <crow.jamesm@…>

Attachment: sbe_2100.log.gz added

SBE log around 9PM showing many socket errors

comment:7 Changed 9 years ago by anonymous

Status: infoneeded → assigned

comment:8 Changed 9 years ago by Slacker

Not sure if this helps find the issue, but I had the exact same issue with 0.23, so last week I went to 0.24 (Mythbuntu daily builds). I still had the problem for several days, then I updated a couple of days back and the problem seems to have been fixed. I've been on 26363 for several days with 3 SBEs and have had no lockups on the MBE.

comment:9 in reply to:  8 Changed 9 years ago by Slacker

I spoke too soon - still have the issue. I'll try and get logs ASAP.

comment:10 Changed 9 years ago by James Crow <crow.jamesm@…>

I have also upgraded to trunk with the Mythbuntu daily builds and am currently running 26454. My system has now been up for over 24 hours, longer than I ever had it running with 0.23.1. All my encoders (3 on the MBE and 3 on the SBE) have been exercised pretty heavily. Fingers crossed to see how long it stays up.

comment:11 Changed 9 years ago by bob@…

I am the original poster. I had to replace the hard drive and reinstall everything on my slave backend, so it took me some time to respond. In reinstalling everything I elected to go with trunk. I'm still running into an issue when running my slave backend, but it appears different from what I saw under 0.23. I don't know if this is a separate issue or not. Let me know if you want me to put this into a new ticket.

I'm running trunk 26464 on both the master and slave backends. The master is called mythbox with IP 192.168.1.101 and has an HDHomerun pointed at it and two PVR-250s in it. The slave is called mythbasement at IP 192.168.1.104, runs the MySQL server, and has one PVR-150 in it. The storage groups are as follows:

/var/mythtv/tv/recordings/

/var/mythtv/tv2/recordings/

/var/mythtv/tv3/recordings/

/var/tv/recordings/

/var/tv2/recordings/

The first three are on the master backend and the last two on the slave backend. All are defined in the master backend only.

Without the slave backend running, the master backend will consider /var/tv/recordings and /var/tv2/recordings, note that they don't exist, and record to one of the other directories. However, when the slave backend is connected, the master tries to record to /var/tv/recordings on the master itself and fails. This only seems to occur with recordings from the HDHomerun. The master and slave keep working and there are no mythsocket errors like those I saw in 0.22 and 0.23, so this may be an entirely different error.
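To double-check which directories the master believes belong to each host, the storage group definitions can be read straight from the database (a sketch; it assumes the usual mythconverg schema with a storagegroup table holding groupname, hostname, and dirname columns):

# List every storage group directory and the host it is defined for.
# Assumes the standard mythconverg schema: storagegroup(groupname, hostname, dirname).
mysql -u mythtv -p mythconverg <<'SQL'
SELECT groupname, hostname, dirname FROM storagegroup ORDER BY hostname, dirname;
SQL

If /var/tv/recordings turns up against the master's hostname rather than the slave's, that would explain the master trying to record there.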

I've attached the logs for the master and slave backends for one minute before the error and a short while afterwards. The actual error appears to be at 09:58:42.635. Let me know if you need anything else or want me to open a new ticket.

Changed 9 years ago by bob@…

Attachment: master backend log.zip added

master backend log

Changed 9 years ago by bob@…

Attachment: Slave backend log.zip added

slave backend log

comment:12 Changed 8 years ago by cpinkham

Resolution: Fixed
Status: assigned → closed

The submitter reports that the original issue in this ticket is no longer a problem.

The second issue described above, where the master tries to record to a non-existent directory, is unrelated and should be handled under another ticket. In order to investigate it, I'll need logs from the master backend run with "-v file,schedule" so that I can see the log entries from the storage scheduler and try to determine why it is picking the wrong directory. If that issue is still happening, please open a new ticket and attach the indicated master backend log.
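If it does recur, the requested log could be captured roughly like this before opening the new ticket (a sketch; the log path and install prefix are placeholders):

# Run the master backend with the file/scheduler verbosity requested above
# and keep the output to attach to the new ticket.
/usr/local/bin/mythbackend -v file,schedule > /tmp/mythbackend_file_schedule.log 2>&1 &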

Closing this ticket as 'fixed' since the original issue was reported as being corrected by an upgrade.

Note: See TracTickets for help on using tickets.