Opened 16 years ago

Closed 16 years ago

#4112 closed defect (fixed)

frequent backend crashes using EIT

Reported by: rainecc@…
Owned by: Stuart Auchterlonie
Priority: major
Milestone: 0.21
Component: eit
Version: head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

Using the latest SVN (14769) on debian etch 64-bit, the backend is segfaulting around 2-3 times per hour. Mostly with the log/bt attached as gdb1, but very occasionally with the one attached as gdb2. This is a BE-only box, though I compiled both FE and BE. Logs and backtraces attached.

There is a Nova-T 500 installed. I am using the 1.10 firmware and am seeing very occasional mt2060 errors. These don't seem to coincide with any of the segfaults.

Happy to assist in any way I can.

Attachments (9)

gdb1.txt (26.0 KB) - added by rainecc@… 16 years ago.
backtrace 1
log1.txt (5.1 KB) - added by rainecc@… 16 years ago.
log 1
gdb2.txt (21.1 KB) - added by rainecc@… 16 years ago.
backtrace 2
log2.txt (18.9 KB) - added by rainecc@… 16 years ago.
log 2
run2.tar.bz2 (31.4 KB) - added by rainecc@… 16 years ago.
output of second run
mythbackend glibc bt.txt (20.3 KB) - added by rainecc@… 16 years ago.
glibc backtrace
gdb.txt (26.1 KB) - added by panachoi@… 16 years ago.
output of running mythbackend (15244M) under gdb
gdb1.2.txt (18.2 KB) - added by Bill <level42@…> 16 years ago.
With SVN15419, case where I see the SQL errors in the log
gdb.2.txt (14.9 KB) - added by Bill <level42@…> 16 years ago.
Another case with SVN 15419


Change History (25)

Changed 16 years ago by rainecc@…

Attachment: gdb1.txt added

backtrace 1

Changed 16 years ago by rainecc@…

Attachment: log1.txt added

log 1

Changed 16 years ago by rainecc@…

Attachment: gdb2.txt added

backtrace 2

Changed 16 years ago by rainecc@…

Attachment: log2.txt added

log 2

comment:1 Changed 16 years ago by paulh

Does the problem go away if you disable the EIT scanner?

comment:2 in reply to:  1 Changed 16 years ago by rainecc@…

Replying to paulh:

> Does the problem go away if you disable the EIT scanner?

Yes. I've been running about 3 hours without EIT and everything is OK so far.

comment:3 Changed 16 years ago by paulh

It looks from the first bt that the crash is happening when the EIT scanner is preparing a DB query. Exactly what is causing that I have no idea. I think a log from the BE with a verbose parameter of 'general,important,eit,database' would be useful to see what exactly is happening around the time of the crash.
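A minimal sketch of capturing such a log, assuming the backend is started by hand in the foreground and that `-v` accepts the comma-separated categories quoted above. The function name and the `MYTHBACKEND` override variable are illustrative only and are not part of MythTV.

```shell
# Run the backend in the foreground with the verbose categories suggested
# in this comment, capturing stdout and stderr in a single log file.
# MYTHBACKEND is a hypothetical override for the binary path (assumption).
run_backend_verbose() {
    logfile="${1:-mythbackend-eit.log}"
    "${MYTHBACKEND:-mythbackend}" -v general,important,eit,database >"$logfile" 2>&1
}
```

Calling `run_backend_verbose /tmp/be-eit.log` and letting it run until the next crash should leave the EIT and database messages around the segfault in that file.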

The EIT scanner is more stuarta's and danielk's area of expertise; maybe they have a better idea of what might be causing this.

comment:4 Changed 16 years ago by rainecc@…

Thanks - will generate the log tomorrow. I had a brief trawl through the code and there seems to be nothing in the m_db global, so it is either causing a db kickstart or causing QtString violations, etc. Perhaps something is freeing it elsewhere. I'm totally unfamiliar with the code, so this may well be a red herring.

Is there any memory debugging that can be compiled in?

comment:5 Changed 16 years ago by level42@…

I'll throw something else in to check, just in case. Do you have anything that continually checks the backend status via HTML (like the "Backend Status" page in mythweb)? I had a script that would check for the backend PID, then query the backend status using the lynx text browser. I found that this query was causing the backend to crash several times a day (the script checked the status every minute or so). Once I removed the lynx query, my crashing stopped. I haven't had time to track down the source of the problem to be able to raise a ticket.

Thought I'd mention it, because I was convinced my problem was EIT related too, when in fact it was the lynx status query.
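For anyone checking whether they run something similar, a periodic status check of the kind described above might look roughly like the sketch below. This is a hypothetical reconstruction, not the commenter's actual script; the `STATUS_URL` default assumes the backend status page is served on port 6544 of the local machine.

```shell
# Hypothetical reconstruction of a once-a-minute backend status check.
# STATUS_URL is an assumption: the status page on the default port 6544.
STATUS_URL="${STATUS_URL:-http://localhost:6544/}"

check_backend() {
    # Step 1: is a mythbackend process running at all?
    if ! pidof mythbackend >/dev/null 2>&1; then
        echo "mythbackend not running"
        return 1
    fi
    # Step 2: fetch the status page as text. This is the kind of query
    # that, once removed, stopped the crashes in the report above.
    if ! lynx -dump "$STATUS_URL" >/dev/null 2>&1; then
        echo "status page query failed"
        return 2
    fi
    echo "ok"
}
```

Run from cron every minute, a function like this would hit the status page about 1440 times a day, so temporarily disabling any such job is a cheap way to rule it out.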

comment:6 Changed 16 years ago by rainecc@…

Turned eit back on and ran it for a few hours. Of the 20+ dumps, there are four distinct types. I have attached a bzip2 compressed tar file containing the backend startup logs, and logs and backtrace for each of these four types. All appear to have get_chan_id_from_db() in common.

This is happening on the dedicated master backend, but there is also a combined frontend/slave backend running. The master backend does not export any part of the filesystem, so the combined fe/be streams any data. It was not streaming during these tests, but the slave backend was connecting. Nothing else connects to this backend.

Happy to be pointed at anything to try!

Changed 16 years ago by rainecc@…

Attachment: run2.tar.bz2 added

output of second run

comment:7 Changed 16 years ago by rainecc@…

As of the most recent SVN, these same segfaults are happening even more frequently - sometimes more than 10/hour, sometimes an hour can pass without one. With EIT disabled, backend is 100% stable on 64-bit debian etch.

comment:8 Changed 16 years ago by Stuart Auchterlonie

Milestone: unknown → 0.21
Status: new → infoneeded_new

Could you add a backtrace from the latest SVN build?

See http://www.mythtv.org/docs/mythtv-HOWTO-22.html#ss22.2 for details.

Stuart
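As a rough illustration of gathering such a backtrace non-interactively, the sketch below runs the backend under gdb in batch mode. It is an assumption about a convenient workflow, not the procedure from the linked HOWTO; the function name and the `MYTHBACKEND` override variable are made up for this example.

```shell
# Run the backend under gdb in batch mode and, once it stops (for example
# on a segfault), dump a full backtrace of every thread to a file.
# MYTHBACKEND is a hypothetical override for the binary path (assumption).
capture_backtrace() {
    outfile="${1:-gdb.txt}"
    gdb -batch \
        -ex 'handle SIGPIPE nostop noprint' \
        -ex 'run' \
        -ex 'thread apply all bt full' \
        --args "${MYTHBACKEND:-mythbackend}" -v general,important,eit,database \
        >"$outfile" 2>&1
}
```

A backtrace is only useful if the binaries were built with debug symbols, so this is best paired with a debug build of the same SVN revision.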

comment:9 Changed 16 years ago by rainecc@…

glibc produces memory backtraces when this occurs. There appears to be some memory corruption going on. I have attached the glibc output as "mythbackend glibc bt.txt". I will run a full gdb bt later today.

Changed 16 years ago by rainecc@…

Attachment: mythbackend glibc bt.txt added

glibc backtrace

Changed 16 years ago by panachoi@…

Attachment: gdb.txt added

output of running mythbackend (15244M) under gdb

comment:10 Changed 16 years ago by panachoi@…

This looks to me like a similar crash, from a relatively recent SVN version. Attached the gdb output and a backtrace (in same file). The last bit of the myth.log file is reproduced below:

2008-01-05 09:55:40.295 Reschedule requested for id -1.
2008-01-05 09:55:40.443 DB Error (Power Priority):
Query was:

No error type from QSqlError?  Strange...
2008-01-05 09:55:40.444 DB Error (AddNotListed):
Query was:

No error type from QSqlError?  Strange...
2008-01-05 09:55:40.470 Scheduled 0 items in 0.2 = 0.15 match + 0.03 place
2008-01-05 09:55:40.470 Scheduled 0 items in 0.2 = 0.15 match + 0.03 place

comment:11 Changed 16 years ago by stuartm

Component: mythtv → eit
Owner: changed from Isaac Richards to Stuart Auchterlonie
Status: infoneeded_new → new

comment:12 Changed 16 years ago by stuartm

Summary: frequent backend crashes → frequent backend crashes using EIT

comment:13 Changed 16 years ago by Stuart Auchterlonie

Status: new → infoneeded_new

Does this still happen with the latest SVN? There have been a number of changes in both the areas that turn up in the backtrace.

Stuart

comment:14 Changed 16 years ago by Bill

Yes, still crashing, with similar SQL errors. I'll see if I can get a backtrace next time.

2008-01-13 22:53:21.537 DB Error (Looking up chanID):
Query was:

No error type from QSqlError? Strange...
QSqlQuery::exec: empty query
2008-01-13 22:53:21.539 DB Error (Looking up chanID):
Query was:

No error type from QSqlError? Strange...
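To see how often each kind of DB error appears in a backend log, something like the following sketch may help. It assumes log lines of the form shown above, with the context in parentheses after `DB Error`; the function name is illustrative.

```shell
# Count "DB Error (<context>):" lines per context in a backend log and
# print "<count><TAB><context>", most frequent first.
db_error_summary() {
    awk 'match($0, /DB Error \([^)]*\):/) {
             # Skip the 10-character prefix "DB Error (" and drop the
             # trailing "):" to leave only the context text.
             ctx = substr($0, RSTART + 10, RLENGTH - 12)
             count[ctx]++
         }
         END { for (c in count) printf "%d\t%s\n", count[c], c }' "$1" |
        sort -rn
}
```

For example, `db_error_summary myth.log` would show whether the `Looking up chanID` errors dominate or whether scheduler contexts such as `Power Priority` are equally affected.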

Changed 16 years ago by Bill <level42@…>

Attachment: gdb1.2.txt added

With SVN15419, case where I see the SQL errors in the log

Changed 16 years ago by Bill <level42@…>

Attachment: gdb.2.txt added

Another case with SVN 15419

comment:15 Changed 16 years ago by rainecc@…

Still having the same issue with SVN 15620 with EIT enabled on 64-bit.

comment:16 Changed 16 years ago by danielk

Resolution: fixed
Status: infoneeded_new → closed

There is something wrong with those latest segfaults. Try doing a 'make distclean' before running:

./configure --compile-type=debug ; make ; sudo make install

We'll need new backtraces + new ticket. It looks like the problem areas pinpointed in the earlier segfaults + bug report no longer exist.
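The clean debug rebuild described in this comment could be wrapped as a single function, sketched here under the assumption that it is run from the top of the MythTV source tree. The function name is illustrative; each step runs only if the previous one succeeded.

```shell
# Clean debug rebuild: wipe old build products first so that no stale
# object files from an earlier revision end up in the new binaries.
# Assumes the current directory is the top of the source tree.
rebuild_debug() {
    make distclean &&
        ./configure --compile-type=debug &&
        make &&
        sudo make install
}
```

Chaining with `&&` instead of `;` stops the sequence at the first failure, so a broken configure step cannot be followed by installing a stale build.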
