Opened 5 years ago
Closed 5 years ago
Last modified 5 years ago
#13519 closed Bug Report - Crash (fixed)
seg fault while channel scanning hdhomerun connect quatro
Reported by: | Mike Bibbings | Owned by: | Klaas de Waal |
---|---|---|---|
Priority: | minor | Milestone: | 31.0 |
Component: | MythTV - Mythtv-setup | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | no |
Description
When channel scanning HDHomerun Connect Quatro in mythtv-setup I sometimes get a seg fault. The point at which the seg fault happens varies. Note this problem occurs with various versions of mythtv (30, devel/2019-render and master)
mythtv-setup log, gdb backtrace and mythtv-version files are attached.
Attachments (14)
Change History (34)
Changed 5 years ago by
Attachment: | mythtv-setup.20191122150226.22047.log added |
---|
Changed 5 years ago by
Attachment: | mythtv-setup-version.txt added |
---|
comment:1 Changed 5 years ago by
comment:2 Changed 5 years ago by
HDHomeRun details Model: HDHR5-4DT, Device ID: 1251E907, Firmware: 20190621
comment:3 Changed 5 years ago by
Could you check the current master with valgrind for memory-related errors and generate also the chanscan log, like:
valgrind --track-origins=yes mythtv-setup -v general,channel,chanscan,record --logpath=./log
Sometimes valgrind is very useful for problems like this.
Changed 5 years ago by
Attachment: | valgrind.log added |
---|
comment:4 Changed 5 years ago by
Klaas,
From a quick look at the valgrind log I just attached, "Invalid read" stands out.
Mike
Changed 5 years ago by
Attachment: | mythtv-setup.20191123111959.10161.log added |
---|
mythtv-setup log to go with valgrind.log
comment:5 Changed 5 years ago by
Milestone: | needs_triage → 31.0 |
---|---|
Owner: | changed from Peter Bennett to Klaas de Waal |
Status: | new → assigned |
Changed 5 years ago by
Attachment: | valgrind-19.10_master.log added |
---|
comment:6 Changed 5 years ago by
Just added valgrind-19.10_master.log which shows the same problem on Xubuntu 19.10 with master built from source. Ubuntu 19.10 has the latest libhdhomerun and libhdhomerun-dev libraries, Ubuntu 18.04 has older versions.
Changed 5 years ago by
Attachment: | 20191125-hdhr-debug.patch added |
---|
HDHR debug output to file ./log/hdhr-dbg.log
comment:7 Changed 5 years ago by
Thanks for testing with the latest&greatest libhdhomerun. This does reduce the search space. I've attached a patch (for today's master) that does enable debug logging from the hdhomerun library. On my system it gives only a single line of output:
20191125-10:16:50 hdhomerun_device_selector_choose_test: device 1410F45C-0 chosen
but there should be more messages if something is wrong. Could you test this on the Ubuntu 19-10?
comment:8 Changed 5 years ago by
Hi Klaas,
Tested on ubuntu 19.10, lots of errors in hdhr-dbg.log before it eventually seg faults and of course the errors are in hdhomerun_control_recv_sock.
Attached are the hdhr-dbg.log and associated mythtv-setup log
Just another datapoint, I also retested using a totally isolated network (just to make sure nothing on my main network was affecting anything) using another router with different ethernet cables, same seg fault.
Mike
Changed 5 years ago by
Attachment: | hdhr-dbg.log added |
---|
Changed 5 years ago by
Attachment: | mythtv-setup.20191125111918.2723.log added |
---|
mythtv-setup log to go with hdhr-debug.log
comment:9 Changed 5 years ago by
Hi Mike,
Can you (temporarily) replace your libhdhomerun.so with the debug version I just attached and then run the "valgrind --track-origins=yes" again? I expect that valgrind then gives the function names and the line numbers of all frames in the traceback.
N.B. This binary is built from the latest sources from https://www.silicondust.com/support/linux/ (version 20190621) on Fedora-31 but with a bit of luck this might work on your Ubuntu-19.10 system. In the Makefile the CFLAGS setting "-O2" is replaced with "-g".
For the casual readers of this history:
- a way to identify the location of libhdhomerun is:
$ ldd $(which mythtv-setup) | grep libhdhomerun
Klaas.
Changed 5 years ago by
Attachment: | mythtv-setup.20191125201104.3699.log added |
---|
with debug libhdhomerun
comment:10 Changed 5 years ago by
Klaas, Added log files from using debug libhdhomerun.
One thing I did note in the mythtv-setup log was AddListener? being triggered at: 2019-11-25 20:13:02.646651 D [3699/3750] SignalMonitor? recorders/hdhrsignalmonitor.cpp:121 (UpdateValues?) - HDHRSigMon[1](1251E907): Tuner status: '0:0:0' 2019-11-25 20:13:02.659983 I [3699/3750] SignalMonitor? recorders/streamhandler.cpp:40 (AddListener?) - SH[1](1251E907): AddListener?(0x177f3058) -- begin
I don't think that is good as signal strength from Tuner status is 0 (threshold is set to 45 as minimum for a lock), this could be a side effect of sock_recv error or incorrect setting of isLocked.
Mike
comment:11 Changed 5 years ago by
Hi Klaas,
Done some more investigating, it appears that first error (and possibly others) in hdhr-dbg.log aligns with the switch in channel scan to the next offset of the frequency being tuned. So maybe some data is not being refreshed with the frequency change say from 482000000 (offset 0) to 482166670 (offset 1) and then 481833330 (offset 2). Frequency 482000000 and associated offsets should never lock (not broadcast by my local transmitter). I have run other types of channel scan e.g. Full Scan (Tuned)and there are no errors in the hdhr-dbg.log. Only Full Scan uses the offset frequencies.
Mike
comment:12 follow-up: 13 Changed 5 years ago by
Hi Mike,
About the offsets. To test that hypothesis you could do a "Full Scan" and select Netherlands. This has a list of DVB-T/T2 channels/frequencies that do not have offsets.
In the meantime, thanks to your logs, I have also come to some tentative conclusions.
- Device failure
It could be possible that your HDHR device is slowly breaking down. According to the mailing list the power supply of the HDHR devices does fail quite often after a few years and this might give incidental errors like the ones in the hdhr-dbg.log file.
- Thread-safety (or lack thereof)
The errors as reported by valgrind in file valgrind-19.10-master6.log show that the failing function, hdhomerun_sock_recv, is called from two different threads: Thread 37 Signal Monitor (see line 10876) and Thread 36 Scanner (see line 15607). Looking carefully at the valgrind messages following line 10876 it shows that the "invalid read" is done by the signal monitor; it is reading the value just free'd by the scanner.
- Why does this not happen more often
It is only when the communication fails that in libhdhomerun the communication is closed and then restarted. This can happen with a broken HDHR device or with anything else that disturbs the communication. When the network communications works correct then there are never retries that involve a close/open/read again. Only when it fails you can have the situation in which the "cs->sock" is free'd in one thread and used in another thread.
The valgrind errors are for me the proof that there is a thread-safety problem. Even if they are only triggered when doing offset scans, what your test suggest, it is still wrong. Therefore the next step is to make sure that the calls to the libhdhomerun functions from the hdhr signal monitor and the hdhr stream handler are serialized. One more mutex should do the job.
In the meantime, in the case that your problems are caused by the HDHR power supply please do not throw it away yet otherwise there is no way to test the fixes....
Klaas.
comment:13 Changed 5 years ago by
Replying to Klaas de Waal:
- Thread-safety (or lack thereof)
The errors as reported by valgrind in file valgrind-19.10-master6.log show that the failing function, hdhomerun_sock_recv, is called from two different threads: Thread 37 Signal Monitor (see line 10876) and Thread 36 Scanner (see line 15607). Looking carefully at the valgrind messages following line 10876 it shows that the "invalid read" is done by the signal monitor; it is reading the value just free'd by the scanner.
I have this recollection this (or something like it) was reported in a previous ticket(*), and SiliconDust? indicated that you needed to create a new object for each thread...
(*) Yes, found it! #9027 which was closed WONTFIX at the time. Note I have not actually spent time to determine if this is *actually* the same issue, so this might be an untamed fowl chase, but the symptoms, in the same library, certainly raise an eyebrow.
comment:14 Changed 5 years ago by
(re-posted, since it appears the email after the edit did not go out?)
I have this recollection this (or something like it) was reported in a previous ticket(*), and SiliconDust?? indicated that you needed to create a new object for each thread...
(*) Yes, found it! #9027 which was closed WONTFIX at the time. Note I have not actually spent time to determine if this is *actually* the same issue, so this might be an untamed fowl chase, but the symptoms, in the same library, certainly raise an eyebrow.
comment:15 Changed 5 years ago by
Hi Klaas,
My HDHomeRun Connect Quatro is new (September 2019), so power supply failure is not likely.
I ran 10 channel scans using Netherlands with no seg faults and no errors in hdhr-dbg.log.
One other thing I noticed in the code is that in HDHRStreamHandler::GetTunerStatus? does not return status of the call hdhomerun_device_get_tuner_status(m_hdhomerun_device, nullptr, status), this call does have the usual result codes for Get operations i.e.
- Returns 1 if the operation was successful.
- Returns 0 if the operation was rejected.
- Returns -1 if a communication error occurred.
so HDHRSignalMonitor::UpdateValues?(void) could be using invalid data.
Mike
comment:16 Changed 5 years ago by
Another datapoint.
I also ran 10 channel scans using Netherlands on Raspberry Pi4 Raspbian Buster which has older libhdhomerun4: Installed: 20180817-2 with mythbackend v31-Pre-1301-g9dc72f3b81 (devel/2019-render branch). No seg faults.
Mike
Changed 5 years ago by
Attachment: | 20191129-hdhr-fix-debug.patch added |
---|
HDHR fix and debug output to file ./log/hdhr-dbg.log
comment:17 Changed 5 years ago by
Mike,
I've just added a patch that can be applied to today's master. This should fix the segfault that you have with scanning and it should also fix the crash in the backend as reported in ticket #9027. The HDHR debug output is also still present. Can you please test this with the United Kingdom frequency list?
As it happens the mutex to lock the HDHR tuner instance was already present and it was already used in the TunerGet? and TunerSet? functions, but not in UpdateValues?.
About why it happens on the UK scans
The UK scans have offsets defined; if there is no lock on the exact frequency there is a try on +160kHz and on -160kHz. Log output shows that for the offset tuning the signal monitor is not stopped. For a non-offset scan, such as with the Netherlands channel list,the signal monitor is started and stopped for each frequency.
About the function value of GetTunerStatus?
You are correct about the function value of hdhomerun_device_get_tuner_status. However, this value is ignored. The status values are returned in a struct hdhomerun_tuner_status_t and this is, as I can see it, done correct.
About ticket #9027.
Gary, thanks for digging this up. I was not aware of this one. The gdb log shows that it is indeed a similar problem; here the call to HDHRStreamHandler::UpdateValues? from the HDHR signal monitor clashes with the HDHRStreamHandler::run. This problem is still present in today's code but should now also be fixed.
comment:18 Changed 5 years ago by
Hi Klaas,
Applied the patch 20191129-hdhr-fix-debug.patch, no more seg faults or errors in hdhr-dbg.log
Tested on Xubuntu 19.10 mythtv master with debug libhdhomerun in place. Also tested with older libhdhomerun, Raspbian Buster on Pi4 devel/2019-render branch and Xubuntu 18.04 master branch, again no seg faults after 10 channel scans on each system.
Mike
comment:19 Changed 5 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Just add, the seg fault also happens on Rasbian Buster with official mythtv-light 30 or built from source mythtv 31 or devel/2019-render on Pi2/3 or 4.