Opened 2 weeks ago

Closed 3 days ago

Last modified 3 days ago

#13519 closed Bug Report - Crash (fixed)

seg fault while channel scanning hdhomerun connect quatro

Reported by: Mike Bibbings Owned by: Klaas de Waal
Priority: minor Milestone: 31.0
Component: MythTV - Mythtv-setup Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

When channel scanning HDHomerun Connect Quatro in mythtv-setup I sometimes get a seg fault. The point at which the seg fault happens varies. Note this problem occurs with various versions of mythtv (30, devel/2019-render and master)

mythtv-setup log, gdb backtrace and mythtv-version files are attached.

Attachments (14)

mythtv-setup.20191122150226.22047.log (1.3 MB) - added by Mike Bibbings 2 weeks ago.
gdb.txt (63.2 KB) - added by Mike Bibbings 2 weeks ago.
mythtv-setup-version.txt (1.1 KB) - added by Mike Bibbings 2 weeks ago.
valgrind.log (425.5 KB) - added by Mike Bibbings 2 weeks ago.
mythtv-setup.20191123111959.10161.log (295.7 KB) - added by Mike Bibbings 2 weeks ago.
mythtv-setup log to go with valgrind.log
valgrind-19.10_master.log (1.6 MB) - added by Mike Bibbings 13 days ago.
20191125-hdhr-debug.patch (1.9 KB) - added by Klaas de Waal 12 days ago.
HDHR debug output to file ./log/hdhr-dbg.log
hdhr-dbg.log (3.0 KB) - added by Mike Bibbings 12 days ago.
mythtv-setup.20191125111918.2723.log (1.8 MB) - added by Mike Bibbings 12 days ago.
mythtv-setup log to go with hdhr-debug.log
libhdhomerun.so (194.3 KB) - added by Klaas de Waal 11 days ago.
debug build for libhdhomerun 20190621
hdhr-dbg.2.log (973 bytes) - added by Mike Bibbings 11 days ago.
with debug libhdhomerun
mythtv-setup.20191125201104.3699.log (990.9 KB) - added by Mike Bibbings 11 days ago.
with debug libhdhomerun
valgrind-19.10_master6.log (1.2 MB) - added by Mike Bibbings 11 days ago.
with debug libhdhomerun
20191129-hdhr-fix-debug.patch (3.3 KB) - added by Klaas de Waal 8 days ago.
HDHR fix and debug output to file ./log/hdhr-dbg.log

Change History (34)

Changed 2 weeks ago by Mike Bibbings

Changed 2 weeks ago by Mike Bibbings

Attachment: gdb.txt added

Changed 2 weeks ago by Mike Bibbings

Attachment: mythtv-setup-version.txt added

comment:1 Changed 2 weeks ago by Mike Bibbings

Just add, the seg fault also happens on Rasbian Buster with official mythtv-light 30 or built from source mythtv 31 or devel/2019-render on Pi2/3 or 4.

comment:2 Changed 2 weeks ago by Mike Bibbings

HDHomeRun details Model: HDHR5-4DT, Device ID: 1251E907, Firmware: 20190621

comment:3 Changed 2 weeks ago by Klaas de Waal

Could you check the current master with valgrind for memory-related errors and generate also the chanscan log, like:

valgrind --track-origins=yes mythtv-setup -v general,channel,chanscan,record --logpath=./log

Sometimes valgrind is very useful for problems like this.

Changed 2 weeks ago by Mike Bibbings

Attachment: valgrind.log added

comment:4 Changed 2 weeks ago by Mike Bibbings

Klaas,

From a quick look at the valgrind log I just attached, "Invalid read" stands out.

Mike

Changed 2 weeks ago by Mike Bibbings

mythtv-setup log to go with valgrind.log

comment:5 Changed 2 weeks ago by Klaas de Waal

Milestone: needs_triage31.0
Owner: changed from Peter Bennett to Klaas de Waal
Status: newassigned

Changed 13 days ago by Mike Bibbings

Attachment: valgrind-19.10_master.log added

comment:6 Changed 13 days ago by Mike Bibbings

Just added valgrind-19.10_master.log which shows the same problem on Xubuntu 19.10 with master built from source. Ubuntu 19.10 has the latest libhdhomerun and libhdhomerun-dev libraries, Ubuntu 18.04 has older versions.

Changed 12 days ago by Klaas de Waal

Attachment: 20191125-hdhr-debug.patch added

HDHR debug output to file ./log/hdhr-dbg.log

comment:7 Changed 12 days ago by Klaas de Waal

Thanks for testing with the latest&greatest libhdhomerun. This does reduce the search space. I've attached a patch (for today's master) that does enable debug logging from the hdhomerun library. On my system it gives only a single line of output:

20191125-10:16:50 hdhomerun_device_selector_choose_test: device 1410F45C-0 chosen

but there should be more messages if something is wrong. Could you test this on the Ubuntu 19-10?

comment:8 Changed 12 days ago by Mike Bibbings

Hi Klaas,

Tested on ubuntu 19.10, lots of errors in hdhr-dbg.log before it eventually seg faults and of course the errors are in hdhomerun_control_recv_sock.

Attached are the hdhr-dbg.log and associated mythtv-setup log

Just another datapoint, I also retested using a totally isolated network (just to make sure nothing on my main network was affecting anything) using another router with different ethernet cables, same seg fault.

Mike

Last edited 12 days ago by Mike Bibbings (previous) (diff)

Changed 12 days ago by Mike Bibbings

Attachment: hdhr-dbg.log added

Changed 12 days ago by Mike Bibbings

mythtv-setup log to go with hdhr-debug.log

Changed 11 days ago by Klaas de Waal

Attachment: libhdhomerun.so added

debug build for libhdhomerun 20190621

comment:9 Changed 11 days ago by Klaas de Waal

Hi Mike,

Can you (temporarily) replace your libhdhomerun.so with the debug version I just attached and then run the "valgrind --track-origins=yes" again? I expect that valgrind then gives the function names and the line numbers of all frames in the traceback.

N.B. This binary is built from the latest sources from https://www.silicondust.com/support/linux/ (version 20190621) on Fedora-31 but with a bit of luck this might work on your Ubuntu-19.10 system. In the Makefile the CFLAGS setting "-O2" is replaced with "-g".

For the casual readers of this history:

  • a way to identify the location of libhdhomerun is:
$ ldd $(which mythtv-setup) | grep libhdhomerun

Klaas.

Changed 11 days ago by Mike Bibbings

Attachment: hdhr-dbg.2.log added

with debug libhdhomerun

Changed 11 days ago by Mike Bibbings

with debug libhdhomerun

Changed 11 days ago by Mike Bibbings

Attachment: valgrind-19.10_master6.log added

with debug libhdhomerun

comment:10 Changed 11 days ago by Mike Bibbings

Klaas, Added log files from using debug libhdhomerun.

One thing I did note in the mythtv-setup log was AddListener? being triggered at: 2019-11-25 20:13:02.646651 D [3699/3750] SignalMonitor? recorders/hdhrsignalmonitor.cpp:121 (UpdateValues?) - HDHRSigMon[1](1251E907): Tuner status: '0:0:0' 2019-11-25 20:13:02.659983 I [3699/3750] SignalMonitor? recorders/streamhandler.cpp:40 (AddListener?) - SH[1](1251E907): AddListener?(0x177f3058) -- begin

I don't think that is good as signal strength from Tuner status is 0 (threshold is set to 45 as minimum for a lock), this could be a side effect of sock_recv error or incorrect setting of isLocked.

Mike

Last edited 11 days ago by Mike Bibbings (previous) (diff)

comment:11 Changed 9 days ago by Mike Bibbings

Hi Klaas,

Done some more investigating, it appears that first error (and possibly others) in hdhr-dbg.log aligns with the switch in channel scan to the next offset of the frequency being tuned. So maybe some data is not being refreshed with the frequency change say from 482000000 (offset 0) to 482166670 (offset 1) and then 481833330 (offset 2). Frequency 482000000 and associated offsets should never lock (not broadcast by my local transmitter). I have run other types of channel scan e.g. Full Scan (Tuned)and there are no errors in the hdhr-dbg.log. Only Full Scan uses the offset frequencies.

Mike

comment:12 Changed 9 days ago by Klaas de Waal

Hi Mike,

About the offsets. To test that hypothesis you could do a "Full Scan" and select Netherlands. This has a list of DVB-T/T2 channels/frequencies that do not have offsets.

In the meantime, thanks to your logs, I have also come to some tentative conclusions.

  1. Device failure

It could be possible that your HDHR device is slowly breaking down. According to the mailing list the power supply of the HDHR devices does fail quite often after a few years and this might give incidental errors like the ones in the hdhr-dbg.log file.

  1. Thread-safety (or lack thereof)

The errors as reported by valgrind in file valgrind-19.10-master6.log show that the failing function, hdhomerun_sock_recv, is called from two different threads: Thread 37 Signal Monitor (see line 10876) and Thread 36 Scanner (see line 15607). Looking carefully at the valgrind messages following line 10876 it shows that the "invalid read" is done by the signal monitor; it is reading the value just free'd by the scanner.

  1. Why does this not happen more often

It is only when the communication fails that in libhdhomerun the communication is closed and then restarted. This can happen with a broken HDHR device or with anything else that disturbs the communication. When the network communications works correct then there are never retries that involve a close/open/read again. Only when it fails you can have the situation in which the "cs->sock" is free'd in one thread and used in another thread.

The valgrind errors are for me the proof that there is a thread-safety problem. Even if they are only triggered when doing offset scans, what your test suggest, it is still wrong. Therefore the next step is to make sure that the calls to the libhdhomerun functions from the hdhr signal monitor and the hdhr stream handler are serialized. One more mutex should do the job.

In the meantime, in the case that your problems are caused by the HDHR power supply please do not throw it away yet otherwise there is no way to test the fixes....

Klaas.

comment:13 in reply to:  12 Changed 9 days ago by Gary Buhrmaster

Replying to Klaas de Waal:

  1. Thread-safety (or lack thereof)

The errors as reported by valgrind in file valgrind-19.10-master6.log show that the failing function, hdhomerun_sock_recv, is called from two different threads: Thread 37 Signal Monitor (see line 10876) and Thread 36 Scanner (see line 15607). Looking carefully at the valgrind messages following line 10876 it shows that the "invalid read" is done by the signal monitor; it is reading the value just free'd by the scanner.

I have this recollection this (or something like it) was reported in a previous ticket(*), and SiliconDust? indicated that you needed to create a new object for each thread...

(*) Yes, found it! #9027 which was closed WONTFIX at the time. Note I have not actually spent time to determine if this is *actually* the same issue, so this might be an untamed fowl chase, but the symptoms, in the same library, certainly raise an eyebrow.

Last edited 9 days ago by Gary Buhrmaster (previous) (diff)

comment:14 Changed 9 days ago by Gary Buhrmaster

(re-posted, since it appears the email after the edit did not go out?)

I have this recollection this (or something like it) was reported in a previous ticket(*), and SiliconDust?? indicated that you needed to create a new object for each thread...

(*) Yes, found it! #9027 which was closed WONTFIX at the time. Note I have not actually spent time to determine if this is *actually* the same issue, so this might be an untamed fowl chase, but the symptoms, in the same library, certainly raise an eyebrow.

comment:15 Changed 9 days ago by Mike Bibbings

Hi Klaas,

My HDHomeRun Connect Quatro is new (September 2019), so power supply failure is not likely.

I ran 10 channel scans using Netherlands with no seg faults and no errors in hdhr-dbg.log.

One other thing I noticed in the code is that in HDHRStreamHandler::GetTunerStatus? does not return status of the call hdhomerun_device_get_tuner_status(m_hdhomerun_device, nullptr, status), this call does have the usual result codes for Get operations i.e.

  • Returns 1 if the operation was successful.
  • Returns 0 if the operation was rejected.
  • Returns -1 if a communication error occurred.

so HDHRSignalMonitor::UpdateValues?(void) could be using invalid data.

Mike

comment:16 Changed 9 days ago by Mike Bibbings

Another datapoint.

I also ran 10 channel scans using Netherlands on Raspberry Pi4 Raspbian Buster which has older libhdhomerun4: Installed: 20180817-2 with mythbackend v31-Pre-1301-g9dc72f3b81 (devel/2019-render branch). No seg faults.

Mike

Changed 8 days ago by Klaas de Waal

HDHR fix and debug output to file ./log/hdhr-dbg.log

comment:17 Changed 8 days ago by Klaas de Waal

Mike,

I've just added a patch that can be applied to today's master. This should fix the segfault that you have with scanning and it should also fix the crash in the backend as reported in ticket #9027. The HDHR debug output is also still present. Can you please test this with the United Kingdom frequency list?

As it happens the mutex to lock the HDHR tuner instance was already present and it was already used in the TunerGet? and TunerSet? functions, but not in UpdateValues?.

About why it happens on the UK scans

The UK scans have offsets defined; if there is no lock on the exact frequency there is a try on +160kHz and on -160kHz. Log output shows that for the offset tuning the signal monitor is not stopped. For a non-offset scan, such as with the Netherlands channel list,the signal monitor is started and stopped for each frequency.

About the function value of GetTunerStatus?

You are correct about the function value of hdhomerun_device_get_tuner_status. However, this value is ignored. The status values are returned in a struct hdhomerun_tuner_status_t and this is, as I can see it, done correct.

About ticket #9027.

Gary, thanks for digging this up. I was not aware of this one. The gdb log shows that it is indeed a similar problem; here the call to HDHRStreamHandler::UpdateValues? from the HDHR signal monitor clashes with the HDHRStreamHandler::run. This problem is still present in today's code but should now also be fixed.

comment:18 Changed 8 days ago by Mike Bibbings

Hi Klaas,

Applied the patch 20191129-hdhr-fix-debug.patch, no more seg faults or errors in hdhr-dbg.log

Tested on Xubuntu 19.10 mythtv master with debug libhdhomerun in place. Also tested with older libhdhomerun, Raspbian Buster on Pi4 devel/2019-render branch and Xubuntu 18.04 master branch, again no seg faults after 10 channel scans on each system.

Mike

comment:19 Changed 3 days ago by Klaas de Waal <kdewaal@…>

Resolution: fixed
Status: assignedclosed

In ec2a518f20/mythtv:

Extend instance locking in HDHomeRun stream handler

Extend the instance locking already present in HDHRStreamHandler
functions TunerGet? and TunerSet? to GetTunerStatus?.
This fixes a segfault in libhdhomerun, reported in ticket #13519,
that appeared when doing a UK DVB-T/T2 scan with offset frequencies.
The segfault happened because GetTunerStatus? was called from the
HDHRSignalMonitor in a different thread than the TunerGet/TunerSet? functions.
Extend the instance locking also to the HDHRStreamHandler::run function
where the libhdhomerun functions are called. This should fix the segfault
in the mythbackend startup as reported in ticket #9027.

Refs #9027

Fixes #13519

comment:20 Changed 3 days ago by Klaas de Waal <kdewaal@…>

In a27754ae7/mythtv:

Extend instance locking in HDHomeRun stream handler

Extend the instance locking already present in HDHRStreamHandler
functions TunerGet? and TunerSet? to GetTunerStatus?.
This fixes a segfault in libhdhomerun, reported in ticket #13519,
that appeared when doing a UK DVB-T/T2 scan with offset frequencies.
The segfault happened because GetTunerStatus? was called from the
HDHRSignalMonitor in a different thread than the TunerGet/TunerSet? functions.
Extend the instance locking also to the HDHRStreamHandler::run function
where the libhdhomerun functions are called. This should fix the segfault
in the mythbackend startup as reported in ticket #9027.

Refs #9027

Fixes #13519

(cherry picked from commit ec2a518f20385b06f0fe730fa6f567409f34d50e)
Signed-off-by: Klaas de Waal <kdewaal@…>

Note: See TracTickets for help on using tickets.