Opened 14 years ago
Closed 14 years ago
#9792 closed Bug Report - Hang/Deadlock (Fixed)
Deadlocked scheduler
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | 0.25 |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | no |
Description
Hi,
Today I have quite unpleasant system state (not seen ever before) where:
-at 9:25PM I discovered be missed 2 recordings (6:30PM and 7:00PM)
-be log since 5:35PM shows only auto-expire entries
-mythweb at 9:25PM in upcoming recordings page shows 7:00PM rec. as still ongoing (show starts at 7PM and should end 7:30PM)
-mythweb status shows all tuners idle
-schedulled recording at 9:30PM not started (mythweb upcoming recordings page shows 7:00PM rec. as still ongoing)
-I can start LiveTV and mythweb status shows correctly shows tuner
It looks like scheduler is deadlocked and no any new recordings can be started
BE backtrace attached.
I'm not sure is this problem related to 9773 - but symptoms are different as in this case for all FE backends seems to be fully functional.
Attachments (11)
Change History (31)
Changed 14 years ago by
Attachment: | trace3.txt added |
---|
comment:1 Changed 14 years ago by
Duplicate of [9773]. Again the problem is not returning from a DB query.
comment:2 Changed 14 years ago by
Daniel, Thx. It looks like I have to do something with DB - right ? Just for sanity checking - I'm attaching another trace with exactly the same symphtoms. It will be really great if You find minute ot two and look at it - just for confirmation....
Also may You advice me what will be best to do. Sorry bothering You with such question but You are person best knowing context of this issue.
Which option is best to choose:
1.backup, clean, restore tables
2.clean DB and go with virgin tables
3.keep tables and downgrade engine to older version
4.<maybe You have any great idea>
thx
comment:4 Changed 14 years ago by
Well,
Week ago I do backup,delete,create,restore DB. I'm not sure also move I to default mysql settings at that time or earlier. Anyway - since week I was on my.cfg from arch package).
Week of stressing (45-50 rec. per day; alot of TV zapping with LiveTV) - all was OK. Today I was ready to put comment on this ticket that issue resolved.
3h ago I compiled latest HEAD (g62d7084), install and also decide to tune DB by mysqltuner.pl script. Following changes are added into my.conf:
changed table_open_cache 64->192
added query_cache_size = 12M
added query_cache_type = 1
added join_buffer_size = 384k
added tmp_table_size = 24M
added max_heap_table_size = 24M
added thread_cache_size = 8
Tuning gives mysqltuner.pl output like attached file. (also output on default settings is attached).
After tuning DB tuning i.e. sched time was reduced from 83sec to 23sec. Refreshing rec. UI is really faster with tuned DB.
1h I was really sad to discover deadlock again. Trace attached.
Hmm - first what comes to mind is tuning impact. Is it possible that this is DB bug ?
comment:5 Changed 14 years ago by
If you look at "Thread 20 (Thread 0xaaba4b70 (LWP 1532)):" it still looks like it is deadlocked in the mysql libraries.
What Linux distro and release are you running? (I'm assuming you are up to date with fixes)
Also from the timing it sounds like running mysqltuner.pl was the immediate cause of the deadlock. This may be a false positive, but had you run that script on the previous database that was deadlocking?
comment:6 Changed 14 years ago by
Daniel,
I'm running on ArchLinux? (in fact I'm using Arch as base for my own distro). In context of fixes Arch is rolling release distro - so sys is usually more fresh than other distros. Average delay between upstream and arch packagers is week or two.
Last pkg sync between my distro and arch pkg base was at 20110513 - so I would say OS is latest greatest.
As I understand deadlock is in mysql libraries (so not in myth code). Is this mean issue is in OS ?. If so - shame on Arch (other than that - it is really great distro for appliances).
I started to drill in DB optimization and currently I'm experimenting with query_caching strategies.
Now I'm testing "Do not cache results in or retrieve results from the query cache". 24h so far OK
"Cache all cacheable query results except for those that begin with SELECT SQL_NO_CACHE"
and "Cache results only for cacheable queries that begin with SELECT SQL_CACHE" both gave deadlocks within hours.
In past I wasn't use mysqltuner.pl - but I had tuned DB settings from some posts in users-mailing list. I just checked - my tunings were touching caching strategy. It was set to "Cache all cacheable query results except for those that begin with SELECT SQL_NO_CACHE". If current tests with "Do not cache results in or retrieve results from the query cache" will finalize OK - it will probably mean that for some reason MySql? 5.1.11, 5.1.12 & 5.5.12 (only those I tested) can't use caching with MythTV on Archlinux....
Daniel - btw thx for Your time helping to deal with issue probably not related to myth !
comment:7 Changed 14 years ago by
Hi,
After week of heavy stress tests I'm happy to report that if I optimize mysql query caching with:
"Do not cache results in or retrieve results from the query cache"
system works stable. Test scenarios, results I will post on relevant to ticket9773 thread on dev-list as I think this ticket and 9773 have common root cause related to mysql running on ArchLinux?.
Situation where mysql query caching with:
"Cache all cacheable query results except for those that begin with SELECT SQL_NO_CACHE" and
"Cache results only for cacheable queries that begin with SELECT SQL_CACHE"
are giving deadlocks is still strange for me as issue is with 3 diferent mysql versions and 4 full system upgrades within quite long periods of time (so almost all key OS components are changed).
I think both tickets can be closed with status invalid.
Big thx for Your help !
comment:8 Changed 14 years ago by
There are mysql bug reports concerning query_cache_type and query_cache_size. Apparently, query_cache_size = 0 is the default but is not enough to turn query caching off in some versions of mysql.
In any case, this turns out to not work for me. I set query_cache_type = 0, and I still get lock ups.
I have a master BE and two slave BE's. This is fairly new for me. Until recently, I had a single BE (most often with single FE on the same machine). I did not see these backend lock ups when I only had one!
comment:9 Changed 14 years ago by
Ian,
Thx for this post. Your insights intrigued me so I do quick research on bugs.mysql and it seems that query_cache has still unresolved problems. Many ppl are suggesting turn-off this feature and what is more important - query_cache_size=0 is not enough to completely turn it off as query_cache_type is defaulting to 1 - and being non-zero still acquires internal DB spin-locks. It is highly recommended to set both (query_cache_size and query_cache_type) to 0. I do this and will observe system stability. Thx again and good luck !
comment:10 Changed 14 years ago by
Hi,
Since my last post into this ticket I do following:
-build fresh mysql 5.1.57 (most recent from 5.1 mainline) & rebuild all dependent pkg (qt, perl/python, etc)
-reinstall DB from full scratch (also with deleting DB dir). Also I disable query_cache as described in my previous post.
-restore myth BD via mythconverg_restore.pl
-turn off all other apps using mysql. Now there is only mythtv.
It was 3 weeks ago. Since that time I had 2 times situation when:
-myth is still recording via already defined recording rules.
-no any new rule can be added (mythweb, UI; I can add it normally but rule simply is not appearing in recording rules).
-after few hours of such state some events causes unblocking scheduler and all missed records are queued and triggered to record at once. This produces series of 0B recordings. I however still can't add new rules into recording rules list.
-be restart helps.
From usage point of view - scheduler is programmed by me & wife. I entered sometimes ago many custom advanced rules. My wife is using myth exactly like VCR - she is going into EPG and manually program their shows usually with "record once". She is doing this really quickly and usually 5-10 shows during programming session.
History of last deadlock: wife ended programing 3 shows approx at 11PM. Next myth finished recording at 11:05PM. Next be log for 6h entries are only about eutoexpiring. BE misses 6 recordings in this period. At 6:50AM there is DB related entry log. Exactly after this BE kicks all queued delayed recordings to record. All them are 0B.
I'm attaching BE debug log and trace from BE when it was in "no accepting new rules" state. There is NO any entry in any system log saying anything about mysql error, restart, whatever.
Hope log/trace helps. (I don't want to mention about v.hard time with WAF. This is bad luck. Lastly I spent considerable time bisecting FE issues but in reality wife sees her missed recordings. Oh...)
br
comment:11 Changed 14 years ago by
@ Ian - What version of mysql is being used and which distribution? I'd like to see if there are any common factors with other database related deadlocks.
comment:13 Changed 14 years ago by
Type: | Bug Report - General → Bug Report - Hang/Deadlock |
---|
comment:14 Changed 14 years ago by
Enable automatic reconnection for MySQL connections.
After 155138a127 changed the minimum supported Qt version to 4.6, we now have support for MYSQL_OPT_RECONNECT in QSqlDatabase::setConnectOptions().
Note that before MySQL 5, reconnecting connections were the default in the MySQL C API (so, were always enabled for the Qt-MySQL driver). Unfortunately, depending on how the application is written, reconnections may cause issues (since reconnections have effects on transactions, table locks, temporary tables, session and user variables, prepared statements, and such), so MySQL 5.0.3 changed the default to disable automatic reconnections, and MySQL 5.0.13 added MYSQL_OPT_RECONNECT to make reconnection an application-controllable option. Since the Qt-MySQL driver didn't have support for specifying MYSQL_OPT_RECONNECT, it was impossible to get reconnecting connections with MySQL 5.0.3+ in Qt applications. Finally in Qt4.6, Qt devs added support for specifying MYSQL_OPT_RECONNECT, so we can re-enable automatic reconnect.
Thanks to David Edwards for the reminder to set this option.
Branch: master Changeset: dd2d7a14d10c55f8bc8d12e944adb3b3c9ac8cea
comment:15 Changed 14 years ago by
Hi, I rebuild BE/FE to dd2d7a14. Unfortunately I had deadlock again. Trace, BE log, DB stats/proc_list attached
comment:16 Changed 14 years ago by
Fix SQL reconnection logic. Refs #9704. Refs #9773. Refs #9792.
What this does is get rid of the old KickDatabase?() logic and replaces it with a simple reconnect if the DB socket is disconnected, and also gets rid of a recent re-enablement of the automatic reconnects in [dd2d7a14d1] and importantly it extends the reconnection logic in [528324d8ba] so that it resends the prepare() and rebinds the bindings when a reconnect happens in an exec().
This also makes the QSqlQuery parent of MSqlQuery private and adds some wrappers so we can prevent MSqlQuery users from accessing methods that may invalidate some assumption in MSqlQuery + it adds debugging wrappers for first(), last(), previous(), and seek() like we already had for next().
A little bit of history is in order.
In Qt3 the SQL driver automatically reconnected whenever a connection was lost. But the first query after this auto-reconnect always failed. So in 2003 mdz added KickDatabase?() which would recover disconnect and reconnect an automatically all was good with the world.
In Qt4 the QSqlQuery::prepare() started to actually do prepared statements instead of emulating them. This caused a small problem in now it was possible that a disconnect happened between the call to prepare() and exec() and then the exec() would fail. But odds of this were still pretty low for most uses of the database in MythTV. But at some point trolltech also got rid of the automatic reconnections which caused us trouble in Qt3. To fix this we added reconnection logic in [528324d8ba] and this is where the real trouble began. While this reconnection logic really does exactly the same thing that the auto-reconnection logic did in Qt3, the fact that prepare() doesn't just emulate prepared statements in Qt4 means that it's now that when we reconnect to the database in exec() the prepared statement has not been sent to the database.
Note: This should be back-ported to 0.24-fixes after a reasonable burn-in time in master.
Branch: master Changeset: 4dfcdb8dd0c80d096ed95d5eccd489c692efa90d
comment:17 Changed 14 years ago by
Few bits missing in [4dfcdb8dd]. Refs #9704. Refs #9773. Refs #9792.
Branch: master Changeset: 1fb0a77fc17089b5cd2bf36ef07998bf68461261
comment:18 Changed 14 years ago by
Note: until all the db reconnection issues are resolved and this ticket is possibly fixed, I will only be referencing the #9704 on reconnection related commits.
comment:19 Changed 14 years ago by
Hi, After Daniel's MySql? reconnect commits I didn't observe this issue within 4 weeks on 24/7 working production system. From my perspective we can close this ticket with status resolved.
comment:20 Changed 14 years ago by
Milestone: | unknown → 0.25 |
---|---|
Resolution: | → Fixed |
Status: | new → closed |
BE trace