My current daytime setup is, for various reasons, a Windows XP installation with Ubuntu Jaunty running inside VirtualBox. I use Microsoft Windows for Outlook, SQL Navigator and some web browsing, and the Linux installation for development. This morning I started Firefox in Windows XP, changed focus to VirtualBox or some other window, and when I returned to Firefox it was frozen. I followed the standard Windows troubleshooting procedure: reboot and get a coffee. Once I was logged in again in both Windows and Ubuntu, I got the same issue with Firefox in Linux. WTF?
At least I have the tools in Ubuntu to debug this issue. What follows is a simplified version of what I did, in approximate order.
First, create ~/.gdbinit to make GDB a tad more user-friendly:
set pagination off
set radix 16
set print pretty
set history save on
Second, add ddebs.ubuntu.com to /etc/apt/sources.list:
deb http://ddebs.ubuntu.com/ jaunty main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-proposed main restricted universe multiverse
Install some debug symbols:
sudo apt-get install firefox-3.0-dbgsym libnspr4-0d-dbgsym xulrunner-1.9-dbgsym
Debugging time!
$ gdb `which firefox` `pidof firefox`
…
(gdb) thread apply all bt
…
Thread 2 (Thread 0xb08eab90 (LWP 4253)):
…
#9 0xb7e16c7f in getaddrinfo () from /lib/tls/i686/cmov/libc.so.6
#10 0xb7c8d739 in PR_GetAddrInfoByName (hostname=0xbc01ff4 "safebrowsing-cache.google.com", af=0x0, flags=0x8020) at prnetdb.c:2026
#11 0xb7267940 in nsHostResolver::ThreadFunc (arg=0x92d9fd8) at nsHostResolver.cpp:697
…
Thread 1 (Thread 0xb7d4b6d0 (LWP 4243)):
#0 0xb8003422 in __kernel_vsyscall ()
#1 0xb7fe30e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7c94ed9 in PR_WaitCondVar (cvar=0xcd1ebf8, timeout=0xffffffff) at ptsynch.c:405
#3 0xb7c94f57 in PR_Wait (mon=0xd47d178, timeout=0xffffffff) at ptsynch.c:584
#4 0xb726621b in nsDNSService::Resolve (this=0x92d4b00, hostname=@0xabaf730, flags=<value optimized out>, result=0xbff19ac0) at nsDNSService2.cpp:49
So, we have a thread that is resolving “safebrowsing-cache.google.com” and another thread waiting for this hostname to be resolved. Could this be an issue?
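Note the timeout=0xffffffff in frames #2 and #3: the main thread is waiting with no time limit, so if the lookup never returns, neither does the UI. To illustrate the pattern (this is not Firefox's code), here is a minimal Python sketch of a lookup done in a worker thread with a bounded wait instead. The function name resolve_with_timeout and the injectable resolver parameter are my own inventions for the example.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout_s, resolver=socket.getaddrinfo):
    """Resolve hostname in a worker thread; give up after timeout_s seconds.

    Returns the resolver's result, or None if the lookup did not finish in time.
    """
    result = {}
    done = threading.Event()

    def worker():
        try:
            result["value"] = resolver(hostname, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc
        finally:
            done.set()

    # Daemon thread: a lookup that never returns will not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    # Bounded wait: the caller gets control back even if the lookup hangs.
    if not done.wait(timeout_s):
        return None
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Calling resolve_with_timeout("safebrowsing-cache.google.com", 2.0) would hand control back after two seconds at the latest, whereas the wait in the backtrace above has no upper bound.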
Back at the command line, is there an issue with this domain name? Checking on my local computer:
$ host safebrowsing-cache.google.com
;; connection timed out; no servers could be reached
Trouble at Google? I must confirm that, so I log in to one of my servers and run the same command:
$ host safebrowsing-cache.google.com
;; Truncated, retrying in TCP mode.
safebrowsing-cache.google.com is an alias for safebrowsing.cache.l.google.com.
safebrowsing.cache.l.google.com has address 74.125.10.92
…
Works fine, but what does "Truncated, retrying in TCP mode" mean? I will investigate that later.
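For the record, as far as I understand it: a plain DNS reply over UDP is limited to 512 bytes, and when the answer does not fit, the server sets the TC (truncated) flag in the header, which tells the client to ask again over TCP. That retry is what host is reporting. Here is a minimal Python sketch of the flag check, assuming you have a raw DNS message as bytes. The function is_truncated and the two hand-built sample headers are made up for illustration; they are not captured traffic.

```python
import struct

TC_MASK = 0x0200  # "TrunCation" bit in the 16-bit flags field (RFC 1035, section 4.1.1)


def is_truncated(dns_message: bytes) -> bool:
    """Return True if the TC bit is set in the header of a raw DNS message."""
    if len(dns_message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # The header starts with a 16-bit ID followed by the 16-bit flags field.
    _msg_id, flags = struct.unpack("!HH", dns_message[:4])
    return bool(flags & TC_MASK)


# Hand-built example headers: ID, flags, then four zeroed section counts.
# 0x8380 = response + TC + recursion desired + recursion available
# 0x8180 = the same flags without TC
truncated_reply = struct.pack("!HHHHHH", 0x1234, 0x8380, 0, 0, 0, 0)
complete_reply = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0)
```

A resolver or middlebox that drops the TCP retry, or never honours the flag, would leave the client waiting, which would fit the timeout I saw locally.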
Apparently the company firewall is unable to resolve this domain name, at least for the time being. Google Safe Browsing is built into Firefox 3, so how do I disable it? I looked in about:config and yes, there was a setting called browser.safebrowsing.enabled set to true. I set it to false and… Firefox still froze. Looking at about:config again, I found browser.safebrowsing.malware.enabled and set that one to false as well. Now I am able to write this blog post!
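If you would rather not click through about:config on every machine, the same two settings can go into a user.js file in the Firefox profile directory, which Firefox applies at startup. A small Python sketch that renders the lines; the helper name render_user_js is hypothetical, and I am assuming you append the output to user.js yourself, since the profile path differs per installation.

```python
import json

# The two preferences from about:config that were set to false.
PREFS = {
    "browser.safebrowsing.enabled": False,
    "browser.safebrowsing.malware.enabled": False,
}


def render_user_js(prefs):
    """Render preferences as user_pref() lines for a profile's user.js."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields JavaScript-compatible literals: false, 42, "text"
        lines.append("user_pref({}, {});".format(json.dumps(name), json.dumps(value)))
    return "\n".join(lines) + "\n"
```

Printing render_user_js(PREFS) gives one user_pref(...) line per setting, ready to paste.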
Disabling these configuration options is only curing the symptoms, not the disease. But can I cure an enterprise DNS server that fails to handle truncated responses? I doubt it.