My current daytime setup is, for various reasons, a Windows XP installation with Ubuntu Jaunty running inside VirtualBox. I use Windows for Outlook, SQL Navigator and some web browsing, and the Linux installation for development. This morning I started Firefox in Windows XP, switched focus to VirtualBox or some other window, and when I returned to Firefox it was frozen. I followed the standard Windows troubleshooting procedure: reboot and get a coffee. Once I was logged in again in both Windows and Ubuntu, I got the same issue with Firefox in Linux. WTF?
At least I have the tools in Ubuntu to debug this issue. This is a simplified version, in approximate order, of what I did.
First, create ~/.gdbinit to make GDB a tad more user-friendly:
set pagination off
set radix 16
set print pretty
set history save on
Second, add ddebs.ubuntu.com to /etc/apt/sources.list:
deb http://ddebs.ubuntu.com/ jaunty main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ jaunty-proposed main restricted universe multiverse
Install some debug symbols:
sudo apt-get install firefox-3.0-dbgsym libnspr4-0d-dbgsym xulrunner-1.9-dbgsym
Debugging time!
$ gdb `which firefox` `pidof firefox`
…
(gdb) thread apply all bt
…
Thread 2 (Thread 0xb08eab90 (LWP 4253)):
…
#9 0xb7e16c7f in getaddrinfo () from /lib/tls/i686/cmov/libc.so.6
#10 0xb7c8d739 in PR_GetAddrInfoByName (hostname=0xbc01ff4 "safebrowsing-cache.google.com", af=0x0, flags=0x8020) at prnetdb.c:2026
#11 0xb7267940 in nsHostResolver::ThreadFunc (arg=0x92d9fd8) at nsHostResolver.cpp:697
…
Thread 1 (Thread 0xb7d4b6d0 (LWP 4243)):
#0 0xb8003422 in __kernel_vsyscall ()
#1 0xb7fe30e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7c94ed9 in PR_WaitCondVar (cvar=0xcd1ebf8, timeout=0xffffffff) at ptsynch.c:405
#3 0xb7c94f57 in PR_Wait (mon=0xd47d178, timeout=0xffffffff) at ptsynch.c:584
#4 0xb726621b in nsDNSService::Resolve (this=0x92d4b00, hostname=@0xabaf730, flags=<value optimized out>, result=0xbff19ac0) at nsDNSService2.cpp:49
So, we have a thread that is resolving “safebrowsing-cache.google.com” and another thread waiting for this hostname to be resolved. Could this be an issue?
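The backtrace shows the waiting thread in PR_Wait with timeout=0xffffffff, which looks like a wait without a deadline. As a rough illustration of the alternative (this is not how Firefox is implemented, and the function name resolve_with_timeout and its resolver parameter are made up for this sketch), a caller can hand the lookup to a worker thread and stop waiting after a fixed time:

```python
import queue
import socket
import threading


def resolve_with_timeout(hostname, timeout, resolver=socket.getaddrinfo):
    """Resolve hostname on a worker thread; give up after `timeout` seconds.

    Returns the resolver's result, or None if it did not finish in time.
    `resolver` is injectable so the behaviour can be tried without a network.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", resolver(hostname, None)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.put(("error", exc))

    # Daemon thread: a lookup that never returns cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, value = results.get(timeout=timeout)
    except queue.Empty:
        return None  # the caller stays responsive instead of waiting forever
    if status == "error":
        raise value
    return value
```

Calling resolve_with_timeout("safebrowsing-cache.google.com", 5.0) would then return None after five seconds on a network where the lookup hangs, instead of blocking the calling thread indefinitely. The stuck lookup itself still runs in the background; the sketch only bounds how long the caller waits.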
Back at the command line, is there an issue with this domain name? Checking on my local computer:
$ host safebrowsing-cache.google.com
;; connection timed out; no servers could be reached
Trouble at Google? I must confirm that, so I log in to one of my servers and run the same command:
$ host safebrowsing-cache.google.com
;; Truncated, retrying in TCP mode.
safebrowsing-cache.google.com is an alias for safebrowsing.cache.l.google.com.
safebrowsing.cache.l.google.com has address 74.125.10.92
…
Works fine, but what does “Truncated, retrying in TCP mode” mean? I will investigate that later.
Apparently the company firewall is unable to resolve this domain name, at least for the time being. Google Safe Browsing is built into Firefox 3, so how do I disable it? I looked in about:config and yes, there was a setting called browser.safebrowsing.enabled set to true. I set it to false and… Firefox still froze. Looking at about:config again, I found browser.safebrowsing.malware.enabled and set that one to false as well. Now I am able to write this blog post!
Disabling these configuration options is only curing the symptoms, not the disease. But can I cure an enterprise DNS server that fails to handle truncated responses? I doubt it.
I saw this problem too, very annoying. Trying to get to the bottom of the ‘truncated’ problem: all ways to look up that domain seem to fail on my Linux box. I suppose the server I use ignores TCP on port 53. In Wireshark, the packet returned over UDP seems to be OK. Maybe it’s a bug that the packet is flagged as truncated when it’s really OK.
I found this link with a relevant extract from RFC1123:
http://mailman.powerdns.com/pipermail/pdns-users/2003-October/000783.html
The DNS response for safebrowsing-cache.google.com returns 24 address records, but it seems like 15 is the limit to avoid truncation.
Obviously not all DNS servers handle truncation properly, or they suffer from firewalls that block TCP access to port 53.
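For anyone wondering what “truncated” means on the wire: a DNS server that cannot fit its answer into the UDP reply sets the TC bit in the message header, and the client is expected to repeat the query over TCP. Here is a minimal sketch of how a client might detect that. The helper names are my own, and the two sample headers are hand-built for illustration, not captured traffic:

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def parse_header(message):
    """Unpack the fixed 12-byte DNS header into a dict."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "truncated": bool(flags & TC_FLAG),
        "counts": (qd, an, ns, ar),  # question, answer, authority, additional
    }


def needs_tcp_retry(udp_response):
    """True when the server set TC, i.e. the client should ask again over TCP."""
    return parse_header(udp_response)["truncated"]


# Hand-built headers for illustration only:
# 0x8380 = response with TC, RD and RA set; 0x8180 = the same without TC.
truncated = struct.pack("!6H", 40815, 0x8380, 1, 25, 2, 0)
complete = struct.pack("!6H", 55747, 0x8180, 1, 25, 0, 0)
print(needs_tcp_retry(truncated), needs_tcp_retry(complete))  # True False
```

If the TCP retry is then dropped by a firewall, or ignored by the server, the client is left with nothing usable, which would match the timeouts described above.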
Add “minimal-responses yes;” in your bind9 configuration or ask your ISP to do so.
/etc/bind/named.conf.options
options {
// …
// only add records to the authority and additional data sections when required
minimal-responses yes;
};
With this setting, the response for Google’s safebrowsing-cache.google.com fits in a standard UDP DNS packet. Without it, the extra sections push the response past the UDP limit, so it is truncated and the lookup falls back to TCP.
Compare the result with and without minimal-responses for:
dig safebrowsing-cache.google.com
With minimal-responses no (default on Bind9)
IP (tos 0x0, ttl 64, id 40627, offset 0, flags [none], proto UDP (17), length 75) 127.0.0.1.49553 > 127.0.0.1.53: [bad udp cksum 6429!] 40815+ A? safebrowsing-cache.google.com. (47)
IP (tos 0x0, ttl 64, id 40628, offset 0, flags [none], proto UDP (17), length 526) 127.0.0.1.53 > 127.0.0.1.49553: 40815| q: A? safebrowsing-cache.google.com. 25/2/0 safebrowsing-cache.google.com.[|domain]
IP (tos 0x0, ttl 64, id 4337, offset 0, flags [DF], proto TCP (6), length 60) 127.0.0.1.57552 > 127.0.0.1.53: S, cksum 0x30e4 (correct), 272739230:272739230(0) win 32792
IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60) 127.0.0.1.53 > 127.0.0.1.57552: S, cksum 0x6453 (correct), 281541131:281541131(0) ack 272739231 win 32768
IP (tos 0x0, ttl 64, id 4338, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.57552 > 127.0.0.1.53: ., cksum 0x4b76 (correct), 1:1(0) ack 1 win 513
IP (tos 0x0, ttl 64, id 4339, offset 0, flags [DF], proto TCP (6), length 101) 127.0.0.1.57552 > 127.0.0.1.53: P 1:50(49) ack 1 win 513 5198+[|domain]
IP (tos 0x0, ttl 64, id 16739, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.53 > 127.0.0.1.57552: ., cksum 0x4b46 (correct), 1:1(0) ack 50 win 512
14:44:32.883449 IP (tos 0x0, ttl 64, id 16740, offset 0, flags [DF], proto TCP (6), length 632) 127.0.0.1.53 > 127.0.0.1.57552: P 1:581(580) ack 50 win 512 5198 q:[|domain]
IP (tos 0x0, ttl 64, id 4340, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.57552 > 127.0.0.1.53: ., cksum 0x48ef (correct), 50:50(0) ack 581 win 531
IP (tos 0x0, ttl 64, id 4341, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.57552 > 127.0.0.1.53: F, cksum 0x48ee (correct), 50:50(0) ack 581 win 531
IP (tos 0x0, ttl 64, id 16741, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.53 > 127.0.0.1.57552: F, cksum 0x4900 (correct), 581:581(0) ack 51 win 512
IP (tos 0x0, ttl 64, id 4342, offset 0, flags [DF], proto TCP (6), length 52) 127.0.0.1.57552 > 127.0.0.1.53: ., cksum 0x48ed (correct), 51:51(0) ack 582 win 531
With minimal-responses yes
IP (tos 0x0, ttl 64, id 40623, offset 0, flags [none], proto UDP (17), length 75) 127.0.0.1.40215 > 127.0.0.1.53: [bad udp cksum 8a13!] 55747+ A? safebrowsing-cache.google.com. (47)
IP (tos 0x0, ttl 64, id 40624, offset 0, flags [none], proto UDP (17), length 494) 127.0.0.1.53 > 127.0.0.1.40215: 55747 q: A? safebrowsing-cache.google.com. 25/0/0 safebrowsing-cache.google.com.[|domain]
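The packet lengths in these captures can be checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not a DNS encoder. It assumes that every record's owner name is a 2-byte compression pointer, and that the CNAME target reuses the "google.com" suffix through another pointer. The helper names are made up for the example:

```python
HEADER = 12          # fixed DNS header
QUESTION_FIXED = 4   # QTYPE + QCLASS
RR_FIXED = 10        # TYPE + CLASS + TTL + RDLENGTH
POINTER = 2          # compression pointer to a name already in the packet
UDP_LIMIT = 512      # classic DNS-over-UDP message limit (no EDNS0)


def encoded_name_len(name):
    """Uncompressed DNS name: one length byte per label, plus the root byte."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1


def query_size(name):
    return HEADER + encoded_name_len(name) + QUESTION_FIXED


def a_record_size():
    # Owner name as a pointer (assumption), fixed fields, 4-byte IPv4 address.
    return POINTER + RR_FIXED + 4


def cname_record_size(new_labels):
    # Target = labels not yet in the packet + pointer to the shared suffix.
    rdata = sum(1 + len(label) for label in new_labels) + POINTER
    return POINTER + RR_FIXED + rdata


name = "safebrowsing-cache.google.com"
answer_only = (query_size(name)
               + cname_record_size(["safebrowsing", "cache", "l"])
               + 24 * a_record_size())
print(query_size(name))         # 47
print(answer_only)              # 466
print(UDP_LIMIT - answer_only)  # 46
```

Both numbers line up with the capture: the query is shown as (47), and the reply's IP length of 494 minus 20 bytes of IP header and 8 bytes of UDP header leaves 466. That is only 46 bytes short of the 512-byte limit, so it takes very little authority or additional data to tip the response into truncation.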
Best Regards,
Guy Baconniere
Thanks a lot Guy, I’ll see if I can get the network people to set the “minimal-responses yes” option… 🙂