[radvd-devel-l] losing default route

Pekka Savola pekkas at netcore.fi
Mon Aug 8 03:36:44 EDT 2005


Hi,

I've gone through this extensively off-list with Tomasz, and it appears 
this is caused by a kernel bug in at least some versions of Linux.

Attached is the patch that I'm proposing to make the timer handling 
more robust on Linux.

Comments?  I'll commit it otherwise sometime this week.

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
-------------- next part --------------
Index: CHANGES
===================================================================
RCS file: /work/cvsroot/radvd/CHANGES,v
retrieving revision 1.43
diff -u -r1.43 CHANGES
--- CHANGES	26 Jul 2005 19:21:04 -0000	1.43
+++ CHANGES	8 Aug 2005 07:33:25 -0000
@@ -1,5 +1,11 @@
 $Id: CHANGES,v 1.43 2005/07/26 19:21:04 psavola Exp $
 
+08/08/2005	Implement more robust timer handler, especially
+		because some Linux kernels don't seem to behave
+		all that well; see http://lkml.org/lkml/2005/4/29/163.
+		Based on extensive testing & reports by
+		Tomasz Grobelny.
+
 07/24/2005	Implement a new logging method "stderr_syslog" which is
 		now also the default.  Everything is logged on syslog,
 		while the most important messages (i.e., start-up failures)
Index: timer.c
===================================================================
RCS file: /work/cvsroot/radvd/timer.c,v
retrieving revision 1.7
diff -u -r1.7 timer.c
--- timer.c	24 Jul 2005 08:04:55 -0000	1.7
+++ timer.c	8 Aug 2005 07:33:25 -0000
@@ -25,6 +25,7 @@
 };
 
 static void alarm_handler(int sig);
+int inline check_time_diff(struct timer_lst *tm, struct timeval tv);
 
 static void
 schedule_timer(void)
@@ -86,6 +87,7 @@
 
 	lst = &timers_head;
 
+	/* the timers are in the list in the order they expire, the soonest first */
 	do {
 		lst = lst->next;
 	} while ((tm->expires.tv_sec > lst->expires.tv_sec) ||
@@ -128,13 +130,24 @@
 {
 	struct timer_lst *tm, *back;
 	struct timeval tv;
-
 	gettimeofday(&tv, NULL);
 	tm = timers_head.next;
 
-	while ((tm->expires.tv_sec < tv.tv_sec)
-			|| ((tm->expires.tv_sec == tv.tv_sec) 
-			    && (tm->expires.tv_usec <= tv.tv_usec)))
+	/*
+	 * This handler is called when the alarm goes off, so at least one of
+	 * the interfaces' timers should satisfy the while condition.
+	 *
+	 * Sadly, this is not always the case, at least on Linux kernels:
+	 * see http://lkml.org/lkml/2005/4/29/163. :-(  It seems the timers
+	 * in some kernel versions are not accurate, and the handler gets
+	 * called up to a couple of hundred microseconds before they expire.
+	 *
+	 * Therefore we allow some inaccuracy here: a timer that is due
+	 * within the next millisecond is treated as already expired.
+	 */
+
+	/* unused timers are initialized to LONG_MAX so we skip them */
+	while (tm->expires.tv_sec != LONG_MAX && check_time_diff(tm, tv))
 	{		
 		tm->prev->next = tm->next;
 		tm->next->prev = tm->prev;
@@ -158,3 +171,34 @@
 	tm->handler = handler;
 	tm->data = data;
 }
+
+int inline
+check_time_diff(struct timer_lst *tm, struct timeval tv)
+{
+	struct itimerval diff;
+	memset(&diff, 0, sizeof(diff));
+
+	#define ALLOW_CLOCK_USEC 1000
+
+	timersub(&tm->expires, &tv, &diff.it_value);
+	dlog(LOG_DEBUG, 5, "check_time_diff, difference: %ld sec + %ld usec",
+		diff.it_value.tv_sec, diff.it_value.tv_usec);
+
+	if (diff.it_value.tv_sec <= 0) {
+		/* already gone (or expiring right now), this is the "good" case */
+		if (diff.it_value.tv_sec < 0 || diff.it_value.tv_usec == 0)
+			return 1;
+#ifdef __linux__ /* we haven't seen this on other OSes */
+		/* still OK if the expiry time is not too much in the future */
+		else if (diff.it_value.tv_usec > 0 &&
+		            diff.it_value.tv_usec <= ALLOW_CLOCK_USEC) {
+			dlog(LOG_DEBUG, 4, "alarm_handler clock was probably off by %ld usec, allowing %u",
+			     diff.it_value.tv_usec, ALLOW_CLOCK_USEC);
+			return 2;
+		}
+#endif /* __linux__ */
+		else /* scheduled intentionally in the future? */
+			return 0;
+	}
+	return 0;
+}
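
For readers who want to try the tolerance check outside radvd, here is a minimal standalone sketch of the same idea. It is not part of the patch. The name timer_is_due and its tolerance_usec parameter are hypothetical, and the function only assumes struct timeval from <sys/time.h>. It treats a timer as due when its expiry is at or before "now" plus the tolerance, and it assumes the tolerance is below one second.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not taken from radvd: returns 1 if `expires` is at
 * or before `now` plus `tolerance_usec` microseconds, and 0 otherwise.
 * The tolerance is assumed to be in the range [0, 1000000).
 */
int
timer_is_due(const struct timeval *expires, const struct timeval *now,
	     long tolerance_usec)
{
	long long sec_diff, diff_usec;

	assert(tolerance_usec >= 0 && tolerance_usec < 1000000L);

	sec_diff = (long long)expires->tv_sec - (long long)now->tv_sec;

	/*
	 * Decide the clear-cut cases on seconds alone.  This also keeps the
	 * multiplication below from overflowing when an unused timer has a
	 * very large expiry time.
	 */
	if (sec_diff > 1)
		return 0;	/* more than a second away */
	if (sec_diff < -1)
		return 1;	/* expired more than a second ago */

	/* Signed difference in microseconds; negative means already expired. */
	diff_usec = sec_diff * 1000000LL
	    + ((long long)expires->tv_usec - (long long)now->tv_usec);

	return diff_usec <= tolerance_usec;
}
```

Working with a single signed microsecond difference means an expiry that falls exactly on "now", or just across a second boundary, needs no special casing. A caller would pass the value returned by gettimeofday() as now and something like 1000 as tolerance_usec.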

