May 16, 2008

freebsd-stable Digest, Vol 252, Issue 8

Send freebsd-stable mailing list submissions to
freebsd-stable@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
or, via email, send a message with subject or body 'help' to
freebsd-stable-request@freebsd.org

You can reach the person managing the list at
freebsd-stable-owner@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-stable digest..."


Today's Topics:

1. Re: thread scheduling at mutex unlock (David Xu)
2. Re: thread scheduling at mutex unlock (Daniel Eischen)
3. Upgrade path from 5.5? (Jason Porter)
4. Re: bin/40278: mktime returns -1 for certain dates/timezones
when it should normalize (Gavin Atkinson)
5. Re: how much memory does increasing max rules for IPFW take
up? (Vivek Khera)
6. Re: how much memory does increasing max rules for IPFW take
up? (Jeremy Chadwick)
7. Re: Upgrade path from 5.5? (Sergey N. Voronkov)
8. cron hanging on to child processes (Pete French)
9. Re: thread scheduling at mutex unlock (Daniel Eischen)
10. RE: thread scheduling at mutex unlock (David Schwartz)
11. Re: thread scheduling at mutex unlock (Brent Casavant)
12. RE: thread scheduling at mutex unlock (David Schwartz)
13. Re: thread scheduling at mutex unlock (Andriy Gapon)
14. Re: thread scheduling at mutex unlock (Andrew Snow)
15. Re: thread scheduling at mutex unlock (Andriy Gapon)
16. Re: thread scheduling at mutex unlock (Andriy Gapon)
17. Re: thread scheduling at mutex unlock (Brent Casavant)
18. RE: thread scheduling at mutex unlock (David Schwartz)
19. RE: cvsup.uk.FreeBSD.org (Dr Josef Karthauser)
20. today's build is causing errors for me (Rob Lytle)
21. Re: how much memory does increasing max rules for IPFW take
up? (Ian Smith)
22. Re: how much memory does increasing max rules for IPFW take
up? (Andrey V. Elsukov)
23. Re: today's build is causing errors for me (Jeremy Chadwick)
24. Re: today's build is causing errors for me (Rob Lytle)
25. Re: today's build is causing errors for me (Jeremy Chadwick)
26. Re: today's build is causing errors for me / Fixed for now
(Rob Lytle)
27. Re: today's build is causing errors for me (Rob Lytle)
28. just one last question about /etc/rc.d file permissions
(Rob Lytle)
29. ciss(4) not coping with large arrays? (Emil Mikulic)
30. Re: ciss(4) not coping with large arrays? (Claus Guttesen)


----------------------------------------------------------------------

Message: 1
Date: Thu, 15 May 2008 20:57:23 +0800
From: David Xu <davidxu@freebsd.org>
Subject: Re: thread scheduling at mutex unlock
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, Brent Casavant
<b.j.casavant@ieee.org>, freebsd-threads@freebsd.org
Message-ID: <482C3333.1070205@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Andriy Gapon wrote:
>
> Maybe. But that's not what I see with my small example program. One
> thread releases and re-acquires a mutex 10 times in a row while the
> other doesn't get it a single time.
> I think that there is a very slim chance of a blocked thread
> preempting a running thread in this circumstances. Especially if
> execution time between unlock and re-lock is very small.
It does not depend on how many times your thread acquires or
re-acquires the mutex, or on how small the region protected by the mutex
is. Once the current thread has run for too long, other threads will have
higher priority and ownership will definitely be transferred, though at
the cost of some extra context switches.

> I'd rather prefer to have an option to have FIFO fairness in mutex
> lock rather than always avoiding context switch at all costs and
> depending on scheduler to eventually do priority magic.
>
It is better to implement this behavior in your application code. If it
is implemented in the thread library, you still cannot control how many
times a thread may acquire and re-acquire the mutex without a context
switch, and a simple FIFO as you describe would cause dreadful
performance problems.
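
One way to get FIFO handoff in application code, as suggested above, is a
"ticket" lock layered on a mutex and a condition variable. What follows is
only a minimal sketch under that assumption: the fifo_lock_* names are
hypothetical and are not part of libthr or POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical FIFO ("ticket") lock built on POSIX primitives. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    unsigned long   next_ticket;  /* next ticket to hand out */
    unsigned long   now_serving;  /* ticket that currently owns the lock */
} fifo_lock_t;

int
fifo_lock_init(fifo_lock_t *l)
{
    int rc;

    assert(l != NULL);
    l->next_ticket = 0;
    l->now_serving = 0;
    rc = pthread_mutex_init(&l->mtx, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_init(&l->cv, NULL);
}

void
fifo_lock_acquire(fifo_lock_t *l)
{
    unsigned long me;

    pthread_mutex_lock(&l->mtx);
    me = l->next_ticket++;            /* take a place in line */
    while (l->now_serving != me)      /* wait until it is our turn */
        pthread_cond_wait(&l->cv, &l->mtx);
    pthread_mutex_unlock(&l->mtx);
}

void
fifo_lock_release(fifo_lock_t *l)
{
    pthread_mutex_lock(&l->mtx);
    l->now_serving++;                 /* hand the lock to the next ticket */
    pthread_cond_broadcast(&l->cv);   /* waiters re-check their ticket */
    pthread_mutex_unlock(&l->mtx);
}
```

A thread that releases and immediately re-acquires such a lock goes to the
back of the line, so under contention every handoff costs a context switch.
That is exactly the performance trade-off described above.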


------------------------------

Message: 2
Date: Thu, 15 May 2008 09:24:56 -0400 (EDT)
From: Daniel Eischen <deischen@freebsd.org>
Subject: Re: thread scheduling at mutex unlock
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, David Xu <davidxu@freebsd.org>, Brent
Casavant <b.j.casavant@ieee.org>, freebsd-threads@freebsd.org
Message-ID: <Pine.GSO.4.64.0805150916400.28524@sea.ntplx.net>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Thu, 15 May 2008, Andriy Gapon wrote:

> Or even more realistic: there should be a feeder thread that puts things on
> the queue, it would never be able to enqueue new items until the queue
> becomes empty if worker thread's code looks like the following:
>
> while(1)
> {
> pthread_mutex_lock(&work_mutex);
> while(queue.is_empty())
> pthread_cond_wait(...);
> //dequeue item
> ...
> pthread_mutex_unlock(&work_mutex);
> //perform some short and non-blocking processing of the item
> ...
> }
>
> Because the worker thread (while the queue is not empty) would never enter
> cond_wait and would always re-lock the mutex shortly after unlocking it.

Well in theory, the kernel scheduler will let both threads run fairly
with regards to their cpu usage, so this should even out the enqueueing
and dequeueing threads.

You could also optimize the above a little bit by dequeueing everything
in the queue instead of one at a time.
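
The batching idea can be sketched roughly as follows: the worker detaches
the whole list while holding the mutex and processes it after unlocking, so
the lock is taken once per batch rather than once per item. The item_t and
work_queue_* names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct item {
    struct item *next;
    int          value;
} item_t;

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  nonempty;
    item_t         *head;
    item_t         *tail;
} work_queue_t;

int
work_queue_init(work_queue_t *q)
{
    int rc;

    assert(q != NULL);
    q->head = q->tail = NULL;
    rc = pthread_mutex_init(&q->mtx, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_init(&q->nonempty, NULL);
}

/* Feeder side: append one item and wake the worker. */
void
work_queue_put(work_queue_t *q, item_t *it)
{
    it->next = NULL;
    pthread_mutex_lock(&q->mtx);
    if (q->tail != NULL)
        q->tail->next = it;
    else
        q->head = it;
    q->tail = it;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mtx);
}

/* Worker side: block until non-empty, then take the entire list. */
item_t *
work_queue_take_all(work_queue_t *q)
{
    item_t *batch;

    pthread_mutex_lock(&q->mtx);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->mtx);
    batch = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->mtx);
    return batch;   /* caller walks batch->next without holding the lock */
}
```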

> So while improving performance on small scale this mutex re-acquire-ing
> unfairness may be hurting interactivity and thread concurrency and thus
> performance in general. E.g. in the above example queue would always be
> effectively of depth 1.
> Something about "lock starvation" comes to mind.
>
> So, yes, this is not about standards, this is about reasonable expectations
> about thread concurrency behavior in a particular implementation (libthr).
> I see now that performance advantage of libthr over libkse came with a price.
> I think that something like queued locks is needed. They would clearly reduce
> raw throughput performance, so maybe that should be a new (non-portable?)
> mutex attribute.

--
DE


------------------------------

Message: 3
Date: Thu, 15 May 2008 08:31:06 -0600
From: "Jason Porter" <lightguard.jp@gmail.com>
Subject: Upgrade path from 5.5?
To: freebsd-stable@freebsd.org
Message-ID:
<699477f60805150731p1f396bbaj27fc9ed49e69a916@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

I have an old sandbox (obviously if it's 5.5) that I was thinking of
upgrading. Is there an upgrade path I should think about taking, or would
it be best to backup my /home directory and install from scratch? Note, I'm
currently not subscribed to the list.

--
--Jason Porter
Real Programmers think better when playing Adventure or Rogue.

PGP key id: 926CCFF5
PGP fingerprint: 64C2 C078 13A9 5B23 7738 F7E5 1046 C39B 926C CFF5
PGP key available at: keyserver.net, pgp.mit.edu


------------------------------

Message: 4
Date: Thu, 15 May 2008 16:10:45 +0100
From: Gavin Atkinson <gavin@FreeBSD.org>
Subject: Re: bin/40278: mktime returns -1 for certain dates/timezones
when it should normalize
To: Marc Olzheim <marcolz@stack.nl>
Cc: freebsd-stable@FreeBSD.org
Message-ID: <1210864245.29891.93.camel@buffy.york.ac.uk>
Content-Type: text/plain

On Thu, 2008-05-15 at 10:51 +0200, Marc Olzheim wrote:
> With the testcode I put on
> http://www.stack.nl/~marcolz/FreeBSD/pr-bin-40278/40278.c I can
> reproduce it on FreeBSD 4.11:

[snip]

> But it is fixed on my FreeBSD 6.x and up systems:

[snip]

Many thanks for going to the effort of testing this. I've closed the
PR.

Gavin


------------------------------

Message: 5
Date: Thu, 15 May 2008 12:09:39 -0400
From: Vivek Khera <vivek@khera.org>
Subject: Re: how much memory does increasing max rules for IPFW take
up?
To: FreeBSD Stable <freebsd-stable@freebsd.org>
Cc: freebsd-ipfw@freebsd.org
Message-ID: <6ADAB997-FAA4-43B8-AB57-3CC4A04F3700@khera.org>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes


On May 15, 2008, at 6:03 AM, Bruce M. Simpson wrote:

> Having said that the default tunable of 256 state entries is
> probably quite low for use cases other than "home/small office NAT
> gateway".

The default on my systems seems to be 4096. My steady state on a
pretty popular web server is about 400, on a busy inbound mail server,
around 800 states. I need to account for peaks much higher, though.
Luckily most of my connections are short-lived.

Thanks for the answers!

------------------------------

Message: 6
Date: Thu, 15 May 2008 09:20:56 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
Subject: Re: how much memory does increasing max rules for IPFW take
up?
To: "Bruce M. Simpson" <bms@FreeBSD.org>
Cc: Vivek Khera <vivek@khera.org>, "Andrey V. Elsukov"
<bu7cher@yandex.ru>, FreeBSD Stable <freebsd-stable@freebsd.org>,
freebsd-ipfw@freebsd.org
Message-ID: <20080515162056.GA17187@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Thu, May 15, 2008 at 11:03:53AM +0100, Bruce M. Simpson wrote:
> Andrey V. Elsukov wrote:
>> Vivek Khera wrote:
>>> I had a box run out of dynamic state space yesterday. I found I can
>>> increase the number of dynamic rules by increasing the sysctl parameter
>>> net.inet.ip.fw.dyn_max. I can't find, however, how this affects memory
>>> usage on the system. Is it dynamically allocated and de-allocated, or
>>> is it a static memory buffer?
>>
>> Each dynamic rule is allocated dynamically. Be careful: with too many
>> dynamic rules, ipfw will work very slowly.
>
> Got any figures for this? I took a quick glance and it looks like it just
> uses a hash over dst/src/dport/sport. If there are a lot of raw IP or ICMP
> flows then that's going to result in hash collisions.
>
> It might be a good project for someone to optimize if it isn't scaling for
> folk. "Bloomier" filters are probably worth a look -- bloom filters are a
> class of probabilistic hash which may return a false positive, "bloomier"
> filters are a refinement which tries to limit the false positives.
>
> Having said that the default tunable of 256 state entries is probably quite
> low for use cases other than "home/small office NAT gateway".
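
For readers unfamiliar with the term, here is a minimal sketch of a plain
Bloom filter keyed on the src/dst/sport/dport tuple mentioned above. It is
not ipfw's code and not a "Bloomier" filter. The table size, probe count,
mixing function, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   1024u  /* illustrative table size, in bits */
#define BLOOM_HASHES 3u     /* illustrative number of probes per key */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* Illustrative 64-bit mixer; any well-distributed hash would do. */
static uint64_t
mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hash a flow over source/destination address and port. */
uint64_t
flow_hash(uint32_t src, uint32_t dst, uint16_t sport, uint16_t dport)
{
    uint64_t h = mix64(((uint64_t)src << 32) | dst);

    return mix64(h ^ (((uint64_t)sport << 16) | dport));
}

void
bloom_init(bloom_t *b)
{
    assert(b != NULL);
    memset(b->bits, 0, sizeof(b->bits));
}

/* Derive the i-th probe position from one 64-bit hash (double hashing). */
static uint32_t
bloom_index(uint64_t h, uint32_t i)
{
    uint32_t h1 = (uint32_t)h;
    uint32_t h2 = (uint32_t)(h >> 32) | 1u;

    return (h1 + i * h2) % BLOOM_BITS;
}

void
bloom_add(bloom_t *b, uint64_t h)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_index(h, i);
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 if the key is definitely absent, 1 if it is possibly present. */
int
bloom_maybe_contains(const bloom_t *b, uint64_t h)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_index(h, i);
        if ((b->bits[bit / 8] & (1u << (bit % 8))) == 0)
            return 0;
    }
    return 1;
}
```

A "possibly present" answer still has to be confirmed against the real
state table, since unrelated flows can set the same bits.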

It's far too low for home/small office. Standard Linux NAT routers,
such as the Linksys WRT54G/GL, come with a default state table count of
2048, and often is increased by third-party firmwares to 8192 based on
justified necessity. Search for "conntrack" below:

http://www.polarcloud.com/firmware

256 can easily be exhausted by more than one user loading multiple HTTP
1.0 web pages at one time (as is the case now that many users have
browsers that load 7-8 web pages into separate tabs at startup).

And if that's not enough reason, consider torrents, which is quite often
what results in a home or office router exhausting its state table.

Bottom line: the 256 default is too low. It needs to be increased to at
least 2048.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

------------------------------

Message: 7
Date: Thu, 15 May 2008 22:14:36 +0600
From: "Sergey N. Voronkov" <serg@tmn.ru>
Subject: Re: Upgrade path from 5.5?
To: Jason Porter <lightguard.jp@gmail.com>
Cc: freebsd-stable@freebsd.org
Message-ID: <20080515161436.GA52741@tmn.ru>
Content-Type: text/plain; charset=us-ascii

On Thu, May 15, 2008 at 08:31:06AM -0600, Jason Porter wrote:
> I have an old sandbox (obviously if it's 5.5) that I was thinking of
> upgrading. Is there an upgrade path I should think about taking, or would
> it be best to backup my /home directory and install from scratch? Note, I'm
> currently not subscribed to the list.

Source upgrade 5.5 -> 6.2 -> 6.3 works fine for me.

Serg N. Voronkov,
Sibitex JSC.


------------------------------

Message: 8
Date: Thu, 15 May 2008 18:39:36 +0100
From: Pete French <petefrench@ticketswitch.com>
Subject: cron hanging on to child processes
To: stable@freebsd.org
Message-ID: <E1JwhQW-000L1H-IE@dilbert.ticketswitch.com>

I have a process which is run daily from cron that stops mysql, does
some stuff, and starts it again. The script outputs a number of lines
which are emailed to me in the output of the cron job.

What I have noticed is that my emails actually lag by a day - it turns out
that the cron job appears to not send the email until mysql is shut down the
following day. I can only assume that when mysql is restarted, cron sees it
as a child process, and thus does not terminate until that process does. Which
happens when a new cron job shuts it down again 24 hours later.

Any suggestions on fixing this ? I wouldn't have thought that stopping
and starting a daemon was a particularly unusual thing to want to
do from a cron job.

-pete.


------------------------------

Message: 9
Date: Thu, 15 May 2008 13:54:09 -0400 (EDT)
From: Daniel Eischen <deischen@freebsd.org>
Subject: Re: thread scheduling at mutex unlock
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, David Xu <davidxu@freebsd.org>, Brent
Casavant <b.j.casavant@ieee.org>, freebsd-threads@freebsd.org
Message-ID: <Pine.GSO.4.64.0805151329150.29431@sea.ntplx.net>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Thu, 15 May 2008, Daniel Eischen wrote:

> On Thu, 15 May 2008, Andriy Gapon wrote:
>
>> Or even more realistic: there should be a feeder thread that puts things on
>> the queue, it would never be able to enqueue new items until the queue
>> becomes empty if worker thread's code looks like the following:
>>
>> while(1)
>> {
>> pthread_mutex_lock(&work_mutex);
>> while(queue.is_empty())
>> pthread_cond_wait(...);
>> //dequeue item
>> ...
>> pthread_mutex_unlock(&work_mutex);
>> //perform some short and non-blocking processing of the item
>> ...
>> }
>>
>> Because the worker thread (while the queue is not empty) would never enter
>> cond_wait and would always re-lock the mutex shortly after unlocking it.
>
> Well in theory, the kernel scheduler will let both threads run fairly
> with regards to their cpu usage, so this should even out the enqueueing
> and dequeueing threads.
>
> You could also optimize the above a little bit by dequeueing everything
> in the queue instead of one at a time.

I suppose you could also enforce your own scheduling with
something like the following:

pthread_cond_t writer_cv;
pthread_cond_t reader_cv;
pthread_mutex_t q_mutex;
...
thingy_q_t thingy_q;
int writers_waiting = 0;
int readers_waiting = 0;
...

void
enqueue(thingy_t *thingy)
{
    pthread_mutex_lock(&q_mutex);
    /* Insert into thingy q */
    ...
    if (readers_waiting > 0) {
        /* Wake up the readers. */
        pthread_cond_broadcast(&reader_cv);
        readers_waiting = 0;
    }
    while (thingy_q.size > ENQUEUE_THRESHOLD_HIGH) {
        writers_waiting++;
        pthread_cond_wait(&writer_cv, &q_mutex);
    }
    pthread_mutex_unlock(&q_mutex);
}

thingy_t *
dequeue(void)
{
    thingy_t *thingy;

    pthread_mutex_lock(&q_mutex);
    while (thingy_q.size == 0) {
        readers_waiting++;
        pthread_cond_wait(&reader_cv, &q_mutex);
    }
    /* Dequeue thingy */
    ...

    if ((writers_waiting > 0)
        && (thingy_q.size < ENQUEUE_THRESHOLD_LOW)) {
        /* Wake up the writers. */
        pthread_cond_broadcast(&writer_cv);
        writers_waiting = 0;
    }
    pthread_mutex_unlock(&q_mutex);
    return (thingy);
}

The above is completely untested and probably contains some
bugs ;-)

You probably shouldn't need anything like that if the kernel
scheduler is scheduling your threads fairly.

--
DE


------------------------------

Message: 10
Date: Thu, 15 May 2008 12:29:13 -0700
From: "David Schwartz" <davids@webmaster.com>
Subject: RE: thread scheduling at mutex unlock
To: <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, freebsd-threads@freebsd.org
Message-ID: <MDEHLPKNGKAHNMBLJOLKIEKLMKAC.davids@webmaster.com>
Content-Type: text/plain; charset="UTF-8"


> what if you have an infinite number of items on one side and finite
> number on the other, and you want to process them all (in infinite time,
> of course). Would you still try to finish everything on one side (the
> infinite one) or would you try to look at what you have on the other side?
>
> I am sorry about fuzzy wording of my original report, I should have
> mentioned "starvation" somewhere in it.

There is no such thing as a "fair share" when comparing an infinite quantity to a finite quantity. It is just as sensible to do 1 then 1 as 10 then 10 or a billion then 1.

What I would do in this case is work on one side for one timeslice then the other side for one timeslice, continuing until either side was finished, then I'd work exclusively on the other side. This is precisely the purpose for having timeslices in a scheduler.

The timeslice is carefully chosen so that it's not so long that you ignore a side for too long. It's also carefully chosen so that it's not so short that you spend all your time switching sides.

What sane schedulers do is assume that you want to make as much forward progress as quickly as possible. This means getting as many work units done per unit time as possible. This means as few context switches as possible.

A scheduler that switches significantly more often than once per timeslice with a load like this is *broken*. The purpose of the timeslice is to place an upper bound on the number of context switches in cases where forward progress can be made on more than one process. An ideal scheduler would not switch more often than once per timeslice unless it could not make further forward progress.

Real-world schedulers actually may allow one side to pre-empt the other, and may switch a bit more often than a scheduler that's "ideal" in the sense described above. This is done in an attempt to boost interactive performance.

But your basic assumption that strict alternation is desirable is massively wrong. That's the *worst* *possible* outcome.

DS


------------------------------

Message: 11
Date: Thu, 15 May 2008 14:51:08 -0500 (CDT)
From: Brent Casavant <b.j.casavant@ieee.org>
Subject: Re: thread scheduling at mutex unlock
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, David Xu <davidxu@freebsd.org>,
freebsd-threads@freebsd.org
Message-ID:
<alpine.BSF.1.10.0805151345110.62691@pkunk.americas.sgi.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 15 May 2008, Andriy Gapon wrote:

> With current libthr behavior the GUI thread would never have a chance to get
> the mutex as worker thread would always be a winner (as my small program
> shows).

The example you gave indicates an incorrect mechanism being used for the
GUI to communicate with this worker thread. For the behavior you desire,
you need a common condition that lets both the GUI and the work item
generator indicate that there is something for the worker to do, *and*
you need separate mechanisms for the GUI and work item generator to add
to their respective queues.

Something like this (could be made even better with a little effort):

struct worker_queues_s {
    pthread_mutex_t work_mutex;
    struct work_queue_s work_queue;

    pthread_mutex_t gui_mutex;
    struct gui_queue_s gui_queue;

    pthread_mutex_t stuff_mutex;
    int stuff_todo;
    pthread_cond_t stuff_cond;
};
struct worker_queues_s wq;

int
main(int argc, char *argv[]) {
    // blah blah
    init_worker_queue(&wq);
    // blah blah
}

void
gui_callback(...) {
    // blah blah

    // Set up GUI message

    pthread_mutex_lock(&wq.gui_mutex);
    // Add GUI message to queue
    pthread_mutex_unlock(&wq.gui_mutex);

    pthread_mutex_lock(&wq.stuff_mutex);
    wq.stuff_todo++;
    pthread_cond_signal(&wq.stuff_cond);
    pthread_mutex_unlock(&wq.stuff_mutex);

    // blah blah
}

void*
work_generator_thread(void* arg) {
    // blah blah

    while (1) {
        // Set up work to do

        pthread_mutex_lock(&wq.work_mutex);
        // Add work item to queue
        pthread_mutex_unlock(&wq.work_mutex);

        pthread_mutex_lock(&wq.stuff_mutex);
        wq.stuff_todo++;
        pthread_cond_signal(&wq.stuff_cond);
        pthread_mutex_unlock(&wq.stuff_mutex);
    }

    // blah blah
}

void*
worker_thread(void* arg) {
    // blah blah

    while (1) {
        // Wait for there to be something to do
        pthread_mutex_lock(&wq.stuff_mutex);
        while (wq.stuff_todo < 1) {
            pthread_cond_wait(&wq.stuff_cond,
                              &wq.stuff_mutex);
        }
        pthread_mutex_unlock(&wq.stuff_mutex);

        // Handle GUI messages
        pthread_mutex_lock(&wq.gui_mutex);
        while (!gui_queue_empty(&wq.gui_queue)) {
            // dequeue and process GUI messages
            pthread_mutex_lock(&wq.stuff_mutex);
            wq.stuff_todo--;
            pthread_mutex_unlock(&wq.stuff_mutex);
        }
        pthread_mutex_unlock(&wq.gui_mutex);

        // Handle work items
        pthread_mutex_lock(&wq.work_mutex);
        while (!work_queue_empty(&wq.work_queue)) {
            // dequeue and process work item
            pthread_mutex_lock(&wq.stuff_mutex);
            wq.stuff_todo--;
            pthread_mutex_unlock(&wq.stuff_mutex);
        }
        pthread_mutex_unlock(&wq.work_mutex);
    }

    // blah blah
}

This should accomplish what you desire. Caution that I haven't
compiled, run, or tested it, but I'm pretty sure it's a solid
solution.

The key here is unifying the two input sources (the GUI and work queues)
without blocking on either one of them individually. The value of
(wq.stuff_todo < 1) becomes a proxy for the value of
(gui_queue_empty(...) && work_queue_empty(...)).

I hope that helps,
Brent

--
Brent Casavant Dance like everybody should be watching.
www.angeltread.org
KD5EMB, EN34lv


------------------------------

Message: 12
Date: Thu, 15 May 2008 13:25:59 -0700
From: "David Schwartz" <davids@webmaster.com>
Subject: RE: thread scheduling at mutex unlock
To: "David Xu" <davidxu@freebsd.org>, "Brent Casavant"
<b.j.casavant@ieee.org>
Cc: freebsd-stable@freebsd.org, freebsd-threads@freebsd.org
Message-ID: <MDEHLPKNGKAHNMBLJOLKAEKPMKAC.davids@webmaster.com>
Content-Type: text/plain; charset="iso-8859-1"


> Brent, David,
>
> thank you for the responses.
> I think I incorrectly formulated my original concern.
> It is not about behavior at mutex unlock but about behavior at mutex
> re-lock. You are right that waking waiters at unlock would hurt
> performance. But I think that it is not "fair" that at re-lock former
> owner gets the lock immediately and the thread that waited on it for
> longer time doesn't get a chance.

You are correct, but fairness is not the goal, performance is. If you want
fairness, you are welcome to code it. But threads don't file union
grievances, and it would be absolute foolishness for a scheduler to
sacrifice performance to make threads happier.

The scheduler decides which thread runs, you decide what the running thread
does. You are expected to use your control over that latter part to exercise
whatever your application's notion of "fairness" is.

Your test program is a classic example of a case where the use of a mutex is
inappropriate.

> Here's a more realistic example than the mock up code.
> Say you have a worker thread that processes queued requests and the load
> is such that there is always something on the queue. Thus the worker
> thread doesn't ever have to block waiting on it.
> And let's say that there is a GUI thread that wants to convey some
> information to the worker thread. And for that it needs to acquire some
> mutex and "do something".
> With current libthr behavior the GUI thread would never have a chance to
> get the mutex as worker thread would always be a winner (as my small
> program shows).

Nonsense. The worker thread would be doing work most of the time and
wouldn't be holding the mutex.

> Or even more realistic: there should be a feeder thread that puts things
> on the queue, it would never be able to enqueue new items until the
> queue becomes empty if worker thread's code looks like the following:
>
> while(1)
> {
> pthread_mutex_lock(&work_mutex);
> while(queue.is_empty())
> pthread_cond_wait(...);
> //dequeue item
> ...
> pthread_mutex_unlock(&work_mutex);
> //perform some short and non-blocking processing of the item
> ...
> }
>
> Because the worker thread (while the queue is not empty) would never
> enter cond_wait and would always re-lock the mutex shortly after
> unlocking it.

So what? The feeder thread could get the mutex after the mutex is unlocked
before the worker thread goes to do work. The only reason your test code
encountered a "problem" was because you yielded the CPU while you held the
mutex and never used up a timeslice.

> So while improving performance on small scale this mutex re-acquire-ing
> unfairness may be hurting interactivity and thread concurrency and thus
> performance in general. E.g. in the above example queue would always be
> effectively of depth 1.
> Something about "lock starvation" comes to mind.

Nope. You have to create a situation where the mutex is held much more often
than not held to get this behavior. That's a pathological case where the use
of a mutex is known to be inappropriate.

> So, yes, this is not about standards, this is about reasonable
> expectations about thread concurrency behavior in a particular
> implementation (libthr).
> I see now that performance advantage of libthr over libkse came with a
> price. I think that something like queued locks is needed. They would
> clearly reduce raw throughput performance, so maybe that should be a new
> (non-portable?) mutex attribute.

If you want queued locks, feel free to code them and use them. But you have
to work very hard to create a case where they are useful. If you find you're
holding the mutex more often than not, you're doing something *very* wrong.

DS


------------------------------

Message: 13
Date: Thu, 15 May 2008 23:48:17 +0300
From: Andriy Gapon <avg@icyb.net.ua>
Subject: Re: thread scheduling at mutex unlock
To: David Xu <davidxu@freebsd.org>
Cc: freebsd-stable@freebsd.org, Brent Casavant
<b.j.casavant@ieee.org>, freebsd-threads@freebsd.org
Message-ID: <482CA191.1030004@icyb.net.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

on 15/05/2008 15:57 David Xu said the following:
> Andriy Gapon wrote:
>>
>> Maybe. But that's not what I see with my small example program. One
>> thread releases and re-acquires a mutex 10 times in a row while the
>> other doesn't get it a single time.
>> I think that there is a very slim chance of a blocked thread
>> preempting a running thread in this circumstances. Especially if
>> execution time between unlock and re-lock is very small.
> It does not depend on how many times your thread acquires or
> re-acquires the mutex, or on how small the region protected by the mutex
> is. Once the current thread has run for too long, other threads will have
> higher priority and ownership will definitely be transferred, though at
> the cost of some extra context switches.

David,

did you examine or try the small program that I sent before?
The "lucky" thread slept for 1 second each time it held mutex.
So in total it spent about 8 seconds sleeping and holding the mutex. And
the "unlucky" thread, consequently, spent 8 seconds blocked waiting for
that mutex. And it didn't get "lucky".
Yes, technically the "lucky" thread was not running while holding the
mutex, which is probably why the scheduling algorithm didn't kick in
immediately.

I did more testing and see that the "unlucky" thread eventually gets a
chance (eventually means after very many lock/unlock cycles), but I
think that it is penalized too much still.
I wonder if with current code it is possible and easy to make this
behavior more deterministic.
Maybe something like the following:
    if (oldest_waiter.wait_time < X)
        do what we do now...
    else
        go into kernel for possible switch

I have very little idea about unit and value of X.

>> I'd rather prefer to have an option to have FIFO fairness in mutex
>> lock rather than always avoiding context switch at all costs and
>> depending on scheduler to eventually do priority magic.
>>
> It is better to implement this behavior in your application code. If it
> is implemented in the thread library, you still cannot control how many
> times a thread may acquire and re-acquire the mutex without a context
> switch, and a simple FIFO as you describe would cause dreadful
> performance problems.

I almost agree. But I still wouldn't take your last statement for a
fact. "Dreadful performance" - on micro-scale maybe, not necessarily on
macro scale.
After all, never switching context would be the best performance for a
single CPU-bound task, but you wouldn't think that this is the best
performance for the whole system.

As a data point: it seems that current Linux threading library is not
significantly worse than libthr, but my small test program on Fedora 7
works to my expectations.

--
Andriy Gapon


------------------------------

Message: 14
Date: Fri, 16 May 2008 06:51:01 +1000
From: Andrew Snow <andrew@modulus.org>
Subject: Re: thread scheduling at mutex unlock
To: freebsd-stable@freebsd.org
Message-ID: <482CA235.6090400@modulus.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

> But I think that it is not "fair" that at re-lock former
> owner gets the lock immediately and the thread that waited on it for
> longer time doesn't get a chance.

I believe this is what yield() is for. Before attempting a re-lock you
should call yield() to allow other threads a chance to run.

(Side note: On FreeBSD, I believe only high priority threads will run
when you yield(). As a workaround, I think you have to lower the
thread's priority before yield() and then raise it again afterwards.)
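
A minimal sketch of the yield-before-relock idea, assuming that "yield()"
here means POSIX sched_yield(). The shared_t type, polite_worker(), and
run_two_workers() are hypothetical names used only for this illustration.
Note that sched_yield() gives other runnable threads a chance to run but
does not guarantee that a waiter will get the mutex next.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define ITERATIONS 1000

typedef struct {
    pthread_mutex_t mtx;
    long            counter;
} shared_t;

/* Take the lock briefly, then yield before trying to re-lock. */
void *
polite_worker(void *arg)
{
    shared_t *s = arg;

    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&s->mtx);
        s->counter++;
        pthread_mutex_unlock(&s->mtx);
        sched_yield();  /* let other runnable threads have a turn */
    }
    return NULL;
}

/* Run two workers to completion; returns the final counter or -1. */
long
run_two_workers(void)
{
    shared_t  s = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t a, b;

    if (pthread_create(&a, NULL, polite_worker, &s) != 0)
        return -1;
    if (pthread_create(&b, NULL, polite_worker, &s) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return s.counter;
}
```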


- Andrew


------------------------------

Message: 15
Date: Thu, 15 May 2008 23:56:18 +0300
From: Andriy Gapon <avg@icyb.net.ua>
Subject: Re: thread scheduling at mutex unlock
To: davids@webmaster.com
Cc: freebsd-stable@freebsd.org, freebsd-threads@freebsd.org
Message-ID: <482CA372.3000400@icyb.net.ua>
Content-Type: text/plain; charset=UTF-8; format=flowed

on 15/05/2008 22:29 David Schwartz said the following:
>> what if you have an infinite number of items on one side and finite
>> number on the other, and you want to process them all (in infinite
>> time, of course). Would you still try to finish everything on one
>> side (the infinite one) or would you try to look at what you have
>> on the other side?
>>
>> I am sorry about fuzzy wording of my original report, I should have
>> mentioned "starvation" somewhere in it.
>
> There is no such thing as a "fair share" when comparing an infinite
> quantity to a finite quantity. It is just as sensible to do 1 then 1
> as 10 then 10 or a billion then 1.
>
> What I would do in this case is work on one side for one timeslice
> then the other side for one timeslice, continuing until either side
> was finished, then I'd work exclusively on the other side. This is
> precisely the purpose for having timeslices in a scheduler.
>
> The timeslice is carefully chosen so that it's not so long that you
> ignore a side for too long. It's also carefully chosen so that it's
> not so short that you spend all your time switching sides.
>
> What sane schedulers do is assume that you want to make as much
> forward progress as quickly as possible. This means getting as many
> work units done per unit time as possible. This means as few context
> switches as possible.
>
> A scheduler that switches significantly more often than once per
> timeslice with a load like this is *broken*. The purpose of the
> timeslice is to place an upper bound on the number of context
> switches in cases where forward progress can be made on more than one
> process. An ideal scheduler would not switch more often than once per
> timeslice unless it could not make further forward progress.
>
> Real-world schedulers actually may allow one side to pre-empt the
> other, and may switch a bit more often than a scheduler that's
> "ideal" in the sense described above. This is done in an attempt to
> boost interactive performance.
>
> But your basic assumption that strict alternation is desirable is
> massively wrong. That's the *worst* *possible* outcome.

David,

thank you for the tutorial, it is quite enlightening.
But first of all, did you take a look at my small test program?
There are 1 second sleeps in it, this is not about timeslices and
scheduling at that level at all. This is about basic expectation about
fairness of acquiring a lock at macro level. I know that when one thread
acquires and releases and reacquires a mutex during 10 seconds while the
other thread is blocked on that mutex for 10 seconds, then this is not
about timeslices.
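
A stripped-down sketch of the pattern being described, for readers who do not have the test program at hand. This is not the program attached to the original report: the function names, the 1 ms sleep, and the round count are all assumptions made for illustration. One thread repeatedly locks, sleeps while holding the mutex, unlocks, and immediately re-locks; a second thread asks for the mutex once and records how many rounds the first thread had completed by then.

```c
#include <pthread.h>
#include <time.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int hog_rounds_done;    /* protected by m */
static int hog_rounds_total;   /* set before the threads start */

/* Holds the mutex across a sleep, then unlocks and re-locks at once. */
static void *hog(void *arg)
{
    struct timespec ts = { 0, 1000000 };   /* 1 ms, hypothetical value */
    int i;

    (void)arg;
    for (i = 0; i < hog_rounds_total; i++) {
        pthread_mutex_lock(&m);
        nanosleep(&ts, NULL);
        hog_rounds_done++;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Asks for the mutex once and notes how far the hog had got. */
static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&m);
    *seen = hog_rounds_done;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns the number of hog rounds completed before the waiter ran. */
int rounds_before_waiter_ran(int rounds)
{
    pthread_t h, w;
    int seen = -1;

    hog_rounds_done = 0;
    hog_rounds_total = rounds;
    pthread_create(&h, NULL, hog, NULL);
    pthread_create(&w, NULL, waiter, &seen);
    pthread_join(h, NULL);
    pthread_join(w, NULL);
    return seen;
}
```

A result close to the round count means the waiter was passed over again and again; a result near zero means it was served promptly. Which of the two you see depends on the threading library and scheduler, which is the point of this thread.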

--
Andriy Gapon


------------------------------

Message: 16
Date: Fri, 16 May 2008 00:02:43 +0300
From: Andriy Gapon <avg@icyb.net.ua>
Subject: Re: thread scheduling at mutex unlock
To: Brent Casavant <b.j.casavant@ieee.org>
Cc: freebsd-stable@freebsd.org, David Xu <davidxu@freebsd.org>,
freebsd-threads@freebsd.org
Message-ID: <482CA4F3.6090501@icyb.net.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

on 15/05/2008 22:51 Brent Casavant said the following:
> On Thu, 15 May 2008, Andriy Gapon wrote:
>
>> With current libthr behavior the GUI thread would never have a chance to get
>> the mutex as worker thread would always be a winner (as my small program
>> shows).
>
> The example you gave indicates an incorrect mechanism being used for the
> GUI to communicate with this worker thread. For the behavior you desire,
> you need a common condition that lets both the GUI and the work item
> generator indicate that there is something for the worker to do, *and*
> you need separate mechanisms for the GUI and work item generator to add
> to their respective queues.


Brent,

that was just an example. Probably quite a bad example.
I should limit myself only to the program that I sent and I should
repeat that the result that it produces is not what I would call
reasonably expected. And I will repeat that I understand that the
behavior is not prohibited by standards (well, never letting other
threads run is probably not prohibited either).


> Something like this (could be made even better with a little effort):
>
> struct worker_queues_s {
> pthread_mutex_t work_mutex;
> struct work_queue_s work_queue;
>
> pthread_mutex_t gui_mutex;
> struct gui_queue_s gui_queue;
>
> pthread_mutex_t stuff_mutex;
> int stuff_todo;
> pthread_cond_t stuff_cond;
> };
> struct worker_queues_s wq;
>
> int
> main(int argc, char *argv[]) {
> // blah blah
> init_worker_queue(&wq);
> // blah blah
> }
>
> void
> gui_callback(...) {
> // blah blah
>
> // Set up GUI message
>
> pthread_mutex_lock(&wq.gui_mutex);
> // Add GUI message to queue
> pthread_mutex_unlock(&wq.gui_mutex);
>
> pthread_mutex_lock(&wq.stuff_mutex);
> wq.stuff_todo++;
> pthread_cond_signal(&wq.stuff_cond);
> pthread_mutex_unlock(&wq.stuff_mutex);
>
> // blah blah
> }
>
> void*
> work_generator_thread(void* arg) {
> // blah blah
>
> while (1) {
> // Set up work to do
>
> pthread_mutex_lock(&wq.work_mutex);
> // Add work item to queue
> pthread_mutex_unlock(&wq.work_mutex);
>
> pthread_mutex_lock(&wq.stuff_mutex);
> wq.stuff_todo++;
> pthread_cond_signal(&wq.stuff_cond);
> pthread_mutex_unlock(&wq.stuff_mutex);
> }
>
> // blah blah
> }
>
> void*
> worker_thread(void* arg) {
> // blah blah
>
> while (1) {
> // Wait for there to be something to do
> pthread_mutex_lock(&wq.stuff_mutex);
> while (wq.stuff_todo < 1) {
> pthread_cond_wait(&wq.stuff_cond,
> &wq.stuff_mutex);
> }
> pthread_mutex_unlock(&wq.stuff_mutex);
>
> // Handle GUI messages
> pthread_mutex_lock(&wq.gui_mutex);
> while (!gui_queue_empty(&wq.gui_queue)) {
> // dequeue and process GUI messages
> pthread_mutex_lock(&wq.stuff_mutex);
> wq.stuff_todo--;
> pthread_mutex_unlock(&wq.stuff_mutex);
> }
> pthread_mutex_unlock(&wq.gui_mutex);
>
> // Handle work items
> pthread_mutex_lock(&wq.work_mutex);
> while (!work_queue_empty(&wq.work_queue)) {
> // dequeue and process work item
> pthread_mutex_lock(&wq.stuff_mutex);
> wq.stuff_todo--;
> pthread_mutex_unlock(&wq.stuff_mutex);
> }
> pthread_mutex_unlock(&wq.work_mutex);
> }
>
> // blah blah
> }
>
> This should accomplish what you desire. Caution that I haven't
> compiled, run, or tested it, but I'm pretty sure it's a solid
> solution.
>
> The key here is unifying the two input sources (the GUI and work queues)
> without blocking on either one of them individually. The value of
> (wq.stuff_todo < 1) becomes a proxy for the value of
> (gui_queue_empty(...) && work_queue_empty(...)).
>
> I hope that helps,
> Brent
>


--
Andriy Gapon


------------------------------

Message: 17
Date: Thu, 15 May 2008 17:23:45 -0500 (CDT)
From: Brent Casavant <b.j.casavant@ieee.org>
Subject: Re: thread scheduling at mutex unlock
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, freebsd-threads@freebsd.org
Message-ID:
<alpine.BSF.1.10.0805151605230.62691@pkunk.americas.sgi.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 16 May 2008, Andriy Gapon wrote:

> that was just an example. Probably quite a bad example.
> I should limit myself only to the program that I sent and I should repeat that
> the result that it produces is not what I would call reasonably expected. And
> I will repeat that I understand that the behavior is not prohibited by
> standards (well, never letting other threads run is probably not prohibited
> either).

Well, I don't know what to tell you at this point. I believe I
understand the nature of the problem you're encountering, and I
believe there are perfectly workable mechanisms in Pthreads to
allow you to accomplish what you desire without depending on
implementation-specific details. Yes, it's more work on your
part, but if done well it's one-time work.

The behavior you desire is useful only in limited situations,
and can be implemented at the application level through the
use of Pthreads primitives. If Pthreads behaved as you apparently
expect, it would be impossible to implement the current behavior
at the application level.

Queueing mutexes are inappropriate in the majority of code designs.
I'll take your word that it is appropriate in your particular case,
but that does not make it appropriate for more typical designs.

Several solutions have been presented, including one from me. If
you choose not to implement such solutions, then best of luck to you.

OK, I'm a sucker for punishment. So use this instead of Pthreads
mutexes. This should work on both FreeBSD and Linux (FreeBSD has
some convenience routines in the sys/queue.h package that Linux doesn't):

#include <sys/queue.h>
#include <errno.h>
#include <pthread.h>

struct thread_queue_entry_s {
    TAILQ_ENTRY(thread_queue_entry_s) tqe_list;
    pthread_cond_t tqe_cond;
    pthread_mutex_t tqe_mutex;
    int tqe_wakeup;
};
TAILQ_HEAD(thread_queue_s, thread_queue_entry_s);

typedef struct {
    struct thread_queue_s qm_queue;
    pthread_mutex_t qm_queue_lock;
    unsigned int qm_users;      /* owner plus waiters */
} queued_mutex_t;

int
queued_mutex_init(queued_mutex_t *qm) {
    TAILQ_INIT(&qm->qm_queue);
    qm->qm_users = 0;
    return pthread_mutex_init(&qm->qm_queue_lock, NULL);
}

int
queued_mutex_lock(queued_mutex_t *qm) {
    /* Lives on this thread's stack; valid until we return. */
    struct thread_queue_entry_s waiter;

    pthread_mutex_lock(&qm->qm_queue_lock);
    qm->qm_users++;
    if (1 == qm->qm_users) {
        /* Nobody was waiting for mutex, we own it.
         * Fast path out.
         */
        pthread_mutex_unlock(&qm->qm_queue_lock);
        return 0;
    }

    /* There are others waiting for the mutex. Slow path. */

    /* Initialize this thread's wait structure */
    pthread_cond_init(&waiter.tqe_cond, NULL);
    pthread_mutex_init(&waiter.tqe_mutex, NULL);
    pthread_mutex_lock(&waiter.tqe_mutex);
    waiter.tqe_wakeup = 0;

    /* Add this thread's wait structure to queue */
    TAILQ_INSERT_TAIL(&qm->qm_queue, &waiter, tqe_list);
    pthread_mutex_unlock(&qm->qm_queue_lock);

    /* Wait for somebody to hand the mutex to us */
    while (!waiter.tqe_wakeup) {
        pthread_cond_wait(&waiter.tqe_cond,
                          &waiter.tqe_mutex);
    }

    /* Destroy this thread's wait structure */
    pthread_mutex_unlock(&waiter.tqe_mutex);
    pthread_mutex_destroy(&waiter.tqe_mutex);
    pthread_cond_destroy(&waiter.tqe_cond);

    /* We own the queued mutex (handed to us by unlock) */
    return 0;
}

int
queued_mutex_unlock(queued_mutex_t *qm) {
    struct thread_queue_entry_s *waiter;

    pthread_mutex_lock(&qm->qm_queue_lock);
    qm->qm_users--;
    if (0 == qm->qm_users) {
        /* No waiters to wake up. Fast path out. */
        pthread_mutex_unlock(&qm->qm_queue_lock);
        return 0;
    }

    /* Wake up first waiter. Slow path. */

    /* Remove the first waiting thread. */
    waiter = TAILQ_FIRST(&qm->qm_queue);
    TAILQ_REMOVE(&qm->qm_queue, waiter, tqe_list);
    pthread_mutex_unlock(&qm->qm_queue_lock);

    /* Wake up the thread. */
    pthread_mutex_lock(&waiter->tqe_mutex);
    waiter->tqe_wakeup = 1;
    pthread_cond_signal(&waiter->tqe_cond);
    pthread_mutex_unlock(&waiter->tqe_mutex);

    return 0;
}

int
queued_mutex_destroy(queued_mutex_t *qm) {
    pthread_mutex_lock(&qm->qm_queue_lock);
    if (qm->qm_users > 0) {
        /* Still owned or waited on. */
        pthread_mutex_unlock(&qm->qm_queue_lock);
        return EBUSY;
    }
    /* Unlock first: destroying a locked mutex is undefined. */
    pthread_mutex_unlock(&qm->qm_queue_lock);
    return pthread_mutex_destroy(&qm->qm_queue_lock);
}

These queued_mutex_t mutexes should have the behavior you're looking
for, and will be portable to any platform with Pthreads and sys/queue.h.
Be warned that I haven't compiled, run, or debugged this, but the
code should be pretty solid (typos aside). Of course, in production
code I'd check a bunch of return values, but those would just get in
the way of this illustration.

So use something like this or change the application's threading model
(like my previous post showed). There's no use complaining about
the Pthreads implementation in this regard because your application's
use of mutexes is the exception, not the rule. The fact that Linux
behaves as you expect is irrelevant, as POSIX doesn't speak to this
facet of implementation, so both Linux and BSD are correct. Relying
on this behavior in Linux is ill-advised as it is non-portable, and
likely to break in future releases.
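
For comparison, here is a smaller sketch of the same first-come, first-served hand-off built on a ticket counter instead of a per-waiter queue. It is a different technique from the queued mutex above, and every name in it (ticket_lock_t, ticket_lock_init, ticket_lock_acquire, ticket_lock_release) is made up for this illustration. It trades the per-waiter condition variables for a single one that is broadcast on every release, so it is simpler but wakes more threads than strictly necessary.

```c
#include <pthread.h>

/* Hypothetical FIFO lock: each caller takes a ticket and waits its turn. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  turn;
    unsigned long   next_ticket;   /* next ticket to hand out */
    unsigned long   now_serving;   /* ticket that currently owns the lock */
} ticket_lock_t;

int
ticket_lock_init(ticket_lock_t *t)
{
    int rc;

    t->next_ticket = 0;
    t->now_serving = 0;
    rc = pthread_mutex_init(&t->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&t->turn, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&t->lock);
    return rc;
}

void
ticket_lock_acquire(ticket_lock_t *t)
{
    unsigned long my_ticket;

    pthread_mutex_lock(&t->lock);
    my_ticket = t->next_ticket++;
    /* Sleep until every earlier ticket holder has released. */
    while (my_ticket != t->now_serving)
        pthread_cond_wait(&t->turn, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

void
ticket_lock_release(ticket_lock_t *t)
{
    pthread_mutex_lock(&t->lock);
    t->now_serving++;
    /* Broadcast: only the thread holding the matching ticket proceeds. */
    pthread_cond_broadcast(&t->turn);
    pthread_mutex_unlock(&t->lock);
}
```

A thread that releases and immediately re-acquires draws a new, higher ticket, so it cannot overtake a thread that was already waiting.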

Brent

--
Brent Casavant Dance like everybody should be watching.
www.angeltread.org
KD5EMB, EN34lv


------------------------------

Message: 18
Date: Thu, 15 May 2008 15:37:00 -0700
From: "David Schwartz" <davids@webmaster.com>
Subject: RE: thread scheduling at mutex unlock
To: <avg@icyb.net.ua>
Cc: freebsd-stable@freebsd.org, freebsd-threads@freebsd.org
Message-ID: <MDEHLPKNGKAHNMBLJOLKIEMOMKAC.davids@webmaster.com>
Content-Type: text/plain; charset="UTF-8"


> David,

> thank you for the tutorial, it is quite enlightening.
> But first of all, did you take a look at my small test program?

Yes. It demonstrates the classic example of mutex abuse. A mutex is not an appropriate synchronization mechanism when it's going to be held most of the time and released briefly.

> There are 1 second sleeps in it, this is not about timeslices and
> scheduling at that level at all. This is about basic expectation about
> fairness of acquiring a lock at macro level. I know that when one thread
> acquires and releases and reacquires a mutex during 10 seconds while the
> other thread is blocked on that mutex for 10 seconds, then this is not
> about timeslices.

I guess it comes down to what your test program is supposed to test. Threading primitives can always be made to look bad in toy test programs that don't even remotely reflect real-world use cases. No sane person optimizes for such toys.

The reason your program behaves the way it does is that the thread that holds the mutex relinquishes the CPU while it holds it. As such, it appears to be very nice and its dynamic priority level rises. In a real-world case, the threads waiting for the mutex will have their priorities rise while the thread holding the mutex will use up its timeslice working.

This is simply not appropriate use of a mutex. It would be absolute foolishness to encumber the platform's default mutex implementation with any attempt to make such abuses do more what you happen to expect them to do.

In fact, the behavior I expect is the behavior seen. So the defect is in your unreasonable expectations. The scheduler's goal is to allow the running thread to make forward progress, and it does this perfectly.

DS


------------------------------

Message: 19
Date: Fri, 16 May 2008 00:25:26 +0100
From: "Dr Josef Karthauser" <joe@tao.org.uk>
Subject: RE: cvsup.uk.FreeBSD.org
To: "'Tony Finch'" <dot@dotat.at>
Cc: freebsd-hubs@freebsd.org, freebsd-stable@freebsd.org
Message-ID: <07f101c8b6e2$f4b13f20$de13bd60$@org.uk>
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: owner-freebsd-hubs@freebsd.org [mailto:owner-freebsd-
> hubs@freebsd.org] On Behalf Of Tony Finch
> Sent: 12 May 2008 17:06
> To: Dr Joe Karthauser
> Cc: freebsd-hubs@freebsd.org; freebsd-stable@freebsd.org
> Subject: Re: cvsup.uk.FreeBSD.org
>
> On Sun, 11 May 2008, Dr Joe Karthauser wrote:
> >
> > I have reclassified this faulty mirror as cvsup1 and made cvsup a
> cname to
> > cvsup3, which is the most recent addition and best hardware
> available. In
> > the future we will always point to the most available machine in this
> way.
>
> Looks like I'm getting a bit more traffic than before - peaking at over
> 100 logins an hour.

As a matter of interest, do you know what the peak bandwidth usage is?

Joe


------------------------------

Message: 20
Date: Thu, 15 May 2008 16:44:57 -0700
From: "Rob Lytle" <jan6146@gmail.com>
Subject: today's build is causing errors for me
To: freebsd-stable@freebsd.org
Message-ID:
<784966050805151644m43b9a5e1qe8a4568576fc081f@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

First I am running 7.0-Stable and just cvsup'd today. Then I built the
system, new kernel, and installed them. Second I am using the GENERIC
KERNEL. Sysctl.conf is empty. I will put my /etc/rc.conf at the end. I
tried to do a very careful job of merging /usr/src/etc with /etc. I didn't
touch any files that I or the computer configured.

But I am getting these errors upon bootup:

1. eval: /etc/rc.d/cleanvar: Permission Denied
2. syslogd: bind: Can't assign requested address. (repeated twice)
3. syslogd: child pid 134 exited with return code 1
4. /etc/rc: Warning: Dump device does not exist. Savecore will not
run. (this always worked before)
5. /etc/rc.d/securelevel: /etc/rc.d/sysctl: Permission denied.
6. My computer says "Amnesiac" yet the host name is clearly in rc.conf
7. My WiFi no longer starts up by itself. I have to do it all manually
using ifconfig and dhclient.

Any help would be appreciated. I'm kind of lost as some of it makes no
sense to me, esp #6 and 7.
Has the default rc.conf format changed???

Thanks,

Sincerely, Rob

------------------------------------------
My rc.conf file

# -- sysinstall generated deltas -- # Sun Oct 28 11:36:26 2007
# Created: Sun Oct 28 11:36:26 2007
# Enable network daemons for user convenience.
# Please make all changes to this file, not to /etc/defaults/rc.conf.
# This file now contains just the overrides from /etc/defaults/rc.conf.
hostname="xenon" # "" for DHCP
linux_enable="YES"
#moused_enable="YES"
sshd_enable="YES"
usbd_enable="YES"
lpd_enable="YES"
kern_securelevel_enable="NO"
dumpdev="AUTO"
dumpdir="/var/crash"
cron_enable="YES"
performance_cx_lowest="LOW"
performance_cpu_freq="HIGH"
economy_cx_lowest="LOW"
economy_cpu_freq="LOW"
ipfilter_enable="YES"
ipfilter_rules="/etc/ipfw.rules"
ipmon_enable="YES"
ipmon_flags="-Ds"
watchdogd_enable="YES"
powerd_enable="YES"
mixer_enable="YES"

#ifconfig_ath0="WPA DHCP channel 3"
#ifconfig_msk0="DHCP"
ifconfig_ath0="DHCP ssid leighmorlock channel 6"
# added by mergebase.sh
local_startup="/usr/local/etc/rc.d"


------------------------------

Message: 21
Date: Fri, 16 May 2008 13:56:49 +1000 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
Subject: Re: how much memory does increasing max rules for IPFW take
up?
To: Jeremy Chadwick <koitsu@freebsd.org>
Cc: Vivek Khera <vivek@khera.org>, "Andrey V. Elsukov"
<bu7cher@yandex.ru>, "Bruce M. Simpson" <bms@freebsd.org>,
freebsd-stable@freebsd.org, freebsd-ipfw@freebsd.org
Message-ID:
<Pine.BSF.3.96.1080516121446.12512A-100000@gaia.nimnet.asn.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 15 May 2008, Jeremy Chadwick wrote:
> On Thu, May 15, 2008 at 11:03:53AM +0100, Bruce M. Simpson wrote:
> > Andrey V. Elsukov wrote:
> >> Vivek Khera wrote:
> >>> I had a box run out of dynamic state space yesterday. I found I can
> >>> increase the number of dynamic rules by increasing the sysctl parameter
> >>> net.inet.ip.fw.dyn_max. I can't find, however, how this affects memory
> >>> usage on the system. Is it dynamically allocated and de-allocated, or
> >>> is it a static memory buffer?
> >>
> >> Each dynamic rule is allocated dynamically. Be careful: too many
> >> dynamic rules will work very slowly.
> >
> > Got any figures for this? I took a quick glance and it looks like it just
> > uses a hash over dst/src/dport/sport. If there are a lot of raw IP or ICMP
> > flows then that's going to result in hash collisions.
> >
> > It might be a good project for someone to optimize if it isn't scaling for
> > folk. "Bloomier" filters are probably worth a look -- bloom filters are a
> > class of probabilistic hash which may return a false positive, "bloomier"
> > filters are a refinement which tries to limit the false positives.
> >
> > Having said that the default tunable of 256 state entries is probably quite
> > low for use cases other than "home/small office NAT gateway".
>
> It's far too low for home/small office. Standard Linux NAT routers,
> such as the Linksys WRT54G/GL, come with a default state table count of
> 2048, and often is increased by third-party firmwares to 8192 based on
> justified necessity. Search for "conntrack" below:
>
> http://www.polarcloud.com/firmware

>
> 256 can easily be exhausted by more than one user loading multiple HTTP
> 1.0 web pages at one time (as is the case now that many users have
> browsers that load 7-8 web pages into separate tabs during startup).
>
> And if that's not enough reason, consider torrents, which is quite often
> what results in a home or office router exhausting its state table.
>
> Bottom line: the 256 default is too low. It needs to be increased to at
> least 2048.

I think there may be some confusion in terms. Looking at defaults on my
older 5.5 system - sure, call it a "home/small office NAT gateway":

net.inet.ip.fw.dyn_buckets: 256
net.inet.ip.fw.curr_dyn_buckets: 256
net.inet.ip.fw.dyn_count: 212
net.inet.ip.fw.dyn_max: 4096
net.inet.ip.fw.static_count: 153

What defaults to 256 is the number of hash table buckets, not the max
number of dynamic rules, here 4096 (though the 5.5 manual says 8192).

On hash collisions, a linked list is used for duplicate hashes of:

i = (id->dst_ip) ^ (id->src_ip) ^ (id->dst_port) ^ (id->src_port);
i &= (curr_dyn_buckets - 1);

So while 256 may well be too few buckets for many systems, and like
Bruce I wonder about the effectiveness of the xor hash for raw IP & ICMP
and wouldn't mind seeing some stats on bucket use vs linked list lengths
for various workloads, it doesn't determine the max no. of dynamic rules
available, which is adjustable up without any apparent static memory
allocation, and is moderated by the various expiry timeout sysctls.
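
A minimal sketch of the hash quoted above, written as a standalone function so its behaviour can be tried in userland. The function name dyn_hash and its signature are illustrative assumptions and not the kernel's; only the XOR-then-mask logic is taken from the two lines above.

```c
#include <stdint.h>

/*
 * Illustrative only: XOR the four flow fields together, then mask the
 * result down to a bucket index. The mask acts as a modulo only when
 * buckets is a power of two (256 in the sysctl output above).
 */
uint32_t
dyn_hash(uint32_t dst_ip, uint32_t src_ip,
         uint16_t dst_port, uint16_t src_port, uint32_t buckets)
{
    uint32_t i = dst_ip ^ src_ip ^ dst_port ^ src_port;

    return i & (buckets - 1);
}
```

Because XOR is symmetric, both directions of a flow land in the same bucket. If the port fields are zero (which I assume would be the case for flows without ports), the index with 256 buckets depends only on the low 8 bits of the XOR of the two addresses, so hosts whose addresses differ only in higher bits would all share a bucket.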

For reference, I admin a 4.8 filtering bridge with up to 20 boxes behind
it, that has only very rarely reported exceeding the max no. of dynamic
rules with the (4.8) default net.inet.ip.fw.dyn_max of 1000 .. however
it only keeps state for UDP connections (and yes, it only ever hits that
limit on torrents or skype, which are generally admin. prohib. :)

cheers, Ian (not subscribed to -ipfw)

------------------------------

Message: 22
Date: Fri, 16 May 2008 08:33:11 +0400
From: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Subject: Re: how much memory does increasing max rules for IPFW take
up?
To: "Bruce M. Simpson" <bms@FreeBSD.org>
Cc: Vadim Goncharov <vadim_nuclight@mail.ru>, Vivek Khera
<vivek@khera.org>, FreeBSD Stable <freebsd-stable@freebsd.org>,
freebsd-ipfw@freebsd.org
Message-ID: <482D0E87.6000003@yandex.ru>
Content-Type: text/plain; charset=KOI8-R; format=flowed

Bruce M. Simpson wrote:
> Got any figures for this? I took a quick glance and it looks like it
> just uses a hash over dst/src/dport/sport. If there are a lot of raw IP
> or ICMP flows then that's going to result in hash collisions.

It's my guess; I don't have any figures.
Yes, hash collisions will trigger many searches through the bucket
lists. And increasing only dyn_max without increasing dyn_buckets will
increase collisions.
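
To put a rough number on that, a small sketch: the mean list length per bucket if the hash spread entries perfectly evenly. The function name is hypothetical, and even spreading is an idealization, so real lists may well be longer.

```c
/*
 * Mean entries per bucket under a perfectly uniform hash (an
 * idealization). Returns 0.0 when buckets is zero to avoid dividing
 * by zero.
 */
double
mean_chain_length(unsigned int dyn_rules, unsigned int buckets)
{
    return buckets ? (double)dyn_rules / (double)buckets : 0.0;
}
```

With the values quoted elsewhere in this digest (dyn_max of 4096 and 256 buckets), a full table averages 16 entries per bucket; raising dyn_max to 8192 while leaving the bucket count alone doubles that to 32.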

> It might be a good project for someone to optimize if it isn't scaling
> for folk. "Bloomier" filters are probably worth a look -- bloom filters
> are a class of probabilistic hash which may return a false positive,
> "bloomier" filters are a refinement which tries to limit the false
> positives.

There were some ideas from Vadim Goncharov about rewriting dynamic
rules implementation..

--
WBR, Andrey V. Elsukov


------------------------------

Message: 23
Date: Thu, 15 May 2008 21:34:12 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
Subject: Re: today's build is causing errors for me
To: Rob Lytle <jan6146@gmail.com>
Cc: freebsd-stable@freebsd.org
Message-ID: <20080516043411.GA44491@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Thu, May 15, 2008 at 04:44:57PM -0700, Rob Lytle wrote:
> First I am running 7.0-Stable and just cvsup'd today. Then I built the
> system, new kernel, and installed them. Second I am using the GENERIC
> KERNEL. Sysctl.conf is empty. I will put my /etc/rc.conf at the end. I
> tried to do a very careful job of merging /usr/src/etc with /etc. I didn't
> touch any files that I or the computer configured.
>
> But I am getting these errors upon bootup:
>
> 1. eval: /etc/rc.d/cleanvar: Permission Denied
> 2. syslogd: bind: Can't assign requested address. (repeated twice)
> 3. syslogd: child pid 134 exited with return code 1
> 4. /etc/rc: Warning: Dump device does not exist. Savecore will not
> run. (this always worked before)
> 5. /etc/rc.d/securelevel: /etc/rc.d/sysctl: Permission denied.
> 6. My computer says "Amnesiac" yet the host name is clearly in rc.conf
> 7. My WiFi no longer starts up by itself. I have to do it all manually
> using ifconfig and dhclient.

You should have followed the instructions in /usr/src/Makefile. You
don't "merge things by hand". You can use mergemaster for that. Please
use it, as I'm willing to bet there's a portion of your rc framework
which is broken in some way.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 24
Date: Thu, 15 May 2008 23:04:34 -0700
From: "Rob Lytle" <jan6146@gmail.com>
Subject: Re: today's build is causing errors for me
To: "Jeremy Chadwick" <koitsu@freebsd.org>
Cc: freebsd-stable@freebsd.org
Message-ID:
<784966050805152304t1827c53bi6988ae9c8f837f32@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Jeremy,

I used Mergemaster. That's what I meant when I said that I carefully
"merged" /usr/src/etc/ with /etc. But like I said, no files were
replaced that contained my own configuration, e.g. group. I will say
this- that I have always considered Mergemaster a confusing mess,
despite the dogma on the lists. I have been running FreeBSD and
installing it since 1998, so I have some experience- but this is new
behavior beyond my previous experiences.

Sincerely, Rob.

On 5/15/08, Jeremy Chadwick <koitsu@freebsd.org> wrote:
> On Thu, May 15, 2008 at 04:44:57PM -0700, Rob Lytle wrote:
>> First I am running 7.0-Stable and just cvsup'd today. Then I built the
>> system, new kernel, and installed them. Second I am using the GENERIC
>> KERNEL. Sysctl.conf is empty. I will put my /etc/rc.conf at the end. I
>> tried to do a very careful job of merging /usr/src/etc with /etc. I
>> didn't
>> touch any files that I or the computer configured.
>>
>> But I am getting these errors upon bootup:
>>
>> 1. eval: /etc/rc.d/cleanvar: Permission Denied
>> 2. syslogd: bind: Can't assign requested address. (repeated twice)
>> 3. syslogd: child pid 134 exited with return code 1
>> 4. /etc/rc: Warning: Dump device does not exist. Savecore will not
>> run. (this always worked before)
>> 5. /etc/rc.d/securelevel: /etc/rc.d/sysctl: Permission denied.
>> 6. My computer says "Amnesiac" yet the host name is clearly in rc.conf
>> 7. My WiFi no longer starts up by itself. I have to do it all manually
>> using ifconfig and dhclient.
>
> You should have followed the instructions in /usr/src/Makefile. You
> don't "merge things by hand". You can use mergemaster for that. Please
> use it, as I'm willing to bet there's a portion of your rc framework
> which is broken in some way.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP: 4BD6C0CB |
>
>


------------------------------

Message: 25
Date: Thu, 15 May 2008 23:17:24 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
Subject: Re: today's build is causing errors for me
To: Rob Lytle <jan6146@gmail.com>
Cc: freebsd-stable@freebsd.org
Message-ID: <20080516061724.GA47953@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Thu, May 15, 2008 at 11:04:34PM -0700, Rob Lytle wrote:
> Hi Jeremy,
>
> I used Mergemaster. That's what I meant when I said that I carefully
> "merged" /usr/src/etc/ with /etc. But like I said, no files were
> replaced that contained my own configuration, e.g. group. I will say
> this- that I have always considered Mergemaster a confusing mess,
> despite the dogma on the lists. I have been running FreeBSD and
> installing it since 1998, so I have some experience- but this is new
> behavior beyond my previous experiences.

"Permission denied" could imply that the rc scripts aren't set to
executable. Possibly a umask problem?

Additionally, mergemaster isn't a confusing mess. If anything, it's one
of the most simple tools there is for managing /etc. The part you
probably find "confusing", which is the same part I did when I started
using it, is the side-by-side interactive diff. It's very easy to use;
"r" means use the text shown on the right, and "l" means use the text
shown on the left.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 26
Date: Thu, 15 May 2008 23:22:29 -0700
From: "Rob Lytle" <jan6146@gmail.com>
Subject: Re: today's build is causing errors for me / Fixed for now
To: "Jeremy Chadwick" <koitsu@freebsd.org>
Cc: freebsd-stable@freebsd.org
Message-ID:
<784966050805152322n5e03b67bhc29d912860c0c3c3@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Jeremy,

I always back up /etc before I upgrade the system. The old /etc works just
fine. I will eventually go back in and check out the new /etc to see what
is wrong.

Sincerely, Rob.

On Thu, May 15, 2008 at 11:17 PM, Jeremy Chadwick <koitsu@freebsd.org>
wrote:

> On Thu, May 15, 2008 at 11:04:34PM -0700, Rob Lytle wrote:
> > Hi Jeremy,
> >
> > I used Mergemaster. That's what I meant when I said that I carefully
> > "merged" /usr/src/etc/ with /etc. But like I said, no files were
> > replaced that contained my own configuration, e.g. group. I will say
> > this- that I have always considered Mergemaster a confusing mess,
> > despite the dogma on the lists. I have been running FreeBSD and
> > installing it since 1998, so I have some experience- but this is new
> > behavior beyond my previous experiences.
>
> "Permission denied" could imply that the rc scripts aren't set to
> executable. Possibly a umask problem?
>
> Additionally, mergemaster isn't a confusing mess. If anything, it's one
> of the most simple tools there is for managing /etc. The part you
> probably find "confusing", which is the same part I did when I started
> using it, is the side-by-side interactive diff. It's very easy to use;
> "r" means use the text shown on the right, and "l" means use the text
> shown on the left.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP: 4BD6C0CB |
>
>


------------------------------

Message: 27
Date: Fri, 16 May 2008 00:02:47 -0700
From: "Rob Lytle" <jan6146@gmail.com>
Subject: Re: today's build is causing errors for me
To: "Jeremy Chadwick" <koitsu@freebsd.org>
Cc: freebsd-stable@freebsd.org
Message-ID:
<784966050805160002g63defc51j11465f0d51025a0d@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Jeremy,

You were correct. Somehow some files in /etc/rc.d had permissions of 644.
Setting the new permissions to that of the old fixed the problem. Thanks.

Rob.

On Thu, May 15, 2008 at 11:17 PM, Jeremy Chadwick <koitsu@freebsd.org>
wrote:

> On Thu, May 15, 2008 at 11:04:34PM -0700, Rob Lytle wrote:
> > Hi Jeremy,
> >
> > I used Mergemaster. That's what I meant when I said that I carefully
> > "merged" /usr/src/etc/ with /etc. But like I said, no files were
> > replaced that contained my own configuration, e.g. group. I will say
> > this- that I have always considered Mergemaster a confusing mess,
> > despite the dogma on the lists. I have been running FreeBSD and
> > installing it since 1998, so I have some experience- but this is new
> > behavior beyond my previous experiences.
>
> "Permission denied" could imply that the rc scripts aren't set to
> executable. Possibly a umask problem?
>
> Additionally, mergemaster isn't a confusing mess. If anything, it's one
> of the most simple tools there is for managing /etc. The part you
> probably find "confusing", which is the same part I did when I started
> using it, is the side-by-side interactive diff. It's very easy to use;
> "r" means use the text shown on the right, and "l" means use the text
> shown on the left.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, USA |
> | Making life hard for others since 1977. PGP: 4BD6C0CB |
>
>


------------------------------

Message: 28
Date: Fri, 16 May 2008 00:13:19 -0700
From: "Rob Lytle" <jan6146@gmail.com>
Subject: just one last question about /etc/rc.d file permissions
To: "Jeremy Chadwick" <koitsu@freebsd.org>
Cc: freebsd-stable@freebsd.org
Message-ID:
<784966050805160013w7d3ec31euc95be9e4a0a68fa9@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Jeremy,

I noticed that almost all of the files in my old /etc/rc.d had 555
permissions. There were 4 or 5 that had 644 permissions in my old
/etc/rc.d. What I am wondering is whether all the files in rc.d should
be 555. So far I am not experiencing any problems with the few files
that are 644.

Thanks, Rob.

On 5/16/08, Rob Lytle <jan6146@gmail.com> wrote:
> Hi Jeremy,
>
> You were correct. Somehow some files in /etc/rc.d had permissions of 644.
> Setting the new permissions to that of the old fixed the problem. Thanks.
>
> Rob.
>


------------------------------

Message: 29
Date: Fri, 16 May 2008 17:39:32 +1000
From: Emil Mikulic <emikulic@gmail.com>
Subject: ciss(4) not coping with large arrays?
To: freebsd-stable@freebsd.org
Message-ID: <20080516073932.GA39803@dmr.ath.cx>
Content-Type: text/plain; charset=us-ascii

Hi all,

Running today's RELENG_7 (although 7.0-RELEASE has the same problem),
GENERIC kernel on an amd64 and I can't seem to get a da(4) device for
any arrays bigger than 2TB.

dmesg:
<...>
ciss0: <HP Smart Array P400> port 0x4000-0x40ff mem 0xfdf00000-0xfdffffff,0xfdef0000-0xfdef0fff irq 16 at device 0.0 on pci10
ciss0: [ITHREAD]
<...>
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 953837MB (1953459632 512 byte sectors: 255H 32S/T 65535C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
da1: 135.168MB/s transfers
da1: 953837MB (1953459632 512 byte sectors: 255H 32S/T 65535C)
da2 at ciss0 bus 0 target 2 lun 0
da2: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
da2: 135.168MB/s transfers
da2: 1907675MB (3906918832 512 byte sectors: 255H 32S/T 65535C)
(da3:ciss0:0:3:0): got CAM status 0x4
(da3:ciss0:0:3:0): fatal error, failed to attach to device
(da3:ciss0:0:3:0): lost device
(da3:ciss0:0:3:0): removing device entry
(da4:ciss0:0:4:0): got CAM status 0x4
(da4:ciss0:0:4:0): fatal error, failed to attach to device
(da4:ciss0:0:4:0): lost device
(da4:ciss0:0:4:0): removing device entry
<...>

The arrays I'm testing with:
da1 = 1 x 1TB
da2 = 2 x 1TB
da3 = 3 x 1TB
da4 = 4 x 1TB
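
As a rough sanity check, not a diagnosis, the sector counts from the
dmesg output can be compared against the number of 512-byte sectors
that a 32-bit block address can cover (2^32 sectors, i.e. 2 TiB). The
counts for da1 and da2 are taken from the output above; the figures
for da3 and da4 are assumptions (3x and 4x the da1 count), since the
kernel never reported their sizes.

```python
SECTOR_SIZE = 512  # bytes per sector, as reported by dmesg


def size_mb(sectors):
    """Capacity in MB (2**20 bytes), rounded down."""
    return sectors * SECTOR_SIZE // (1024 * 1024)


def fits_32bit_lba(sectors):
    """True if the last block address fits in an unsigned 32-bit field."""
    return sectors - 1 <= 0xFFFFFFFF


DA1_SECTORS = 1953459632        # from dmesg
DA2_SECTORS = 3906918832        # from dmesg
DA3_ESTIMATE = 3 * DA1_SECTORS  # assumption: size was never reported
DA4_ESTIMATE = 4 * DA1_SECTORS  # assumption: size was never reported

for name, sectors in [
    ("da1", DA1_SECTORS),
    ("da2", DA2_SECTORS),
    ("da3 (estimate)", DA3_ESTIMATE),
    ("da4 (estimate)", DA4_ESTIMATE),
]:
    print(name, sectors, size_mb(sectors), fits_32bit_lba(sectors))
```

Under these assumptions da1 and da2 stay below the 32-bit limit while
the da3 and da4 estimates exceed it, which matches the set of devices
that attach and the set that fail.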

Also:
# camcontrol devlist
<COMPAQ RAID 1 VOLUME OK> at scbus0 target 0 lun 0 (pass0,da0)
<COMPAQ RAID 0 VOLUME OK> at scbus0 target 1 lun 0 (pass1,da1)
<COMPAQ RAID 0 VOLUME OK> at scbus0 target 2 lun 0 (pass2,da2)
<COMPAQ RAID 0 VOLUME OK> at scbus0 target 3 lun 0 (pass3)
<COMPAQ RAID 0 VOLUME OK> at scbus0 target 4 lun 0 (pass4)
# camcontrol readcap pass2
Last Block: 3906918831, Block Length: 512 bytes
# camcontrol readcap pass3
(pass3:ciss0:0:3:0): SERVICE ACTION IN(16). CDB: 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0
(pass3:ciss0:0:3:0): CAM Status: CCB request completed with an error
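
The CDB bytes in the failed request can be read by hand. The sketch
below splits them up using the 16-byte SERVICE ACTION IN(16) layout as
I understand it from the SCSI block command set (opcode, service
action, 8-byte block address, 4-byte allocation length, control); the
function name is illustrative. It decodes the request only and says
nothing about why the controller rejected it.

```python
def decode_service_action_in16(cdb_hex):
    """Split a 16-byte SERVICE ACTION IN(16) CDB into its fields.

    `cdb_hex` is the space-separated hex string printed by camcontrol.
    """
    cdb = bytes(int(tok, 16) for tok in cdb_hex.split())
    if len(cdb) != 16:
        raise ValueError("expected a 16-byte CDB, got %d bytes" % len(cdb))
    return {
        "opcode": cdb[0],                                # 0x9e
        "service_action": cdb[1] & 0x1F,                 # low 5 bits
        "lba": int.from_bytes(cdb[2:10], "big"),         # bytes 2-9
        "allocation_length": int.from_bytes(cdb[10:14], "big"),
        "control": cdb[15],
    }


fields = decode_service_action_in16("9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0")
for key, value in fields.items():
    print(f"{key}: {value:#x}")
```

Service action 0x10 under opcode 0x9e is READ CAPACITY(16), the
variant that returns a 64-bit last block address, so the request that
fails here is the one needed to size a volume beyond the 32-bit limit.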

Is it possible to get FreeBSD to recognize arrays > 2TB?
Are there any further diagnostics I can provide?

--Emil


------------------------------

Message: 30
Date: Fri, 16 May 2008 09:50:10 +0200
From: "Claus Guttesen" <kometen@gmail.com>
Subject: Re: ciss(4) not coping with large arrays?
To: "Emil Mikulic" <emikulic@gmail.com>
Cc: freebsd-stable@freebsd.org
Message-ID:
<b41c75520805160050l3c9acc4fx94bfabccb1f4d1d3@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> Running today's RELENG_7 (although 7.0-RELEASE has the same problem),
> GENERIC kernel on an amd64 and I can't seem to get a da(4) device for
> any arrays bigger than 2TB.

In earlier releases (5 and 6 at least) you couldn't create partitions
larger than 2 TB. I don't know whether work has been done to circumvent
this in 7, but tools like fsck have to be changed as well. Have you
tried ZFS?

> dmesg:
> <...>
> ciss0: <HP Smart Array P400> port 0x4000-0x40ff mem 0xfdf00000-0xfdffffff,0xfdef0000-0xfdef0fff irq 16 at device 0.0 on pci10
> ciss0: [ITHREAD]
> <...>
> da0 at ciss0 bus 0 target 0 lun 0
> da0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-5 device
> da0: 135.168MB/s transfers
> da0: 953837MB (1953459632 512 byte sectors: 255H 32S/T 65535C)
> da1 at ciss0 bus 0 target 1 lun 0
> da1: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
> da1: 135.168MB/s transfers
> da1: 953837MB (1953459632 512 byte sectors: 255H 32S/T 65535C)
> da2 at ciss0 bus 0 target 2 lun 0
> da2: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
> da2: 135.168MB/s transfers
> da2: 1907675MB (3906918832 512 byte sectors: 255H 32S/T 65535C)
> (da3:ciss0:0:3:0): got CAM status 0x4
> (da3:ciss0:0:3:0): fatal error, failed to attach to device
> (da3:ciss0:0:3:0): lost device
> (da3:ciss0:0:3:0): removing device entry
> (da4:ciss0:0:4:0): got CAM status 0x4
> (da4:ciss0:0:4:0): fatal error, failed to attach to device
> (da4:ciss0:0:4:0): lost device
> (da4:ciss0:0:4:0): removing device entry
> <...>
>
> The arrays I'm testing with:
> da1 = 1 x 1TB
> da2 = 2 x 1TB
> da3 = 3 x 1TB
> da4 = 4 x 1TB
>
> Also:
> # camcontrol devlist
> <COMPAQ RAID 1 VOLUME OK> at scbus0 target 0 lun 0 (pass0,da0)
> <COMPAQ RAID 0 VOLUME OK> at scbus0 target 1 lun 0 (pass1,da1)
> <COMPAQ RAID 0 VOLUME OK> at scbus0 target 2 lun 0 (pass2,da2)
> <COMPAQ RAID 0 VOLUME OK> at scbus0 target 3 lun 0 (pass3)
> <COMPAQ RAID 0 VOLUME OK> at scbus0 target 4 lun 0 (pass4)
> # camcontrol readcap pass2
> Last Block: 3906918831, Block Length: 512 bytes
> # camcontrol readcap pass3
> (pass3:ciss0:0:3:0): SERVICE ACTION IN(16). CDB: 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0
> (pass3:ciss0:0:3:0): CAM Status: CCB request completed with an error
>
> Is it possible to get FreeBSD to recognize arrays > 2TB?
> Are there any further diagnostics I can provide?
>
> --Emil

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


------------------------------

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

End of freebsd-stable Digest, Vol 252, Issue 8
**********************************************
