From dgaudet-list-new-httpd@arctic.org Tue May  9 00:35:49 2000
Reply-To: new-httpd@apache.org
Date: Mon, 24 Apr 2000 19:18:48 -0700 (PDT)
From: dean gaudet <dgaudet-list-new-httpd@arctic.org>
To: new-httpd@apache.org
Subject: aligned copies
X-comment: visit http://arctic.org/~dean/legal for information regarding
    copyright and disclaimer.

anyone feel like benchmarking with/without the below patch?  it can
probably be dropped into 2.0 without much effort.

the 32 is a guess... might try with 8, 16, and 64.  4 is essentially a
waste 'cause the shortest padding header you can add is 4 bytes long.
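
for anyone picking a value to test, here is a rough sketch of the padding
arithmetic in isolation.  pad_header_len() is a hypothetical helper, not an
apache function; it assumes the padding header has the form "X:" plus filler
plus CRLF, and that header_bytes excludes the final empty line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Apache): returns how many bytes of
 * "X:XXX...\r\n" padding header are needed so that the header block,
 * including the terminating empty line, is a multiple of align.
 * header_bytes counts everything written so far, excluding that final
 * CRLF. */
static size_t pad_header_len(size_t header_bytes, size_t align)
{
    size_t total, pad;

    assert(align > 0);
    total = header_bytes + 2;   /* the terminating empty line */
    if (total % align == 0)
        return 0;               /* already aligned, no padding header */

    pad = 4;                    /* "X:" plus CRLF, the minimum possible */
    total += 4;
    while (total % align != 0) {
        ++pad;                  /* one more 'X' of filler */
        ++total;
    }
    return pad;
}
```

under those assumptions an alignment of 4 always costs 5 to 7 bytes whenever
padding is needed at all, which is consistent with calling 4 a waste.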

-dean

---------- Forwarded message ----------
Date: Mon, 24 Apr 2000 19:00:59 -0700 (PDT)
From: dean gaudet <dgaudet-list-linux-kernel@arctic.org>
To: Artur Skawina <skawina@geocities.com>
Cc: kumon@flab.fujitsu.co.jp, Linus Torvalds <torvalds@transmeta.com>,
     Manfred Spraul <manfreds@colorfullife.com>, linux-kernel@vger.rutgers.edu
Subject: Re: lockless poll() (was Re: namei() query)

On Mon, 24 Apr 2000, Artur Skawina wrote:

> kumon@flab.fujitsu.co.jp wrote:
> > 
> > In the heavy duty case, csum_partial_copy_generic() becomes the new
> > winner of the worst time consuming function with the poll()
> > optimization. We are arranging the global figure  now.
> > 
> > Though csum_partial_copy_generic() is highly optimized with
> > hand-crafted code, it eats lots of time. It may be inevitable, but may
> > be reducible. We are now investigating why it does.
> 
> csum_partial_copy_generic() could certainly be more optimized;
> attached is a snapshot of a version that does up to 20% better in
> dumb benchmarks; whether it makes any difference for real loads
> i haven't yet measured.
> 
> [patch vs 2.3.99pre6pre5, offsets are wrong]
> 

here is a patch against apache-1.3 which forces it to align the length of
its headers to a 32-byte boundary.  if your benchmark is requesting
objects greater than ~4k in size this will cause apache to generate
writev()s such as:

writev(3, [{"HTTP/1.1 200 OK\r\nDate: Tue, 25 A"..., 288},
           {"<!DOCTYPE HTML PUBLIC \"-//W3C//D"..., 32768}], 2) = 33056

which are 32-byte aligned, and sized...
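
the "aligned" half is about where the output buffer starts in memory.  a
minimal standalone sketch of that idea follows, using malloc() rather than
apache's pool allocator; alloc_aligned() and its raw out-parameter are
made-up names for illustration.  it uses uintptr_t for the address
arithmetic, which is the portable choice where long may be narrower than a
pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an Apache API): returns a pointer to at least
 * size usable bytes whose address is a multiple of align.  *raw receives
 * the pointer that must later be passed to free(). */
static char *alloc_aligned(size_t size, size_t align, void **raw)
{
    char *base;

    assert(align > 0);
    base = malloc(size + align);    /* overallocate by one alignment unit */
    *raw = base;
    if (base == NULL)
        return NULL;

    /* Always advances by 1..align bytes, so an already-aligned base is
     * still moved forward by a full align; the overallocation covers it. */
    return base + (align - ((uintptr_t)base % align));
}
```

the returned pointer always lies within the first align bytes past the raw
allocation, so the full size bytes remain usable after it.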

folks have observed a performance boost with other web-servers with this
technique.  i haven't tested it at all, just thought you might want to try
it if you're trying out new csum_partial_copy_generic()s.

-dean

Index: src/main/buff.c
===================================================================
RCS file: /home/cvs/apache-1.3/src/main/buff.c,v
retrieving revision 1.96
diff -u -r1.96 buff.c
--- src/main/buff.c	2000/03/04 20:51:02	1.96
+++ src/main/buff.c	2000/04/25 01:59:00
@@ -390,8 +390,11 @@
 
     /* overallocate so that we can put a chunk trailer of CRLF into this
      * buffer */
-    if (flags & B_WR)
-	fb->outbase = ap_palloc(p, fb->bufsiz + 2);
+    if (flags & B_WR) {
+#define ALIGN (32)
+	fb->outbase = ap_palloc(p, fb->bufsiz + 2 + ALIGN);
+	fb->outbase += ALIGN - ((long)fb->outbase % ALIGN);
+    }
     else
 	fb->outbase = NULL;
 
Index: src/main/http_protocol.c
===================================================================
RCS file: /home/cvs/apache-1.3/src/main/http_protocol.c,v
retrieving revision 1.289
diff -u -r1.289 http_protocol.c
--- src/main/http_protocol.c	2000/02/20 01:14:47	1.289
+++ src/main/http_protocol.c	2000/04/25 01:59:00
@@ -1445,6 +1445,21 @@
     if (bs >= 255 && bs <= 257)
         ap_bputs("X-Pad: avoid browser bug" CRLF, client);
 
+#define ALIGN (32)
+    ap_bgetopt(client, BO_BYTECT, &bs);
+    bs += 2; /* for the final terminating empty line */
+    if (bs % ALIGN) {
+	ap_bputc('X', client);
+	ap_bputc(':', client);
+	bs += 4;	/* 2 for "X:" and 2 for the final CRLF */
+	while (bs % ALIGN) {
+	    ap_bputc('X', client);
+	    ++bs;
+	}
+	ap_bputc('\r', client);
+	ap_bputc('\n', client);
+    }
+
     ap_bputs(CRLF, client);  /* Send the terminating empty line */
 }
 




---------- Forwarded message ----------
Date: Mon, 24 Apr 2000 19:12:24 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
To: dean gaudet <dgaudet-list-linux-kernel@arctic.org>
Cc: Artur Skawina <skawina@geocities.com>, kumon@flab.fujitsu.co.jp,
     Manfred Spraul <manfreds@colorfullife.com>, linux-kernel@vger.rutgers.edu
Subject: Re: lockless poll() (was Re: namei() query)



On Mon, 24 Apr 2000, dean gaudet wrote:
> 
> here is a patch against apache-1.3 which forces it to align the length of
> its headers to a 32-byte boundary.  if your benchmark is requesting
> objects greater than ~4k in size this will cause apache to generate
> writev()s such as:

Nice.

The intel guys already did an experimental (and fairly ugly) patch to make
the kernel try to semi-pad the destination by selecting specific sizes for
the packets sent out over TCP. They claimed a 3% speedup in specweb (or
something) from that. The argument from Dave and Alan was that it should
be done from within the web-server.

		Linus


