From: "Andy Glew" <andy-glew-public@sbcglobal.net>
Newsgroups: comp.arch
References: <3486820e.0305112321.1b4c3b26@posting.google.com>
Subject: Re: Does PowerPC 970 has Tagged TLBs (Address Space Identifiers)
Message-ID: <A5Qva.63$d97.24@newssvr17.news.prodigy.com>
NNTP-Posting-Host: 63.203.205.229
NNTP-Posting-Date: Mon, 12 May 2003 12:37:52 EDT
Organization: Prodigy Internet http://www.prodigy.com
Date: Mon, 12 May 2003 16:37:52 GMT

> As you know a tagged TLB is useful for improving performance in
> context switches so that it's not required to refill the entire TLB.
> When a context switch happens, the new context does not have to refill
> the TLB entries that their ASID is the same as its ASID.
>
> The only processor that has a tagged TLB architecture and I'm aware of
> it is MIPS. Tagged TLBs are key to the success of microkernel based
> OSes.

Tagged TLBs are one way of allowing TLB entries to persist across
context switches.  However, TLBs tagged with process IDs do not
help shared libraries or data that are shared between some, but not
all, processes.

I'm tempted to say that process-ID TLB tagging is now known to be
a dead end wrt instruction set architecture, because of these
other alternatives:
    (0) Process ID tagged TLBs
    (1) Folded Address Space
    (2) Object ID tagged TLBs
    (3) Snoopy TLBs

(0) - the original poster probably understands them.  TLB entries are tagged
with a process ID, and are only hit if the current process ID matches that
in the TLB.
    Usually, there is a kluge to allow the process ID tag for kernel TLB
entries to be ignored, so that kernel TLB entries can be shared amongst
all processes.
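
A minimal sketch of (0) in Python, just to pin down the matching rule.
The names (AsidTlb, is_global, and so on) are made up for illustration
and are not taken from any real ISA; a real TLB is a fixed-size
associative structure, not an unbounded list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    vpn: int                 # virtual page number
    pfn: int                 # physical frame number
    asid: int                # process ID tag (hypothetical field name)
    is_global: bool = False  # the "kluge": ignore the tag for kernel entries


class AsidTlb:
    """Toy process-ID tagged TLB: unbounded, no replacement policy."""

    def __init__(self) -> None:
        self.entries: List[TlbEntry] = []
        self.current_asid = 0

    def context_switch(self, asid: int) -> None:
        # No flush: only the tag that lookups compare against changes.
        self.current_asid = asid

    def insert(self, vpn: int, pfn: int, is_global: bool = False) -> None:
        self.entries.append(TlbEntry(vpn, pfn, self.current_asid, is_global))

    def lookup(self, vpn: int) -> Optional[int]:
        for e in self.entries:
            if e.vpn == vpn and (e.is_global or e.asid == self.current_asid):
                return e.pfn
        return None  # TLB miss
```

An entry filled by process 1 misses while process 2 runs, and hits again
once process 1 is switched back in; an entry inserted with is_global=True
hits under every process ID.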

(1) As others have pointed out, the sort of "folded address space" (my name)
that is in some of the Power chips, HP's PA-RISC, and the Intel Itanium
gives you something better than tagged TLBs.
    I call these "folded" because, given a V1-bit virtual address, the upper
S1 bits are looked up in a table that provides S2 bits in their place, giving
V2 = V1-S1+S2 bits of virtual address - the smaller V1-bit virtual address is
"unfolded" into a larger V2-bit virtual address.  I.e. the V1-bit virtual
address space is divided up into 2^S1 "segments" of 2^(V1-S1) bytes each.
    Typical sizes are V1=64 bits, V2=80 bits, and S1=64-52 bits.

So, at the very least, you can just use this as a 16 bit process ID TLB tag.
But you should be able to see how this can be used for partial sharing.
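
A rough sketch of the unfolding step, in Python.  It takes the sizes
above literally (V1=64, V2=80, S1=64-52=12, hence S2=28 and V2-V1=16
extra bits); the per-process segment tables and the segment IDs in them
are invented for the example, not taken from any real machine.

```python
V1, V2 = 64, 80
S1 = 64 - 52            # upper bits of the V1-bit address used as an index
S2 = V2 - V1 + S1       # width of the segment ID that replaces them
OFFSET_BITS = V1 - S1   # bits passed through unchanged


def unfold(va: int, segment_table: dict) -> int:
    """Map a V1-bit virtual address to a V2-bit one via a segment table."""
    assert 0 <= va < (1 << V1)
    seg_index = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    seg_id = segment_table[seg_index]   # a missing key would be a segment fault
    assert 0 <= seg_id < (1 << S2)
    return (seg_id << OFFSET_BITS) | offset


# Hypothetical tables: segment 0 is private, segment 1 is a shared library.
proc_a = {0: 0x100, 1: 0x777}
proc_b = {0: 0x200, 1: 0x777}

private_va = 0x1234
shared_va = (1 << OFFSET_BITS) | 0x1234
```

Here unfold(private_va, proc_a) and unfold(private_va, proc_b) differ, so
the two processes never hit each other's private TLB entries, while
unfold(shared_va, ...) is the same for both, so a TLB indexed by the
V2-bit address can share one entry for the library page.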

(2) Let me just briefly mention object ID tagged TLBs, where entries are
tagged not with the process ID, but with an "objectID" that corresponds
to (a) shared library id, or (b) program text id, or (c) finally, the
process id for process private, unshared, data.  I.e. instead of all TLB
entries having the same TLB tag, a process uses TLB entries with several
different TLB tags.
    One implementation reads TLB entries, and compares the tag to a list of
"currently active" tags on the processor. A miss might constitute a TLB miss,
or possibly the list of currently active tags can itself be considered to miss.
    Another implementation uses "activate/deactivate" CAMs or scans.
    It can be seen that the special handling of kernel TLB entries in
process ID tagged TLBs is just a runt form of this.
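
A sketch of the first implementation (compare the entry's tag against a
list of currently active tags), in Python.  The class and method names
are hypothetical, and the sketch assumes that within one process no two
active objects map the same virtual page.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class ObjectTaggedTlb:
    """Toy object-ID tagged TLB with a per-processor active-tag set."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, int], int] = {}  # (object_id, vpn) -> pfn
        self.active: Set[int] = set()

    def context_switch(self, active_ids: Iterable[int]) -> None:
        # Entries are kept; only the set of tags allowed to hit changes.
        self.active = set(active_ids)

    def insert(self, object_id: int, vpn: int, pfn: int) -> None:
        self.entries[(object_id, vpn)] = pfn

    def lookup(self, vpn: int) -> Optional[int]:
        for (object_id, entry_vpn), pfn in self.entries.items():
            if entry_vpn == vpn and object_id in self.active:
                return pfn
        return None  # TLB miss (or a miss in the active-tag list)
```

With a shared library as object 100 and two processes whose private data
are objects 1 and 2, switching from active set {1, 100} to {2, 100} keeps
the library's entries hitting while process 1's private entries stop
matching.  A kernel object ID that sits in every active set behaves like
the kernel special case in (0).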

(3) Finally, there arises the possibility of snooped or coherent TLBs.
TLB entries can be snooped to remain consistent with the memory copy
of the page tables, and consistent with the current process.
    AMD's "TLB Probe Filter" can be considered a step in this direction,
towards snoopy TLBs.  Interestingly, while I was at Intel and at university
(both prior to Intel, and at Wisconsin after/between Intel stints) I designed
similar structures preceding AMD's announcement, and then I tried
to figure out what AMD had built based on the few scanty slides;
now that I have seen what AMD actually built, I can also see how
to be better.
    If TLB miss costs matter, there could well be an arms race in this
area between AMD and Intel.

The nice thing about snoopy TLBs is that they do not require any
architectural changes, or OS changes.   The bad thing is that, done
naively, they are quite expensive in hardware; potentially lots of
snoopers.   Of course, you don't need to be naive; snoopers can be
shared between many different TLB entries, if there is any degree
of locality or non-sparseness in the virtual address space.
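
To make the sharing idea concrete, here is one possible arrangement,
sketched in Python.  It is only an assumption about how snoopers might
be shared, not a description of what AMD or anyone else built: one
snooper per page-table page, covering every TLB entry whose PTE was
loaded from that page, and conservatively invalidating all of them on
any write to it.  The 4 KB page size and all names are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

PAGE_SIZE = 4096  # assumed size of a page-table page


class SnoopyTlb:
    """Toy TLB kept coherent by snooping writes to page-table pages."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[int, int]] = {}  # vpn -> (pfn, pte_addr)
        self.snoopers: Dict[int, Set[int]] = {}        # page-table page -> vpns

    def fill(self, vpn: int, pfn: int, pte_addr: int) -> None:
        # Remember which physical page the PTE came from; entries whose
        # PTEs live in the same page-table page share a single snooper.
        self.entries[vpn] = (pfn, pte_addr)
        self.snoopers.setdefault(pte_addr // PAGE_SIZE, set()).add(vpn)

    def snoop_write(self, phys_addr: int) -> int:
        """Observe a memory write; return how many entries were invalidated."""
        vpns = self.snoopers.pop(phys_addr // PAGE_SIZE, set())
        for vpn in vpns:
            self.entries.pop(vpn, None)
        return len(vpns)

    def lookup(self, vpn: int) -> Optional[int]:
        hit = self.entries.get(vpn)
        return hit[0] if hit is not None else None
```

Two entries whose PTEs sit at 0x5000 and 0x5008 need only one snooper;
a write anywhere in that page-table page drops both, and writes to
unrelated memory cost nothing beyond the snooper lookup.  The denser the
mapped virtual address space, the more entries each snooper covers.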

I feel reasonably confident in saying that snoopy TLBs are buildable
for conventional virtual memory architectures.   I feel less confident
in saying that snoopy TLBs are buildable for IBM style multilayer
virtual machine architectures;  or, rather, they are buildable, but
virtualized page tables provide an extra combinatorial factor;
and, since the most common ways of dealing with page tables in
VMs involve the virtual machine host unmapping the virtual machine
guest's page tables, it is not clear that snoopy TLBs need
to be extended to multilayer VMs.  But they could be.

It is important to note that there are two issues here:
    (1) snooping page table memory writes, so that the TLBs
can be consistent, whether instantaneously or delayed until
the next TLB "invalidate"
    (2) tracking which TLB entries belong to which process
They are related.

While it would certainly be possible to have TLBs "instantaneously"
coherent with memory, it is not clear
    (a) if that might not break some OSes
    (b) if that might not be unnecessarily expensive
    (b') if that might not prohibit some interesting implementations.

===

It's not clear if any of this is worthwhile, if TLB misses are cheap
- e.g. if they can be done speculatively, if they can use the cache, etc.

MIPS probably needed some help because of software TLB miss handling.
Although even this can be accelerated, e.g. on a multithreaded machine.

===

Anyway, bottom line:

Tagged TLBs are probably reasonable, since just about all implementations
described above have one form or another of TLB tags.

MIPS-style process ID tagged TLBs are probably a dead-end.

Folded virtual addresses or object ID tagged TLBs are probably better.

Snoopy TLBs are a bit more expensive, but not as expensive as the naive think,
and are architecturally invisible.


From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Does PowerPC 970 has Tagged TLBs (Address Space Identifiers)
Date: Tue, 13 May 2003 08:07:55 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: anton@a0.complang.tuwien.ac.at (Anton Ertl)
Message-ID: <2003May13.100755@a0.complang.tuwien.ac.at>
References: <3486820e.0305112321.1b4c3b26@posting.google.com>
	<A5Qva.63$d97.24@newssvr17.news.prodigy.com>

"Andy Glew" <andy-glew-public@sbcglobal.net> writes:
>Tagged TLBs are one way of allowing TLB entries to persist across
>context switches.  However, TLBs tagged with process IDs do not
>help shared libraries or data that are shared between some, but not
>all, processes.

They help them just in the same way as they help non-shared VMAs, i.e,
by letting their TLB entries persist across context switches.  They
may result in multiple TLB entries for the same object.  But is this a
significant performance problem?
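
To put a number on the duplication, a toy count in Python.  The
workload (three processes, four shared library pages, two private pages
each) is invented purely for illustration, and the count says nothing
about whether the extra entries actually cost performance.

```python
def tlb_entries_needed(processes: dict, tagging: str) -> int:
    """Count distinct TLB entries needed to hold every touched page.

    processes maps pid -> list of (object_id, vpn) pages the process touches.
    tagging is "process" (tag = pid) or "object" (tag = object_id).
    """
    tags = set()
    for pid, pages in processes.items():
        for object_id, vpn in pages:
            if tagging == "process":
                tags.add((pid, vpn))
            elif tagging == "object":
                tags.add((object_id, vpn))
            else:
                raise ValueError(f"unknown tagging scheme: {tagging}")
    return len(tags)


# Hypothetical workload: every process touches the same 4 library pages
# plus 2 pages of its own private data.
LIBC_PAGES = [("libc", vpn) for vpn in range(0x400, 0x404)]
workload = {
    pid: LIBC_PAGES + [(("private", pid), vpn) for vpn in (0x10, 0x11)]
    for pid in (1, 2, 3)
}
```

For this workload, process-ID tagging needs 18 entries to keep everything
resident and object-ID tagging needs 10; the difference is the library
pages being held once per process instead of once.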

Trying to think of a typical scenario where persistence across context
switches helps significantly: There would be a lot of processes that
do little computation before activating the scheduler again (due to a
blocking system call, e.g., I/O or IPC), and there would be one or
several CPU-bound processes that would suffer from TLB misses after
each activation of a non-CPU-bound process if there were no ASIDs.

In such a scenario:

Is there significant sharing of active pages between the CPU-bound
processes and the non-CPU-bound processes?  Probably not.

Is there sharing between the non-CPU-bound processes?  There probably
would be, but if the CPU-bound processes need lots of TLB entries, they
will throw out the other TLB entries anyway, so this sharing cannot be
utilized.  If the CPU-bound process does not need lots of TLB entries,
would utilizing the sharing with more sophisticated TLB tagging help
much?

Is there sharing between the CPU-bound processes?  Maybe, but context
switches between them are rare enough that this is not a significant
issue.

>I'm tempted to say that process-ID TLB tagging is now known to be
>a dead end wrt instruction set architecture, because of these
>other alternatives:
>    (0) Process ID tagged TLBs

Benefit: persistence across context switches

Cost: some changes to the OS

>    (1) Folded Address Space

Benefit: sharing of TLB entries between objects

Cost: (In addition to OS changes) Various restrictions at the user
level if you want to make use of the benefit.  E.g., AIX originally
allowed only 10 mmaps per process.  I don't consider the benefits
worth such costs.

>    (2) Object ID tagged TLBs

386 style segmentation?  Or something like (1), but with more
flexibility?

The former requires lots of user-level changes.

>    (3) Snoopy TLBs

Benefit: completely transparent to software.

Cost: Additional hardware complexity.

If the hardware cost can be made small enough, this looks like a
winner.  Otherwise I think that (0) still has the best benefit/cost
ratio.

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html