diff -puN /dev/null fs/reiser4/znode.c
--- /dev/null Thu Apr 11 07:25:15 2002
+++ 25-akpm/fs/reiser4/znode.c Wed Mar 30 14:55:08 2005
@@ -0,0 +1,1141 @@
/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
 * reiser4/README */

/* Znode manipulation functions. */
/* Znode is the in-memory header for a tree node. It is stored
separately from the node itself so that it does not get written to
disk. In this respect a znode is like a buffer head or page head. We
also use znodes for additional reiser4-specific purposes:
. they are organized into a tree structure which is part of the whole
reiser4 tree.
. they are used to implement node-grained locking
. they are used to keep additional state associated with a
node
. they contain links to lists used by the transaction manager
A znode is attached to a "block number" variable, which is an instance of
the fs/reiser4/tree.h:reiser4_block_nr type. A znode can exist without the
corresponding node actually being loaded in memory. The existence of the
znode itself is regulated by its reference count (->x_count). Each time a
thread acquires a reference to a znode through a call to zget(), ->x_count
is incremented; it is decremented on a call to zput(). Data (the content of
the node) are brought into memory through a call to zload(), which also
increments the ->d_count reference counter. zload() can block waiting on
IO. A call to zrelse() decreases this counter. Also, ->c_count keeps track
of the number of child znodes and prevents the parent znode from being
recycled until all of its children are. ->c_count is decremented whenever a
child goes out of existence (actually being recycled in zdestroy()), which
can be some time after the last reference to this child dies if we support
some form of LRU cache for znodes.
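The interplay of the three counters can be sketched as follows. This is
only an illustration, not the reiser4 implementation: every toy_ name and
the "loaded" field are hypothetical, locking and IO are left out, and the
rule in toy_can_recycle() is an assumption drawn from the description
above.

```c
#include <assert.h>
#include <stddef.h>

// Toy stand-in for a znode; the real structure has many more fields.
struct toy_znode {
	struct toy_znode *parent; // NULL at the top of the toy tree
	int x_count;              // existence references (zget/zput)
	int d_count;              // data references (zload/zrelse)
	int c_count;              // number of child znodes not yet recycled
	int loaded;               // hypothetical flag: node data is in memory
};

// Initialise a znode and account for it in its parent's c_count.
static void toy_init(struct toy_znode *z, struct toy_znode *parent)
{
	z->parent = parent;
	z->x_count = 0;
	z->d_count = 0;
	z->c_count = 0;
	z->loaded = 0;
	if (parent != NULL)
		parent->c_count++;
}

static void toy_zget(struct toy_znode *z)
{
	z->x_count++;
}

static void toy_zput(struct toy_znode *z)
{
	assert(z->x_count > 0);
	z->x_count--;
}

// The first data reference brings the data in; real code may block on IO.
static void toy_zload(struct toy_znode *z)
{
	if (z->d_count++ == 0)
		z->loaded = 1;
}

static void toy_zrelse(struct toy_znode *z)
{
	assert(z->d_count > 0);
	z->d_count--;
}

// Assumed rule: no references of either kind and no remaining children.
static int toy_can_recycle(const struct toy_znode *z)
{
	return z->x_count == 0 && z->d_count == 0 && z->c_count == 0;
}

// Recycling a child is the moment the parent's c_count drops.
static void toy_destroy(struct toy_znode *z)
{
	assert(toy_can_recycle(z));
	if (z->parent != NULL)
		z->parent->c_count--;
	z->parent = NULL;
	z->loaded = 0;
}
```

In this sketch a parent stays pinned by c_count until toy_destroy() has
run on every child, even if nobody holds an x_count reference to it.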
*/

/* EVERY ZNODE'S STORY
1. His infancy.
Once upon a time, the znode was born deep inside of zget() by a call to
zalloc(). On return from zget() the znode had:
. a reference counter (x_count) of 1
. an assigned block number, marked as used in the bitmap
. a pointer to the parent znode. The root znode's parent pointer points
to its father: the "fake" znode. This, in turn, has a NULL parent pointer.
. hash table linkage
. no data loaded from disk
. no node plugin
. no sibling linkage
2. His childhood
Each node is either brought into memory as a result of tree traversal, or
created afresh, creation of the root being a special case of the latter. In
either case it's inserted into the sibling list. This will typically require
some ancillary tree traversing, but ultimately both sibling pointers will
exist and JNODE_LEFT_CONNECTED and JNODE_RIGHT_CONNECTED will be true in
zjnode.state.
3. His youth.
If the znode is bound to an already existing node in the tree, its content
is read from disk by a call to zload(). At that moment, the JNODE_LOADED
bit is set in zjnode.state and the zdata() function starts to return
non-NULL for this znode. zload() further calls zparse(), which determines
which node layout this node is rendered in and sets ->nplug on success.
If the znode is for a newly created node, memory for it is allocated and
the zinit_new() function is called to initialise the data according to the
selected node layout.
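The load-then-parse step for an existing node might look roughly like the
sketch below. All toy_ names are hypothetical, and identifying the layout
from the first byte of the block is purely an assumption made for the
example; the text above does not say how zparse() recognises a layout.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_LOADED = 1 << 0 };   // stands in for JNODE_LOADED

struct toy_plugin {
	const char *name;
};

struct toy_znode {
	unsigned state;
	const unsigned char *data;
	const struct toy_plugin *nplug;
};

static const struct toy_plugin toy_layout_a = { "layout-a" };

// Assumed detection rule: a magic first byte selects the layout.
static int toy_zparse(struct toy_znode *z)
{
	if (z->data[0] != 0xA0)
		return -1;              // unknown layout, nplug stays unset
	z->nplug = &toy_layout_a;
	return 0;
}

// "Read" the block (here: just remember it), mark the znode loaded,
// then let the parser pick the node plugin.
static int toy_zload(struct toy_znode *z, const unsigned char *block)
{
	assert(block != NULL);
	z->data = block;
	z->state |= TOY_LOADED;
	return toy_zparse(z);
}

// Returns non-NULL only once the loaded bit is set.
static const unsigned char *toy_zdata(const struct toy_znode *z)
{
	return (z->state & TOY_LOADED) ? z->data : NULL;
}
```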
4. His maturity.
After this point, the znode lingers in memory for some time. Threads can
acquire references to the znode either by blocknr through a call to zget(),
or by following a pointer to an unallocated znode from an internal item.
Each time a reference to the znode is obtained, x_count is increased. A
thread can read/write lock the znode. Znode data can be loaded through
calls to zload(); d_count will be increased accordingly. If all references
to the znode are released (x_count drops to 0), the znode is not recycled
immediately. Rather, it is still cached in the hash table in the hope that
it will be accessed shortly.
There are two ways in which a znode's existence can be terminated:
. sudden death: the node bound to this znode is removed from the tree
. overpopulation: the znode is purged out of memory due to memory pressure
5. His death.
Death is a complex process.
When we irrevocably commit ourselves to the decision to remove a node from
the tree, the JNODE_HEARD_BANSHEE bit is set in zjnode.state of the
corresponding znode. This is done either in ->kill_hook() of the internal
item or in the kill_root() function when the tree root is removed.
At this moment the znode still has:
. locks held on it, necessarily write ones
. references to it
. a disk block assigned to it
. data loaded from the disk
. pending requests for locks
But once the JNODE_HEARD_BANSHEE bit is set, the last call to
unlock_znode() performs node deletion. Node deletion includes two phases.
First, all ways to get references to that znode (sibling and parent links
and hash lookup using the block number stored in the parent node) should be
deleted. This is done through sibling_list_remove(); we also assume that
nobody uses the down link from the parent node, due to its nonexistence or
to proper parent node locking, and that nobody uses parent pointers from
children, because there are no children. Second, we invalidate all pending
lock requests which are still on the znode's lock request queue; this is
done by invalidate_lock(). Another znode status bit, JNODE_IS_DYING, is
used to invalidate pending lock requests. Once it is set, all requesters
are forced to return -EINVAL from longterm_lock_znode(). Future locking
attempts are not possible because all ways to get references to that znode
have already been removed. Last, the node is uncaptured from the
transaction.
When the last reference to the dying znode is just about to be released,
the block number for this znode is released and the znode is removed from
the hash table.
Now the znode can be recycled.
[It is possible to free the bitmap block and remove the znode from the hash
table when the last lock is released. This will result in a
referenced but completely orphaned znode.]
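The ordering described in this section (banshee bit first, deletion on the
last unlock, -EINVAL for late lockers) can be modelled with a few state
bits. This is a simplified sketch only: the toy_ names are hypothetical,
TOY_REACHABLE and TOY_CAPTURED are invented stand-ins for "linked into
siblings and hash" and "captured by a transaction", and the lock request
queue is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum {
	TOY_HEARD_BANSHEE = 1 << 0,  // stands in for JNODE_HEARD_BANSHEE
	TOY_IS_DYING      = 1 << 1,  // stands in for JNODE_IS_DYING
	TOY_REACHABLE     = 1 << 2,  // hypothetical: still linked and hashed
	TOY_CAPTURED      = 1 << 3   // hypothetical: held by a transaction
};

struct toy_znode {
	unsigned state;
	int nr_locks;                // locks currently held
};

// Lock attempts on a dying znode fail, as described for
// longterm_lock_znode() in the text above.
static int toy_lock(struct toy_znode *z)
{
	if (z->state & TOY_IS_DYING)
		return -EINVAL;
	z->nr_locks++;
	return 0;
}

// The irrevocable decision to remove the node from the tree.
static void toy_mark_for_removal(struct toy_znode *z)
{
	z->state |= TOY_HEARD_BANSHEE;
}

// Only the last unlock of a banshee-marked znode performs deletion.
static void toy_unlock(struct toy_znode *z)
{
	assert(z->nr_locks > 0);
	if (--z->nr_locks > 0)
		return;
	if (!(z->state & TOY_HEARD_BANSHEE))
		return;
	z->state &= ~TOY_REACHABLE;  // role of sibling_list_remove()
	z->state |= TOY_IS_DYING;    // role of invalidate_lock()
	z->state &= ~TOY_CAPTURED;   // uncapture from the transaction
}
```

Note that in this model the znode stays fully usable between
toy_mark_for_removal() and the final toy_unlock(), which matches the list
of things a banshee-marked znode still has.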
6. Limbo
As mentioned above, znodes with a reference counter of 0 are
still cached in the hash table. Once memory pressure increases they are
purged out of there [this requires something like an LRU list for
efficient implementation. An LRU list would also greatly simplify the
implementation of the coord cache, which would in this case morph into just
scanning some initial segment of the LRU list]. Data loaded into an
unreferenced znode are flushed back to durable storage if
necessary and the memory is freed. Znodes themselves can be recycled at
this point too.
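A purge pass over the cache could be sketched as below. It is a
deliberately naive linear scan with hypothetical toy_ names, not the LRU
scheme the bracketed note wishes for; "dirty" and "cached" are invented
fields standing in for "needs flushing" and "present in the hash table".

```c
#include <assert.h>
#include <stddef.h>

struct toy_znode {
	int x_count;   // references; 0 means the znode is in limbo
	int dirty;     // hypothetical: data must be flushed before freeing
	int cached;    // hypothetical: still present in the hash table
};

// Purge every cached, unreferenced znode. Returns the number purged and
// stores in *flushed how many of them had to be written back first.
static size_t toy_purge(struct toy_znode *cache, size_t n, size_t *flushed)
{
	size_t purged = 0;

	assert(flushed != NULL);
	*flushed = 0;
	for (size_t i = 0; i < n; i++) {
		struct toy_znode *z = &cache[i];

		if (!z->cached || z->x_count != 0)
			continue;       // already gone, or still referenced
		if (z->dirty) {
			z->dirty = 0;   // real code would write the data out
			(*flushed)++;
		}
		z->cached = 0;          // memory is freed, znode recyclable
		purged++;
	}
	return purged;
}
```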
*/