Copy-on-write in LINUX -OS
Copy-on-write (sometimes referred to as "COW"), sometimes referred to as implicit sharing, is an optimization strategy used in computer programming. Copy-on-write stems from the understanding that when multiple separate tasks use initially identical copies of some information (i.e., data stored in computer memory or disk storage), treating it as local data, each task working on its own "copy of the data", that they may occasionally need to modify, then it is not necessary to immediately create separate copies of that information for each task. Instead they can all be given pointers to the same resource, with the provision that on the first occasion where they need to modify the data, only then must they first create a local copy on which to perform the modification (the original resource remains unchanged). By sharing resources this way it is possible to make significant resource savings in cases there are many separate processes all using the same resource, each with a small likelihood of having to modify it at all. Copy-on-write is the name given to the policy that whenever a task attempts to make a change to the shared information, it should first create a separate (private) copy of that information to prevent its changes from becoming visible to all the other tasks. If this policy is enforced by the operating system kernel, then the fact of being given a reference to shared information rather than a private copy can be transparent to all tasks, whether they need to modify the information or not.[1]
Copy-on-write filesystem
To address the problems associated with existing disk filesystems, the Power-Safe filesystem never overwrites live data; it does all updates using copy-on-write (COW), assembling a new view of the filesystem in unused blocks on the disk. The new view of the filesystem becomes "live" only when all the updates are safely written on the disk. Everything is COW: both metadata and user data are protected.
To see how this works, let's consider how the data is stored. A Power-Safe filesystem is divided into logical blocks, the size of which you can specify when you use mkqnx6fs to format the filesystem. Each inode includes 16 pointers to blocks. If the file is smaller than 16 blocks, the inode points to the data blocks directly. If the file is any bigger, those 16 blocks become pointers to more blocks, and so on.
The final block pointers to the real data are all in the leaves and are all at the same level. In some other filesystems—such as EXT2—a file always has some direct blocks, some indirect ones, and some double indirect, so you go to different levels to get to different parts of the file. With the Power-Safe filesystem, all the user data for a file is at the same level.
If you change some data, it's written in one or more unused blocks, and the original data remains unchanged. The list of indirect block pointers must be modified to refer to the newly used blocks, but again the filesystem copies the existing block of pointers and modifies the copy. The filesystem then updates the inode—once again by modifying a copy—to refer to the new block of indirect pointers. When the operation is complete, the original data and the pointers to it remain intact, but there's a new set of blocks, indirect pointers, and inode for the modified data:
This has several implications for the COW filesystem:
- The bitmap and inodes are treated in the same way as user files.
- Any filesystem block can be relocated, so there aren't any fixed locations, such as those for the root block or bitmap in the QNX 4 filesystem
- The filesystem must be completely self-referential.
A superblock is a global root block that contains the inodes for the system bitmap and inodes files. A Power-Safe filesystem maintains two superblocks:
- a stable superblock that reflects the original version of all the blocks
- a working superblock that reflects the modified data
The working superblock can include pointers to blocks in the stable superblock. These blocks contain data that hasn't yet been modified. The inodes and bitmap for the working superblock grow from it.
A snapshot is a consistent view of the filesystem (simply a committed superblock). To take a snapshot, the filesystem:
- Locks the filesystem to make sure that it's in a stable state; all client activity is suspended, and there must be no active operations.
- Writes all the copied blocks to disk. The order isn't important (as it is for the QNX 4 filesystem), so it can be optimized.
- Forces the data to be synchronized to disk, including flushing any hardware track cache.
- Constructs the superblock, recording the new location of the bitmap and inodes, incrementing its sequence number, and calculating a CRC.
- Writes the superblock to disk.
- Switches between the working and committed views. The old versions of the copied blocks are freed and become available for use.
To mount the disk at startup, the filesystem simply reads the superblocks from disk, validates their CRCs, and then chooses the one with the higher sequence number. There's no need to run chkfsys or replay a transaction log. The time it takes to mount the filesystem is the time it takes to read a couple of blocks
http://www.cis.upenn.edu/~jms/cw-fork.pdf
No comments:
Post a Comment