What is a File?
A file is a collection of data items stored on disk. Whatever you store on a computer, whether information, data, music (mp3 files), pictures, movies, sound, or books, must be stored in the form of a file. Files always reside on storage devices such as a hard disk or floppy disk. A file is the last object (a leaf) in your file system tree. See the Linux/UNIX rules for naming files and directories.
What is a directory?
A directory is a group of files.
Directories are divided into two types:
- Root directory - Strictly speaking, there is only one root directory in your system, which is denoted by / (forward slash). It is the root of your entire file system and cannot be renamed or deleted.
- Subdirectory - Any directory under the root (/) directory is a subdirectory, which can be created and renamed by the user.
Directories are used to organize your data files and programs more efficiently.
Linux supports numerous file system types
- Ext2: This is like the UNIX file system. It has the concepts of blocks, inodes, and directories.
- Ext3: This is the ext2 file system enhanced with journalling capabilities. Journalling allows fast file system recovery. It supports POSIX ACLs (Access Control Lists).
- Isofs (iso9660): Used by the CD-ROM file system.
- Sysfs: A RAM-based file system initially based on ramfs. It is used to export kernel objects so that end users can use them easily.
- Procfs: The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime using the sysctl command. For example, you can display CPU information with the following command:
# cat /proc/cpuinfo
- Or you can view, enable, or disable routing/forwarding of IP packets between interfaces with the following commands:
# cat /proc/sys/net/ipv4/ip_forward
# echo "1" > /proc/sys/net/ipv4/ip_forward
# echo "0" > /proc/sys/net/ipv4/ip_forward
- NFS: The network file system allows many users or systems to share the same files using a client/server methodology. NFS allows sharing all of the above file systems.
- Linux also supports Microsoft NTFS, vfat, and many other file systems. See the Documentation/filesystems directory in the Linux kernel source tree for a list of all supported file systems.
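The procfs entry above mentions that kernel parameters can be changed either with the sysctl command or by writing to files under /proc/sys. Both views name the same parameter: a sysctl key such as net.ipv4.ip_forward corresponds to the path /proc/sys/net/ipv4/ip_forward. The following is a minimal sketch of that mapping. The function name sysctl_to_proc_path is hypothetical, and the plain dot-to-slash translation assumes that no component of the key contains a dot of its own (some interface names do).

```shell
# Hypothetical helper: translate a sysctl key into its /proc/sys path.
# Assumes every dot in the key is a separator between components.
sysctl_to_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

path=$(sysctl_to_proc_path net.ipv4.ip_forward)
echo "$path"

# Reading the value needs no root privileges; writing to it does.
if [ -r "$path" ]; then
    echo "ip_forward is currently: $(cat "$path")"
fi
```

Reading the file this way is harmless; only the echo commands shown in the list above change the running kernel.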
You can find out which types of file systems are currently mounted with the mount command:
$ mount
OR
$ cat /proc/mounts
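The output of these commands can be long. As a rough sketch, the third field of each line in /proc/mounts is the file system type, so a short awk script can summarize how many mounts use each type. The function name count_fs_types is hypothetical, and the optional argument exists only so that the function can be tried on a sample file.

```shell
# Hypothetical helper: count mounted file systems by type.
# /proc/mounts fields: device, mount point, type, options, dump, pass.
count_fs_types() {
    awk '{ count[$3]++ } END { for (t in count) print t, count[t] }' \
        "${1:-/proc/mounts}" | sort
}

# Only meaningful on systems that provide /proc/mounts.
if [ -r /proc/mounts ]; then
    count_fs_types
fi
```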
What is a UNIX/Linux File system?
A UNIX file system is a collection of files and directories stored on a disk. Each file system is stored in its own disk partition. The following are a few of the file systems:
- / - Special file system that incorporates the files under several directories including /dev, /sbin, /tmp etc
- /usr - Stores application programs
- /var - Stores log files, mails and other data
- /tmp - Stores temporary files
But what is in a File system?
Again, a file system is divided into two categories:
- User data - stores the actual data contained in files
- Metadata - stores file system structural information such as the superblock, inodes, and directories
Unix / Linux filesystem blocks
The blocks are used for two different purposes:
- Most blocks store user data, that is, the contents of files.
- Some blocks in every file system store the file system's metadata. So what the hell is metadata?
In simple words, metadata describes the structure of the file system. The most common metadata structures are the superblock, inodes, and directories. The following paragraphs describe each of them.
Superblock
Each file system is different and has a type such as ext2 or ext3. Each file system also has a size, such as 5 GB or 10 GB, and a status, such as its mount status. In short, each file system has a superblock, which contains information about the file system such as:
- File system type
- Size
- Status
- Information about other metadata structures
If this information is lost, you are in trouble (data loss), so Linux maintains multiple redundant copies of the superblock in every file system. This is very important in many emergency situations; for example, you can use a backup copy to restore a damaged primary superblock. The following command displays the primary and backup superblock locations on /dev/hda3:
# dumpe2fs /dev/hda3 | grep -i superblock
Output:
Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
Backup superblock at 294912, Group descriptors at 294913-294913
Surviving a Linux Filesystem Failure
A file system can fail (that is, its data structures can become corrupted) for several reasons:
* Mistakes by Linux/UNIX sys admins
* Buggy device drivers or utilities (especially third-party utilities)
* Power outages (very rare on production systems) due to UPS failure
* Kernel bugs (that is why you don't run the latest kernel on a production Linux/UNIX system; most of the time you need to use a stable kernel release)
Due to a filesystem failure:
- The file system may refuse to mount
- The entire system may hang
- Even if the filesystem mount operation succeeds, users may notice strange behavior once it is mounted, such as system reboots, gibberish characters in directory listings, etc.
So how the hell are you going to survive a filesystem failure? Most of the time fsck (a front end to the ext2/ext3 utilities) can fix the problem. First run e2fsck to check the Linux ext2/ext3 file system. Assuming the /home file system (the /dev/sda3 partition) for demo purposes, first unmount /dev/sda3, then type the following command:
# e2fsck -f /dev/sda3
Where,
- -f : Force checking even if the file system seems clean.
Please note that if the superblock is not found, e2fsck will terminate with a fatal error. However, Linux maintains multiple redundant copies of the superblock in every file system, so you can use the -b {alternative-superblock} option to get around this problem. The location of the backup superblock depends on the filesystem's block size:
- For filesystems with 1k blocksizes, a backup superblock can be found at block 8193
- For filesystems with 2k blocksizes, at block 16384
- For 4k blocksizes, at block 32768.
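These block numbers follow from simple arithmetic. The sketch below assumes a file system created with default mke2fs geometry (blocks per group equal to 8 times the block size in bytes, and a first data block of 1 for 1k blocks and 0 otherwise) and with the sparse_super feature, which keeps backup superblocks only in block group 1 and in groups whose numbers are powers of 3, 5, or 7. All function names are hypothetical; a file system created with other options will differ, so treat the output as a hint and confirm it with dumpe2fs or mke2fs -n.

```shell
# Succeeds if $1 is a power of $2 (1 counts as base^0). Expects $1 >= 1.
is_power_of() {
    n=$1
    base=$2
    while [ $((n % base)) -eq 0 ]; do
        n=$((n / base))
    done
    [ "$n" -eq 1 ]
}

# sparse_super rule: backups live in group 1 and powers of 3, 5 and 7.
is_backup_group() {
    is_power_of "$1" 3 || is_power_of "$1" 5 || is_power_of "$1" 7
}

# Print likely backup superblock locations.
# $1 = block size in bytes, $2 = number of block groups.
backup_superblocks() {
    block_size=$1
    groups=$2
    per_group=$((block_size * 8))      # assumed default geometry
    first=0
    if [ "$block_size" -eq 1024 ]; then
        first=1                        # 1k file systems start at block 1
    fi
    g=1
    while [ "$g" -lt "$groups" ]; do
        if is_backup_group "$g"; then
            echo $((g * per_group + first))
        fi
        g=$((g + 1))
    done
}

backup_superblocks 4096 10
```

For a 4k block size and ten block groups this prints 32768, 98304, 163840, 229376, and 294912, the same backup locations that appear in the dumpe2fs output earlier in this section.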
Tip: you can also try either of the following commands to determine alternative superblock locations:
# mke2fs -n /dev/sda3
OR
# dumpe2fs /dev/sda3|grep -i superblock
To repair the file system using an alternative superblock, use the command as follows:
# e2fsck -f -b 8193 /dev/sda3
However, it is highly recommended that you make a backup of the partition before you run the fsck command on it. Use the dd command to create a backup (provided that you have spare space under /disk2):
# dd if=/dev/sda3 of=/disk2/backup-sda3.img
Understanding UNIX / Linux filesystem Inodes
The inode (index node) is a fundamental concept in the Linux and UNIX filesystem. Each object in the filesystem is represented by an inode. But what are the objects? Let us try to understand it in simple words. Each and every file under Linux (and UNIX) has the following attributes:
=> File type (executable, block special etc)
=> Permissions (read, write etc)
=> Owner
=> Group
=> File size
=> File access, change and modification time (remember that traditional UNIX and Linux file systems do not store the file creation time; this is a favorite question in UNIX/Linux sys admin job interviews)
=> File deletion time
=> Number of hard links
=> Extended attributes such as append only or immutability (no one, including the root user, can delete the file)
=> Access Control Lists (ACLs)
All of the above information is stored in an inode. In short, the inode identifies the file and its attributes (as above). Each inode is identified by a unique inode number within the file system. The inode number is also known as the index number.
inode definition
An inode is a data structure
on a traditional Unix-style file system such as UFS or ext3. An inode
stores basic information about a regular file, directory, or other
file system object.
How do I see file inode number?
You can use the ls -i command to see the inode number of a file:
$ ls -i /etc/passwd
Sample Output
32820 /etc/passwd
You can also use the stat command to find out the inode number and its attributes:
$ stat /etc/passwd
Output:
File: `/etc/passwd'
Size: 1988 Blocks: 8 IO Block: 4096 regular file
Device: 341h/833d Inode: 32820 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530
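When only one or two of these attributes are needed, for example inside a script, individual fields can be requested instead of parsing the full report. The sketch below assumes the GNU coreutils version of stat, whose -c option takes a format string (%i for the inode number, %h for the hard link count, %s for the size in bytes, %a for the octal permissions). It works on a scratch file created with mktemp rather than on /etc/passwd.

```shell
# Assumes GNU coreutils stat; BSD stat uses -f with different format codes.
f=$(mktemp)
printf 'hello\n' > "$f"

inode=$(stat -c '%i' "$f")   # inode number
links=$(stat -c '%h' "$f")   # number of hard links
size=$(stat -c '%s' "$f")    # size in bytes
mode=$(stat -c '%a' "$f")    # permissions in octal

echo "inode=$inode links=$links size=$size mode=$mode"
```

The inode number reported here is the same one that ls -i prints for the file.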
Inode application
Many commands used by system administrators in UNIX / Linux operating systems use inode numbers to designate a file. Let us see the practical application of the inode number. Type the following commands:
$ cd /tmp
$ touch \"la*
$ ls -l
Now try to remove the file named "la*
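A name like this is awkward to type because the shell treats both the quote and the asterisk specially. One common way around the problem is to select the file by its inode number instead of by its name. The following is a sketch of that approach, run in a scratch directory created with mktemp so that nothing in /tmp itself is touched. It assumes a find that supports -maxdepth and -inum (GNU and BSD find both do).

```shell
# Work in a scratch directory so nothing else is touched.
dir=$(mktemp -d)
touch "$dir/\"la*"            # a file literally named "la*

# Look up the inode number of the only file in the directory.
inode=$(ls -i "$dir" | awk '{ print $1 }')

# Select the file by inode number rather than by its awkward name.
find "$dir" -maxdepth 1 -type f -inum "$inode" -exec rm -- {} \;

remaining=$(ls -A "$dir")     # empty once the file is gone
echo "files left: ${remaining:-none}"
```

Quoting the name (rm '"la*') also works for this particular file; the inode approach is more useful when the name contains characters that cannot be typed at all.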
Understanding UNIX / Linux filesystem directories
You use DNS (domain name system) to translate between domain names and IP addresses. Similarly, files are referred to by file name, not by inode number, and something must translate between the two. So what is the purpose of a directory? You can group files according to your usage; for example, all configuration files are stored under the /etc directory. So the purpose of a directory is to make a connection between file names and their associated inode numbers. Inside every directory you will find two entries: . (the current directory) and .. (a pointer to the parent directory, i.e. the directory immediately above the one you are in now). The .. entry appears in every directory; in the root directory it points to the root directory itself.
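The . and .. entries are ordinary name-to-inode pairs, which can be checked directly. The sketch below builds a small scratch tree and compares inode numbers; it assumes GNU stat for the -c '%i' format, and the directory and file names are made up for the example.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/sub/notes.txt"

# Every directory entry is a name paired with an inode number.
ls -ai "$dir/sub"

parent_inode=$(stat -c '%i' "$dir")
dotdot_inode=$(stat -c '%i' "$dir/sub/..")   # .. names the parent's inode
self_inode=$(stat -c '%i' "$dir/sub")
dot_inode=$(stat -c '%i' "$dir/sub/.")       # . names the directory's own inode

echo "parent=$parent_inode dotdot=$dotdot_inode"
echo "self=$self_inode dot=$dot_inode"
```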
Directory
A directory contained inside another directory is called a subdirectory. Taken together, the directories form a tree structure. Use the tree command to see the directory tree structure:
$ tree /etc | less
Again, a directory has an inode just like a file. It is a specially formatted file containing records which associate each name with an inode number. Please note the following limitations of directories under the ext2/3 file systems:
- There is an upper limit of 32768 subdirectories in a single directory.
- There is a "soft" upper limit of about 10-15k files in a single directory
However, the official documentation of the ext2/3 file system points out that "Using a hashed directory index (which is under development) allows 100k-1M+ files in a single directory without performance problems".
Here are my two favorite alias commands related to directories:
$ alias ..='cd ..'
$ alias d='ls -l | grep -E "^d"'
Understanding UNIX / Linux symbolic (soft) and hard links
Normally an inode is associated with precisely one directory entry. However, with hard links it is possible to associate multiple directory entries with a single inode. To create a hard link, use the ln command as follows:
# ln /root/file1 /root/file2
# ls -l
The above commands create a hard link, file2, to file1.
Symbolic links refer to:
A symbolic path indicating the abstract location of another file.
Hard links refer to:
The specific location of physical data.
Hard link vs. Soft link in Linux or UNIX
- Hard links cannot link directories.
- Hard links cannot cross file system boundaries.
Soft or symbolic links are similar to hard links in that they allow you to associate multiple filenames with a single file. However, symbolic links also:
- Can link to directories.
- Can cross file system boundaries.
These links behave differently when the source of the link is moved or removed:
- Symbolic links are not updated, so they break when the source is moved or removed.
- Hard links always refer to the same data, even if the source is moved or removed.
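The difference in behavior is easy to observe in a scratch directory. The following sketch creates one file, gives it a hard link and a symbolic link, and then removes the original name. It assumes GNU stat for the -c format codes; the file names original, hard, and soft are made up for the example.

```shell
dir=$(mktemp -d)
echo "some data" > "$dir/original"

ln "$dir/original" "$dir/hard"      # second name for the same inode
ln -s original "$dir/soft"          # separate inode that stores the path "original"

orig_inode=$(stat -c '%i' "$dir/original")
hard_inode=$(stat -c '%i' "$dir/hard")
soft_inode=$(stat -c '%i' "$dir/soft")      # stat does not follow the link by default
link_count=$(stat -c '%h' "$dir/original")  # now 2: original and hard

rm "$dir/original"

hard_data=$(cat "$dir/hard")                # the data is still reachable
if [ -e "$dir/soft" ]; then                 # -e follows the link to its target
    soft_state=ok
else
    soft_state=broken
fi
echo "hard link reads: $hard_data; soft link is $soft_state"
```

After the rm, the hard link still returns the data because the inode keeps a non-zero link count, while the symbolic link is left pointing at a name that no longer exists.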
How do I create symbolic link?
You can create a symbolic link with the ln command:
$ ln -s /path/to/file1.txt /path/to/file2.txt
$ ls -ali
The above command creates a symbolic link named file2.txt that points to file1.txt.
Task: Symbolic link creation and deletion
Let us create a directory called foo, enter:
$ mkdir foo
$ cd foo
Copy /etc/resolv.conf file, enter:
$ cp /etc/resolv.conf .
View inode number, enter:
$ ls -ali
Sample output:
total 152
1048600 drwxr-xr-x 2 vivek vivek 4096 2008-12-09 20:19 .
1015809 drwxrwxrwt 220 root root 143360 2008-12-09 20:19 ..
1048601 -rwxr-xr-x 1 vivek vivek 129 2008-12-09 20:19 resolv.conf
Now create a soft link to resolv.conf, enter:
$ ln -s resolv.conf alink.conf
$ ls -ali
Sample output:
total 152
1048600 drwxr-xr-x 2 vivek vivek 4096 2008-12-09 20:24 .
1015809 drwxrwxrwt 220 root root 143360 2008-12-09 20:19 ..
1048602 lrwxrwxrwx 1 vivek vivek 11 2008-12-09 20:24 alink.conf -> resolv.conf
1048601 -rwxr-xr-x 1 vivek vivek 129 2008-12-09 20:19 resolv.conf
The link count of the directory has not changed (it is still 2). Our symbolic (soft) link is stored in a different inode (1048602) than the text file (1048601). The information stored in resolv.conf is accessible through the alink.conf file. If we delete the text file resolv.conf, alink.conf becomes a broken link and our data is lost:
$ rm resolv.conf
$ ls -ali
Why isn't it possible to create hard links across file system boundaries?
A single inode number is used to represent a file within each file system, and all hard links are based upon the inode number. Inode numbers are unique only within one file system, so linking across file systems would lead to confusing references for UNIX or Linux. For example, consider the following scenario:
* File system: /home
* Directory: /home/vivek
* Hard link: /home/vivek/file2
* Original file: /home/vivek/file1
Now you create a hard link as follows:
$ touch file1
$ ln file1 file2
$ ls -l
Output:
-rw-r--r-- 2 vivek vivek 0 2006-01-30 13:28 file1
-rw-r--r-- 2 vivek vivek 0 2006-01-30 13:28 file2
Now just see the inode of both file1 and file2:
$ ls -i file1
782263 file1
$ ls -i file2
782263 file2
The Linux file system
architecture is an interesting example of abstracting complexity.
Using a common set of API functions, a large variety of file systems
can be supported on a large variety of storage devices. Take, for
example, the read function call, which allows some number of bytes to
be read from a given file descriptor. The read function is unaware of
file system types, such as ext3 or NFS. It is also unaware of the
particular storage medium upon which the file system is mounted, such
as AT Attachment Packet Interface (ATAPI) disk, Serial-Attached SCSI
(SAS) disk, or Serial Advanced Technology Attachment (SATA) disk.
Yet, when the read function is called for an open file, the data is
returned as expected. This article explores how this is done and
investigates the major structures of the Linux file system layer.
I'll start with an answer to
the most basic question, the definition of a file system. A file
system is an organization of data and metadata on a storage device.
With a vague definition like that, you know that the code required to
support this will be interesting. As I mentioned, there are many
types of file systems and media. With all of this variation, you can
expect that the Linux file system interface is implemented as a
layered architecture, separating the user interface layer from the
file system implementation from the drivers that manipulate the
storage devices.
File systems as protocols
Another way to think about a
file system is as a protocol. Just as network protocols (such as IP)
give meaning to the streams of data traversing the Internet, file
systems give meaning to the data on a particular storage medium.
Associating a file system to a
storage device in Linux is a process called mounting.
The mount command is used to attach a file system to the current file
system hierarchy (root). During a mount, you provide a file system
type, a file system, and a mount point.
To illustrate the capabilities
of the Linux file system layer (and the use of mount), create a file
system in a file within the current file system. This is accomplished
first by creating a file of a given size using dd (copy a file using
/dev/zero as the source) -- in other words, a file initialized with
zeros, as shown in Listing 1.
$ dd if=/dev/zero of=file.img bs=1k count=10000
10000+0 records in
10000+0 records out
$
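To check what an invocation like the one in Listing 1 produces, the size and contents of the image can be inspected before any loop device is involved. The sketch below repeats the same dd command in a scratch directory (an assumption made so that the example leaves no file.img behind in the current directory) and confirms that the file holds 10000 blocks of 1024 bytes, all of them zero.

```shell
dir=$(mktemp -d)

# Same invocation as Listing 1, written into a scratch directory.
dd if=/dev/zero of="$dir/file.img" bs=1k count=10000 2>/dev/null

# 10000 blocks of 1024 bytes each.
size=$(( $(wc -c < "$dir/file.img") ))

# Delete every zero byte and count what is left.
nonzero=$(( $(tr -d '\000' < "$dir/file.img" | wc -c) ))

echo "size in bytes: $size"
echo "non-zero bytes: $nonzero"
```

The size works out to 10,240,000 bytes, which is the roughly 10MB image that the rest of the walkthrough uses.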
You now have a file called
file.img that's 10MB. Use the losetup command to associate a loop
device with the file (making it look like a block device instead of
just a regular file within the file system):
$ losetup /dev/loop0 file.img
$
With the file now appearing as a block device (represented by /dev/loop0), create a file system on the device with mke2fs. This command creates a new second extended (ext2) file system of the defined size, as shown in Listing 2.
$ mke2fs -c /dev/loop0 10000
mke2fs 1.35 (28-Feb-2004)
max_blocks 1024000, rsv_groups = 1250, rsv_gdb = 39
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
2512 inodes, 10000 blocks
500 blocks (5.00%) reserved for the super user
...
$
The file.img file, represented by the loop device (/dev/loop0), is now mounted to the mount point /mnt/point1 using the mount command. Note the specification of the file system as ext2. When mounted, you can treat this mount point as a new file system, for example by using an ls command, as shown in Listing 3.
$ mkdir /mnt/point1
$ mount -t ext2 /dev/loop0 /mnt/point1
$ ls /mnt/point1
lost+found
$
As shown in Listing 4, you can
continue this process by creating a new file within the new mounted
file system, associating it with a loop device, and creating another
file system on it.
$ dd if=/dev/zero of=/mnt/point1/file.img bs=1k count=1000
1000+0 records in
1000+0 records out
$ losetup /dev/loop1 /mnt/point1/file.img
$ mke2fs -c /dev/loop1 1000
mke2fs 1.35 (28-Feb-2004)
max_blocks 1024000, rsv_groups = 125, rsv_gdb = 3
Filesystem label=
...
$ mkdir /mnt/point2
$ mount -t ext2 /dev/loop1 /mnt/point2
$ ls /mnt/point2
lost+found
$ ls /mnt/point1
file.img lost+found
$
From this simple
demonstration, it's easy to see how powerful the Linux file system
(and the loop device) can be. You can use this same approach to
create encrypted file systems with the loop device on a file. This is
useful to protect your data by transiently mounting your file using
the loop device when needed.
Now that you've seen file
system construction in action, I'll get back to the architecture of
the Linux file system layer. This article views the Linux file system
from two perspectives. The first view is from the perspective of the
high-level architecture. The second view digs in a little deeper and
explores the file system layer from the major structures that
implement it.
While the majority of the file system code exists in the kernel (except for user-space file systems, which I'll note later), the architecture shown in Figure 1 shows the relationships between the major file system-related components in both user space and the kernel.
User space contains the
applications (for this example, the user of the file system) and the
GNU C Library (glibc), which provides the user interface for the file
system calls (open, read, write, close). The system call interface
acts as a switch, funneling system calls from user space to the
appropriate endpoints in kernel space.
The VFS is the primary
interface to the underlying file systems. This component exports a
set of interfaces and then abstracts them to the individual file
systems, which may behave very differently from one another. Two
caches exist for file system objects (inodes and dentries), which
I'll define shortly. Each provides a pool of recently-used file
system objects.
Each individual file system
implementation, such as ext2, JFS, and so on, exports a common set of
interfaces that is used (and expected) by the VFS. The buffer cache
buffers requests between the file systems and the block devices that
they manipulate. For example, read and write requests to the
underlying device drivers migrate through the buffer cache. This
allows the requests to be cached there for faster access (rather than
going back out to the physical device). The buffer cache is managed
as a set of least recently used (LRU) lists. Note that you can use
the sync command to flush the buffer cache out to the storage media
(force all unwritten data out to the device drivers and,
subsequently, to the storage device).
What is a block device?
A block device is one in which data moves to and from the device in blocks (such as disk sectors) and which supports attributes such as buffering and random access (blocks need not be read sequentially; any block can be accessed at any time). Block devices include hard drives, CD-ROMs, and RAM disks. This is in contrast to character devices, which do not have physically addressable media. Character devices include serial ports and tape devices, in which data is streamed character by character.
That's the 20,000-foot view of
the VFS and file system components. Now I'll look at the major
structures that implement this subsystem.
Linux views all file systems
from the perspective of a common set of objects. These objects are
the superblock, inode, dentry, and file. At the root of each file
system is the superblock, which describes and maintains state for the
file system. Every object that is managed within a file system (file
or directory) is represented in Linux as an inode. The inode contains
all the metadata to manage objects in the file system (including the
operations that are possible on it). Another set of structures,
called dentries, is used to translate between names and inodes, for
which a directory cache exists to keep the most-recently used around.
The dentry also maintains relationships between directories and files
for traversing file systems. Finally, a VFS file represents an open
file (keeps state for the open file such as the write offset, and so
on).
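The per-open state kept by the file object can be observed from user space without any kernel code. In the sketch below, a shell file descriptor stands in for the open file: two reads through the same descriptor continue from one another because they share one offset, while a fresh open of the same path starts again from the beginning. The descriptor number 3 and the file contents are arbitrary choices made for the example.

```shell
tmp=$(mktemp)
printf 'first\nsecond\nthird\n' > "$tmp"

exec 3< "$tmp"        # open the file once on descriptor 3
read -r line1 <&3     # consumes the first line and advances the offset
read -r line2 <&3     # continues from where the previous read stopped
exec 3<&-             # close the descriptor

# A fresh open gets its own offset and starts at the beginning again.
read -r again < "$tmp"

echo "same descriptor: $line1, then $line2"
echo "new open: $again"
```

The inode and the directory entry are the same in both cases; only the open file state differs.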
The VFS acts as the root level
of the file-system interface. The VFS keeps track of the
currently-supported file systems, as well as those file systems that
are currently mounted.
File systems can be
dynamically added or removed from Linux using a set of registration
functions. The kernel keeps a list of currently-supported file
systems, which can be viewed from user space through the /proc file
system. This virtual file also shows the devices currently associated
with the file systems. To add a new file system to Linux,
register_filesystem is called. This takes a single argument defining
the reference to a file system structure (file_system_type), which
defines the name of the file system, a set of attributes, and two
superblock functions. A file system can also be unregistered.
Registering a new file system
places the new file system and its pertinent information onto a
file_systems list (see Figure 2 and linux/include/linux/mount.h).
This list defines the file systems that can be supported. You can
view this list by typing cat /proc/filesystems at the command line.
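Each line of /proc/filesystems holds a file system name, optionally preceded by the word nodev, which marks file systems that are not backed by a block device (proc and sysfs are typical examples). A small sketch that labels each registered file system accordingly is shown below. The function name list_registered_fs is hypothetical, and the optional argument exists only so that the function can be tried on a sample file.

```shell
# Hypothetical helper: label each registered file system.
# Lines look like "nodev<TAB>proc" or "<TAB>ext2".
list_registered_fs() {
    awk '$1 == "nodev" { print "nodev " $2; next }
         { print "block " $1 }' "${1:-/proc/filesystems}"
}

# Only meaningful on systems that provide /proc/filesystems.
if [ -r /proc/filesystems ]; then
    list_registered_fs
fi
```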
Another structure maintained
in the VFS is the mounted file systems (see Figure 3). This provides
the file systems that are currently mounted (see
linux/include/linux/fs.h). This links to the superblock structure,
which I'll explore next.
The superblock is a structure
that represents a file system. It includes the necessary information
to manage the file system during operation. It includes the file
system name (such as ext2), the size of the file system and its
state, a reference to the block device, and metadata information
(such as free lists and so on). The superblock is typically stored on
the storage medium but can be created in real time if one doesn't
exist. You can find the superblock structure (see Figure 4) in
./linux/include/linux/fs.h.
One important element of the
superblock is a definition of the superblock operations. This
structure defines the set of functions for managing inodes within the
file system. For example, inodes can be allocated with alloc_inode or
deleted with destroy_inode. You can read and write inodes with
read_inode and write_inode or sync the file system with sync_fs. You
can find the super_operations structure in
./linux/include/linux/fs.h. Each file system provides its own inode
methods, which implement the operations and provide the common
abstraction to the VFS layer.
The inode represents an object
in the file system with a unique identifier. The individual file
systems provide methods for translating a filename into a unique
inode identifier and then to an inode reference. A portion of the
inode structure is shown in Figure 5 along with a couple of the
related structures. Note in particular the inode_operations and
file_operations. Each of these structures refers to the individual
operations that may be performed on the inode. For example,
inode_operations define those operations that operate directly on the
inode and file_operations refer to those methods related to files and
directories (the standard system calls).
The most-recently used inodes
and dentries are kept in the inode and directory cache respectively.
Note that for each inode in the inode cache there is a corresponding
dentry in the directory cache. You can find the inode and dentry
structures defined in ./linux/include/linux/fs.h.
Except for the individual file
system implementations (which can be found at ./linux/fs), the bottom
of the file system layer is the buffer cache. This element keeps
track of read and write requests from the individual file system
implementations and the physical devices (through the device
drivers). For efficiency, Linux maintains a cache of the requests to
avoid having to go back out to the physical device for all requests.
Instead, the most-recently used buffers (pages) are cached here and
can be quickly provided back to the individual file systems.
This article spent no time exploring the individual file systems that are available within Linux, but they're worth noting here, at least in passing. Linux supports a wide range of file systems, including old file systems such as MINIX, MS-DOS, and ext2. Linux also supports newer journaling file systems such as ext3, JFS, and ReiserFS. Additionally, Linux supports cryptographic file systems such as CFS and virtual file systems such as /proc.
One final file system worth
noting is the Filesystem in Userspace, or FUSE. This is an
interesting project that allows you to route file system requests
through the VFS back into user space. So if you've ever toyed with
the idea of creating your own file system, this is a great way to
start.