
Hard links and Soft links in Linux

As we saw earlier, Unix files consist of two parts: the data part and the filename part.

The data part is associated with something called an 'inode'. An inode is a per-file record, identified by an index number, that tells the system where the data is stored and holds the file's metadata (permissions, owner, timestamps, etc.).
concept of inode in file system

All Unix variants include at least the following attributes, which are specified in the POSIX standard.
  1. File type
  2. Number of hard links associated with the file
  3. File length in bytes
  4. Device ID (i.e., an identifier of the device containing the file)
  5. Inode number that identifies the file within the filesystem
  6. UID of the file owner
  7. User group ID of the file
  8. Timestamps that specify the inode status change time, the last access time, and the last modify time
  9. Access rights (r,w,x values for user, group and others)
  10. File mode (sticky, setuserid, setgroupid)
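
To see these attributes for a real file, something like the following sketch can be used. It assumes GNU coreutils stat (the -c format option is not part of POSIX), and the scratch directory and file name are made up for the illustration.

```shell
#!/bin/sh
# Assumption: GNU coreutils `stat`; its -c FORMAT option is not POSIX.
export LC_ALL=C                          # keep the file-type text in English
workdir=$(mktemp -d)                     # scratch directory for the example
printf 'this is a file\n' > "$workdir/a"

file_type=$(stat -c '%F' "$workdir/a")   # 1. file type
nlink=$(stat -c '%h' "$workdir/a")       # 2. number of hard links
size=$(stat -c '%s' "$workdir/a")        # 3. file length in bytes

# Items 4-10 in one call: device ID, inode number, owner UID, group ID,
# the three timestamps, and the access rights / mode bits (octal and symbolic).
stat -c 'device=%d inode=%i uid=%u gid=%g
status-change=%z
last-access=%x
last-modify=%y
mode=%a (%A)' "$workdir/a"

echo "type=$file_type links=$nlink size=$size"
rm -rf "$workdir"
```

The inode number and timestamps will differ on every run; only the type, link count and size are predictable here.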

Hard Links: When more than one file name (directory entry) references the same inode entry.
hard links in file system

Soft Links: When the file's data part contains a link/path to another file. The OS recognizes this as a special file and so redirects all open/read/writes to the other file.
soft links in file system

ln command
There are 3 common forms in which ln is used.
$ ln target link_name :create link to target with the link name “link_name” in current directory.
$ ln target :create link to target with the same link name.
$ ln target directory :create link to target (link name same) in the specified directory.

eipe@eipe-system:~$ cat>a
this is a file
eipe@eipe-system:~$ cat a
this is a file
eipe@eipe-system:~$ cd temp
eipe@eipe-system:~/temp$ ln ../a alink
eipe@eipe-system:~/temp$ ls
alink
eipe@eipe-system:~/temp$ cat alink
this is a file


Note that the link is displayed like a normal file. A plain ls gives no indication that the file is a hard link.

Hard links have two limitations:
  • It is not possible to create hard links for directories. (to avoid cycles)
  • Links can be created only among files included in the same filesystem
Soft links (Symbolic links) are short files that contain a pathname of another file. The pathname may refer to any file or directory located in any filesystem; it may even refer to a nonexistent file.
The command is the same (ln), but with the -s option.
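
The difference between the two kinds of link shows up when the original name is removed. The sketch below uses a scratch directory and made-up file names, and assumes the readlink utility is available (it ships with GNU coreutils and the BSDs).

```shell
#!/bin/sh
workdir=$(mktemp -d)                      # scratch directory for the example
printf 'thisisafile\n' > "$workdir/t"

ln "$workdir/t" "$workdir/thlink"         # hard link: a second name for the same inode
ln -s t "$workdir/tslink"                 # soft link: a short file storing the path "t"

link_target=$(readlink "$workdir/tslink") # the pathname stored inside the soft link
rm "$workdir/t"                           # remove the original name

# The data is still reachable through the hard link ...
hard_content=$(cat "$workdir/thlink")

# ... but the soft link now refers to a nonexistent file (-e follows the link).
if [ -e "$workdir/tslink" ]; then soft_state=ok; else soft_state=dangling; fi

echo "soft link stores: $link_target"
echo "hard link content: $hard_content"
echo "soft link state: $soft_state"
rm -rf "$workdir"
```

The soft link itself still exists after the rm; it just points at a path that no longer resolves.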

Example:

eipe@eipe-system:~$ cat>t
thisisafile
eipe@eipe-system:~$ cd temp
eipe@eipe-system:~/temp$ ln ../t thlink
eipe@eipe-system:~/temp$ ls
thlink
eipe@eipe-system:~/temp$ ln -s ../t tslink
eipe@eipe-system:~/temp$ ls
thlink tslink

eipe@eipe-system:~/temp$ ls -la
total 12
drwxr-xr-x 2 eipe eipe 4096 2010-12-03 20:09 .
drwxr-xr-x 48 eipe eipe 4096 2010-12-03 19:38 ..
-rw-r--r-- 2 eipe eipe 6 2010-12-03 14:39 thlink
lrwxrwxrwx 1 eipe eipe 4 2010-12-03 20:09 tslink -> ../t

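Note that for soft (symbolic) links the link count is 1, but the listing shows the path the link points to. For '..' the count is 48: on traditional Unix filesystems such as ext3, a directory's link count is 2 (its entry in its own parent plus its '.' entry) plus one for the '..' entry of each subdirectory, so the parent directory here has 46 subdirectories.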
How do we know that 2 files are the same (one being a hard link of another)?
Using -i option to print inode numbers

eipe@eipe-system:~$ ls -i -l -a
1055026 -rw-r--r-- 3 eipe eipe 0 2010-12-03 20:14 t
1055062 drwxr-xr-x 2 eipe eipe 4096 2010-12-03 20:09 temp
1704500 drwxr-xr-x 2 eipe eipe 4096 2010-11-24 11:13 Templates
1708831 drwxr-xr-x 2 eipe eipe 4096 2010-11-24 23:49 .themes
1055026 -rw-r--r-- 3 eipe eipe 0 2010-12-03 20:14 thlink
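
The same check can be scripted. This is a minimal sketch with made-up file names in a scratch directory; the -ef test operator is supported by bash, dash and busybox sh, but older shells may lack it.

```shell
#!/bin/sh
workdir=$(mktemp -d)                      # scratch directory for the example
printf 'thisisafile\n' > "$workdir/t"
ln "$workdir/t" "$workdir/thlink"         # hard link
ln -s t "$workdir/tslink"                 # soft link

# First column of `ls -i` is the inode number.
inode_t=$(ls -i "$workdir/t" | awk '{print $1}')
inode_h=$(ls -i "$workdir/thlink" | awk '{print $1}')
inode_s=$(ls -i "$workdir/tslink" | awk '{print $1}')   # the link's own inode

# Second column of `ls -l` is the hard link count.
nlink=$(ls -l "$workdir/t" | awk '{print $2}')

# -ef is true when both paths resolve to the same device and inode.
if [ "$workdir/t" -ef "$workdir/thlink" ]; then same_file=yes; else same_file=no; fi

echo "t=$inode_t thlink=$inode_h tslink=$inode_s links=$nlink same=$same_file"
rm -rf "$workdir"
```

The hard link shares the inode number of the original, while the soft link is a separate file with an inode of its own.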


Creating Symlinks the Easy way
The file managers in both GNOME and KDE provide an easy method of creating symbolic links.
  • With GNOME, holding the Ctrl+Shift keys while dragging a file will create a link rather than copying (or moving) the file.
  • With KDE, a small menu appears whenever a file is dropped, offering a choice of
    copying, moving, or linking the file.

File system part II

In unix all files and directories are under a single parent directory called root. It is denoted by forward slash '/'.


The Filesystem Hierarchy Standard (FHS) defines the main directories and their contents in Linux operating systems.

Directory structure:

The majority of these directories exist in all UNIX operating systems and are generally used in much the same way; however, the explanations given here are aligned with the FHS standard.

The table below describes the default directory structure found on Linux/Unix systems.

/bin/ Essential command binaries that are needed in single-user mode (e.g., cat, ls)
These commands are accessible to both the admin and other users.
It usually doesn't contain subdirectories.
The following commands, or symbolic links to commands, are required in /bin.
Command Description
cat To concatenate files and print on std output.
chgrp To change file group ownership
chmod To change the file access permission
chown To change the file owner and group
cp To copy files and directories
date To print or set the system date and time
dd To convert and copy a file
df To report file system disk space usage
dmesg To print or control the kernel message buffer. Kernel output goes to the kernel ring buffer and not to stdout, because stdout is process specific. To inspect messages in the kernel ring buffer, one can use the dmesg utility.
echo To display a line of text
hostname To show or set the system's host name
kill To send signals to processes
ln To link files
login To begin a session on the system
ls To display contents of directory
mkdir To create directories
mknod To make block or character special files
more To page through lengthy text
mount/umount To mount or unmount a file system
mv To move or rename files
ps To report processes
pwd To print name of current working directory
rm To remove files or directories
rmdir To remove empty directories
sed The sed stream editor
sh The Bourne command shell
stty To change and print terminal line settings
su To change user ID
sync To flush filesystem buffers
uname To print system information


/boot/ Boot loader files
/dev/ Location of special or device files. If you do cd /dev and then ls (with a colorized ls), you'll see a lot of yellow outlined in black. These are the devices that your system uses or can use. Everything is considered a file in Linux, so your hard disk is kept track of as a file that sits there. If you're using an IDE hard drive (as opposed to SCSI), your hard drive will be known as /dev/hda. If devices in /dev may need to be created manually, /dev must contain a command named MAKEDEV, which can create devices as needed. It may also contain a MAKEDEV.local for any local devices.
/etc/ It contains configuration files. A "configuration file" is a local file used to control the operation of a program; it must be static and cannot be an executable binary.
/etc/opt/ Configuration files for /opt
/etc/X11/ Configuration for the X Window system. This directory is necessary to allow local control if /usr is mounted read only.
/etc/sgml/ Configuration for SGML
/etc/xml/ Configuration for XML
/home/ home is a fairly standard concept, but it is clearly a site-specific filesystem. In normal configurations, each user is given a directory in /home.
User-specific configuration files for applications are stored in the user's home directory in a file that starts with the '.' character (a "dot file"). If an application needs to create more than one dot file then they should be placed in a subdirectory with a name starting with a '.' character (a "dot directory"). In this case the configuration files should not start with the '.' character. To view the files, cd /home and then ls -a.
/lib/ Libraries needed for system boot and for commands in /bin and /sbin. If loadable kernel modules (LKM) are used then they are put in /lib/modules
/media/ On modern Linux systems the /media directory will contain the mount points for removable media such as USB drives, CD-ROMs, etc. that are mounted automatically at insertion.
/mnt/ On older Linux systems, the /mnt directory contains mount
points for removable devices that have been mounted
manually.
/opt/ /opt is reserved for the installation of add-on application (mostly commercial) software packages.
/proc/ The /proc filesystem (a special type of filesystem) was originally developed to provide information on the processes in a system. But given the filesystem's usefulness, many elements of the kernel use it both to report information and to enable dynamic runtime configuration.
The /proc filesystem contains directories (named by process ID, each representing a process) and virtual files (used to transfer information from the kernel to the user and vice versa).
http://www.ibm.com/developerworks/linux/library/l-proc.html
/root/ Home directory for the root user (optional)
/sbin/ Utilities used for system administration (and other root-only commands) are stored in
/sbin,
/usr/sbin, and
/usr/local/sbin.
/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin.
Programs executed after /usr is successfully mounted are generally placed into /usr/sbin.
Locally-installed system administration programs should be placed into /usr/local/sbin.
Common commands found under /sbin are: shutdown, reboot, fdisk, fsck, halt, init, etc.
Difference between sbin and bin: The division between /bin and /sbin was not created for security reasons or to hide the OS, but to provide a good partition between binaries that everyone uses and ones that are primarily used for administration tasks. For certain commands in /sbin, though, execute permission is given only to the administrator.
/srv/ Data for services provided by this system
/tmp/ The /tmp directory must be made available for programs that require temporary files.
Programs must not assume that any files or directories in /tmp are preserved between invocations of the program.
/usr/ Secondary hierarchy for read-only user data; contains the majority of (multi-)user utilities and applications.
/usr/bin/ Non-essential command binaries (not needed in single user mode); for all users.
/usr/include/ This is where all of the system’s general-use include files (header files) for the C programming language should be placed.
/usr/lib/ Libraries for the binaries in /usr/bin/ and /usr/sbin/.
/usr/sbin/ Programs executed after /usr is successfully mounted are generally placed into /usr/sbin. It mostly contains commands for mounting, repair, recovery, etc.
/usr/share/ The /usr/share hierarchy is for all read-only architecture independent data files.
This hierarchy is intended to be shareable among all architecture platforms of a given OS; thus, for example, a site with i386, Alpha, and PPC platforms might maintain a single /usr/share directory that is centrally-mounted.
Note, however, that /usr/share is generally not intended to be shared by different OSes or by different releases of the same OS.
/usr/share/man : Manual pages
/usr/share/doc Most packages installed on the system will include some kind of documentation. In /usr/share/doc, we will
find documentation files organized by package.
/usr/src/ Source code may be placed in this subdirectory, only for reference purposes.
/usr/X11R6 This hierarchy is reserved for the X Window System, version 11 release 6, and related files.
/usr/local/ The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. Programs compiled from source code are normally installed in /usr/local/bin. On a newly installed Linux system, this tree exists, but it will be empty until the system administrator puts something in it.
/var/ /var contains variable data files. This includes spool directories and files, administrative and logging data, and
transient and temporary files.
/var/cache/ /var/cache is intended for cached data from applications. Such data is locally generated as a result of time-consuming I/O or calculation.
/var/lib/ This hierarchy holds state information pertaining to an application or the system. State information is data that programs modify while they run, and that pertains to one specific host.
/var/lock/ Lock files for devices and other resources shared by multiple applications.
/var/log/ This directory contains miscellaneous log files. Most logs must be written to this directory or an appropriate subdirectory.
/var/mail/ The mail spool must be accessible through /var/mail and the mail spool files must take the form <username>.
/var/run/ This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process.
/var/spool/ Contains data which is awaiting some kind of later processing. Data in /var/spool represents work to be done in the future (by a program, user, or administrator); often data is deleted after it has been processed.
/var/tmp/ Temporary files preserved between system reboots
/lost+found Each formatted partition or device using a Linux file system,
such as ext3, will have this directory. It is used in the case
of a partial recovery from a file system corruption event.
Unless something really bad has happened to your system,
this directory will remain empty.
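
Returning to the /proc entry in the table above: the per-process directories can be read like ordinary files. The following is a small sketch that assumes a Linux system with /proc mounted; on other systems it just prints a note. The Name: and Pid: fields are part of the Linux /proc/PID/status format.

```shell
#!/bin/sh
# $$ expands to the process ID of the running shell.
status_file="/proc/$$/status"
proc_name=""
proc_pid=""

if [ -r "$status_file" ]; then
    # Each line of the virtual file is "Field:<tab>value".
    proc_name=$(awk '/^Name:/ {print $2}' "$status_file")
    proc_pid=$(awk '/^Pid:/ {print $2}' "$status_file")
    echo "shell process: name=$proc_name pid=$proc_pid"
else
    echo "no readable /proc entry for this process (is /proc mounted?)"
fi
```

Nothing is stored on disk here; the kernel generates the contents of the status file each time it is read.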


File system part I

Need for File System (or Data Storage)
  1. A process can use only a limited amount of memory: its virtual address space, whose size is limited. For storing huge amounts of information, a file system implemented on a storage device is necessary.
  2. When a process terminates, the information within its address space is lost. To retain information we need a file system.
  3. When information needs to be accessed by multiple processes (or applications), we need a file system, because each process's address space is protected and accessible only by that process.

Concept of File

Files are logical units of information created by processes. A file is a container structured as a sequence of bytes.
Files provide an abstraction: they hide from the user the details of how and where the information is stored.

File Naming

Most file systems support names as long as 255 characters. Note that Unix distinguishes between upper and lower case, but MS-DOS doesn't.

File names basically consist of 2 parts: name and extension, separated by a period.

Eg: fun.doc

The extensions and the name are for the convenience of the user and are not enforced by the OS. (Note: the kernel does not interpret the contents of a file; to the kernel it's just bytes.) But certain applications and compilers insist that the files they deal with are in a specific format.

Eg: In Unix, a file named john.txt could contain anything; the name doesn't convey any actual information to the computer.
This is not the case in Windows, which takes extensions seriously. Some Linux variants allow users to register extensions with the OS and assign programs to them.
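
A quick way to convince yourself that the extension is only a naming convention: put a shell script in a file with a .txt name and hand it to the shell. The file name and scratch directory below are made up for the illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)                      # scratch directory for the example

# A one-line shell script stored under a misleading ".txt" name.
printf 'echo "hello from a .txt file"\n' > "$workdir/john.txt"

# sh reads the bytes and runs them; it never looks at the extension.
output=$(sh "$workdir/john.txt")
echo "$output"

rm -rf "$workdir"
```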

File Types

The most common types of files are:

  1. Regular files
    The most common type of file. It could be text or binary; Unix doesn't make any distinction between the two. The interpretation of the contents of a regular file is left to the application that is processing the file. Denoted as: -

  2. Directories
    A file that contains a list of file names and pointers to their locations. Denoted as: d

  3. Character special files
    A file that is used for unbuffered I/O access (in variable-size units) to devices. Eg: terminal, modem. Though it acts as an interface to a device driver, it appears in a file system as if it were an ordinary file. Denoted as: c

  4. Block special files
    A file that is used for buffered I/O access (in fixed-size units) to devices. Eg: disk drives, CD-ROM. Denoted as: b

  5. Symbolic link files
    A type of file that points to another file. Notice that with symbolic links, the permission bits shown are always "rwxrwxrwx" and are dummy values. The real file attributes are those of the file the symbolic link points to. Denoted as: l

  6. Pipe and named pipe
    A type of file that is used for communication between processes. A named pipe is also called a FIFO. Denoted as: p

  7. Socket
    A file that is used for communication between processes through the socket interface, which is the same interface used for network communication. Denoted as: s

Windows supports 1 and 2. Unix has all 7.
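
The "Denoted as" letters above are the first character of each line in an ls -l listing. The sketch below creates a few of these file types in a scratch directory (the names are made up) and reads the letter back; block devices and sockets are left out because they need real devices or a running server.

```shell
#!/bin/sh
workdir=$(mktemp -d)                      # scratch directory for the example
: > "$workdir/regular"                    # empty regular file
mkdir "$workdir/dir"                      # directory
ln -s regular "$workdir/symlink"          # symbolic link
mkfifo "$workdir/fifo"                    # named pipe (FIFO)

# First character of the long listing is the file type letter.
type_char() { ls -ld "$1" | cut -c1; }

t_regular=$(type_char "$workdir/regular")
t_dir=$(type_char "$workdir/dir")
t_symlink=$(type_char "$workdir/symlink")
t_fifo=$(type_char "$workdir/fifo")
t_chardev=$(type_char /dev/null)          # /dev/null is a character special file

echo "regular=$t_regular dir=$t_dir symlink=$t_symlink fifo=$t_fifo chardev=$t_chardev"
rm -rf "$workdir"
```

The shell's test operators (-f, -d, -L, -p, -c, -b, -S) check for the same types without parsing ls output.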
Regular files are of 2 types:
ASCII files:
These contain lines of text.
They can be easily displayed and printed.
Binary files:
Impossible to understand if printed or displayed.
They are understood only by programs that know their internal structure. For example, an archive is understood by programs like 7zip, winrar, tar, etc., because such files are created by those same kinds of programs.

Note:
Every OS must recognize at least one type of file: its own executable file.

Unix architecture

unix architecture diagram


The Unix File system Layout

unix file system diagram


Let's not go into the hardware implementation details.
