Introduction to design of file-systems

Table of Contents

1 Introduction

This document contains information related to design of file-systems. This is created to help as supporting slides or manual for a guest lecture to explain how file systems are designed.

2 Linux basics

For various demos we will be using a Linux machine. Some Linux basics are required to understand demos properly.

2.1 Device names

Linux operating systems have devices with names /dev/sda, /dev/sdb, etc. for serial block storage devices.

During the lecture we will use a pen drive (the smaller the better) for demo. The internal hard-disk of demo laptop is /dev/sda. The pen drive will automatically become /dev/sdb. Hence most of our demo will use device /dev/sdb

Please note that Linux has special virtual filesystems /sys, /dev and /proc. These folders do not contain real file or folders. These folders contain devices or special files which allow us to get or even change state of various systems (Power, Processes, Underlying hardware, etc.)

2.2 dd command

dd command can read blocks of data from a input device and write those blocks to output device. Simple dd usage is:

dd if=<input-file-or-device> <output-file-or-device>

This way we can use dd to copy a file a.txt to b.txt using:

dd if=a.txt of=b.txt

We can also use dd to copy all contents of a device (say /dev/sdb pen drive in our case) to a file

dd if=/dev/sdb of=pen-drive-image01.raw

This is useful is creating iso images from CD/DVDs. If CD/DVD device name is /dev/sr0, then we can create CD/DVD iso image using:

dd if=/dev/sr0 of=dvd.iso

2.3 /dev/zero device

There is special device /dev/zero. We can read infinite number of zeros from this device. This is useful for writing zeros to a file or device (in our case 1GB pen drive).

We need this as normal file-system creation or file deletion does not zeros all inode or data blocks (Will learn more about them in this lecture). Hence we can zero the device and then start our experiments to have a clean device without any junk/random values to work with.

2.4 xxd command

xxd command can read a normal file or a device and give its hexdump. We can use -s option to specify seek (From where the reading should start). We can use -l (length) to specify number of bytes to read.

2.5 Pipe operator (|)

We can use pipe operator (|) to take output from one command and give it to other command as input

2.6 more

We can use more to see information one page at a time.

3 Filesystems basics

We will first look at an a simple file-system (FAT). It has various variants 12-bit, 16-bit, Fat32, vfat, exfat etc. We will look at Fat32 file-systems with VFAT extension for purpose of our discussion to avoid confusion. This is the most common fat filesystem in use for small (<32 GB) size disks.

3.1 File Allocation Table

Similar to the way index or contents are used to store information about chapters or topics of book. FAT stores information about which files and directories are present and where is the data stored in these structures. Actual data is stored in data region and pointed by entries in FAT.

Note that FAT word is overloaded. Name of file-system is also FAT and the table that stores cluster allocation information for files and directories is also called FAT.

3.2 Directory

Note that as far as filesystem is considered directory is a special file which contains list of other files. Hence if a Directory test contains file a.txt and b.txt. That means there is a special file called test with directory flag set, with its data having names and information about a.txt, b.txt.

Here directory flag for test would be set in its inode. a.txt and b.txt information would be stored in data blocks.

3.3 Slack space

Fat32 filesystem for partitions <8GB, typically uses default cluster size of 4KB.

For example if we store 100 bytes of data in a file, then the corresponding data block with only have 100 bytes of data. Remaining 3996 bytes of space is left unused. We cannot use this unused space for any other purpose. Ideally this space should contain only zeros(0). But after using file-system for long time it is likely to have considerable junk information for previous files in the slack space area.

Space left after useful information before end of current 4096 bytes block is called slack space.

4 Partition table types

There are two types of partition tables:

MSDOS
This is the older partition table type used by Windows XP and before OS's. This has limit of maximum 2TB disk size. Further this has two types of partitions - Primary and extended. There can be maximum 4 primary partitions. If we need more partitions then one of the four partitions is created not as primary but as extended partition. We can have unlimited logical partitions inside an extended partition.

We can read MSDOS partitions used fdisk command.

GPT
These are new types of partition tables in use by modern OS by default. For working with GPT file-systems we must use parted command.

Since we will be dealing with small pen drives as part of the lecture demos, we will restrict ourselves to use of older msdos partition tables only.

5 Design and structure of FAT32 file system

Fat32 filesystem has following components:

  1. In first 512 bytes indicating various parameters size, type, volume label, volume ID, etc. of file-system.
  2. Starting at second sector for quick values on next free data cluster and no. of free clusters.
  3. Remaining reserved sectors
  4. File Allocation Table(s)
  5. Data Region

5.1 Boot Sector

  1. First three bytes are jump instruction - a short jump followed by a NOP (opstring sequence 0xEB 0x?? 0x90) or a near jump (0xE9 0x?? 0x??)
  2. Next eight bytes are OEM name
  3. FAT32 Extended BIOS Parameter Block (60 or 79 bytes). This extended BPB contains DOS 3.31BPB which in turns contains DOS 2.0 BPB. Values in these structures are described ahead.
  4. 2 bytes for bytes per logical sector. Typically 512.
  5. 1 bytes for logical sectors per cluster. Typically 8 for 4kb block.
  6. 2 byte for count of reserved logical sectors. Typically 32 for FAT32.
  7. 1 byte for number of file-allocation tables. Typically 2.
  8. 2 bytes for number of root directory entries. In Fat32 this is 0 as root directory is also stored in data region.
  9. 2 bytes for number of logical sectors. If zero use 4-byte value at later stage. (Point 15)
  10. 1 byte for media descriptor. 0xf8 for fixed disks.
  11. 2 bytes for logical sectors per fat. 0 for FAT32 as fat32 uses a 4-byte value described later. (Point 16)
  12. 2 bytes for physical sectors per track for CHS addressing.
  13. 2 bytes for no. of heads for CHS addressing
  14. 4 bytes of count of hidden sectors before FAT partition
  15. 4 bytes of total logical sectors instead of 16-bit value at point 9
  16. 4 bytes of logical sectors per fat. (Use this in case of fat32 instead of value at point 11)
  17. 2 bytes of drive description
  18. 2 byte version. Should be 0.0
  19. Cluster no. of root directory start. Typically 2.
  20. 2 byte for sector no. for FS information sector. Typically 1.
  21. 2 byte for first logical sector number of a copy of three FAT32 boot sectors. Typically 6.
  22. 12 reserved bytes. Should be 0.
  23. Ignore next 2 bytes
  24. 1 byte value of 0x29 to indicate FAT32 extended boot signature
  25. 4 bytes of Volume ID
  26. 11 bytes of Volume Label
  27. 8 bytes of FS type

5.2 FS Information Sector

  1. Four bytes signature (0x52 0x52 0x61 0x41 = "RRaA")
  2. Next 480 bytes are reserved (0x00)
  3. Four bytes signature (0x72 0x72 0x41 0x61 = "rrAa")
  4. Four bytes of last known number of free data clusters
  5. Four bytes of number of most recently allocated data cluster. 0xffffffff after format. Should start from 0x02 onwards.
  6. Another 12 reserved bytes to be set to zero
  7. Four bytes of signature (0x00 0x00 0x55 0xAA)

5.3 File allocation table.

FAT32 file allocation table has 32 bit address for each entry in little endian order. First four bits of this entry are reserved, should be set to 0 during formatting and should not be changed otherwise. (First four bits of fourth byte from left to right).

Each 32-bit value can be one of the following:

  • Decimal 0 to indicate cluster is unused
  • Decimal 1 to indicate special reserved cluster
  • Cluster number of next cluster for this entry (0200 0000 to efff ff0f) (Read as 0000 0002 to 0fff ffef)
  • Reserved (0fff ff0f to f6ff ff0f (Read as 0fff fff0 to 0fff fff6)
  • Special values to mark bad cluster (f7ffff0f) (Read as 0fff fff7)
  • Special value to mark end of cluster (f8ff ff0f to ffff ff0f) (Read as 0fff fff8 to 0fff ffff)

5.4 Data region

Data region has either file or directory contents. File contents are straight forward. Length of file comes from directory listing. Further next cluster location of files longer than 4096 bytes comes from FAT. Thus there is easy direct way to read files from data region.

Directories on the other hand contain considerable information. They use following structure:

  1. First 8 bytes is short file name padded with spaces.
    • If filename starts with 0x00 that means the entry is unused and

    there is nothing used after this.

    • If filename starts with 0xe5 that means the file has been deleted.
  2. Next 3 bytes are short file extension
  3. Next byte is file attribute
    BitSignificance
    location
    & mask
    1 (0x80)Reserved
    2 (0x40)Device
    3 (0x20)Archive
    4 (0x10)Sub-directory
    5 (0x08)Volume label
    6 (0x04)System
    7 (0x02)Hidden
    8 (0x01)Readonly
    Here 0x0f is used to indicate VFAT long file name.
  4. 1 byte for indicating whether EAS (Extended attributes are used or not)
  5. 1 byte for 10ms time from 1 to 199
  6. 2 bytes for creation time as follows
    BitsDescription
    15-11Hours(0-23)
    10-5Minutes(0-59)
    4-0Seconds (0-29)
    Seconds are recorded in 2s resolution. Thus for 8 seconds value stored will be 4. Further resolution is provided by previous byte which stores ms information.
  7. 2 bytes for creation date as follows:
    BitsDescription
    15-9Year(0=1989,max 127=2107)
    8-5Month(1-12)
    4-0Day(1-31)
  8. 2 bytes for last access date
  9. High 2 bytes of first cluster number in FAT32h
  10. 2 bytes for last modified time
  11. 2 bytes for last modified date
  12. Low 2 bytes of first cluster number in FAT32
  13. 4 bytes file size

5.5 Long filename entries

If file attribute is 0x0f then it indicates VFAT long filename, which has following structure:

LocateSizeDescription
(in bytes)
0x001Sequence Number (bit 6: last logical, first physical LFN entry,
bit 5: 0; bits 4-0: number 0x01..0x14 (0x1F), deleted entry: 0xE5)
0x0110Name characters (five UCS-2 characters)
0x0B1Attributes (always 0x0F)
0x0C1Type (always 0x00 for VFAT LFN)
0x0D1Checksum of DOS file name
0x0E12Name characters (six UCS-2 characters)
0x1A2First cluster (always 0x0000)
0x1C4Name characters (two UCS-2 characters)

6 Demo - Creating fat32 file system on a pen drive

Following steps can be used to understand little bit about fat32 file-systems:

  1. Connect the pen drive to system.
  2. Use fdisk -l to see that device has been detected. See the overall size (4GB, 8GB, etc.) as secondary verification.
  3. Zero the device to be used for experiments using:
    dd if=/dev/zero of=/dev/sdb
    #If you get impatient use "sudo killall -9 dd; sync" to terminate dd command
    

    Use this only if the pen drive has no useful information and pen drive device is indeed /dev/sdb. Many people have lost all there data by using this type of system command incorrectly.

  4. Check that at least initial blocks have been zeroed
    xxd /dev/sdb | more
    

    Example output

    0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    
  5. Create a single primary partition with fat label on the device
    fdisk /dev/sdb
    n
    p
    1
    
    
    p
    t
    1
    b
    p
    w
    fdisk -l
    

    Here:

    • n is to create a new partition
    • p for indicating that new partition should be a primary partition
    • 1 for creating first primary partition
    • two enter for default start and end sector numbers for the new first primary partition being created
    • t for changing file-system type
    • 1 for changing file-system type for 1st primary partition
    • b for setting new file-system type to be fat32
    • p for printing new partition table after creating a new partition and changing its file-system type
    • w for saving and quiting
    • fdisk -l for printing new partition table on all devices. Note that it would understand only msdos partition tables.

    Example input/output:

    [root@barjatiyarklp ~]# fdisk /dev/sdb
    Welcome to fdisk (util-linux 2.23.2).
    
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    
    Device does not contain a recognized partition table
    Building a new DOS disklabel with disk identifier 0xd0bfdbcc.
    
    Command (m for help): n
    Partition type:
       p   primary (0 primary, 0 extended, 4 free)
       e   extended
    Select (default p): p
    Partition number (1-4, default 1): 1
    First sector (2048-7821311, default 2048): 
    Using default value 2048
    Last sector, +sectors or +size{K,M,G} (2048-7821311, default 7821311): 
    Using default value 7821311
    Partition 1 of type Linux and of size 3.7 GiB is set
    
    Command (m for help): p
    
    Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk label type: dos
    Disk identifier: 0xd0bfdbcc
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1            2048     7821311     3909632   83  Linux
    
    Command (m for help): t
    Selected partition 1
    Hex code (type L to list all codes): b
    
    WARNING: If you have created or modified any DOS 6.xpartitions, please see the fdisk manual page for additionalinformation.
    
    Changed type of partition 'Linux' to 'W95 FAT32'
    
    Command (m for help): p
    
    Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk label type: dos
    Disk identifier: 0xd0bfdbcc
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1            2048     7821311     3909632    b  W95 FAT32
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    [root@barjatiyarklp ~]# fdisk -l
    
    Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk label type: dos
    Disk identifier: 0x34a0032f
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
    /dev/sda2          206848   184322047    92057600    7  HPFS/NTFS/exFAT
    /dev/sda3       184322048   368642047    92160000    7  HPFS/NTFS/exFAT
    /dev/sda4       368642048   976773167   304065560    5  Extended
    /dev/sda5       368644096   471044095    51200000   83  Linux
    /dev/sda6       471046144   503814143    16384000   82  Linux swap / Solaris
    /dev/sda7       503816192   976773119   236478464   83  Linux
    
    Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk label type: dos
    Disk identifier: 0xd0bfdbcc
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1            2048     7821311     3909632    b  W95 FAT32
    
    
  6. First partition information would start at bytes 446 (1be in hex) and would contain 16 bytes of information.
    • 1 bit of 1 byte indicates whether partition is bootable. Other bits of 1 byte are unused.
    • Next 3 bytes are for CHS address of first absolute sector (1byte head,(9-8) 2 high bit cylinder, 6 low bits for sector, 8 remaining bits for cylinder)
    • Then 1 byte of partition type
    • Next 3 bytes are for CHS address of last absolute sector
    • Then 4 bytes of LBA of first absolute sector in partition
    • Then 4 bytes of number of sectors in partition

    Note that with 512 bytes per sectore this limits size of partition to 2TB and size of disk to 4TB Example partition 1 information:

    00001b0: 0000 0000 0000 0000 ccdb bfd0 0000 0021  ...............!
    00001c0: 0300 0b2a ccf9 0008 0000 0050 7700 0000  ...*.......Pw...
    

    Here, we can see at

    • 1be -> 00 Since partition is not bootable
    • 1bf-1c2 -> 0x210300 CHS address of first sector of partition, should be ignored. 0x21 is 33 in Decimal.
      • (Read later) With 62 (0x3e) sectors per track. We get 33*62 = 2046. 2046+3 = 2049 is the first sector of partition.
    • 1c3 -> 0b Partition type for fat32 we wrote using fdisk
    • 1c4-1c6 -> 2accf9 CHS address of last sector of partition, should be ignored
    • 1c7-1ca -> LBA address of start of partition -> 0000 0800 (Read right to left due to enddianness) = 2048 in decimal
    • 1cb-1ce -> Length of partition -> 00775000 = 7819264 in base 10. This is same as 7821311-2048+1.
  7. Since we have choosen 2048 sector as first sector of first primary partition, we can read values of this partition using:
    xxd -s +1048576 /dev/sdb | more
    

    Here, 1048576 is 2048 * 512 (0010 0000 in hex). Example output would be:

    [root@barjatiyarklp ~]# xxd -s +1048576 /dev/sdb | more
    0100000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................     
    
  8. Creating fat filesystem in first primary partition using:
    mkfs.fat /dev/sdb1
    
  9. Again look at first primary partition data using:
    xxd -s +1048576 /dev/sdb | more
    

    Here, 1048576 is 2048*512

    Example output would be:

    0100000: eb58 906d 6b66 732e 6661 7400 0208 2000  .X.mkfs.fat... .
    0100010: 0200 0000 00f8 0000 3e00 7c00 0000 0000  ........>.|.....
    0100020: 0050 7700 c61d 0000 0000 0000 0200 0000  .Pw.............
    0100030: 0100 0600 0000 0000 0000 0000 0000 0000  ................
    0100040: 0000 2992 685c 474e 4f20 4e41 4d45 2020  ..).h\GNO NAME  
    0100050: 2020 4641 5433 3220 2020 0e1f be77 7cac    FAT32   ...w|.
    0100060: 22c0 740b 56b4 0ebb 0700 cd10 5eeb f032  ".t.V.......^..2
    0100070: e4cd 16cd 19eb fe54 6869 7320 6973 206e  .......This is n
    0100080: 6f74 2061 2062 6f6f 7461 626c 6520 6469  ot a bootable di
    0100090: 736b 2e20 2050 6c65 6173 6520 696e 7365  sk.  Please inse
    01000a0: 7274 2061 2062 6f6f 7461 626c 6520 666c  rt a bootable fl
    01000b0: 6f70 7079 2061 6e64 0d0a 7072 6573 7320  oppy and..press 
    01000c0: 616e 7920 6b65 7920 746f 2074 7279 2061  any key to try a
    01000d0: 6761 696e 202e 2e2e 200d 0a00 0000 0000  gain ... .......
    01000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.     
    

    Here, we can see that

    AddressValueDescription
    range
    00-020xeb5890Short jump ahead additional 0x58 (88 in decimal) followed by NOP
    03-0amkfs.fatOEM Name
    Then 79 bytes of Fat32 Ext. BPB of which first 25
    bytes are DOS 3.31 BPB. DOS 3.31 BPB first 13
    bytes are DOS 2.0BPB.
    0b-0c0x0200Number of bytes per logical sector (512 in decimal)
    (From DOS 2.0BPB)
    0d0x08Logical sectors per cluster
    0e-0f0x0020Number of reserved logical sectors (32 in decimal)
    100x02Number of file allocation tables
    11-120x0000Number of root directory entries
    13-140x0000Number of logical sectors (16-bit value)
    150xf8Media descriptor
    16-170x0000Number of logical sectors per fat (16-bit value)
    18-190x003ePhysical sectors per track. (62 in decimal)
    1a-1b0x007cNumber of heads. (124 in decimal)
    1c-1f0Number of hidden sectors before FAT partition
    20-230x775000Total logical sectors (7819264 in decimal)
    24-270x1dc6Logical sectors per fat (7622 in decimal)
    28-290x0000Drive description
    2a-2b0x0000Version (0.0)
    2c-2f2Cluster no. of root directory start
    30-310x0001Logical sector no. of FS Information Sector
    32-330x0006First logical sector no. of a copy of three FAT32 boot sectors
    34-3fReserved should be (zero)
    40Ignore
    41Ignore
    420x290x29 indicates extended boot signature
    43-46Volume ID (0x92685c47)
    47-51Volume Label (NO NAME)
    52-59Filesystem type (FAT32)
  10. Look at FS Information Sector using
    xxd -s +1049088 /dev/sdb | more
    

    Here 1049088 is 2048*512+512

    The values are:

    0100200: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............
    0100210: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100240: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100250: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100260: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100270: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100280: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100290: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100330: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100350: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100360: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100370: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100380: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0100390: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01003a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01003b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01003c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    01003e0: 0000 0000 7272 4161 89e2 0e00 0200 0000  ....rrAa........
    01003f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.      
    

    Here, various signatures are matching. Here Last known number of free data clusters is 0x000ee289 which is 975,497 in decimal. Each cluster has 8 sectors so 7,803,976 sectors. Length of partition is 7,819,264 leaving 15288 sectors for Boot sector, FS information block, reserved, etc. purposes.

    Number of most recently used data cluster is set to 0x02 from where root directory should start.

  11. Look at first fat using
    xxd -s +1064960 /dev/sdb | more
    

    Here 1064960 is 2048*512+32*512 where 32 sectors were reserved.

    Note that 975,497 free clusters were mentioned in FS Information sector. These many clusters required 3,901,988 bytes of space assuming 4byte per cluster for FAT. That means almost 7622 sectors for one FAT. This value matches what we read in table before at addresses 24-27, hex value 0x1dc6 for logical sectors per fat (7622 in decimal). Since we keep two FATs we have 15244 sectors just for storing FAT.

    Total sectors for FAT, reserved etc. after removing data clusters was 15288. 15288(Total)-15244(FAT*2)-32(Reserved)=12. Thus, only four 14 sectors are unused. Minimum 8 are required to make one more usable cluster.

    Example output is:

    0104000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000  ................
    0104010: 0000 0000 0000 0000 0000 0000 0000 0000  ................      
    
  12. Lets look at first data region
    xxd -s +8869888 -l 10000 /dev/sdb | less
    

    2048(Partition offset) +32 (Reserved) +15244 (FAT) = 17324. 17324*512=8869888.

    Should be all 0 as we do not have any files or directories inside root directory either.

  13. Mount the partition on a test folder
    mkdir /mnt/test
    mount /dev/sdb1 /mnt/test
    
  14. Create a file a.txt with word hello
    echo "hello" > /mnt/test/a.txt
    
  15. Umount the pen drive
    umount /mnt/test
    
  16. Again look at first fatdata region:
    xxd -s +1064960 /dev/sdb | more
    

    The fat table might look like:

    0104000: f8ff ff0f ffff ff0f f8ff ff0f ffff ff0f  ................
    0104010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    

    This indicates that root directory at cluster 2 ends at cluster 2 and that there is something else at cluster 3 that also ends at cluster 3. Other clusters look unused.

    Look at data region using:

    xxd -s +8869888 -l 10000 /dev/sdb | less
    

    The directory entry might look like:

    0875800: 4161 002e 0074 0078 0074 000f 005d 0000  Aa...t.x.t...]..
    0875810: ffff ffff ffff ffff ffff 0000 ffff ffff  ................
    0875820: 4120 2020 2020 2020 5458 5420 0064 94b5  A       TXT .d..
    0875830: 544b 544b 0000 94b5 544b 0300 0600 0000  TKTK....TK......
    0875840: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    0875850: 0000 0000 0000 0000 0000 0000 0000 0000  ................      
    

    Here, we have two directory entries

    1. First entry file attribute at byte 12 is 0x0f. Hence it is a VFAT long file name.
      • Byte 1 contains 0x41 indicating that this is the first entry (Sequence no. 1) and also that this is the last entry (Sixth bit 0x40 is set)
      • Byte 2-0a contain first five characters of filename in 2-byte character (UCS-2) format. (a.txt)
      • Byte 0b is 0x0f
      • Byte 0c is 00
      • Byte 0d is checksum
      • Byte 0e onwards for 12 bytes there are remaining 6 UCS-2 characters. Here first character is 0x0000 and all others are 0xffff means that file name is already completed by before bytes.
      • Byte 1a onwards for 2 bytes indicates next cluster 0x0000
      • Byte 1c onwards has 2 UCS-2 characters taking 4 additional bytes.
    2. Second entry file attribute at byte 12 is 20. This indicates that it is a normal file and not a directory. Further archive bit is set indicating that file has been modified since last backup.
      • Byte 1 contains name A followed by 7 spaces for another 7 bytes
      • Byte 8-0a contain extension TXT
      • Byte 0b contains 0x20 file attribute
      • Byte 0c contains 0x00 indicating EAS extension are not used
      • Byte 0d contains 0x64 indicating 100*10ms or 1 second.
      • Byte 0e-0f contain 0xb594 (1011 0101 1001 0100) indicating:
        • 10110 Hours = 22
        • 101100 Minutes = 44
        • 10100 Seconds = 20 (40 seconds)
      • Byte 10-11 contain 0x4b54 (0100 1011 0101 0100) indicating:
        • 0100101 Years = 37 (1980+37=2017)
        • 1010 Month = 10
        • 10100 Day = 20
    3. Byte 12-13 contain 0x4b54 indicating 20-10-2017 as last access date
      • Byte 14-15 contain high two bytes of first cluster 0x0000
      • Byte 16-17 contain last modification time 0xb594 which is 22:44:40
      • Byte 18-19 contain last modification date 0x4b54 which is 20-10-2017
      • Byte 1a-1b contain low two bytes of file cluster 0x0003 which indicates cluster no. 3
      • Bytes 1c-1f contain 0x06 which indicates file size of 6 bytes

    We can verify that date, time and file size were read correctly using:

    [root@barjatiyarklp ~]# mount /dev/sdb1 /mnt/test/
    [root@barjatiyarklp ~]# ls -l /mnt/test/ --full-time -c
    total 4
    -rwxr-xr-x 1 root root 6 2017-10-20 22:44:41.000000000 +0530 a.txt
    [root@barjatiyarklp ~]# ls -l /mnt/test/ --full-time
    htotal 4
    -rwxr-xr-x 1 root root 6 2017-10-20 22:44:40.000000000 +0530 a.txt
    [root@barjatiyarklp ~]# umount /mnt/test/      
    

    Further, if try to look at third cluster (2048+32+15244+(3-2)*8)=17332. 17332*512=8873984:

    xxd -s +8873984 /dev/sdb | more
    

    You should get below output:

    0876800: 6865 6c6c 6f0a 0000 0000 0000 0000 0000  hello...........
    0876810: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    
  17. If we delete the file and try to read data region then
    mount /dev/sdb1 /mnt/test
    rm -f /mnt/test/a.txt
    umount /mnt/test      
    
    xxd -s +1064960 /dev/sdb | more
    

    Fat will look like:

    0104000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000  ................
    0104010: 0000 0000 0000 0000 0000 0000 0000 0000  ................      
    

    Second cluster (root directory) data region has:

    xxd -s +8869888 -l 10000 /dev/sdb | less      
    
    0875800: e561 002e 0074 0078 0074 000f 005d 0000  .a...t.x.t...]..
    0875810: ffff ffff ffff ffff ffff 0000 ffff ffff  ................
    0875820: e520 2020 2020 2020 5458 5420 0064 94b5  .       TXT .d..
    0875830: 544b 544b 0000 94b5 544b 0300 0600 0000  TKTK....TK......
    0875840: 0000 0000 0000 0000 0000 0000 0000 0000  ................      
    

    Here filenames starting with 0xe5 indicate that entry has been deleted

    Third cluster value is:

    xxd -s +8873984 /dev/sdb | more
    
    0876800: 6865 6c6c 6f0a 0000 0000 0000 0000 0000  hello...........
    0876810: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    

7 Take aways so far

  • 32-bit length of partition in msdos partition table with 512 byte sector limits partition size to 2TB
  • 32-bit file-size in FAT32 limits filesize to 4GB
  • When a file is deleted in fat entire contents are saved. Directory information of file simple indicates file is deleted. Thus, restoring file on FAT is possible, provided the same has not been overwritten.
  • We can learn to read FAT filesystem directory at binary level with some effort
  • Files on FAT filesystem may not necessarily get stored contiguously. This should be remembered when conducting forensic analysis sector by sector (or cluster by cluster)
  • During forensics if you can determine cluster size (4kb, 8kb, etc.). Then we can search for filetype specific magic start value near cluster start without looking at entire cluster data.

    Try:

    less /usr/share/misc/magic
    file /usr/share/pixmaps/fedora-logo.png
    

8 Ext3 file-system design

It is not possible to go to bit level design of various aspects of mature file-systems such as ext3 in a single session. Interested readers can use references to get internal details of ext3 file-system. As part of this session we will only look at high-level design

8.1 Ext3 Disk inodes

An ext3 file-system is inode based. An ext3 disk inode stores following information:

  • File owner and group owner information (uid, gid)
  • File type - There are types other than files, directories such as pipes and sockets
  • Permissions (rwx for user owner, group owner and others)
  • Access, modification and creation timestamps
  • Number of links to the file
  • 12 (Index 0:11) direct data block addresses, 1 indirect data block address, 1 double-indirect data block address and 1 triple indirect data block address

Each disk inode is fixed 128 bytes in size. Note that in-memory inodes also have additional information such as no. of open handlers to file, which are not described above.

Please refer to https://en.wikipedia.org/wiki/Ext2 for understanding block address pointers.

8.2 Directory

An ext3 directory contains following information

  • Inode number
  • Directory entry length (Typically mutiple of 4)
  • Filename length
  • File type
  • File name. If filename is not exact multiple of 4 it might get padded with 0x00 (NULL) on the right till the directory entry length is achieved.

8.3 Differences between ext3 and fat

  1. In case of FAT directory entry contains timestamps, file length, etc. In case of ext3 directory entry contains only two things - Inode number and file name. Rest all information is stored in inodes.
  2. In FAT information about next cluster in data region in file is stored in FAT. In ext3 inode contains pointers (direct, indirect, double indirect, etc.) to data blocks used by particular file.
  3. We can decide number of inodes while creating the file-system. It is not fixed upon block size or partition size parameters.
  4. In ext3 it is possible to have hard-links which is not possible in FAT. For hard-links two different directory entries with different file-names will point to the same inode. That is why inode needs to keep track of number of entries pointing to it.
  5. In ext3 it is possible to have sparse files. This is possible by having previous data block address as 0, indicating entire data-block of zeros (Typically 4k).

9 NTFS file-system design

Similar to ext3 we will only have a high-level design overview of NTFS file-system.

NTFS uses Master File table to store filesystem information. In case of MFT small files and directories (Smaller than 512 bytes) can be stored directly inside MFT without requiring to refer to a data block. This can save considerable wastage of space as slack space.

Other interesting things about NTFS file system are:

  1. NTFS file system supports compressed and encrypted files.
  2. In case NTFS we can give different types of permission to a file for each user or group (SID). For each SID we can given allow/deny along with type of access read, write, full, etc. This is different from ext3 where without POSIX ACLs typically only user owner and group owner permission can be given.

    There is also provision to inherit ACLs from parent folders. There are separate set of ACLs for auditing.

  3. NTFS uses transaction journal to keep record of changes being done to file-system. If a change is completed properly the same is recorded. Else during hard reboot NTFS can see actions under progress which did not complete properly and abort them, ensuring that file-system is always sane.
  4. NTFS similar to FAT can get fragmented and requires defragmentation to improve its speed. In contrast ext3 algorithms are designed to avoid fragmentation during allocation of disk blocks.

10 References

Following books and links were used or would be useful in gaining further insight into file-systems:

Date: 2017-10-19 Thu

Author: Saurabh Barjatiya

Org version 7.9.3f with Emacs version 24

Validate XHTML 1.0