Some random notes


This page collects some of what I learned while writing nngfs. The notes are listed below.

 

1) Before writing a module you must know which kernel version you are writing it for. A lot changed between kernel 2.4 and 2.6. Earlier, fs-specific data was kept in a pointer called u.generic_ip (... I am telling a lie here, read on), while now it is kept in the i_private field of struct inode.

I was writing the code for 2.6 but didn't have a vfs_inode field in my in-core inode, so for some time I was trying to figure out a way to create and allocate a new inode (http://article.gmane.org/gmane.linux.kernel.kernelnewbies/24116). What I could not figure out was how the default allocator would provide space for my fs-specific inode structure, because the 2.6 inode structure only has a pointer. The 2.4 structure, however, has the fs-specific inode info in the same union as generic_ip. So when the kernel allocates an inode in 2.4, space for your fs inode is already allocated too, and that is why I said above that I was telling a lie. In 2.6 kernels, then, you need a vfs_inode field in your inode info and can't do without it.
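A minimal user-space sketch of the embedding idea, assuming a stand-in struct inode and a hypothetical struct nngfs_inode_info (the field layout and the helper names NNGFS_I, nngfs_alloc_inode and nngfs_destroy_inode are made up for illustration). It shows why the vfs_inode field matters: one allocation covers both the generic and the fs-specific part, and the fs-specific part is recovered from the generic pointer by pointer arithmetic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode, only so the sketch compiles in user space. */
struct inode {
    unsigned long i_ino;
};

/* Hypothetical fs-specific in-core inode: the generic inode is embedded, not pointed to. */
struct nngfs_inode_info {
    unsigned int i_data[15];   /* fs-private data, e.g. block pointers */
    struct inode vfs_inode;    /* embedded, so one allocation covers both parts */
};

/* Same idea as the kernel's container_of(): step back from the member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the fs-specific structure from the generic inode pointer. */
static struct nngfs_inode_info *NNGFS_I(struct inode *inode)
{
    return container_of(inode, struct nngfs_inode_info, vfs_inode);
}

/* Plays the role of an alloc_inode hook: allocate the big structure,
 * hand back only the embedded generic inode. */
static struct inode *nngfs_alloc_inode(void)
{
    struct nngfs_inode_info *ni = calloc(1, sizeof(*ni));
    return ni ? &ni->vfs_inode : NULL;
}

/* Counterpart of the allocation above: free the containing structure. */
static void nngfs_destroy_inode(struct inode *inode)
{
    free(NNGFS_I(inode));
}
```

A caller that only ever sees a struct inode pointer can get at its private data with NNGFS_I(inode)->i_data, without any extra pointer stored in the inode.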

 

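2) Function ext2_block_to_path(): Fills an array of offsets that tell you where the data resides.

   offsets is a list of indexes derived from the fbn (file block number): the first is an index into inode->i_data, and each following one is an index into the indirect block reached so far. Following them leads you to your data block.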

 

e.g. (with 4096-byte blocks, i.e. 1024 block numbers per block)

for block = 5

  offset[0] = 5

for block = 100

  offset[0] = 12

  offset[1] = 88, which means you should look up the 88th entry in the first indirect block, whose block number is stored in slot 12 of the inode's i_data.

for block = 1056 (this is the file block number == (size+4095)/4096)

  offset[0] = 13

  offset[1] = 0  == (20/1024) == (1056-12-1024)/1024

  offset[2] = 20, which means: go to the block in slot 13, read the 0th entry from it, go to that block and then read the 20th entry from it.

 

The parameter 'boundary', if I understand correctly, is 0 iff this is the last entry of the direct block array or of an indirect, double indirect or triple indirect block, i.e. if iblock == 11, or the index within the last-level block is 1023, etc.
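The offsets above can be reproduced with a small user-space sketch. It follows the logic of ext2_block_to_path() as I understand it, but it is hard-coded for 4096-byte blocks (1024 block numbers per block), takes no inode argument, and the macro names are my own shorthand.

```c
#include <stddef.h>

#define NDIR_BLOCKS 12            /* direct slots i_data[0..11] */
#define IND_BLOCK   12            /* slot holding the single indirect block */
#define DIND_BLOCK  13            /* slot holding the double indirect block */
#define TIND_BLOCK  14            /* slot holding the triple indirect block */
#define PTRS_BITS   10            /* assumption: 4096-byte blocks, 4-byte block numbers */
#define PTRS        (1L << PTRS_BITS)

/* Fill offsets[] for file block i_block. Returns the depth of the path
 * (1..4), or 0 if the block number is out of range. */
static int block_to_path(long i_block, int offsets[4], int *boundary)
{
    const long double_blocks = 1L << (PTRS_BITS * 2);
    long final = 0;
    int n = 0;

    if (i_block < 0) {
        return 0;
    } else if (i_block < NDIR_BLOCKS) {
        offsets[n++] = (int)i_block;
        final = NDIR_BLOCKS;
    } else if ((i_block -= NDIR_BLOCKS) < PTRS) {
        offsets[n++] = IND_BLOCK;
        offsets[n++] = (int)i_block;
        final = PTRS;
    } else if ((i_block -= PTRS) < double_blocks) {
        offsets[n++] = DIND_BLOCK;
        offsets[n++] = (int)(i_block >> PTRS_BITS);
        offsets[n++] = (int)(i_block & (PTRS - 1));
        final = PTRS;
    } else if (((i_block -= double_blocks) >> (PTRS_BITS * 2)) < PTRS) {
        offsets[n++] = TIND_BLOCK;
        offsets[n++] = (int)(i_block >> (PTRS_BITS * 2));
        offsets[n++] = (int)((i_block >> PTRS_BITS) & (PTRS - 1));
        offsets[n++] = (int)(i_block & (PTRS - 1));
        final = PTRS;
    } else {
        return 0;
    }
    /* Number of entries left after this one in the last-level block. */
    if (boundary)
        *boundary = (int)(final - 1 - (i_block & (PTRS - 1)));
    return n;
}
```

Calling it for blocks 5, 100 and 1056 gives the paths {5}, {12, 88} and {13, 0, 20}; boundary comes out as 0 for block 11 and for block 1035 (index 1023 of the first indirect block).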

 

 

3) Function ext2_get_branch()

This uses the offset list from above and returns a chain of triplets, each consisting of a block number, the address where that block number is stored, and a buffer head if one is required. A buffer head is needed only for indirect blocks: their entries are kept in another block, so you need to read that block into a buffer before you can read the block number. For fbn's less than 12 you don't need a buffer.

 

e.g. if offset[0] = 5, the triplet will be <inode->i_data[5], &inode->i_data[5], NULL>

     if offset[0] = 12 and offset[1] = 88, the chain goes through the indirect block, so it will be:

            <inode->i_data[12], &inode->i_data[12], NULL>

            <value of 88th entry in inode->i_data[12], (int *)buffer_head_of(inode->i_data[12])+88, (int *)buffer_head_of(inode->i_data[12])>

 

In 2.6 kernels the same thing is done, except that a lock (EXT2_I(inode)->i_meta_lock) is taken before adding to the chain.
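The chain building can be sketched in user space with a toy "disk". Everything here is an assumption made for illustration: the disk is an in-memory array, a plain pointer to the block stands in for the buffer head, block number 0 means a hole, there is no locking, and the function simply returns how many links it filled rather than mimicking the kernel's return convention.

```c
#include <stddef.h>
#include <stdint.h>

#define PTRS    1024           /* block numbers per 4096-byte block */
#define NBLOCKS 32             /* size of the toy "disk" */

/* disk[b] is block b viewed as an array of block numbers. */
static uint32_t disk[NBLOCKS][PTRS];

struct toy_inode {
    uint32_t i_data[15];
};

/* One link of the chain: the triplet <key, p, bh> from the notes. */
struct indirect {
    uint32_t  key;   /* block number found at *p */
    uint32_t *p;     /* address where that block number is stored */
    uint32_t *bh;    /* block that had to be read to reach p; NULL for i_data */
};

/* Walk offsets[0..depth-1] and fill chain[]. Stops early at a hole
 * (key == 0) or at a block number outside the toy disk. Returns the
 * number of links filled. */
static int get_branch(struct toy_inode *inode, int depth, const int *offsets,
                      struct indirect chain[4])
{
    int i = 0;

    /* First link lives in the inode itself, so no buffer is needed. */
    chain[0].p = &inode->i_data[offsets[0]];
    chain[0].key = *chain[0].p;
    chain[0].bh = NULL;

    while (i + 1 < depth && chain[i].key != 0 && chain[i].key < NBLOCKS) {
        uint32_t *block = disk[chain[i].key];   /* stands in for reading the block */
        i++;
        chain[i].bh = block;
        chain[i].p = &block[offsets[i]];
        chain[i].key = *chain[i].p;
    }
    return i + 1;
}
```

With i_data[12] = 3 and disk[3][88] = 9, the offsets {12, 88} give the chain <3, &i_data[12], NULL>, <9, &disk[3][88], disk[3]>, which matches the second example above.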

 

3) ext2_truncate() - The inode->i_size used in this function is the new size after truncation. Thus if you reach here from the inode delete path, this size should be 0.

The block array for the truncated portion of the in-core inode is cleared by calling ext2_free_data(). So if you truncate a file so far that the cut point falls within the direct block pointers (the n == 1 condition in ext2_truncate()), you need to clear the block array; otherwise you don't.
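A rough user-space sketch of the n == 1 case, assuming 4096-byte blocks and a bare array in place of the in-core inode. The helper names are hypothetical, and the sketch only zeroes the direct slots; it does not touch indirect trees or free anything on disk.

```c
#include <stdint.h>

#define TOY_BLOCK_SIZE 4096u
#define NDIR_BLOCKS    12

/* First file block that lies entirely beyond new_size. */
static unsigned long first_block_to_free(unsigned long long new_size)
{
    return (unsigned long)((new_size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE);
}

/* Handle the case where the cut point falls in the direct slots:
 * zero every direct slot from the cut point on and return how many
 * allocated slots were cleared. If the cut point is beyond the direct
 * slots, nothing in the direct part of the array changes. */
static int truncate_direct(uint32_t i_data[15], unsigned long long new_size)
{
    unsigned long iblock = first_block_to_free(new_size);
    int cleared = 0;

    if (iblock >= NDIR_BLOCKS)
        return 0;
    for (unsigned long i = iblock; i < NDIR_BLOCKS; i++) {
        if (i_data[i]) {
            i_data[i] = 0;   /* the kernel would also release the block itself */
            cleared++;
        }
    }
    return cleared;
}
```

For a new size of 0 (the delete path) the cut point is block 0, so every direct slot is cleared.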

 

 

 4) A basic hash implementation ready to be used in simple programs. Get it from here

 

 

5) Some code flow

 

ext2_release_file() - Called when file is closed

        |

ext2_truncate() ---> ext2_discard_reservation() - Takes the reservation lock and frees the block reservation window by removing it from the rb tree.

        |                    |

ext2_clear_inode()           ---> rsv_window_remove()

 

 

ext2_free_blocks() - Free 'count' number of blocks starting from block

     |

      -->read_block_bitmap() - Read block bitmap

     |

     -->ext2_get_group_desc() - Get the group descriptor

     |

     -->ext2_clear_bit_atomic() - Free the entry

     |

     -->group_adjust_blocks() - Adjust block group free counters in descriptor with blockgroup lock held.

     |

     -->release_blocks() - Adjust the per cpu free block counters in superblock

     |

     -->DQUOT_FREE_BLOCK() - Adjust quota block updates.
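The core of this flow (clear the bits, then adjust the counters) can be sketched in user space for a single block group. The struct, the group size and the function names are assumptions for illustration; there is no locking, no buffer heads and no quota handling here.

```c
#include <limits.h>

#define BLOCKS_PER_GROUP 64

/* Toy block group: a bitmap (1 = in use) plus the free count that the
 * group descriptor would carry. */
struct toy_group {
    unsigned char bitmap[BLOCKS_PER_GROUP / CHAR_BIT];
    unsigned int  free_blocks;
};

/* Clear one bit and report whether it was set before. */
static int test_and_clear(unsigned char *map, unsigned int bit)
{
    unsigned char mask = (unsigned char)(1u << (bit % CHAR_BIT));
    int was_set = (map[bit / CHAR_BIT] & mask) != 0;

    map[bit / CHAR_BIT] &= (unsigned char)~mask;
    return was_set;
}

/* Free 'count' blocks starting at 'start' within the group.
 * Returns the number of blocks that were actually in use. */
static unsigned int toy_free_blocks(struct toy_group *g, unsigned int start,
                                    unsigned int count)
{
    unsigned int freed = 0;

    for (unsigned int i = 0; i < count && start + i < BLOCKS_PER_GROUP; i++) {
        /* A bit that is already clear is a double free and is not counted. */
        if (test_and_clear(g->bitmap, start + i))
            freed++;
    }
    g->free_blocks += freed;   /* the per-group counter adjustment step */
    return freed;              /* a caller would feed this to the global counters */
}
```

Freeing the same range twice returns 0 the second time and leaves the free count unchanged, which is why only bits that were really set are counted.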

 

 

 

bitmap_search_next_usable_block() - Search the bitmap till a free entry is found.

        |

        -->ext2_find_next_zero_bit()
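What the bitmap search boils down to can be sketched as a plain bit-by-bit scan. This is a naive user-space version under my own name find_next_zero_bit_toy, assuming little-endian bit order within each byte; the kernel helpers are optimized and work on words.

```c
#include <limits.h>

/* Return the index of the first zero bit in [start, size), or size
 * if every bit in that range is set. */
static unsigned int find_next_zero_bit_toy(const unsigned char *map,
                                           unsigned int size,
                                           unsigned int start)
{
    for (unsigned int bit = start; bit < size; bit++) {
        if (!(map[bit / CHAR_BIT] & (1u << (bit % CHAR_BIT))))
            return bit;
    }
    return size;
}
```

A return value equal to size means no free entry was found in the range, so the caller has to look elsewhere (for example in another group).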

 

find_next_usable_block() - Search for a usable block in group

 

ext2_try_to_allocate() - Attempt to allocate blocks within a given range. If a reservation window is there try to allocate from there.

 

 

ext2_add_nondir()

      |

      ---> ext2_add_link()

                |

                ---> ext2_chunk_size() - Block sized chunks

                                                                |

                                                                ---->ext2_get_page() - Get the pages of directory in which entry is being inserted.

                                                               |                 |

                                                               |                  ---->ext2_check_page() - This function does the basic checks on the page's directory entries,

                                                               |                                                         such as misaligned entries, bad entry length etc., and then sets the checked bit on the page.

                                                               |

                                                                ---> __ext2_write_begin() --> block_write_begin() - Basic task of block allocation and bringing partial write blocks uptodate.

                                                                                                                |

                                                                                                                ---> __grab_cache_page()

                                                                                                                |

                                                                                                                 --> find_lock_page() --> page_cache_alloc()

                                                                                                                |

                                                                                                                 ---> __block_prepare_write()

 

 

 

write_one_page() - Writes a single page and may optionally wait for it to complete. Page must be locked. Allocate a wbc structure with appropriate

    |                  mode and number of pages to write.

    |

     ---> wait_on_page_writeback() - Called if we need to wait

                              |

                               ---> wait_on_page_bit(PG_writeback)

                                                   |

                                                    ---> __wait_on_bit() - Waits on the appropriate waitqueue

 

 

 

ext2_lookup() ---> ext2_inode_by_name()

                   |            |

                   |             -->ext2_find_entry()

                   |

                    ---> iget() - Get the inode for the inode number found by ext2_find_entry()

 

 

 

 

ext2_create()

            |

            --> ext2_new_inode()

            |           |

            |           --> new_inode()

            |           |       |

            |           |       --> alloc_inode() -- allocates from the inode cache.

            |           |

            |           -->ext2_get_group_desc() -- Get the group descriptor in a buffer head.

            |           |

            |           -->read_inode_bitmap() -- Get the inode bitmap in buffer head.

            |           |

            |           -->ext2_find_next_zero_bit() -- Get the next zero bit for inode number to allocate.

            |           |

            |           -->Increment the percpu counters appropriately for the newly allocated inode & mark superblock and bitmap dirty.

            |           |

            |           -->Set the default fields of inode and mark inode dirty

            |           |

            |           -->ext2_preread_inode() - Preread the inode blocks of inode in anticipation that it will be written back soon.

            ---> Set address space operations depending on the superblock flags.

           |

            ---> mark_inode_dirty()

 

 

ext2_symlink() --> ext2_new_inode()

                                               |

                                               --> If namelen > inode data size

                                               |            |

                                               |              -->page_symlink() --> pagecache_write_begin()

                                               |

                                                --> else memcpy the symlink name

                                               |

                                               --> mark_inode_dirty()

                                               |

                                                --> ext2_add_nondir()
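The branch on the name length in ext2_symlink() can be sketched like this. It is a user-space toy: the struct and the is_fast flag are made up, and the only fact taken from ext2 is that the in-inode data area is 15 block numbers of 4 bytes each (60 bytes). The length compared includes the terminating NUL.

```c
#include <stdint.h>
#include <string.h>

struct toy_inode {
    uint32_t i_data[15];   /* 60 bytes, normally used for block pointers */
    int      is_fast;      /* hypothetical flag: target stored inline */
};

/* Store a symlink target. Returns 1 if it fits in the inode itself,
 * 0 if it is too long and would have to go into a data page. */
static int toy_symlink(struct toy_inode *inode, const char *target)
{
    size_t len = strlen(target) + 1;   /* include the terminating NUL */

    if (len > sizeof(inode->i_data)) {
        inode->is_fast = 0;            /* the page-based path would be taken here */
        return 0;
    }
    memcpy(inode->i_data, target, len);
    inode->is_fast = 1;
    return 1;
}
```

So a target of up to 59 characters is copied straight into the inode, and anything longer takes the page-based path.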

 

 

ext2_link() -->  Increment i_nlink of inode.

   |

    ---> ext2_add_nondir()

 

ext2_mkdir()

                |

                --> inode_inc_link_count() -- Increment link count of parent dir.

                |

                --> ext2_new_inode() -- Get a new inode.

                |

                --> inode_inc_link_count() -- Increment link count of dir.

                |

                --> ext2_make_empty() -- Create default entries in directory

                |

                --> ext2_add_link() -- Add the entry to directory

                |

                --> d_instantiate() -- Attach the inode to dentry.

 

a) d_splice_alias() - This is required for filesystems which are exportable. As far as I understand, it is needed so that the dcache can handle all possible alias names for a directory. See the thread on the kernel archive to understand it better.

 

b) See the ext3 write path flow diagram: ext3 write

 

 

Useful Stacks :

---------------------------

 

Writing a dirty inode to disk.

 

(gdb) bt

#0  ext2_update_inode (inode=0x2762d134, do_sync=0) at fs/ext2/inode.c:1312

#1  0x2883fc4b in ext2_write_inode (inode=0x2762d134, wait=0) at fs/ext2/inode.c:1416

#2  0x080d40f3 in __writeback_single_inode (inode=0x2762d134, wbc=0x27b4ce1c) at fs/fs-writeback.c:178

#3  0x080d459d in generic_sync_sb_inodes (sb=0x27b76d50, wbc=0x27b4ce1c) at fs/fs-writeback.c:501

#4  0x080d469a in sync_sb_inodes (sb=0x2762d134, wbc=0x0) at fs/fs-writeback.c:534

#5  0x080d4702 in sync_inodes_sb (sb=0x27b76d50, wait=0) at fs/fs-writeback.c:614

#6  0x080b9422 in __fsync_super (sb=0x27b76d50) at fs/super.c:251

#7  0x080b9494 in fsync_super (sb=0x27b76d50) at fs/super.c:270

#8  0x080b96cd in generic_shutdown_super (sb=0x27b76d50) at fs/super.c:294

#9  0x080b9773 in kill_block_super (sb=0x27b76d50) at fs/super.c:821

#10 0x080b9822 in deactivate_super (s=0x27b76d50) at fs/super.c:185

#11 0x080cd649 in mntput_no_expire (mnt=0x27852d08) at fs/namespace.c:639

#12 0x080cdb3f in sys_umount (name=0x8059b00 "", flags=0) at fs/namespace.c:1153

#13 0x080cdb9e in sys_oldumount (name=0x8059b00 "") at fs/namespace.c:1165

#14 0x0805a4e8 in handle_syscall (r=0x278f9a14) at arch/um/kernel/skas/syscall.c:35

#15 0x080686a2 in userspace (regs=0x278f9a14) at arch/um/os-Linux/skas/process.c:201

#16 0x0805774e in fork_handler () at arch/um/kernel/process.c:179

#17 0xa56b6b6b in ?? ()