f2fs: Sync with upstream f2fs-stable 3.10.y
Start with: "Revert "f2fs: move i_size_write in f2fs_write_end""
http://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs-stable.git/commit/?h=linux-3.10.y&id=4981df6bb4207fb61d978aa64fccf9525504cdd6

Up to: "f2fs: use file pointer for fscrypt_notsupp_process_policy"
http://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs-stable.git/commit/?h=linux-3.10.y&id=1dad3b6a0f1544f473bb59405cf48eb28a21cced

Omitted changes:
"f2fs: fix build for v3.10"
"f2fs: fix parameters of __exchange_data_block"
"f2fs: exclude special cases for f2fs_move_file_range"
"posix_acl: Clear SGID bit when setting file permissions"
"f2fs: support multiple devices"
"f2fs: fix redundant block allocation"
"scripts/tags.sh: catch 4.9-rc6"
"fs/super.c: fix race between freeze_super() and thaw_super()"

.
.
.

Revert "f2fs: move i_size_write in f2fs_write_end"

This reverts commit a2ee0a300344a6da76186129b078113354fe13d2.

When testing with generic/032 of the xfstests suite, the following failure
is reported:

generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
    --- tests/generic/032.out	2015-01-11 16:52:27.643681072 +0800
    +++ results/generic/032.out.bad	2016-08-06 13:44:43.861330500 +0800
    @@ -1,5 +1,5 @@
     QA output created by 032
    -100 iterations
    -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    -*
    -0100000
    +1: [768..775]: unwritten
    +Unwritten extents found!
    ...
    (Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests

In write_end(), we should update the inode's i_size before unlocking the
page; otherwise, we will lose newly written data in the following race
condition.

Thread A			Thread B
- write_end
 - unlock page
				- writepages
				 - lock_page
				  - writepage
				  if page is out-of-range of file size,
				  we will skip writing the page.
 - update i_size
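
The interleaving above can be replayed in a small sequential model. This is
only a sketch: model_page_in_range() and model_writeback_after_unlock() are
hypothetical names, the 4096-byte block size is an assumption, and the real
writepage path does more than this single range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE 4096u	/* assumed page size for the model */

/*
 * Hypothetical stand-in for the range check done at writeback time:
 * a page is written only if it starts before i_size.
 */
bool model_page_in_range(uint64_t page_index, uint64_t i_size)
{
	return page_index * MODEL_BLOCK_SIZE < i_size;
}

/*
 * Replays the diagram for a write that extends an empty file to new_size.
 * Thread B's check runs right after Thread A unlocks the page.
 * Returns true if Thread B writes page 0.
 */
bool model_writeback_after_unlock(bool size_updated_before_unlock,
				  uint64_t new_size)
{
	uint64_t i_size = 0;
	bool written;

	assert(new_size > 0);
	if (size_updated_before_unlock)
		i_size = new_size;	/* Thread A: update i_size first */

	/* Thread A: unlock page; Thread B: lock_page + writepage */
	written = model_page_in_range(0, i_size);

	i_size = new_size;		/* Thread A: late update, too late for B */
	(void)i_size;
	return written;
}
```

With the size updated before the unlock, the model reports the page as
written; with the late update, the page is skipped.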

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: not allow to write illegal blkaddr

We came across an error as below:

[build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
[build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
[build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
[build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
[build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
[build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
[build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]

... ...

[build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
[build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
[build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
[build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
[build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
[build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
[build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]

A NAT block may be overwritten by a data block, so this patch forbids the
write if the blkaddr is illegal.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not use discard_map for hard disks

We don't need to keep discard_map if the disk does not support the discard
command.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reduce batch size of fstrim

This is to reduce the batch size of fstrim to avoid long latency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add discard info to sys entry of f2fs status

This patch adds the discard block count to the sys entry of f2fs status.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: skip new checkpoint when doing fstrim without fs change

This patch allows fstrim to run without writing a checkpoint if there is
no fs change.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set dirty state for filesystem only when updating meta data

We don't guarantee integrity of user data after checkpoint, since we only
guarantee meta data integrity for data consistency of filesystem.

For this reason, we only need to mark the fs dirty when meta data is
updated, so that we can skip writing a checkpoint in cases where only
non-meta data is updated.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up foreground GC flow

This patch checks the valid block count of a GCed section directly,
instead of checking the count in every segment of the section one by
one, in order to clean up the foreground GC code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unneeded loop in build_sit_entries

When building each sit entry in cache, we first load it from the sit
page and then check all entries in the sit journal; if there is an
updated entry in the journal, we overwrite the cached entry with the
journaled one.

Actually, most of these checks are unneeded, since we only need to
update cached entries with journaled entries in batch, so change the
flow as below to be more efficient:
1. load all sit entries into cache from sit pages;
2. update sit entries with journal.
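
The two-step flow can be sketched as follows. The struct layouts and the
name model_build_sit_entries() are hypothetical and only illustrate the
ordering; they are not the f2fs on-disk or in-memory formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sit entry: only a valid block count. */
struct model_sit_entry {
	unsigned int valid_blocks;
};

/* Hypothetical journal record: a newer copy of one segment's entry. */
struct model_sit_journal {
	size_t segno;
	struct model_sit_entry se;
};

void model_build_sit_entries(struct model_sit_entry *cache,
			     const struct model_sit_entry *on_disk,
			     size_t nsegs,
			     const struct model_sit_journal *journal,
			     size_t njournal)
{
	size_t i;

	assert(cache != NULL && on_disk != NULL);

	/* 1. load all sit entries into cache from the sit pages */
	for (i = 0; i < nsegs; i++)
		cache[i] = on_disk[i];

	/* 2. update cached entries with the journal, one pass in total */
	for (i = 0; i < njournal; i++) {
		if (journal[i].segno < nsegs)
			cache[journal[i].segno] = journal[i].se;
	}
}
```

The journal is walked once, rather than once per segment as in the old flow.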

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to do f2fs_balance_fs in f2fs_map_blocks correctly

If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we
should call f2fs_balance_fs for checking and reclaiming space, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check return value of write_checkpoint during fstrim

During fstrim, if any of the write_checkpoint calls fails, stop and
return the error number to the caller.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant judgement condition in available_free_memory

In available_free_memory, there are two identical conditions that check
for NAT excess; remove one of them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary initialization

`flags' is used to save value from userspace, there is no need to
initialize it, and FS_FL_USER_VISIBLE is the mask for getflags.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix non static symbol warning

Fixes the following sparse warning:

fs/f2fs/data.c:969:12: warning:
 symbol 'f2fs_grab_bio' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to preallocate block only aligned to 4K

In write_begin(), we skip checking the dnode block for preallocation when
the whole block needs to be updated, since we already preallocated its
block in f2fs_preallocate_blocks. For a partially updated block, we will
still try to lock its node and do preallocation in write_begin(), so
f2fs_preallocate_blocks should not preallocate its block.

But previously, the calculation of the number of blocks to preallocate
was incorrect; fix it.
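
As an illustration of the intended arithmetic, the sketch below counts only
the blocks that a write of count bytes at offset pos covers completely. The
name model_prealloc_blocks() and the fixed 4096-byte block size are
assumptions; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BLKSIZE UINT64_C(4096)	/* assumed block size */

/*
 * Returns the number of blocks fully covered by [pos, pos + count) and
 * stores the index of the first such block in *first_blk. Partially
 * covered head and tail blocks are left to write_begin().
 */
uint64_t model_prealloc_blocks(uint64_t pos, uint64_t count,
			       uint64_t *first_blk)
{
	uint64_t start = (pos + MODEL_BLKSIZE - 1) / MODEL_BLKSIZE; /* round up */
	uint64_t end = (pos + count) / MODEL_BLKSIZE;		     /* round down */

	assert(first_blk != NULL);
	*first_blk = start;
	return end > start ? end - start : 0;
}
```

For example, a 4096-byte write at offset 100 touches two blocks but fully
covers none, so nothing is preallocated up front for it.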

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c

f2fs: fix a bug when using namehash to locate dentry bucket

In the following scenario,

1) we don't have the key and doing a lookup for encrypted file,
2) and the encrypted filename is big name

we should use fname->hash as name hash value instead of what is
calculated by fname->disk_name. Because in such case,
fname->disk_name is empty.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enable inline_dentry by default and add noinline_dentry option

Make inline_dentry the default mount option to improve space usage and
IO performance in scenarios with numerous small directories.
Add a noinline_dentry mount option to disable it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: schedule in between two continous batch discards

The batch discard approach of fstrim grabs and releases the gc_mutex lock
repeatedly, which makes contention on the lock more intensive.

So after one batch of discards has been issued in checkpoint and the lock
has been released, it's better to call schedule() to give other
competitors more opportunity to grab the gc_mutex lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do in batch synchronously readahead during GC

In order to enhance performance, we try to readahead node pages during
GC, but before loading a node page we must get its block address, which
is stored in the NAT table, so the synchronous read of a single NAT page
blocks our readahead flow.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE

This patch adds one phase that does readahead of NAT pages in batch
before the readahead of node pages, for better efficiency.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to do security initialization of encrypted inode with original filename

When creating a new inode, security_inode_init_security is called to
initialize the security info related to the inode, and the filename is
passed to the security module; this helps a security module such as
SELinux know which rule or label could be applied to the inode with the
specified name.

Previously, if the new inode was created as an encrypted one, f2fs passed
the encrypted filename to the security module, which may fail the check of
the security policy belonging to the inode. To fix this issue, pass the
original unencrypted filename instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs crypto: avoid unneeded memory allocation in ->readdir

When decrypting dirents in ->readdir, fscrypt_fname_disk_to_usr won't
change content of original encrypted dirent, we don't need to allocate
additional buffer for storing mirror of it, so get rid of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set encryption name flag in add inline entry path

This patch sets encryption name flag in the add inline entry path
if filename is encrypted.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix lost xattrs of directories

This patch enhances the xattr consistency of dirs against sudden power-cuts.

Possible scenario would be:
1. dir->setxattr used by per-file encryption
2. file->setxattr goes into inline_xattr
3. file->fsync

In that case, we should do checkpoint for #1.
Otherwise we'd lose dir's key information for the file given #2.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add roll-forward recovery process for encrypted dentry

Add roll-forward recovery process for encrypted dentry, so the first fsync
issued to an encrypted file does not need writing checkpoint.

This improves the performance of the following test at thousands of small
files: open -> write -> fsync -> close

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify kernel message to show encrypted names]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/dir.c
	fs/f2fs/f2fs.h

f2fs: fix to set superblock dirty correctly

tests/generic/251 of the fstests suite complains with the message below:

------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G           O    4.7.0+ #21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[<f89fcefe>] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 00000001 EDX: ffff0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 00000001 a220fb7b e9f4e000 00000002 419ff2d3 b3a05151 00000002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 00000001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [<c10b8154>] ? __lock_acquire+0x3c4/0x760
 [<c10b78bd>] ? mark_held_locks+0x5d/0x80
 [<f8a10632>] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [<f89e9f56>] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c10b4281>] ? trace_hardirqs_off_caller+0x91/0x120
 [<f89e98a0>] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [<c120b2e1>] do_vfs_ioctl+0x81/0x7f0
 [<c11d57c5>] ? kmem_cache_free+0x245/0x2e0
 [<c1217840>] ? get_unused_fd_flags+0x40/0x40
 [<c1206eec>] ? putname+0x4c/0x50
 [<c11f631e>] ? do_sys_open+0x16e/0x1d0
 [<c1001990>] ? do_fast_syscall_32+0x30/0x1c0
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c120baa8>] SyS_ioctl+0x58/0x80
 [<c1001a01>] do_fast_syscall_32+0xa1/0x1c0
 [<c178cc54>] sysenter_past_esp+0x45/0x74
EIP: [<f89fcefe>] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason is: with the call stacks below, we will hit the BUG_ON while
doing fstrim.

Thread A				Thread B
- write_checkpoint
 - do_checkpoint
					- f2fs_write_inode
					 - update_inode_page
					  - update_inode
					   - set_page_dirty
					    - f2fs_set_node_page_dirty
					     - inc_page_count
					      - percpu_counter_inc
					      - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread C				Thread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
					- do_vfs_ioctl
					 - f2fs_ioctl
					  - f2fs_trim_fs
					   - write_checkpoint
					    - f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set dentry bits on random location in memory

This fixes pointer panic when using inline_dentry, which was triggered when
backporting to 3.10.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to detect temporary name of multimedia file

Some applications may create a multimedia file with a temporary name like
'*.jpg.tmp' or '*.mp4.tmp', then rename it to '*.jpg' or '*.mp4'.

Currently, f2fs can only detect multimedia filenames of the form
"filename + '.' + extension", so it misses multimedia files with such
temporary names, and as a result fails to set the cold flag on the file.

This patch enhances the detection flow to also look for the extension in
the middle of a temporary filename.
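
A minimal userspace sketch of such a lookup is shown below. The helper names
are hypothetical, and requiring the extension to be followed by the end of
the name or another '.' is a choice made for this sketch, not necessarily
what the patch does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of s and ext. */
bool model_ext_matches_at(const char *s, const char *ext, size_t n)
{
	for (size_t k = 0; k < n; k++) {
		if (tolower((unsigned char)s[k]) != tolower((unsigned char)ext[k]))
			return false;
	}
	return true;
}

/*
 * Accepts "filename.ext" and "filename.ext.<anything>", e.g. "a.jpg"
 * and "a.jpg.tmp" for ext == "jpg".
 */
bool model_name_has_extension(const char *name, const char *ext)
{
	size_t nlen, elen;

	assert(name != NULL && ext != NULL);
	nlen = strlen(name);
	elen = strlen(ext);

	/* need at least one filename byte, a '.', and the extension */
	if (nlen < elen + 2)
		return false;

	for (size_t i = 1; i < nlen - elen; i++) {
		char next;

		if (name[i] != '.')
			continue;
		if (!model_ext_matches_at(name + i + 1, ext, elen))
			continue;
		next = name[i + 1 + elen];	/* at most the terminating NUL */
		if (next == '\0' || next == '.')
			return true;
	}
	return false;
}
```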

Reported-by: Xue Liu <liuxueliu.liu@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: no need to make zeros beyond i_size

We don't need to make zeros beyond i_size, since we already wrote that through
NEW_ADDR case.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid page allocation for truncating partial inline_data

When truncating cached inline_data, we don't need to allocate a new page
all the time. Instead, it must check its page cache only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: forbid to do fstrim if fs has some error

This patch skips fstrim if the SBI_NEED_FSCK flag is set in sbi.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check free_sections for defragmentation

Fix wrong condition check for defragmentation of a file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add common iget in add_fsync_inode

There is no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid ENOMEM during roll-forward recovery

This patch gives another chance during roll-forward recovery when
-ENOMEM is returned.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to set PageUptodate in f2fs_write_end correctly

Previously, f2fs_write_begin set PageUptodate all the time. But when the user
tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider
that the page may be copied only partially afterwards. In such a case,
we will lose the remaining region in the page.

This patch fixes this by setting PageUptodate in f2fs_write_end according to
the copied result. In the short copy case, it returns zero to let
generic_perform_write retry copying user data again.

As a result, f2fs_write_end() works:
   PageUptodate      len      copied    return   retry
1. no                4096     4096      4096     false  -> return 4096
2. no                4096     1024      0        true   -> goto #1 case
3. yes               2048     2048      2048     false  -> return 2048
4. yes               2048     1024      1024     false  -> return 1024
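
The table can be expressed as a small model. struct model_page and
model_write_end() are hypothetical names, and the short-copy test against
len is an assumption chosen to reproduce the four rows above; the kernel
function does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page state: only the uptodate bit is modeled. */
struct model_page {
	bool uptodate;
};

/*
 * Returns the number of bytes accepted. A return value of 0 asks the
 * caller to retry the copy, as in row 2 of the table.
 */
unsigned int model_write_end(struct model_page *page, unsigned int len,
			     unsigned int copied)
{
	assert(page != NULL && copied <= len);

	if (!page->uptodate) {
		if (copied != len)
			return 0;	/* short copy into a non-uptodate page */
		page->uptodate = true;	/* the whole range was copied */
	}
	return copied;
}
```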

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove dead code f2fs_check_acl

The macro f2fs_check_acl is defined but never used since
the initial commit, this patch removes the code that has
been dead for several years.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle error in recover_orphan_inode

This patch enhances the error path in recover_orphan_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: make f2fs_filetype_table static

There is no more user of f2fs_filetype_table outside of dir.c, make it
static.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to return error number of read_all_xattrs correctly

We treat every error in read_all_xattrs as an out-of-memory error, which
hides the real reason for the failure. Fix it by returning the correct
errno to reflect the real cause.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support IO error injection

This patch adds support for IO error injection, for testing the IO error
tolerance of f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: show dirty inode number

This patch enables showing dirty inode number in procfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: preallocate blocks for encrypted file

This patch allows preallocating data blocks for buffered aio writes
in encrypted files.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix to avoid BUG_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c

f2fs: use crc and cp version to determine roll-forward recovery

Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid same garbage cp_version, we needed to truncate the next
dnode during checkpoint, resulting in additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.

There is backward compatibility concern where it changes node_footer layout.
So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect new layout. New layout will be activated only when this flag is set.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/recovery.c

f2fs: put directory inodes before checkpoint in roll-forward recovery

Before checkpoint, we'd better drop any inodes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to avoid race condition when updating sbi flag

Make updates of the sbi flag atomic by using {test,set,clear}_bit;
otherwise, in a concurrent scenario, the flag could be updated incorrectly.
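
A userspace analogue using C11 atomics is sketched below. The kernel uses
its own {test,set,clear}_bit helpers, so the struct, the helper names, and
the flag limit here are hypothetical and only illustrate why a single atomic
read-modify-write avoids lost updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_FLAG 8u	/* hypothetical number of flag bits */

struct model_sbi {
	atomic_ulong s_flag;	/* one bit per flag */
};

void model_set_flag(struct model_sbi *sbi, unsigned int bit)
{
	assert(bit < MODEL_MAX_FLAG);
	/* one atomic OR: a concurrent update of another bit is not lost */
	atomic_fetch_or(&sbi->s_flag, 1UL << bit);
}

void model_clear_flag(struct model_sbi *sbi, unsigned int bit)
{
	assert(bit < MODEL_MAX_FLAG);
	atomic_fetch_and(&sbi->s_flag, ~(1UL << bit));
}

bool model_test_flag(struct model_sbi *sbi, unsigned int bit)
{
	assert(bit < MODEL_MAX_FLAG);
	return (atomic_load(&sbi->s_flag) >> bit) & 1UL;
}
```

A plain "flags |= mask" is a separate load and store, so two threads
touching different bits can overwrite each other's update.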

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce cp_lock to protect updating of ckpt_flags

This patch introduces a spinlock to protect updates of the ckpt_flags
field in struct f2fs_checkpoint; it avoids incorrect updates under race
conditions.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: assign return value in f2fs_gc

This patch adds a return value of write_checkpoint for f2fs_gc.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should put_page for summary page

We should call put_page for preloaded summary pages in do_garbage_collect.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid gc in cp_error case

Otherwise, we can hit
	f2fs_bug_on(sbi, !PageUptodate(sum_page));

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle errors during recover_orphan_inodes

This patch fixes to handle EIO during recover_orphan_inode() given the below
panic.

F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
------------[ cut here ]------------
RIP: 0010:[<ffffffffc0b244e3>]  [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
RSP: 0018:ffff92f8b7fb7c30  EFLAGS: 00010246
RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
FS:  00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
Stack:
 ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
 ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
 ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
Call Trace:
 [<ffffffffbc27ff55>] evict+0xc5/0x1a0
 [<ffffffffbc28090d>] iput+0x1ad/0x2c0
 [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
 [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
 [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
 [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
 [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffffbc264e49>] mount_fs+0x39/0x170
 [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
 [<ffffffffbc2881df>] do_mount+0x1cf/0xd00
 [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
 [<ffffffffbc289003>] SyS_mount+0x83/0xd0
 [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not unnecessarily null-terminate encrypted symlink data

Null-terminating the fscrypt_symlink_data on read is unnecessary because
it is not string data --- it contains binary ciphertext.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/namei.c

f2fs: remove dirty inode pages in error path

When getting EIO while handling orphan inodes, we can get some dirty node
pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
to flush node pages. But in this case, we should prevent that, since
we will try again from the start.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: adjust display format of segment bit

Just adjust segment bit info printed in procfs.

Before:
1008      5|0  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1009      3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
1010      3|1  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After:
1008      5|0  | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1009      4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
1010      4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75
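
Judging from the sample output, the new format prints every bitmap byte as
two hex digits with a leading space. The sketch below reproduces that
layout in userspace; model_format_bitmap() is a hypothetical helper and the
" %02x" format string is inferred from the output above, not quoted from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats n bitmap bytes as " xx xx ..." into out and returns the number
 * of characters written (excluding the terminating NUL). Stops early if
 * the buffer is too small for the next byte.
 */
size_t model_format_bitmap(char *out, size_t outsz,
			   const unsigned char *map, size_t n)
{
	size_t used = 0;

	assert(out != NULL && outsz > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(out + used, outsz - used, " %02x",
				 (unsigned int)map[i]);

		if (w < 0 || (size_t)w >= outsz - used) {
			out[used] = '\0';	/* drop a partially written byte */
			break;
		}
		used += (size_t)w;
	}
	return used;
}
```

Fixed-width fields keep the columns aligned regardless of the byte values,
which the old variable-width output did not.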

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support configuring fault injection per superblock

Previously, we only supported global fault injection configuration, so
when we configured the type/rate of fault injection through sysfs or a
mount option, it affected every f2fs partition in use.

This does not make sense: it is inconvenient if a developer wants to
test separate partitions with different fault injection rates/types
simultaneously, and it is not possible to enable fault injection on one
partition while disabling it on another.

From now on, we move the global fault injection configuration from the
module into each superblock, so injection testing can be more flexible.
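
The idea can be sketched as a rate/type pair stored in each superblock
instead of in module-wide globals. All names below are hypothetical and the
counting rule (inject on every rate-th matching call) is an assumption made
for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-superblock fault injection state. */
struct model_fault_info {
	unsigned int ops;	/* matching calls since the last injection */
	unsigned int rate;	/* inject every rate-th call; 0 disables */
	unsigned int type_mask;	/* bit i set: fault type i is enabled */
};

struct model_sb {
	struct model_fault_info fault;
};

bool model_time_to_inject(struct model_sb *sb, unsigned int type)
{
	struct model_fault_info *f;

	assert(sb != NULL && type < 32);
	f = &sb->fault;

	if (f->rate == 0)
		return false;
	if (!(f->type_mask & (1u << type)))
		return false;
	if (++f->ops >= f->rate) {
		f->ops = 0;
		return true;
	}
	return false;
}
```

Because the state lives in struct model_sb, two mounted instances can run
with different rates and types, or with injection disabled on one of them.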

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/super.c

f2fs: remove redundant value definition

This patch removes a redundant value definition in build_sit_entries.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do fault injection initialization in default_options

Do fault injection initialization in default_options to keep it consistent
with the configuration of other default options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to recover old fault injection config in ->remount_fs

In ->remount_fs, we didn't recover original fault injection config if
we encounter error, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support checkpoint error injection

This patch adds support for checkpoint error injection in f2fs for testing
fatal error tolerance. It is useful because f2fs can simulate an abnormal
power-off by itself, instead of apps calling the godown ioctl.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant io plug

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove dead variable

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce get_checkpoint_version for cleanup

validate_checkpoint has almost identical code for getting the values of
pre_version and cur_version; this patch adds get_checkpoint_version to
clean up the redundant code.

Change-Id: I37ae90110b3864970f3118794b3186b6d7799705
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to commit bio cache after flushing node pages

In sync_node_pages, we don't check and commit the last merged pages in the
private bio cache of f2fs. As these pages were tagged as writeback, anyone
waiting for writeback of such a page will be blocked until the cache is
committed by someone else.

We need to commit node type bio cache to avoid potential deadlock or long
delay of waiting writeback.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't submit irrelevant page

When we call ->writepages, there are two cases:
a. we didn't write out any dirty pages, since they were written back by
another thread concurrently.
b. we wrote out dirty pages, and have already submitted the bio to the
block layer.

In these cases, we don't need to do additional bio flushing, which may
split the bio in cache into smaller ones.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce update_ckpt_flags to clean up

This patch adds update_ckpt_flags() to clean up the flow.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix overflow due to condition check order

In the last ilen case, i was already increased, resulting in an
out-of-bounds access to entries of do_replace and blkaddr.
Fix this by checking ilen first to exit the loop.

Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
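
The ordering issue described in the fix above can be illustrated with a
minimal user-space sketch. This is not the __exchange_data_block code:
count_leading_set, flags and len are hypothetical names chosen for the
example. Testing the bound first lets short-circuit evaluation skip the
array access once the index reaches the end.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count how many leading entries of flags[] are
 * non-zero. The bound check (i < len) comes first, so flags[i] is never
 * read with i == len, thanks to short-circuit evaluation of &&.
 */
static size_t count_leading_set(const int *flags, size_t len)
{
	size_t i = 0;

	while (i < len && flags[i])
		i++;
	return i;
}
```

With the operands swapped (flags[i] && i < len), the final iteration would
read one entry past the end of the array before the bound is tested.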

f2fs: fix wrong sum_page pointer in f2fs_gc

This patch fixes using a wrong pointer for sum_page in f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: exclude free nids building and allocation

During nid allocation, the building and allocating flows of free nids must
exclude each other. This is because building the free nid cache takes two
steps: a) load free nids from unused nat entries in NAT pages, b) update
the free nid cache by checking the nat journal. The two steps should be
atomic, otherwise a used nid can be allocated as a free one after a) and
before b).

This patch adds the missing lock which covers build_free_nids in
unlock_operation and f2fs_balance_fs_bg to avoid that.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to release discard entries during checkpoint

In f2fs_fill_super, if any IO error occurs during recovery, cached discard
entries are leaked. To avoid this, make write_checkpoint() handle the
memory release by itself. Besides, move clear_prefree_segments to
write_checkpoint for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give a chance to detach from dirty list

If there are no dirty pages in the inode, we should give it a chance to
detach from the global dirty list, otherwise another unnecessary
.writepages call is needed for detaching.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add missing f2fs_balance_fs in f2fs_zero_range

f2fs_balance_fs should be called in between node page updates, otherwise
the node page count can far exceed the watermark that triggers foreground
garbage collection, resulting in a high risk of hitting an LFS allocation
failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't miss any f2fs_balance_fs cases

In f2fs_map_blocks, let f2fs_balance_fs detect node page modification
with dn.node_changed to avoid missing some corner cases.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: be aware of extent beyond EOF in fiemap

f2fs supports fallocating blocks beyond file size without changing the
size, but ->fiemap of f2fs was restricted and couldn't detect these extents
fallocated past EOF. Relax the restriction now.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to update largest extent under lock

To avoid a race, update the largest extent cache under lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix error handling in fsync_node_pages

In fsync_node_pages, if f2fs was tagged with CP_ERROR_FLAG, make sure the
bio cache is flushed before returning.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix sparse warnings

f2fs contained a number of endianness conversion bugs.

Also, one function should have been 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clear nlink if fail to add_link

We don't need to keep an incompletely created inode in cache, so if we fail
to add a link into the directory during new inode creation, it's better to
set nlink of the inode to zero, so that we can evict the inode immediately.
Otherwise, release of the nid belonging to the inode is delayed until the
inode cache is shrunk, which may cause a seemingly endless loop while
allocating free nids when testing the generic/269 case of the fstests suite.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add update_inode_page to fix kernel panic]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: split free nid list

During free nid allocation, in order to do preallocation, we tag a free
nid entry as allocated and still leave it in the free nid list. Other
allocators who want to grab free nids then need to traverse the free nid
list for lookup. This becomes overhead when multiple threads allocate free
nids intensively.

This patch splits the free nid list into two lists: {free,alloc}_nid_list,
to keep free nids and preallocated free nids separately. After that, the
traversal latency is gone. Besides, split nid_cnt for separate statistics.

Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up free nid list operations

This patch cleans up to use consistent free nid list ops.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't interrupt free nids building during nid allocation

Let build_free_nids support sync/async methods. In the allocation flow of
nids, we use the synchronous method, so that we can avoid looping in
alloc_nid when free memory is low; in unblock_operations and
f2fs_balance_fs_bg we use the asynchronous method, where a low memory
condition can interrupt us.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid casted negative value as shrink count

This patch makes sure it returns a positive value instead of a probable
casted negative value as shrink count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
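
The kind of clamping described in the shrink count fix above can be
sketched in user space as follows. This is only an illustration under
assumed names (reclaimable_count, total, in_use), not the f2fs shrinker
code: a difference of two signed counters is clamped to zero before it is
converted to the unsigned return type.

```c
/*
 * Hypothetical helper: number of reclaimable objects reported to a
 * shrinker. total and in_use are signed counters that may be observed
 * out of sync, so the difference can be negative for a moment. Clamp it
 * to zero before converting to the unsigned return type, otherwise the
 * negative value would turn into a huge positive count.
 */
static unsigned long reclaimable_count(long total, long in_use)
{
	long diff = total - in_use;

	return diff > 0 ? (unsigned long)diff : 0;
}
```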

f2fs: count dirty inodes to flush node pages during checkpoint

If there are a lot of dirty inodes, we need to flush all of them when doing
checkpoint. So, we need to count this for enough free space.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: call f2fs_balance_fs for setattr

If an inode becomes dirty, we need to check the number of dirty inodes to
see whether a further checkpoint is required.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: declare static function for __build_free_nids

This patch avoids build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use BIO_MAX_PAGES for bio allocation

We don't need to allocate bio partially in order to maximize sequential writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps

This is for backport only.

fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: keep dirty inodes selectively for checkpoint

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/acl.c
	fs/f2fs/inode.c
	fs/f2fs/namei.c

Change-Id: I61ca2023b2aa70525d1e2c21d8e87bb4d75eadf9

f2fs: make clean inodes when flushing inode page

This patch tries to make more clean inodes when flushing dirty inodes in
checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove percpu_count due to performance regression

This patch removes percpu_count usage due to performance regression in iozone.

Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/super.c

f2fs: hide a maybe-uninitialized warning

gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:

fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

As pointed out by Chao Yu, the code is actually correct as 'prealloc'
is only set if the last_ofs_in_node has been set, the two always
get updated together.

This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.

Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
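
The pattern behind the warning fix above can be sketched in plain C. The
names below (count_breaks, ofs, have_last, last) are hypothetical and the
loop is far simpler than f2fs_map_blocks; it only shows a variable that is
read exclusively under a flag which is set together with it, plus an
explicit initialization that removes any doubt for the compiler.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical illustration: count positions where the offset sequence
 * breaks, i.e. ofs[i] != ofs[i - 1] + 1. `last` is only read when
 * `have_last` is true, and both are always updated together, so the
 * logic is correct even without the initializer. Giving `last` a value
 * up front simply makes that obvious to the compiler as well.
 */
static size_t count_breaks(const unsigned int *ofs, size_t n)
{
	bool have_last = false;
	unsigned int last = n ? ofs[0] : 0;	/* explicit initialization */
	size_t breaks = 0;

	for (size_t i = 0; i < n; i++) {
		if (have_last && ofs[i] != last + 1)
			breaks++;
		last = ofs[i];
		have_last = true;
	}
	return breaks;
}
```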

fs/crypto: catch up 4.9-rc2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: report error of f2fs_fill_dentries

Report error of f2fs_fill_dentries to ->iterate_shared, otherwise when an
error occurs, the user may list only part of the dirents in the target
directory without any hint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/dir.c
	fs/f2fs/f2fs.h
	fs/f2fs/inline.c

f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes

This patch should fix an infinite loop case below.

F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
...
[<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
[<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
[<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
[<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
[<ffffffffb41dff21>] do_writepages+0x21/0x30
[<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
[<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
[<ffffffffb42a0959>] write_inode_now+0x99/0xc0
[<ffffffffb4289a16>] iput+0x1f6/0x2c0
[<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
[<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
[<ffffffffb426c692>] mount_bdev+0x182/0x1b0
[<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffb426d038>] mount_fs+0x38/0x170
[<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
[<ffffffffb4291d9e>] do_mount+0x1be/0xd60
[<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
[<ffffffffb4292c54>] SyS_mount+0x94/0xd0
[<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Add missing break in switch-case

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Use generic zoned block device terminology

SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Check zoned block feature for host-managed zoned block devices

The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
with zone alignment optimization. This is optional for host-aware
devices, but mandatory for host-managed zoned block devices.
So check that the feature is set in this latter case.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Suppress discard warning message for zoned block devices

For zoned block devices, discard is replaced by zone reset. So
do not warn if the device does not support discard.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Always enable discard for zoned blocks devices

Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use of the "nodiscard" mount
option.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Do not allow adaptive mode for host-managed zoned block devices

The LFS mode is mandatory for host-managed zoned block devices as
update in place optimizations are not possible for segments in
sequential zones.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Cache zoned block devices zone type

With the zoned block device feature enabled, section discard
needs to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: Reset sequential zones on zoned block devices

When a zoned block device is mounted, discarding sections
contained in sequential zones must reset the zone write pointer.
For sections contained in conventional zones, the regular discard
is used if the drive supports it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/segment.c

f2fs: Trace reset zone events

Similarly to the regular discard, trace zone reset events.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: record inode updating status correctly

We should record the updating status of an inode only for a living inode.
For an unlinked inode, we need to clear its ino cache, otherwise after the
ino has been reused, it causes unneeded node page writing during ->fsync.

Change-Id: I8213bae8a591d7e6df1243e42ae8c19ac084afa1
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong i_atime recovery

Shouldn't update in-memory i_atime with on-disk i_mtime of inode when
recovering inode.

Shuoran found this bug, which had been hidden for a long time; the credit
belongs to him.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: assign segments correctly for direct_io

Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or
four logs, we do not use that type at all.
Let's fix it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove checkpoint in f2fs_freeze

The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068,
it triggers circular locking problem below due to gc_mutex for checkpoint.

======================================================
[ INFO: possible circular locking dependency detected ]
4.9.0-rc1+ #132 Tainted: G           OE
-------------------------------------------------------

1. wait for __sb_start_write() by

 [<ffffffff9845f353>] dump_stack+0x85/0xc2
 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff98289991>] iput+0x171/0x2c0
 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
 [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
 [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
 [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
 [<ffffffff982a50b5>] sys_sync+0x55/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

2. wait for sbi->gc_mutex by

 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
 [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: do not recover from previous remained wrong dnodes"

i_times of an inode are set with the current system time, which can be
configured through 'date', so it's not safe to judge a dnode block as
garbage data or an unchanged inode depending on i_times.

Now that we use the enhanced 'cp_ver + cp' crc method to verify a valid
dnode block, I expect that recovering an invalid dnode is almost not
possible.

This reverts commit 807b1e1c8e08452948495b1a9985ab46d329e5c2.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: return directly if block has been removed from the victim

If a block has already been written to a new place, just return from the
move data process. This patch checks it again while holding the page
lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: revert segment allocation for direct IO

Now we don't need to be too careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b250 (f2fs: align direct_io'ed data to section)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: allow dio read for LFS mode

We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use err for f2fs_preallocate_blocks

This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c
	fs/f2fs/f2fs.h
	fs/f2fs/file.c

f2fs: avoid BG_GC in f2fs_balance_fs

If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we totally don't need to do BG_GC at all.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong written_valid_blocks counting

Previously, written_valid_blocks was taken from ckpt->valid_block_count. But
if the last checkpoint has some NEW_ADDR due to power-cut, we can get a wrong
value. Fix it by getting the number from the actual written block count in
sit entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't wait writeback for datas during checkpoint

Normally, while committing a checkpoint, we wait for all pages to be
written back no matter whether the page is data or metadata, so in a
scenario where lots of data IO is being submitted along with metadata, we
may suffer long latency waiting for writeback during checkpoint.

Indeed, we only care about persistence of pages with metadata, not pages
with data, as file system consistency is only related to metadata. So, to
avoid long latency in the above scenario, let's recognize and reference
metadata in submitted IOs, and wait for writeback only for metadata.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c

f2fs: fix an infinite loop when flush nodes in cp

Thread A			Thread B

- write_checkpoint
 - block_operations
   -blk_start_plug
    -sync_node_pages		- f2fs_do_sync_file
				 - fsync_node_pages
				  - f2fs_wait_on_page_writeback

Thread A waits for the global F2FS_DIRTY_NODES count to decrease to zero.
It starts a plug list, and some requests have been added to this list.
Thread B locks one dirty node page and waits for this page's writeback.
But this page is already in the plug list of thread A with the
PG_writeback flag. Thread A keeps on running and its plug list has no
chance to finish, so it looks like a deadlock between the cp and fsync
paths.

This patch adds a wait on page writeback before setting the node page
dirty to avoid this problem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to account total free nid correctly

Thread A		Thread B		Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
			- f2fs_create
			 - f2fs_new_inode
			  - f2fs_lock_op
			   - alloc_nid
			    as node count still not
			    be increased, we will
			    loop in alloc_nid
						- f2fs_write_node_pages
						 - f2fs_balance_fs_bg
						  - f2fs_sync_fs
						   - write_checkpoint
						    - block_operations
						     - f2fs_lock_all
 - f2fs_lock_op

While creating a new inode, we do not allocate and account the nid
atomically, so when there are almost no free nids left, we may encounter
an endless loop like in the above stack.

In order to avoid that, reuse nm_i::available_nids for accounting free nids
and make nid allocation and counting atomic during node creation.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix fdatasync

For the two cases below, we can't guarantee data consistency:

a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered

b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered

The reason is that normally fdatasync is not aware of modifications
of metadata in a file, e.g. isize changes or dnode updates, so in ->fsync
we skip flushing node pages for the above cases, resulting in
fdatasynced data being lost during recovery.

Now that we have introduced the DIRTY_META global list in sbi for tracking
dirty inodes selectively, in fdatasync we can choose to flush nodes
depending on the dirty state of the current inode in the list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not recover i_size if it's valid

If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong AUTO_RECOVER condition

If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: drop duplicate header timer.h

Drop duplicate header timer.h from segment.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix incorrect free inode count in ->statfs

While calculating the maximum number of inodes that we can create in the
remaining space, we should consider the space occupied by data/node
blocks, since data and node blocks are mixed in the main area. So fix the
wrong calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set ->owner for debugfs status file's file_operations

The struct file_operations instance serving the f2fs/status debugfs file
lacks an initialization of its ->owner.

This means that although that file might have been opened, the f2fs module
can still get removed. Any further operation on that opened file, releasing
included, will cause accesses to unmapped memory.

Indeed, Mike Marshall reported the following:

  BUG: unable to handle kernel paging request at ffffffffa0307430
  IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
  <...>
  Call Trace:
   [] __fput+0xdf/0x1d0
   [] ____fput+0xe/0x10
   [] task_work_run+0x8e/0xc0
   [] do_exit+0x2ae/0xae0
   [] ? __audit_syscall_entry+0xae/0x100
   [] ? syscall_trace_enter+0x1ca/0x310
   [] do_group_exit+0x44/0xc0
   [] SyS_exit_group+0x14/0x20
   [] do_syscall_64+0x61/0x150
   [] entry_SYSCALL64_slow_path+0x25/0x25
  <...>
  ---[ end trace f22ae883fa3ea6b8 ]---
  Fixing recursive fault but reboot is needed!

Fix this by initializing the f2fs/status file_operations' ->owner with
THIS_MODULE.

This will allow debugfs to grab a reference to the f2fs module upon any
open on that file, thus preventing it from getting removed.

Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs")
Change-Id: I0716b4c2fc2e9a391ff8beb9d3eae3ed67a66ace
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix 32-bit build

The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
on 32-bit machines because of a 64-bit division:

fs/f2fs/f2fs.o: In function `__issue_discard_async':
extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'

Unfortunately, the sector number is usually a 64-bit number, and
we guarantee that bdev_zone_size() returns a power-of-two number.

Fixes: 792b84b74b54 ("f2fs: support multiple devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
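
One common way to avoid a 64-bit division when the divisor is known to be
a power of two is a bit mask. The sketch below shows that idea in user
space under stated assumptions: zone_aligned, sector and zone_sectors are
hypothetical names, and this is not the code changed by the fix above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is `sector` aligned to a zone boundary?
 * Assumes zone_sectors is a non-zero power of two, so that
 * sector % zone_sectors == sector & (zone_sectors - 1).
 * The mask avoids the 64-bit modulo operation entirely.
 */
static bool zone_aligned(uint64_t sector, uint32_t zone_sectors)
{
	return (sector & ((uint64_t)zone_sectors - 1)) == 0;
}
```

The equivalence only holds for power-of-two zone sizes, which is the
guarantee the commit message relies on for bdev_zone_size().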

f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack

We don't guarantee cp_addr is fixed by cp_version.
This is to sync with f2fs-tools.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not activate auto_recovery for fallocated i_size

If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage

We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: call sync_fs when f2fs is idle

The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: detect wrong layout

Previous versions of mkfs.f2fs inappropriately allowed small partitions, so
f2fs should detect that as well.

Refer to this commit in f2fs-tools:

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: free meta pages if sanity check for ckpt is failed

This fixes the missing release of meta pages in the error case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to access nullified flush_cmd_control pointer

f2fs_sync_file()             remount_ro
 - f2fs_readonly
                               - destroy_flush_cmd_control
 - f2fs_issue_flush
   - no fcc pointer!

So, this patch doesn't free fcc in this case, but just stops its kernel thread
which sends flush commands.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix a missing size change in f2fs_setattr

This patch fixes a missing size change in f2fs_setattr.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use file pointer for fscrypt_notsupp_process_policy

Change-Id: I69b5cc5213ea1a91a856077fcf3a6e878026251a
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: D. Andrei Măceș <dmaces@nd.edu>
Jaegeuk Kim authored and airend committed Jan 12, 2018
1 parent c25e394 commit d0b62d5
Showing 29 changed files with 1,691 additions and 1,042 deletions.
1 change: 1 addition & 0 deletions Documentation/filesystems/f2fs.txt
@@ -131,6 +131,7 @@ inline_dentry Enable the inline dir feature: data in new created
directory entries can be written into inode block. The
space of inode block which is used to store inline
dentries is limited to ~3.4k.
noinline_dentry Diable the inline dentry feature.
flush_merge Merge concurrent cache_flush commands as much as possible
to eliminate redundant command issues. If the underlying
device handles the cache_flush command relatively slowly,
18 changes: 9 additions & 9 deletions fs/crypto/crypto.c
@@ -129,11 +129,11 @@ struct fscrypt_ctx *fscrypt_get_ctx(struct inode *inode, gfp_t gfp_flags)
EXPORT_SYMBOL(fscrypt_get_ctx);

/**
* fscrypt_complete() - The completion callback for page encryption
* @req: The asynchronous encryption request context
* @res: The result of the encryption operation
* page_crypt_complete() - completion callback for page crypto
* @req: The asynchronous cipher request context
* @res: The result of the cipher operation
*/
static void fscrypt_complete(struct crypto_async_request *req, int res)
static void page_crypt_complete(struct crypto_async_request *req, int res)
{
struct fscrypt_completion_result *ecr = req->data;

@@ -171,18 +171,18 @@ static int do_page_crypto(struct inode *inode,

ablkcipher_request_set_callback(
req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
fscrypt_complete, &ecr);
page_crypt_complete, &ecr);

BUILD_BUG_ON(FS_XTS_TWEAK_SIZE < sizeof(index));
memcpy(xts_tweak, &index, sizeof(index));
memset(&xts_tweak[sizeof(index)], 0,
FS_XTS_TWEAK_SIZE - sizeof(index));

sg_init_table(&dst, 1);
sg_set_page(&dst, dest_page, PAGE_CACHE_SIZE, 0);
sg_set_page(&dst, dest_page, PAGE_SIZE, 0);
sg_init_table(&src, 1);
sg_set_page(&src, src_page, PAGE_CACHE_SIZE, 0);
ablkcipher_request_set_crypt(req, &src, &dst, PAGE_CACHE_SIZE,
sg_set_page(&src, src_page, PAGE_SIZE, 0);
ablkcipher_request_set_crypt(req, &src, &dst, PAGE_SIZE,
xts_tweak);
if (rw == FS_DECRYPT)
res = crypto_ablkcipher_decrypt(req);
@@ -292,7 +292,7 @@ int fscrypt_zeroout_range(struct inode *inode, pgoff_t lblk,
struct bio *bio;
int ret, err = 0;

BUG_ON(inode->i_sb->s_blocksize != PAGE_CACHE_SIZE);
BUG_ON(inode->i_sb->s_blocksize != PAGE_SIZE);

ctx = fscrypt_get_ctx(inode, GFP_NOFS);
if (IS_ERR(ctx))
83 changes: 43 additions & 40 deletions fs/crypto/fname.c
@@ -19,15 +19,12 @@
#include <linux/ratelimit.h>
#include <linux/fscrypto.h>

static u32 size_round_up(size_t size, size_t blksize)
{
return ((size + blksize - 1) / blksize) * blksize;
}

/**
* dir_crypt_complete() -
* fname_crypt_complete() - completion callback for filename crypto
* @req: The asynchronous cipher request context
* @res: The result of the cipher operation
*/
static void dir_crypt_complete(struct crypto_async_request *req, int res)
static void fname_crypt_complete(struct crypto_async_request *req, int res)
{
struct fscrypt_completion_result *ecr = req->data;

@@ -38,11 +35,11 @@ static void dir_crypt_complete(struct crypto_async_request *req, int res)
}

/**
* fname_encrypt() -
* fname_encrypt() - encrypt a filename
*
* This function encrypts the input filename, and returns the length of the
* ciphertext. Errors are returned as negative numbers. We trust the caller to
* allocate sufficient memory to oname string.
* The caller must have allocated sufficient memory for the @oname string.
*
* Return: 0 on success, -errno on failure
*/
static int fname_encrypt(struct inode *inode,
const struct qstr *iname, struct fscrypt_str *oname)
@@ -63,10 +60,9 @@ static int fname_encrypt(struct inode *inode,
if (iname->len <= 0 || iname->len > lim)
return -EIO;

ciphertext_len = (iname->len < FS_CRYPTO_BLOCK_SIZE) ?
FS_CRYPTO_BLOCK_SIZE : iname->len;
ciphertext_len = size_round_up(ciphertext_len, padding);
ciphertext_len = (ciphertext_len > lim) ? lim : ciphertext_len;
ciphertext_len = max(iname->len, (u32)FS_CRYPTO_BLOCK_SIZE);
ciphertext_len = round_up(ciphertext_len, padding);
ciphertext_len = min(ciphertext_len, lim);

if (ciphertext_len <= sizeof(buf)) {
workbuf = buf;
@@ -87,7 +83,7 @@ static int fname_encrypt(struct inode *inode,
}
ablkcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
dir_crypt_complete, &ecr);
fname_crypt_complete, &ecr);

/* Copy the input */
memcpy(workbuf, iname->name, iname->len);
@@ -108,20 +104,22 @@ static int fname_encrypt(struct inode *inode,
}
kfree(alloc_buf);
ablkcipher_request_free(req);
if (res < 0)
if (res < 0) {
printk_ratelimited(KERN_ERR
"%s: Error (error code %d)\n", __func__, res);
return res;
}

oname->len = ciphertext_len;
return res;
return 0;
}

/*
* fname_decrypt()
* This function decrypts the input filename, and returns
* the length of the plaintext.
* Errors are returned as negative numbers.
* We trust the caller to allocate sufficient memory to oname string.
/**
* fname_decrypt() - decrypt a filename
*
* The caller must have allocated sufficient memory for the @oname string.
*
* Return: 0 on success, -errno on failure
*/
static int fname_decrypt(struct inode *inode,
const struct fscrypt_str *iname,
@@ -149,7 +147,7 @@ static int fname_decrypt(struct inode *inode,
}
ablkcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
dir_crypt_complete, &ecr);
fname_crypt_complete, &ecr);

/* Initialize IV */
memset(iv, 0, FS_CRYPTO_BLOCK_SIZE);
@@ -171,7 +169,7 @@ static int fname_decrypt(struct inode *inode,
}

oname->len = strnlen(oname->name, iname->len);
return oname->len;
return 0;
}

static const char *lookup_table =
@@ -234,9 +232,8 @@ u32 fscrypt_fname_encrypted_size(struct inode *inode, u32 ilen)

if (ci)
padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK);
if (ilen < FS_CRYPTO_BLOCK_SIZE)
ilen = FS_CRYPTO_BLOCK_SIZE;
return size_round_up(ilen, padding);
ilen = max(ilen, (u32)FS_CRYPTO_BLOCK_SIZE);
return round_up(ilen, padding);
}
EXPORT_SYMBOL(fscrypt_fname_encrypted_size);

@@ -282,6 +279,10 @@ EXPORT_SYMBOL(fscrypt_fname_free_buffer);
/**
* fscrypt_fname_disk_to_usr() - converts a filename from disk space to user
* space
*
* The caller must have allocated sufficient memory for the @oname string.
*
* Return: 0 on success, -errno on failure
*/
int fscrypt_fname_disk_to_usr(struct inode *inode,
u32 hash, u32 minor_hash,
@@ -290,13 +291,12 @@ int fscrypt_fname_disk_to_usr(struct inode *inode,
{
const struct qstr qname = FSTR_TO_QSTR(iname);
char buf[24];
int ret;

if (fscrypt_is_dot_dotdot(&qname)) {
oname->name[0] = '.';
oname->name[iname->len - 1] = '.';
oname->len = iname->len;
return oname->len;
return 0;
}

if (iname->len < FS_CRYPTO_BLOCK_SIZE)
@@ -306,9 +306,9 @@ int fscrypt_fname_disk_to_usr(struct inode *inode,
return fname_decrypt(inode, iname, oname);

if (iname->len <= FS_FNAME_CRYPTO_DIGEST_SIZE) {
ret = digest_encode(iname->name, iname->len, oname->name);
oname->len = ret;
return ret;
oname->len = digest_encode(iname->name, iname->len,
oname->name);
return 0;
}
if (hash) {
memcpy(buf, &hash, 4);
@@ -318,15 +318,18 @@ int fscrypt_fname_disk_to_usr(struct inode *inode,
}
memcpy(buf + 8, iname->name + iname->len - 16, 16);
oname->name[0] = '_';
ret = digest_encode(buf, 24, oname->name + 1);
oname->len = ret + 1;
return ret + 1;
oname->len = 1 + digest_encode(buf, 24, oname->name + 1);
return 0;
}
EXPORT_SYMBOL(fscrypt_fname_disk_to_usr);

/**
* fscrypt_fname_usr_to_disk() - converts a filename from user space to disk
* space
*
* The caller must have allocated sufficient memory for the @oname string.
*
* Return: 0 on success, -errno on failure
*/
int fscrypt_fname_usr_to_disk(struct inode *inode,
const struct qstr *iname,
@@ -336,7 +339,7 @@ int fscrypt_fname_usr_to_disk(struct inode *inode,
oname->name[0] = '.';
oname->name[iname->len - 1] = '.';
oname->len = iname->len;
return oname->len;
return 0;
}
if (inode->i_crypt_info)
return fname_encrypt(inode, iname, oname);
@@ -370,10 +373,10 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
if (dir->i_crypt_info) {
ret = fscrypt_fname_alloc_buffer(dir, iname->len,
&fname->crypto_buf);
if (ret < 0)
if (ret)
return ret;
ret = fname_encrypt(dir, iname, &fname->crypto_buf);
if (ret < 0)
if (ret)
goto errout;
fname->disk_name.name = fname->crypto_buf.name;
fname->disk_name.len = fname->crypto_buf.len;
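The length computation that fname_encrypt() now performs with max(), round_up() and min() can be modelled outside the kernel. The following is a minimal user-space sketch, not the kernel code: `round_up_pow2` and `fname_ciphertext_len` are hypothetical names, and the value 16 for FS_CRYPTO_BLOCK_SIZE is an assumption (one AES block).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: one AES block. */
#define FS_CRYPTO_BLOCK_SIZE 16u

/*
 * Round x up to a multiple of align. Like the kernel's round_up(),
 * this only works when align is a power of two.
 */
uint32_t round_up_pow2(uint32_t x, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper mirroring the three steps in the hunk above:
 * clamp up to one cipher block, pad, then clamp down to the limit.
 */
uint32_t fname_ciphertext_len(uint32_t name_len, uint32_t padding,
			      uint32_t lim)
{
	uint32_t len = name_len < FS_CRYPTO_BLOCK_SIZE ?
			FS_CRYPTO_BLOCK_SIZE : name_len;

	len = round_up_pow2(len, padding);
	return len > lim ? lim : len;
}
```

The removed size_round_up() helper used division and so accepted any block size. The mask-based form relies on the padding being a power of two, which holds here because the padding is derived as `4 << (flags & mask)`.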
77 changes: 48 additions & 29 deletions fs/crypto/keyinfo.c
@@ -75,10 +75,8 @@ static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE],
res = ecr.res;
}
out:
if (req)
ablkcipher_request_free(req);
if (tfm)
crypto_free_ablkcipher(tfm);
ablkcipher_request_free(req);
crypto_free_ablkcipher(tfm);
return res;
}

@@ -143,13 +141,44 @@ static int validate_user_key(struct fscrypt_info *crypt_info,
return res;
}

static int determine_cipher_type(struct fscrypt_info *ci, struct inode *inode,
const char **cipher_str_ret, int *keysize_ret)
{
if (S_ISREG(inode->i_mode)) {
if (ci->ci_data_mode == FS_ENCRYPTION_MODE_AES_256_XTS) {
*cipher_str_ret = "xts(aes)";
*keysize_ret = FS_AES_256_XTS_KEY_SIZE;
return 0;
}
pr_warn_once("fscrypto: unsupported contents encryption mode "
"%d for inode %lu\n",
ci->ci_data_mode, inode->i_ino);
return -ENOKEY;
}

if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) {
if (ci->ci_filename_mode == FS_ENCRYPTION_MODE_AES_256_CTS) {
*cipher_str_ret = "cts(cbc(aes))";
*keysize_ret = FS_AES_256_CTS_KEY_SIZE;
return 0;
}
pr_warn_once("fscrypto: unsupported filenames encryption mode "
"%d for inode %lu\n",
ci->ci_filename_mode, inode->i_ino);
return -ENOKEY;
}

pr_warn_once("fscrypto: unsupported file type %d for inode %lu\n",
(inode->i_mode & S_IFMT), inode->i_ino);
return -ENOKEY;
}

static void put_crypt_info(struct fscrypt_info *ci)
{
if (!ci)
return;

if (ci->ci_keyring_key)
key_put(ci->ci_keyring_key);
key_put(ci->ci_keyring_key);
crypto_free_ablkcipher(ci->ci_ctfm);
kmem_cache_free(fscrypt_info_cachep, ci);
}
@@ -160,8 +189,8 @@ int get_crypt_info(struct inode *inode)
struct fscrypt_context ctx;
struct crypto_ablkcipher *ctfm;
const char *cipher_str;
int keysize;
u8 raw_key[FS_MAX_KEY_SIZE];
u8 mode;
int res;

res = fscrypt_initialize();
@@ -184,13 +213,19 @@ int get_crypt_info(struct inode *inode)
if (res < 0) {
if (!fscrypt_dummy_context_enabled(inode))
return res;
ctx.format = FS_ENCRYPTION_CONTEXT_FORMAT_V1;
ctx.contents_encryption_mode = FS_ENCRYPTION_MODE_AES_256_XTS;
ctx.filenames_encryption_mode = FS_ENCRYPTION_MODE_AES_256_CTS;
ctx.flags = 0;
} else if (res != sizeof(ctx)) {
return -EINVAL;
}
res = 0;

if (ctx.format != FS_ENCRYPTION_CONTEXT_FORMAT_V1)
return -EINVAL;

if (ctx.flags & ~FS_POLICY_FLAGS_VALID)
return -EINVAL;

crypt_info = kmem_cache_alloc(fscrypt_info_cachep, GFP_NOFS);
if (!crypt_info)
@@ -203,27 +238,11 @@ int get_crypt_info(struct inode *inode)
crypt_info->ci_keyring_key = NULL;
memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor,
sizeof(crypt_info->ci_master_key));
if (S_ISREG(inode->i_mode))
mode = crypt_info->ci_data_mode;
else if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
mode = crypt_info->ci_filename_mode;
else
BUG();

switch (mode) {
case FS_ENCRYPTION_MODE_AES_256_XTS:
cipher_str = "xts(aes)";
break;
case FS_ENCRYPTION_MODE_AES_256_CTS:
cipher_str = "cts(cbc(aes))";
break;
default:
printk_once(KERN_WARNING
"%s: unsupported key mode %d (ino %u)\n",
__func__, mode, (unsigned) inode->i_ino);
res = -ENOKEY;

res = determine_cipher_type(crypt_info, inode, &cipher_str, &keysize);
if (res)
goto out;
}

if (fscrypt_dummy_context_enabled(inode)) {
memset(raw_key, 0x42, FS_AES_256_XTS_KEY_SIZE);
goto got_key;
@@ -259,7 +278,7 @@ int get_crypt_info(struct inode *inode)
crypto_ablkcipher_clear_flags(ctfm, ~0);
crypto_tfm_set_flags(crypto_ablkcipher_tfm(ctfm),
CRYPTO_TFM_REQ_WEAK_KEY);
res = crypto_ablkcipher_setkey(ctfm, raw_key, fscrypt_key_size(mode));
res = crypto_ablkcipher_setkey(ctfm, raw_key, keysize);
if (res)
goto out;

41 changes: 29 additions & 12 deletions fs/crypto/policy.c
@@ -11,6 +11,7 @@
#include <linux/random.h>
#include <linux/string.h>
#include <linux/fscrypto.h>
#include <linux/mount.h>

static int inode_has_encryption_context(struct inode *inode)
{
@@ -92,26 +93,42 @@ static int create_encryption_context_from_policy(struct inode *inode,
return inode->i_sb->s_cop->set_context(inode, &ctx, sizeof(ctx), NULL);
}

int fscrypt_process_policy(struct inode *inode,
int fscrypt_process_policy(struct file *filp,
const struct fscrypt_policy *policy)
{
struct inode *inode = file_inode(filp);
int ret;

if (!inode_owner_or_capable(inode))
return -EACCES;

if (policy->version != 0)
return -EINVAL;

ret = mnt_want_write_file(filp);
if (ret)
return ret;

if (!inode_has_encryption_context(inode)) {
if (!inode->i_sb->s_cop->empty_dir)
return -EOPNOTSUPP;
if (!inode->i_sb->s_cop->empty_dir(inode))
return -ENOTEMPTY;
return create_encryption_context_from_policy(inode, policy);
if (!S_ISDIR(inode->i_mode))
ret = -EINVAL;
else if (!inode->i_sb->s_cop->empty_dir)
ret = -EOPNOTSUPP;
else if (!inode->i_sb->s_cop->empty_dir(inode))
ret = -ENOTEMPTY;
else
ret = create_encryption_context_from_policy(inode,
policy);
} else if (!is_encryption_context_consistent_with_policy(inode,
policy)) {
printk(KERN_WARNING
"%s: Policy inconsistent with encryption context\n",
__func__);
ret = -EOPNOTSUPP;
}

if (is_encryption_context_consistent_with_policy(inode, policy))
return 0;

printk(KERN_WARNING "%s: Policy inconsistent with encryption context\n",
__func__);
return -EINVAL;
mnt_drop_write_file(filp);
return ret;
}
EXPORT_SYMBOL(fscrypt_process_policy);
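The reworked fscrypt_process_policy() takes write access with mnt_want_write_file() and then routes every later outcome through a single mnt_drop_write_file() call. The shape of that pattern can be sketched in user space as below. Everything here is a hypothetical stand-in (`struct fake_mount`, `want_write`, `drop_write`, `process_policy_model`), not a kernel API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a mount with a writer count. */
struct fake_mount {
	int writers;
	int read_only;
};

/* Models taking write access: fails on a read-only mount. */
int want_write(struct fake_mount *m)
{
	if (m->read_only)
		return -EROFS;
	m->writers++;
	return 0;
}

/* Models releasing write access; must pair with want_write(). */
void drop_write(struct fake_mount *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/*
 * Once write access is held, no branch returns directly: each one
 * only sets ret, so the release below always runs.
 */
int process_policy_model(struct fake_mount *m, int is_dir, int is_empty)
{
	int ret = want_write(m);

	if (ret)
		return ret;	/* nothing acquired, nothing to release */

	if (!is_dir)
		ret = -EINVAL;
	else if (!is_empty)
		ret = -ENOTEMPTY;
	else
		ret = 0;	/* the real code would set the policy here */

	drop_write(m);
	return ret;
}
```

The early `return` statements of the old version would have leaked the writer count if they had been kept after the acquire, which is presumably why the branches were converted to assignments.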

16 changes: 9 additions & 7 deletions fs/f2fs/acl.c
@@ -101,14 +101,16 @@ static struct posix_acl *f2fs_acl_from_disk(const char *value, size_t size)
return ERR_PTR(-EINVAL);
}

static void *f2fs_acl_to_disk(const struct posix_acl *acl, size_t *size)
static void *f2fs_acl_to_disk(struct f2fs_sb_info *sbi,
const struct posix_acl *acl, size_t *size)
{
struct f2fs_acl_header *f2fs_acl;
struct f2fs_acl_entry *entry;
int i;

f2fs_acl = f2fs_kmalloc(sizeof(struct f2fs_acl_header) + acl->a_count *
sizeof(struct f2fs_acl_entry), GFP_NOFS);
f2fs_acl = f2fs_kmalloc(sbi, sizeof(struct f2fs_acl_header) +
acl->a_count * sizeof(struct f2fs_acl_entry),
GFP_NOFS);
if (!f2fs_acl)
return ERR_PTR(-ENOMEM);

@@ -167,7 +169,7 @@ static struct posix_acl *__f2fs_get_acl(struct inode *inode, int type,

retval = f2fs_getxattr(inode, name_index, "", NULL, 0, dpage);
if (retval > 0) {
value = f2fs_kmalloc(retval, GFP_F2FS_ZERO);
value = f2fs_kmalloc(F2FS_I_SB(inode), retval, GFP_F2FS_ZERO);
if (!value)
return ERR_PTR(-ENOMEM);
retval = f2fs_getxattr(inode, name_index, "", value,
@@ -230,10 +232,10 @@ static int f2fs_set_acl(struct inode *inode, int type,
return -EINVAL;
}

f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);

if (acl) {
value = f2fs_acl_to_disk(acl, &size);
value = f2fs_acl_to_disk(F2FS_I_SB(inode), acl, &size);
if (IS_ERR(value)) {
clear_inode_flag(inode, FI_ACL_MODE);
return (int)PTR_ERR(value);
@@ -281,7 +283,7 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage,
if (error > 0)
error = f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl, ipage);

f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);
cleanup:
posix_acl_release(acl);
return error;
1 change: 0 additions & 1 deletion fs/f2fs/acl.h
@@ -41,7 +41,6 @@ extern int f2fs_acl_chmod(struct inode *);
extern int f2fs_init_acl(struct inode *, struct inode *, struct page *,
struct page *);
#else
#define f2fs_check_acl NULL
#define f2fs_get_acl NULL
#define f2fs_set_acl NULL

230 changes: 150 additions & 80 deletions fs/f2fs/checkpoint.c

Large diffs are not rendered by default.

200 changes: 116 additions & 84 deletions fs/f2fs/data.c

Large diffs are not rendered by default.

45 changes: 31 additions & 14 deletions fs/f2fs/debug.c
@@ -17,6 +17,7 @@
#include <linux/blkdev.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include <linux/export.h>

#include "f2fs.h"
#include "node.h"
@@ -45,15 +46,18 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA);
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
si->wb_bios = atomic_read(&sbi->nr_wb_bios);
si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA);
si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
si->rsvd_segs = reserved_segments(sbi);
si->overp_segs = overprovision_segments(sbi);
si->valid_count = valid_user_blocks(sbi);
si->discard_blks = discard_blocks(sbi);
si->valid_node_count = valid_node_count(sbi);
si->valid_inode_count = valid_inode_count(sbi);
si->inline_xattr = atomic_read(&sbi->inline_xattr);
@@ -72,7 +76,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->dirty_nats = NM_I(sbi)->dirty_nat_cnt;
si->sits = MAIN_SEGS(sbi);
si->dirty_sits = SIT_I(sbi)->dirty_sentries;
si->fnids = NM_I(sbi)->fcnt;
si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID_LIST];
si->alloc_nids = NM_I(sbi)->nid_cnt[ALLOC_NID_LIST];
si->bg_gc = sbi->bg_gc;
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
@@ -154,7 +159,9 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->base_mem += sizeof(struct sit_info);
si->base_mem += MAIN_SEGS(sbi) * sizeof(struct seg_entry);
si->base_mem += f2fs_bitmap_size(MAIN_SEGS(sbi));
si->base_mem += 3 * SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi);
si->base_mem += 2 * SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi);
if (f2fs_discard_en(sbi))
si->base_mem += SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi);
si->base_mem += SIT_VBLOCK_MAP_SIZE;
if (sbi->segs_per_sec > 1)
si->base_mem += MAIN_SECS(sbi) * sizeof(struct sec_entry);
@@ -190,7 +197,9 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem += sizeof(struct flush_cmd_control);

/* free nids */
si->cache_mem += NM_I(sbi)->fcnt * sizeof(struct free_nid);
si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID_LIST] +
NM_I(sbi)->nid_cnt[ALLOC_NID_LIST]) *
sizeof(struct free_nid);
si->cache_mem += NM_I(sbi)->nat_cnt * sizeof(struct nat_entry);
si->cache_mem += NM_I(sbi)->dirty_nat_cnt *
sizeof(struct nat_entry_set);
@@ -230,8 +239,13 @@ static int stat_show(struct seq_file *s, void *v)
si->ssa_area_segs, si->main_area_segs);
seq_printf(s, "(OverProv:%d Resv:%d)]\n\n",
si->overp_segs, si->rsvd_segs);
seq_printf(s, "Utilization: %d%% (%d valid blocks)\n",
si->utilization, si->valid_count);
if (test_opt(si->sbi, DISCARD))
seq_printf(s, "Utilization: %u%% (%u valid blocks, %u discard blocks)\n",
si->utilization, si->valid_count, si->discard_blks);
else
seq_printf(s, "Utilization: %u%% (%u valid blocks)\n",
si->utilization, si->valid_count);

seq_printf(s, " - Node: %u (Inode: %u, ",
si->valid_node_count, si->valid_inode_count);
seq_printf(s, "Other: %u)\n - Data: %u\n",
@@ -303,20 +317,22 @@ static int stat_show(struct seq_file *s, void *v)
seq_printf(s, " - Inner Struct Count: tree: %d(%d), node: %d\n",
si->ext_tree, si->zombie_tree, si->ext_node);
seq_puts(s, "\nBalancing F2FS Async:\n");
seq_printf(s, " - inmem: %4lld, wb_bios: %4d\n",
si->inmem_pages, si->wb_bios);
seq_printf(s, " - nodes: %4lld in %4d\n",
seq_printf(s, " - inmem: %4d, wb_cp_data: %4d, wb_data: %4d\n",
si->inmem_pages, si->nr_wb_cp_data, si->nr_wb_data);
seq_printf(s, " - nodes: %4d in %4d\n",
si->ndirty_node, si->node_pages);
seq_printf(s, " - dents: %4lld in dirs:%4d (%4d)\n",
seq_printf(s, " - dents: %4d in dirs:%4d (%4d)\n",
si->ndirty_dent, si->ndirty_dirs, si->ndirty_all);
seq_printf(s, " - datas: %4lld in files:%4d\n",
seq_printf(s, " - datas: %4d in files:%4d\n",
si->ndirty_data, si->ndirty_files);
seq_printf(s, " - meta: %4lld in %4d\n",
seq_printf(s, " - meta: %4d in %4d\n",
si->ndirty_meta, si->meta_pages);
seq_printf(s, " - imeta: %4d\n",
si->ndirty_imeta);
seq_printf(s, " - NATs: %9d/%9d\n - SITs: %9d/%9d\n",
si->dirty_nats, si->nats, si->dirty_sits, si->sits);
seq_printf(s, " - free_nids: %9d\n",
si->fnids);
seq_printf(s, " - free_nids: %9d, alloc_nids: %9d\n",
si->free_nids, si->alloc_nids);
seq_puts(s, "\nDistribution of User Blocks:");
seq_puts(s, " [ valid | invalid | free ]\n");
seq_puts(s, " [");
@@ -364,6 +380,7 @@ static int stat_open(struct inode *inode, struct file *file)
}

static const struct file_operations stat_fops = {
.owner = THIS_MODULE,
.open = stat_open,
.read = seq_read,
.llseek = seq_lseek,
155 changes: 89 additions & 66 deletions fs/f2fs/dir.c
@@ -37,7 +37,7 @@ static unsigned int bucket_blocks(unsigned int level)
return 4;
}

unsigned char f2fs_filetype_table[F2FS_FT_MAX] = {
static unsigned char f2fs_filetype_table[F2FS_FT_MAX] = {
[F2FS_FT_UNKNOWN] = DT_UNKNOWN,
[F2FS_FT_REG_FILE] = DT_REG,
[F2FS_FT_DIR] = DT_DIR,
@@ -136,7 +136,7 @@ struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname,

/* show encrypted name */
if (fname->hash) {
if (de->hash_code == fname->hash)
if (de->hash_code == cpu_to_le32(fname->hash))
goto found;
} else if (de_name.len == name->len &&
de->hash_code == namehash &&
@@ -172,7 +172,10 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
int max_slots;
f2fs_hash_t namehash;

namehash = f2fs_dentry_hash(&name);
if(fname->hash)
namehash = cpu_to_le32(fname->hash);
else
namehash = f2fs_dentry_hash(&name);

nbucket = dir_buckets(level, F2FS_I(dir)->i_dir_level);
nblock = bucket_blocks(level);
@@ -212,31 +215,17 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
return de;
}

/*
* Find an entry in the specified directory with the wanted name.
* It returns the page where the entry was found (as a parameter - res_page),
* and the entry itself. Page is returned mapped and unlocked.
* Entry is guaranteed to be valid.
*/
struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
struct qstr *child, struct page **res_page)
struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir,
struct fscrypt_name *fname, struct page **res_page)
{
unsigned long npages = dir_blocks(dir);
struct f2fs_dir_entry *de = NULL;
unsigned int max_depth;
unsigned int level;
struct fscrypt_name fname;
int err;

err = fscrypt_setup_filename(dir, child, 1, &fname);
if (err) {
*res_page = ERR_PTR(err);
return NULL;
}

if (f2fs_has_inline_dentry(dir)) {
*res_page = NULL;
de = find_in_inline_dir(dir, &fname, res_page);
de = find_in_inline_dir(dir, fname, res_page);
goto out;
}

@@ -256,11 +245,35 @@ struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,

for (level = 0; level < max_depth; level++) {
*res_page = NULL;
de = find_in_level(dir, level, &fname, res_page);
de = find_in_level(dir, level, fname, res_page);
if (de || IS_ERR(*res_page))
break;
}
out:
return de;
}

/*
* Find an entry in the specified directory with the wanted name.
* It returns the page where the entry was found (as a parameter - res_page),
* and the entry itself. Page is returned mapped and unlocked.
* Entry is guaranteed to be valid.
*/
struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
const struct qstr *child, struct page **res_page)
{
struct f2fs_dir_entry *de = NULL;
struct fscrypt_name fname;
int err;

err = fscrypt_setup_filename(dir, child, 1, &fname);
if (err) {
*res_page = ERR_PTR(err);
return NULL;
}

de = __f2fs_find_entry(dir, &fname, res_page);

fscrypt_free_filename(&fname);
return de;
}
@@ -299,8 +312,8 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
f2fs_dentry_kunmap(dir, page);
set_page_dirty(page);

dir->i_mtime = dir->i_ctime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(dir);
dir->i_mtime = dir->i_ctime = current_time(dir);
f2fs_mark_inode_dirty_sync(dir, false);
f2fs_put_page(page, 1);
}

@@ -375,7 +388,8 @@ static int make_empty_dir(struct inode *inode,
}

struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
const struct qstr *name, struct page *dpage)
const struct qstr *new_name, const struct qstr *orig_name,
struct page *dpage)
{
struct page *page;
int err;
@@ -400,7 +414,7 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
if (err)
goto put_error;

err = f2fs_init_security(inode, dir, name, page);
err = f2fs_init_security(inode, dir, orig_name, page);
if (err)
goto put_error;

@@ -417,8 +431,8 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
set_cold_node(inode, page);
}

if (name)
init_dent_inode(name, page);
if (new_name)
init_dent_inode(new_name, page);

/*
* This file should be checkpointed during fsync.
@@ -451,8 +465,8 @@ void update_parent_metadata(struct inode *dir, struct inode *inode,
f2fs_i_links_write(dir, true);
clear_inode_flag(inode, FI_NEW_INODE);
}
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(dir);
dir->i_mtime = dir->i_ctime = current_time(dir);
f2fs_mark_inode_dirty_sync(dir, false);

if (F2FS_I(dir)->i_current_depth != current_depth)
f2fs_i_depth_write(dir, current_depth);
@@ -496,14 +510,15 @@ void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *d,
de->ino = cpu_to_le32(ino);
set_de_type(de, mode);
for (i = 0; i < slots; i++) {
test_and_set_bit_le(bit_pos + i, (void *)d->bitmap);
__set_bit_le(bit_pos + i, (void *)d->bitmap);
/* avoid wrong garbage data for readdir */
if (i)
(de + i)->name_len = 0;
}
}

int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
const struct qstr *orig_name,
struct inode *inode, nid_t ino, umode_t mode)
{
unsigned int bit_pos;
@@ -530,7 +545,7 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,

start:
#ifdef CONFIG_F2FS_FAULT_INJECTION
if (time_to_inject(FAULT_DIR_DEPTH))
if (time_to_inject(F2FS_I_SB(dir), FAULT_DIR_DEPTH))
return -ENOSPC;
#endif
if (unlikely(current_depth == MAX_DIR_HASH_DEPTH))
@@ -569,7 +584,8 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,

if (inode) {
down_write(&F2FS_I(inode)->i_sem);
page = init_inode_metadata(inode, dir, new_name, NULL);
page = init_inode_metadata(inode, dir, new_name,
orig_name, NULL);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
@@ -599,6 +615,26 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
return err;
}

int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname,
struct inode *inode, nid_t ino, umode_t mode)
{
struct qstr new_name;
int err = -EAGAIN;

new_name.name = fname_name(fname);
new_name.len = fname_len(fname);

if (f2fs_has_inline_dentry(dir))
err = f2fs_add_inline_entry(dir, &new_name, fname->usr_fname,
inode, ino, mode);
if (err == -EAGAIN)
err = f2fs_add_regular_entry(dir, &new_name, fname->usr_fname,
inode, ino, mode);

f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
return err;
}

/*
* Caller should grab and release a rwsem by calling f2fs_lock_op() and
* f2fs_unlock_op().
@@ -607,24 +643,15 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name,
struct inode *inode, nid_t ino, umode_t mode)
{
struct fscrypt_name fname;
struct qstr new_name;
int err;

err = fscrypt_setup_filename(dir, name, 0, &fname);
if (err)
return err;

new_name.name = fname_name(&fname);
new_name.len = fname_len(&fname);

err = -EAGAIN;
if (f2fs_has_inline_dentry(dir))
err = f2fs_add_inline_entry(dir, &new_name, inode, ino, mode);
if (err == -EAGAIN)
err = f2fs_add_regular_entry(dir, &new_name, inode, ino, mode);
err = __f2fs_do_add_link(dir, &fname, inode, ino, mode);

fscrypt_free_filename(&fname);
f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
return err;
}

@@ -634,7 +661,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
int err = 0;

down_write(&F2FS_I(inode)->i_sem);
page = init_inode_metadata(inode, dir, NULL, NULL);
page = init_inode_metadata(inode, dir, NULL, NULL, NULL);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
@@ -656,7 +683,7 @@ void f2fs_drop_nlink(struct inode *dir, struct inode *inode)

if (S_ISDIR(inode->i_mode))
f2fs_i_links_write(dir, false);
inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_time(inode);

f2fs_i_links_write(inode, false);
if (S_ISDIR(inode->i_mode)) {
@@ -703,8 +730,8 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
kunmap(page); /* kunmap - pair of f2fs_find_entry */
set_page_dirty(page);

dir->i_ctime = dir->i_mtime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(dir);
dir->i_ctime = dir->i_mtime = current_time(dir);
f2fs_mark_inode_dirty_sync(dir, false);

if (inode)
f2fs_drop_nlink(dir, inode);
@@ -715,6 +742,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
ClearPagePrivate(page);
ClearPageUptodate(page);
inode_dec_dirty_pages(dir);
remove_dirty_inode(dir);
}
f2fs_put_page(page, 1);
}
@@ -757,7 +785,7 @@ bool f2fs_empty_dir(struct inode *dir)
return true;
}

bool f2fs_fill_dentries(struct file *file, void *dirent, filldir_t filldir,
int f2fs_fill_dentries(struct file *file, void *dirent, filldir_t filldir,
struct f2fs_dentry_ptr *d, unsigned int n, unsigned int bit_pos,
struct fscrypt_str *fstr)
{
@@ -787,20 +815,13 @@ bool f2fs_fill_dentries(struct file *file, void *dirent, filldir_t filldir,

if (f2fs_encrypted_inode(d->inode)) {
int save_len = fstr->len;
int ret;

de_name.name = f2fs_kmalloc(de_name.len, GFP_NOFS);
if (!de_name.name)
return false;

memcpy(de_name.name, d->filename[bit_pos], de_name.len);
int err;

ret = fscrypt_fname_disk_to_usr(d->inode,
err = fscrypt_fname_disk_to_usr(d->inode,
(u32)de->hash_code, 0,
&de_name, fstr);
kfree(de_name.name);
if (ret < 0)
return true;
if (err)
return err;

de_name = *fstr;
fstr->len = save_len;
@@ -811,12 +832,12 @@ bool f2fs_fill_dentries(struct file *file, void *dirent, filldir_t filldir,
le32_to_cpu(de->ino), d_type);
if (over) {
file->f_pos += bit_pos - start_bit_pos;
return true;
return 1;
}

bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
}
return false;
return 0;
}

static int f2fs_readdir(struct file *file, void *dirent, filldir_t filldir)
@@ -860,18 +881,21 @@ static int f2fs_readdir(struct file *file, void *dirent, filldir_t filldir)
dentry_page = get_lock_data_page(inode, n, false);
if (IS_ERR(dentry_page)) {
err = PTR_ERR(dentry_page);
if (err == -ENOENT)
if (err == -ENOENT) {
err = 0;
continue;
else
} else {
goto out;
}
}

dentry_blk = kmap(dentry_page);

make_dentry_ptr(inode, &d, (void *)dentry_blk, 1);

if (f2fs_fill_dentries(file, dirent, filldir, &d, n,
bit_pos, &fstr)) {
err = f2fs_fill_dentries(file, dirent, filldir, &d, n,
bit_pos, &fstr);
if (err) {
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
break;
@@ -882,10 +906,9 @@ static int f2fs_readdir(struct file *file, void *dirent, filldir_t filldir)
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
}
err = 0;
out:
fscrypt_fname_free_buffer(&fstr);
return err;
return err < 0 ? err : 0;
}

static int f2fs_dir_open(struct inode *inode, struct file *filp)
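In the dir.c hunks above, f2fs_fill_dentries() changes from returning bool to returning int: 0 to continue, 1 when the caller's buffer is full, and a negative errno on failure. f2fs_readdir() then reports only the negative values. In the removed lines a filename decryption failure returned `true` and readdir ended with `err = 0`, so the error was not reported. A minimal user-space model of the new convention follows; `readdir_model` and its `block_results` input are hypothetical names for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * block_results[i] is what the per-block fill step returned:
 * 0 = keep going, 1 = caller's buffer is full, negative errno = failure.
 */
int readdir_model(const int *block_results, size_t nblocks)
{
	int err = 0;
	size_t i;

	assert(block_results != NULL || nblocks == 0);

	for (i = 0; i < nblocks; i++) {
		err = block_results[i];
		if (err)
			break;	/* stop on "buffer full" and on errors alike */
	}

	/* Only real errors reach the caller; "buffer full" is success. */
	return err < 0 ? err : 0;
}
```

Folding both stop conditions into one int lets the loop keep a single `break`, while the final `err < 0 ? err : 0` keeps the positive "buffer full" value from leaking out as a return code.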
2 changes: 1 addition & 1 deletion fs/f2fs/extent_cache.c
@@ -172,7 +172,7 @@ static void __drop_largest_extent(struct inode *inode,

if (fofs < largest->fofs + largest->len && fofs + len > largest->fofs) {
largest->len = 0;
f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);
}
}

331 changes: 223 additions & 108 deletions fs/f2fs/f2fs.h

Large diffs are not rendered by default.

91 changes: 48 additions & 43 deletions fs/f2fs/file.c
@@ -94,8 +94,6 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr);

/* if gced page is attached, don't write to cold segment */
clear_cold_data(page);
out:
f2fs_update_time(sbi, REQ_TIME);
return block_page_mkwrite_return(err);
@@ -133,7 +131,7 @@ static inline bool need_do_checkpoint(struct inode *inode)

if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
need_cp = true;
else if (file_enc_name(inode) && need_dentry_mark(sbi, inode->i_ino))
else if (is_sbi_flag_set(sbi, SBI_NEED_CP))
need_cp = true;
else if (file_wrong_pino(inode))
need_cp = true;
@@ -208,7 +206,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
}

/* if the inode is dirty, let's recover all the time */
if (!datasync && !f2fs_skip_inode_update(inode)) {
if (!f2fs_skip_inode_update(inode, datasync)) {
f2fs_write_inode(inode, NULL);
goto go_write;
}
@@ -262,7 +260,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
}

if (need_inode_block_update(sbi, ino)) {
f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);
f2fs_write_inode(inode, NULL);
goto sync_nodes;
}
@@ -540,7 +538,7 @@ static int truncate_partial_data_page(struct inode *inode, u64 from,
return 0;

if (cache_only) {
page = f2fs_grab_cache_page(mapping, index, false);
page = find_lock_page(mapping, index);
if (page && PageUptodate(page))
goto truncate_out;
f2fs_put_page(page, 1);
@@ -648,8 +646,8 @@ int f2fs_truncate(struct inode *inode)
if (err)
return err;

inode->i_mtime = inode->i_ctime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(inode);
inode->i_mtime = inode->i_ctime = current_time(inode);
f2fs_mark_inode_dirty_sync(inode, false);
return 0;
}

@@ -696,6 +694,7 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = d_inode(dentry);
int err;
bool size_changed = false;

err = inode_change_ok(inode, attr);
if (err)
@@ -711,7 +710,6 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
err = f2fs_truncate(inode);
if (err)
return err;
f2fs_balance_fs(F2FS_I_SB(inode), true);
} else {
/*
* do not trim all blocks after i_size if target size is
@@ -725,8 +723,10 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
if (err)
return err;
}
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_mtime = inode->i_ctime = current_time(inode);
}

size_changed = true;
}

__setattr_copy(inode, attr);
@@ -739,7 +739,12 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
}
}

f2fs_mark_inode_dirty_sync(inode);
/* file size may changed here */
f2fs_mark_inode_dirty_sync(inode, size_changed);

/* inode change will produce dirty node pages flushed by checkpoint */
f2fs_balance_fs(F2FS_I_SB(inode), true);

return err;
}

@@ -986,7 +991,7 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode,
new_size = (dst + i) << PAGE_SHIFT;
if (dst_inode->i_size < new_size)
f2fs_i_size_write(dst_inode, new_size);
} while ((do_replace[i] || blkaddr[i] == NULL_ADDR) && --ilen);
} while (--ilen && (do_replace[i] || blkaddr[i] == NULL_ADDR));

f2fs_put_dnode(&dn);
} else {
@@ -1237,6 +1242,9 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
ret = f2fs_do_zero_range(&dn, index, end);
f2fs_put_dnode(&dn);
f2fs_unlock_op(sbi);

f2fs_balance_fs(sbi, dn.node_changed);

if (ret)
goto out;

@@ -1332,15 +1340,15 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
pgoff_t pg_end;
loff_t new_size = i_size_read(inode);
loff_t off_end;
int ret;
int err;

ret = inode_newsize_ok(inode, (len + offset));
if (ret)
return ret;
err = inode_newsize_ok(inode, (len + offset));
if (err)
return err;

ret = f2fs_convert_inline_inode(inode);
if (ret)
return ret;
err = f2fs_convert_inline_inode(inode);
if (err)
return err;

f2fs_balance_fs(sbi, true);

@@ -1352,12 +1360,12 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
if (off_end)
map.m_len++;

ret = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
if (ret) {
err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
if (err) {
pgoff_t last_off;

if (!map.m_len)
return ret;
return err;

last_off = map.m_lblk + map.m_len - 1;

@@ -1371,7 +1379,7 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
if (!(mode & FALLOC_FL_KEEP_SIZE) && i_size_read(inode) < new_size)
f2fs_i_size_write(inode, new_size);

return ret;
return err;
}

#ifndef FALLOC_FL_COLLAPSE_RANGE
@@ -1421,8 +1429,10 @@ static long f2fs_fallocate(struct file *file, int mode,
}

if (!ret) {
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(inode);
inode->i_mtime = inode->i_ctime = current_time(inode);
f2fs_mark_inode_dirty_sync(inode, false);
if (mode & FALLOC_FL_KEEP_SIZE)
file_set_keep_isize(inode);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
}

@@ -1480,7 +1490,7 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct f2fs_inode_info *fi = F2FS_I(inode);
unsigned int flags = fi->i_flags & FS_FL_USER_VISIBLE;
unsigned int flags;
unsigned int oldflags;
int ret;

@@ -1513,7 +1523,7 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
fi->i_flags = flags;
inode_unlock(inode);

inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_time(inode);
f2fs_set_inode_flags(inode);
out:
mnt_drop_write_file(filp);
@@ -1783,21 +1793,14 @@ static int f2fs_ioc_set_encryption_policy(struct file *filp, unsigned long arg)
{
struct fscrypt_policy policy;
struct inode *inode = file_inode(filp);
int ret;

if (copy_from_user(&policy, (struct fscrypt_policy __user *)arg,
sizeof(policy)))
return -EFAULT;

ret = mnt_want_write_file(filp);
if (ret)
return ret;

f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
ret = fscrypt_process_policy(inode, &policy);

mnt_drop_write_file(filp);
return ret;
return fscrypt_process_policy(filp, &policy);
}

static int f2fs_ioc_get_encryption_policy(struct file *filp, unsigned long arg)
@@ -1878,7 +1881,7 @@ static int f2fs_ioc_gc(struct file *filp, unsigned long arg)
mutex_lock(&sbi->gc_mutex);
}

ret = f2fs_gc(sbi, sync);
ret = f2fs_gc(sbi, sync, true);
out:
mnt_drop_write_file(filp);
return ret;
@@ -1987,7 +1990,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi,
* avoid defragment running in SSR mode when free section are allocated
* intensively
*/
if (has_not_enough_free_secs(sbi, sec_num)) {
if (has_not_enough_free_secs(sbi, 0, sec_num)) {
err = -EAGAIN;
goto out;
}
@@ -2156,14 +2159,16 @@ static ssize_t f2fs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
inode_lock(inode);
ret = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
if (!ret) {
ret = f2fs_preallocate_blocks(inode, pos, count,
int err = f2fs_preallocate_blocks(inode, pos, count,
iocb->ki_filp->f_flags & O_DIRECT);
if (!ret) {
blk_start_plug(&plug);
ret = __generic_file_aio_write(iocb, iov, nr_segs,
&iocb->ki_pos);
blk_finish_plug(&plug);
if (err) {
inode_unlock(inode);
return err;
}
blk_start_plug(&plug);
ret = __generic_file_aio_write(iocb, iov, nr_segs,
&iocb->ki_pos);
blk_finish_plug(&plug);
}
inode_unlock(inode);

123 changes: 77 additions & 46 deletions fs/f2fs/gc.c
@@ -47,6 +47,11 @@ static int gc_thread_func(void *data)
continue;
}

#ifdef CONFIG_F2FS_FAULT_INJECTION
if (time_to_inject(sbi, FAULT_CHECKPOINT))
f2fs_stop_checkpoint(sbi, false);
#endif

/*
* [GC triggering condition]
* 0. GC is not conducted currently.
@@ -77,7 +82,7 @@ static int gc_thread_func(void *data)
stat_inc_bggc_count(sbi);

/* if return value is not zero, no victim was selected */
if (f2fs_gc(sbi, test_opt(sbi, FORCE_FG_GC)))
if (f2fs_gc(sbi, test_opt(sbi, FORCE_FG_GC), true))
wait_ms = gc_th->no_gc_sleep_time;

trace_f2fs_background_gc(sbi->sb, wait_ms,
@@ -96,7 +101,7 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
dev_t dev = sbi->sb->s_bdev->bd_dev;
int err = 0;

gc_th = f2fs_kmalloc(sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
gc_th = f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
if (!gc_th) {
err = -ENOMEM;
goto out;
@@ -270,7 +275,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
{
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
struct victim_sel_policy p;
unsigned int secno, max_cost, last_victim;
unsigned int secno, last_victim;
unsigned int last_segment = MAIN_SEGS(sbi);
unsigned int nsearched = 0;

@@ -280,7 +285,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
select_policy(sbi, gc_type, type, &p);

p.min_segno = NULL_SEGNO;
p.min_cost = max_cost = get_max_cost(sbi, &p);
p.min_cost = get_max_cost(sbi, &p);

if (p.max_search == 0)
goto out;
@@ -423,10 +428,10 @@ static int check_valid_map(struct f2fs_sb_info *sbi,
static void gc_node_segment(struct f2fs_sb_info *sbi,
struct f2fs_summary *sum, unsigned int segno, int gc_type)
{
bool initial = true;
struct f2fs_summary *entry;
block_t start_addr;
int off;
int phase = 0;

start_addr = START_BLOCK(sbi, segno);

@@ -439,16 +444,24 @@ static void gc_node_segment(struct f2fs_sb_info *sbi,
struct node_info ni;

/* stop BG_GC if there is not enough free sections. */
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0))
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0))
return;

if (check_valid_map(sbi, segno, off) == 0)
continue;

if (initial) {
if (phase == 0) {
ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
META_NAT, true);
continue;
}

if (phase == 1) {
ra_node_page(sbi, nid);
continue;
}

/* phase == 2 */
node_page = get_node_page(sbi, nid);
if (IS_ERR(node_page))
continue;
@@ -469,10 +482,8 @@ static void gc_node_segment(struct f2fs_sb_info *sbi,
stat_inc_node_blk_count(sbi, 1, gc_type);
}

if (initial) {
initial = false;
if (++phase < 3)
goto next_step;
}
}

/*
@@ -533,7 +544,8 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
return true;
}

static void move_encrypted_block(struct inode *inode, block_t bidx)
static void move_encrypted_block(struct inode *inode, block_t bidx,
unsigned int segno, int off)
{
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
@@ -553,6 +565,9 @@ static void move_encrypted_block(struct inode *inode, block_t bidx)
if (!page)
return;

if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;

set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
if (err)
@@ -632,14 +647,18 @@ static void move_encrypted_block(struct inode *inode, block_t bidx)
f2fs_put_page(page, 1);
}

static void move_data_page(struct inode *inode, block_t bidx, int gc_type)
static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
unsigned int segno, int off)
{
struct page *page;

page = get_lock_data_page(inode, bidx, true);
if (IS_ERR(page))
return;

if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;

if (gc_type == BG_GC) {
if (PageWriteback(page))
goto out;
@@ -659,8 +678,10 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type)
retry:
set_page_dirty(page);
f2fs_wait_on_page_writeback(page, DATA, true);
if (clear_page_dirty_for_io(page))
if (clear_page_dirty_for_io(page)) {
inode_dec_dirty_pages(inode);
remove_dirty_inode(inode);
}

set_cold_data(page);

@@ -669,8 +690,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type)
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry;
}

clear_cold_data(page);
}
out:
f2fs_put_page(page, 1);
@@ -703,31 +722,38 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
struct node_info dni; /* dnode info for the data */
unsigned int ofs_in_node, nofs;
block_t start_bidx;
nid_t nid = le32_to_cpu(entry->nid);

/* stop BG_GC if there is not enough free sections. */
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0))
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0))
return;

if (check_valid_map(sbi, segno, off) == 0)
continue;

if (phase == 0) {
ra_node_page(sbi, le32_to_cpu(entry->nid));
ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
META_NAT, true);
continue;
}

if (phase == 1) {
ra_node_page(sbi, nid);
continue;
}

/* Get an inode by ino with checking validity */
if (!is_alive(sbi, entry, &dni, start_addr + off, &nofs))
continue;

if (phase == 1) {
if (phase == 2) {
ra_node_page(sbi, dni.ino);
continue;
}

ofs_in_node = le16_to_cpu(entry->ofs_in_node);

if (phase == 2) {
if (phase == 3) {
inode = f2fs_iget(sb, dni.ino);
if (IS_ERR(inode) || is_bad_inode(inode))
continue;
@@ -752,7 +778,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
continue;
}

/* phase 3 */
/* phase 4 */
inode = find_gc_inode(gc_list, dni.ino);
if (inode) {
struct f2fs_inode_info *fi = F2FS_I(inode);
@@ -772,9 +798,9 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
start_bidx = start_bidx_of_node(nofs, inode)
+ ofs_in_node;
if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
move_encrypted_block(inode, start_bidx);
move_encrypted_block(inode, start_bidx, segno, off);
else
move_data_page(inode, start_bidx, gc_type);
move_data_page(inode, start_bidx, gc_type, segno, off);

if (locked) {
up_write(&fi->dio_rwsem[WRITE]);
@@ -785,7 +811,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
}
}

if (++phase < 4)
if (++phase < 5)
goto next_step;
}

@@ -811,7 +837,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
struct blk_plug plug;
unsigned int segno = start_segno;
unsigned int end_segno = start_segno + sbi->segs_per_sec;
int seg_freed = 0;
int sec_freed = 0;
unsigned char type = IS_DATASEG(get_seg_entry(sbi, segno)->type) ?
SUM_TYPE_DATA : SUM_TYPE_NODE;

@@ -830,15 +856,16 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,

for (segno = start_segno; segno < end_segno; segno++) {

if (get_valid_blocks(sbi, segno, 1) == 0)
continue;

/* find segment summary of victim */
sum_page = find_get_page(META_MAPPING(sbi),
GET_SUM_BLOCK(sbi, segno));
f2fs_bug_on(sbi, !PageUptodate(sum_page));
f2fs_put_page(sum_page, 0);

if (get_valid_blocks(sbi, segno, 1) == 0 ||
!PageUptodate(sum_page) ||
unlikely(f2fs_cp_error(sbi)))
goto next;

sum = page_address(sum_page);
f2fs_bug_on(sbi, type != GET_SUM_TYPE((&sum->footer)));

@@ -857,7 +884,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
gc_type);

stat_inc_seg_count(sbi, type, gc_type);

next:
f2fs_put_page(sum_page, 0);
}

@@ -867,22 +894,20 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,

blk_finish_plug(&plug);

if (gc_type == FG_GC) {
while (start_segno < end_segno)
if (get_valid_blocks(sbi, start_segno++, 1) == 0)
seg_freed++;
}
if (gc_type == FG_GC &&
get_valid_blocks(sbi, start_segno, sbi->segs_per_sec) == 0)
sec_freed = 1;

stat_inc_call_count(sbi->stat_info);

return seg_freed;
return sec_freed;
}

int f2fs_gc(struct f2fs_sb_info *sbi, bool sync)
int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background)
{
unsigned int segno;
int gc_type = sync ? FG_GC : BG_GC;
int sec_freed = 0, seg_freed;
int sec_freed = 0;
int ret = -EINVAL;
struct cp_control cpc;
struct gc_inode_list gc_list = {
@@ -901,7 +926,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync)
goto stop;
}

if (gc_type == BG_GC && has_not_enough_free_secs(sbi, sec_freed)) {
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, sec_freed, 0)) {
gc_type = FG_GC;
/*
* If there is no victim and no prefree segment but still not
@@ -910,31 +935,37 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync)
*/
if (__get_victim(sbi, &segno, gc_type) ||
prefree_segments(sbi)) {
write_checkpoint(sbi, &cpc);
ret = write_checkpoint(sbi, &cpc);
if (ret)
goto stop;
segno = NULL_SEGNO;
} else if (has_not_enough_free_secs(sbi, 0)) {
write_checkpoint(sbi, &cpc);
} else if (has_not_enough_free_secs(sbi, 0, 0)) {
ret = write_checkpoint(sbi, &cpc);
if (ret)
goto stop;
}
} else if (gc_type == BG_GC && !background) {
/* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
goto stop;
}

if (segno == NULL_SEGNO && !__get_victim(sbi, &segno, gc_type))
goto stop;
ret = 0;

seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type);

if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec)
if (do_garbage_collect(sbi, segno, &gc_list, gc_type) &&
gc_type == FG_GC)
sec_freed++;

if (gc_type == FG_GC)
sbi->cur_victim_sec = NULL_SEGNO;

if (!sync) {
if (has_not_enough_free_secs(sbi, sec_freed))
if (has_not_enough_free_secs(sbi, sec_freed, 0))
goto gc_more;

if (gc_type == FG_GC)
write_checkpoint(sbi, &cpc);
ret = write_checkpoint(sbi, &cpc);
}
stop:
mutex_unlock(&sbi->gc_mutex);
41 changes: 24 additions & 17 deletions fs/f2fs/inline.c
@@ -136,8 +136,10 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page)
fio.old_blkaddr = dn->data_blkaddr;
write_data_page(dn, &fio);
f2fs_wait_on_page_writeback(page, DATA, true);
if (dirty)
if (dirty) {
inode_dec_dirty_pages(dn->inode);
remove_dirty_inode(dn->inode);
}

/* this converted inline_data should be recovered. */
set_inode_flag(dn->inode, FI_APPEND_WRITE);
@@ -418,12 +420,12 @@ static int f2fs_add_inline_entries(struct inode *dir,
}

new_name.name = d.filename[bit_pos];
new_name.len = de->name_len;
new_name.len = le16_to_cpu(de->name_len);

ino = le32_to_cpu(de->ino);
fake_mode = get_de_type(de) << S_SHIFT;

err = f2fs_add_regular_entry(dir, &new_name, NULL,
err = f2fs_add_regular_entry(dir, &new_name, NULL, NULL,
ino, fake_mode);
if (err)
goto punch_dentry_pages;
@@ -444,8 +446,8 @@ static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage,
struct f2fs_inline_dentry *backup_dentry;
int err;

backup_dentry = f2fs_kmalloc(sizeof(struct f2fs_inline_dentry),
GFP_F2FS_ZERO);
backup_dentry = f2fs_kmalloc(F2FS_I_SB(dir),
sizeof(struct f2fs_inline_dentry), GFP_F2FS_ZERO);
if (!backup_dentry) {
f2fs_put_page(ipage, 1);
return -ENOMEM;
@@ -487,17 +489,17 @@ static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage,
return f2fs_move_rehashed_dirents(dir, ipage, inline_dentry);
}

int f2fs_add_inline_entry(struct inode *dir, const struct qstr *name,
struct inode *inode, nid_t ino, umode_t mode)
int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name,
const struct qstr *orig_name,
struct inode *inode, nid_t ino, umode_t mode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
struct page *ipage;
unsigned int bit_pos;
f2fs_hash_t name_hash;
size_t namelen = name->len;
struct f2fs_inline_dentry *dentry_blk = NULL;
struct f2fs_dentry_ptr d;
int slots = GET_DENTRY_SLOTS(namelen);
int slots = GET_DENTRY_SLOTS(new_name->len);
struct page *page = NULL;
int err = 0;

@@ -518,18 +520,21 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *name,

if (inode) {
down_write(&F2FS_I(inode)->i_sem);
page = init_inode_metadata(inode, dir, name, ipage);
page = init_inode_metadata(inode, dir, new_name,
orig_name, ipage);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
}
if (f2fs_encrypted_inode(dir))
file_set_enc_name(inode);
}

f2fs_wait_on_page_writeback(ipage, NODE, true);

name_hash = f2fs_dentry_hash(name);
name_hash = f2fs_dentry_hash(new_name);
make_dentry_ptr(NULL, &d, (void *)dentry_blk, 2);
f2fs_update_dentry(ino, mode, &d, name, name_hash, bit_pos);
f2fs_update_dentry(ino, mode, &d, new_name, name_hash, bit_pos);

set_page_dirty(ipage);

@@ -562,14 +567,14 @@ void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, struct page *page,
inline_dentry = inline_data_addr(page);
bit_pos = dentry - inline_dentry->dentry;
for (i = 0; i < slots; i++)
test_and_clear_bit_le(bit_pos + i,
__clear_bit_le(bit_pos + i,
&inline_dentry->dentry_bitmap);

set_page_dirty(page);
f2fs_put_page(page, 1);

dir->i_ctime = dir->i_mtime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(dir);
dir->i_ctime = dir->i_mtime = current_time(dir);
f2fs_mark_inode_dirty_sync(dir, false);

if (inode)
f2fs_drop_nlink(dir, inode);
@@ -608,6 +613,7 @@ int f2fs_read_inline_dir(struct file *file, void *dirent, filldir_t filldir,
struct f2fs_inline_dentry *inline_dentry = NULL;
struct page *ipage = NULL;
struct f2fs_dentry_ptr d;
int err;

if (pos >= NR_INLINE_DENTRY)
return 0;
@@ -622,11 +628,12 @@ int f2fs_read_inline_dir(struct file *file, void *dirent, filldir_t filldir,

make_dentry_ptr(inode, &d, (void *)inline_dentry, 2);

if (!f2fs_fill_dentries(file, dirent, filldir, &d, 0, bit_pos, fstr))
err = f2fs_fill_dentries(file, dirent, filldir, &d, 0, bit_pos, fstr);
if (!err)
file->f_pos = NR_INLINE_DENTRY;

f2fs_put_page(ipage, 1);
return 0;
return err < 0 ? err : 0;
}

int f2fs_inline_data_fiemap(struct inode *inode,
64 changes: 51 additions & 13 deletions fs/f2fs/inode.c
@@ -11,17 +11,19 @@
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/buffer_head.h>
#include <linux/backing-dev.h>
#include <linux/writeback.h>

#include "f2fs.h"
#include "node.h"

#include <trace/events/f2fs.h>

void f2fs_mark_inode_dirty_sync(struct inode *inode)
void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
{
if (f2fs_inode_dirtied(inode))
if (f2fs_inode_dirtied(inode, sync))
return;

mark_inode_dirty_sync(inode);
}

@@ -42,7 +44,7 @@ void f2fs_set_inode_flags(struct inode *inode)
inode->i_flags |= S_NOATIME;
if (flags & FS_DIRSYNC_FL)
inode->i_flags |= S_DIRSYNC;
f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, false);
}

static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri)
@@ -234,9 +236,24 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
return ERR_PTR(ret);
}

struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino)
{
struct inode *inode;
retry:
inode = f2fs_iget(sb, ino);
if (IS_ERR(inode)) {
if (PTR_ERR(inode) == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry;
}
}
return inode;
}

int update_inode(struct inode *inode, struct page *node_page)
{
struct f2fs_inode *ri;
struct extent_tree *et = F2FS_I(inode)->extent_tree;

f2fs_inode_synced(inode);

@@ -252,11 +269,13 @@ int update_inode(struct inode *inode, struct page *node_page)
ri->i_size = cpu_to_le64(i_size_read(inode));
ri->i_blocks = cpu_to_le64(inode->i_blocks);

if (F2FS_I(inode)->extent_tree)
set_raw_extent(&F2FS_I(inode)->extent_tree->largest,
&ri->i_ext);
else
if (et) {
read_lock(&et->lock);
set_raw_extent(&et->largest, &ri->i_ext);
read_unlock(&et->lock);
} else {
memset(&ri->i_ext, 0, sizeof(ri->i_ext));
}
set_raw_inline(inode, ri);

ri->i_atime = cpu_to_le64(inode->i_atime.tv_sec);
@@ -320,7 +339,7 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
* We need to balance fs here to prevent from producing dirty node pages
* during the urgent cleaning time when runing out of free sections.
*/
if (update_inode_page(inode))
if (update_inode_page(inode) && wbc && wbc->nr_to_write)
f2fs_balance_fs(sbi, true);
return 0;
}
@@ -354,10 +373,13 @@ void f2fs_evict_inode(struct inode *inode)
goto no_delete;

#ifdef CONFIG_F2FS_FAULT_INJECTION
if (time_to_inject(FAULT_EVICT_INODE))
if (time_to_inject(sbi, FAULT_EVICT_INODE))
goto no_delete;
#endif

remove_ino_entry(sbi, inode->i_ino, APPEND_INO);
remove_ino_entry(sbi, inode->i_ino, UPDATE_INO);

set_inode_flag(inode, FI_NO_ALLOC);
i_size_write(inode, 0);
retry:
@@ -368,6 +390,8 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_lock_op(sbi);
err = remove_inode_page(inode);
f2fs_unlock_op(sbi);
if (err == -ENOENT)
err = 0;
}

/* give more chances, if ENOMEM case */
@@ -386,10 +410,12 @@ void f2fs_evict_inode(struct inode *inode)
invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
if (xnid)
invalidate_mapping_pages(NODE_MAPPING(sbi), xnid, xnid);
if (is_inode_flag_set(inode, FI_APPEND_WRITE))
add_ino_entry(sbi, inode->i_ino, APPEND_INO);
if (is_inode_flag_set(inode, FI_UPDATE_WRITE))
add_ino_entry(sbi, inode->i_ino, UPDATE_INO);
if (inode->i_nlink) {
if (is_inode_flag_set(inode, FI_APPEND_WRITE))
add_ino_entry(sbi, inode->i_ino, APPEND_INO);
if (is_inode_flag_set(inode, FI_UPDATE_WRITE))
add_ino_entry(sbi, inode->i_ino, UPDATE_INO);
}
if (is_inode_flag_set(inode, FI_FREE_NID)) {
alloc_nid_failed(sbi, inode->i_ino);
clear_inode_flag(inode, FI_FREE_NID);
@@ -407,6 +433,18 @@ void handle_failed_inode(struct inode *inode)
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct node_info ni;

/*
* clear nlink of inode in order to release resource of inode
* immediately.
*/
clear_nlink(inode);

/*
* we must call this to avoid inode being remained as dirty, resulting
* in a panic when flushing dirty inodes in gdirty_list.
*/
update_inode_page(inode);

/* don't make bad inode, since it becomes a regular file. */
unlock_new_inode(inode);
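
Note on the f2fs_iget_retry() helper added in the inode.c hunks above: it wraps f2fs_iget() in a loop that repeats only while the error is -ENOMEM and returns any other result unchanged. The following is a minimal userspace sketch of that retry shape, not the kernel code. The names load_fn, load_with_retry and flaky_load are hypothetical, and the congestion_wait() back-off of the kernel version is replaced by a plain attempt counter.

```c
#include <errno.h>

/* Hypothetical loader type: returns 0 on success or a negative errno. */
typedef int (*load_fn)(void *ctx);

/*
 * Call the loader again for as long as it reports -ENOMEM, mirroring the
 * shape of f2fs_iget_retry(). Any other result, success or failure, is
 * returned to the caller unchanged. The kernel helper sleeps between
 * attempts; this sketch only counts them.
 */
static int load_with_retry(load_fn load, void *ctx, unsigned int *attempts)
{
	int err;

	do {
		err = load(ctx);
		if (attempts)
			(*attempts)++;
	} while (err == -ENOMEM);

	return err;
}

/* Hypothetical loader that fails with -ENOMEM until its budget runs out. */
static int flaky_load(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}
```

With a failure budget of three, load_with_retry(flaky_load, &budget, &attempts) succeeds on the fourth attempt. A loop of this kind never gives up on its own, so it only suits callers that cannot tolerate an allocation failure, which is presumably why the recovery path is the user in this commit.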

33 changes: 18 additions & 15 deletions fs/f2fs/namei.c
@@ -46,7 +46,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)

inode->i_ino = ino;
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
inode->i_generation = sbi->s_next_generation++;

err = insert_inode_locked(inode);
@@ -91,18 +91,23 @@ static int is_multimedia_file(const unsigned char *s, const char *sub)
{
size_t slen = strlen(s);
size_t sublen = strlen(sub);
int i;

/*
* filename format of multimedia file should be defined as:
* "filename + '.' + extension".
* "filename + '.' + extension + (optional: '.' + temp extension)".
*/
if (slen < sublen + 2)
return 0;

if (s[slen - sublen - 1] != '.')
return 0;
for (i = 1; i < slen - sublen; i++) {
if (s[i] != '.')
continue;
if (!strncasecmp(s + i + 1, sub, sublen))
return 1;
}

return !strncasecmp(s + slen - sublen, sub, sublen);
return 0;
}

/*
@@ -177,7 +182,7 @@ static int f2fs_link(struct dentry *old_dentry, struct inode *dir,

f2fs_balance_fs(sbi, true);

inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_time(inode);
ihold(inode);

set_inode_flag(inode, FI_INC_LINK);
@@ -457,7 +462,7 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
ostr.name = sd->encrypted_path;
ostr.len = disk_link.len;
err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr);
if (err < 0)
if (err)
goto err_out;

sd->len = cpu_to_le16(ostr.len);
@@ -649,7 +654,7 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,

f2fs_set_link(new_dir, new_entry, new_page, old_inode);

new_inode->i_ctime = CURRENT_TIME;
new_inode->i_ctime = current_time(new_inode);
down_write(&F2FS_I(new_inode)->i_sem);
if (old_dir_entry)
f2fs_i_links_write(new_inode, false);
@@ -703,8 +708,8 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
file_set_enc_name(old_inode);
up_write(&F2FS_I(old_inode)->i_sem);

old_inode->i_ctime = CURRENT_TIME;
f2fs_mark_inode_dirty_sync(old_inode);
old_inode->i_ctime = current_time(old_inode);
f2fs_mark_inode_dirty_sync(old_inode, false);

f2fs_delete_entry(old_entry, old_page, old_dir, NULL);

@@ -750,7 +755,6 @@ static void *f2fs_encrypted_follow_link(struct dentry *dentry,
struct fscrypt_str pstr = FSTR_INIT(NULL, 0);
struct fscrypt_symlink_data *sd;
struct inode *inode = d_inode(dentry);
loff_t size = min_t(loff_t, i_size_read(inode), PAGE_SIZE - 1);
u32 max_size = inode->i_sb->s_blocksize;
int res;

@@ -760,9 +764,8 @@ static void *f2fs_encrypted_follow_link(struct dentry *dentry,

cpage = read_mapping_page(inode->i_mapping, 0, NULL);
if (IS_ERR(cpage))
return cpage;
return ERR_CAST(cpage);
caddr = kmap(cpage);
caddr[size] = 0;

/* Symlink is encrypted */
sd = (struct fscrypt_symlink_data *)caddr;
@@ -785,7 +788,7 @@ static void *f2fs_encrypted_follow_link(struct dentry *dentry,
goto errout;

res = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr);
if (res < 0)
if (res)
goto errout;

/* this is broken symlink case */
@@ -797,7 +800,7 @@ static void *f2fs_encrypted_follow_link(struct dentry *dentry,
paddr = pstr.name;

/* Null-terminate the name */
paddr[res] = '\0';
paddr[pstr.len] = '\0';
nd_set_link(nd, paddr);

kunmap(cpage);
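
Note on the is_multimedia_file() hunk above: the new loop accepts the extension after any dot in the name, so a temporary suffix after the media extension no longer hides it. The following is a userspace sketch of the same matching logic, assuming POSIX strncasecmp(). The function name matches_media_extension is hypothetical and the loop index is size_t rather than int.

```c
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/*
 * Return 1 if some '.' in s (other than at index 0) is immediately
 * followed by sub, compared case-insensitively over strlen(sub)
 * characters; return 0 otherwise.
 */
static int matches_media_extension(const char *s, const char *sub)
{
	size_t slen = strlen(s);
	size_t sublen = strlen(sub);
	size_t i;

	/* Need at least one name character, a dot and the extension. */
	if (slen < sublen + 2)
		return 0;

	for (i = 1; i < slen - sublen; i++) {
		if (s[i] != '.')
			continue;
		if (!strncasecmp(s + i + 1, sub, sublen))
			return 1;
	}

	return 0;
}
```

Under this sketch both "clip.mp4" and "clip.MP4.tmp" match "mp4", while "clip.txt" does not. Because only the first strlen(sub) characters after a dot are compared, a name such as "clip.mp4x" matches as well.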
254 changes: 162 additions & 92 deletions fs/f2fs/node.c

Large diffs are not rendered by default.

88 changes: 55 additions & 33 deletions fs/f2fs/node.h
@@ -169,14 +169,15 @@ static inline void next_free_nid(struct f2fs_sb_info *sbi, nid_t *nid)
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *fnid;

spin_lock(&nm_i->free_nid_list_lock);
if (nm_i->fcnt <= 0) {
spin_unlock(&nm_i->free_nid_list_lock);
spin_lock(&nm_i->nid_list_lock);
if (nm_i->nid_cnt[FREE_NID_LIST] <= 0) {
spin_unlock(&nm_i->nid_list_lock);
return;
}
fnid = list_entry(nm_i->free_nid_list.next, struct free_nid, list);
fnid = list_entry(nm_i->nid_list[FREE_NID_LIST].next,
struct free_nid, list);
*nid = fnid->nid;
spin_unlock(&nm_i->free_nid_list_lock);
spin_unlock(&nm_i->nid_list_lock);
}

/*
@@ -229,6 +230,37 @@ static inline void set_to_next_nat(struct f2fs_nm_info *nm_i, nid_t start_nid)
f2fs_change_bit(block_off, nm_i->nat_bitmap);
}

static inline nid_t ino_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.ino);
}

static inline nid_t nid_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.nid);
}

static inline unsigned int ofs_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
unsigned flag = le32_to_cpu(rn->footer.flag);
return flag >> OFFSET_BIT_SHIFT;
}

static inline __u64 cpver_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le64_to_cpu(rn->footer.cp_ver);
}

static inline block_t next_blkaddr_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.next_blkaddr);
}

static inline void fill_node_footer(struct page *page, nid_t nid,
nid_t ino, unsigned int ofs, bool reset)
{
@@ -259,40 +291,30 @@ static inline void fill_node_footer_blkaddr(struct page *page, block_t blkaddr)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(F2FS_P_SB(page));
struct f2fs_node *rn = F2FS_NODE(page);
size_t crc_offset = le32_to_cpu(ckpt->checksum_offset);
__u64 cp_ver = le64_to_cpu(ckpt->checkpoint_ver);

rn->footer.cp_ver = ckpt->checkpoint_ver;
if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)) {
__u64 crc = le32_to_cpu(*((__le32 *)
((unsigned char *)ckpt + crc_offset)));
cp_ver |= (crc << 32);
}
rn->footer.cp_ver = cpu_to_le64(cp_ver);
rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
}

static inline nid_t ino_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.ino);
}

static inline nid_t nid_of_node(struct page *node_page)
static inline bool is_recoverable_dnode(struct page *page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.nid);
}

static inline unsigned int ofs_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
unsigned flag = le32_to_cpu(rn->footer.flag);
return flag >> OFFSET_BIT_SHIFT;
}

static inline unsigned long long cpver_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le64_to_cpu(rn->footer.cp_ver);
}
struct f2fs_checkpoint *ckpt = F2FS_CKPT(F2FS_P_SB(page));
size_t crc_offset = le32_to_cpu(ckpt->checksum_offset);
__u64 cp_ver = cur_cp_version(ckpt);

static inline block_t next_blkaddr_of_node(struct page *node_page)
{
struct f2fs_node *rn = F2FS_NODE(node_page);
return le32_to_cpu(rn->footer.next_blkaddr);
if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)) {
__u64 crc = le32_to_cpu(*((__le32 *)
((unsigned char *)ckpt + crc_offset)));
cp_ver |= (crc << 32);
}
return cp_ver == cpver_of_node(page);
}

/*
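
Note on fill_node_footer_blkaddr() and is_recoverable_dnode() in the node.h hunks above: when CP_CRC_RECOVERY_FLAG is set, the checkpoint CRC is ORed into the upper 32 bits of the version stored in the node footer, and recovery accepts a node only if its footer carries the same combined value. The following is a userspace sketch of that arithmetic, assuming host-endian integers (the kernel code converts with le32_to_cpu(), le64_to_cpu() and cpu_to_le64()). The names footer_cp_ver and is_recoverable are hypothetical.

```c
#include <stdint.h>

/*
 * Build the value written to the node footer: the checkpoint version,
 * with the checkpoint CRC ORed into bits 32..63 when the CRC recovery
 * flag is set.
 */
static uint64_t footer_cp_ver(uint64_t cp_ver, uint32_t crc, int crc_recovery)
{
	if (crc_recovery)
		cp_ver |= (uint64_t)crc << 32;
	return cp_ver;
}

/*
 * A node is treated as recoverable when the value in its footer equals
 * the value derived from the current checkpoint.
 */
static int is_recoverable(uint64_t footer_ver, uint64_t cp_ver,
			  uint32_t crc, int crc_recovery)
{
	return footer_cp_ver(cp_ver, crc, crc_recovery) == footer_ver;
}
```

For example, version 0x2a with CRC 0xdeadbeef yields 0xdeadbeef0000002a under this sketch, so a footer written under a checkpoint with the same version but a different CRC compares unequal.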
170 changes: 64 additions & 106 deletions fs/f2fs/recovery.c
@@ -68,15 +68,17 @@ static struct fsync_inode_entry *get_fsync_inode(struct list_head *head,
return NULL;
}

static struct fsync_inode_entry *add_fsync_inode(struct list_head *head,
struct inode *inode)
static struct fsync_inode_entry *add_fsync_inode(struct f2fs_sb_info *sbi,
struct list_head *head, nid_t ino)
{
struct inode *inode;
struct fsync_inode_entry *entry;

entry = kmem_cache_alloc(fsync_entry_slab, GFP_F2FS_ZERO);
if (!entry)
return NULL;
inode = f2fs_iget_retry(sbi->sb, ino);
if (IS_ERR(inode))
return ERR_CAST(inode);

entry = f2fs_kmem_cache_alloc(fsync_entry_slab, GFP_F2FS_ZERO);
entry->inode = inode;
list_add_tail(&entry->list, head);

@@ -96,48 +98,41 @@ static int recover_dentry(struct inode *inode, struct page *ipage,
struct f2fs_inode *raw_inode = F2FS_INODE(ipage);
nid_t pino = le32_to_cpu(raw_inode->i_pino);
struct f2fs_dir_entry *de;
struct qstr name;
struct fscrypt_name fname;
struct page *page;
struct inode *dir, *einode;
struct fsync_inode_entry *entry;
int err = 0;
char *name;

entry = get_fsync_inode(dir_list, pino);
if (!entry) {
dir = f2fs_iget(inode->i_sb, pino);
if (IS_ERR(dir)) {
err = PTR_ERR(dir);
goto out;
}

entry = add_fsync_inode(dir_list, dir);
if (!entry) {
err = -ENOMEM;
iput(dir);
entry = add_fsync_inode(F2FS_I_SB(inode), dir_list, pino);
if (IS_ERR(entry)) {
dir = ERR_CAST(entry);
err = PTR_ERR(entry);
goto out;
}
}

dir = entry->inode;

if (file_enc_name(inode))
return 0;

name.len = le32_to_cpu(raw_inode->i_namelen);
name.name = raw_inode->i_name;
memset(&fname, 0, sizeof(struct fscrypt_name));
fname.disk_name.len = le32_to_cpu(raw_inode->i_namelen);
fname.disk_name.name = raw_inode->i_name;

if (unlikely(name.len > F2FS_NAME_LEN)) {
if (unlikely(fname.disk_name.len > F2FS_NAME_LEN)) {
WARN_ON(1);
err = -ENAMETOOLONG;
goto out;
}
retry:
de = f2fs_find_entry(dir, &name, &page);
de = __f2fs_find_entry(dir, &fname, &page);
if (de && inode->i_ino == le32_to_cpu(de->ino))
goto out_unmap_put;

if (de) {
einode = f2fs_iget(inode->i_sb, le32_to_cpu(de->ino));
einode = f2fs_iget_retry(inode->i_sb, le32_to_cpu(de->ino));
if (IS_ERR(einode)) {
WARN_ON(1);
err = PTR_ERR(einode);
@@ -156,18 +151,24 @@ static int recover_dentry(struct inode *inode, struct page *ipage,
} else if (IS_ERR(page)) {
err = PTR_ERR(page);
} else {
err = __f2fs_add_link(dir, &name, inode,
err = __f2fs_do_add_link(dir, &fname, inode,
inode->i_ino, inode->i_mode);
}
if (err == -ENOMEM)
goto retry;
goto out;

out_unmap_put:
f2fs_dentry_kunmap(dir, page);
f2fs_put_page(page, 0);
out:
if (file_enc_name(inode))
name = "<encrypted>";
else
name = raw_inode->i_name;
f2fs_msg(inode->i_sb, KERN_NOTICE,
"%s: ino = %x, name = %s, dir = %lx, err = %d",
__func__, ino_of_node(ipage), raw_inode->i_name,
__func__, ino_of_node(ipage), name,
IS_ERR(dir) ? 0 : dir->i_ino, err);
return err;
}
@@ -179,13 +180,15 @@ static void recover_inode(struct inode *inode, struct page *page)

inode->i_mode = le16_to_cpu(raw->i_mode);
f2fs_i_size_write(inode, le64_to_cpu(raw->i_size));
inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
inode->i_atime.tv_sec = le64_to_cpu(raw->i_atime);
inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
inode->i_atime.tv_nsec = le32_to_cpu(raw->i_atime_nsec);
inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);

F2FS_I(inode)->i_advise = raw->i_advise;

if (file_enc_name(inode))
name = "<encrypted>";
else
@@ -195,37 +198,9 @@ static void recover_inode(struct inode *inode, struct page *page)
ino_of_node(page), name);
}

static bool is_same_inode(struct inode *inode, struct page *ipage)
{
struct f2fs_inode *ri = F2FS_INODE(ipage);
struct timespec disk;

if (!IS_INODE(ipage))
return true;

disk.tv_sec = le64_to_cpu(ri->i_ctime);
disk.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
if (timespec_compare(&inode->i_ctime, &disk) > 0)
return false;

disk.tv_sec = le64_to_cpu(ri->i_atime);
disk.tv_nsec = le32_to_cpu(ri->i_atime_nsec);
if (timespec_compare(&inode->i_atime, &disk) > 0)
return false;

disk.tv_sec = le64_to_cpu(ri->i_mtime);
disk.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
if (timespec_compare(&inode->i_mtime, &disk) > 0)
return false;

return true;
}

static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
{
unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
struct curseg_info *curseg;
struct inode *inode;
struct page *page = NULL;
block_t blkaddr;
int err = 0;
@@ -242,17 +217,14 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)

page = get_tmp_page(sbi, blkaddr);

if (cp_ver != cpver_of_node(page))
if (!is_recoverable_dnode(page))
break;

if (!is_fsync_dnode(page))
goto next;

entry = get_fsync_inode(head, ino_of_node(page));
if (entry) {
if (!is_same_inode(entry->inode, page))
goto next;
} else {
if (!entry) {
if (IS_INODE(page) && is_dent_dnode(page)) {
err = recover_inode_page(sbi, page);
if (err)
@@ -263,23 +235,15 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
* CP | dnode(F) | inode(DF)
* For this case, we should not give up now.
*/
inode = f2fs_iget(sbi->sb, ino_of_node(page));
if (IS_ERR(inode)) {
err = PTR_ERR(inode);
entry = add_fsync_inode(sbi, head, ino_of_node(page));
if (IS_ERR(entry)) {
err = PTR_ERR(entry);
if (err == -ENOENT) {
err = 0;
goto next;
}
break;
}

/* add this fsync inode to the list */
entry = add_fsync_inode(head, inode);
if (!entry) {
err = -ENOMEM;
iput(inode);
break;
}
}
entry->blkaddr = blkaddr;

@@ -363,7 +327,7 @@ static int check_index_in_prev_nodes(struct f2fs_sb_info *sbi,

if (ino != dn->inode->i_ino) {
/* Deallocate previous index in the node page */
inode = f2fs_iget(sbi->sb, ino);
inode = f2fs_iget_retry(sbi->sb, ino);
if (IS_ERR(inode))
return PTR_ERR(inode);
} else {
@@ -431,10 +395,15 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
end = start + ADDRS_PER_PAGE(page, inode);

set_new_dnode(&dn, inode, NULL, NULL, 0);

retry_dn:
err = get_dnode_of_data(&dn, start, ALLOC_NODE);
if (err)
if (err) {
if (err == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry_dn;
}
goto out;
}

f2fs_wait_on_page_writeback(dn.node_page, NODE, true);

@@ -458,7 +427,8 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
continue;
}

if ((start + 1) << PAGE_SHIFT > i_size_read(inode))
if (!file_keep_isize(inode) &&
(i_size_read(inode) <= (start << PAGE_SHIFT)))
f2fs_i_size_write(inode, (start + 1) << PAGE_SHIFT);

/*
@@ -485,11 +455,16 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
if (err)
goto err;
}

retry_prev:
/* Check the previous node page having this index */
err = check_index_in_prev_nodes(sbi, dest, &dn);
if (err)
if (err) {
if (err == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry_prev;
}
goto err;
}

/* write dummy data page */
f2fs_replace_block(sbi, &dn, src, dest,
@@ -506,15 +481,16 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
f2fs_put_dnode(&dn);
out:
f2fs_msg(sbi->sb, KERN_NOTICE,
"recover_data: ino = %lx, recovered = %d blocks, err = %d",
inode->i_ino, recovered, err);
"recover_data: ino = %lx (i_size: %s) recovered = %d, err = %d",
inode->i_ino,
file_keep_isize(inode) ? "keep" : "recover",
recovered, err);
return err;
}

static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
struct list_head *dir_list)
{
unsigned long long cp_ver = cur_cp_version(F2FS_CKPT(sbi));
struct curseg_info *curseg;
struct page *page = NULL;
int err = 0;
@@ -534,7 +510,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,

page = get_tmp_page(sbi, blkaddr);

if (cp_ver != cpver_of_node(page)) {
if (!is_recoverable_dnode(page)) {
f2fs_put_page(page, 1);
break;
}
@@ -626,38 +602,20 @@ int recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
}

clear_sbi_flag(sbi, SBI_POR_DOING);
if (err) {
bool invalidate = false;

if (test_opt(sbi, LFS)) {
update_meta_page(sbi, NULL, blkaddr);
invalidate = true;
} else if (discard_next_dnode(sbi, blkaddr)) {
invalidate = true;
}

/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META))
sync_meta_pages(sbi, META, LONG_MAX);
if (err)
set_ckpt_flags(sbi, CP_ERROR_FLAG);
mutex_unlock(&sbi->cp_mutex);

/* invalidate temporary meta page */
if (invalidate)
invalidate_mapping_pages(META_MAPPING(sbi),
blkaddr, blkaddr);
/* let's drop all the directory inodes for clean checkpoint */
destroy_fsync_dnodes(&dir_list);

set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
mutex_unlock(&sbi->cp_mutex);
} else if (need_writecp) {
if (!err && need_writecp) {
struct cp_control cpc = {
.reason = CP_RECOVERY,
};
mutex_unlock(&sbi->cp_mutex);
err = write_checkpoint(sbi, &cpc);
} else {
mutex_unlock(&sbi->cp_mutex);
}

destroy_fsync_dnodes(&dir_list);
kmem_cache_destroy(fsync_entry_slab);
return ret ? ret: err;
}
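
Note on the recovery.c hunks above: `do_recover_data()` now loops back to `retry_dn` / `retry_prev` when a step returns `-ENOMEM`, after a short `congestion_wait()`. A minimal userspace sketch of that retry shape follows. The names `retry_on_enomem`, `flaky_op`, and the `backoff` callback are hypothetical illustrations, not kernel APIs, and the sketch does not reproduce the kernel's actual wait primitive.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 or a negative errno. */
typedef int (*op_fn)(void *ctx);

/*
 * Re-run op while it reports -ENOMEM, calling the optional backoff
 * hook between attempts (the diff uses congestion_wait() there).
 * Any other result, success or failure, is returned unchanged.
 */
static int retry_on_enomem(op_fn op, void *ctx, void (*backoff)(void))
{
	int err;

	do {
		err = op(ctx);
		if (err == -ENOMEM && backoff)
			backoff();
	} while (err == -ENOMEM);

	return err;
}

/* Hypothetical test operation: fails with -ENOMEM *ctx times, then succeeds. */
static int flaky_op(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}
```

Like the hunks above, the loop is unbounded: it assumes memory pressure is transient, which is a reasonable stance during recovery but would need a cap in other contexts.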
263 changes: 167 additions & 96 deletions fs/f2fs/segment.c

Large diffs are not rendered by default.

37 changes: 15 additions & 22 deletions fs/f2fs/segment.h
@@ -17,6 +17,8 @@
#define DEF_RECLAIM_PREFREE_SEGMENTS 5 /* 5% over total segments */
#define DEF_MAX_RECLAIM_PREFREE_SEGMENTS 4096 /* 8GB in maximum */

#define F2FS_MIN_SEGMENTS 9 /* SB + 2 (CP + SIT + NAT) + SSA + MAIN */

/* L: Logical segment # in volume, R: Relative segment # in main area */
#define GET_L2R_SEGNO(free_i, segno) (segno - free_i->start_segno)
#define GET_R2L_SEGNO(free_i, segno) (segno + free_i->start_segno)
@@ -101,8 +103,6 @@
(((sector_t)blk_addr) << F2FS_LOG_SECTORS_PER_BLOCK)
#define SECTOR_TO_BLOCK(sectors) \
(sectors >> F2FS_LOG_SECTORS_PER_BLOCK)
#define MAX_BIO_BLOCKS(sbi) \
((int)min((int)max_hw_blocks(sbi), BIO_MAX_PAGES))

/*
* indicate a block allocation direction: RIGHT and LEFT.
@@ -470,26 +470,28 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
{
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);

if (test_opt(sbi, LFS))
return false;

return free_sections(sbi) <= (node_secs + 2 * dent_secs +
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + 1);
}

static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi, int freed)
static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
int freed, int needed)
{
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);

node_secs += get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);

if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
return false;

return (free_sections(sbi) + freed) <= (node_secs + 2 * dent_secs +
reserved_sections(sbi));
return (free_sections(sbi) + freed) <=
(node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + needed);
}

static inline bool excess_prefree_segs(struct f2fs_sb_info *sbi)
@@ -586,8 +588,8 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)

static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
{
f2fs_bug_on(sbi, blk_addr < SEG0_BLKADDR(sbi)
|| blk_addr >= MAX_BLKADDR(sbi));
BUG_ON(blk_addr < SEG0_BLKADDR(sbi)
|| blk_addr >= MAX_BLKADDR(sbi));
}

/*
@@ -693,13 +695,6 @@ static inline bool sec_usage_check(struct f2fs_sb_info *sbi, unsigned int secno)
return false;
}

static inline unsigned int max_hw_blocks(struct f2fs_sb_info *sbi)
{
struct block_device *bdev = sbi->sb->s_bdev;
struct request_queue *q = bdev_get_queue(bdev);
return SECTOR_TO_BLOCK(queue_max_sectors(q));
}

/*
* It is very important to gather dirty pages and write at once, so that we can
* submit a big bio without interfering other data writes.
@@ -717,7 +712,7 @@ static inline int nr_pages_to_skip(struct f2fs_sb_info *sbi, int type)
else if (type == NODE)
return 8 * sbi->blocks_per_seg;
else if (type == META)
return 8 * MAX_BIO_BLOCKS(sbi);
return 8 * BIO_MAX_PAGES;
else
return 0;
}
@@ -734,11 +729,9 @@ static inline long nr_pages_to_write(struct f2fs_sb_info *sbi, int type,
return 0;

nr_to_write = wbc->nr_to_write;

desired = BIO_MAX_PAGES;
if (type == NODE)
desired = 2 * max_hw_blocks(sbi);
else
desired = MAX_BIO_BLOCKS(sbi);
desired <<= 1;

wbc->nr_to_write = desired;
return desired - nr_to_write;
10 changes: 6 additions & 4 deletions fs/f2fs/shrinker.c
@@ -21,14 +21,16 @@ static unsigned int shrinker_run_no;

static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
{
return NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;

return count > 0 ? count : 0;
}

static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
{
if (NM_I(sbi)->fcnt > MAX_FREE_NIDS)
return NM_I(sbi)->fcnt - MAX_FREE_NIDS;
return 0;
long count = NM_I(sbi)->nid_cnt[FREE_NID_LIST] - MAX_FREE_NIDS;

return count > 0 ? count : 0;
}

static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
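
Note on the shrinker.c hunk above: both counters now compute the difference in a signed `long` and clamp it at zero before returning `unsigned long`. A minimal standalone sketch of the difference this makes, assuming the two inputs fit in a `long`. The function names are hypothetical, and the sketch makes no claim about the actual types of the kernel fields.

```c
#include <limits.h>

/* Naive form: an unsigned difference wraps around when dirty > total. */
static unsigned long count_unclamped(unsigned long total, unsigned long dirty)
{
	return total - dirty;
}

/* Clamped form: take the difference as a signed value, floor it at zero. */
static unsigned long count_clamped(unsigned long total, unsigned long dirty)
{
	long count = (long)total - (long)dirty;

	return count > 0 ? (unsigned long)count : 0;
}
```

A shrinker that is told a huge wrapped count would try to reclaim objects that do not exist, so reporting zero is the safer answer when the counters are momentarily inconsistent.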
288 changes: 204 additions & 84 deletions fs/f2fs/super.c

Large diffs are not rendered by default.

45 changes: 27 additions & 18 deletions fs/f2fs/xattr.c
@@ -152,7 +152,7 @@ static int f2fs_xattr_advise_set(struct dentry *dentry, const char *name,
return -EINVAL;

F2FS_I(inode)->i_advise |= *(char *)value;
f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);
return 0;
}

@@ -265,18 +265,20 @@ static struct f2fs_xattr_entry *__find_xattr(void *base_addr, int index,
return entry;
}

static void *read_all_xattrs(struct inode *inode, struct page *ipage)
static int read_all_xattrs(struct inode *inode, struct page *ipage,
void **base_addr)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_xattr_header *header;
size_t size = PAGE_SIZE, inline_size = 0;
void *txattr_addr;
int err;

inline_size = inline_xattr_size(inode);

txattr_addr = kzalloc(inline_size + size, GFP_F2FS_ZERO);
if (!txattr_addr)
return NULL;
return -ENOMEM;

/* read from inline xattr */
if (inline_size) {
@@ -287,8 +289,10 @@ static void *read_all_xattrs(struct inode *inode, struct page *ipage)
inline_addr = inline_xattr_addr(ipage);
} else {
page = get_node_page(sbi, inode->i_ino);
if (IS_ERR(page))
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
}
inline_addr = inline_xattr_addr(page);
}
memcpy(txattr_addr, inline_addr, inline_size);
@@ -302,8 +306,10 @@ static void *read_all_xattrs(struct inode *inode, struct page *ipage)

/* The inode already has an extended attribute block. */
xpage = get_node_page(sbi, F2FS_I(inode)->i_xattr_nid);
if (IS_ERR(xpage))
if (IS_ERR(xpage)) {
err = PTR_ERR(xpage);
goto fail;
}

xattr_addr = page_address(xpage);
memcpy(txattr_addr + inline_size, xattr_addr, PAGE_SIZE);
@@ -317,10 +323,11 @@ static void *read_all_xattrs(struct inode *inode, struct page *ipage)
header->h_magic = cpu_to_le32(F2FS_XATTR_MAGIC);
header->h_refcount = cpu_to_le32(1);
}
return txattr_addr;
*base_addr = txattr_addr;
return 0;
fail:
kzfree(txattr_addr);
return NULL;
return err;
}

static inline int write_all_xattrs(struct inode *inode, __u32 hsize,
@@ -414,9 +421,9 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
if (len > F2FS_NAME_LEN)
return -ERANGE;

base_addr = read_all_xattrs(inode, ipage);
if (!base_addr)
return -ENOMEM;
error = read_all_xattrs(inode, ipage, &base_addr);
if (error)
return error;

entry = __find_xattr(base_addr, index, len, name);
if (IS_XATTR_LAST_ENTRY(entry)) {
@@ -450,9 +457,9 @@ ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
int error = 0;
size_t rest = buffer_size;

base_addr = read_all_xattrs(inode, NULL);
if (!base_addr)
return -ENOMEM;
error = read_all_xattrs(inode, NULL, &base_addr);
if (error)
return error;

list_for_each_xattr(entry, base_addr) {
const struct xattr_handler *handler =
@@ -504,9 +511,9 @@ static int __f2fs_setxattr(struct inode *inode, int index,
if (size > MAX_VALUE_LEN(inode))
return -E2BIG;

base_addr = read_all_xattrs(inode, ipage);
if (!base_addr)
return -ENOMEM;
error = read_all_xattrs(inode, ipage, &base_addr);
if (error)
return error;

/* find entry with wanted name. */
here = __find_xattr(base_addr, index, len, name);
@@ -582,13 +589,15 @@ static int __f2fs_setxattr(struct inode *inode, int index,

if (is_inode_flag_set(inode, FI_ACL_MODE)) {
inode->i_mode = F2FS_I(inode)->i_acl_mode;
inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_time(inode);
clear_inode_flag(inode, FI_ACL_MODE);
}
if (index == F2FS_XATTR_INDEX_ENCRYPTION &&
!strcmp(name, F2FS_XATTR_NAME_ENCRYPTION_CONTEXT))
f2fs_set_encrypted_inode(inode);
f2fs_mark_inode_dirty_sync(inode);
f2fs_mark_inode_dirty_sync(inode, true);
if (!error && S_ISDIR(inode->i_mode))
set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP);
exit:
kzfree(base_addr);
return error;
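
Note on the xattr.c hunks above: `read_all_xattrs()` changes from returning a pointer (with `NULL` meaning failure) to returning an `int` error code and handing the buffer back through an out parameter, so callers no longer report every failure as `-ENOMEM`. A minimal userspace sketch of that calling convention follows. `read_buffer` and its `fail_io` switch are hypothetical stand-ins, not the kernel function.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the new read_all_xattrs() shape:
 * returns 0 on success or a negative errno, and stores the buffer
 * in *base_addr only on success.
 */
static int read_buffer(const char *src, size_t len, int fail_io,
		       void **base_addr)
{
	void *buf = calloc(1, len ? len : 1);

	if (!buf)
		return -ENOMEM;

	if (fail_io) {
		/* Simulated read failure: free the buffer, keep the real cause. */
		free(buf);
		return -EIO;
	}

	memcpy(buf, src, len);
	*base_addr = buf;
	return 0;
}
```

With the old pointer-returning form, the `-EIO` case above would have been indistinguishable from an allocation failure at the call site.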
1 change: 1 addition & 0 deletions include/linux/f2fs_fs.h
@@ -104,6 +104,7 @@ struct f2fs_super_block {
/*
* For checkpoint
*/
#define CP_CRC_RECOVERY_FLAG 0x00000040
#define CP_FASTBOOT_FLAG 0x00000020
#define CP_FSCK_FLAG 0x00000010
#define CP_ERROR_FLAG 0x00000008
4 changes: 2 additions & 2 deletions include/linux/fscrypto.h
@@ -273,7 +273,7 @@ extern void fscrypt_restore_control_page(struct page *);
extern int fscrypt_zeroout_range(struct inode *, pgoff_t, sector_t,
unsigned int);
/* policy.c */
extern int fscrypt_process_policy(struct inode *,
extern int fscrypt_process_policy(struct file *,
const struct fscrypt_policy *);
extern int fscrypt_get_policy(struct inode *, struct fscrypt_policy *);
extern int fscrypt_has_permitted_context(struct inode *, struct inode *);
@@ -344,7 +344,7 @@ static inline int fscrypt_notsupp_zeroout_range(struct inode *i, pgoff_t p,
}

/* policy.c */
static inline int fscrypt_notsupp_process_policy(struct inode *i,
static inline int fscrypt_notsupp_process_policy(struct file *f,
const struct fscrypt_policy *p)
{
return -EOPNOTSUPP;
21 changes: 21 additions & 0 deletions include/trace/events/f2fs.h
@@ -1072,6 +1072,27 @@ TRACE_EVENT(f2fs_issue_discard,
(unsigned long long)__entry->blklen)
);

TRACE_EVENT(f2fs_issue_reset_zone,

TP_PROTO(struct super_block *sb, block_t blkstart),

TP_ARGS(sb, blkstart),

TP_STRUCT__entry(
__field(dev_t, dev)
__field(block_t, blkstart)
),

TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->blkstart = blkstart;
),

TP_printk("dev = (%d,%d), reset zone at block = 0x%llx",
show_dev(__entry),
(unsigned long long)__entry->blkstart)
);

TRACE_EVENT(f2fs_issue_flush,

TP_PROTO(struct super_block *sb, unsigned int nobarrier,

0 comments on commit d0b62d5
