bcachefs: document btree iter flags #287

dlrobertson · 2021-07-13T01:21:53Z

Add documentation comments to the btree iter flag definitions.

Signed-off-by: Dan Robertson [email protected]

…dvance The way btree iterators work internally has been changing, particularly with the iter->real_pos changes, and bch2_btree_iter_next() is no longer hyper optimized - it's just advance followed by peek, so it's more efficient to just call advance where we're not using the return value of bch2_btree_iter_next(). Signed-off-by: Kent Overstreet <[email protected]>
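A minimal sketch of the "advance followed by peek" shape described above, using a toy iterator over a sorted array. The `toy_iter` type and its helpers are hypothetical and only illustrate the idea; they are not the bcachefs btree iterator API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy iterator over a sorted array of keys (hypothetical, not struct btree_iter). */
struct toy_iter {
	const unsigned	*keys;
	size_t		nr;
	size_t		idx;
};

/* Step past the current position; returns false if already at the end. */
static bool toy_iter_advance(struct toy_iter *iter)
{
	if (iter->idx >= iter->nr)
		return false;
	iter->idx++;
	return true;
}

/* Return the key at the current position, or NULL at the end. */
static const unsigned *toy_iter_peek(const struct toy_iter *iter)
{
	return iter->idx < iter->nr ? &iter->keys[iter->idx] : NULL;
}

/* next() is nothing more than advance followed by peek. */
static const unsigned *toy_iter_next(struct toy_iter *iter)
{
	return toy_iter_advance(iter) ? toy_iter_peek(iter) : NULL;
}
```

A caller that ignores the return value of `toy_iter_next()` pays for a peek it never uses, so calling `toy_iter_advance()` directly is the cheaper choice there.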

btree node iterators need to obey the regular btree node invariants w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have less that it needs to check. Signed-off-by: Kent Overstreet <[email protected]>

This means bch2_btree_iter_traverse_one() can be made more efficient. Signed-off-by: Kent Overstreet <[email protected]>

Since we're no longer doing next() immediately followed by peek(), this optimization isn't doing anything anymore. Signed-off-by: Kent Overstreet <[email protected]>

This just gives some internal helpers some better names. Signed-off-by: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>

Ideally we'll be getting rid of peek_with_updates(), but the callers will need to be checked. Signed-off-by: Kent Overstreet <[email protected]>

peek() has to update iter->real_pos - there's no need for bch2_btree_iter_set_pos() to update it as well. Signed-off-by: Kent Overstreet <[email protected]>

More prep work for snapshots. Signed-off-by: Kent Overstreet <[email protected]>

It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <[email protected]>

It had some silly redundancies. Signed-off-by: Kent Overstreet <[email protected]>

External (to the btree iterator code) users of bch2_btree_iter_traverse expect that on success the iterator will be pointed at iter->pos and have that position locked - but since we split iter->pos and iter->real_pos, that means it has to update iter->real_pos if necessary. Internal users don't expect it to modify iter->real_pos, so we need two separate functions. Signed-off-by: Kent Overstreet <[email protected]>

This adds a mode to six locks where readers use percpu counters - avoiding writing to shared cachelines. The algorithm is the same as the existing percpu-rwsemaphore's slowpath algorithm: taking a read lock still requires a memory barrier to check if we raced with another thread taking a write lock, but this means that taking a write lock doesn't incur the cost of an RCU barrier. Signed-off-by: Kent Overstreet <[email protected]>
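A rough sketch of the reader/writer handshake described above, written with C11 atomics. It assumes a fixed number of counter slots and only models trylock; real six locks also have an intent state and block instead of failing. All `toy_` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* assumption: a small fixed number of per-CPU slots */

struct toy_pcpu_lock {
	atomic_int	readers[TOY_NR_CPUS];	/* real code would keep these on separate cachelines */
	atomic_bool	write_locked;
};

/* Reader: bump our own counter, then check whether we raced with a writer. */
static bool toy_read_trylock(struct toy_pcpu_lock *l, unsigned cpu)
{
	atomic_fetch_add(&l->readers[cpu], 1);	/* seq_cst, so it also acts as the barrier */
	if (!atomic_load(&l->write_locked))
		return true;
	atomic_fetch_sub(&l->readers[cpu], 1);	/* a writer got there first: back off */
	return false;
}

/* A reader may unlock on a different CPU, so only the sum of counters is meaningful. */
static void toy_read_unlock(struct toy_pcpu_lock *l, unsigned cpu)
{
	atomic_fetch_sub(&l->readers[cpu], 1);
}

/* Writer: publish the write flag, then sum every counter to look for readers. */
static bool toy_write_trylock(struct toy_pcpu_lock *l)
{
	bool expected = false;
	int sum = 0;

	if (!atomic_compare_exchange_strong(&l->write_locked, &expected, true))
		return false;
	for (unsigned i = 0; i < TOY_NR_CPUS; i++)
		sum += atomic_load(&l->readers[i]);
	if (sum) {
		atomic_store(&l->write_locked, false);	/* readers present: give up */
		return false;
	}
	return true;
}

static void toy_write_unlock(struct toy_pcpu_lock *l)
{
	atomic_store(&l->write_locked, false);
}
```

The read path touches only its own counter plus a read of the shared flag, which is what keeps readers from writing to a shared cacheline; the cost moves to the writer, which has to visit every counter.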

Signed-off-by: Kent Overstreet <[email protected]>

The default was 1/256th of the device and capped at 512MB, which is fairly tiny these days. Signed-off-by: Kent Overstreet <[email protected]>
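To make the old default concrete, here is the arithmetic as a small helper. It assumes "512MB" means 512 MiB; the function name is hypothetical, and the new default is not stated in the commit message, so it is not modeled here.

```c
#include <stdint.h>

#define TOY_MiB (UINT64_C(1) << 20)

/* Old default per the commit message: 1/256th of the device, capped at 512MB. */
static uint64_t toy_old_default_journal_bytes(uint64_t dev_bytes)
{
	uint64_t size = dev_bytes / 256;
	uint64_t cap  = 512 * TOY_MiB;

	return size < cap ? size : cap;
}
```

Under that rule a 64 GiB device gets a 256 MiB journal, and any device of 128 GiB or more is pinned at the 512 MiB cap.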

Bkey noops were introduced to deal with trimming inline data extents in place in the btree: if the u64s field of a bkey was 0, that u64 was a noop and we'd start looking for the next bkey immediately after it. But extent handling has been lifted above the btree - we no longer modify existing extents in place in the btree, and the compatibility code for old style extent btree nodes is gone, so we can completely drop this code. Signed-off-by: Kent Overstreet <[email protected]>
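For readers who never saw the noop handling, a sketch of what walking keys looked like with it. The layout is an assumption made for illustration (the low 8 bits of a key's first u64 hold its length in u64s), and the `toy_` helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy layout (assumption): low 8 bits of a key's first u64 are its length in u64s. */
static unsigned toy_key_u64s(uint64_t first_word)
{
	return (unsigned)(first_word & 0xff);
}

/* Count the real keys in a buffer, skipping noop words the way the old code did. */
static size_t toy_count_keys(const uint64_t *buf, size_t nr_u64s)
{
	size_t i = 0, nr_keys = 0;

	while (i < nr_u64s) {
		unsigned u64s = toy_key_u64s(buf[i]);

		if (!u64s) {
			i++;		/* noop: the next key starts right after this u64 */
			continue;
		}
		if (u64s > nr_u64s - i)
			break;		/* truncated key: stop rather than overrun */
		nr_keys++;
		i += u64s;
	}
	return nr_keys;
}
```

With noops gone, the `!u64s` branch disappears and every step of the walk is simply `i += u64s`.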

On btree node split, we weren't ensuring the min_key of the new larger node packs in the new format for this node. This triggers some painful slowpaths in the bset.c aux search tree code - this patch fixes that by calculating a new format for the new node with the new min_key. Signed-off-by: Kent Overstreet <[email protected]>

We weren't packing the min/max keys, which was a major oversight and completely disabled generating bkey_floats for adjacent nodes. Signed-off-by: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>

…d to When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to unlock after a successful commit, but it was calling bch2_trans_cond_resched() - oops. Signed-off-by: Kent Overstreet <[email protected]>

Since we now make sure to always generate packed bkey formats that can pack the min_key of a btree node, this path should actually never happen. Signed-off-by: Kent Overstreet <[email protected]>

The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <[email protected]>

Signed-off-by: Kent Overstreet <[email protected]>

With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <[email protected]>
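A small sketch of the two kinds of comparison described above. The struct layout and the `toy_` names are assumptions for illustration; only bpos_cmp and the snapshot field are named in the commit message.

```c
#include <stdint.h>

/* Toy position; the inode/offset field names are an assumption. */
struct toy_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

static int toy_cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Ignores the snapshot field: what upper level filesystem code generally wants. */
static int toy_pos_cmp_no_snapshot(struct toy_bpos l, struct toy_bpos r)
{
	int c = toy_cmp_u64(l.inode, r.inode);

	return c ? c : toy_cmp_u64(l.offset, r.offset);
}

/* Includes the snapshot field, like bpos_cmp: what core btree code wants. */
static int toy_bpos_cmp(struct toy_bpos l, struct toy_bpos r)
{
	int c = toy_pos_cmp_no_snapshot(l, r);

	return c ? c : toy_cmp_u64(l.snapshot, r.snapshot);
}
```

Two keys at the same inode and offset but in different snapshots compare as equal under the first function and as distinct under the second.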

This patch starts treating the bpos.snapshot field like part of the key in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key.

We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and always U32_MAX for btrees that will use snapshots (until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <[email protected]>
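A sketch of how a successor function might treat the snapshot field as the lowest-order part of the key, depending on whether all snapshots are being iterated. The struct layout and the `toy_` names are assumptions, and overflow of the inode field is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy position; the inode/offset field names are an assumption. */
struct toy_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

static struct toy_bpos toy_pos_successor(struct toy_bpos p, bool all_snapshots)
{
	/* Iterating all snapshots: the snapshot field is the lowest "digit". */
	if (all_snapshots && ++p.snapshot != 0)
		return p;

	/*
	 * Otherwise, or when the snapshot field wrapped to 0, carry into the
	 * higher bits of the key: offset first, then inode.
	 */
	if (++p.offset == 0)
		++p.inode;
	return p;
}
```

When `all_snapshots` is false the snapshot field is left untouched, which matches the rule that iter->pos.snapshot stays equal to iter->snapshot in that mode.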

This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enumerate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <[email protected]>
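A toy model of the single-backpointer rule described above. The field names bi_dir and bi_dir_offset come from the commit message; the struct, the flag value, the link counter and the helper functions are hypothetical and do not reflect the real inode format.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INODE_BACKPTR_UNTRUSTED (1u << 0)	/* stand-in for BCH_INODE_BACKPTR_UNTRUSTED */

struct toy_inode {
	uint64_t bi_dir;	/* directory holding the dirent */
	uint64_t bi_dir_offset;	/* position of the dirent in that directory */
	unsigned nlink;
	unsigned flags;
};

/* The first link records the backpointer; any further link marks it untrusted. */
static void toy_inode_link(struct toy_inode *inode, uint64_t dir, uint64_t dir_offset)
{
	if (inode->nlink++ == 0) {
		inode->bi_dir = dir;
		inode->bi_dir_offset = dir_offset;
	} else {
		inode->flags |= TOY_INODE_BACKPTR_UNTRUSTED;
	}
}

/* Dropping a link never clears the flag: it records "ever had multiple links". */
static void toy_inode_unlink(struct toy_inode *inode)
{
	if (inode->nlink)
		inode->nlink--;
}

static bool toy_backptr_trusted(const struct toy_inode *inode)
{
	return inode->nlink && !(inode->flags & TOY_INODE_BACKPTR_UNTRUSTED);
}
```

Directories never take the second branch of `toy_inode_link()`, which is why a single backpointer is enough for walking from a subvolume root back up to a path.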

For snapshots, when we allocate a new inode we want to allocate an inode number that isn't in use in any other subvolume. We won't be able to use ITER_SLOTS for this, inode allocation needs to change to use BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <[email protected]>

Since move.c isn't aware of what subvolume we're in, we can't use the standard inode lookup code - fortunately, we're just using it for reading IO options. Signed-off-by: Kent Overstreet <[email protected]>

This adds a new watermark for the journal reclaim when flushing btree key cache entries - it should try and stay ahead of where foreground threads doing transaction commits will enter direct journal reclaim. Signed-off-by: Kent Overstreet <[email protected]>

This is specifically to speed up bch2_inode_rm(), so that we're not traversing iterators we're done with. Signed-off-by: Kent Overstreet <[email protected]>

koverstreet added 30 commits July 6, 2021 13:03

bcachefs: Iterators are now always consistent with iter->real_pos

62f970f

This means bch2_btree_iter_traverse_one() can be made more efficient. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Kill btree_iter_peek_uptodate()

590d39a

Since we're no longer doing next() immediately followed by peek(), this optimization isn't doing anything anymore. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Internal btree iterator renaming

6015114

This just gives some internal helpers some better names. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Improve iter->real_pos handling

ed60a42

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()

ac25f77

Ideally we'll be getting rid of peek_with_updates(), but the callers will need to be checked. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Update iter->real_pos lazily

2ac6561

peek() has to update iter->real_pos - there's no need for bch2_btree_iter_set_pos() to update it as well. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Include snapshot field in bch2_bpos_to_text

7954483

More prep work for snapshots. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Add an .invalid method for bch2_btree_ptr_v2

e093c27

It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Improve inode deletion code

43bdc71

It had some silly redundancies. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Use pcpu mode of six locks for interior nodes

1f8a56d

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Increase default journal size

8e392f1

The default was 1/256th of the device and capped at 512MB, which is fairly tiny these days. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix building of aux search trees

e0ace9f

We weren't packing the min/max keys, which was a major oversight and completely disabled generating bkey_floats for adjacent nodes. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix packed bkey format calculation for new btree roots

f705ee5

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Simplify btree_node_iter_init_pack_failed()

de257da

Since we now make sure to always generate packed bkey formats that can pack the min_key of a btree node, this path should actually never happen. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Add a mechanism for running callbacks at trans commit time

c49a165

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Don't use bch2_inode_find_by_inum() in move.c

bc50aa9

Since move.c isn't aware of what subvolume we're in, we can't use the standard inode lookup code - fortunately, we're just using it for reading IO options. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Free iterator in bch2_btree_delete_range_trans()

7aa784e

This is specifically to speed up bch2_inode_rm(), so that we're not traversing iterators we're done with. Signed-off-by: Kent Overstreet <[email protected]>

koverstreet force-pushed the master branch 4 times, most recently from b58e70f to 5cf488d Compare September 10, 2024 15:01

koverstreet force-pushed the master branch from b72bcc9 to 87a3e08 Compare September 21, 2024 19:33

koverstreet force-pushed the master branch 8 times, most recently from dbf1b55 to b49ac4a Compare October 9, 2024 20:58

koverstreet force-pushed the master branch 7 times, most recently from 8044a9c to 1807267 Compare October 14, 2024 21:46

koverstreet force-pushed the master branch 8 times, most recently from 2c3ed03 to a5f81a0 Compare October 31, 2024 09:40

koverstreet force-pushed the master branch 2 times, most recently from e4b827e to b3a7824 Compare November 4, 2024 07:06
