Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Drop Table #897

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Adds Drop Table #897

wants to merge 17 commits into from

Conversation

redixhumayun
Copy link
Contributor

@redixhumayun redixhumayun commented Feb 5, 2025

This PR adds support for DROP TABLE and addresses issue #894

It depends on #785 being merged in because it requires the implementation of free_page.
EDIT: The PR above has been merged.

It adds the following:

  • an implementation for the DropTable AST instruction via a method called translate_drop_table
  • a couple of new instructions - Destroy and DropTable. The former is to modify physical b-tree pages and the latter is to modify in-memory structures like the schema hash table.
  • btree_destroy on BTreeCursor to walk the tree of pages for this table and place it in free list.
  • state machine traversal for both btree_destroy and clear_overflow_pages to ensure performant, correct code.
  • unit & tcl tests
  • modifies the Null instruction to follow SQLite semantics and accept a second register. It will set all registers in this range to null. This is required for DROP TABLE.

Bytecode Compatibility

The screenshots below have a comparison of the bytecodes generated via SQLite & Limbo.

Limbo has the same instruction set except for the subroutines which involve opening an ephemeral table, copying over the triggers from the sqlite_schema table and then re-inserting them back into the sqlite_schema table.

This is because OpenEphemeral is still a WIP and is being tracked at #768

Screenshot 2025-02-09 at 7 05 03 PM
Screenshot 2025-02-09 at 7 05 35 PM

@redixhumayun redixhumayun marked this pull request as draft February 5, 2025 15:11
@diegoreis42
Copy link
Contributor

diegoreis42 commented Feb 6, 2025

Very nice @redixhumayun! I don't understand things below the btree yet but I can give my 2 cents about other things. Idk if you ran EXPLAIN to take a look in the bytecodes, in my machine it produces:

CREATE TABLE a (a);
EXPLAIN DROP TABLE a (a);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     37    0                    0   Start at 37
1     Null           0     1     0                    0   r[1]=NULL
2     OpenWrite      0     1     0     5              0   root=1 iDb=0; sqlite_master
3     Rewind         0     11    0                    0
4       Column         0     2     2                    0   r[2]= cursor 0 column 2
5       Ne             3     10    2     BINARY-8       82  if r[2]!=r[3] goto 10
6       Column         0     0     2                    0   r[2]= cursor 0 column 0
7       Eq             4     10    2     BINARY-8       82  if r[2]==r[4] goto 10
8       Rowid          0     5     0                    0   r[5]=sqlite_master.rowid
9       Delete         0     0     0                    2
10    Next           0     4     0                    1
11    Destroy        2     2     0                    0
12    Null           0     6     7                    0   r[6..7]=NULL
13    OpenEphemeral  2     0     6                    0   nColumn=0
14    IfNot          2     22    1                    0
15    OpenRead       1     1     0     4              0   root=1 iDb=0; sqlite_master
16    Rewind         1     22    0                    0
17      Column         1     3     13                   0   r[13]= cursor 1 column 3
18      Ne             2     21    13    BINARY-8       84  if r[13]!=r[2] goto 21
19      Rowid          1     7     0                    0   r[7]= rowid of 1
20      Insert         2     6     7                    0   intkey=r[7] data=r[6]
21    Next           1     17    0                    1
22    OpenWrite      1     1     0     5              0   root=1 iDb=0; sqlite_master
23    Rewind         2     34    0                    0
24      Rowid          2     7     0                    0   r[7]= rowid of 2
25      NotExists      1     33    7                    0   intkey=r[7]
26      Column         1     0     8                    0   r[8]= cursor 1 column 0
27      Column         1     1     9                    0   r[9]= cursor 1 column 1
28      Column         1     2     10                   0   r[10]= cursor 1 column 2
29      Integer        2     11    0                    0   r[11]=2
30      Column         1     4     12                   0   r[12]= cursor 1 column 4
31      MakeRecord     8     5     15    BBBDB          0   r[15]=mkrec(r[8..12])
32      Insert         1     15    7                    0   intkey=r[7] data=r[15]
33    Next           2     24    0                    0
34    DropTable      0     0     0     a              0
35    SetCookie      0     1     2                    0
36    Halt           0     0     0                    0
37    Transaction    0     1     1     0              1   usesStmtJournal=1
38    String8        0     3     0     a              0   r[3]='a'
39    String8        0     4     0     trigger        0   r[4]='trigger'
40    Goto           0     1     0                    0

There are some bytecodes that afaik we don't have implemented yet, like SetCookie, Destroy and DropTable, for OpenEphemeral there is this an open PR #768, but honestly idk how it is used in this case.

@redixhumayun
Copy link
Contributor Author

@diegoreis42 Hey, thanks for taking a look. Yeah, I took a look at the SQLite bytecode before I started working on this. Although, after your comment I went back fixed a few things.

Yeah, I'm skipping the instructions that don't exist like OpenEphemeral, SetCookie, Destroy etc. However, this should still take care of removing the metadata and freeing up the b-tree pages correctly.

Screenshot 2025-02-06 at 10 57 24 PM

@redixhumayun
Copy link
Contributor Author

@diegoreis42 @PThorpe92 @jussisaurio Btw how closely are we shooting to match the bytecodes across Limbo & SQLite? Because even for CREATE TABLE, I can see that the byte codes differ quite a bit.

@PThorpe92
Copy link
Contributor

Hey @redixhumayun, I haven't taken a look at this quite yet, but to answer your question, with the exception of the obvious places we will differ like OpenWriteAsync InsertAsync / ..Await, we are aiming for compatibility with sqlite as much as possible. The reason for any statements currently not matching sqlite's output is just due to the amount of opcodes still not implemented and not for any intended difference in design.

I'll have a look at this in the morning 👍

@redixhumayun redixhumayun marked this pull request as ready for review February 9, 2025 13:44
@alpaylan
Copy link
Contributor

alpaylan commented Feb 9, 2025

In case it uncovers any bugs;
#949

}
}
} else {
// if the node is a leaf node, clear overflow pages
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not for now but we also need to check overflow pages on interior index pages because they have variable size payload.

self.payload_overflow_threshold_min(contents.page_type()),
self.usable_space(),
)?;
return_if_io!(self.clear_overflow_pages(&cell));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this returns Locked then we will try to run the traversal again for every overflow page we try to load which is quite inefficient and I don't know if it's safe. It looks like we will need a state machine to keep track of all pages that are being deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pereman2 Regarding performance concerns, all pages should be in the page cache if the routine resumes, right?. Yes, traversal would be repeated.

I don't understand why it wouldn't be safe though. Could you explain?

Should I go ahead and add a state machine implementation to this routine? Guessing I need to add a new variant to CursorState and use that to track?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pereman2 Regarding performance concerns, all pages should be in the page cache if the routine resumes, right?. Yes, traversal would be repeated.

There are no guarantees of a page being in cache unless you pin them. For example, there are no set_dirty or pager.add_dirty calls which marks them as pinned. But even though if we were to pin all of these pages, imagine the case where you drop a table which has 50GB worth of pages, now this means that would have to hold ALL pages in memory, but this is a problem that happens regardless of this destroy stuff anyways. Good thing about a state machine is that you can resume where you started, in this case maybe the PageStack could be enough to ensure we don't traverse again from root to child page every time.

I don't understand why it wouldn't be safe though. Could you explain?

For example you call clear_overflow_pages, which clears the first overflow page, and then it will try to load the second overflow page, but this page could return I/O because it's not in cache. This means that first page is now "freed" and somehow we need to traverse again "first page -> second page" but first page is not there now? This might be also a problem with current clear overflow pages because it should behave like a state machine with DFS.

Copy link
Contributor Author

@redixhumayun redixhumayun Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pereman2 I've addressed your comments in f2e11a1 and 4eaeb18.
Can you take a look?

It adds a state machine to keep track of Destroy operations and also adds another state machine to keep track of ClearOverflowPage operations.
The reason for two separate state machines is because clear_overflow_pages has two separate call sites (one from btree_destroy and one from delete) and I don't want to complicate the code by linking the state machine used by btree_destroy to the state machine used in clear_overflow_pages

penberg added a commit that referenced this pull request Feb 11, 2025
I wanted to assist the current development in an up-to-date fashion,
this PR adds drop table(which is being implemented currently in
#897) testing support to the
generator.
Unfortunately, we don't have feature flags in the simulator yet, so the
users should manually fix the generation probability in
`simulator/generation/plan.rs#L644` and
`simulator/generation/property.rs#L629`.

Closes #949
@penberg penberg added this to the 0.1.0 milestone Feb 14, 2025
@redixhumayun redixhumayun force-pushed the main branch 2 times, most recently from c892d70 to f2e11a1 Compare February 16, 2025 08:09
redixhumayun added a commit to redixhumayun/limbo that referenced this pull request Feb 16, 2025
this commit adds the final touches to the state machine for btree_destroy and also introduces a state machine for clear_overflow_pages
this addresses the issue of IO interrupting an operation in progress and solves the following problems:
* performance issues with repeating an operation
* missing page issues with clear overflow pages
redixhumayun added a commit to redixhumayun/limbo that referenced this pull request Feb 18, 2025
this commit adds the final touches to the state machine for btree_destroy and also introduces a state machine for clear_overflow_pages
this addresses the issue of IO interrupting an operation in progress and solves the following problems:
* performance issues with repeating an operation
* missing page issues with clear overflow pages
…LE to be implemented

this is the initial commit is for the implementation of DROP TABLE. It adds support for the DROP TABLE instruction and adds a DropBTree instruction. It also implements the btree_drop method in btree.rs which makes use of free_page method which will be implemented via PR tursodatabase#785
the command for drop table translation has been updated so that it more closely matches the semantics of SQLite's drop table command.

there are a few more things missing like ephemeral tables, destroy etc.
added required functionality for destroy. minor cleanup of print statements
correctly emitting constant instructions and also calling Destroy instead of DropTable instruction for removing btree pages
modified the Null instruction to more closely match SQLite semantics. Allows passing in a second register and all registers from r1..r2 area set to null
the instruction DropTable has been modified to correctly match SQLite semantics
added helper methods to Schema to remove table and indices from in-memory structures
completed the implementation for DropTable using that
modified the btree_destroy subroutine to do an iterative DFS and use the stack cell counters to keep track of whether children have been removed. adds a unit test.
added missing instructions to drop the indices for a table physically from btree pages in a loop
this commit adds the final touches to the state machine for btree_destroy and also introduces a state machine for clear_overflow_pages
this addresses the issue of IO interrupting an operation in progress and solves the following problems:
* performance issues with repeating an operation
* missing page issues with clear overflow pages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants