Commit ebf78e1

Todo list moved to github issues and projects

chotchki committed Aug 21, 2021
1 parent 0edd6b6

Showing 1 changed file with 0 additions and 138 deletions: README.md
@@ -31,144 +31,6 @@ Benchmark to aid in profiling
* You can create tables, insert data and query single tables.
* Data is persisted to disk; it is not crash safe, and the on-disk format is NOT stable.

## Current TODO List - Subject to constant change!

**TODO**
Implement Free Space Maps so that mutating data doesn't require a non-stop linear scan/parse.

Did more thinking: I should implement Postgres's streams concept so that I don't need to do lookups to find associated metadata on an object.
I thought I was going to get to use uuid + page offset. I think it's now going to be uuid + page offset + type.

struct PageId + enum PageType should do it (done).
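
A minimal sketch of what that could look like (assuming the `uuid` crate; the field and variant names are illustrative, not necessarily what was committed):

```rust
use uuid::Uuid;

/// Which stream of pages the id points into (variants illustrative).
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub enum PageType {
    Data,
    FreeSpaceMap,
}

/// uuid + page offset + type, per the note above.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct PageId {
    pub resource_key: Uuid,
    pub page_type: PageType,
    pub page_offset: usize,
}
```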

So I have a fully ready free space map, but I can't avoid the locking issue anymore, which happens to also be the next item on the todo list.

**TODO**
Implement page level locks that are ordered to avoid deadlocking.

Acceptance Criteria:
* Should be able to update a row, whether the update stays inside a page or not, without losing commits.
* This is independent of transaction control, so I think this sits below/in the row manager (see the sketch below).
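
The core trick for deadlock avoidance is acquiring locks in one canonical order. A sketch of that idea with tokio locks, using a plain `u64` page key for brevity (all names here are hypothetical): if every task sorts the pages it needs before locking any of them, no two tasks can ever hold each other's next lock.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;
use tokio::sync::{OwnedRwLockWriteGuard, RwLock};

/// Take write locks on a set of pages in sorted key order so that any
/// two tasks contending for overlapping pages always lock them in the
/// same sequence and therefore cannot deadlock on each other.
async fn lock_pages_in_order(
    locks: &BTreeMap<u64, Arc<RwLock<Vec<u8>>>>,
    mut wanted: Vec<u64>,
) -> Vec<OwnedRwLockWriteGuard<Vec<u8>>> {
    wanted.sort_unstable();
    wanted.dedup();
    let mut guards = Vec::with_capacity(wanted.len());
    for key in wanted {
        if let Some(lock) = locks.get(&key) {
            guards.push(lock.clone().write_owned().await);
        }
    }
    guards
}
```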

**TODO**
Write a getting started section on the feophant.com website.

**TODO**

Add support for defining a primary key on a table. This implies the following functionality:
* Index support through the stack down to the page level.
* The concept of unique indexes.
* Transactional support for indexes.
* Failure of a statement on constraint violation. Unsure if I'll end up with a general constraint system from this.

Based on my reading, this really means implementing B-tree indexes. They don't seem to be that bad to understand/implement.

First and most important question: how should the index layers work?
* Are they transactional? (I don't think so, at least until I implement a visibility map.)
* How should the low-level layer function?
* Should I have an index config struct I pass around, or just a table + columns + unique-or-not + type?

Index Config it is.
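
Roughly what I have in mind for that struct (fields are illustrative, not the committed definition):

```rust
use uuid::Uuid;

/// One config struct to pass around instead of loose parameters.
#[derive(Clone, Debug)]
pub struct IndexConfig {
    pub id: Uuid,
    pub name: String,
    /// Offsets of the indexed columns in the table definition.
    pub columns: Vec<usize>,
    pub unique: bool,
}
```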

* Index Manager -> for a given table
* IO Manager -> handle page load / store / update

Implemented the formats, but I think I need to add locking to the I/O manager.
At a minimum I need to support a get-for-update, an update, and a lock release.
I'm not sure I understand how this should work :(. I think I need to commit to another layer.
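
A hypothetical sketch of that minimal surface (get for update, update, release), with a guard token tying the three calls together; the names and shapes here are assumptions, not the actual code:

```rust
use bytes::Bytes;

/// Ties a "get for update" to the later update/release so a caller
/// cannot write back a page it never locked. Purely illustrative.
pub struct PageGuard {
    page_key: u64, // stand-in for a real PageId
}

pub trait IoManager {
    type Error;

    /// Load a page and hold its write lock until update/release.
    fn get_page_for_update(&self, page_key: u64) -> Result<(Bytes, PageGuard), Self::Error>;

    /// Write the new contents back and drop the lock.
    fn update_page(&self, guard: PageGuard, buffer: Bytes) -> Result<(), Self::Error>;

    /// Give the lock back without changing the page.
    fn release(&self, guard: PageGuard);
}
```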

Back to indexes for now. I need to make a decision on how to handle them hitting the file system.
Postgres uses a series of OIDs to map onto disk.

I've been using uuids; I think I'm going to continue that. That would also handle the Postgres fork approach.
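
The mapping this enables is roughly parallel to Postgres's `<oid>`, `<oid>_fsm`, `<oid>_vm` fork files; a sketch with an illustrative naming scheme:

```rust
use std::path::{Path, PathBuf};
use uuid::Uuid;

/// One file per (uuid, fork/page-type) pair; the suffix plays the role
/// of Postgres's fork names. Naming is illustrative only.
fn file_for(data_dir: &Path, resource_key: Uuid, fork: &str) -> PathBuf {
    data_dir.join(format!("{}.{}", resource_key, fork))
}
```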

Next up implementing the index manager to add entries to the index.

I'm having a hard time figuring this out; I might work on the operations on the tree before I keep messing with the serialization protocols. I'm just worried they are directly linked.

Got further into the index manager. Unfortunately I need a lock manager to let it even pass the smell test. Time to go on a wild goose chase again! (This project is great for someone with ADHD to have fun on!)

The lock manager design/code is done, but I'm not happy with using an rwlock to protect a tag. I really want to have the lock protect the content, but that needs a way for me to support writeback. I think I need to build out two more things: a WAL mechanism and a buffer manager.
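
For reference, the "rwlock around a tag" shape I'm unhappy with looks roughly like this (a sketch, with a `u64` standing in for the real page id): the lock is keyed by page but guards only a unit value, so nothing forces callers to hold it while actually touching the page bytes.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, RwLock};

/// Lock-per-page table. The RwLock protects (), i.e. a tag, not the
/// page contents themselves; honoring it is up to the caller.
#[derive(Default)]
pub struct LockManager {
    locks: Mutex<HashMap<u64, Arc<RwLock<()>>>>,
}

impl LockManager {
    pub async fn lock_for(&self, page_key: u64) -> Arc<RwLock<()>> {
        let mut table = self.locks.lock().await;
        table.entry(page_key).or_default().clone()
    }
}
```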

I guess I need to commit to doing this for reals. However, I am worried about reaching a point of being only partially working for a while, like when I did the type fixing. We'll see how this goes.

For now, the index implementation is on hold until I get an integrated I/O subsystem and a stubbed-out WAL.

**TODO**


Implement where clauses; this will likely require starting to trace columns from analyzing through to the later stages.


**TODO**

Implement support for running a fuzzer against the code base to ensure we keep the code quality high.

**TODO**

Implement delete for tuples.

**TODO**
Implement the beginning parts of a WAL so that I can get to crash safety.

**TODO**
Defer parsing rows off disk until they are actually needed. I feel like I parse too early; however, any work on this should wait until I can really profile it.
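
The shape of that change would be something like this lazy wrapper (illustrative; the real tuple format and column types live elsewhere): keep the raw bytes and only parse on first access.

```rust
use bytes::Bytes;
use std::cell::OnceCell;

/// Holds the raw on-disk bytes and parses them into column values the
/// first time someone asks. The column type is a stand-in.
pub struct LazyRow {
    raw: Bytes,
    parsed: OnceCell<Vec<String>>,
}

impl LazyRow {
    pub fn new(raw: Bytes) -> Self {
        Self { raw, parsed: OnceCell::new() }
    }

    pub fn columns(&self) -> &[String] {
        self.parsed.get_or_init(|| parse_row(&self.raw))
    }
}

/// Placeholder parse; the real row format lives in the tuple layer.
fn parse_row(raw: &Bytes) -> Vec<String> {
    vec![format!("{} raw bytes", raw.len())]
}
```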

**TODO**

pgbench setup can run successfully

**TODO**
Implement support for parameterized queries.

**TODO**

Ensure data about table structures is thread safe in the face of excessive Arc usage.

See where I can pass read-only data by reference instead of using Arc everywhere.
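
A small example of the difference, with a hypothetical table type: a callee that only reads can take `&TableDef` and still be called with an `Arc`, thanks to deref coercion, which avoids refcount churn.

```rust
use std::sync::Arc;

struct TableDef {
    name: String,
}

// Takes ownership of an Arc clone: bumps and later drops the refcount.
fn scan_with_arc(table: Arc<TableDef>) -> usize {
    table.name.len()
}

// A borrow is enough when the callee only reads.
fn scan_by_ref(table: &TableDef) -> usize {
    table.name.len()
}

fn example(shared: &Arc<TableDef>) {
    let _ = scan_with_arc(Arc::clone(shared)); // refcount traffic
    let _ = scan_by_ref(shared); // auto-derefs Arc<TableDef> -> &TableDef
}
```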

**TODO**

Support a row with more than 4KB of text in it.

**TODO**

Implement sorting.

**TODO**

Implement column aliasing.

**TODO**

Implement subselect.

**TODO**

Implement Updates.

**1.0 Release Criteria**

* pgbench can run successfully
* ~~Pick a new distinct name, rename everything~~ Done
* Pick a license
* Setup fuzz testing
* Persist to disk with moderate crash safety
* Be prepared to actually use it


### Longer Term TODO

This is stuff that I should get to but that isn't vital to getting to a minimal viable product.
* Right now the main function runs the server from primitives. The Tokio Tower layer will probably do it better.
* The codec that parses the network traffic is pretty naive. You could easily make the server allocate 2GB of data for a DDOS. (See the codec sketch after this list.)
  * We should either add state to the codec or change how it parses to produce chunked requests, so that when the 2GB offer is reached the server can react and terminate before we accept too much data. It's a little more nuanced than that: 2GB of input might be okay, but we should make decisions based on users and roles.
* There is an extension that removes the need to lock tables to repack/vacuum. Figure out how it works!
  * https://github.com/reorg/pg_repack
* Investigate if the zheap table format would be better to implement.
  * Until I get past a WAL implementation and planner costs, I don't think it's worth it.
  * Since I extended the size of transaction IDs, I probably have a larger issue on my hands than normal Postgres.
    * Reading into the zheap approach, I'm thinking that I might have some space-saving options available to me. In particular, if a tuple is frozen so it's always available, I could remove the xmin/xmax and pack more into the page. This needs more thinking, but my approach of questioning the storage efficiency of each part of the data seems to be worth it.
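
A sketch of the "react before allocating" idea for the codec, using a `tokio_util` decoder; the 16MB cap and the 4-byte length framing are assumptions for illustration, and real limits would be per user/role as noted above.

```rust
use bytes::{Buf, BytesMut};
use tokio_util::codec::Decoder;

const MAX_FRAME: u32 = 16 * 1024 * 1024; // assumed cap, not from the repo

struct GuardedCodec;

impl Decoder for GuardedCodec {
    type Item = BytesMut;
    type Error = std::io::Error;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if src.len() < 4 {
            return Ok(None); // need the length prefix first
        }
        let len = u32::from_be_bytes([src[0], src[1], src[2], src[3]]);
        if len > MAX_FRAME {
            // Reject before ever reserving space for the payload.
            return Err(std::io::Error::new(
                std::io::ErrorKind::InvalidData,
                "frame exceeds configured maximum",
            ));
        }
        if src.len() < 4 + len as usize {
            return Ok(None); // wait for the rest of the frame
        }
        src.advance(4); // drop the length prefix
        Ok(Some(src.split_to(len as usize)))
    }
}
```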

## Postgres Divergence

It's kinda pointless to blindly reproduce what has already been done, so I'm making the following changes to the db server design vs Postgres.
