
V7 Event Store Enhancement Brainstorming -- Please join the discussion! #2888

Closed

Conversation

jeremydmiller (Member):

at @oskardudycz 's recommendation, this single markdown file is meant to capture comments and discussions about possible enhancements to the Marten event sourcing functionality in the V7 release timeframe. All feedback is welcome here!

Comment on lines +40 to +41
* **Maybe** it would be valuable to shard the active events and event streams on the event sequence number ranges. That would be beneficial to the async daemon, if
harmful to live aggregations
Contributor:

Had some thoughts about this. If the streams table were updated to include a "first_event_sequence_id" and "last_event_sequence_id", then these could be used with live projections to drastically cut back on the number of partitions scanned.

jeremydmiller (Member Author):

That's an interesting suggestion. I'll have to think on that one a bit. Might say that's an opt-in behavior.

Collaborator:

@elexisvenator could you expand on why it would impact the number of partitions?

jeremydmiller (Member Author):

With @elexisvenator's suggestion, doing a live aggregation of any kind could use the cached "lowest" & "highest" sequence numbers for the stream as part of the WHERE clause in the event fetching. That would let Postgres limit the range partitions it has to look through.
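For illustration, roughly the shape of the event-fetching query that would enable, written here as an inline SQL string. This is a sketch only, not Marten's actual generated statement, and the column and parameter names are assumptions:

```csharp
// Sketch only: with mt_events range-partitioned on seq_id, bounding the query by the
// cached per-stream sequence numbers lets Postgres prune partitions before scanning.
var fetchStreamSql = """
    select data, type, version
    from mt_events
    where stream_id = :stream_id
      and seq_id between :first_event_sequence_id and :last_event_sequence_id
    order by version;
    """;
```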

@oskardudycz (Collaborator), Jan 5, 2024:

That just reduces the issue. For bigger databases, when you need to query more than one partition the performance will be highly degraded, so you should do all you can to query only a single partition. Scanning across partitions is really costly in Postgres.

Contributor:

It can get costly, yes, but a low number of partitions won't make a huge impact (e.g. <20, depending on each partition's size). The other challenges with this approach are:

  • Managing the provisioning of new partitions.
  • Queries based on timestamps - we can be smarter here with some form of pre-query, because we know that timestamps increase as the sequence id increases, but out of the box Postgres will have a bad day.
  • Unique constraints on event id - these simply cannot exist in any partitioned system, so this will affect the partition-on-archived approach as well (see the sketch below).
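To make the last two points concrete, here is a rough sketch of what range-partitioning `mt_events` on the sequence number would imply, written as an inline SQL string; the simplified column set is an assumption and this is not Marten's actual DDL:

```csharp
// Sketch under assumptions -- simplified columns, not Marten's real schema.
var shardedEventsDdl = """
    create table mt_events (
        seq_id    bigint not null,
        id        uuid   not null,
        stream_id uuid   not null,
        data      jsonb  not null,
        primary key (seq_id)  -- allowed, because seq_id is the partition key
        -- unique (id) would be rejected: every unique constraint on a partitioned
        -- table must include the partition key, so the global unique event id is lost
    ) partition by range (seq_id);

    -- each sequence range has to be provisioned ahead of time (or by a background job)
    create table mt_events_seq_0000001 partition of mt_events
        for values from (1) to (1000001);
    """;
```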

want to make it as easy as possible to walk the sequence number values within a single table
* **Maybe** it would be valuable to shard the active events and event streams on the event sequence number ranges. That would be beneficial to the async daemon, if
harmful to live aggregations
* This is going to be an ugly migration, which is why it was cut from V4. If you apply this after the fact, you'll have to have some down time. You'll have to copy the existing mt_streams & mt_events tables off to the side, then drop both tables, create the new table partitions and the virtual table that points to the partitions, then copy all the events back into the virtual table so Postgres can sort the actual records around
Contributor:

Another approach would be to slice on tenant, but this has similarly headache-inducing implications.
The only way this would work is to give every tenant their own high water mark and treat them like they are in separate databases even though they are conjoined.

This approach would benefit tasks like offboarding and even potential performance management (e.g. moving hot tenants to their own hardware).

Overall, there is probably more to gain from partitioning on archived or event sequence id.

jeremydmiller (Member Author):

Slicing it on the tenant would be awful for the daemon; that's why I was pooh-poohing that before. Like you say, you'd need a separate sequence and high water mark for each. That's a bit more complexity.

Comment on lines +35 to +37
* My personal proposal is to use native PostgreSQL sharding on the `is_archived` column to effect a "hot storage, cold storage" approach. This should be paired with
advice to users to be more aggressive about archiving event streams that are obsolete/stale/not active as a way to improve throughput over all. The daemon and every
other place where we query events automatically uses `is_archived = false` filters anyway
Contributor:

I feel this would encourage the use of is_archived far more than it is currently being used by Marten consumers. Might need to consider the implications for things like projections.

For example, I currently only use is_archived as a soft delete that will eventually be hard deleted (usually as part of restreaming). If it becomes a "move your old stuff here" feature then I will have challenges with projection consistency. In particular, I would want to make sure that a new projection I create is backfilled with archived values as well, or at least be able to configure my projection so that's possible.
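For reference, archiving today is just a session operation; a minimal sketch, assuming an open `IDocumentSession` named `session` and an illustrative `streamId`. The open question above is what extra semantics this call picks up once archived events become "cold storage":

```csharp
// Mark the stream as archived; under the proposal its events would also move into the
// is_archived = true ("cold") partition at this point.
session.Events.ArchiveStream(streamId);
await session.SaveChangesAsync();
```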

jeremydmiller (Member Author):

My thoughts as well. That's mostly inspired by Eventuous's "hot / cold storage" idea

Collaborator:

See my write-up on the log compaction process:

And EventStoreDB scavenging process docs: https://developers.eventstore.com/server/v23.10/operations.html#scavenging-events

TLDR, we should have a formal archival process that works as a background process. It has to be either on-demand or semi-automatic. We should ensure that all archived events are published and processed via async daemon.

Detaching the partition should be the last step; first, people should be able to move the events somewhere else, depending on the strategy.

I think that eventually we should support all the scenarios I described. The basic building blocks should be part of Marten: archival, partial archival of a stream, tombstoning, and a hosted service giving the option to plug in custom archival logic. The more advanced and precise scenarios should be part of CritterStackPro.

@ericgreenmix (Contributor), Jan 8, 2024:

Big fan of this feature.

> We should ensure that all archived events are published and processed via async daemon.

Yes please. This is currently an issue with my applications being able to use archival. The async daemon doesn't pick up any event that was just archived, so I can't move the async projection into a new state (or delete that document) on that initial archival event.

Comment on lines +98 to +103
This is really just sticking a "projection version number" on the Projection subclasses that would potentially be used as a suffix on the document
tables for the projected documents (the document alias). This would enable blue/green deployment of the same projection so that nodes on the previous version can be
using the previous revision/definition of the projection while other nodes are running the newer model.

Obviously there's some serious concern about just how the new projection gets rebuilt in time. I think the "catch up" mode that this document
introduces as an "Async strategy for FetchForWriting()" is necessary to make this viable
Contributor:

Being able to apply this to inline projections as well would be invaluable (though that last step of consistency could be a challenge).

jeremydmiller (Member Author):

It's mostly for the inline projections. I think we'd fudge the lines between inline and async here. With a potential client that didn't pan out, I was calling this the "catch up mode". I think we have to encourage folks to use FetchForWriting() everywhere instead of LoadAsync() or AggregateStreamAsync() unless you're doing time travel.
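A minimal sketch of the FetchForWriting() usage being encouraged here, assuming an open `IDocumentSession` named `session` and a `CancellationToken ct`; the `Order`/`OrderShipped` types and `orderId` are illustrative names, not from the proposal:

```csharp
// FetchForWriting() resolves the aggregate through whichever lifecycle the projection is
// registered with and sets up optimistic concurrency for the subsequent append.
var stream = await session.Events.FetchForWriting<Order>(orderId, ct);
var order = stream.Aggregate;   // the current state of the Order aggregate

if (order is { IsShipped: false })
{
    stream.AppendOne(new OrderShipped(orderId));
    await session.SaveChangesAsync(ct);
}
```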


By no means will this be a true equivalent of the projection load balancing work planned for "CritterStackPro"

## Dynamic Database per Tenant Multi-Tenancy model
@elexisvenator (Contributor), Jan 4, 2024:

This is great to see, and I'm keen to try it, though I don't know how to deal with the per-tenant db scaling issues.

Context:
In one AWS region we have ~300 tenants. At this time our app uses conjoined tenancy on an "r6g.xl" instance size. To split them up would mean creating ~300 RDS instances. By default the instance limit is 40 per region, and even ignoring that, the cost per instance would skyrocket as they would be forced to be massively over-provisioned (the smallest viable size is "small", due to the issue below).

The alternative is to group tenants together onto a smaller set of instances, say 50 per RDS instance. This introduces a new problem where the memory overhead of connections starts to add up and consume all memory in the database. The memory consumed (by default) would be the memory allocated per connection (8 MB) * max connections by app (100) * tenants on the server (50) * nodes in the app (2-4), resulting in >80 GB of memory used in this situation. This puts scaling limits on your application and forces the db to be over-provisioned.

A possible workaround is Amazon RDS Proxy. Previously this wasn't a good solution as the Postgres support was always several major versions behind current, but now it seems to be in sync with Postgres 16. It only really helps with the node scaling and not the tenant scaling, as different dbs on the same server require different connections. There are a bunch of other restrictions that will necessitate testing though.

This feels like a "postgres" or "aws" problem rather than a "marten" problem, but it would be interesting to see how other clients tackle it given this work is sponsored by clients.

Collaborator:

I think that one option is to also add an option per schema, or to mention in other places partitioning per tenant, or some other way to allow other shared contexts (or a mixed option).

I think that a database per tenant is useful when:

  • we have a limited (and small) number of tenants,
  • there is a legal requirement to prove data isolation,
  • there is an ops team to maintain it.

When the number of databases grows, as you described, it gets extremely hard and costly from the DevOps perspective to maintain. I wouldn't consider a database per tenant a viable solution for a big SaaS platform.

Comment on lines +142 to +143
* Is this done as a "hot" subscription that starts wherever the event high water mark is already when a new one is added?
* Do we support "cold" subscriptions that would cause a new subscription to start from the beginning at event #1?
Contributor:

Ideally both?

We have a version of this, heavily based on Oskar's example. By default the subscription is cold. If we want to rebuild or make a new one and find that a cold subscription is undesirable, we add a filter at the start of the subscription based on event timestamp and let it catch up, skipping older events.
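A hedged sketch of that timestamp filter, assuming a hand-rolled handler that receives pages of Marten `IEvent`s; `_cutoff` and `PublishAsync` are illustrative names, not an existing subscription API:

```csharp
// Events older than the cutoff are skipped, so a "cold" subscription effectively starts
// hot and only replays recent history while it catches up.
public async Task ProcessPageAsync(IReadOnlyList<IEvent> events, CancellationToken ct)
{
    foreach (var e in events)
    {
        if (e.Timestamp < _cutoff) continue;   // IEvent.Timestamp is the event's metadata timestamp
        await PublishAsync(e.Data, ct);
    }
}
```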

jeremydmiller (Member Author):

I think @oskardudycz is going to agree with that. He's brought that up before.

Collaborator:

Yup, see: https://github.com/JasperFx/marten/pull/2888/files#r1442806806

See also my write-up on how ESDB subscription works: https://event-driven.io/en/persistent_vs_catch_up_eventstoredb_subscriptions_in_action/. We should enable at least the same set of basic functionalities.

@elexisvenator (Contributor) left a comment:

In addition to my comments above:

IEventSession - love the idea, though personally I wouldn't use it due to current patterns we have.

Poor man's load balancing - makes sense, especially the multi-tenant bit. Personally I'm waiting for the pro version ;)

FetchForReading - like the consistency of the APIs for this, and that it can handle the future versioning/zero-downtime magic.

Async FetchForWriting - 🥳 To me this feels critical for Marten to remain performant in a large (volume) complex (variety of events/projections) system, as it allows for nearly all use cases for inline projections to be replaced with async.


* Is this done as a "hot" subscription that starts wherever the event high water mark is already when a new one is added?
* Do we support "cold" subscriptions that would cause a new subscription to start from the beginning at event #1?
* How is `ISubscription` handled in rebuilds? Do we have specific replays? How do we help users keep from doing this accidentally when they're rebuilding projections?
Contributor:

I would exclude subscriptions by default and handle them in a different way. At least for us, the only scenario where we'd need to "rebuild" a subscription is if there's a catastrophic failure of the downstream system (i.e. a third-party provider that sends notifications has an outage), in which case we'd like the ability to reprocess the events from the last x hours.

Collaborator:

We should be able to let users manage the sequence from which they subscribe, so they should be able to say "send me the events from this position". We used such a procedure for, e.g., disaster recovery when our messaging system died or the retention policy removed messages. It's also useful when we have a bug in the system and we need to republish messages.


I'm going to allow Oskar to address this one :-)

From my side, I think the `ISessionListener` approach we have today is weak and should be a little more "outbox'd" such that it only fires off when
Collaborator:

I think that there are actually two things here. From what I see, async session listeners are usually used when people want to publish read model changes to the outside world, as I described in my blog article: https://event-driven.io/en/publishing_read_model_changes_from_marten/.

Some people are fine with just forwarding the change to the read model; this scenario is fine for building a local copy of data between modules.

The other scenario is enriching events. For cases like "shopping cart confirmation", external processes can be triggered (e.g. "issuing an invoice"). Internal events should be granular, so they won't contain all the information. Some people want to enrich them, either based on the updated read model or by aggregating the stream, and send new events (as I explained in https://event-driven.io/en/internal_external_events/).

Those two scenarios don't necessarily have to be implemented the same way. For read model propagation, we may use logical replication, as I prototyped here: https://github.com/oskardudycz/PostgresOutboxPatternWithCDC.NET.

For event enrichment, I think that we should allow appending new events in projections/subscriptions, but then defer calling external systems, with projections based on the enriched events.

I also believe that having support for that will enable lightweight saga processing.

The biggest question here is what will be part of Wolverine and what will be part of Marten. I'm voting that the stuff I mentioned should be part of vanilla Marten. The logical replication part could be a plugin (could be part of CritterStackPro).

@@ -0,0 +1,174 @@
# Event Store Improvements for V7
Collaborator:

Sidenote: it'd be easier to comment if the auto-formatting in Rider were disabled, as it's randomly breaking lines at the character limit.

**Gotta do this in a way such that users can opt into whether the new events should be emitted during projection rebuilds**


## Sharding the Event Store tables
Collaborator:

I think that we should also give the possibility to partition per tenant and stream type. This could speed up a lot of processing, but of course, it would mean that we don't have global ordering between partitions. Still, in many cases this should be good enough to speed up performance and allow easier database size management.


## IEventSession interface/service

What if we had a new interface specifically for folks who really only use the event sourcing at different areas of the code that got you straight to
Collaborator:

Personally, I don't see that as super useful. Imho, it adds another layer of indirection.

Contributor:

Agree on this


## Document by document type identity map behavior

Today, document sessions either have the identity map behavior for all document types or no identity map for all types. I think especially around the
Collaborator:

I think that maybe it'd be better to treat that as an internal option for projection usage. Making that public would make the number of permutations even higher.

jeremydmiller (Member Author):

We've just got to have the capability is all. It's actually not going to be that big of a deal mechanically. Really just an enabler step

low hanging fruit because all you'd do is switch up the cached `IDocumentStorage` objects for a certain document type.


## Ensure that aggregates updated "Inline" are coming from identity map
@oskardudycz (Collaborator), Jan 5, 2024:

I see some value in that, but I see that as an edge case. In general, I'm against promoting the snapshot usage, so I'm not super positive about that. I'm not sure how many people really need such performance optimisation. It creates a pit of further issues like migrations, upcasting inline projections, and managing stuff being out of sync.

jeremydmiller (Member Author):

I've read everything you've had to say about this, but don't reach the same conclusions. I think we're just about to invest in a whole lot of functionality in v7 that I think will address the migration problems with at least single stream aggregations. And besides, we've already got requests for this on the Wolverine + Marten side of things.

@oskardudycz (Collaborator), Jan 5, 2024:

Sure, I understand why people want it, but we should start limiting the use cases we don't want to support. This is one of those features that, IMHO, will create more issues than benefits in the long term, both for users and for us. I think that we should encourage people to solve that stuff differently.

Could you expand on what migration problems it will solve and how?


This is really just sticking a "projection version number" on the Projection subclasses that would potentially be used as a suffix on the document
tables for the projected documents (the document alias). This would enable blue/green deployment of the same projection so that nodes on the previous version can be
using the previous revision/definition of the projection while other nodes are running the newer model.
Collaborator:

Would it still be based on the async daemon id? So deployment would need:

  • new document class,
  • new projection,
  • new versioned API?

What if one would like to reuse the same document when:

  • the structure didn't change, but there was a bug in an apply method,
  • they provide a new version of the document that's backwards compatible? Would they need to specify the document alias?

I think that we should also give an option to run it on the same node. So deploying a new version of the application that can handle both the old projection and the new one. That's needed for monoliths.

Contributor:

As far as I can see, versioning only makes sense if:

  • The document remains the same shape
  • The changes to the apply methods would result in different content in the document (e.g. adding an apply for an event that is already in use)
  • OR: the devs have decided that a projection needs to be rebuilt for some undetectable reason (maybe they did a db restore or some old bug caused consistency problems)

If the document changes shape, then versioning would either require returning a different document type based on whether the new version has caught up (not a thing that C# supports), or migrating the old document shape to the new one, which is worse because the resulting document is at best inaccurate and at worst invalid until the rebuild is complete anyway. It would be better to just have both deployed so that developers could land a change that switches their business logic to the new one once it is ready.

Finally, there are some situations where you would want to change the projection, but not rebuild/version the projection:

  • Adding an apply for a new event that is not yet in your system.
  • Removing an apply for an event that has been removed from your system (e.g. through restreaming, or versioning to another supported event)

Collaborator:

I disagree that using the same document is the wrong way of doing migration.

The biggest benefit of having a weak schema (as we have with JSON) is non-breaking migration. Some changes require a big revamp of the schema, and for those it's better to have new documents, but for smaller ones, like new fields, it's perfectly fine to add a new field as nullable, keeping it backwards and forwards compatible.

Contributor:

I agree that adding a new field with a sensible default value can be done with the same document, though only when that value doesn't need to be backfilled to be accurate... unless you are OK with it being inaccurate until the backing projection table is rebuilt... so many caveats.


## "Poor Man's Load Balancing" of the Async Daemon

See Oskar's post [How to scale out Marten](https://event-driven.io/en/scaling_out_marten/). I'm voting to just bake this work directly into Marten as a
Collaborator:

Are you thinking about keeping it as sharding, as I described, or also adding something more to that? The presented approach assumed that each async daemon instance has a different set of projections. It could be useful to think about whether we could give an option to specify the set of active projections through environment variables or config; then one could use the same artefact (e.g. a Docker image) but provide different settings to decide what to handle.
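A sketch of that idea, assuming a comma-separated `ACTIVE_PROJECTIONS` setting read at startup inside a typical `WebApplicationBuilder`; the setting name, the projection classes, and `connectionString` are illustrative, while `AddMarten` and `Projections.Add(..., ProjectionLifecycle.Async)` are Marten's existing registration calls:

```csharp
// Same artefact, different behaviour per node: only register the async projections named
// in configuration, so each deployment handles its own slice of the projection work.
var active = builder.Configuration["ACTIVE_PROJECTIONS"]?.Split(',') ?? Array.Empty<string>();

builder.Services.AddMarten(opts =>
{
    opts.Connection(connectionString);

    if (active.Contains("DayTotals"))
        opts.Projections.Add(new DayTotalsProjection(), ProjectionLifecycle.Async);

    if (active.Contains("UserGroups"))
        opts.Projections.Add(new UserGroupsProjection(), ProjectionLifecycle.Async);
});
```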

## Dynamic Database per Tenant Multi-Tenancy model

A JasperFx client is sponsoring this work, so it's absolutely in scope. This time there'll need to be a "master" database that has a table
to track tenants and tenant databases. The async daemon should be able to discover new tenant databases at runtime and try to spin up a new daemon
Collaborator:

Would the new daemon be a new process or a new hosted service sharing the host?




## FetchForReading alternative
Collaborator:

Could you expand on the use case? I'm having a hard time understanding the need for that.

Contributor:

Example wolverine code I ended up writing yesterday:

```csharp
public record GetGroupResult(GroupId Id, string Name, IReadOnlyList<string> Rules);
public static class GetGroupEndpoint
{
    public static async Task<Group?> LoadAsync(
        string groupId,
        IDocumentSession session,
        ILookupService lookupService,
        CancellationToken token)
    {
        return await lookupService.LookupGroupIdForRead<Group>(session, groupId, ProjectionLifecycle.Live, token);
    }

    [Tags("Groups")]
    [WolverineGet("group/{groupId}")]
    public static GetGroupResult Get([Required] Group group)
    {
        return new GetGroupResult(group.GroupId, group.Name, group.Rules);
    }
}
```

Where the lookupService is implemented as:

```csharp
public async Task<TAggregate?> LookupGroupIdForRead<TAggregate>(
    IQuerySession session,
    string group,
    ProjectionLifecycle aggregateLifecycle,
    CancellationToken token = default) where TAggregate : class
{
    var lookup = await LookupGroupId(session, group, token);
    if (lookup is null)
    {
        return null;
    }

    if (aggregateLifecycle == ProjectionLifecycle.Live)
    {
        return await session.Events.AggregateStreamAsync<TAggregate>(lookup.Value.streamKey, token: token);
    }

    return await session.LoadAsync<TAggregate>(lookup.Value.streamKey, token);
}

public async Task<(GroupId logicalId, string streamKey)?> LookupGroupId(
    IQuerySession session,
    string group,
    CancellationToken token = default)
{
    if (!GroupId.TryBuild(session.GetTenant(), group, out var logicalId))
    {
        return null;
    }

    // TODO: Cache this as needed
    var streamKey = await session.Query<IdLookupProjection>()
        .Where(l => l.LogicalId.Equals(logicalId.Value))
        .Select(l => l.StreamKey)
        .FirstOrDefaultAsync(token);

    if (streamKey is null)
    {
        return null;
    }

    return (logicalId, streamKey);
}
```

Other ways I could have done this:

  • Use FetchForWriting and be scared of optimistic concurrency messing with a get that should have no side effects
  • Use reflection into StoreOptions to do the same checks that FetchForWriting does
  • Create multiple service methods for each aggregate type.

This way is just neater, and removes the need for users to know the difference, just like they don't need to know when using [Aggregate].

switch between projection lifecycles. This will be almost absolutely necessary when we go to the "zero downtime" projection
rebuilds

## Async strategy for FetchForWriting()

@@ -0,0 +1,174 @@
# Event Store Improvements for V7
Collaborator:

Other ideas that we should include in considerations:

  • give users the possibility to include their own event type mapping policy (e.g. by strict namespace). That would require potentially breaking changes, as we have event type mapping distributed around the whole codebase, but I think that this is a must-have for monoliths,
  • potentially adding upcasters,
  • open telemetry.

of GitHub issues marked for the V7 release.


## Raise events from asynchronous projection updates
Contributor:

This would be an awesome feature. This would make async projections even more usable in Marten, as having a clean way to hook into async projection updates is something I have wanted to be able to do a number of times.


I think this is going to require some additional options for "versioned documents". Instead of the `Guid`-based strategy we use today for optimistic versioning, we'd
need an `int`-based model where the stream version is also embedded into the document storage for the projected document.
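Illustrative only (the member names are assumptions, not an existing Marten feature): the kind of projected document shape this implies, with the stream version embedded as an int:

```csharp
// The int Version is written alongside the projected data so a read can tell how far
// this snapshot has caught up with its stream.
public class OrderSummary
{
    public Guid Id { get; set; }
    public int Version { get; set; }    // last stream version applied to this document
    public decimal Total { get; set; }
}
```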

Contributor:

(Tagging a random line near the bottom so that discussion threads are possible)

Additional possible idea: Remove the need for multi-stream projections to use the same ID type as is configured for the event store.

The reasoning here is that if you use Guids for stream keys and you have to combine streams in some way for a multi-stream projection, there is a good chance that the resulting grain of the projection is not going to be a Guid.
The only reason I am not using Guid stream keys for events is so that multi-stream projections can support string ids. The limitation feels arbitrary, and I suspect it is not there as intentional design.

jeremydmiller (Member Author):

I don't think that is a limitation, or I don't know why you think that exists. You're limited by the types that can be used as an identity.

Or did you mean the single stream projections?


By no means will this be a true equivalent of the projection load balancing work planned for "CritterStackPro"

## Dynamic Database per Tenant Multi-Tenancy model
Contributor:

Would be great to have this, as I've built workarounds for it several times.


## "Poor Man's Load Balancing" of the Async Daemon

See Oskar's post [How to scale out Marten](https://event-driven.io/en/scaling_out_marten/). I'm voting to just bake this work directly into Marten as a
Contributor:

Having support for this in the free version would be great.

jeremydmiller (Member Author):

That's what I meant by the "poor man's load balancing". It will be in the free core, but it'll be less capable than the paid model. We do need to make a living here man!

For the Wolverine integration, I'd like to be able to enroll in Wolverine's outbox as part of processing a page of events in the daemon to send out messages using the
aggregation data as part of the outgoing message.

More generically, we need a good way to emit new events that will be processed as appends within the same daemon transaction. This might be a little tricky and will
@ElanHasson (Contributor), Jan 13, 2024:

Excuse my lack of understanding here; I am new to the platform.

I imagine async processing refers to inline vs. async projection processing.

Does the daemon transaction you're referring to mean the upsert + event publishing?

Is async processing doing 1 txn per identity/grouping?

* My personal proposal is to use native PostgreSQL sharding on the `is_archived` column to effect a "hot storage, cold storage" approach. This should be paired with
advice to users to be more aggressive about archiving event streams that are obsolete/stale/not active as a way to improve throughput over all. The daemon and every
other place where we query events automatically uses `is_archived = false` filters anyway
* We probably shouldn't be sharding on anything that would impact the async daemon's "high water detection" that uses the event sequence. In other words, we
@ElanHasson (Contributor), Jan 13, 2024:

This raises the question of whether the high water mark and the progression checkpoint should be dimensioned by $group.

Is it possible to have a distinct sequence per $group?

$group here refers to whatever you're partitioning on, as sharding is typically highly dependent on your use case and how the data is accessed.

jeremydmiller (Member Author):

We'd have to do that, yes. That's also why the flexible sharding strategies won't happen in the 7.0 release, though. :(

@jeremydmiller deleted the event-store-plans branch on March 20, 2024 13:16