-
I've actually been looking into async Mongo, specifically for trying to move as much code over to async; one option is the one you've mentioned. As for inter-NApp communication, the direction I've been thinking of moving this is having generic references to objects or collections that we can then attempt to resolve actions on, similar to GraphQL. For example, if you want to add a flow to a table on a particular switch, you would ask the switch for references to its tables, then use the table reference to insert the flow. At the top level, to access a reference you would use the NApp's reference, provided by the controller. Optionally, you could create a wrapper class for the references which would automatically make the calls to the resolvers when trying to access an attribute.
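A minimal sketch of what that wrapper idea could look like (the `Reference` class, the resolver signature and the `get_reference`/`insert_flow` names are hypothetical, just to illustrate attribute access being turned into resolver calls):

```python
class Reference:
    """Hypothetical lazy reference that resolves actions on an object owned by another NApp."""

    def __init__(self, resolver, path=()):
        self._resolver = resolver  # callable owned by the providing NApp
        self._path = path          # e.g. ("switches", "00:00:...:01", "tables", 0)

    def __getattr__(self, name):
        # attribute access just narrows the reference instead of fetching data
        return Reference(self._resolver, self._path + (name,))

    def __getitem__(self, key):
        # indexing narrows it too, e.g. tables[0] or switches[dpid]
        return Reference(self._resolver, self._path + (key,))

    def __call__(self, *args, **kwargs):
        # calling the reference asks the owning NApp to resolve the action
        return self._resolver(self._path, *args, **kwargs)


# Usage idea (get_reference and insert_flow are illustrative names):
#     switch = controller.get_reference("of_core").switches[dpid]
#     switch.tables[0].insert_flow({"match": {...}, "actions": [...]})
```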
-
Great summary. I was only aware of the out-of-order events since I hit them a couple of times while testing.
-
Front-end

Vue 2
Usability
Code maintenance
-
Back-end

Thread and asyncio platform usage history

Thread race conditions:
```python
@listen_to('.*.switch.interface.link_up')
def on_interface_link_up(self, event):
    interface = event.content['interface']
    with self._intfs_lock[interface.id]:
        # discard out-of-order events older than the last one processed
        if (
            interface.id in self._intfs_updated_at
            and self._intfs_updated_at[interface.id] > event.timestamp
        ):
            return
        self._intfs_updated_at[interface.id] = event.timestamp
        self.handle_link_up(interface)


@listen_to('.*.switch.interface.link_down')
def on_interface_link_down(self, event):
    interface = event.content['interface']
    with self._intfs_lock[interface.id]:
        if (
            interface.id in self._intfs_updated_at
            and self._intfs_updated_at[interface.id] > event.timestamp
        ):
            return
        self._intfs_updated_at[interface.id] = event.timestamp
        self.handle_link_down(interface)
```

a)

```python
@alisten_to('.*.switch.interface.(link_up|link_down)')
async def on_interface_link_event(self, event):
    interface = event.content['interface']
    if event.name.endswith("link_down"):
        self.handle_link_down(interface)
    elif event.name.endswith("link_up"):
        self.handle_link_up(interface)
    self.do_notify(event)
```

a2)

```python
@alisten_to('.*.switch.interface.(link_up|link_down)')
async def on_interface_link_event(self, event):
    interface = event.content['interface']
    async with self._intfs_lock[interface.id]:
        if event.name.endswith("link_down"):
            await self.handle_link_down(interface)
        elif event.name.endswith("link_up"):
            await self.handle_link_up(interface)
        await self.do_notify(event)
```

b)

```python
@listen_to('.*.switch.interface.(link_up|link_down)', pool='dynamic_single')
def on_interface_link_event(self, event):
    interface = event.content['interface']
    if event.name.endswith("link_down"):
        self.handle_link_down(interface)
    elif event.name.endswith("link_up"):
        self.handle_link_up(interface)
    self.do_notify(event)
```

Other examples:
Kytos-ng is a majestic monolith with plugins (NApps)

Context:

Tech debt:
Typical NApp high level code architecture organization
Move topology links to kytos core
Kytos event bus is meant to be pub/sub fanout subscription like
Migrate to MongoDB 7, MongoDB 5 EOL by late 2024
Migrate ES to 9.X from 7.17
Upgrade Python dependencies
Maintain core projects on PyPI
NApps installation and marketplace
e2e (end-to-end) testing

Related things that aren't necessarily tech debt, but would be great to have:
New documentation

Final thoughts

When it comes to addressing the tech debts, they can be thought of in two main ways. Finally, if what's being replaced is IO based, stress testing should also be done to ensure that performance and reliability are at least on par or better. Tech debts being mitigated should be planned for too, similar to other features or fixes, especially if their effort scope is significant.
-
@viniarck, that's a great compilation for our next all-hands meeting. You covered the software development deficits well. One thing we need to consider is new functionalities coming for our network operators and their impact on our software workflow. For instance, we need to decouple from OpenFlow and support more generic southbound protocols (including SONiC). We need network functionalities as well, such as port mirroring, INT, point-to-multipoint, and support for ACLs. All of those lead to support for multi-tables and inter-NApp flow dependency and manipulation. Also, we should consider authentication/authorization, better topology monitoring (BFD), new applications, integration with FPGAs and SmartNICs, and faster path protection.
-
I don't know if I should have made another discussion, but I want to share some ideas as to how we can make the UI easier to maintain for the future.

Front-end

Removal of the Dynamic Loading of Components

I think that loading the components dynamically does more harm than good. We should move all of the UI into the main UI repo and load it all at once. We can keep the vue3-sfc-loader as a side component in case we ever need it in the future, or even use it to test components. We could also use Vue 3 async components and distribute components as needed.

Dev Server

There is no quick way to test components within the main UI repo. To test changes made to the main UI repo, you need to recompile the UI and place the build within Kytos.

Mocking the backend

I read it from these two sources, but in short you can do it by using the proxy option within the webpack config.

Testing

Currently, it is very time-consuming and error-prone to just click all of the UI buttons individually, especially since sometimes they perform a request, and this can branch out, fail, or succeed. This is why it would be good to have unit testing for these little cases. I have not yet tested it, but I'm trying to get a working prototype for unit tests with the Vitest testing framework.

Formatter and Linter

After finishing implementing the testing framework, we can also add a linter and formatter to keep the code well organized.

Additional Notes

Some of the components may need refactoring.
-
Other tech_debt issues in the back-end are worth highlighting too: #532 (this is priority_major as of Feb 2025).
-
This is just a summary of existing major tech debts on `kytos-ng` and which future potential epics could start discussing and addressing them.

Major tech debts:

Front-end

Back-end

1. `@listen_to` potential out-of-order events execution is getting cumbersome and tricky to maintain. Prefer `@alisten_to` or `@listen_to(pool="dynamic")` if you can afford it; however, the current MongoDB client with `pymongo` is sync. Refactor `@listen_to` as `@alisten_to` when there's no blocking dependency down the call path (including `threading.Lock` instances, those need to be `asyncio.Lock`, although managing `asyncio.Lock` is much easier to reason about since all the yield/await points are explicit, so certain non-suspending functions that get called might not even need a lock mid iteration). `asyncio` is not a silver bullet though, but compared with the threading use case it should be not only lighter in terms of resources but also easier to maintain and reason about.
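As a rough illustration of that point, here's a minimal sketch (the class, handler and `handle_link_up` names are assumed, not existing NApp code) of an async handler that only holds an `asyncio.Lock` around awaited work, since the check-and-set part has no suspension point:

```python
import asyncio
from collections import defaultdict


class ExampleNApp:
    """Minimal sketch, not an actual NApp."""

    def __init__(self):
        self._intfs_updated_at = {}
        # per-interface asyncio locks, only needed around awaited sections
        self._intfs_lock = defaultdict(asyncio.Lock)

    async def handle_link_up(self, interface):
        ...  # placeholder for the real link_up logic

    async def on_interface_link_up(self, event):
        interface = event.content["interface"]
        updated_at = self._intfs_updated_at.get(interface.id)
        # No await has happened yet, so this check-and-set cannot be
        # interleaved with another handler for the same interface.
        if updated_at and updated_at > event.timestamp:
            return
        self._intfs_updated_at[interface.id] = event.timestamp
        async with self._intfs_lock[interface.id]:
            # Awaited work can suspend, so it is still guarded per interface.
            await self.handle_link_up(interface)
```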
2. Keep moving more towards `asyncio`: try to refactor existing synchronous threadpool-based endpoints to async. This is related to point 1, but for HTTP endpoints that the front-end and other API clients use.
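To make the endpoint part more concrete, here's a minimal sketch using plain Starlette (route path, app wiring, URI and collection names are assumptions, not the actual Kytos API surface) of an async endpoint that offloads the still-sync `pymongo` call to a worker thread:

```python
from pymongo import MongoClient
from starlette.applications import Starlette
from starlette.concurrency import run_in_threadpool
from starlette.responses import JSONResponse
from starlette.routing import Route

client = MongoClient("mongodb://localhost:27017")
switches = client["napps"]["switches"]


async def list_switches(request):
    # pymongo is blocking, so run the query in the threadpool instead of
    # blocking the event loop; an async driver would remove this hop.
    docs = await run_in_threadpool(lambda: list(switches.find({}, {"_id": 1})))
    return JSONResponse({"switches": [doc["_id"] for doc in docs]})


app = Starlette(routes=[Route("/v1/switches", list_switches)])
```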
3. NApps aren't microservices and `kytosd` can't be stateful clustered easily without a massive refactoring that would take years. Kytos-ng can be categorized as a majestic monolith, which is great too. A NApp runs in a shared Python runtime; it's just a dynamically loaded "plugin", a Python package that gets loaded dynamically. There's room to simplify service-to-service / NApp-to-NApp communication in certain use cases where a request is being used, concrete example: `mef_eline` when trying to find paths via `pathfinder`. The `uvicorn/starlette` app instance is shared across all NApps, so there's no full fault tolerance in terms of process/HTTP/database. Also, many NApps depend on shared resources on core such as `Interface`, `Switch` and `Link` (on topology), and operations on these use shared memory. The original intention was great, trying to make a NApp microservice-like via HTTP for service-to-service communication, but at the moment we get all the downsides of this type of communication without much of the benefits. If one day this were to be a full-blown real microservices architecture it could be done if it were business justifiable, but it would need a much bigger team, all NApps-to-core communication via HTTP too (this is what µONOS is doing with gRPC), and much harder deployment orchestration; and since we don't have any scaling issues that can justify this, we have room to simplify service-to-service communication by not using HTTP for service-to-service, only using it for the front-end (or external 3rd party clients), and doing service-to-service/NApp-to-NApp via Python `callables` and/or via the event bus when it makes sense. We don't have any major bottleneck in production use cases: `uvicorn/starlette` can theoretically handle dozens of thousands of requests/sec, and MongoDB can scale horizontally, also handling dozens of thousands of ops/sec. The `callables` would be either a `def func(*args, **kwargs)` or an `async def func(*args, **kwargs)` that can be called by any other NApp, with `Pydantic` model validations or trying to reuse validations from `openapi-core`. A NApp that needs to call another NApp would check whether or not the callable has been registered before starting to use it. There's also the fact that NApps get started in a certain order, but as long as you start to call it after you get a loaded event it should work without exceptions, and then with rate limiting in the future we can also plug rate limits here.
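A minimal sketch of what such a registry could look like (the registry, decorator and callable names here are hypothetical, not an existing Kytos API):

```python
from typing import Any, Callable

from pydantic import BaseModel

# hypothetical process-wide registry; in practice this could live on the controller
_callables: dict[str, Callable[..., Any]] = {}


def register_callable(name: str):
    """Register a sync or async function under a well-known name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        _callables[name] = func
        return func
    return decorator


class PathRequest(BaseModel):
    source: str
    destination: str


@register_callable("pathfinder.best_paths")
async def best_paths(payload: dict) -> list[dict]:
    request = PathRequest(**payload)  # validate at the NApp boundary
    # ... real path computation would go here ...
    return [{"hops": [request.source, request.destination], "cost": 1}]


# A consumer NApp (e.g. mef_eline) would look the callable up after the
# provider's loaded event, then call it directly instead of issuing HTTP:
#     func = _callables["pathfinder.best_paths"]
#     paths = await func({"source": "s1:1", "destination": "s2:2"})
```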
Keeping a request/reply flow for cases like `pathfinder` can work too, but it's a lot of extra code and legwork for virtually no benefit at all, not to mention asynchronous error handling on a different code path (this can be solved with the simplification proposed above).

4. Start supporting an additional MongoDB driver such as `motor` once it's been confirmed to be prod grade with at least equal or superior performance (this is related to point 1 on this list).

5. Migrate to MongoDB 7; MongoDB 5 is EOL in late 2024.
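For reference, a minimal sketch of what an async query with `motor` (the driver mentioned in point 4) could look like; the URI, database and collection names are assumptions for illustration:

```python
from typing import Optional

from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient("mongodb://localhost:27017")
switches = client["napps"]["switches"]


async def get_switch(dpid: str) -> Optional[dict]:
    # With motor the same pymongo-style call is awaitable, so handlers and
    # endpoints can query MongoDB without tying up a worker thread.
    return await switches.find_one({"_id": dpid})
```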
6. e2e testing.

cc'ing @jab1982 for his information, and the core team @italovalcy @Ktmi @Alopalao @rmotitsuki. If you guys have also noticed any major pain point, feel free to add it in the comments, along with any other suggestion. As we can only afford to try to mitigate certain tech debts, the first step is to start mapping them so they can be potentially considered and prioritized one day. Once they are, they'll be further categorized in an epic and broken down into tasks.