-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ON HOLD] #14375: Optimize usage of unordered_map in tt::Cluster #14446
Conversation
@sminakov-tt is the IndexMap here similar to the SlotMap introduced by @eyonland and @patrickroberts ? fyi @pgkeller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sminakov-tt do you have a rough idea on how many elements are usually there in these containers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great to move away from map
what does indexmap buy us vs just using a vector?
do we really need optionals and has_value? when will correct code not have a value when accessed?
Sorry for the clang-tidy noise on this PR. |
@blozano-tt I actually quite liked the clang-tidy comments, I think they are reasonable, so I fixed all of them. |
@davorchap From what I can see, SlotMap is very different in both purpose and implementation compared to IndexMap. IndexMap is just an api wrapper over std::vector<std::optional> to mimic std::map api. |
@ayerofieiev-tt I currently believe that each index is roughly in range [0; numberOfChipsInTheCluster) |
@pgkeller
And there are also some iterations, which are supposed to only iterate over existing elements, for example:
This is why I think we need to have the optionals there, and IndexMap is just a simple wrapper to preserve map-like api. |
@sminakov-tt awesome! I'm glad you liked them. I'm happy to enable more checks in the main line incrementally. The mistake I made last time, was enable everything at once. |
@@ -0,0 +1,121 @@ | |||
// SPDX-FileCopyrightText: © 2024 Tenstorrent AI ULC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file go in tt_metal/tt_stl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aliuTT could you provide feedback on the ethernet metadata structures
@@ -264,28 +265,28 @@ class Cluster { | |||
|
|||
// There is one device driver per PCIe card. This map points id of the MMIO device points to the associated device | |||
// driver | |||
std::unordered_map<chip_id_t, std::unique_ptr<tt_device>> mmio_device_id_to_driver_; | |||
tt::IndexMap<std::unique_ptr<tt_device>> mmio_device_id_to_driver_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #13949 will be merged soon and updates this so that there is one device driver for all devices detectd in a cluster (this will not need to be an IndexMap at that point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds amazing!
|
||
// Need to hold reference to cluster descriptor to detect total number of devices available in cluster | ||
// UMD static APIs `detect_available_device_ids` and `detect_number_of_chips` only returns number of MMIO mapped | ||
// devices | ||
std::string cluster_desc_path_; | ||
std::unique_ptr<tt_ClusterDescriptor> cluster_desc_; | ||
// There is an entry for every device that can be targeted (MMIO and remote) | ||
std::unordered_map<chip_id_t, metal_SocDescriptor> sdesc_per_chip_; | ||
tt::IndexMap<metal_SocDescriptor> sdesc_per_chip_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there will always be one soc desc per chip so this can just be a vector sized to total number of chips in a cluster
// Save mapping of device id to associated MMIO device id for fast lookup | ||
std::unordered_map<chip_id_t, chip_id_t> device_to_mmio_device_; | ||
tt::IndexMap<chip_id_t> device_to_mmio_device_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, there is an entry for each device so we don't necessarily need the IndexMap here
@@ -296,10 +297,10 @@ class Cluster { | |||
// 2 -> 1 | |||
// 1 -> 0 | |||
// 3 -> 1 | |||
std::unordered_map<chip_id_t, uint16_t> device_to_host_mem_channel_; | |||
tt::IndexMap<uint16_t> device_to_host_mem_channel_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is also an entry for each device id in here
Based on the review from @abhullar-tt, I will wait on #13949, and then replace IndexMap with a simple std::vector in the mentioned places. I would really appreciate if we can get similar feedback on It would be really nice to just have std::vector everywhere, so we can kill IndexMap. The less code the better 🎉 |
|
||
// Collections of devices that are grouped based on the associated MMIO device. MMIO device is included in the | ||
// grouping | ||
std::unordered_map<chip_id_t, std::set<chip_id_t>> devices_grouped_by_assoc_mmio_device_; | ||
tt::IndexMap<std::set<chip_id_t>> devices_grouped_by_assoc_mmio_device_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there will be one entry per mmio device so it can be a vector of this size
I guess I'm ok w/ this. From 10K feet it feels like we're using IndexMap to preserve the map-like API when using a map was bad design to begin with. So, is this a bandaid to easy transition or is there value to how these are being used? |
Ticket
#14375
Problem description
Cluster::get_driver consumes a lot of resources according to the profiler. Primarily there are two reasons: it's being called too often, and it uses an std::unordered_map where an std::vector would work
What's changed
Describe the approach used to solve the problem.
Summarize the changes made and its impact.
Checklist