Storing system descriptor #265

Open
wants to merge 3 commits into
base: main
1 change: 1 addition & 0 deletions src/common/module_builder.cc
@@ -197,6 +197,7 @@ void ModuleBuilder::convertFromTTIRToTTNN(
mlir::PassManager ttir_to_ttnn_pm(mlir_module.get()->getName());

mlir::tt::ttnn::TTIRToTTNNBackendPipelineOptions options;
options.systemDescPath = system_desc_path.data();
mlir::tt::ttnn::createTTIRToTTNNBackendPipeline(ttir_to_ttnn_pm, options);

// Run the pass manager.
2 changes: 2 additions & 0 deletions src/common/module_builder.h
@@ -40,6 +40,8 @@ class ModuleBuilder {
// code. Currently hardcoded to one, as we only support one-chip execution.
size_t getNumAddressableDevices() const { return 1; }

static constexpr std::string_view system_desc_path = "system_desc.ttsys";
Contributor

Add comment above.

Contributor

Any particular reason why this is not just a plain string?

Contributor

Yes, this can be a plain string; there is no benefit to keeping it as a string_view. See more info here regarding string vs. string_view (std::string_view implements the same thing as Abseil's).
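
One practical difference that bears on the `.data()` calls in this diff: `std::string_view::data()` is not guaranteed to point at a null-terminated buffer, while `std::string::c_str()` always is. A view over a whole string literal happens to work, but a sub-view does not. The sketch below illustrates this with two hypothetical helpers (`lengthSeenByCApi` and `lengthSeenAfterCopy` are illustration names, not part of this PR or of tt-mlir).

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: the length a C-style API would observe if it were
// handed view.data() directly. The view's size() is ignored, so the result
// depends on where the underlying buffer's null terminator is. This is only
// well defined when the underlying buffer is null-terminated, as it is for a
// string literal.
inline std::size_t lengthSeenByCApi(std::string_view view) {
  return std::strlen(view.data());
}

// Hypothetical helper: copying into std::string first yields a buffer that
// is null-terminated exactly at the end of the viewed characters.
inline std::size_t lengthSeenAfterCopy(std::string_view view) {
  const std::string owned(view);
  return std::strlen(owned.c_str());
}
```

With a view such as `std::string_view("system_desc.ttsys").substr(0, 11)`, the first helper reads past the end of the view up to the literal's terminator, while the second sees only the 11 viewed characters. The constant in this PR views a whole literal, so the current code is not affected.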

Contributor

Ideally, we would store a tt::runtime::SystemDesc object inside the ClientInstance and pass it through the Compile function to the ModuleBuilder, which would then pass it to createTTIRToTTNNBackendPipeline. Since tt-mlir currently only supports passing a path to the descriptor, and it is not easy to change it to accept an already parsed object, as a temporary solution we need to save the descriptor to disk and pass its path to the compiler.

I propose these changes:

  • Create a member variable tt::runtime::SystemDesc m_system_descriptor in ClientInstance, which will be set in the ClientInstance::PopulateDevices() function from the system_desc variable.
  • Create a member variable std::string m_cached_system_descriptor_path, which should be set inside the ClientInstance constructor to the std::filesystem::temp_directory_path() directory combined with a file name that is unlikely to clash with other programs, for example tt_pjrt_system_descriptor plus maybe the name of the device architecture, and maybe even a client id if the pjrt structures have one.
  • In ClientInstance::PopulateDevices(), initialize m_system_descriptor with system_desc and then store it at m_cached_system_descriptor_path, with a TODO comment to remove that step once tt-mlir supports passing the system descriptor object. Check whether the store method checks for errors; if it does not, we should check ourselves.
  • In the ClientInstance::Compile function, pass m_cached_system_descriptor_path to the module_builder_->buildModule call.
  • In ClientInstance::~ClientInstance(), remove the cached system descriptor file, as you already do.
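
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeSystemDesc` is a hypothetical stand-in for tt::runtime::SystemDesc (its real `store` signature and error behavior may differ), `ClientInstanceSketch` is a trimmed-down hypothetical class and not the actual ClientInstance, and the `unique_suffix` parameter and the `.ttsys` file name pattern are made up for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical stand-in for tt::runtime::SystemDesc.
struct FakeSystemDesc {
  std::string payload;

  // Writes the descriptor to `path`; returns false on any I/O failure.
  bool store(const std::string &path) const {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
      return false;
    }
    out << payload;
    out.flush();
    return out.good();
  }
};

// Hypothetical, trimmed-down client showing only the descriptor caching.
class ClientInstanceSketch {
public:
  // `unique_suffix` stands for whatever makes the file name unique, for
  // example the device architecture name or a client id.
  explicit ClientInstanceSketch(const std::string &unique_suffix)
      : m_cached_system_descriptor_path(
            (std::filesystem::temp_directory_path() /
             ("tt_pjrt_system_descriptor_" + unique_suffix + ".ttsys"))
                .string()) {}

  ~ClientInstanceSketch() {
    // Remove the cached file; errors are ignored because a destructor
    // should not throw.
    std::error_code ec;
    std::filesystem::remove(m_cached_system_descriptor_path, ec);
  }

  // Keeps the parsed descriptor and caches it on disk. Returns false if
  // the file could not be written.
  bool populateDevices(const FakeSystemDesc &system_desc) {
    m_system_descriptor = system_desc;
    // TODO: remove once tt-mlir accepts a parsed system descriptor object.
    return m_system_descriptor.store(m_cached_system_descriptor_path);
  }

  // The path that would be handed to the module builder on compile.
  const std::string &cachedSystemDescriptorPath() const {
    return m_cached_system_descriptor_path;
  }

private:
  FakeSystemDesc m_system_descriptor;
  std::string m_cached_system_descriptor_path;
};
```

Keeping the path as a per-instance member rather than a global constant means the file no longer lands in the current working directory, and two clients with different suffixes do not overwrite each other's descriptor.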


private:
// Creates VHLO module from the input program code.
mlir::OwningOpRef<mlir::ModuleOp>
2 changes: 2 additions & 0 deletions src/common/pjrt_implementation/client_instance.cc
@@ -28,6 +28,7 @@ ClientInstance::ClientInstance(std::unique_ptr<Platform> platform)

ClientInstance::~ClientInstance() {
DLOG_F(LOG_DEBUG, "ClientInstance::~ClientInstance");
std::remove(ModuleBuilder::system_desc_path.data());
Contributor

Why is ClientInstance destructor doing this? Why even do this in the first place? I don't think I have ever seen a string being cleaned up.

Contributor

std::remove deletes the file stored at that path; it is not cleaning up the string itself.
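
For reference, a minimal sketch of that behavior: the `std::remove(const char *)` overload from `<cstdio>` deletes a file by name and returns 0 on success, which is unrelated to the `std::remove` algorithm in `<algorithm>`. The helper names below are hypothetical and exist only for the illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: creates (or truncates) an empty file at `path`.
inline bool createFile(const std::string &path) {
  std::ofstream out(path);
  return out.good();
}

// Hypothetical helper: reports whether `path` can be opened for reading.
inline bool fileExists(const std::string &path) {
  std::ifstream in(path);
  return in.good();
}

// Hypothetical helper: deletes the file at `path` with the <cstdio>
// std::remove, which returns 0 on success and nonzero on failure.
inline bool removeFile(const std::string &path) {
  return std::remove(path.c_str()) == 0;
}
```

Since `std::remove` reports failure through its return value, the destructor could log a nonzero result instead of ignoring it.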

}

PJRT_Error *ClientInstance::Initialize() {
@@ -164,6 +165,7 @@ void ClientInstance::BindApi(PJRT_Api *api) {
tt_pjrt_status ClientInstance::PopulateDevices() {
DLOG_F(LOG_DEBUG, "ClientInstance::PopulateDevices");
auto [system_desc, chip_ids] = tt::runtime::getCurrentSystemDesc();
system_desc.store(ModuleBuilder::system_desc_path.data());
int devices_count = chip_ids.size();

devices_.resize(devices_count);
Binary file added system_desc.ttsys
Binary file not shown.